Fighting an Endless War with Crawlers

Crawler = Crawling Data?

I raise the old-fashioned question "what is a crawler" because of a recent discussion in our user group. A user whose verification code interface had been requested over and over a few days earlier was asking about protection schemes. In his view, that traffic was not a crawler; only something that crawls data, such as air ticket prices, hotel and accommodation prices, news, novels, comics, comments, or SKUs, counts as a crawler.

Is the Confrontation between Anti-crawler and “Anti-Anti-Crawler” Endless?

This issue is dialectical. According to the Alibaba Cloud security team, the answer depends on the level of crawler you are fighting against. We can group crawler traffic sources on the Internet into the following categories:

  1. Professional black hat hackers
  2. Advanced hackers using a large number of proxy IP addresses
  3. Attackers using simulators
  4. Attackers good at disguise
  5. Beginners

What Is Depth?

You must adapt your methods to the situation, because the different levels of attackers listed above call for different approaches. In our countless days and nights fighting web crawlers, we have seen large-scale credential stuffing campaigns stopped by simply banning an IP address, and we have also faced round-the-clock black market teams equipped with sophisticated monitoring systems and skilled technicians.

1. Feature Detection

Experienced security personnel can quickly detect abnormal behaviors in the access log, such as:

  1. Pages are requested directly without any Referer header, which normal users do not do.
  2. Requests redirected from the primary domain carry no cookies.
  3. The User-Agent (UA) contains Python, Java, xxBot, or Selenium.
  4. A large number of overseas IP addresses appear on a provincial life forum.
  5. The same phone numbers show up again and again in request bodies.
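
As a rough illustration, some of the checks above can be written as simple rules over parsed log entries. This is only a sketch under stated assumptions: the request dictionary and its keys (path, referer, cookies, user_agent, redirected_from_primary), the UA pattern, and the 11-digit phone number format are all hypothetical and would need tuning for a real site.

```python
import re
from collections import Counter

# Hypothetical UA markers based on the list above; tune for real traffic.
BOT_UA = re.compile(r"python|java|bot|selenium", re.IGNORECASE)

def suspicious_flags(request):
    """Return the names of the heuristics that a single request trips.

    `request` is an assumed dict with the keys path, referer, cookies,
    user_agent, and redirected_from_primary.
    """
    flags = []
    # Assumption: only the home page is commonly entered without a referer.
    if request["path"] != "/" and not request.get("referer"):
        flags.append("no_referer")
    if request.get("redirected_from_primary") and not request.get("cookies"):
        flags.append("no_cookies_after_redirect")
    if BOT_UA.search(request.get("user_agent", "")):
        flags.append("bot_user_agent")
    return flags

def repeated_phones(bodies, threshold=3):
    """Phone numbers that appear in at least `threshold` request bodies.

    Assumes 11-digit phone numbers; each body counts a number only once.
    """
    counts = Counter(
        phone for body in bodies for phone in set(re.findall(r"\b\d{11}\b", body))
    )
    return {phone for phone, n in counts.items() if n >= threshold}
```

Rules like these are cheap to run on access logs, but each one is easy for a determined attacker to evade, so they are best treated as signals to combine rather than as verdicts.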

2. JavaScript Transparent Human-Computer Recognition

In addition to access control, another common idea is to use JavaScript to collect operating behaviors, device hardware information, and fingerprints in the web environment, and then determine whether a request comes from an automated tool. The idea is simple, but the front end is a confrontation environment where nothing can be kept secret from the attacker, so it is painstaking work even for professional security teams to ensure the accuracy of the collected information and of the risk judgment model.
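
To make the idea concrete, here is a minimal server-side sketch that scores signals a page script might report. Everything in it is an assumption made for illustration: the signal names (webdriver, mouse_moves, elapsed_ms), the weights, and the thresholds are hypothetical, and a client can forge any of these values, which is exactly why the real problem is hard.

```python
def automation_score(signals):
    """Return a 0-100 score; higher means more likely automated.

    `signals` is an assumed dict reported by front-end JavaScript:
      webdriver   - value of navigator.webdriver in the browser
      mouse_moves - number of mouse-move events seen before submit
      elapsed_ms  - time between page load and submit
    The weights below are illustrative, not tuned values.
    """
    score = 0
    if signals.get("webdriver"):
        score += 60
    if signals.get("mouse_moves", 0) == 0:
        score += 20
    if signals.get("elapsed_ms", 0) < 500:
        score += 20
    return min(score, 100)
```

A request with a high score might be challenged rather than blocked outright, since any single signal can misfire for legitimate users.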

3. Abnormal Behavior Detection

When it comes to abnormal behaviors, most of us think of rate limiting for overly active clients. That’s right, but rate limiting involves a lot of details. For example, which path should be used as the rate limiting condition?
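
One way to explore the path question is to key the limiter on the client and the path together, so that a sensitive endpoint can be throttled without affecting the rest of the site. The sketch below is a plain in-memory sliding window. The class name, the choice of key, and the limits are assumptions for illustration, not a description of any particular product.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per (client, path) in any window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # (client_id, path) -> timestamps

    def allow(self, client_id, path, now=None):
        # `now` can be injected for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        timestamps = self.hits[(client_id, path)]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

With `SlidingWindowLimiter(2, 10)`, a third call to `allow("1.2.3.4", "/login")` within ten seconds is rejected, while the same client can still reach `/search`. Whether to key on the exact path, a path prefix, or the whole site is the kind of detail the question above points at.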

4. Threat Intelligence Capabilities

Collaborative threat defense is a powerful way to ward off the threat of crawlers. In the aviation industry, air tickets are always the focus of crawlers. A crawler run by a scalper or travel agency typically visits all the major airlines to obtain the most complete fare information. Therefore, when we detect a crawler visiting airlines A, B, C, D, and E, is that crawler likely to crawl airline X as well? Yes, of course. This is not an assumption, but a fact we have encountered in actual traffic.
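
The airline example can be sketched as a small aggregation over shared detection reports: a client flagged by several sites becomes a candidate for proactive handling on sites it has not visited yet. The report format (site, client) and the `min_sites` threshold are hypothetical, chosen only to illustrate the reasoning.

```python
from collections import defaultdict

def cross_site_sightings(reports):
    """Map each client identifier to the set of sites that flagged it.

    `reports` is an assumed iterable of (site, client) pairs.
    """
    seen = defaultdict(set)
    for site, client in reports:
        seen[client].add(site)
    return seen

def likely_crawlers(reports, min_sites=3):
    """Clients flagged by at least `min_sites` distinct sites."""
    sightings = cross_site_sightings(reports)
    return {client for client, sites in sightings.items() if len(sites) >= min_sites}
```

Airline X could then treat any client in `likely_crawlers(reports)` with extra suspicion on its first request, instead of waiting to observe abnormal behavior locally.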

A Good Anti-Crawler System Should Reflect the Value of Users

Attack and defense are always dynamic. No policy is ideal for all scenarios. Therefore, a good security product should reflect users’ value and help security engineers make full use of their expertise and experience. The Alibaba Cloud security team is committed to providing crawler risk management products and creating a set of flexible “tools”, helping users to skip cumbersome implementation details and deploy protection rules directly at the policy or even business level. In addition, Alibaba uses its massive data and computing power, elastic capacity expansion, and threat intelligence on the cloud to help users customize anti-crawler systems quickly.

Conclusion

Anti-crawler and anti-anti-crawler technologies are locked in an endless war. Like any war, it is a fight for resources. Because the large majority of attacks are at a basic level, hardening the security of your website, app, and APIs can protect you from most crawlers. For advanced crawlers and anti-anti-crawler techniques, professional products such as those from the Alibaba Cloud security team can provide you with an additional layer of protection.

Alibaba Cloud

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com