Protect against Web Crawlers with Alibaba Cloud’s Anti-Bot Service
By Alibaba Cloud Security Team
Alibaba Cloud Anti-Bot Service is a new security product that Alibaba Cloud Security launched early this year. It provides anti-bot protection for Web applications, HTML5 websites, APIs, and mobile apps, and helps businesses manage crawler traffic in an orderly manner.
Malicious Crawlers Pose High Risks to Businesses
As traditional industries move online and major businesses become increasingly data-driven, crawlers have become a growing source of risk. According to network data statistics, more than 60% of Internet traffic is generated automatically, in bulk, by crawlers.
In a broad sense, crawlers are not limited to "data crawling". Malicious actors use automated programs (crawlers) to attack businesses and commit fraud through credential stuffing, seat occupancy, ticket snatching, ranking manipulation, interface abuse, and red packet scalping. All of these activities are driven by profit. Malicious crawlers are therefore most common in industries with high-value data, original content, and high profit potential, such as aviation, e-commerce, news and information, data services, finance, and travel.
Malicious bot traffic often causes a range of security issues for enterprises and leads to business losses. Excessive request volumes can also make servers unavailable.
Manual Anti-Crawler Implementation Is Challenging
Not all crawlers are malicious. Friendly crawlers are harmless and may even benefit websites. Examples include search engine crawlers, third-party partner programs, and programs that respect the robots protocol. Robust security operations and maintenance (O&M) must therefore distinguish normal user requests from crawler requests and also let friendly crawlers through. To do this, crawlers must be accurately identified and detected, and the different types of crawlers must be classified and labeled.
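One common way to separate a genuine search engine crawler from an impostor that only copies its user agent is a reverse-then-forward DNS check. Google and Bing both document this check for their crawlers. The sketch below does not describe how Anti-Bot Service works internally. It is a minimal illustration that assumes a hypothetical `label_crawler` helper and a hand-maintained table of hostname suffixes. The resolver functions are passed in as parameters so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable

# Hypothetical allowlist: hostname suffixes that search engines publish for
# their crawlers. Check each vendor's current documentation before relying on it.
FRIENDLY_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def _default_reverse(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyaddr(ip)[0]


def _default_forward(host: str) -> Iterable[str]:
    # socket.gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyname_ex(host)[2]


def label_crawler(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = _default_reverse,
    forward: Callable[[str], Iterable[str]] = _default_forward,
) -> str:
    """Return 'friendly', 'impersonator', or 'unknown' for one request."""
    ua = user_agent.lower()
    for token, suffixes in FRIENDLY_SUFFIXES.items():
        if token not in ua:
            continue
        try:
            host = reverse(ip)
            # Forward-confirm: the hostname must resolve back to the same IP.
            # Otherwise anyone who controls a PTR record could spoof the name.
            if host.endswith(suffixes) and ip in forward(host):
                return "friendly"
        except OSError:
            pass  # DNS lookup failed: the claim stays unverified
        return "impersonator"  # claims to be a search engine but fails the check
    return "unknown"
```

A request labeled "impersonator" is a strong signal in its own right, because legitimate users rarely send a search engine user agent.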
Another difficulty is the diversity of business channels. As industries move online, and especially with the rise of mobile devices, traffic from HTML5 websites, apps, and mini programs on mobile devices now makes up a large share of customers' business traffic alongside basic Web traffic. Crawlers tend to attack whichever channel has the weakest protection and the lowest attack cost. They keep switching between channels to find the weak link in security. If the security scheme is too simple to cover every channel, the business remains exposed.
Through this ongoing contest, crawlers have evolved from simple automated scripts into programs that mimic ordinary user requests. They also route traffic through residential broadband IP addresses, pause between page views as a person would, and follow normal business process paths. All of this makes malicious crawlers increasingly difficult to identify.
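As an illustration of one behavioral signal, consider a script that sleeps for a fixed interval between requests. Its request timing tends to be far more regular than human browsing. The sketch below computes the coefficient of variation of inter-request gaps. The function names and the 0.1 threshold are hypothetical. As the paragraph above notes, a crawler that randomizes its pauses defeats this single feature. That is why real systems combine many signals rather than relying on one.

```python
from statistics import mean, pstdev
from typing import Sequence


def interval_regularity(timestamps: Sequence[float]) -> float:
    """Coefficient of variation of the gaps between consecutive requests.

    Values near 0 mean the gaps are almost identical, which is typical of a
    script that sleeps for a fixed time. Human browsing is usually burstier.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if any(g <= 0 for g in gaps):
        raise ValueError("timestamps must be strictly increasing")
    return pstdev(gaps) / mean(gaps)


def looks_scripted(timestamps: Sequence[float], threshold: float = 0.1) -> bool:
    # Hypothetical threshold. A real system would learn it from labeled traffic.
    return interval_regularity(timestamps) < threshold
```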
Despite these difficulties, solutions do exist. Fighting crawlers is like a game. Businesses can use the advantages of a cloud ecosystem to lower protection costs and make detection faster and more accurate. This pushes attackers to give up once the cost of camouflage exceeds what they can gain. We have built independent detection and protection systems designed to identify all crawlers without eliminating all of them. We use AI models to identify crawlers, which makes it hard for attackers to infer the detection logic and slows down how quickly crawlers mutate. We also apply bot verification as a second layer of judgment after the detection engine. This lets recognition results be handled flexibly and further reduces false positives for ordinary users.
Managing Crawlers in an Orderly Manner
Anti-Bot Service works in SaaS mode and is connected through a lightweight, flexible reverse proxy. Layer-7 traffic is forwarded once to the service, where the cloud-based anti-bot engine identifies and filters bot traffic. This reduces the harm that malicious automated programs cause to customers' businesses. Clean business traffic is then forwarded to the origin server so that normal business operations continue.
The service provides a complete set of crawler detection modules, organized into a basic protection layer, a cloud intelligence layer, and a machine learning layer. The layers pass information to one another during identification. Together they provide tools for customizing crawler features and rules based on traffic, share industry crawler intelligence on the cloud, and tailor machine learning algorithms to each customer's business. With these modules, customers can quickly build a custom anti-bot policy system.
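The three layers described above can be pictured as a chain in which each layer gets a chance to decide. The sketch below is a hypothetical illustration, not the service's actual engine. The following are all assumptions: `LayeredDetector`, the blocklist sets standing in for cloud intelligence, the cutoff values, and the rule that the first custom rule with an opinion wins.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Set


class Risk(Enum):
    LOW = "low"
    SUSPICIOUS = "suspicious"
    HIGH = "high"


@dataclass
class Request:
    ip: str
    user_agent: str
    path: str


# A custom rule returns a Risk, or None when it has no opinion.
Rule = Callable[[Request], Optional[Risk]]


@dataclass
class LayeredDetector:
    # Basic protection layer: customer-defined rules.
    rules: List[Rule] = field(default_factory=list)
    # Cloud intelligence layer: modeled here as plain IP and UA blocklists.
    bad_ips: Set[str] = field(default_factory=set)
    bad_uas: Set[str] = field(default_factory=set)
    # Machine learning layer: any callable returning a bot probability in [0, 1].
    model: Optional[Callable[[Request], float]] = None
    high_cutoff: float = 0.9
    suspicious_cutoff: float = 0.5

    def assess(self, req: Request) -> Risk:
        for rule in self.rules:
            verdict = rule(req)
            if verdict is not None:
                return verdict  # the first rule with an opinion decides
        if req.ip in self.bad_ips or req.user_agent in self.bad_uas:
            return Risk.HIGH
        score = self.model(req) if self.model is not None else 0.0
        if score >= self.high_cutoff:
            return Risk.HIGH
        if score >= self.suspicious_cutoff:
            return Risk.SUSPICIOUS
        return Risk.LOW
```

Putting custom rules first means a customer can always override shared intelligence for their own traffic, for example by exempting a partner's endpoint.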
Beyond quickly identifying crawler behavior, the service handles detection results according to risk level: friendly crawlers are allowed and malicious crawlers are blocked. When a suspicious crawler is detected, the service issues a challenge or verification to make a final assessment.
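The tiered handling described above amounts to a small decision table. The following is a minimal sketch that assumes three hypothetical risk levels and a `run_challenge` callback standing in for the service's challenge or verification step. The names, and the choice to let verified friendly crawlers bypass risk scoring, are illustrative and do not reflect the product's configuration.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"
    SUSPICIOUS = "suspicious"
    HIGH = "high"


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def dispose(risk: Risk, is_friendly: bool, run_challenge: Callable[[], bool]) -> Action:
    """Turn a detection result into a final action.

    run_challenge is a hypothetical stand-in for the verification step
    (for example a JavaScript check or a CAPTCHA). It returns True if the
    client passes.
    """
    if is_friendly:
        return Action.ALLOW  # verified friendly crawlers go straight through
    if risk is Risk.HIGH:
        return Action.BLOCK  # known-malicious traffic never reaches the origin
    if risk is Risk.SUSPICIOUS:
        # Only suspicious clients pay the cost of a challenge, which keeps
        # friction and false positives low for ordinary users.
        return Action.ALLOW if run_challenge() else Action.BLOCK
    return Action.ALLOW
```

Because the challenge runs only for the middle tier, an imperfect detector costs ordinary users an occasional verification rather than an outright block.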
The service also provides a data visualization module. It presents data, including the associations between data, across multiple dimensions, and lets users explore how crawler feature data relates to protection data so that anti-bot policies can be refined iteratively. The module shows each step of a crawler intrusion clearly and helps users decide on better anti-bot policies. It is also integrated with Alibaba Cloud Log Service, so users can query detailed logs to check protection status and traffic details.
At present, Anti-Bot Service is mainly applicable to the following scenarios:
Key Advantages of Anti-Bot Service
Deployment on the Cloud
Technical experts update product rules on the cloud to quickly tackle real-time risks.
Cloud resources can be scaled up or down at any time to absorb traffic peaks. This saves customers the cost of adding new machines for events such as major promotions.
The cloud offers abundant threat intelligence resources. These help customers detect concentrated attacks against their industry, and the intelligence can also feed into the defense systems of other customers in that industry.
By analyzing crawler behavior across a wide range of industries, Anti-Bot Service detects malicious crawlers based on relationship networks.
The goal of Anti-Bot Service is clear: to identify concentrated attacks within an industry and share risk control information across that industry.
Anti-Bot Service has accumulated millions of crawler IP addresses and user agents (UAs) that are known to be associated with cybercrime and gray-market operations.
Anti-Bot Service also provides access to threat intelligence data that is generated by hundreds of millions of devices connected with Alibaba Cloud services.
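IP intelligence of the kind described above is often distributed as address ranges rather than single addresses. The hypothetical sketch below uses Python's standard `ipaddress` module to match a client address against such ranges. The `IpIntel` class, its labels, and the most-specific-range-first rule are illustrative assumptions. The example addresses come from ranges reserved for documentation.

```python
import ipaddress
from typing import Iterable, Optional, Tuple


class IpIntel:
    """Look up an IP address against intelligence entries given as CIDR ranges."""

    def __init__(self, entries: Iterable[Tuple[str, str]]):
        # entries: (cidr, label) pairs, e.g. ("203.0.113.0/24", "proxy-pool")
        self._nets = [(ipaddress.ip_network(cidr), label) for cidr, label in entries]
        # Check the most specific (longest-prefix) ranges first.
        self._nets.sort(key=lambda item: item[0].prefixlen, reverse=True)

    def lookup(self, ip: str) -> Optional[str]:
        addr = ipaddress.ip_address(ip)
        for net, label in self._nets:
            # Membership is simply False when IPv4 and IPv6 are mixed.
            if addr in net:
                return label
        return None
```

For very large feeds, a prefix trie or a sorted-interval search would replace this linear scan, but the matching semantics stay the same.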
Many vendors worldwide offer products for managing malicious bot traffic, each with a different focus. Alibaba Cloud Anti-Bot Service focuses on multi-layer protection. Beyond bot identification and other detection methods, it combines behavior analysis, threat intelligence, machine learning algorithms, and other techniques to detect crawlers. It covers diverse environments, including apps, and its cloud-based reverse proxy access is lightweight and flexible.
To learn more about Alibaba Cloud’s Anti-Bot Service, visit the official documentation at https://www.alibabacloud.com/help/doc-detail/84635.htm?spm=a2c41.12663126.96.36.1998b7bb5Wpe8dx