Alibaba EagleEye: Ensuring Business Continuity through Link Monitoring

Alibaba Group EagleEye Monitoring System

As Alibaba Group’s link tracking system, EagleEye monitors link status across the whole group, even though EagleEye’s own services are not part of the transaction link itself. It covers the majority of Alibaba Group’s scenarios involving remote calls to middleware, and plays a key role in troubleshooting. EagleEye helps ensure the stability of individual systems and gives the whole technical team significant support in winning the Double 11 “battle”.

EagleEye Performance Improvements

Whether during normalized stress testing, end-to-end stress testing, or the Double 11 Shopping Festival itself, the main challenge EagleEye faces is how to keep its own system stable under large volumes of data, present the status of individual systems more quickly, and help developers identify and locate problems. This year, after a series of changes and upgrades, EagleEye performs better and helps business-side developers troubleshoot more efficiently.

Computing Capability Sinking

In the early stages, EagleEye’s link tracking and statistics were based on detail logs: complete detail logs were collected and then aggregated by stream computing. As the business grew, log data volume increased sharply and computing volume grew linearly with it, which led to high resource consumption. In addition, log volume peaks during end-to-end stress testing and big promotional events, which often overloaded the computing cluster and caused data latency or even data loss.
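The heading suggests moving ("sinking") part of the aggregation closer to where the logs are produced, so that the central cluster receives summaries rather than every detail record. The following is a minimal Python sketch of that idea only, not EagleEye’s actual implementation. The record fields (`service`, `timestamp`, `latency_ms`, `ok`), the one-second window, and the function name `pre_aggregate` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Summary of all calls to one service within one time window."""
    count: int = 0
    errors: int = 0
    total_latency_ms: float = 0.0
    max_latency_ms: float = 0.0


def pre_aggregate(records, window_s=1):
    """Collapse detail records into one summary per (service, window start).

    Hypothetical helper: field names and window size are illustrative.
    """
    stats = defaultdict(WindowStats)
    for r in records:
        # Align the timestamp to the start of its window.
        window_start = int(r["timestamp"] // window_s) * window_s
        s = stats[(r["service"], window_start)]
        s.count += 1
        s.errors += 0 if r["ok"] else 1
        s.total_latency_ms += r["latency_ms"]
        s.max_latency_ms = max(s.max_latency_ms, r["latency_ms"])
    return dict(stats)


# Four detail records, as an agent on the application host might see them.
detail_logs = [
    {"service": "trade", "timestamp": 100.1, "latency_ms": 12.0, "ok": True},
    {"service": "trade", "timestamp": 100.7, "latency_ms": 30.0, "ok": False},
    {"service": "trade", "timestamp": 101.2, "latency_ms": 8.0, "ok": True},
    {"service": "cart", "timestamp": 100.3, "latency_ms": 5.0, "ok": True},
]

# Only these summaries would be shipped to the central cluster.
summary = pre_aggregate(detail_logs)
```

In this sketch the amount of data sent upstream depends on the number of distinct services and windows rather than on the number of calls, which is why pre-aggregation can soften the peaks described above. The trade-off is that per-call detail is no longer available centrally unless it is sampled or kept separately.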

Scenario-Specific Link

EagleEye has always focused on calls at the middleware layer. However, Alibaba has a large business volume and complicated systems in which each component has a clear and specific function, so some data in the middleware layer is difficult to associate with business data. As a result, it is challenging to implement link tracking, troubleshooting, and capacity planning for specific business scenarios.

Refined Monitoring

EagleEye’s link data plays a crucial role in determining and locating problems. Richer data forms and presentations significantly improve troubleshooting efficiency.

Richer Ecosystem

As one of Alibaba Group’s efficient troubleshooting tools, EagleEye helps business-side developers quickly locate and solve problems, thereby reducing failure duration and improving maintenance efficiency. EagleEye also holds a large volume of data in its underlying layer. Over the past year, we have been mining this data to make the most of it, and we want to build an ecosystem on top of it to help users grow their business. In the process, we created many useful products, laying a solid foundation for Alibaba Group’s technical development.

Conclusion

Last year’s Double 11 promotional event was a great success. The technical team won this “battle”, and EagleEye provided strong support for teams across Alibaba. The system’s stability and real-time capability met the expected standard during both end-to-end stress testing and the Double 11 Shopping Festival itself. It gave the business dependable support and improved troubleshooting efficiency.

Alibaba Cloud
