Learn How an Open-Source Microservice Component Has Supported Double 11 for the Past 10 Years
By Zhao Yihao, nicknamed Suhe, from the Alibaba Developer team.
With everyone now logging in from home as the coronavirus spreads, “lag,” “slow,” and “buffering” may well be creeping into your vocabulary. More than ever before, we’re working from home, schooling from home, and binging the latest online show or movie, so many websites are seeing unprecedented spikes in traffic, and these spikes are being sustained for long periods of time. Even websites offering e-learning services have frequently been bogged down by the number of requests they receive. Similarly, video streams for conference calls and even online video platforms have been lagging, frequently buffering, or being downgraded to a lower quality setting. It’s something I’m sure everyone has started to notice.
This happened in China and it’s now happening all over the world. Such availability issues seriously affect user experience, can reduce efficiency at the workplace, and do far more damage than you might originally think. To deal with these kinds of problems, developers need to prepare in advance by taking preventive measures, and be equipped to quickly stop losses when traffic spikes occur.
In recent years, developers have shown great concern for the stability of microservices. As businesses evolve from monolithic architectures to distributed architectures and adopt new deployment modes, the dependencies between their various services have become increasingly complex, which has left business systems facing new challenges in providing high availability.
How can we ensure the availability of services? This is a very complex topic that must necessarily take into account several different considerations, including throttling and degradation.
Why Do We Need Throttling and Degradation?
Traffic by its very nature is random and unpredictable. This is something we at Alibaba have had to deal with at midnight every year on Double 11.
System capacity is always limited. If traffic peaks exceed system capacity, requests are handled slowly and quickly pile up, which leads to problems like high CPU utilization and system load and can ultimately cause the system to crash. Therefore, we need to limit such traffic spikes so that our services can handle as many requests as possible without crashing.
Online service applications often call other modules, which may be other remote services, databases, or third-party APIs. When you make a payment in China, for instance, the API provided by UnionPay may need to be called. Or when you query the price of an item, a database may need to be queried. However, the stability of a dependent service cannot be guaranteed. If the dependent service is unstable, the response time of the request becomes longer than usual, the response time of the method that calls the service increases accordingly, and threads begin to pile up. Ultimately, this may exhaust the business thread pool, making the service unavailable.
Modern microservice architectures are distributed and consist of many different services. Different services call each other, forming complex calling chains. The problems described above may be magnified in calling chains. For example, if a part in the complex chain is unstable, a cascade effect may eventually ensue, causing the entire chain to temporarily go offline. So, to be able to deal with this, we need to perform circuit breaking on unstable services to temporarily cut off unstable calls and avoid any crashes caused by local instability.
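To make the circuit-breaking idea concrete, here is a minimal sketch in Java. This is illustrative only, not Sentinel's actual implementation: after a run of consecutive failures, calls to the unstable dependency are short-circuited for a cooldown period, after which a trial call is allowed through.

```java
// Minimal circuit breaker sketch (illustrative, not Sentinel's implementation):
// after failureThreshold consecutive failures the breaker opens and
// short-circuits calls for cooldownMs, then allows a trial request.
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long cooldownMs;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long cooldownMs) {
        this.failureThreshold = failureThreshold;
        this.cooldownMs = cooldownMs;
    }

    // Returns true if a call is allowed at the given time.
    boolean allowRequest(long nowMs) {
        if (openedAt < 0) return true;                    // closed: allow
        if (nowMs - openedAt >= cooldownMs) return true;  // half-open: allow a trial
        return false;                                     // open: short-circuit
    }

    void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1; // close the breaker again
    }

    void recordFailure(long nowMs) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = nowMs; // trip: temporarily cut off unstable calls
        }
    }
}
```

The point of the cooldown is exactly the "temporarily cut off unstable calls" behavior described above: local instability is contained instead of propagating up the calling chain.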
But does this mean that throttling is not required for small services? Or that circuit breaking is not required for simple microservice architectures?
The reality of the situation is that this has nothing to do with the request volume or complexity of the relevant architecture. Rather, in many cases, the failure of an edge service may affect the entire business and cause huge losses. As a result, we need to be aware of failure-oriented design. During normal times, we need to plan the capacity, sort out strong and weak dependencies, and then rationally configure throttling and degradation rules to prevent problems before they actually occur.
You may then ask: is there really a method that can quickly ensure high availability? How should my team achieve smooth and even user access? How can I prevent the impact caused by external factors?
To answer these questions as well as address other issues, let’s turn to how you can use Sentinel, a throttling component, which has supported the stability of Alibaba’s Double 11 Shopping Festival over the past 10 years.
Sentinel: A Throttling and Circuit Breaking Component Used for Cloud-native Microservices
What is Sentinel: Introduction and Specifications
Sentinel is Alibaba’s open-source throttling component that was developed with distributed service architecture systems in mind. To date, it has earned 11,071 stars on GitHub.
Starting with traffic, this component ensures the stability of developers’ microservices in multiple ways, including through throttling, traffic shaping, circuit breaking, and adaptive system protection. Sentinel has supported the core traffic scenarios of Alibaba’s Double 11 Shopping Festival over the past 10 years, including flash sales, cold starts, message load shifting, cluster throttling, and real-time circuit breaking for unavailable downstream services. It effectively ensures the high availability of microservices.
Sentinel has two core concepts: resources and rules. Resources are the code blocks (or calls) that need to be protected. For example, SQL access, RESTful API access, Dubbo service calls, reactive services, and API gateway route access can all be defined as Sentinel resources; in general, any code block can be a Sentinel resource.
You can manually track resources by using the Sentinel API or annotations, or use the framework adaptation module provided by Sentinel to introduce dependencies for one-click access. Rules are the measures for controlling resources. For example, you can configure throttling and degradation rules for a service or method to implement high-availability protection.
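Putting the two concepts together, a minimal Java sketch might look like the following. The resource name `queryPrice` and the 20 QPS threshold are made up for this example, and the code assumes the `sentinel-core` dependency is on the classpath:

```java
import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
import java.util.Collections;

public class PriceQueryDemo {
    public static void main(String[] args) {
        // Rule: at most 20 QPS may pass for the "queryPrice" resource.
        FlowRule rule = new FlowRule("queryPrice");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(20);
        FlowRuleManager.loadRules(Collections.singletonList(rule));

        // Resource: the code block guarded by Sentinel.
        Entry entry = null;
        try {
            entry = SphU.entry("queryPrice");
            // business logic, e.g. the database query for the item price
        } catch (BlockException e) {
            // the request was throttled: fall back or fail fast
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }
    }
}
```

In practice, the annotation support or a framework adaptation module removes most of this boilerplate, but the API form above makes the resource/rule split explicit.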
The core features and technologies of Sentinel are summarized in the following points:
- Real-time statistics based on a sliding window structure provide excellent performance and ensure statistical accuracy.
- High extensibility, including the extension of basic core capabilities and SPI interfaces, allows you to extend throttling, communication, monitoring, and other features.
- Diversified throttling policies (by resource granularity, call relationship, throttling metric, and throttling effect) support distributed cluster throttling as well as hot-spot traffic detection and prevention.
- Circuit breaking and isolation are provided for unstable services.
- Global adaptive protection for system load can adjust the traffic in real time based on system resource usage.
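The first point, sliding-window statistics, can be illustrated with a greatly simplified Java sketch (not Sentinel's actual implementation): the window is split into fixed buckets, stale buckets are recycled in place, and the current count is the sum over buckets that still fall inside the window.

```java
// Illustrative sliding-window counter: windowMs is divided into
// bucketCount buckets; stale buckets are reset when reused, and
// sum(now) totals only the buckets still inside the window.
class SlidingWindowCounter {
    private final long windowMs;
    private final int bucketCount;
    private final long bucketMs;
    private final long[] starts;  // start timestamp of each bucket (-1 = unused)
    private final long[] counts;  // events counted in each bucket

    SlidingWindowCounter(long windowMs, int bucketCount) {
        this.windowMs = windowMs;
        this.bucketCount = bucketCount;
        this.bucketMs = windowMs / bucketCount;
        this.starts = new long[bucketCount];
        this.counts = new long[bucketCount];
        java.util.Arrays.fill(starts, -1);
    }

    // Record one event at the given timestamp.
    void add(long nowMs) {
        int idx = (int) ((nowMs / bucketMs) % bucketCount);
        long bucketStart = nowMs - nowMs % bucketMs;
        if (starts[idx] != bucketStart) { // bucket is stale: recycle it
            starts[idx] = bucketStart;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    // Sum of all buckets still inside (nowMs - windowMs, nowMs].
    long sum(long nowMs) {
        long total = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (starts[i] >= 0 && nowMs - starts[i] < windowMs) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

Recycling buckets in place is what keeps this structure O(1) per event and fixed in memory, which is why the real implementation performs well under heavy traffic.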
Besides this, Sentinel also supports API gateway scenarios and provides gateway throttling for Spring Cloud Gateway and Zuul. It provides throttling for the Envoy service mesh, and supports real-time monitoring and the dynamic configuration and management of rules.
Sentinel provides a simple what-you-see-is-what-you-get (WYSIWYG) console. You can use the console to monitor services and configure and manage rules in real time.
Scenarios and Best Practices
Next, let’s discuss some common scenarios and best practices of Sentinel.
In service provider scenarios, the service provider must be protected from being overwhelmed by traffic peaks. Throttling is therefore often performed based on the provider’s service capability or targeted at a specific service consumer. You can evaluate the capacity of core interfaces through preliminary stress testing and configure queries-per-second (QPS) throttling: when the QPS exceeds the threshold you set, excess requests are automatically rejected.
To avoid being dragged down by unstable services when calling other services, you need to isolate unstable dependencies and apply circuit breaking on the service consumer side. You can use various methods, such as semaphore isolation, exception-ratio-based degradation, and response-time-based degradation.
If a system that has run at a low level of resource usage for a long time suddenly experiences a traffic spike, pushing it directly to a high level may crash it immediately. If this is a risk, you can use the WarmUp mode of Sentinel to let the allowed traffic rise slowly to the upper threshold within a certain period of time, instead of releasing all the traffic at once. This allows the cold system to warm up, preventing it from being overwhelmed.
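As a rough illustration of the warm-up idea, the sketch below ramps the effective QPS limit linearly from a cold fraction of the threshold up to the full value. Sentinel's real WarmUp mode uses a token-bucket model inspired by Guava's SmoothWarmingUp rate limiter, so this linear ramp only conveys the intent; all numbers are illustrative.

```java
// Simplified warm-up sketch: the effective QPS limit starts at
// maxQps / coldFactor and rises linearly to maxQps over warmUpMs.
class WarmUpLimit {
    private final double maxQps;
    private final double coldFactor; // e.g. 3 means start at one third of maxQps
    private final long warmUpMs;

    WarmUpLimit(double maxQps, double coldFactor, long warmUpMs) {
        this.maxQps = maxQps;
        this.coldFactor = coldFactor;
        this.warmUpMs = warmUpMs;
    }

    // Effective QPS limit at elapsedMs since the ramp started.
    double limitAt(long elapsedMs) {
        double coldQps = maxQps / coldFactor;
        if (elapsedMs >= warmUpMs) return maxQps; // fully warmed up
        return coldQps + (maxQps - coldQps) * elapsedMs / warmUpMs;
    }
}
```

A cold system admits only a fraction of its steady-state threshold, so caches, connection pools, and JIT-compiled paths get time to fill in before full load arrives.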
You can use Sentinel’s constant queuing mode for “load shifting” so that a spike in requests is evenly distributed across a period of time. As such, the system load is kept within its request-processing capacity while as many requests as possible are processed.
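The constant queuing behavior can be sketched as uniform pacing in the style of a leaky bucket. This is a simplified, hypothetical model rather than Sentinel's implementation: each request is scheduled one fixed interval after the previous one, and a request whose expected wait exceeds the maximum queueing time is rejected.

```java
// Uniform-pacing sketch: requests pass at most one per intervalMs;
// later arrivals are queued (given a wait time) until the queue would
// exceed maxQueueingMs, after which they are rejected.
class PacingLimiter {
    private final long intervalMs;    // spacing between requests = 1000 / QPS
    private final long maxQueueingMs; // longest acceptable wait
    private long lastPassedMs = Long.MIN_VALUE;

    PacingLimiter(double qps, long maxQueueingMs) {
        this.intervalMs = (long) (1000 / qps);
        this.maxQueueingMs = maxQueueingMs;
    }

    // Returns the wait in ms before the request may proceed,
    // or -1 if the queue is already too long and it must be rejected.
    long tryAcquire(long nowMs) {
        long expected = (lastPassedMs == Long.MIN_VALUE)
                ? nowMs : lastPassedMs + intervalMs;
        if (expected <= nowMs) {
            lastPassedMs = nowMs; // no queue: pass immediately
            return 0;
        }
        long wait = expected - nowMs;
        if (wait > maxQueueingMs) return -1; // queue too long: reject
        lastPassedMs = expected;             // take the next slot
        return wait;
    }
}
```

A burst arriving all at once is thereby spread out at a steady rate, which is exactly the load-shifting effect described above.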
You can use Sentinel’s gateway throttling feature for traffic protection at the gateway entrance and to limit the API calling frequency for different users and IP addresses.
In the Istio + Envoy architecture, you can quickly access Sentinel RLS token servers to provide global throttling capabilities for Envoy clusters.
Sentinel’s Open-Source Ecosystem
Sentinel has a rich open-source ecosystem, covering several core ecosystems such as microservices, API gateway, and service mesh.
Soon after going open-source, Sentinel was incorporated into the Cloud Native Computing Foundation (CNCF) landscape and became one of the throttling and degradation components officially recommended by Spring Cloud. The community provides out-of-the-box adaptations of common frameworks, such as Spring Cloud, Dubbo, and gRPC, and supports reactive ecosystems as well as the Reactor and Spring Webflux asynchronous response architectures. Sentinel is gradually expanding to cover API gateway and service mesh scenarios and assuming a greater role in cloud-native architectures.
Multi-language Evolution and Prospects of Sentinel
Sentinel was initially oriented to Java microservices but is constantly exploring extensions for multiple languages. In the middle of last year, Sentinel released a native version for C++. At the same time, it also launched support for Envoy cluster throttling, which can solve the problem of multi-language throttling in service mesh scenarios.
Recently, Sentinel’s multi-language family officially welcomed a new member, Sentinel Go, its first official version with native support for the Go ecosystem. It provides native throttling and degradation as well as system protection for Go microservices. You can access Sentinel in a few quick steps and then enjoy the following capabilities:
- Accurately limit QPS at the interface level to prevent core interfaces from crashing.
- Perform load shifting to handle surging requests in queues.
- Implement adaptive traffic protection at the system level in combination with system metrics, such as load, and the real-time request volume and response time of services to automatically reject extra traffic. As such, you can ensure service continuity while maximizing throughput.
- Use real-time seconds-level monitoring to observe the real-time traffic in the system through monitoring logs.
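The adaptive protection described above can be sketched as a simple admission check (shown here in Java for consistency with the earlier examples; the real algorithm in Sentinel is more sophisticated, and every threshold below is illustrative): when system load is high, only admit traffic while the number of in-flight requests stays below an estimated capacity derived from recent statistics.

```java
// Simplified load-adaptive admission sketch: under high load, cap the
// in-flight request count at an estimated capacity of
// maxQps * minRtMs / 1000 (Little's law style estimate).
class AdaptiveGuard {
    private final double loadThreshold; // system load above which we clamp

    AdaptiveGuard(double loadThreshold) {
        this.loadThreshold = loadThreshold;
    }

    // maxQps and minRtMs would come from recent sliding-window statistics.
    boolean shouldAdmit(double currentLoad, long inflight,
                        double maxQps, double minRtMs) {
        if (currentLoad <= loadThreshold) return true; // system healthy: admit
        double estimatedCapacity = maxQps * minRtMs / 1000.0;
        return inflight <= estimatedCapacity;          // under pressure: cap concurrency
    }
}
```

The key property is that the limit adapts to what the system has recently demonstrated it can handle, rather than relying on a hand-tuned static threshold.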
In upcoming versions, Sentinel Go will launch a series of capabilities for ensuring stability, including circuit breaking, hot spot parameter statistics, and throttling. Meanwhile, the community will gradually provide modules that integrate commonly used frameworks and cloud-native components. And, in the future, Sentinel will continue to evolve to better support multiple languages and cloud native.
Currently, Sentinel supports Java, Go, and C++. The community will provide support for more languages in the future. We will also constantly improve throttling in API gateway and service mesh scenarios. For example, we may integrate the native Istio Service Mesh to help developers quickly access Sentinel and enjoy high-availability protection in various cloud-native scenarios.
The community also plans to support integration with cloud-native monitoring components, such as Prometheus, in the future. As such, the metric statistics from Sentinel can be used to monitor interfaces. This information can be used with the scaling mechanism of Kubernetes Horizontal Pod Autoscaler (HPA) and adaptive throttling to automatically ensure stability.