In the past 10 years, cloud databases have become ubiquitous. Compared with traditional on-premises databases, cloud databases address basic database O&M needs such as resource provisioning, elasticity, high availability, backup, and monitoring. However, general developers still face difficult challenges: how to diagnose problems quickly and how to optimize databases continuously.
To solve these problems, Alibaba Cloud has released Database Autonomy Service (DAS) for cloud database users. DAS implements database self-awareness, self-diagnosis, self-repair, and self-security based on machine learning and expert experience.
DAS is the industry’s first cloud service that provides database autonomy. It provides six autonomy features for several classes of database engines: OLTP (represented by ApsaraDB for RDS and ApsaraDB for PolarDB), NoSQL (represented by ApsaraDB for Redis), and OLAP (represented by AnalyticDB). It also supports hybrid clouds and offers enterprise-level database O&M features such as SQL audit and identification of high-risk requests.
Driven by data and built on expert experience and machine learning, DAS forms a closed loop of anomaly detection, root cause analysis, repair and optimization, tracing analysis, and feedback. This allows databases to run autonomously, stably, and efficiently without human intervention.
Six Core Autonomy Features
Feature 1: Around-the-clock anomaly detection: DAS uses machine learning algorithms to detect anomalies in database workloads in real time. Unlike threshold-based alerting, this approach detects anomalies promptly instead of waiting for crashes and faults to reveal them.
Feature 2: Auto recovery from anomalies: After detecting an anomaly, DAS automatically performs root cause analysis, and implements any required damage control, repair, or optimization operations to facilitate database auto recovery, reducing the impact on enterprise services.
Feature 3: Auto optimization: DAS constantly reviews and optimizes the SQL running on your database based on the global workload and real service scenarios, instead of optimizing individual SQL statements in isolation. In effect, DAS protects your database like an around-the-clock database administrator.
Feature 4: Intelligent parameter tuning: Databases have hundreds of parameters and a wide variety of use cases, which makes it impractical to tune the parameters manually to an optimal configuration. The DAS team developed the intelligent parameter tuning feature in cooperation with the DAMO Academy. This feature combines AI technologies with intelligent stress testing to recommend an optimal parameter configuration for each database instance.
Feature 5: Auto scaling: DAS automatically calculates and predicts the business model and capacity level of databases based on machine learning, to achieve proactive auto scaling.
Feature 6: Intelligent stress testing: DAS provides you with custom test scenarios. It can automatically learn the service model of a database and generate workloads that simulate real services whenever required. These varied and tailored test scenarios help with database management problems such as preparing for major promotions and selecting a database.
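To make the contrast in Feature 1 concrete, the following is a minimal sketch that compares a fixed-threshold alert with a detector that judges each sample against its own recent history. It is not the algorithm DAS uses (DAS relies on machine learning models); a rolling z-score stands in here as a simple statistical substitute. The function names, the window size, the z-score limit, and the CPU samples are all assumptions made for illustration.

```python
from statistics import mean, stdev


def threshold_alerts(series, limit):
    """Static rule: flag indices whose value exceeds a fixed limit."""
    return [i for i, value in enumerate(series) if value > limit]


def rolling_zscore_alerts(series, window=5, z_limit=3.0):
    """Flag indices that deviate strongly from the preceding `window` samples.

    Illustrative stand-in for workload-aware detection, not the DAS model.
    """
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if series[i] != mu:
                alerts.append(i)
            continue
        if abs(series[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Hypothetical CPU utilization samples (percent). Sample 8 jumps far above
# the usual level but stays below a typical 80% alert threshold.
cpu = [20, 21, 19, 20, 22, 21, 20, 19, 55, 21, 20]

print(threshold_alerts(cpu, limit=80))   # []
print(rolling_zscore_alerts(cpu))        # [8]
```

In this made-up series the fixed threshold stays silent because utilization never reaches 80%, while the history-aware check flags the jump at index 8.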
Database autonomy took a great deal of time and effort to develop. We divide the autonomy capabilities of a database into five levels:
- Level-0: Database O&M solely relies on user intervention without any product assistance.
- Level-1: Basic monitoring and alert information are provided but no optimization suggestions are generated.
- Level-2: In some scenarios, diagnosis and optimization suggestions are provided, but manual intervention is still required to decide on whether to adopt and apply the suggestions. The SQL diagnostic engine is an example.
- Level-3: In scenarios such as SQL throttling and auto scaling, comprehensive autonomy can be implemented.
- Level-4: Fully autonomous databases are provided. DAS is currently working toward Level-4.
We have been working on database autonomy services for 6 years. In 2014, we began thinking about how to convert database administrator experience into products that would provide more efficient and intelligent database services for business development, and we built a rule-based SQL diagnostic engine. Given one or more SQL statements as input, the engine directly outputs optimization suggestions.
In 2016, the web version of CloudDBA was released with an upgraded SQL diagnostic engine. CloudDBA on Alibaba Cloud allows you to directly view database workloads and run SQL diagnostics and optimization.
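The idea of a rule-based engine that takes SQL as input and returns suggestions can be sketched in a few lines. The rules below are generic, illustrative checks chosen for this example; they are assumptions and do not reproduce the rules of the engine described above. The names `RULES` and `diagnose` are hypothetical.

```python
import re

# Each rule pairs a pattern with a suggestion. These three rules are
# illustrative assumptions, not the rule set of any real product.
RULES = [
    (
        re.compile(r"\bselect\s+\*", re.IGNORECASE),
        "List only the columns you need instead of SELECT *.",
    ),
    (
        re.compile(r"\blike\s+'%", re.IGNORECASE),
        "A leading wildcard in LIKE usually prevents a B-tree index from being used.",
    ),
    (
        re.compile(r"^\s*(update|delete)\b(?!.*\bwhere\b)", re.IGNORECASE | re.DOTALL),
        "UPDATE or DELETE without WHERE affects every row in the table.",
    ),
]


def diagnose(sql):
    """Return the suggestion of every rule whose pattern matches the statement."""
    return [suggestion for pattern, suggestion in RULES if pattern.search(sql)]


for statement in (
    "SELECT * FROM orders WHERE note LIKE '%late'",
    "DELETE FROM orders",
    "SELECT id FROM orders WHERE id = 1",
):
    print(statement, "->", diagnose(statement))
```

Pattern matching of this kind is easy to extend, but it sees only the text of a statement. It knows nothing about data distribution or execution plans, which is the gap a cost-based approach aims to close.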
In 2018, we developed and improved our database autonomy services through our Alibaba businesses and application scenarios.
In November 2019, to better serve our clients, we upgraded Hybrid Cloud Database Management (HDM), CloudDBA, and our database autonomy capabilities into a single service, Database Autonomy Service (DAS). As of April 2020, the autonomous database platform had automatically optimized more than 42 million SQL statements, reclaimed more than 4 PB of empty space, and optimized 27 TB of memory.
Core Innovations and Breakthroughs in Four Areas
The world’s first comprehensive database autonomy engine: DAS implements comprehensive database autonomy across scenarios. Based on root cause analysis and aggregated instance information, it makes decisions centrally, resolves conflicts between them, and distributes each decision to the specific autonomy scenario that carries it out.
The world’s first external cost-based SQL diagnostics engine: DAS combines a cost-based diagnostic engine, a set of optimization tools that run outside the database, and adaptive statistics collection mechanisms to evaluate the cost of execution plans. As a result, DAS can diagnose SQL statements precisely and output optimization suggestions.
A global workload-based SQL optimization technology: DAS optimizes for the global workload of the database rather than for single statements. It weighs workload-level metrics, for example the resource usage of SQL execution and the read/write ratio, to assess how a change would affect overall database performance. In this way, DAS can minimize memory consumption while maximizing global database performance.
Machine learning-based workload anomaly detection and prediction: With machine learning-based workload anomaly detection, DAS can automatically detect abnormal SQL statements that cause workload changes and trigger global optimization. This turns the previous passive optimization mode into an immediate, proactive global optimization mode.
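The difference between optimizing one statement and optimizing the global workload can be shown with simple arithmetic. The sketch below, under stated assumptions, scores a candidate index by summing its estimated effect over every statement in the workload, weighted by execution frequency. The `Statement` class, the function name, and all figures are hypothetical; the actual DAS cost model is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    sql: str
    executions_per_hour: int
    cost_before_ms: float  # estimated cost per execution today
    cost_after_ms: float   # estimated cost per execution with the candidate index


def net_saving_ms_per_hour(workload):
    """Sum the frequency-weighted cost change across the whole workload."""
    return sum(
        s.executions_per_hour * (s.cost_before_ms - s.cost_after_ms)
        for s in workload
    )


# Hypothetical workload: the index speeds up a read but slows a frequent write.
read = Statement("SELECT id FROM orders WHERE customer_id = ?", 1000, 12.0, 2.0)
write = Statement("INSERT INTO orders (customer_id, total) VALUES (?, ?)", 20000, 1.0, 1.75)

print(net_saving_ms_per_hour([read]))         # 10000.0: looks like a clear win
print(net_saving_ms_per_hour([read, write]))  # -5000.0: a net loss overall
```

Viewed alone, the read statement suggests adding the index. Once the far more frequent write is included, the same index costs more than it saves, which is why the read/write ratio matters to a workload-level decision.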
The international academic community recognizes the research results of the DAS team and the DAMO Academy.
Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications, WWW, 2018
iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases, VLDB, 2019
Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases, VLDB, 2020
DAS can help enterprises save 90% on database management costs and reduce O&M risks by 80%. This allows users to focus on business development, stay innovative, and keep the business running optimally.
Visit www.alibabacloud.com/product/das to learn more about Alibaba Cloud Database Autonomy Service (DAS).