Leaving Time-Consuming and Troublesome SQL Optimization to DAS

By Sigan, senior database expert from Alibaba Cloud ApsaraDB

Databases play an increasingly important role in our business systems. Like other companies, most Alibaba services rely heavily on databases. A minor database error may cause a significant impact on our business. Stabilizing and constantly optimizing the databases have always been crucially important.

Database optimization is usually implemented in three levels:

1) Application level: Optimizes application code logic to process data more efficiently.
2) Instance level: Adjusts environment parameters to improve the operating efficiency of instances.
3) SQL level: Ensures data is optimally retrieved through physical database design and SQL statement modification.

Developers are usually familiar with the first two levels and somewhat unfamiliar with SQL optimization, which may even lead to disputes over who should be responsible (the database administrator or application developer). However, SQL optimization is critical to database optimization. Poor SQL performance may lead to low execution efficiency or even interrupted services.

Based on experience, about 80% of database performance problems can be solved through SQL optimization. However, SQL optimization is a very complex process that requires a wide range of database expertise and experience.

For example, it requires knowledge about how to accurately identify bottlenecks in an execution plan and optimize the execution plan by improving the physical database design or rewriting SQL statements. In addition, SQL optimization is time-consuming and costly due to the large and constantly changing SQL workload and basic data.

SQL diagnosis and optimization is one of the most important services of Alibaba Cloud Database Autonomy Service (DAS). DAS performs diagnostic analysis by using SQL statements as the input and provides expert optimization suggestions such as indexing, statement optimization, and expected benefits. You can obtain expert suggestions on SQL diagnosis, rewriting, and optimization to maximize SQL execution performance, without profound knowledge about database optimization.

In addition, based on this capability, the DAS auto-SQL optimization service has pushed SQL optimization to a higher level. The service transforms passive optimization from relying on manual work into intelligence-based active optimization, enabling unattended SQL optimization through an autonomous process.

Next, we will explain the capability building process for the DAS SQL diagnosis and optimization service in detail.

1) Challenges

  • 1) Are the capabilities reliable?
  • 2) Are the capabilities comprehensive?

Answering these two questions perfectly poses several challenges:

Challenge 1 — How can we select a reliable optimization recommendation algorithm to generate reliable suggestions?

SELECT * 
FROM t1
WHERE time_created >= '2017-11-25'
AND consuming_time > 1000
ORDER BY consuming_time DESC

In most cases, the following four candidate indexes are generated based on a specified rule:

IX1(time_created)
IX2(time_created, consuming_time)
IX3(consuming_time)
IX4(consuming_time, time_created)

Which index (or indexes, depending on the AND or OR relationships between indexes) is (or are) finally recommended to users? It is difficult to give an exact answer. This example only uses a simple single-table SQL query. What should we do with complex SQL queries (for example join-table queries) and queries with complex subqueries? The recommendation accuracy is even lower.

To solve this problem, the DAS SQL diagnosis and optimization service is implemented based on cost models, in the same way as the database optimizer. It quantitatively evaluates all possible candidate recommendations (or at least most of them in some extreme cases, where a suboptimal solution is obtained for NP optimization problems) and finally makes recommendations based on the execution costs. Using this algorithm, we find ourselves faced with other challenges when providing support for open-source databases such as MySQL.

  • Lack of the WHAT-IF kernel capability: We cannot use the kernel’s database optimizer to quantitatively evaluate the costs of the candidate optimization plans.
  • Lack of statistical information: Evaluating candidate optimization plan cost is essentially calculating the execution plan cost, which cannot be done without statistical information.

Challenge 2 — How can we ensure sufficient SQL compatibility?

Challenge 3 — How can we build a capability-based test set with sufficient coverage?

However, building a desirable test-case library is very difficult. The two main challenges are as follows:

(1) Many factors affect SQL optimization. For example, hundreds of factors affect index selection, and the factors can form various combinations. As a result, we have an enormous feature set for test cases. It also requires significant effort to map these features to test cases.
(2) Professional knowledge and massive amounts of information are required for designing test cases, even for a single test case. For example, an index recommendation test case involves:

a)schema设计:如表、已有索引、约束等;
b)各类统计信息数据;
c)环境参数等等。

Challenge 4 — How can we build diagnosis and optimization capabilities for a large number of SQL statements?

2) Capability Building

2.1 Core Technical Architecture

Figure 1. Core technical architecture of the SQL diagnosis and optimization engine

Figure 1 shows the core technical architecture of the SQL diagnosis and optimization engine, which consists of optimizers independent of the database. This engine adaptively collects statistics and calculates the costs of the execution plans, to address the lack of the WHAT-IF kernel capability and statistical information. The engine works in the following process:

  • SQL parsing and verification: The engine parses query statements, verifies whether they meet the requirements, identifies the components of the query statements to form a syntax tree comprising predicates, predicate types, sorting fields, aggregation fields, and query fields, and determines the data types of fields in query statements. The engine also verifies whether the tables and fields used in SQL statements comply with the schema design of the target database.
  • Candidate index generation: The engine generates multiple candidate index combinations based on the syntax tree formed during SQL parsing and verification.
  • Cost evaluation: The engine uses a built-in optimizer independent of the database kernel to retrieve database statistics and stores the statistics in the engine’s cache. The built-in optimizer calculates the costs, based on statistics. It evaluates the cost of each index and SQL rewriting method, then selects the optimal index or SQL rewriting method based on the costs.
  • Index merging and selection: The input of the engine can be one or more query statements or all query statements of the entire database instance. During index recommendation for multiple query statements, indexes recommended for different query statements may be identical or similar to each other or have the same prefixes. In some cases, physical indexes identical or similar to the recommended ones may already exist or share the same prefixes with them.

2.2 Building a Capability-based Test Set

Figure 2. Test case system

The basic idea for building the capability-based test set is to formalize the feature-based description of test cases through feature extraction, to build a sufficiently complete, formal feature-based test-case library.

Within Alibaba Group, we have been collecting and storing all SQL statements executed in Alibaba’s internal database instances in real-time. Based on Alibaba’s diversified services and SQL scenarios and formalized feature descriptions, we have analyzed all our online SQL resources and searched for real cases that match the specific features, extract the required information, and finally build a test case library. Note that all data in the test case library was collected from Alibaba Group’s internal business systems. All online data such as statistical information has been encrypted and desensitized. This process is fully automated.

Finally, we compare the formal feature-based test-case library with features of the actual test-case library to evaluate the test completeness and coverage. This step answers the following questions:

1) Which formal feature-based test cases are covered by the actual test-case library? What is the percentage?
2) Which formal feature-based test cases are not covered by the current diagnosis and optimization capabilities? Which test cases fail the verification?
3) Which formal feature-based test cases encounter frequent regression problems in a specific period of time?
4) What is the test case coverage of each capability?

3) Practices

1) Self-service optimization: The service diagnoses exceptions reported by users and provides expert optimization suggestions.
2) Auto-optimization: The service automatically identifies slow queries in the workload of business database instances, proactively diagnoses these queries, and generates optimization suggestions. After a comprehensive evaluation, it schedules optimization tasks. Then it implements the optimization task and tracks the effectiveness of the solution. This is a fully automated, closed-loop optimization process that improves database performance and continuously keeps the database instances in the optimal state.

Over the past three years, the SQL diagnosis accuracy remained above 98%, and more than 75% of the diagnosed slow queries were provided with optimization suggestions.

By the end of March 2020, more than 42 million slow SQL queries had been optimized through auto-SQL optimization, reducing Alibaba Group’s slow SQL queries by about 92%.

More importantly, the SQL diagnosis and optimization service has built an effective and proactive analysis and feedback system. Online diagnosis failures, user-reported cases, and cases rolled back during auto-optimization have been automatically sent to the case system to drive rapid iteration and evolution of the service.

4) Conclusion

SQL diagnosis and optimization is free if you have an Alibaba Cloud DAS instance. Visit the Database Autonomy Service page to apply for a free trial of DAS.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.