Leaving Time-Consuming and Troublesome SQL Optimization to DAS

1) Challenges

  • 1) Are the capabilities reliable?
  • 2) Are the capabilities comprehensive?

Challenge 1 — How can we select a reliable optimization recommendation algorithm to generate reliable suggestions?

SELECT * 
FROM t1
WHERE time_created >= '2017-11-25'
AND consuming_time > 1000
ORDER BY consuming_time DESC
IX1(time_created)
IX2(time_created, consuming_time)
IX3(consuming_time)
IX4(consuming_time, time_created)
  • Lack of the WHAT-IF kernel capability: We cannot use the kernel’s database optimizer to quantitatively evaluate the costs of the candidate optimization plans.
  • Lack of statistical information: Evaluating candidate optimization plan cost is essentially calculating the execution plan cost, which cannot be done without statistical information.

Challenge 2 — How can we ensure sufficient SQL compatibility?

Challenge 3 — How can we build a capability-based test set with sufficient coverage?

a)schema设计:如表、已有索引、约束等;
b)各类统计信息数据;
c)环境参数等等。

Challenge 4 — How can we build diagnosis and optimization capabilities for a large number of SQL statements?

2) Capability Building

2.1 Core Technical Architecture

Figure 1. Core technical architecture of the SQL diagnosis and optimization engine
  • SQL parsing and verification: The engine parses query statements, verifies whether they meet the requirements, identifies the components of the query statements to form a syntax tree comprising predicates, predicate types, sorting fields, aggregation fields, and query fields, and determines the data types of fields in query statements. The engine also verifies whether the tables and fields used in SQL statements comply with the schema design of the target database.
  • Candidate index generation: The engine generates multiple candidate index combinations based on the syntax tree formed during SQL parsing and verification.
  • Cost evaluation: The engine uses a built-in optimizer independent of the database kernel to retrieve database statistics and stores the statistics in the engine’s cache. The built-in optimizer calculates the costs, based on statistics. It evaluates the cost of each index and SQL rewriting method, then selects the optimal index or SQL rewriting method based on the costs.
  • Index merging and selection: The input of the engine can be one or more query statements or all query statements of the entire database instance. During index recommendation for multiple query statements, indexes recommended for different query statements may be identical or similar to each other or have the same prefixes. In some cases, physical indexes identical or similar to the recommended ones may already exist or share the same prefixes with them.

2.2 Building a Capability-based Test Set

Figure 2. Test case system

3) Practices

4) Conclusion

Original Source:

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com