A Note on Performance Degradation When Migrating from Oracle to MySQL
After transitioning from Oracle to MySQL, a large number of customer systems have encountered performance issues during peak traffic, when CPU usage reaches 100%, or the background CRM system becomes complex and causes a service fault due to SQL accumulation. As far as I recall, when Taobao first migrated from Oracle to MySQL, there were also lots of SQL performance problems, among which, the one that left the deepest impression on me is the subquery problem. The original version was MySQL 5.1, which is poorly optimized for subqueries, resulting in many systems that migrated from Oracle to MySQL encountering performance issues. As a result, our development regulations have since specified that no complex table shall be joined in the foreground transaction system. In this article, I will list a few common operations that may cause problems during migration from Oracle to MySQL:
1) When the customer executes a 0 data migration, there are some SQL statements that need to be changed, some that can be changed but don’t need to be, and some that should not be changed. The SQL statements that need to be changed depend on the situation. I will try to give you some guidance to help avoid problems in the future.
2) Database developers and administrators must be guided on how to reasonably use MySQL to its full potential.
1. Parallel Processing
Oracle can physically divide a large serial task (any DML and common DDLs) into multiple smaller parts. These smaller parts can be simultaneously processed, and finally combined to obtain the final result. This allows Oracle to use parallel processing technology to operate very complex SQL queries in OLAP applications.
Two methods for enabling parallel queries:
(1) Use a hint during the query: select /*+ parallel(4) / count() from test_a, and specify a parallel query with a degree of parallelism of 4.
(2) Use an alter table: alter table test_a parallel 4; and inform Oracle to use a parallel query with a degree of parallelism of 4 when creating an execution plan for the table.
1.2 Transition Suggestions
MySQL does not support parallel processing. Therefore, after applications are migrated from Oracle to MySQL, only SQL statements subject to parallel processing can be used. Processing advice:
(1) AnalyticDB products, such as ADS, can be used on the Alibaba Cloud platform to process parallel analysis queries in Oracle.
(2) Service decomposition is performed on a complex SQL statement to split it into single SQL statements, and then the computing result is put in the application for processing.
2. Logical Reads, Physical Reads, and Consumed Time during SQL Execution
The Oracle optimizer contains more abundant and complete optimization algorithms than the MySQL optimizer. Just in terms of table connections, Oracle supports three algorithms, i.e., nested loop, hash join, and sort-merge join, while MySQL only supports nested loop. This means that MySQL often performs poorly when processing complex queries associated with large or multiple tables. How can we then identify which queries should not be migrated to MySQL? Well, it can be determined based on the logical reads, physical reads, and consumed time during SQL execution.
Physical reading: reads data from a data block to a buffer cache. Logical reading: refers to reading a data block from a buffer cache. Execution time: refers to the time it takes for Oracle to execute one SQL statement.
(1) The first query for table t: select * from t.
(2) The second query: select * from t.
The first query has 6 physical reads, and the second query has no physical read and 6 logical reads. A data block is cached to a buffer cache during the first read, thus the second read and modification of the data block is performed in the buffer cache.
Logical reads are an important metric to look at when optimizing Oracle performance, as it can be easily collected and provides a wealth of information concerning the workload on the database engine. The logical read refers to the number of blocks read from the buffer when executing an SQL statement.
2.2 Transition suggestions
MySQL can rapidly execute simple SQL statements, but has difficulty executing the same statements that have high logical read, physical read, or execution requirements after migration to MySQL. This means that we need to make some transitions:
(1) When executing a single table query with long logical reads, physical reads, or execution time in SQL, it’s possible to encounter a full table scan (dump requirements), or poor indexing. In this case you can use a read only node to handle the dump or optimize the indexes.
(2) When executing a multiple table query with long logical reads, physical reads, or execution time, the AnalyticDB product ADS can be used for processing.
(3) When executing a multiple table query with long logical reads, physical reads, or execution time, service decomposition can be performed to split an SQL statement into single SQL statements, and then the result can be put processed in the application.
Note: An SQL query with more than 100 million logical and physical reads and an execution time of more than 5 seconds is considered a large SQL query.
The number of in(…) parameters in Oracle is limited to within 1000. Although MySQL has no limit in number, it has a limit in SQL length. Meanwhile, a MySQL optimizer uses binary search when optimizing in(…) queries, meaning that the more parameters in the query the worse performance will be, so the number of parameters should generally be kept within 100.
3.2 Transition Suggestions
Oracle：select * from t where id in(id1,id2…..id1000);
MySQL：select * from t where id in(id1,id2…..id100);
MySQL 5.6 and below may encounter performance bottlenecks when processing subqueries associated with a large table because its optimizer only supports nested loop. I once participated in a large project that involved migrating from Oracle to MySQL. The database version at the time was 5.5 and the original Oracle application had stored a large number of subqueries, so after migrating to MySQL, a significant number of SQL statements were accumulated which filled up the available connections and overloaded the database’s CPU. We eventually had to modify the subqueries to recover the system.
WHERE emp_no IN
(SELECT emp_no FROM salaries_2000 WHERE salary = 5000);
The MySQL processing logic involves traversing each record in an “employees” table and then substituting the records to a subquery
4.2 Transition Suggestions
Rewrite the subquery
FROM employees emp,
(SELECT emp_no FROM salaries_2000 WHERE salary = 5000) sal WHERE emp.emp_no = sal.emp_no;
Note: Both MySQL 5.1 and 5.5 carry a large risk of modifying the subquery to an association.
MySQL 5.6 can avoid the awkward problem of rewriting the subquery.
5. View Optimization
Indexing cannot be created on a common view as they do not store substantial information and the data it operates on comes from basic tables. So what can we do when we need to perform a large number of low efficiency queries on this kind of view? Oracle has a materialized view that is physically real and allows indexing.
However, MySQL does not support materialized views. Therefore, performance may be degraded after being migrated to MySQL.
5.2 Transition Suggestions
Split the view and execute it in the application.
6. Functional Index
Functional index refers to a function-based index, similar to a common index. The only difference is that a common index is created based on a column, while a functional index is created based on a function. This, of course, will partly influence data insertion because the index is generated after the function has been computed. However, inserted data is usually quite small, while a data query involves a much larger data set. Slightly lowering insertion speed can help optimize query speed.
MySQL does not support functional indexes, so after SQL statements that use functional indexes in Oracle are migrated to MySQL, full table scan performance could suffer due to the inability to use functional indexes.
For example, when executing the following SQL statement:
select * from emp where date(gmt_create) = ‘2017–02–20’
Even though an index has been created on gmt_create, the emp table will still be fully scanned, and then a comparison is performed on the gmt_create field after deleting the minutes and seconds. If we create a function-based index, for example, index emp_upper_idx on emp(date(gmt_create)), we only need to scan a small part of the data in intervals and then obtain rowid to access the table data. This significantly improves speed.
6.2 Transition Suggestions
Functions on fields are removed by means of SQL rewriting, such that the fields can be indexed:
select * from emp where gmt_create>=’2017–01–20 00:00:00' and gmt_created<’2017–01–21 00:00:00'
(1) Transition is required because MySQL does not support parallel query (keyword: parallel).
(2) MySQL optimizer is relatively ineffective, so special attention should be paid to SQL statements that have long logical reads, physical reads, and execution time.
(3) For MySQL, the number of in(…) parameters should be kept within 100.
(4) MySQL is poor at subquery optimization, so it is recommended to perform subquery transition or use MySQL 5.6.
(5) View transition is suggested since MySQL does not support materialized views.
(6) SQL rewriting is suggested to avoid indexing failure due to MySQL not supporting functional indexing.