Troubleshooting RDS Performance (MySQL, SQL SERVER and MongoDB)
A lot of questions have been raised by our users concerning why ApsaraDB for RDS sometimes perform worse than self-built databases on RDS.
Firstly, all tests must be fair if the performances are to be compared. As a public database service, Alibaba Cloud ApsaraDB for RDS must prioritize on high-availability and security, which can sometimes work against performance. Users, of course, won’t be willing to use an unstable, unsafe service no matter how powerful it may be. To ensure stability, RDS must have master and slave nodes, even setting them up in different data centers. This means that if an error occurs in one data center, the system can easily switch to another to keep providing services.
ApsaraDB for RDS must also ensure data security. After all, who would want to continue using our services if their data was stolen? As a service vendor, we must consider the security of users’ data. Because of this, ApsaraDB adds an intermediate layer to intercept SQL injection requests, and applies the highest security standards for base-level data writes to prevent data loss if the host doesn’t shut down normally.
Regarding performance, the Alibaba Cloud ApsaraDB for RDS source code team is continually making optimizations to MySQL. Both performance and stability are higher than in the community edition, as evidenced by standard benchmark tests. Here we will summarize some of the issues with the performance tests between RDS and self-built databases.
I. Network Differences
1. Available Zones
ApsaraDB for RDS can be divided into single-zone and the multi-zone RDS. For single-zone RDS, the master and slave databases are in the same data centers, while for the multi-zone RDS, the master and slave databases are in different data centers. As a result, the ECS and RDS must be built in the same zone during RDS testing.
2. Network Links
ECS to RDS contains many network links, such as ECS — >DNS — >SLB — >Proxy — >DB. Self-built databases on ECS have 2 (ECS — >ECS), meaning RDS has 3 more links than self-built databases on ECS.
3. Case Study
- A particular online company found that performance had suffered after migrating its system to the cloud.
- The application code and database configuration are identical.
RDS access links:
II. Configuration Differences
1. Specification Configuration
The RDS specification configuration mainly includes memory and CPU.
The ECS and RDS CPUs must have the same number of cores for RDS testing.
2. Parameter Configuration
(1) Security configuration: RDS applies the highest protection standards when committing transactions and flushing binlogs in order to better ensure data security.
- The innodb_flush_log_at_trx_commit parameters specify the log write frequency after the transaction is committed by InnoDB. When the value is 1, the log buffer will be written to the log file and flushed to disk during each transaction commit. 1 is the default value. This configuration is the safest, but there is a certain performance loss due to the operation of disk I/O during each transaction commit.
- The sync_binlog parameter is the frequency at which the MySQL binlog syncs to disk. After the binlog is written sync_binlog times, it will be flushed to disk by MySQL. A value of 1 is the safest. It will synchronize the binary log once after writing each statement or transaction, so at most one statement or transaction log will be lost even in the event of total failure. The trade off is that this is the slowest configuration.
(2) Performance configuration: RDS allows user configured parameters except for specification parameters. Most of the parameters have been optimized by the official team, so most users can run it out of the box without having to adjust any parameters. However, while these parameters are suitable for most application scenarios, there will be times where customization is appropriate for performance purposes.
Function: This parameter is used to determine the maximum value of the internal temporary memory table, which will be assigned to each thread (The hard caps are the values of tmp_table_size and max_heap_table_size). If the temporary memory table exceeds this limit, MySQL will automatically convert it to the disk-based MyISAM table. When optimizing the query statement, avoid using temporary tables. If you really cannot avoid using them, they must be ensured in memory.
Phenomenon: If temporary tables are used when a complex SQL statement contains group by/distinct, which cannot be optimized through an index, it will lead to longer SQL execution.
Recommendation: If the application involves a lot of group by/distinct statements, and the database has enough memory, you can increase the value of tmp_table_size (max_heap_table_size) to improve query performance.
Function: This parameter is used to control the size of MySQL query cache memory. If activated, MySQL will first lock the query cache before execution of each query, and then determine whether the query already exists in the query cache. If it is in the query cache, the result will be returned immediately. If not, the engine query and other operations will be carried out. At the same time, operations like insert, update and delete will cause the query cache to fail. The failure also includes any changes in structure or index. Recovering from cache failure can be costly and creates heavy stress on MySQL. Query cache can be very useful if the database is not frequently updated. However, if you write data frequently, and to a small number of tables, the query cache lock function will often cause lock conflicts. A lock conflict occurs when you attempt to write or read a table that is currently in query cache lock. Since you cannot operate on the table until the lock is lifted, this feature can decrease the efficiency of your select queries.
Phenomenon: The database goes through a number of different statuses, including checking query cache, waiting for query cache lock, and storing results in query cache;
Recommendation: The query cache function is disabled in RDS by default. If you enable query cache in your instance, you can choose to disable it when you encounter any of the above statuses. Of course, query cache can be quite useful in some instances, for example it can be a handy tool for resolve database performance issues.
3. Case Study
- A user is migrating a local service system to the cloud
- The execution time on RDS is twice that of the offline self-built database
The user’s local parameter configuration:
join_buffer_size = 128M
read_rnd_buffer_size = 128M
tmp_table_size = 128M
RDS parameter configuration
join_buffer_size = 1M
read_buffer_size = 1M
III. Framework differences
1. Master-Slave Mechanism
RDS utilizes a high availability operational model called the master-slave mechanism. At the same time, it also uses a semi-synchronous mechanism, which is an improvement on the asynchronous mechanism used by MySQL. After the master database finishes executing a transaction submitted by the client, it waits for acknowledgment from the slave database, which is responsible for writing the data to the relay log. Only then does the master database send the reply to the client, rather than sending it right away. The semi-synchronous mechanism offers data security improvements compared to the asynchronous mechanism, however it also causes a certain degree of latency, which is at least equal to the round-trip time for the TCP/IP connection. This means that the semi-synchronous mechanism increases the response time of the transaction.
Note: The same problems also exist when the high availability SQLSERVER uses a mirror.