How Did the Database Administrators at Alibaba Cloud Handle a Potential 40-fold Increase in DingTalk Traffic?

Challenges for the Databases

The Response

Talent Organization

Emergency Resource Coordination

Emergency Response and Optimization

  • Parameter downgrading: Adjusted database parameters to optimize database capabilities and improve throughput.
  • Resource downgrading: Adjusted resource limits, lifted CPU isolation, and increased the database buffer pool (BP) size.
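
As a rough illustration of the buffer pool adjustment mentioned above, the sketch below sizes a buffer pool as a fraction of host memory and rounds it down to a whole number of allocation chunks. It assumes that "BP" refers to an InnoDB-style buffer pool that is allocated in fixed-size chunks. The function name `buffer_pool_bytes`, the memory fractions, and the chunk size are all hypothetical and are not the values the team used.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def buffer_pool_bytes(total_memory_bytes, fraction, chunk_bytes):
    """Size a buffer pool as a fraction of host memory.

    The result is rounded down to a whole number of chunks, on the
    assumption (not stated in the article) that the engine allocates
    its buffer pool in fixed-size chunks.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    target = int(total_memory_bytes * fraction)
    chunks = max(1, target // chunk_bytes)  # always keep at least one chunk
    return chunks * chunk_bytes


if __name__ == "__main__":
    # Hypothetical host: 64 GiB of memory, 128 MiB chunks.
    before = buffer_pool_bytes(64 * GiB, 0.50, 128 * MiB)
    after = buffer_pool_bytes(64 * GiB, 0.75, 128 * MiB)
    print(f"before: {before // GiB} GiB, after: {after // GiB} GiB")
```

A larger buffer pool keeps more of the working set in memory, which trades host memory headroom for fewer disk reads.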

Database Capacity Estimation and Performance Analysis

  • The stress test data set was usually small compared with the total database volume, which meant that the database cache hit rate was basically 100%. Such a test is not suitable for analyzing I/O-heavy business models.
  • High costs. The entire chain, both upstream and downstream, had to take part, which required a large number of people.
  • A comprehensive stress test only hit several core database interfaces, but we needed to cover all online interfaces, because many performance-impacting SQL statements can come from the interfaces that are left out.
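
The first limitation can be made concrete with a small simulation. The sketch below is a toy model, not the team's tooling: it assumes an LRU cache and uniformly random page reads, and the page counts are hypothetical. It measures the hit rate after a warm-up phase for a data set that fits in the cache and for one that is much larger than the cache.

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_pages, cache_pages, accesses, seed=0):
    """Return the measured hit rate of an LRU cache under uniform random reads.

    num_pages   -- distinct pages in the data set
    cache_pages -- number of pages the cache can hold
    accesses    -- number of reads measured after the warm-up phase
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # keys are page ids, ordered from least to most recent
    warmup = 10 * cache_pages  # reads used only to fill the cache
    hits = 0
    for i in range(warmup + accesses):
        page = rng.randrange(num_pages)
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
            if i >= warmup:
                hits += 1
        else:
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / accesses


if __name__ == "__main__":
    # Stress-test-like data set: smaller than the cache.
    small = lru_hit_rate(num_pages=500, cache_pages=1000, accesses=20000)
    # Production-like data set: 20 times larger than the cache.
    large = lru_hit_rate(num_pages=20000, cache_pages=1000, accesses=20000)
    print(f"small data set hit rate: {small:.3f}")
    print(f"large data set hit rate: {large:.3f}")
```

In this model, the small data set is served almost entirely from the cache, while the large one settles near `cache_pages / num_pages`, so most of its reads go to disk. A stress test that only exercises the first case says little about I/O behavior in the second.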

Results

Talent Organization

Technology and Architecture

Emergency Measures

Alibaba Cloud

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com