5-year Evolution of Ele.me’s Transaction System — Part 3

Creation of the Test Team


Performance Testing

Random Fault Drills

Version 1.0 of the Random Fault Drill

  1. Create a dedicated test environment and provide a separate monitoring node and database in the environment.
  2. Build a client and simulate user behavior to create data. (Our experience gained from automated integration testing really came in handy here.)
  3. Provide a tool for building a Mock Server for each dependent service, so that tests do not rely on the long chain of service dependencies. The Mock Server returns preset output according to the input.
  4. Tag the traffic according to the client. This feature was enabled by a special version released by the framework team. Based on traffic tags, the Mock Server can simulate abnormal behaviors, such as blocking and timeouts, and return them to the server under test.
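
Steps 3 and 4 can be illustrated with a minimal sketch in Python. This is not Ele.me's actual tool: the names `MockServer`, `MockTimeout`, `presets`, `faults`, and the tag values are all hypothetical, and the traffic tag is passed as a plain argument rather than through a framework. The sketch only shows the idea of returning preset output by input and switching to abnormal behavior when a request carries a drill tag.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


class MockTimeout(Exception):
    """Raised when the mock simulates a dependency that does not answer in time."""


@dataclass
class MockServer:
    # Hypothetical mock of one dependent service.
    # Preset outputs, keyed by the request input.
    presets: Dict[str, str] = field(default_factory=dict)
    # Abnormal behavior to simulate per traffic tag: "block" or "timeout".
    faults: Dict[str, str] = field(default_factory=dict)
    # How long a "block" fault stalls before answering (assumed value).
    block_seconds: float = 0.05

    def handle(self, request: str, tag: Optional[str] = None) -> str:
        """Return the preset output, or misbehave if the traffic tag asks for it."""
        fault = self.faults.get(tag) if tag is not None else None
        if fault == "block":
            # Stall, then answer normally: the caller sees a slow dependency.
            time.sleep(self.block_seconds)
        elif fault == "timeout":
            # Never answer: the caller sees a timeout.
            raise MockTimeout(f"simulated timeout for tag {tag!r}")
        if request not in self.presets:
            raise KeyError(f"no preset output for input {request!r}")
        return self.presets[request]


def build_demo_mock() -> MockServer:
    """Build a mock with one preset response and two tagged faults."""
    return MockServer(
        presets={"get_order:42": '{"status": "paid"}'},
        faults={"drill-block": "block", "drill-timeout": "timeout"},
    )
```

Untagged traffic gets the preset response immediately, traffic tagged `drill-block` gets the same response after a delay, and traffic tagged `drill-timeout` raises `MockTimeout`, which lets the server under test exercise its own timeout handling.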

Version 2.0 of the Random Fault Drill

  1. A distributed transaction takes an extremely long time to complete.
  2. An API exception causes a whole service to crash.
  3. When a node or machine in a cluster restarts, API callers are severely affected.
  4. The CPU load on a node in a cluster increases, causing imbalanced load distribution in the cluster.
  5. A service takes effect on only a single server in a cluster, causing inconsistent behavior between the servers in the cluster.

Alibaba Cloud