Using Second-Level Monitoring to Troubleshoot MongoDB Errors

Case Study: Troubleshooting Slave Delays

An online service was using a MongoDB replica set, with read/write splitting performed on the service side. One day, the service suddenly experienced a large number of read timeouts. Inspector showed that the replication delay on the slave database was abnormally high at the time.
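As a rough illustration of how slave delay can be measured, the sketch below computes per-member lag from a document shaped like the output of MongoDB's `replSetGetStatus` command. It is not how Inspector works internally. The function name `secondary_lag_seconds` and the sample data are hypothetical. In a live setup, the status document could be fetched once per second with PyMongo's `client.admin.command("replSetGetStatus")`.

```python
from datetime import datetime, timedelta


def secondary_lag_seconds(status):
    """Return {member name: lag in seconds} for every SECONDARY member.

    `status` is assumed to have the shape of a replSetGetStatus reply:
    a "members" list whose entries carry "name", "stateStr" and "optimeDate".
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        # No primary visible (for example during an election): lag is undefined.
        return {}
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hand-written sample standing in for a real replSetGetStatus reply.
now = datetime(2024, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=45)},
    ]
}

lags = secondary_lag_seconds(sample_status)
print(lags)
```

Sampling this value every second, rather than every minute, is what makes a short-lived lag spike visible alongside the read timeouts it causes.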

Case Study: Sharding Timeout Errors

One day, an online service using a sharded cluster suddenly experienced a wave of access timeout errors and then recovered shortly afterward. Judging by experience, it was very likely that some operations were holding locks at the time, and that these led to the access timeouts.
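One way to check a lock-related suspicion like this is to look for operations that are waiting on a lock. The sketch below filters a document shaped like the reply of MongoDB's `currentOp` command. It is only an illustration, not the method used in this case. The function name `ops_waiting_for_lock`, the `min_seconds` threshold, and the sample data are hypothetical. With PyMongo, the input could come from `client.admin.command("currentOp")`.

```python
def ops_waiting_for_lock(current_op, min_seconds=1):
    """List operations that are blocked on a lock for at least `min_seconds`.

    `current_op` is assumed to have the shape of a currentOp reply: an
    "inprog" list whose entries may carry "opid", "ns", "secs_running"
    and "waitingForLock".
    """
    return [
        {
            "opid": op.get("opid"),
            "ns": op.get("ns"),
            "secs_running": op.get("secs_running", 0),
        }
        for op in current_op.get("inprog", [])
        if op.get("waitingForLock") and op.get("secs_running", 0) >= min_seconds
    ]


# Hand-written sample standing in for a real currentOp reply.
sample_current_op = {
    "inprog": [
        {"opid": 101, "ns": "app.orders", "secs_running": 3, "waitingForLock": True},
        {"opid": 102, "ns": "app.users", "secs_running": 0, "waitingForLock": False},
        {"opid": 103, "ns": "app.orders", "secs_running": 0, "waitingForLock": True},
    ]
}

blocked = ops_waiting_for_lock(sample_current_op)
print(blocked)
```

Because the errors in this case lasted only a short time, a check like this is useful only if it runs at roughly one-second intervals while the problem is happening.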


These two cases show that sufficiently fine monitoring granularity and a comprehensive set of monitoring indicators are essential for troubleshooting, and that real-time data is instrumental in making monitoring effective.



Alibaba Cloud
