Technical Analysis of the Alibaba Cloud Self-Diagnostic System

By Alibaba Cloud ECS Team

If you run into problems while using cloud resources on Alibaba Cloud, you can search the related documents and blogs or submit a support ticket in the console. However, resolving a problem this way can be time-consuming and complex. To address this, Alibaba Cloud has released a self-service diagnostic system that lets you submit a problem with a single click and quickly obtain a diagnosis.

This article describes the advantages and implementation of the Alibaba Cloud self-service diagnostic system from a technical perspective, and further expounds on the troubleshooting capabilities of this system. For more information about the Alibaba Cloud self-service diagnostic system, see the ECS documentation page.

Advantages

The Alibaba Cloud self-service diagnostic system reduces feedback and communication costs, which shortens troubleshooting time and improves troubleshooting efficiency. The timeliness and accuracy of the system are also being continuously improved.

Backed by Alibaba Cloud’s technologies, the self-service diagnostic system has the following advantages compared to other problem feedback and communication channels:

  1. Easy and fast problem feedback with a single click
  2. Accurate and complete feedback information, facilitating troubleshooting
  3. Automatic response within seconds, ensuring timeliness
  4. Accurate and immediate delivery of problem information to the appropriate personnel
  5. Short troubleshooting period and less impact on business
  6. Closed-loop troubleshooting process and good user experience

Implementation

The Alibaba Cloud self-service diagnostic system consists of the following system modules:

  1. Intelligent diagnostic base
  2. Intelligent solution matching
  3. Diagnostic presentation
  4. Diagnostic feedback

Currently, the Alibaba Cloud self-service diagnostic system supports diagnosis of multiple types of cloud resources and operations. The following figure shows the overall architecture of the diagnostic system.

[Figure: overall architecture of the Alibaba Cloud self-service diagnostic system]

As shown in the figure, each module in the Alibaba Cloud self-service diagnostic system has different components and functions.

Intelligent Diagnostic Base

The intelligent diagnostic base is a collection of exceptions of different operation types and cloud resources, such as ECS instances, images, disks, and auto scaling groups. The intelligent diagnostic base has the following supporting elements:

  1. Nearly 1,000 diagnostic templates, covering causes and troubleshooting methods for exceptions of different cloud resources and operation types. Multiple templates can be combined to produce a diagnostic solution.
  2. Massive amounts of input data, including diagnostic and feedback information submitted by users, information entered by R&D personnel based on system problems, and exceptions collected by Alibaba Cloud. Such input data promotes continuous enrichment and optimization of the diagnostic base.
  3. Intelligent learning and optimization. Based on different exceptions and user feedback collected each day, the intelligent diagnostic base uses adaptation and other learning and optimization algorithms to continuously optimize the templates.
  4. Diagnostic dashboard, which generates diagnostic data in real time each day, including the diagnostic rate, satisfaction rate, and diagnostic time. The diagnostic dashboard is a catalyst because it reflects the coverage and accuracy of the diagnostic base templates in a timely manner and promotes addition of new templates and optimization of the existing templates.

The preceding four elements ensure that diagnostic solutions generated by the intelligent diagnostic base are typical and accurate.
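As a rough illustration of the dashboard figures mentioned above, the following Python sketch computes a diagnostic rate, a satisfaction rate, and an average diagnostic time from a list of records. The article does not define these metrics precisely, so the definitions here are assumptions: the diagnostic rate is taken as the share of requests resolved by a template, and the satisfaction rate as the share of rated requests marked satisfied. The names `DiagnosticRecord` and `dashboard_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DiagnosticRecord:
    """One diagnostic request (hypothetical structure, for illustration only)."""
    resolved_by_template: bool    # True if a template produced the solution
    satisfied: Optional[bool]     # user feedback; None if no feedback was given
    seconds_to_diagnose: float    # time from submission to diagnosis


def dashboard_metrics(records: List[DiagnosticRecord]) -> Dict[str, float]:
    """Compute the three dashboard figures under the assumed definitions."""
    total = len(records)
    if total == 0:
        return {"diagnostic_rate": 0.0,
                "satisfaction_rate": 0.0,
                "avg_diagnostic_time": 0.0}

    resolved = sum(1 for r in records if r.resolved_by_template)
    rated = [r for r in records if r.satisfied is not None]
    satisfied = sum(1 for r in rated if r.satisfied)

    return {
        # Assumption: share of all requests resolved by a template.
        "diagnostic_rate": resolved / total,
        # Assumption: only requests that received feedback are counted.
        "satisfaction_rate": satisfied / len(rated) if rated else 0.0,
        "avg_diagnostic_time":
            sum(r.seconds_to_diagnose for r in records) / total,
    }
```

A low diagnostic rate for a given resource type would then point to missing templates, while a low satisfaction rate would point to templates that need to be revised.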

Intelligent Solution Matching

After receiving a user exception distributed by the diagnostic system, the intelligent solution matching module analyzes the cloud resource type and operation type related to the exception. The module builds a two-dimensional model from these two types, submits the model to the intelligent diagnostic base, and uses a query and matching algorithm to find the cause of the problem and a solution. If a matching cause and solution are found, the module generates a diagnostic solution and exports it to the diagnostic presentation module. If no matching cause or solution is found, the module immediately pushes the problem to the appropriate problem owner in the internal system. After completing troubleshooting, the owner submits the diagnostic feedback to the diagnostic presentation module.
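The matching flow described above can be sketched in Python as a lookup keyed on the pair (resource type, operation type), with a fallback to manual handling when nothing matches. This is only a minimal sketch: the article does not disclose the actual matching algorithm, and a plain dictionary lookup stands in for it here. `Template`, `match_exception`, the owner table, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Template:
    """A diagnostic template: one cause and its troubleshooting method."""
    cause: str
    fix: str


# Hypothetical template base keyed by (resource type, operation type).
TemplateBase = Dict[Tuple[str, str], List[Template]]


def match_exception(base: TemplateBase,
                    owners: Dict[str, str],
                    resource_type: str,
                    operation_type: str) -> dict:
    """Return a diagnostic solution, or escalate to a problem owner."""
    key = (resource_type, operation_type)   # the two-dimensional model
    templates = base.get(key, [])
    if templates:
        # Several templates may be combined into one diagnostic solution.
        return {
            "mode": "intelligent",
            "solution": [{"cause": t.cause, "fix": t.fix} for t in templates],
        }
    # No match: hand the problem to the owner for manual diagnosis.
    return {
        "mode": "manual",
        "owner": owners.get(resource_type, "default-owner"),
    }


# Sample data, invented for illustration.
BASE: TemplateBase = {
    ("ecs_instance", "start"): [
        Template("Overdue payment", "Settle the overdue bill"),
        Template("Instance locked", "Check the security lock reason"),
    ],
}
OWNERS = {"disk": "storage-team"}
```

With this sample data, `match_exception(BASE, OWNERS, "ecs_instance", "start")` returns an intelligent diagnosis that combines two templates, while `match_exception(BASE, OWNERS, "disk", "resize")` is routed to the hypothetical `storage-team` owner.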

Diagnostic Presentation

The diagnostic presentation module displays the diagnostic status and the diagnostic solution. You can log on to your Alibaba Cloud console and navigate to Diagnosis to view the current status of diagnostic requests submitted within the last 30 days. If a problem has been diagnosed, you can open the diagnostic solution to see the cause of the problem and the corresponding troubleshooting steps.
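The 30-day window can be pictured as a simple filter over submitted requests. The following Python sketch assumes each request is a dictionary with a `submitted_at` timestamp and a `status` field. Both field names and the function `visible_requests` are hypothetical, and the real console may apply the window differently.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# The article states that requests from the last 30 days are shown.
RETENTION = timedelta(days=30)


def visible_requests(requests: List[Dict], now: datetime) -> List[Dict]:
    """Keep only requests submitted within the retention window."""
    cutoff = now - RETENTION
    return [r for r in requests if r["submitted_at"] >= cutoff]
```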

Diagnostic Feedback

The diagnostic feedback module is used to receive your feedback on diagnostic solutions. We recommend that you submit diagnostic feedback regardless of whether you are satisfied with the diagnosis. As one of the input sources of the intelligent diagnostic base, the feedback helps optimize and improve the diagnostic base and is important for the self-service diagnostic system.

The following figure shows a complete self-service diagnostic process.

[Figure: complete self-service diagnostic process]

If you find a problem when operating a cloud resource, you can submit a diagnostic request. After receiving the request, the diagnostic system matches the problem and determines whether to implement intelligent diagnosis or manual diagnosis based on the matching result. After the troubleshooting is completed, you can check the diagnostic solution and provide feedback on the solution.
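One way to read this process is as a small state machine. The Python sketch below encodes the steps named in the paragraph above as states and allowed transitions. The state names, the `advance` helper, and the `run` function are assumptions made for illustration and are not taken from the actual system.

```python
from enum import Enum
from typing import List


class State(Enum):
    SUBMITTED = "submitted"
    INTELLIGENT_DIAGNOSIS = "intelligent_diagnosis"
    MANUAL_DIAGNOSIS = "manual_diagnosis"
    DIAGNOSED = "diagnosed"
    FEEDBACK_RECEIVED = "feedback_received"


# Allowed transitions, following the process described in the text.
TRANSITIONS = {
    State.SUBMITTED: {State.INTELLIGENT_DIAGNOSIS, State.MANUAL_DIAGNOSIS},
    State.INTELLIGENT_DIAGNOSIS: {State.DIAGNOSED},
    State.MANUAL_DIAGNOSIS: {State.DIAGNOSED},
    State.DIAGNOSED: {State.FEEDBACK_RECEIVED},
    State.FEEDBACK_RECEIVED: set(),
}


def advance(current: State, nxt: State) -> State:
    """Move to the next state, rejecting transitions the process forbids."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {nxt.name}")
    return nxt


def run(match_found: bool) -> List[State]:
    """Walk one request through the whole process and return its path."""
    path = [State.SUBMITTED]
    diagnosis = (State.INTELLIGENT_DIAGNOSIS if match_found
                 else State.MANUAL_DIAGNOSIS)
    for nxt in (diagnosis, State.DIAGNOSED, State.FEEDBACK_RECEIVED):
        path.append(advance(path[-1], nxt))
    return path
```

Calling `run(True)` follows the intelligent diagnosis branch and `run(False)` follows the manual branch. Both end with user feedback, which matches the closed-loop process described earlier.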

Availability and Prospects

Currently, the Alibaba Cloud self-service diagnostic system is available on the ECS console and the Auto Scaling console. We plan to support more cloud service consoles in the future. The system supports intelligent diagnosis and manual diagnosis and is expected to support more diagnosis modes. The intelligent diagnostic rate has exceeded 90% and is still growing.

Please note that the Alibaba Cloud self-service diagnostic system does have limitations. For example, it supports only a limited number of cloud services and operation types. The accuracy of its diagnostic solutions also needs further tuning before the system can solve all problems submitted by users.

Conclusion

The Alibaba Cloud self-service diagnostic system is being continuously optimized and improved. Backed by powerful technologies and our R&D personnel, the system is expected to troubleshoot your problems faster and more accurately in the near future, saving you time and improving your experience.

Note: At the time of writing, the Alibaba Cloud self-service diagnostic system is available in international regions, except on the Japanese site.

Reference: https://www.alibabacloud.com/blog/technical-analysis-of-the-alibaba-cloud-self-diagnostic-system_594394?spm=a2c41.12531838.0.0
