Diagnosing ECS Faults with Serial Port Logs

Why Do You Need System Serial Port Logs?

When ECS instances go down, reboot abnormally, or fail to boot, maintenance engineers need to locate the root cause, resolve the problem, and prevent it from recurring. Such failures generally come from one of two sources:

  1. Faults of the hardware infrastructure and software environment on which ECS instances run
  2. Faults of the operating system environment on which ECS instances run

What Is in System Serial Port Logs?

The system uses serial ports to print two types of logs, namely, system boot logs and system kernel fault or exception logs.

  1. When the Linux operating system boots, it prints information generated during the boot process. This boot information, which covers the system architecture, CPU, RAM, mounted hardware, and software startup, is stored by the kernel in the ring buffer. It helps the system administrator check whether the system started properly and whether preset application programs started along with the system.
  2. When a kernel fault or exception occurs, the system prints log information based on the log level specified by the kernel parameter kernel.printk (its console log level is set to 4 by default). A kernel panic occurs when the operating system detects an internal critical error that it cannot safely handle. The kernel routine that handles a panic is usually designed to print error information to the serial port console for debugging, and then to wait for the system to reboot automatically or be rebooted manually. The technical information printed by this routine often helps system administrators and software developers diagnose the problem.
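
To make the role of kernel.printk more concrete, here is a minimal Python sketch. On Linux, the parameter is exposed at /proc/sys/kernel/printk as four integers: the console log level, the default message log level, the minimum console log level, and the default console log level. A kernel message reaches the console only when its level is numerically lower than the console log level. The names PrintkLevels, parse_printk, reaches_console, and read_printk are illustrative helpers introduced here, not part of any ECS or kernel tooling.

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    """The four integers exposed by kernel.printk, in order."""
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the whitespace-separated content of /proc/sys/kernel/printk."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return PrintkLevels(*(int(field) for field in fields))


def reaches_console(message_level: int, levels: PrintkLevels) -> bool:
    """Return True if a message of this level is printed to the console.

    Kernel log levels run from 0 (emergency) to 7 (debug); a message is
    printed when its level is numerically lower than console_loglevel.
    """
    return message_level < levels.console_loglevel


def read_printk(path: str = "/proc/sys/kernel/printk") -> PrintkLevels:
    """Read the live setting on a Linux host (the path exists only on Linux)."""
    with open(path) as handle:
        return parse_printk(handle.read())
```

With a console log level of 4, messages at levels 0 through 3 (emergency, alert, critical, error) reach the serial console, while warnings and less severe messages do not.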

How Do I Use System Serial Port Logs?

On the ECS console, you can obtain the system logs of a running ECS instance from the instance list or from the instance details page:

  1. Log on to the ECS console.
  2. Click Instance in the left-side navigation pane.
  3. Select the region of the instance.
  4. Find the Operation menu of the instance to troubleshoot.
  5. Choose More > Maintenance and diagnosis > Obtain instance system logs to view the logs.

Alternatively, click an instance to open the Instance details page, and choose More > Obtain instance system logs to view the logs.
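
Once you have the log text, a quick first pass is to search it for common kernel fault markers. The following is a minimal Python sketch under stated assumptions: the marker strings ("Kernel panic - not syncing", "Oops", "BUG:", "Call Trace:") are ones the Linux kernel commonly prints, the list is not exhaustive, and FAULT_PATTERNS, find_fault_lines, and SAMPLE_LOG are hypothetical names. The sample log is made up for illustration and is not output from a real ECS instance.

```python
import re
from typing import List, Tuple

# Common kernel fault markers; an illustrative, non-exhaustive list.
FAULT_PATTERNS = [
    ("panic", re.compile(r"Kernel panic - not syncing")),
    ("oops", re.compile(r"\bOops\b")),
    ("bug", re.compile(r"\bBUG:")),
    ("call_trace", re.compile(r"Call Trace:")),
]

# Made-up sample text, used only to demonstrate the function.
SAMPLE_LOG = """\
[    0.000000] Linux version (illustrative boot line)
[  123.456789] BUG: unable to handle kernel NULL pointer dereference
[  123.456800] Oops: 0002 [#1] SMP
[  123.456900] Call Trace:
[  123.457000] Kernel panic - not syncing: Fatal exception
"""


def find_fault_lines(log_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, label, line) for each line matching a marker.

    Line numbers are 1-based. Each line is reported once, under the first
    pattern in FAULT_PATTERNS that matches it.
    """
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in FAULT_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
                break
    return hits


if __name__ == "__main__":
    for lineno, label, line in find_fault_lines(SAMPLE_LOG):
        print(f"{lineno}: [{label}] {line}")
```

The lines just before the first hit are usually the most useful starting point, because they show what the kernel was doing when the fault occurred.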

Summary

Alibaba Cloud Elastic Compute Service (ECS) provides proactive maintenance and system events to help you discover, in advance, how infrastructure faults and exceptions may affect ECS operation. These features let maintenance engineers detect faults and exceptions in time and take preventive measures to protect running services. In addition, the diagnostic log function introduced here helps maintenance engineers find the root causes of service-interrupting instance exceptions caused by internal operating system errors, and prevent such problems from recurring.

Alibaba Cloud