Open, Universal, and High-Performance: Time-series Data Storage for Log Service Empowers Comprehensive Enterprise-level Monitoring Solutions
By Yuanyi
Alibaba Cloud Log Service (SLS) now supports time-series data storage. This feature provides customers with comprehensive access, storage, visualization, alerting, and intelligent O&M for time-series data. This solution fully supports mainstream open-source monitoring platforms and provides low-cost monitoring data storage and services without the need for O&M. In this blog, we will introduce the concept of time-series data and discuss how we can apply it across various scenarios with Alibaba Cloud Log Service (SLS).
Time-series Data Is Everywhere
“Time takes away everything, and your name, appearance, character, and fate will change over the years.” — Plato
As time passes, everything is constantly in a state of change. People use various numbers, such as age, weight, speed, temperature, and currency, to measure these changes. In the digital era, these pieces of data that vary with time can be stored, and value can be obtained from them. This kind of data is known as time-series data. It describes the changing status of objects with respect to time.
Time-series data is widely used in all industries. Its applications include tracking stock and transaction trends, server metrics, heartbeat information, positioning data, and energy consumption. The following are several of the many scenarios in which time-series data is used:
- Stock trading software provides investors with candlestick charts covering many different aspects that they can use as a reference.
- The Apple Watch monitors the wearer’s heart rate information to help detect serious heart diseases early.
- The State Grid analyzes the electricity consumption curve of each community and household to detect electricity leakage and theft.
- E-commerce companies quickly detect various abnormalities by monitoring changing trends in key processes such as order placements, transactions, returns, and reviews.
- Gaming platforms analyze user behavior patterns, such as their actions and locations, to determine whether cheating tools are being used.
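To make the idea concrete, the sketch below models one time series as a metric name, a set of identifying labels, and a list of timestamped values. This layout is an assumption borrowed from common monitoring tools, not a description of how SLS stores data, and the names `TimeSeries`, `append`, and `between` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TimeSeries:
    """One series: a metric name, identifying labels, and timestamped values."""
    name: str
    labels: Dict[str, str]
    # Each point is (unix timestamp in seconds, value).
    points: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, timestamp: int, value: float) -> None:
        # Keep points ordered by time so range lookups stay simple.
        if self.points and timestamp < self.points[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self.points.append((timestamp, value))

    def between(self, start: int, end: int) -> List[Tuple[int, float]]:
        # Return the points whose timestamp falls in [start, end].
        return [(t, v) for t, v in self.points if start <= t <= end]


# Hypothetical heart-rate readings, one per minute.
heart_rate = TimeSeries(name="heart_rate_bpm", labels={"device": "watch-1"})
for ts, bpm in [(1000, 62.0), (1060, 64.0), (1120, 118.0)]:
    heart_rate.append(ts, bpm)

recent = heart_rate.between(1050, 1120)
```

Here `recent` holds the last two readings. Every scenario in the list above fits the same shape: only the metric name, the labels, and the sampling interval differ.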
What Kind of Time-series Storage Is Needed?
Recent years have seen the development of a number of time-series storage engines that support time-series analysis and monitoring in various scenarios. These engines include TimescaleDB, CrateDB, InfluxDB, OpenTSDB, and Prometheus. Each of these engines has its own ecosystem and application scenarios. For example, TimescaleDB is based on PostgreSQL, and therefore users familiar with PostgreSQL have an advantage in learning and beginning to use TimescaleDB. InfluxDB has a rich ecosystem, known as the TICK Stack, that includes Telegraf, InfluxDB, Chronograf, and Kapacitor. Prometheus has become the de facto standard for monitoring in Kubernetes thanks to its ease of use in cloud-native scenarios and its convenient, flexible query language, PromQL.
However, actual business scenarios require more than this from time-series data:
- High Performance: Time-series data usually generates a high traffic load, requires a long retention period, and must be searchable over a long time range. For these reasons, support for large-scale writes and fast queries is a prerequisite for time-series storage.
- Openness: Generally, multiple departments in a company perform different types of analysis and monitoring on the time-series data in different systems. Time-series storage must be open enough to support various methods of data access and downstream consumption.
- Low Cost: Time-series storage must keep both resource costs and manual O&M costs low. Under Moore’s law, the unit cost of hardware resources keeps falling, while the unit cost of personnel rises every year. Controlling the labor cost of O&M is therefore key to reducing the overall cost of time-series storage.
- Intelligence: Static rules alone are not always sufficient to find abnormalities in monitored objects, in particular when a large number of objects are being monitored. Intelligent algorithms are required on the upper layer of time-series storage systems to improve monitoring accuracy.
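The limitation of static rules can be illustrated with a small sketch. It compares a fixed threshold with a rolling z-score check, which is used here only as a simple stand-in for an adaptive method and is not one of the SLS algorithms. The function names, window size, and sample values are all assumptions.

```python
from statistics import mean, pstdev
from typing import List


def static_alerts(values: List[float], threshold: float) -> List[int]:
    """Indexes where the value exceeds one fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_zscore_alerts(values: List[float], window: int = 5, z: float = 3.0) -> List[int]:
    """Indexes where a value deviates from the mean of the preceding
    `window` values by more than `z` standard deviations."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: treat any change as a deviation.
            if values[i] != mu:
                alerts.append(i)
        elif abs(values[i] - mu) / sigma > z:
            alerts.append(i)
    return alerts


# Two hypothetical services with very different baselines.
low_traffic = [10, 11, 10, 9, 10, 11, 40, 10]        # spike at index 6
high_traffic = [1000, 1010, 990, 1005, 995, 1002]    # normal behavior

static_low = static_alerts(low_traffic, threshold=500)
static_high = static_alerts(high_traffic, threshold=500)
adaptive_low = rolling_zscore_alerts(low_traffic)
adaptive_high = rolling_zscore_alerts(high_traffic)
```

With a single threshold of 500, the spike in `low_traffic` is missed while every point of `high_traffic` fires. The rolling check flags only index 6 of `low_traffic`, because it compares each object with its own recent history. Maintaining a hand-tuned threshold per object does not scale when many objects are monitored, which is the gap that intelligent algorithms are meant to close.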
Release of Time-series Data Storage for Log Service (SLS)
The Alibaba Cloud Log Service (SLS) log storage engine was released in 2016. At present, dozens of petabytes of log data from Alibaba and other enterprises are added to SLS each day. Time-series data and data used to compute time-series metrics make up a large part of the log data in SLS. The new time-series storage feature for SLS provides users with comprehensive data access, cleansing, processing, extraction, storage, visualization, monitoring, and problem analysis throughout the entire DevOps lifecycle. Combined with the existing log storage in SLS, this feature covers the data storage needs of a wide range of systems.
The architecture of the time-series storage feature is shown in the preceding figure. The access layer can connect to Logtail, the high-performance log collection component of SLS, as well as various open-source log collection products. Data can also be written directly through SDKs for various languages, and open protocols such as Kafka and Syslog are supported. The storage layer uses a fully distributed architecture: each time-series database is split into shards and can be scaled out horizontally, and three replicas of the data are stored by default for high reliability. The computing layer, which is separated from the storage layer, provides intelligent analysis as well as analytical SQL and PromQL query syntax. The collection, storage, and analysis features provided by SLS enable enterprises to build their own business and microservice monitoring solutions.
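The sharding and replication described above can be sketched in a few lines. This is only an illustration of the general idea under simple assumptions (hash-modulo routing and consecutive replica placement). The actual SLS routing and placement logic is not described in this article, and `series_key`, `shard_for`, and `replica_nodes` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def series_key(name: str, labels: Dict[str, str]) -> str:
    # Sort labels so the same series always produces the same key.
    parts = [name] + [f"{k}={v}" for k, v in sorted(labels.items())]
    return "|".join(parts)


def shard_for(key: str, shard_count: int) -> int:
    # Use a stable hash (not Python's per-process randomized hash())
    # so that routing is reproducible across writers.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def replica_nodes(shard: int, nodes: List[str], replicas: int = 3) -> List[str]:
    # Place copies on consecutive nodes, wrapping around the node list.
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    return [nodes[(shard + i) % len(nodes)] for i in range(replicas)]


key = series_key("cpu_usage", {"host": "web-1", "region": "cn-hangzhou"})
shard = shard_for(key, shard_count=4)
nodes = replica_nodes(shard, ["n0", "n1", "n2", "n3"])
```

All points of one series land on the same shard, and each shard is held by three distinct nodes. Plain modulo routing remaps most keys when `shard_count` changes, so a production system would typically use range splitting or consistent hashing instead.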
Features
From its inception, time-series storage for Log Service (SLS) was designed to meet the needs of Alibaba and its major enterprise customers. The years of technical experience that Alibaba has accumulated ensure that the feature is adaptable to the monitoring and analysis requirements for time-series data at the vast majority of enterprises. Time-series storage for SLS has the following advantages:
- Rich variety of upstream and downstream systems: SLS supports many methods of data access, including various open-source agents as well as a channel for monitoring data within Alibaba Cloud. Time-series data stored in SLS can also be connected with various stream computing and offline computing engines, making data completely open.
- High performance: The separation of computing and storage in SLS lets each layer make full use of cluster capacity, which significantly improves end-to-end speed when large amounts of data are processed.
- Zero O&M: Time-series storage for SLS is provided as a service. Users do not need to operate and maintain instances themselves, and three replicas of all data are stored, making it unnecessary to worry about data reliability.
- Open-source-friendliness: Time-series storage for SLS natively supports writing and querying data in the Prometheus format. It also supports SQL-92 analysis and can connect natively to visualization solutions such as Grafana.
- Intelligence: SLS provides a variety of AIOps algorithms with which you can build an intelligent alerting and diagnosis platform suited to your company. These time-series algorithms include multi-period estimation, prediction, error detection, and classification.
Typical Scenarios
Application and Service Monitoring
Application and service monitoring is one of the most important monitoring tasks in any company, and Alibaba has always treated it as a top priority. The data collection functions provided by SLS allow all application and service data to be collected in a unified, real-time manner. Data from different time periods and in different formats is transformed into structured data that can be analyzed. Because the volume of service data is large, SQL aggregate functions are used to reduce its dimensionality. The aggregated time-series data then serves as the basis for alerts and for long-term tracking of monitoring metrics.
Cloud-Native Monitoring
With the popularity of cloud-native technologies, an increasing number of companies are moving their technologies to a cloud-native architecture. By using open-source Cloud Native Computing Foundation (CNCF) projects such as Prometheus and OpenTelemetry, they can collect monitoring information about Kubernetes and various middleware products and applications. Cloud Monitor can then obtain monitoring data from all cloud services. The time-series storage, log storage, and tracing data storage capabilities of SLS enable various types of monitoring data to be stored in a unified manner and seamlessly visualized in Grafana. In addition, a comprehensive monitoring dashboard covering infrastructure, cloud services, middleware, and application software can be built on Grafana.
Access Log Analysis
Access logs are an essential data source for O&M. They record the inbound traffic of a website or application and directly indicate whether it is running normally. You can collect raw access logs with Logtail and analyze or investigate the requests of each user. These logs can also be archived or audited. However, because the volume of raw access logs is large, they are not suitable for direct monitoring. Instead, the logs are pre-aggregated to reduce their dimensionality, and real-time monitoring is performed on the aggregated time-series data. You can also use the intelligent inspection feature of SLS to monitor each service site independently.
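The pre-aggregation step can be sketched as follows. The example collapses raw access-log records into one row per minute and site, carrying a request count and an error rate. The record fields (`timestamp`, `site`, `status`), the one-minute bucket, and the choice to count 5xx responses as errors are assumptions for illustration. In SLS itself this step would be expressed with its own query and aggregation features rather than client-side code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_per_minute(logs: Iterable[Dict]) -> List[Tuple[int, str, int, float]]:
    """Collapse raw records into (minute_start, site, request_count, error_rate) rows."""
    # (minute_start, site) -> [request_count, error_count]
    buckets = defaultdict(lambda: [0, 0])
    for record in logs:
        # Round the timestamp down to the start of its minute.
        minute = record["timestamp"] - record["timestamp"] % 60
        bucket = buckets[(minute, record["site"])]
        bucket[0] += 1
        if record["status"] >= 500:
            bucket[1] += 1
    return [
        (minute, site, total, errors / total)
        for (minute, site), (total, errors) in sorted(buckets.items())
    ]


# Hypothetical raw access-log records (timestamps in unix seconds).
raw_logs = [
    {"timestamp": 120, "site": "a.example", "status": 200},
    {"timestamp": 150, "site": "a.example", "status": 502},
    {"timestamp": 179, "site": "b.example", "status": 200},
    {"timestamp": 180, "site": "a.example", "status": 200},
]

series = aggregate_per_minute(raw_logs)
```

Four raw records become three time-series points. At production volumes the reduction is far larger, because the number of output rows depends on the number of sites and minutes rather than on the number of requests.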
Visit Alibaba Cloud Log Service to learn more about its full capabilities!