By Qian Yuxin, Product Manager in the Search and Recommendation Division of the Alibaba Group
Released by ELK Geek
What Is Elasticsearch?
Elasticsearch, an open-source product launched in 2010, is a distributed real-time search and analysis engine. Over the years, the Elasticsearch ecosystem has evolved into Elastic Stack, which covers Elasticsearch, Logstash, and Kibana. Elasticsearch is the search engine, Logstash is responsible for data collection, conversion, and output, and Kibana provides powerful data visualization. According to DB-Engines, Elasticsearch ranks first among search engines, and it is widely used by developers.
Alibaba Cloud Elasticsearch is a fully managed Elasticsearch service that is compatible with the open-source version. It optimizes kernel performance and provides commercial features (formerly X-Pack) out of the box (OOTB). The service is highly available, elastically scalable, and billed in pay-as-you-go mode. In the following figure, we compare Alibaba Cloud Elasticsearch and other vendors’ Elasticsearch products in terms of reliability, security, and system hosting.
In terms of reliability, Alibaba Cloud Elasticsearch has data durability of 99.9% and regularly backs up data to Object Storage Service (OSS) to facilitate data recovery. In addition, the active zone-redundancy solution provides a powerful disaster recovery capability.
Alibaba Cloud Elasticsearch has also significantly improved upon the open-source versions. For kernel performance optimization, it not only separates storage from computing but also optimizes Elastic Compute Service (ECS) instances. The index build service accelerates high-concurrency data writing and keeps data writes and queries from interfering with each other. Alibaba Cloud Elasticsearch uses the index build service to build offline indexes, splits native indexes into smaller shards, and merges them with online indexes. This avoids I/O overhead on the online cluster and ensures query stability during high-concurrency data writing.
In terms of smart operation and maintenance (O&M), Alibaba Cloud Elasticsearch provides the EU smart O&M system for O&M, monitoring, and intelligent analysis on clusters, helping users see the health status of clusters. It also provides warnings and suggestions for improvement. In addition, Alibaba Cloud Elasticsearch has been integrated with the Natural Language Processing (NLP) analyzer provided by Alibaba DAMO Academy for better business analysis and retrieval.
The X-Pack service is integrated into Elasticsearch and Kibana to provide commercial plug-ins. Previously, users needed to pay for these commercial plug-in packages. Alibaba Cloud Elasticsearch now provides many services through X-Pack, such as authentication, permission management, report visualization, and machine learning. In general, compared with other vendors’ Elasticsearch solutions and user-created Elasticsearch solutions, Alibaba Cloud Elasticsearch offers more powerful product capabilities at better cost performance.
Due to these capabilities, Alibaba Cloud Elasticsearch is suitable for various scenarios such as IT O&M, information retrieval, and log analysis. In terms of IT O&M, Alibaba Cloud Elasticsearch supports metric monitoring and network log analysis. In terms of information retrieval, it supports app retrieval, database acceleration, and aggregate searches. In terms of log analysis, it is applicable to web log analysis, risk control, risk auditing, risk analysis, user behavior analysis, user profiling, business intelligence (BI) analysis, and ad hoc data analysis. Alibaba Cloud Elasticsearch is available as a subscription and in pay-as-you-go mode.
Alibaba Cloud Elasticsearch can be delivered in public cloud or private cloud mode. In public cloud mode, Alibaba Cloud Elasticsearch supports Alibaba Finance Cloud, Alibaba Retail Cloud, and Alibaba Cainiao Cloud, and is available through the Japanese and international sites. In private cloud mode, at the end of August 2019, Alibaba Cloud Elasticsearch began offering a lightweight platform as a service (PaaS) as a standalone delivery format. It can be deployed on Elastic Compute Service (ECS) instances and physical machines of the Enterprise Edition.
Alibaba Cloud Elasticsearch is deployed in the CIDR blocks of Elastic Compute Service (ECS) instances, so purchasing it is equivalent to purchasing a large number of ECS instances. You may purchase many Elasticsearch clusters. Each cluster contains many nodes, and each node is an ECS instance. All ECS instances are deployed in the Virtual Private Cloud (VPC) network of the system and support cross-zone disaster recovery, which means services are easily deployed in different zones in a region. By configuring IP address mappings between Alibaba Cloud VPCs and your VPCs, you can deploy nodes of each cluster in different zones.
For disaster recovery, nodes regularly back up snapshots to OSS. If a data fault occurs, it’s easy and quick to restore data from OSS. Ultra-disks, solid-state drives (SSDs), and on-premises disks are used for overall data storage.
Alibaba Cloud Elasticsearch has recently improved its kernel to support storage and computing separation. An Elasticsearch index is split into shards for storage. To improve query efficiency, each shard has multiple replicas, which trade extra storage space for query speed. However, this produces a large amount of redundant data and results in high storage costs. In addition, keeping the replicas up to date incurs more memory overhead when you write data, which slows down writes. To address this, Alibaba Cloud Elasticsearch optimizes the kernel by separating storage from computing, which allows it to map the multiple replicas of a shard to the same physical media. Compared with native Elasticsearch, Alibaba Cloud Elasticsearch reduces storage costs by at least 50%, improves real-time data writing performance by 70%, and improves replica and shard change performance by 99%.
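The storage figure above can be made concrete with a little arithmetic. The following Python sketch uses a deliberately simplified model that is an assumption of this example, not Alibaba Cloud's published method: a native index stores every replica as a full physical copy, while the separated design stores one physical copy no matter how many replicas exist. The function names and sizes are illustrative only.

```python
def storage_footprint_gb(primary_gb: float, replicas: int, shared_media: bool) -> float:
    """Return the physical storage an index needs under a simplified model.

    primary_gb   -- total size of all primary shards
    replicas     -- number of replica copies per primary shard
    shared_media -- True models storage-compute separation, where all
                    replicas map to the same physical data (assumption)
    """
    if primary_gb < 0 or replicas < 0:
        raise ValueError("sizes and replica counts must be non-negative")
    copies = 1 if shared_media else 1 + replicas
    return float(primary_gb) * copies


def saving_ratio(replicas: int) -> float:
    """Fraction of storage saved when all copies share one physical copy."""
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    return replicas / (1 + replicas)


native = storage_footprint_gb(1000, replicas=1, shared_media=False)
separated = storage_footprint_gb(1000, replicas=1, shared_media=True)
print(native, separated, saving_ratio(1))  # 2000.0 1000.0 0.5
```

Under this model, a single replica already halves the footprint, and more replicas save a larger fraction, which is consistent with the "at least 50%" figure. Real clusters also store metadata and transaction logs, which the sketch ignores.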
Alibaba Cloud Elasticsearch has been deployed in Alibaba Cloud’s data centers all over the world, except in the US (Virginia), UK (London), and UAE (Dubai) regions, and will be available in more regions in the future.
Audit Solution for Persistent Financial Databases
This is a practical case of an audit solution designed by Alibaba Cloud for the persistent financial database of a credit card settlement company. The customer required strong financial data governance, so the data had to be stored for a long time, resulting in a large amount of data. Therefore, this audit solution provides three-layer data storage. The recently generated hot data is stored on the ECS instances in the first layer for about two months. When the data becomes warm data or older data, it is moved to the lower layers, which are backed by ECS instances or OSS. This not only keeps queries timely across large amounts of data but also reduces storage costs significantly.
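The three-layer routing described above can be sketched as a simple age-based rule. In this Python sketch, the 60-day hot window follows the "about two months" mentioned in the case. The warm cutoff, the tier names, and the function name are hypothetical values chosen for illustration, since the source does not state them.

```python
from datetime import date

HOT_DAYS = 60    # "about two months", as described in the case above
WARM_DAYS = 365  # hypothetical cutoff, not stated in the source


def storage_tier(record_date: date, today: date) -> str:
    """Pick a storage tier for a record based on its age in days."""
    age_days = (today - record_date).days
    if age_days < 0:
        raise ValueError("record_date is in the future")
    if age_days <= HOT_DAYS:
        return "hot"   # first layer, on ECS instances
    if age_days <= WARM_DAYS:
        return "warm"  # lower layer
    return "cold"      # lowest layer, for example OSS


today = date(2020, 6, 1)
print(storage_tier(date(2020, 5, 1), today))   # hot
print(storage_tier(date(2019, 12, 1), today))  # warm
print(storage_tier(date(2018, 1, 1), today))   # cold
```

In practice, a rule like this would usually be applied per index (for example, one index per day) rather than per record, so that a whole index can be moved between layers at once.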
Example — Log Analysis
The following section further describes the log analysis scenario. For log analysis, user behavior log data from websites, games, and applications is collected and delivered to Hadoop as offline data and to Elasticsearch as online data. This meets the requirements of user tag and profile processing (offline) and user behavior statistics and status query (online). In log analysis scenarios, Alibaba Cloud Elasticsearch provides various features such as aggregated search, real-time query, and quick indexing and archiving of incremental data. Based on the X-Pack service, Alibaba Cloud Elasticsearch provides advanced analysis capabilities such as location-based service (LBS) search, visual analysis reports, and data visualization. It also implements data query, statistics, and analysis, such as user retention analysis, browsing path analysis, and geo-fence-based user profiling and tagging.
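To illustrate how incremental log data might be indexed quickly, the following Python sketch builds a request body for the Elasticsearch `_bulk` API, which accepts newline-delimited JSON: one action line followed by one document line per event, ending with a newline. The index name and the event fields are hypothetical, and the sketch only builds the payload. Sending it to a cluster is left out.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, events: Iterable[dict]) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    lines = []
    for event in events:
        # Action line: index the next document into the given index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the log event itself.
        lines.append(json.dumps(event))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical user behavior events.
events = [
    {"user_id": "u1", "action": "login", "ts": "2020-06-01T08:00:00Z"},
    {"user_id": "u2", "action": "view_item", "ts": "2020-06-01T08:00:05Z"},
]
body = build_bulk_body("user-behavior-2020.06.01", events)
print(body, end="")
```

The resulting string would typically be sent with a `POST` to the `_bulk` endpoint using the `application/x-ndjson` content type.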
How Does Elasticsearch Process Logs?
Log data originates from many sources, including log files, databases, sensors, and web APIs. Using such log data in log search and analysis scenarios requires centralized collection and storage, log search capabilities, aggregate analysis and visualization, security and role management, and scalability.
- In terms of centralized collection and storage of log data, Alibaba Cloud Elasticsearch collects regular log data, including log files, log system data, and network congestion logs. By collecting data and migrating offline Hadoop data, Alibaba Cloud Elasticsearch quickly gathers log data, stores the data in Elasticsearch, and builds indexes.
- In terms of log search capabilities, Alibaba Cloud Elasticsearch supports full-text search, metadata search, metric and tag search, and location-based search.
- In terms of aggregate analysis and visualization, after data is aggregated in Alibaba Cloud Elasticsearch, you can implement an aggregate analysis by using aggregate functions such as sum, average, min, and max. You can also implement machine learning analysis through X-Pack and visualize online data through Kibana. In Alibaba Cloud Elasticsearch, you can configure and create a visualization panel directly in the Kibana console.
- In terms of security and role management, Alibaba Cloud Elasticsearch provides role-based access control (RBAC) for user permissions and supports the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols. It also implements real-time monitoring and alerting. In addition, the X-Pack features of Alibaba Cloud Elasticsearch provide services such as automatic data reports and triggered reports, helping you better manage and query data.
- In terms of scalability, Alibaba Cloud Elasticsearch supports elastic scaling. Nodes in an Elasticsearch cluster use peer-to-peer connections, supporting quick replication and elastic scaling to manage data at different scales.
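The search and aggregation capabilities listed above can be combined in a single Elasticsearch query. The following Python sketch builds a search request body that pairs a full-text `match` query with a `geo_distance` filter and the `sum`, `avg`, `min`, and `max` metric aggregations. The field names `message`, `location`, and `latency_ms` are assumptions made for this example, and `location` would need to be mapped as a `geo_point` for the filter to work.

```python
import json


def build_log_search(text: str, lat: float, lon: float, radius_km: float) -> dict:
    """Build a search body: full-text match, geo filter, metric aggregations."""
    return {
        "size": 10,
        "query": {
            "bool": {
                # Full-text search on a hypothetical "message" field.
                "must": [{"match": {"message": text}}],
                # Location-based filter on a hypothetical geo_point field.
                "filter": [
                    {
                        "geo_distance": {
                            "distance": f"{radius_km:g}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ],
            }
        },
        # Aggregate functions over a hypothetical numeric field.
        "aggs": {
            "total_latency": {"sum": {"field": "latency_ms"}},
            "avg_latency": {"avg": {"field": "latency_ms"}},
            "min_latency": {"min": {"field": "latency_ms"}},
            "max_latency": {"max": {"field": "latency_ms"}},
        },
    }


search_body = build_log_search("timeout", lat=30.27, lon=120.15, radius_km=5)
print(json.dumps(search_body, indent=2))
```

The body would be sent to the `_search` endpoint of an index. The four aggregations are kept separate here to mirror the functions named in the list, although a single `stats` aggregation returns the same values in one result.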
Architecture of Alibaba Cloud Elasticsearch Ecosystem
Data is transferred from data storage such as ApsaraDB for RDS (RDS) instances and processed by downstream computing engines such as Flume, EMR, and MaxCompute. After data profiling or tagging, data is indexed in Elasticsearch. Alibaba Cloud Elasticsearch is compatible with the entire big data ecosystem and can seamlessly connect to the entire Alibaba Cloud ecosystem, greatly facilitating data processing. Moreover, Kibana makes it easier to manage and explore the data visually.
This article discussed various aspects of Alibaba Cloud Elasticsearch. It explored how Alibaba Cloud Elasticsearch compares with similar solutions provided by other vendors. It also explained the solution’s delivery modes, architecture, and available regions, and walked through log analysis with an example.