Getting Started with Beats


By Liu Xiaoguo, an Elastic Community Evangelist in China

Released by ELK Geek

Elasticsearch

Elasticsearch is famous for its simple RESTful APIs, distributed nature, speed, and scalability. It also provides a search experience with scale, speed, and relevance. These three properties differentiate Elasticsearch from other products, making Elasticsearch very popular.

  • Scale: Scalability refers to the capability of ingesting and processing petabytes of data. An Elasticsearch cluster is distributed, so if more data needs to be stored, users can easily scale out the cluster by adding servers to meet business needs.
  • Speed: Elasticsearch can return search results from petabytes of data in milliseconds. New data imported into Elasticsearch is searchable within one second, allowing near real-time search. In contrast, other databases may require several hours to perform a comparable search.
  • Relevance: Elasticsearch can query text, numbers, geospatial data, and other data types and return relevant results. Results are ranked by how well they match the query: each search result carries a relevance score, and the result with the highest degree of match is ranked first.

Elastic Stack


“ELK” is the abbreviation of three open-source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine and the core component of the Elastic Stack. Logstash is a server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to a “stash” such as Elasticsearch. Beats is a collection of lightweight data shippers that send data directly to Elasticsearch, or to Logstash for further processing before the data goes to Elasticsearch. Kibana lets users visualize data in Elasticsearch with charts and dashboards.

Elastic Solutions


Elastic provides many out-of-the-box solutions built on the Elastic Stack. Many search and database companies have good products, but to implement a complete solution, users must expend a lot of effort combining those products with products from other companies. Elastic instead follows a “3 + 1” model: three solutions built on one stack.

Elastic’s three major solutions are as follows:

  • Enterprise search
  • Observability
  • Security

These three solutions are based on the same Elastic Stack: Elasticsearch, Logstash, and Kibana.

Beats

Beats are a collection of open-source, lightweight (resource-efficient, dependency-free, and small) data shippers. They are installed as agents on the servers in your infrastructure, where they collect logs or metrics. The collected data can be log files (Filebeat), network data (Packetbeat), server metrics (Metricbeat), or other types of data handled by the growing number of Beats developed by Elastic and the community. Beats send the collected data to Elasticsearch, or to Logstash for processing. All Beats are built on a Go framework called libbeat, which handles data forwarding, and the community continually develops and contributes new Beats.

Elastic Beats

[Figure: the family of Elastic Beats]

Filebeat

Filebeat can be installed on almost any operating system. It may also be installed as a Docker container, and it ships with internal modules that include default configurations and Kibana objects for specific platforms, such as Apache, MySQL, and Docker.

I have presented several examples of how to use Filebeat in my previous articles:

  • Beats: Use Filebeat to Write Logs to Elasticsearch
  • Logstash: Import Apache Logs to Elasticsearch
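As a concrete starting point, a minimal filebeat.yml might look like the following sketch (the log path and Elasticsearch host are placeholders for your environment):

```yaml
# filebeat.yml — minimal sketch: read log files and ship them to Elasticsearch
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log          # placeholder path; adjust for your servers

output.elasticsearch:
  hosts: ["localhost:9200"]     # placeholder host
```

Running `./filebeat -e` starts Filebeat in the foreground with logging to stderr, which is convenient while testing a configuration.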

Packetbeat

Packetbeat can be installed on monitored servers or a dedicated server. Packetbeat tracks network traffic, decodes protocols, and records data for each transaction. Packetbeat supports DNS, HTTP, ICMP, Redis, MySQL, MongoDB, Cassandra, and other protocols.
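As an illustration, a minimal packetbeat.yml could enable a couple of these protocol decoders (the interface, ports, and host below are placeholders):

```yaml
# packetbeat.yml — minimal sketch: sniff traffic and decode two protocols
packetbeat.interfaces.device: any   # capture on all interfaces (Linux)

packetbeat.protocols:
  - type: http
    ports: [80, 8080]               # ports where HTTP traffic is expected
  - type: mysql
    ports: [3306]

output.elasticsearch:
  hosts: ["localhost:9200"]         # placeholder host
```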

Metricbeat

Metricbeat collects metrics from the operating system and from services running on the server. It is covered in more detail later in this article.

Heartbeat

Heartbeat monitors the availability of services by periodically probing them over ICMP, TCP, or HTTP.

Auditbeat

Auditbeat collects audit data from the Linux audit framework and monitors file integrity.

Winlogbeat

Winlogbeat collects and ships Windows event logs.

Functionbeat

Functionbeat is a serverless shipper that is deployed as a function on a cloud function-as-a-service platform, such as AWS Lambda, to collect and ship data from cloud services.


Incorporation of Beats in Elastic Stack

[Figure: the three methods of importing data into Elasticsearch]

As shown in the preceding figure, these methods are as follows:

1) Beats: Use Beats to import data into Elasticsearch.
2) Logstash: Use Logstash to import data into Elasticsearch. The Logstash data source can also be Beats.
3) RESTful APIs: Import data into Elasticsearch using APIs provided by Elastic, such as Java, Python, Go, and Node.js APIs.

Next, let’s see how Beats work with other Elastic Stack components. The following block diagram shows the interworking between different Elastic Stack components.

[Figure: interworking between the Elastic Stack components]

As shown in the preceding figure, Beats data can be imported into Elasticsearch using any one of the following three methods:

  • Beats > Elasticsearch: Directly transmit Beats data to Elasticsearch. This is a popular solution in many scenarios and can provide more powerful features when combined with the pipelines provided by Elasticsearch.
  • Beats > Logstash > Elasticsearch: Use powerful filter combinations provided by Logstash to process data streams, including parsing, enrichment, conversion, deletion, and addition. For more information, see my article “Data Conversion, Analysis, Extraction, Enrichment, and Core Operations.”
  • Beats > Kafka > Logstash > Elasticsearch: In some scenarios with uncertain data streams, such as when a large amount of data is generated at a specific point of time and Logstash cannot process it timely, use Kafka for caching. For more information, see my article “Using Kafka to Deploy Elastic Stack.”
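For the second option, a minimal Logstash pipeline that receives events from Beats and forwards them to Elasticsearch can be sketched as follows (the port and host are placeholders):

```conf
# beats.conf — minimal sketch of a Logstash pipeline fed by Beats
input {
  beats {
    port => 5044                  # Beats clients connect to this port
  }
}

filter {
  # parsing and enrichment filters (grok, mutate, etc.) would go here
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
  }
}
```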

Ingest Pipeline

An ingest pipeline preprocesses documents before they are indexed. It allows you to:

  • Parse, convert, and enrich data
  • Configure the processors to be used
[Figure: ingest nodes in the Elasticsearch cluster running processors]

As shown in the above figure, the ingest nodes in the Elasticsearch cluster run the defined processors. For more information about these processors, see Processors on the official Elastic website.
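For illustration, a simple pipeline with two real processors (`lowercase` and `set`) can be defined through the REST API; the pipeline name and field names here are hypothetical:

```
PUT _ingest/pipeline/my_pipeline
{
  "description": "Hypothetical pipeline: normalize a field and add a tag",
  "processors": [
    { "lowercase": { "field": "http.request.method" } },
    { "set": { "field": "event.pipeline", "value": "my_pipeline" } }
  ]
}
```

A Beat can then route its events through this pipeline by setting `pipeline: my_pipeline` under `output.elasticsearch` in its configuration file.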

Libbeat: The Go Framework for Building Beats

To build your own beat, see the following articles:

  • Build Your Own Beat
  • Generate Your Beat

Also, refer to my article “How to Customize an Elastic Beat.”

A Beat consists of two parts: a data collector, and a data processor and publisher. Libbeat provides the data processor and publisher.

[Figure: libbeat data processors and publisher]

For more information about the preceding processors, see “Define processors.” Some processor examples are as follows.

  • add_cloud_metadata
  • add_locale
  • decode_json_fields
  • add_fields
  • drop_event
  • drop_fields
  • include_fields
  • add_kubernetes_metadata
  • add_docker_metadata
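Several of these can be combined in a Beat's configuration file; for example (the field names and the drop condition below are hypothetical):

```yaml
# Sketch of a processors section in a Beat's configuration
processors:
  - add_fields:
      target: project
      fields:
        name: myproject              # hypothetical custom field
  - drop_fields:
      fields: ["agent.ephemeral_id"]
  - drop_event:
      when:
        equals:
          http.response.code: 200    # hypothetical: drop successful requests
```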

Start Filebeat and Metricbeat

Filebeat Overview

  • Correct Log Processing: Filebeat correctly handles log rotation and picks up new log files as they are periodically created.
  • Backpressure Sensitive: If logs are generated excessively, Filebeat automatically adjusts the processing speed to allow Elasticsearch to process the logs in a timely manner.
  • Processing Log Events at Least Once: Filebeat processes events generated for each log at least once.
  • Structured Logs: Filebeat processes structured log data.
  • Multi-line Events: Filebeat processes logs that contain multiple lines of information, such as error logs.
  • Conditional Filtering: Filebeat conditionally filters certain events.

When Filebeat starts, it launches one or more inputs, which look in the locations specified for log data. For each log file it finds, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.

[Figure: Filebeat architecture with inputs, harvesters, and the spooler]

As shown in the above figure, the spooler has cached some data that can be re-sent to ensure event consumption at least once. This mechanism is also used in backpressure-sensitive scenarios. When Filebeat generates events faster than Elasticsearch can handle them, some events are cached.
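The internal queue behind this caching mechanism can be tuned in filebeat.yml; the following is a sketch using real memory-queue settings with illustrative values:

```yaml
# Memory queue that buffers events before they are published
queue.mem:
  events: 4096            # maximum number of events held in the queue
  flush.min_events: 512   # publish once this many events are buffered...
  flush.timeout: 1s       # ...or after this long, whichever comes first
```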

Metricbeat Overview

Metricbeat helps to monitor the server by collecting metrics from systems and services running on the server, including:

  • Apache
  • HAProxy
  • MongoDB
  • MySQL
  • Nginx
  • PostgreSQL
  • Redis
  • System
  • Zookeeper

Metricbeat features the following:

  • Polls service APIs to collect metrics.
  • Effectively stores metrics in Elasticsearch.
  • Collects metrics of the JMX/Jolokia, Prometheus, Dropwizard, and Graphite applications.
  • Labels metrics collected from AWS, Docker, Kubernetes, Google Cloud, or Azure.

Metricbeat consists of modules and metricsets. Metricbeat modules define the basic logic for collecting data from specific services, such as Redis and MySQL. They also specify details about services, such as service connection, metrics collection frequency, and metrics to be collected.

Each module has one or more metricsets that acquire and construct data. A metricset does not collect each metric as a separate event; instead, it retrieves a list of relevant metrics with a single request to the remote system. For example, the Redis module provides an info metricset that collects information and statistics from Redis by running the INFO command and parsing the output.
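In configuration terms, the Redis example above corresponds to a module block like this (the host, period, and output are placeholders):

```yaml
# metricbeat.yml — minimal sketch enabling the Redis module's info metricset
metricbeat.modules:
  - module: redis
    metricsets: ["info"]
    period: 10s                   # collection cycle
    hosts: ["127.0.0.1:6379"]     # placeholder Redis host

output.elasticsearch:
  hosts: ["localhost:9200"]       # placeholder host
```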


Similarly, the MySQL module provides a status metricset that collects data from MySQL by running the SHOW GLOBAL STATUS SQL query. Fetching related metrics in a single request to the remote server keeps the overhead low. If the user does not enable specific metricsets, most modules fall back to default metricsets.

Metricbeat retrieves metrics periodically from the host system based on the period specified when configuring a module. Because multiple metricsets may send requests to the same service, Metricbeat reuses connections as much as possible. If Metricbeat cannot connect to the host system within the configured timeout, it returns an error. Metricbeat sends events asynchronously, which means delivery is not acknowledged; if the configured output is unavailable, events may be lost.

Filebeat and Metricbeat Modules


A Filebeat module simplifies the collection, parsing, and visualization of logs in common formats.

A typical Filebeat module consists of one or more filesets, such as the access and error filesets for Nginx logs. A fileset contains the following content:

  • Filebeat input configuration, including the default paths for finding log files (these defaults vary by operating system). The input configuration also specifies how multiline events are combined.
  • Elasticsearch ingest node pipeline definitions, which are used to parse log lines.
  • Field definitions that are used to configure the correct Elasticsearch type for each field and contain a brief description of each field.
  • Sample Kibana dashboard (if available), which can be used to visualize log files.

Filebeat automatically adjusts these configurations based on the user environment and loads them into the corresponding Elastic Stack components.
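For example, the Nginx module with its access and error filesets is enabled with `filebeat modules enable nginx`, which activates a configuration along these lines (paths left commented out fall back to the OS defaults):

```yaml
# modules.d/nginx.yml — sketch of the Nginx module's filesets
- module: nginx
  access:
    enabled: true
    # var.paths: ["/var/log/nginx/access.log*"]   # override the default path
  error:
    enabled: true
    # var.paths: ["/var/log/nginx/error.log*"]
```

Running `filebeat setup` afterwards loads the corresponding ingest pipelines, index templates, and sample dashboards into the Elastic Stack.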

Modules in other Beats work in essentially the same way as Filebeat modules. At present, many modules are available across the Beats.


This article was authorized for publication by the official blog of the CSDN-Elastic China community.

Source title: Beats: Getting Started with Beats (1)

Source link: (Page in Chinese) https://elasticstack.blog.csdn.net/article/details/104432643

Alibaba Cloud One-Stop Fully-Managed Beats Service


The Alibaba Cloud Elastic Stack is completely compatible with open-source Elasticsearch and has nine unique capabilities.

Original Source:

