Setting up a centralised log management platform with the Elastic suite

Have a quick and clear view of what's going on with the monitored applications

By Imen Jaouani, Software Engineer

14/05/2018

The volume of data generated by our systems and applications keeps growing, which drives the proliferation of data centers and data storage systems.
Faced with this data explosion and the investment it requires in skills and resources, decision-makers need advanced analysis tools and clear dashboards to help them manage their systems and customers.

In this context, Margo consultants had the opportunity to discover one of the solutions that meets this need at Devoxx France 2018: the Elastic suite (Elasticsearch, Logstash, Kibana, Beats, …).
The conference “Setting up a centralized log management platform with the elastic suite” took place on Wednesday, April 18th in the ‘University’ format, a long presentation lasting three hours. It was led by David Pilato, developer and evangelist at elastic.co, and Emmanuel Demey, Technical Director and Web Trainer at Zenika Lille.

During this presentation, the speakers combined theory and practice. They began by presenting the latest features of the different components, as well as the upcoming ones, continued with a brief demo and ended with an architectural view of the Elastic suite.

Taking inspiration from the conference, we present the Elastic suite in this article. However, we chose a more top-down approach, starting with a presentation of the suite and its use cases. We then detail the global architecture and finally tackle lesser-known products such as X-Pack and Elastic APM.

 

The elastic suite, what is it?

The Elastic Stack is a group of open source products designed to retrieve data (from any source and in any format), analyze it, and visualize it to reveal trends in real time. This group of products was historically referred to by the acronym ELK (Elasticsearch, Logstash, Kibana). The addition of a fourth product named “Beats” made the acronym unpronounceable, which gave birth to the new name Elastic Stack. The stack can be deployed on premises or consumed as Software as a Service (SaaS).

 

Let’s talk about practice, what are the use cases of the Elastic suite?

The use cases of the Elastic suite are numerous. For example, it helps to fix an application malfunction by retrieving explicit errors (exceptions, error messages, etc.) and to monitor the load of an application (memory consumed, CPU, etc.), which gives a global view of production. Similarly, at the business level, the suite can be used to validate a workflow chain by extracting specific data and analyzing it.
If you want to see more use cases, elastic.co publishes testimonials from customers who chose the Elastic suite to get more out of their data (see https://www.elastic.co/en/use-cases).

 

The overall architecture of the Elastic Suite

Below is an overview of all components and their relationships:

Figure: the overall architecture of the Elastic suite

The various components of the Elastic suite have been designed to work and interact with each other without requiring much specific configuration. However, the initial setup will depend greatly on your environment and your use case.

 

The elastic suite: which components?

Elasticsearch:

Elasticsearch is at the heart of the Elastic suite. It presents itself as a modern, open source, RESTful search engine based on Apache Lucene that lets you index and search documents in various formats. We often talk about an Elasticsearch cluster, which is a group of one or more Elasticsearch nodes connected together. The power of a cluster lies in distributing search and indexing tasks across all of its nodes. For more details on the different types of nodes, see this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
Among the new features presented at the conference were the new zero-downtime upgrade experience, reduced storage usage and improved replication. Features planned for future releases include a Rollup API for Elasticsearch, the removal of mapping types and SQL support.
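To make the RESTful indexing and search described above more concrete, here is a minimal sketch that only builds the two HTTP requests, without sending them. The node address, the index name `app-logs` and the sample document are assumptions made for illustration; the `_doc` and `_search` endpoints are the standard Elasticsearch ones.

```python
import json
from urllib.request import Request

# Assumed address of a local node; adjust to your own cluster.
ES_URL = "http://localhost:9200"


def index_request(index, doc_id, document):
    """Build a PUT request that indexes one JSON document under a given id."""
    return Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build a POST request running a full-text `match` query on one field."""
    query = {"query": {"match": {field: text}}}
    return Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index name and log document, for illustration only.
put = index_request("app-logs", "1", {"level": "ERROR", "message": "disk full"})
find = search_request("app-logs", "message", "disk")
# To actually send them: urllib.request.urlopen(put), then urlopen(find).
```

Against a running node, the first request stores the document and the second returns every document whose `message` field matches the word "disk".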

Kibana:

Kibana is an open source analysis and visualization platform designed to work with Elasticsearch. Its simple, browser-based interface lets you quickly create and share dynamic dashboards in real time, making it easy to understand large volumes of data.
During the demonstration, we noticed how simple Kibana is to configure. You can install Kibana and start exploring your Elasticsearch indices in minutes, with no code or additional infrastructure required.
New features are also coming in future releases, such as auto-completion (KQL syntax), waffle maps, a Rollups UI and Vega visualizations. The latter let users create custom visualizations using an open source, JSON-based declarative language called Vega, or its simplified version called Vega-Lite.

Beats (Wire Data, Log Files, Metrics)

Beats is a platform of lightweight data shippers installed on our servers to collect data and centralize it in Elasticsearch. If more processing power is needed, Beats can also send this data to Logstash, which takes care of transforming and analysing it. There are several types of Beats, one per type of data. In the presentation, the speakers first covered Packetbeat, a lightweight network packet analyser that sends its data to Logstash or directly to Elasticsearch, which makes it possible to monitor the activity of our various applications.

The presenters then focused on Filebeat, a lightweight agent that ships and centralizes logs regardless of their number, size and source (servers, virtual machines, containers). If the destination becomes unavailable or the transfer is interrupted, Filebeat resumes reading and sending where it left off once the connection returns. In addition, when Logstash slows down or is congested, Filebeat adapts its sending rate thanks to its flow control protocol.
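The "resume where it left off" behaviour can be illustrated with a small sketch. This is not Filebeat's implementation, only the general idea of remembering a read offset between runs; the file names, the JSON registry format and the function name are assumptions made for this example.

```python
import json
import os
import tempfile


def read_new_lines(log_path, registry_path):
    """Return lines appended since the last call, remembering the byte offset."""
    offset = 0
    if os.path.exists(registry_path):
        with open(registry_path) as reg:
            offset = json.load(reg)["offset"]

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Only complete lines are consumed; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    with open(registry_path, "w") as reg:
        json.dump({"offset": offset + end}, reg)
    return lines


# Demo with a temporary log file (hypothetical content).
workdir = tempfile.mkdtemp()
log_file = os.path.join(workdir, "app.log")
registry = os.path.join(workdir, "registry.json")

with open(log_file, "w") as f:
    f.write("line 1\nline 2\n")
first = read_new_lines(log_file, registry)   # reads both lines

with open(log_file, "a") as f:
    f.write("line 3\n")
second = read_new_lines(log_file, registry)  # resumes after "line 2"
```

Because the offset is persisted to disk, restarting the reader after an interruption neither re-sends old lines nor skips new ones.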

Towards the end of the presentation, the speakers highlighted Metricbeat, a lightweight agent that ships metrics such as CPU, memory, Redis or NGINX indicators straight from our systems and services, with a configurable collection interval.

The graphs corresponding to these metrics can be displayed in Kibana using the dashboards supplied with Metricbeat.
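As a rough illustration of collecting a metric at a configurable interval, the sketch below samples disk usage a few times and produces timestamped events. It only mimics the principle; the function names and the event layout are assumptions and do not reflect Metricbeat's internals.

```python
import shutil
import time


def disk_used_percent(path="/"):
    """Return the percentage of disk space used on the given path."""
    usage = shutil.disk_usage(path)
    return round(100 * usage.used / usage.total, 1)


def collect(read_metric, period_seconds, samples):
    """Call `read_metric` every `period_seconds` and return timestamped events."""
    events = []
    for i in range(samples):
        events.append({"@timestamp": time.time(), "value": read_metric()})
        if i < samples - 1:
            time.sleep(period_seconds)
    return events


# Three samples, 0.1 second apart; a real agent would ship each event
# to Elasticsearch or Logstash instead of keeping it in a list.
events = collect(disk_used_percent, period_seconds=0.1, samples=3)
```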

 

Logstash:

Logstash is an open source pipeline that aggregates data from a multitude of sources, processes it, and sends it on to its destination, usually for direct indexing in Elasticsearch.

inputs:

One of the things that makes Logstash so powerful is its ability to aggregate logs and events from multiple sources, using more than 50 input plugins for different platforms, databases, and applications.
The most common inputs are file, beats, syslog, http, tcp, udp, and stdin, but you can ingest data from many other sources.

filters:

The key role Logstash plays in the suite is to let users filter and shape data so that it is easier to understand. To do this, Logstash supports a number of extremely powerful filter plugins that allow you to manipulate, measure, and create events. The power of these filters makes Logstash a very versatile and valuable tool.
The date filter, for example, is often used to specify which date is associated with the event corresponding to a log entry. This date is extracted from the log itself and used to populate the @timestamp field. The geoip filter, for its part, adds geolocation information derived from an IP address (or hostname) using MaxMind's GeoLite2 City database.
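To show what the date filter achieves, here is a minimal Python sketch of the same idea: the timestamp is extracted from the log line and stored in `@timestamp`, instead of using the time at which the line was ingested. It assumes an Apache/NGINX-style access log; the regular expression, the function name and the sample line are hypothetical, and in Logstash itself this is done declaratively in the pipeline configuration.

```python
from datetime import datetime, timezone
import re

# Timestamp as written in access logs, e.g. [18/Apr/2018:09:30:00 +0200]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def date_filter(event):
    """Set @timestamp from the time found in the log line, not the ingest time."""
    match = TIMESTAMP.search(event["message"])
    if match is None:
        return event  # leave the event untouched if no date is found
    parsed = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()
    return event


line = '203.0.113.7 - - [18/Apr/2018:09:30:00 +0200] "GET /index.html HTTP/1.1" 200 512'
event = date_filter({"message": line})
# event["@timestamp"] is now "2018-04-18T07:30:00+00:00" (normalised to UTC)
```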

outputs:

Logstash supports multiple output plugins that allow you to push your data to different locations, services and technologies. The large number of input and output combinations in Logstash makes it a truly versatile event processor.

Finally, note that several pipelines can be set up as needed. For example, we can separate the pipeline for access logs from the one for error logs.

To learn more about the difference between Logstash and Beats, we invite you to see this link: https://logz.io/blog/filebeat-vs-logstash/

The lesser-known products of the Elastic suite

X-Pack

X-Pack is an extension of the elastic suite that brings together new features in a single package, easy to install, with the ability to enable or disable the desired functionality. This innovation has made the elastic suite a complete solution for analysis and monitoring. In the diagram below, you will find a general view of these different features.

During the presentation, the speakers mainly came back to three themes: security, alerting and briefly machine learning.

In this article, we will only discuss Security and Machine Learning. Let’s start with X-Pack Security, which makes it easy to secure a cluster and protect our data. It prevents unauthorized access through password protection, role-based access control and IP filtering. It is also possible to implement more advanced security measures, such as encrypting communications to preserve data integrity, or auditing to trace the actions performed on the data your cluster stores.

In the screenshot below, we can see an example definition of a clicks_admin role:
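As a textual counterpart, a role of this kind can be expressed as the JSON body sent to the X-Pack role API. The sketch below builds such a body in Python; the index pattern, the granted fields and the query are assumptions chosen for illustration, not the values shown during the presentation.

```python
import json

# Illustrative role: read-only access to click events, limited to a few fields.
clicks_admin = {
    "cluster": ["monitor"],
    "indices": [
        {
            "names": ["events-*"],
            "privileges": ["read"],
            "field_security": {"grant": ["category", "@timestamp", "message"]},
            # Document-level restriction, passed as a JSON string.
            "query": json.dumps({"match": {"category": "click"}}),
        }
    ],
}

# Body to send to the role API, e.g. POST /_xpack/security/role/clicks_admin
# on a 6.x cluster (endpoint assumed from the version current at the time).
body = json.dumps(clicks_admin, indent=2)
print(body)
```

A user holding this role could read only the listed fields of documents in indices matching `events-*` whose category is `click`.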

We finish with X-Pack Machine Learning, a set of tools that lets users exploit the metrics in their Elasticsearch data to detect anomalies in time series and receive an automatic alert when one occurs. Machine learning jobs can also be defined in Kibana to automate the tracking of thousands of metrics and to forecast future values with the Forecast API.
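To give an intuition of what anomaly detection on time series means, here is a deliberately simple sketch based on a rolling z-score. This is a swapped-in textbook technique, not the model used by X-Pack Machine Learning; the window size, the threshold and the sample values are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        spread = stdev(history)
        if spread == 0:
            # A flat history: any change at all is treated as unusual.
            if values[i] != history[-1]:
                anomalies.append(i)
            continue
        if abs(values[i] - mean(history)) / spread > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
response_times = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
spikes = find_anomalies(response_times)  # [7], the position of the 450 ms spike
```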

 

APM (Application Performance Monitoring)

Elastic APM is the solution that provides performance information at the application level and enables quick debugging and correction of errors when production issues occur. The solution consists of three elements: an interface, an agent and a server. The server indexes the information in Elasticsearch, while the interface makes monitoring easier through Kibana.

The diagram below gives an overview of how the APM monitoring solution integrates into the overall architecture.

Elastic Application Performance Monitoring

Adding an APM agent to a new application is simple and requires only a few lines of code. Once the server and agents are installed, it is possible to access performance data and use the customizable and preconfigured dashboards provided with the server.

 

Conclusion

The Elastic suite is a set of products that are simple to start with yet rich enough for complex needs. In this article, we explained the basic concepts, detailed the overall architecture and presented some of the lesser-known products.
We hope this article gives you a better understanding and, most importantly, inspires you to try the Elastic suite, a rich data analysis tool that gives a quick and clear view of what is happening in the monitored applications. However, a study of your needs is always necessary before choosing a solution, and the choice will depend on your project context.

 
