Server logs contain some of the most valuable and untapped information. Logs are always unstructured and usually makes little sense. Various opportunities of improvement might unveil by deriving insights from them. Elastic Stack or the ELK Stack is the most widely used solution for Log Analysis. It is Open Source and has a massive community pushing the boundaries by adapting it into various scalable systems. Companies including Microsoft, LinkedIn, Netflix, ebay, SoundCloud, StackOverflow use the Elastic Stack.
ELK is an acronym for three open source projects:
- Elasticsearch: A search and Analytics Engine. It is an open source, Distributed, RESTFul, JSON based search engine.
- Logstash: A Server side data Processing Pipeline that can ingest data from multiple sources. It then transforms and send data to Elasticsearch.
- Kibana: Visualizes data with charts and graphs.
- Beats: A light-weight single purpose Data Shipper. Beats have a small installation footprint and use limited system resources. It can either directly send data to Elasticsearch or send it via Logstash. Beats is written in Go!
Logstash along with Beats collects and parses log data from multiple sources. Elasticsearch indexes and stores this information.
Kibana visualizes this information to provide insights.
Elasticsearch is worth discussing in-detail. It is widely used for Full Text search. It is written in Java. This powerful search engine is designed to scale-up to millions of search events per second. Elasticsearch is used by Wikipedia, Airbus, ebay and shopify for powering their search for near-real time access. Its powerful features:
- Highly Available
- Developer friendly
Logstash supports data of many formats coming from various systems. It can ingest data from logs, web applications, Data stores, Network devices, AWS services and REST endpoints. It then parses and transforms data, identifies named fields to build the structure and converts into a common format.
- Provides around 200 plugins to mix and match and build the data pipeline. It also provides the feature to build a plugin to ingest from a custom application.
- Pipelines can be very complicated and cumbersome to monitor Load, Performance, Latency, Availability etc. Centralized monitoring is provided by the “monitoring and pipeline viewer” that makes the task easier and understandable.
- Structures, transforms and enriches data with filter plugins
- Can emit data to Elasticsearch or other destinations using output plugins like TCP or UDP
- Logstash is horizontally scalableSecurity: Incoming data from Beats can be encrypted. Logstash also integrates with secured Elasticsearch clusters.
Kibana provides interactive visuals of Elasticsearch data to monitor the behavior, understand the impact of certain data changes and so on.
- Kibana core comes with histograms, line charts, pie charts, sunburst and many other classics.
- Plots Geospatial data on any given map.
- Can perform advanced Time Series analysis.
- Graph Exploration: Analyzing Relationships with Graphs
- Build customized canvas, add logos, elements and create a story.
- Can easily share dashboards across the organization
Problems ELK can solve:
- In a distributed system with several nodes, searching through several log files for certain information, using unix commands is a tedious task. Elasticsearch comes to the rescue by providing faster access along with Logstash+Beats by collecting logs from all the nodes.
- Ship Reports: Kibana provides faster ways to explore and visualize data. It can schedule and email reports. Can quickly export the results of ad-hoc analysis or saved searches into a CSV file. Alerting can be used to generate data dumps when certain conditions are met, or on a regular interval.
- Alerting feature can set alerts on data changes, that can be identified using the Elasticsearch query language. Can proactively identify intrusion attempts, trend in social media, peak-hours in network traffic and can also learn from its own Alerting history. It comes with built-in integrations for email, Slack, HipChat etc.
- Unsupervised Learning: The Machine learning features have the ability to detect different kinds of anomalies, unusual network activities and quick root cause identification.
It can also be integrated with Graph APIs to analyze relationships in data. Canvas can be used to build presentations and organize reports. Elastic Stack has been extending its features and exploring many possibilities.