
Security Information and Event Management (SIEM) Solution – the open source way!

Sandeep Khuperkar I CTO & Director, Ashnik
Singapore, 19 Nov 2018



Tighter security regulations require enterprises to implement a comprehensive set of security controls, including monitoring, auditing and reporting. Hence, enterprises are increasingly implementing SIEM solutions as part of their standard security systems.

According to Gartner’s 2017 Magic Quadrant report on SIEM, 80% of breaches go undetected by the breached organizations, making SIEM solutions more vital than ever. However, Gartner noted that threat management is still the primary driver of demand in the field. The report also highlighted that general monitoring and compliance reporting came second in the minds of security professionals.

A SIEM solution combines Log Management, Security Information Management and Security Event Management capabilities in a single system. It automatically collects and processes information from various sources, stores it in a centralized location, and correlates events to generate alerts and reports.

There are proprietary platforms that offer an all-in-one SIEM solution. However, most enterprises that we talk to complain about high costs. With increasing commoditization of features and rapid development in open source solutions, organizations are keen to explore open source SIEM systems.

There is no all-in-one, out-of-the-box open source SIEM solution available that meets enterprise needs. That is why, in this article, I want to shed light on the various components that together form a SIEM solution, and on how Elastic, as an open source technology, can help you build your own or augment an existing SIEM solution.

At the heart of any SIEM system is log data. A lot of it. Whether from servers, firewalls, databases or network routers, logs provide analysts with raw data for gaining insight into events taking place in an IT environment. The data needs to be collected, processed, normalized, enhanced and stored.

But SIEM systems are much more than just ‘log management’ and ‘log analysis’. A SIEM solution offers capabilities like correlation, incident detection and management, alerts, etc. Organizations are also looking at combining new threat analytics and intelligence capabilities, which help correlate network and user behavior to better identify the nature of an activity.


(Combination of key capabilities that help build a SIEM system)

Let’s look at how the ELK stack helps in building these capabilities.

The ELK stack consists of Elasticsearch, Logstash and Kibana. Logstash is a log aggregator that can collect and process data from almost any data source. It can filter, process, correlate and generally enhance any log data that it collects. Elasticsearch is the storage engine and one of the best solutions for storing and indexing time-series data. Kibana is the visualization layer in the stack. Alongside these three, Beats are a family of lightweight shippers that are responsible for collecting the data and shipping it into the stack.

Key capabilities that are at the core of a fully functional SIEM system:

Log Collection and Processing

SIEM systems involve aggregating data from multiple data sources. These data sources will vary depending on your environment, but most likely you will be pulling data from your application, the infrastructure, security controls, network infrastructure etc.

ELK Stack is well-suited for these aggregation capabilities. Using a combination of Beats and Logstash, you can have multiple data pipelines to build your logging architecture. Beats are lightweight log forwarders that can be used as agents on edge hosts to track and forward different types of data, the most common Beat being Filebeat for forwarding log files.

A robust and resilient data queuing mechanism needs to be deployed in the pipeline. This makes sure data bursts are handled and do not result in data loss. Kafka, Redis and RabbitMQ are some of the tools commonly installed in front of Logstash for this purpose.
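To make the role of the queue concrete, here is a minimal sketch of a buffer sitting between fast shippers and a slower processing stage. It is a toy, in-memory stand-in written for this article, not how Kafka, Redis or RabbitMQ are implemented, and the class and method names (`BurstBuffer`, `publish`, `consume`, `backlog`) are hypothetical.

```python
from collections import deque


class BurstBuffer:
    """Toy in-memory stand-in for a broker placed in front of Logstash.

    Hypothetical illustration only: a real broker adds persistence,
    replication and consumer offsets, none of which is modeled here.
    """

    def __init__(self):
        self._events = deque()

    def publish(self, event):
        # Shippers append as fast as events arrive, even during a burst.
        self._events.append(event)

    def consume(self, max_batch):
        # The processing stage drains at its own pace, in arrival order.
        batch = []
        while self._events and len(batch) < max_batch:
            batch.append(self._events.popleft())
        return batch

    def backlog(self):
        # Number of events still waiting to be processed.
        return len(self._events)


if __name__ == "__main__":
    buffer = BurstBuffer()
    for seq in range(1000):          # a sudden burst of 1000 events
        buffer.publish({"seq": seq})
    processed = []
    while buffer.backlog():          # the consumer catches up in batches
        processed.extend(buffer.consume(max_batch=100))
    print(len(processed), "events processed, none dropped")
```

The point of the design is that the producer never waits on the consumer: the burst is absorbed by the backlog and worked off later, which is the behavior you are buying when you put a broker in front of Logstash.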

Collecting data and forwarding it is, of course, one of the important tasks Logstash performs. In the context of building a SIEM system, it also handles another extremely important task: processing and parsing the data.

Different data sources generate data in different formats. To search and analyze the data, it first needs to be normalized. That means breaking down different log messages into meaningful field names and mapping the field types correctly in Elasticsearch. Without the correct parsing, your data will be meaningless as you attempt to analyze it in Kibana. Logstash can help break up your logs and enrich the specific fields.
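As an illustration of what normalization means in practice, the sketch below breaks one web access log line into named, correctly typed fields. In an ELK deployment this work would normally be done by Logstash filters; the Python here only mirrors the idea. The sample line, the regular expression and the field names (`client_ip`, `status`, and so on) are assumptions made for this example.

```python
import re
from datetime import datetime

# Assumed layout: a common-log-format style access line.
LINE_RE = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of normalized fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Map field types explicitly so numbers are indexed as numbers.
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    # Convert the timestamp to ISO 8601 so it can be indexed as a date.
    parsed = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["timestamp"] = parsed.isoformat()
    return fields


if __name__ == "__main__":
    sample = '203.0.113.7 - - [19/Nov/2018:10:15:32 +0800] "GET /login HTTP/1.1" 401 512'
    print(parse_access_line(sample))
```

Once every source is reduced to the same field names and types, a single query such as "all events with status 401 from one client IP" works across all of them.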

Storage and Retention

The log data collected from different data sources needs to be stored in a robust, scalable data store. In the case of ELK, Elasticsearch plays that role of data indexing and storage. Elasticsearch is one of the most popular databases today. This popularity stems from a variety of reasons: it’s open source, relatively easy to set up, fast, scalable and has a huge community supporting it.

Deploying an Elasticsearch cluster is just the first step. Since we are talking of large sets of data being indexed, which will most likely increase in volume over time, any Elasticsearch deployment used for SIEM needs to be extremely scalable and fault tolerant. We already mentioned using a queuing mechanism to ensure data does not get lost in case of data bursts. We will also need to monitor Elasticsearch performance metrics, such as indexing rate, node JVM heap and CPU. The Elastic Stack includes monitoring capabilities that are extremely useful for this purpose. It is also important to do capacity planning to ensure you have enough resources for indexing and for the shards you need to store.
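A back-of-the-envelope version of that capacity planning can be sketched as follows. All the inputs are assumptions you would replace with your own measurements: the daily volume, the retention period, the replica count and especially the target shard size, which is only a placeholder here. The function name `estimate_cluster_storage` is hypothetical.

```python
import math


def estimate_cluster_storage(daily_gb, retention_days, replicas=1, target_shard_gb=30):
    """Rough storage and shard estimate for a log cluster.

    Ignores index overhead, compression and headroom, so treat the
    result as a starting point for sizing, not a final answer.
    """
    primary_gb = daily_gb * retention_days
    total_gb = primary_gb * (1 + replicas)       # each replica is a full copy
    primary_shards = math.ceil(primary_gb / target_shard_gb)
    return {
        "primary_gb": primary_gb,
        "total_gb": total_gb,
        "primary_shards": primary_shards,
        "total_shards": primary_shards * (1 + replicas),
    }


if __name__ == "__main__":
    # Assumed workload: 50 GB of logs per day, kept for 90 days, one replica.
    print(estimate_cluster_storage(daily_gb=50, retention_days=90))
```

Even this crude arithmetic shows how quickly retention drives cost: doubling the retention period doubles both storage and shard count.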

Another consideration is retention. When did hackers get in? Where did they go? What did they do? What else is compromised? To answer these questions, 7 days may not be a sufficient retention period. Threats often take much longer than that to be detected and resolved. Elastic makes searching through long-term historical data not only possible, but practical, easy, and fast.
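Whatever retention period you choose, something has to enforce it. The sketch below shows only the selection logic, assuming one index per day named in the `logstash-YYYY.MM.dd` style that Logstash uses by default. It decides which indices fall outside the window and does not talk to a cluster; the function name `expired_indices` is hypothetical.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days, prefix="logstash-"):
    """Return the daily indices that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue                      # not one of our daily log indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue                      # suffix is not a date, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["logstash-2018.08.01", "logstash-2018.11.18", ".kibana"]
    print(expired_indices(names, today=date(2018, 11, 19), retention_days=90))
```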


Once your data is collected, parsed, and indexed in Elasticsearch, the next step is querying the data. Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. 
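For instance, a query that counts recent failed requests per client could be built like this. The body uses standard Query DSL constructs (`bool`, `term`, `range` and a `terms` aggregation), but the field names `status` and `client_ip` are assumptions about how your logs were parsed, and `client_ip` would need a mapping that supports aggregation. The function name is hypothetical.

```python
import json


def build_failed_request_query(minutes=15, top_n=10):
    """Build a Query DSL body: HTTP 401 events in the last N minutes, grouped by client."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": 401}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
        "aggs": {
            "by_client": {"terms": {"field": "client_ip", "size": top_n}}
        },
    }


if __name__ == "__main__":
    # The resulting JSON is what would be sent as the body of a search request.
    print(json.dumps(build_failed_request_query(), indent=2))
```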

Dashboard and Reports

Kibana is renowned for its visualization capabilities, supporting a wide array of visualization types and allowing users to slice and dice their data in any way they like. With the Elastic Stack, you can start with basic histograms, line graphs, pie charts, sunbursts, geographical maps, single metrics and data tables, and then design your own visualizations on top. All of these leverage the full aggregation capabilities of Elasticsearch. You can plot geo data on any map, perform advanced time series analysis on your Elasticsearch data with the curated time series UIs, and describe queries, transformations and visualizations with powerful, easy-to-learn expressions. You can also analyze relationships with graph exploration and explore anomalies with unsupervised machine learning features.

Correlation and Alerts

Another key ingredient in SIEM is event correlation. Event correlation is the connection of signals coming in from different data sources into a pattern that could be indicative of a breach in security.

Packetbeat is a good example of correlation at the network level. Most protocols that Packetbeat supports today are request-response oriented. Packetbeat indexes into Elasticsearch a document for each request-response pair (called a transaction). This way you have data from the request and the response in the same document and can measure the response time. The TCP stream or UDP ports are usually good indicators that two messages belong to the same transaction. Therefore, most protocol implementations in Packetbeat use a map keyed by the TCP tuple for correlating the requests with the responses. One thing you need to be careful about is to expire and remove incomplete transactions from this map. For example, we might see a request that has created an entry in the map, but if we never see the reply, we need to remove the request from memory on a timer, otherwise we risk leaking memory.

Correlation rules mean nothing without alerts. Being alerted when a possible attack pattern is identified is a key ingredient in SIEM systems. The ELK Stack (community edition) does not ship with a built-in mechanism for alerting. To add this capability, the ELK Stack needs to be augmented with an alerting plugin or add-on. X-Pack is one option.
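The request-response correlation described above, including the timer-based removal of incomplete transactions, can be sketched as follows. This is an illustration written in Python for this article and is not Packetbeat's own code; the class `TransactionCorrelator`, its methods and the tuple layout are hypothetical.

```python
class TransactionCorrelator:
    """Pair requests with responses using the connection tuple as the key."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._pending = {}  # tcp_tuple -> (request, time the request was seen)

    def on_request(self, tcp_tuple, request, now):
        # Remember the request until its response arrives or it times out.
        self._pending[tcp_tuple] = (request, now)

    def on_response(self, tcp_tuple, response, now):
        # Return one combined transaction document, or None for an orphan response.
        entry = self._pending.pop(tcp_tuple, None)
        if entry is None:
            return None
        request, started = entry
        return {
            "request": request,
            "response": response,
            "response_time": now - started,
        }

    def expire(self, now):
        # Drop requests that never got a reply so the map cannot grow without bound.
        stale = [
            key for key, (_, seen) in self._pending.items()
            if now - seen > self.timeout_seconds
        ]
        for key in stale:
            del self._pending[key]
        return len(stale)

    def pending_count(self):
        return len(self._pending)


if __name__ == "__main__":
    correlator = TransactionCorrelator(timeout_seconds=10)
    conn = ("10.0.0.5", 51234, "10.0.0.9", 80)   # src ip, src port, dst ip, dst port
    correlator.on_request(conn, "GET /login", now=10.0)
    print(correlator.on_response(conn, "401 Unauthorized", now=10.25))
```

In a running system `expire` would be called periodically from a timer. The combined transaction documents are what alerting rules would then be evaluated against.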

Security analytics is more than just security events. Have metrics? Infrastructure logs? Documents with tons of text? Centralize it all into Elastic Stack with your security events to enrich your analysis, minimize your risk, and simplify your architecture. Build something new, or enhance your SIEM with the Elastic Stack.

Our team would be happy to help you in your journey.


  • Sandeep is the CTO and Director at Ashnik. He is also responsible for building Ashnik’s India business. Sandeep brings more than 23 years of industry experience, with 14+ years in open source and in building open source and Linux business models. Prior to Ashnik, he worked in various leadership capacities at IBM and Red Hat. His core areas of expertise are bringing open source technologies to customers’ businesses by providing better solutions and even better services.
