Integrating SIEM with Big Data – Part II


In Part I of this series, we discussed integrating SIEM solutions with Big Data technologies so that organizations can preserve their existing investments while scaling their SIEM solutions to process more data and support new, more complex analytics. In this second part, we look at a notional solution architecture for accomplishing this. The architecture augments a traditional SIEM architecture with a Big Data Platform built on Apache Kafka, Apache Storm, and Apache Accumulo, as shown in the figure. Integrating such a platform lets organizations improve the performance, scalability, and extensibility of their SIEM tools while preserving their existing investments, most significantly the countless person-hours spent customizing and fine-tuning rules and algorithms to the typical operating conditions of their enterprise networks. The components of the Big Data Platform offload the data- and processing-intensive capabilities of the SIEM solution as follows:

  • Data Collection – will be offloaded to Apache Kafka, a high-throughput distributed publish/subscribe messaging system. Kafka is designed to let a single cluster serve as the central data backbone for a large organization, and that cluster can be expanded elastically and transparently without downtime. The SIEM's data sources can be configured to publish to different Kafka topics according to the type of data they produce.
  • Data Processing (Aggregation, Correlation, Enrichment, Contextualization) – will be offloaded to Apache Storm, a distributed real-time computation system that can reliably process unbounded streams of data. Individual Storm Topologies (i.e., processing pipelines) can be created for aggregation, correlation, enrichment, and contextualization, each configured to consume the events that the data sources publish to the Kafka topics.
  • Storage – will be offloaded to Apache Accumulo, a distributed, scalable, high-performance data storage and retrieval system that stores text and binary data as sorted key/value pairs. The Storm Topologies can write the post-processed data to Accumulo, where it can later support additional queries or batch analytic algorithms.
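To make this division of labor concrete, here is a minimal, stdlib-only Python sketch of the collect, process, and store flow. It deliberately does not use the real Kafka, Storm, or Accumulo client APIs: `InMemoryBroker` is a hypothetical stand-in for a Kafka cluster, `enrich` and `aggregate` stand in for the processing steps of a Storm Topology, and `to_kv_rows` imitates a simplified version of Accumulo's sorted key/value layout (row, column family, column qualifier, with visibility and timestamp omitted). The topic names, source types, and asset inventory are illustrative assumptions.

```python
"""Collect -> process -> store, simulated in memory (hypothetical sketch).

InMemoryBroker stands in for Kafka, enrich/aggregate for Storm processing
steps, and to_kv_rows for writes to an Accumulo table. None of these are
real client APIs.
"""
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical routing of data-source type to Kafka topic name.
TOPIC_BY_SOURCE_TYPE = {"firewall": "siem.firewall", "ids": "siem.ids", "auth": "siem.auth"}

# Hypothetical asset inventory used for contextualization.
ASSET_OWNER = {"10.0.0.5": "finance", "10.0.0.9": "hr"}


class InMemoryBroker:
    """Toy publish/subscribe log: each topic is an append-only list read by offset."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[dict]] = defaultdict(list)

    def publish(self, event: dict) -> str:
        # Route by data type, as the SIEM's sources would be configured to do.
        topic = TOPIC_BY_SOURCE_TYPE[event["source_type"]]
        self.topics[topic].append(event)
        return topic

    def consume(self, topic: str, offset: int = 0) -> List[dict]:
        # Consumers track their own offset, so reading does not remove events.
        return self.topics[topic][offset:]


def enrich(event: dict) -> dict:
    """Contextualization: attach the business unit that owns the source IP."""
    return {**event, "owner": ASSET_OWNER.get(event["src_ip"], "unknown")}


def aggregate(events: List[dict]) -> Dict[Tuple[str, str], int]:
    """Aggregation: count enriched events per (owner, action)."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for event in events:
        counts[(event["owner"], event["action"])] += 1
    return dict(counts)


def to_kv_rows(counts: Dict[Tuple[str, str], int]) -> List[Tuple[Tuple[str, str, str], str]]:
    """Storage: (row, column family, column qualifier) -> value, sorted by key.

    Simplified from Accumulo's key layout; visibility and timestamp are omitted.
    """
    return sorted(((owner, "count", action), str(n)) for (owner, action), n in counts.items())


broker = InMemoryBroker()
for raw in [
    {"source_type": "firewall", "src_ip": "10.0.0.5", "action": "deny"},
    {"source_type": "firewall", "src_ip": "10.0.0.5", "action": "deny"},
    {"source_type": "auth", "src_ip": "10.0.0.9", "action": "login_fail"},
]:
    broker.publish(raw)

# One pipeline per concern: here, enrichment followed by aggregation.
firewall_events = [enrich(e) for e in broker.consume("siem.firewall")]
table = to_kv_rows(aggregate(firewall_events))
print(table)  # [(('finance', 'count', 'deny'), '2')]
```

In a real deployment, each step would typically run inside a Storm Topology that reads from Kafka and writes to Accumulo through its client library; the sketch only shows how the responsibilities divide between the three components.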

A Lambda architecture can be used to provide both streaming and query/batch access to the data, which is how the platform integrates with the core capabilities of SIEM tools. The Storm Topologies can be configured to generate events that feed the SIEM, taking advantage of event detection and alerting rules that have already been customized in the SIEM. In addition, data APIs can be built on top of Accumulo to serve data in standard formats to SIEM capabilities that require query or batch access, e.g., audit data management or compliance reporting. Because the core capabilities of the SIEM tools are integrated and reused, their security reporting and visualization capabilities can be reused as well, preserving the look and feel that security personnel may have already grown accustomed to.