
Detection as Code (DaC) challenges, automation, maintenance and SIEGMA

Developing detection capabilities is something we're constantly working on. Our detection content equips many of our products and services, either as part of our Managed Detection services or through a subscription service for inclusion in third-party and partner products.

While keeping a set of detection rules loaded into a SIEM does not pose a technical challenge, doing it in a way that allows them to be part of a continuous integration/delivery (CI/CD) process isn't so straightforward. For the last couple of months we've been working on processes and tools to make that process as future-proof and automated as we possibly can.

The goal of this post is to share some of the lessons learned as well as provide some additional use cases and examples that might be useful for anyone trying to accomplish similar tasks.

Below we’ll dive into some of the challenges and how we addressed them, as well as highlight some features of SIEGMA, our open-source SIEM consumable creation tool that was released a few months ago.

Sigma as a standard

Very early in the process we decided to use the Sigma format for developing and maintaining our rules. Sigma and its corresponding toolset are a good match for what we're doing, and even though we mostly focus on one particular SIEM engine, using this standard also allows us to easily share our detection capabilities for inclusion in other products or with the community.
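
For readers unfamiliar with the format, a minimal Sigma rule looks roughly like this (a generic illustration rather than one of our production rules):

    title: Suspicious Certutil Download Attempt
    status: experimental
    description: Detects certutil.exe being used to fetch a remote file
    logsource:
      category: process_creation
      product: windows
    detection:
      selection:
        Image|endswith: '\certutil.exe'
        CommandLine|contains: 'urlcache'
      condition: selection
    falsepositives:
      - Legitimate administrative use of certutil
    level: medium

The same YAML file can be handed to sigmac to produce a query for any supported backend, which is a big part of what makes the format attractive for sharing.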

One of the benefits of Sigma's YAML-based rule format is the possibility of including additional/custom fields. These fields, while ignored by the Sigma converter (sigmac), allow us to include information relevant to us in our rules that can be picked up by an automated process. Our challenge wasn't so much the creation of the detection queries themselves (which the existing toolset handles very well) but the metadata that goes around them.
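
As a sketch of how that works: extra top-level keys simply ride along in the YAML, where sigmac ignores them and our own tooling can read them. The field names under custom below are hypothetical, purely to illustrate the mechanism:

    # ...standard Sigma fields above...
    level: medium
    # Converter-ignored metadata, consumed by our automation (hypothetical names)
    custom:
      owner: detection-engineering
      last_review: 2021-06-01
      deployment: managed-clusters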

Our goal was to make our rules as rich as possible and to automate the creation of the actual SIEM consumable based on our unique set of fields.

Another challenge was the uniqueness of each source (technology) we support and how rules for it are built (different query languages, for example). We wanted to be able to take a large number of rules and build them with a single command, regardless of the particularities of each detection source.

After coming up with an internal standard for how rules should be developed, we proceeded to create the tooling needed to parse our set of rule files and produce a file that would either be loaded automatically into our client clusters or made available for download/upload.

This is when the process meets tooling, and the reason we decided to create SIEGMA.

Introducing SIEGMA

The goal of SIEGMA is to facilitate the maintenance and development of rules at scale by automating, as much as possible, all aspects that go into a SIEM consumable. The GitHub project page goes into more detail about why we developed it and made it open source. In this article we'll focus more on how SIEGMA is used internally and why it grew the way it did.

As an example, our development process often runs into configurations and fields that are unique to our SIEM and that we would like to leverage in our Sigma rules, but support for those fields isn't present in sigmac.

While at first glance adding support for those fields to sigmac would seem the preferred approach, that is unfortunately not easily done. Many of those fields (on the SIEM side) require custom fields in the Sigma rules. On top of that, we wanted to work on a few areas that fall outside the scope of sigmac and the overall Sigma project. Namely:

  • Automating bulk rule installation after rule creation.
  • Simplifying the otherwise cumbersome process of rule installation.
  • Coupling a detection query with the other key-value pairs that go into a SIEM, thus producing a SIEM consumable.
  • Spending more time creating detections instead of setting them up in a multi-cluster environment.
  • Tying advanced documentation capabilities into the rule itself (more on this later in the post).

We consider these objectives our “process objectives”. These are technical tasks that might not be related to constructing queries or translating fields, but they fall squarely within our development objectives.

On top of the objectives described above, we also have “detection objectives”, which are goals for the creation of the detection query itself.

To provide a few examples (and challenges we had to tackle), consider scenarios in which different detection sources have specific sets of requirements:

  • Different query languages (KQL vs Lucene) for detection capabilities.
  • Different parameters/switches that should be passed to sigmac.
  • Aggregation-based queries for detection (see the sketch after this list).
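
To illustrate the last point, Sigma's (legacy) aggregation expressions allow detections like the following, which some backends translate natively and others don't (a generic illustration):

    detection:
      selection:
        EventID: 4625
      timeframe: 10m
      condition: selection | count() by TargetUserName > 10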

Building detections with those requirements in mind through sigmac requires specific (per-source) configurations and build commands.

These requirements led us to add logic to SIEGMA so that all configuration options we would like to pass to sigmac can be specified directly in the rule file, without separate build commands per rule.
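
Conceptually, a rule can then carry its own build instructions in a custom block, roughly like this (the siegma key and its sub-fields are illustrative assumptions, not SIEGMA's documented schema):

    # Hypothetical per-rule build options, parsed by SIEGMA rather than sigmac
    siegma:
      sigmac:
        target: es-qs        # backend that would be passed to sigmac -t/--target
        config: winlogbeat   # field-mapping config passed to sigmac -c/--config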

Having some rules built with one command and a different set built with another would be far from optimal, and the complexity only increases as additional sources (each with its own set of requirements) are included.

In these situations we do all the heavy lifting in the rule file itself and transfer to SIEGMA the responsibility of parsing those fields and acting on them accordingly.

Take a look at an example rule that utilizes some additional fields:
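
The original embedded example isn't reproduced here, but a minimal sketch of such a rule could look like the following (everything from note downwards uses illustrative field names; consult the project page for the exact schema):

    title: Suspicious Encoded PowerShell Command
    status: stable
    description: Detects PowerShell being launched with an encoded command
    logsource:
      category: process_creation
      product: windows
    detection:
      selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains: ' -enc'
      condition: selection
    level: high
    # Custom fields below are ignored by sigmac and consumed by SIEGMA
    note: ADS/encoded-powershell.md   # hypothetical pointer to the rule's ADS file
    siegma:
      sigmac:
        target: es-qs                 # hypothetical: backend for sigmac -t
        config: winlogbeat            # hypothetical: mapping for sigmac -c
      schedule:
        interval: 5m                  # hypothetical: how often the SIEM runs the rule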

Defining the sigma and siegma fields in the rule itself makes our rules self-contained: the SIEM consumable (which is what ultimately reaches our client clusters) can always be built with a single build command across all rules, regardless of their build requirements, detection sources, backend options, etc.

SIEGMA does have some additional bonus features. The note field, for example, is where we reference our (Palantir-inspired) Alert and Detection Strategy (ADS). During the build, SIEGMA sees which ADS is referenced in the rule file and builds the SIEM consumable with the information from that ADS file, which eventually finds its way into an easy-to-read guide inside our clients' managed SIEM.

All of this information, and these features, are documented on the SIEGMA project page. Feel free to give it a try.

We plan on sharing additional information on the processes and lessons learned. A future post will focus on automation around Quality Assurance and, hopefully by then, we'll have more open-source projects to share with the community.

Are you doing something similar? Liked what you read? Stop by our Community Slack and let us know. Having an issue with SIEGMA? Reach out to us on Slack or open an issue on GitHub.
