Building file2stix


In this post I will describe some of the problems I wanted to solve by building file2stix, and share a few tips for getting up and running with it.

More and more organisations are standardising the way they represent threat intelligence using the STIX 2.1 data model.

As a result, an increasing number of SIEMs, SOARs, TIPs, etc. have native STIX 2.1 support.

However, authoring STIX 2.1 content can be laborious. I have seen analysts manually copy and paste data from reports, blogs, emails, and other sources into STIX 2.1 Objects.

In many cases these Observables (IOCs) can be automatically detected in plain text using defined patterns.

For example, an IPv4 observable has a specific pattern that can be identified using regular expressions. This regular expression will match an IPv4 observable;

Similarly, the following regular expression will capture URLs;


Both of these examples are taken from the brilliant Regular Expressions Cookbook (2nd edition) by Jan Goyvaerts and Steven Levithan.
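As a rough sketch of the same idea, the snippet below pulls IPv4 addresses and URLs out of plain text. The two patterns are simplified ones written for this illustration; they are assumptions, not the Cookbook’s expressions or the ones file2stix ships, and the function name is hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions, not the Cookbook's or file2stix's).
# Each octet is limited to 0-255; word boundaries stop matches inside longer tokens.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4_RE = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

# Deliberately loose: an http(s) scheme, then a run of non-space characters
# that does not end in common trailing punctuation.
URL_RE = re.compile(r"\bhttps?://[^\s<>\"']+[^\s<>\"'.,;:!?)]")


def extract_observables(text: str) -> dict[str, list[str]]:
    """Return every IPv4 address and URL found in the text."""
    return {
        "ipv4": IPV4_RE.findall(text),
        "url": URL_RE.findall(text),
    }


sample = "The implant beaconed to 203.0.113.7 and fetched https://example.com/payload.bin."
print(extract_observables(sample))
```

Patterns this loose will over-match in places (an IPv4 pattern will happily pick four octets out of a longer dotted version string, for example), which is one reason extractions still need review.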

Now this isn’t rocket science, and indeed there are already quite a few open source tools that contain regular expressions for extracting Observables in this way;

However, only one, /ninoseki/ioc-extractor, supported STIX output, and it is somewhat limited in the STIX objects it supports.

Introducing file2stix

file2stix aimed to take the good parts of all of these products and build them into a single command line tool that;

file2stix offers 3 modes;

Out of the box, file2stix supports over 30 unique observable extraction regular expressions, including IP addresses, cryptocurrency, and MITRE ATT&CK STIX 2.1 Objects.

For MITRE knowledge bases, the following ATT&CK data types from the Enterprise, Mobile and ICS matrices are supported;

And for CAPEC;

You can also add your own custom extractions, using keyword matches, and map them to STIX 2.1 Objects.
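To give a feel for keyword-based extraction, here is a minimal sketch. The mapping, the function name and the shape of the output are all hypothetical; the real file2stix configuration format may look quite different.

```python
import re
import uuid
from datetime import datetime, timezone

# Hypothetical keyword-to-STIX-type mapping; file2stix's real config format may differ.
CUSTOM_EXTRACTIONS = {
    "Cobalt Strike": "tool",
    "FIN7": "intrusion-set",
}


def extract_custom(text: str) -> list[dict]:
    """Emit a minimal STIX 2.1-shaped object for every keyword found in the text."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    objects = []
    for keyword, stix_type in CUSTOM_EXTRACTIONS.items():
        # Whole-word, case-insensitive match; re.escape keeps the keyword literal.
        if re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
            objects.append({
                "type": stix_type,
                "spec_version": "2.1",
                "id": f"{stix_type}--{uuid.uuid4()}",
                "created": now,
                "modified": now,
                "name": keyword,
            })
    return objects


for obj in extract_custom("The actor FIN7 deployed Cobalt Strike beacons."):
    print(obj["type"], obj["name"])
```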

A key feature of file2stix is the ability to assign a TLP (Traffic Light Protocol) level to extracted objects, so that the resulting objects are shared correctly. file2stix uses the default STIX 2.1 TLP marking definitions to do this.
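In STIX 2.1 terms, marking an object means adding the ID of a marking definition to its object_marking_refs property. The sketch below shows that step on a plain dictionary. The helper name is hypothetical, and the four IDs are the predefined TLP marking definitions as I understand them from the STIX 2.1 specification, so check them against the spec before relying on them.

```python
# The four TLP marking-definition IDs predefined by the STIX 2.1 specification
# (reproduced from memory here; verify against the spec).
TLP_MARKINGS = {
    "white": "marking-definition--613f2e26-407d-48c7-9eca-b8e91df99dc9",
    "green": "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da",
    "amber": "marking-definition--f88d31f6-486f-44da-b317-01333bde0b82",
    "red": "marking-definition--5e57c739-391a-4eb3-b6be-7d15ca92d5ed",
}


def apply_tlp(stix_object: dict, level: str) -> dict:
    """Return a copy of the object marked with the given TLP level."""
    marking_id = TLP_MARKINGS[level.lower()]
    marked = dict(stix_object)
    refs = list(marked.get("object_marking_refs", []))
    if marking_id not in refs:  # avoid adding the same marking twice
        refs.append(marking_id)
    marked["object_marking_refs"] = refs
    return marked


indicator = {"type": "indicator", "name": "Suspicious IP"}
print(apply_tlp(indicator, "amber")["object_marking_refs"])
```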

In many cases, input reports contain defanged data (e.g. 1[.]1[.]1[.]1). Defanging obfuscates indicators into a safer representation so that a user reading a report does not accidentally click on a malicious URL or inadvertently run malicious code. Unfortunately, there is no universal standard for defanging, although there are some common methods. file2stix does not convert (and thus extract) defanged data by default, but can be instructed to do so.
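Converting bracketed values such as 1[.]1[.]1[.]1 back to their original form can be sketched as a handful of substitutions run before extraction. The rules below cover a few conventions I see often; the list is an assumption and far from exhaustive, and it is not a copy of what file2stix does internally.

```python
import re

# Common defanging conventions (an assumed, non-exhaustive list).
REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"\[at\]|\[@\]", re.IGNORECASE), "@"),
    # Runs last so that "hxxp[:]//" has already become "hxxp://".
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang(text: str) -> str:
    """Apply each substitution in order and return the converted text."""
    for pattern, replacement in REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


print(refang("hxxps://evil[.]example[.]com resolves to 1[.]1[.]1[.]1"))
```

Keeping this step opt-in makes sense: square brackets and parentheses also appear in ordinary prose, so converting them blindly can create matches that were never indicators.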

It’s also possible to assign a confidence score to extracted data to convey the level of confidence in the report’s findings. The STIX 2.1 confidence property is used to do this.
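The STIX 2.1 confidence property is an integer from 0 to 100, and it is a common property of domain and relationship objects rather than of cyber-observable objects. A minimal sketch of setting it, with a hypothetical helper name, might look like this:

```python
def apply_confidence(stix_object: dict, confidence: int) -> dict:
    """Return a copy of the object with a validated confidence score."""
    # STIX 2.1 defines confidence as an integer from 0 to 100 inclusive.
    # bool is rejected explicitly because it is a subclass of int in Python.
    if isinstance(confidence, bool) or not isinstance(confidence, int):
        raise ValueError("confidence must be an integer")
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return {**stix_object, "confidence": confidence}


print(apply_confidence({"type": "indicator", "name": "Suspicious IP"}, 80))
```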

What file2stix is not

file2stix was designed to remove the tedious data entry from an intel analyst’s workload, freeing them up to put their skills to work.

That said, this approach isn’t perfect; it will often produce benign extractions (false positives). To counter this, file2stix allows for the use of MISP Warning Lists and custom warning lists to flag erroneous extractions.
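The idea behind a warning list check can be sketched in a few lines. The two lists below are tiny stand-ins shaped like MISP warning lists (a name, a type and a list of values); the entries are illustrative rather than copied from the real lists, and the function name is hypothetical.

```python
import ipaddress

# Tiny stand-ins shaped like MISP warning lists; entries are illustrative only.
WARNING_LISTS = [
    {"name": "Example public DNS resolvers", "type": "string",
     "list": ["8.8.8.8", "1.1.1.1"]},
    {"name": "Example private ranges", "type": "cidr",
     "list": ["10.0.0.0/8", "192.168.0.0/16"]},
]


def matching_warning_lists(value: str) -> list[str]:
    """Return the names of every warning list that the extracted value appears on."""
    hits = []
    for wl in WARNING_LISTS:
        if wl["type"] == "string" and value in wl["list"]:
            hits.append(wl["name"])
        elif wl["type"] == "cidr":
            try:
                address = ipaddress.ip_address(value)
            except ValueError:
                continue  # not an IP address, so a CIDR list cannot match
            if any(address in ipaddress.ip_network(net) for net in wl["list"]):
                hits.append(wl["name"])
    return hits


for candidate in ["8.8.8.8", "192.168.1.10", "203.0.113.7"]:
    print(candidate, matching_warning_lists(candidate))
```

Flagging rather than silently dropping a match leaves the final call with the analyst, which matters because a value on a warning list can still turn out to be relevant.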

All extractions in file2stix are based on regular expressions. This works perfectly for pattern matching, but not at all when reading semantics.

There are a couple of open-source tools out there that tackle this problem; MITRE TRAM is a good example. TRAM takes a report as an input and uses NLP to identify the ATT&CK Tactics and Techniques being discussed.

Similarly, file2stix is not smart enough to understand complex relationships between more than one object. All extractions in file2stix have single relationships back to the original report (vs. relationships between indicator and malware objects, for example).
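One way to picture this flat structure is a STIX 2.1 report object that simply lists the ID of every extraction. The sketch below is an assumption made for illustration: whether file2stix links extractions through the report’s object_refs property, through relationship objects, or both is not covered in this post, and the helper name is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_report(name: str, extracted: list[dict]) -> dict:
    """Build a minimal report object that references every extracted object."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "report",
        "spec_version": "2.1",
        "id": f"report--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "published": now,
        "name": name,
        # Every extraction hangs directly off the report; nothing links
        # one extraction to another.
        "object_refs": [obj["id"] for obj in extracted],
    }


# Random UUIDs keep the sketch short; the spec recommends deterministic IDs
# for cyber-observable objects such as these.
extracted = [
    {"type": "ipv4-addr", "id": f"ipv4-addr--{uuid.uuid4()}", "value": "203.0.113.7"},
    {"type": "url", "id": f"url--{uuid.uuid4()}", "value": "https://example.com/payload.bin"},
]
print(build_report("Example report", extracted)["object_refs"])
```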

In short, file2stix is not designed to generate threat intelligence that is ready for dissemination. It does, however, generate threat intelligence that can be reviewed and worked with in other tooling.

Try file2stix now

Hopefully this post has given you a little insight into why I built file2stix and how you can use it.

To really understand the power of file2stix, take a look at the user documentation.

file2stix is available to download on GitHub.

I hope you find it useful, and I am always very happy to receive feedback, either via GitHub issues or directly via our Slack community.

Originally published on September 18, 2022.



I help early stage cyber-security companies to build products that make users go; “Wow! That’s what I need!”.

David G