Security operations teams have become increasingly proficient at using structured threat intelligence to enrich alerts and accelerate investigation and threat hunting workflows. Threat intelligence platforms have largely automated the process of collecting, extracting and normalizing intel from structured data sources. But a significant amount of the intelligence available today is still shared through blogs, advisories and research articles, and it takes tedious manual processing to make it usable by SIEM and SOAR tools. Researchers and practitioners have been working on the challenges of extracting intelligence from unstructured sources and making it useful for a variety of use cases. For those who are interested in this topic and looking for a starting point, I have created a brief list of blogs, projects and presentations covering different approaches and related NLP methods. This is not meant to be a long list of articles; it is a starting point that can help you drill down further.
- Unstructured Threat Processing Using NLP - Black Hat Arsenal 2015. Approach uses natural language processing to extract STIX objects from US-CERT advisories and prioritizes them by checking against an internal asset list for relevance.
- Semi-Automated Information Extraction from Unstructured Threat Advisories - ACM Proceeding 2017. Approach uses natural language processing, semi-supervised pattern identification and matching techniques to extract information from security advisories and map it to STIX.
- A Supervised Machine Learning Based Approach for Automatically Extracting High-Level Threat Intelligence from Unstructured Sources - IEEE 2018 International Conference on Frontiers of Information Technology (FIT). Approach uses Named Entity Recognition (NER) to extract data from unstructured intelligence and map it to STIX.
- Making Sense of Unstructured Threat Intelligence Data - Integrated Cyber Conference 2019. Approach applies a Doc2Vec classification methodology to cluster vulnerability descriptions from the NVD and map each cluster to a specific ATT&CK technique.
- Death to the IOC - Presented at Black Hat 2019. Describes the process of building a Cyber Entity Extractor and the evaluation of various models on real-world data.
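To give a feel for the simplest end of this problem, here is a minimal sketch of pulling a few indicator types out of free text and expressing them as STIX 2 pattern strings. It is a plain regex baseline, not the method used in any of the works listed above, which rely on NER and other learned models to capture context that regexes miss. The names `IOC_PATTERNS`, `STIX_TEMPLATES`, `refang`, `extract_iocs` and `to_stix_patterns` are hypothetical and exist only for this illustration.

```python
import re

# Illustrative patterns only (an assumption, not taken from the listed works).
# The IPv4 pattern does not validate octet ranges, so 999.1.1.1 would match.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
}

# STIX 2 pattern templates for the observable types handled here.
# CVEs are vulnerabilities rather than observables, so they get no pattern.
STIX_TEMPLATES = {
    "ipv4": "[ipv4-addr:value = '{}']",
    "sha256": "[file:hashes.'SHA-256' = '{}']",
}


def refang(text):
    """Undo one common defanging convention, e.g. 198.51.100[.]7."""
    return text.replace("[.]", ".")


def extract_iocs(text):
    """Return a dict mapping indicator type to a sorted list of unique matches."""
    clean = refang(text)
    return {
        kind: sorted(set(pattern.findall(clean)))
        for kind, pattern in IOC_PATTERNS.items()
    }


def to_stix_patterns(iocs):
    """Render extracted observables as STIX 2 pattern strings."""
    patterns = []
    for kind, template in STIX_TEMPLATES.items():
        for value in iocs.get(kind, []):
            patterns.append(template.format(value))
    return patterns
```

Calling `extract_iocs` on a sentence from an advisory and passing the result to `to_stix_patterns` gives strings that could be placed in the `pattern` field of a STIX indicator. A baseline like this cannot tell a malicious IP address from a benign one mentioned in passing, which is exactly the gap the entity extraction work above tries to close.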
Frameworks like STIX and MITRE ATT&CK offer an intermediary translation step between unstructured intelligence and machine-usable intelligence. Advances in NLP and deep learning techniques will also spur new ideas and approaches to solving this problem. As new work comes to light I will keep adding to this list, and I am always happy to get recommendations from you!