A labeled dataset for threat intelligence knowledge graph construction. In this dataset, unstructured threat intelligence text is analyzed to extract the entities and relationships it contains.
To construct the threat intelligence knowledge graph, the few-shot learning capability of GPT-3.5 is used for data annotation with the following prompt:
Despite the powerful generative capabilities of GPT-3.5, the directly generated annotations still contain some errors. We therefore manually corrected a portion of the labeled dataset generated by GPT, which is used for the knowledge graph construction.
Since we use LoRA-based instruction tuning, each record in the dataset consists of an instruction, an input, and an output. The "instruction" contains the sentences of the report, the "input" is null, and the "output" contains the information extraction result: the entities with their types, and the relationships between these entities. An example is as follows.
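As a minimal sketch of this three-field layout, one record could be assembled and serialized in Python as shown here. The sentence, the entity types, the relation name, the `make_record` helper, and the choice to store the output as a JSON string are all illustrative assumptions, not records or a schema taken from the dataset.

```python
import json


def make_record(sentence, entities, relations):
    """Build one instruction-tuning record (hypothetical helper).

    entities:  list of (name, type) pairs
    relations: list of (head, relation, tail) triples
    """
    # Assumed output layout: entities with their types, plus relation triples.
    output = {
        "entities": [{"name": n, "type": t} for n, t in entities],
        "relations": [
            {"head": h, "relation": r, "tail": t} for h, r, t in relations
        ],
    }
    return {
        "instruction": sentence,  # the report sentence(s)
        "input": None,            # serialized as JSON null
        "output": json.dumps(output, ensure_ascii=False),
    }


# Made-up sentence and labels, for illustration only.
record = make_record(
    "ExampleGroup used ExampleMalware to target banks.",
    entities=[("ExampleGroup", "ThreatActor"), ("ExampleMalware", "Malware")],
    relations=[("ExampleGroup", "uses", "ExampleMalware")],
)

# One record per line (JSON Lines) is a common layout for tuning data.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Storing the output as a single string keeps every record in the plain text-to-text form that instruction tuning typically expects. A nested JSON object would work as well if the training code serializes it itself.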