A labeled dataset used for the knowledge graph construction.

Netsec-SJTU/LLM-TIKG-dataset

LLM-TIKG-dataset

A labeled dataset for threat intelligence knowledge graph construction. In this dataset, unstructured threat intelligence text is analyzed to extract the entities and relationships it contains.

Data Generation:

To construct the threat intelligence knowledge graph, we use the few-shot learning capability of GPT-3.5 to annotate the data with the following prompt:

[Figure: prompt]

Despite the powerful generation capabilities of GPT-3.5, the directly generated annotations still contain some errors. We manually corrected a portion of the GPT-generated labeled dataset, which is used for the knowledge graph construction.

Data Structure:

Since we use LoRA-based instruction tuning, each record in the dataset consists of an instruction, an input, and an output. The "instruction" holds the sentences from the report, the "input" is null, and the "output" holds the information extraction result: the entities with their types and the relationships between these entities. An example is as follows.

[Figure: dataStructure]
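The three-field layout described above can be sketched in Python roughly as follows. This is only an illustration, not the dataset's actual serialization: the sample sentence, the entity types (`Malware`, `Domain`), the relation name, the helpers `make_record` and `is_valid`, and the JSON layout of the "output" string are all assumptions made for this example. It also assumes that "null" means a JSON null rather than an empty string.

```python
import json

# Field names taken from the README; everything else here is hypothetical.
REQUIRED_KEYS = ("instruction", "input", "output")


def make_record(sentence, entities, relations):
    """Build one instruction-tuning record.

    entities:  list of (name, type) pairs
    relations: list of (head, relation, tail) triples
    """
    # Assumed layout for the extraction result; the real dataset may differ.
    output = {
        "entities": [{"name": n, "type": t} for n, t in entities],
        "relations": [
            {"head": h, "relation": r, "tail": t} for h, r, t in relations
        ],
    }
    return {
        "instruction": sentence,       # sentence(s) from the report
        "input": None,                 # "input" is null (assumed JSON null)
        "output": json.dumps(output),  # extraction result stored as text
    }


def is_valid(record):
    """Check that a record has exactly the three expected fields."""
    return (
        set(record) == set(REQUIRED_KEYS)
        and isinstance(record["instruction"], str)
        and record["input"] is None
        and isinstance(record["output"], str)
    )


# Made-up sample, not taken from the dataset.
record = make_record(
    "ExampleMalware connects to example.com.",
    [("ExampleMalware", "Malware"), ("example.com", "Domain")],
    [("ExampleMalware", "connects_to", "example.com")],
)
print(json.dumps(record, indent=2))
```

A check like `is_valid` can be run over every record before fine-tuning to catch entries with missing or extra fields.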
