LLM-TIKG-dataset

A labeled dataset for threat intelligence knowledge graph construction. In this dataset, unstructured threat intelligence text is analyzed to extract the entities and relationships it contains.

Data Generation:

To construct the threat intelligence knowledge graph, the few-shot learning capability of GPT-3.5 is used for data annotation with the following prompt:

Despite the powerful generation capabilities of GPT-3.5, the directly generated annotations still contain some errors. We manually correct a portion of the GPT-labeled dataset, and this corrected portion is used for the knowledge graph construction.

Data Structure:

Since we use LoRA-based instruction tuning, each record in the dataset consists of three fields: instruction, input and output. The "instruction" holds the sentences of the report, the "input" is null, and the "output" contains the information extraction result: the entities with their types and the relationships between these entities. An example is as follows.
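As a rough illustration of the three-field layout, here is a minimal Python sketch. The sentence, the entity types, the relation label, the nested shape of "output", and the helper names validate_record and to_triples are all hypothetical and are not taken from the dataset; the real "output" field may well be a plain string rather than a nested object.

```python
import json

# The three fields named in the description above.
REQUIRED_KEYS = ("instruction", "input", "output")

# Hypothetical record: every value below is illustrative only.
record = {
    "instruction": "ExampleGroup used ExampleMalware to target banks.",
    "input": None,  # serialized as JSON null
    "output": {
        # Assumed layout: a list of typed entities ...
        "entities": [
            {"name": "ExampleGroup", "type": "ThreatActor"},
            {"name": "ExampleMalware", "type": "Malware"},
        ],
        # ... and a list of head/relation/tail relationships.
        "relationships": [
            {"head": "ExampleGroup", "relation": "uses", "tail": "ExampleMalware"},
        ],
    },
}


def validate_record(rec):
    """Check that a record carries the three expected fields."""
    missing = [key for key in REQUIRED_KEYS if key not in rec]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(rec["instruction"], str) or not rec["instruction"].strip():
        raise ValueError("instruction must be a non-empty string")
    return True


def to_triples(rec):
    """Flatten the assumed relationship list into (head, relation, tail) tuples."""
    return [
        (rel["head"], rel["relation"], rel["tail"])
        for rel in rec["output"]["relationships"]
    ]


if __name__ == "__main__":
    validate_record(record)
    print(json.dumps(record, indent=2))
    print(to_triples(record))
```

Triples in this form map directly onto knowledge graph edges, which is why the sketch flattens the relationships; adjust the field access if the actual "output" uses a different layout.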

