Commit

lab3
vemonet committed Feb 19, 2024
1 parent 0f84795 commit 406a94c
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions 2024/lab3/Lab3 - KG from unstructured data.ipynb
@@ -610,6 +610,29 @@
"# displacy.render(next(doc.sents), style='dep', jupyter=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🧫 Train spaCy to recognize diseases\n",
"\n",
"spaCy does not recognize diseases out of the box, so we will train it to do so.\n",
"\n",
"We will use an existing [dataset of English sentences in which diseases have been annotated](https://raw.githubusercontent.com/MaastrichtU-IDS/prodigy-drug-indication-annotation/master/relation/dailymed_disease3.jsonl), containing ~500 disease annotations. Sample record:\n",
"\n",
"```json\n",
"{\n",
" \"text\":\"    Iritis, iridocyclitis.\",\n",
" \"spans\": [\n",
" {\"start\":4,\"end\":10,\"token_start\":1,\"token_end\":1,\"label\":\"DISEASE\"},\n",
" {\"start\":12,\"end\":25,\"token_start\":3,\"token_end\":3,\"label\":\"DISEASE\"}\n",
" ]\n",
"}\n",
"```\n",
"\n",
"First generate the training data:\n"
]
},
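The sample record above can be turned into training data along these lines. This is a minimal sketch, not the notebook's own cell: it assumes that `start` and `end` are character offsets into `text`, that the sample text begins with four spaces (the count that makes the offsets line up with the disease names), and that the target is the `(text, {"entities": [(start, end, label)]})` tuple layout commonly used as input for spaCy NER training. The names `SAMPLE_JSONL`, `to_training_examples` and `entity_texts` are hypothetical, and only the standard library is used.

```python
import json

# One record in the same shape as the sample above. The four leading spaces in
# "text" are an assumption, chosen so that the offsets line up with the words.
SAMPLE_JSONL = (
    '{"text": "    Iritis, iridocyclitis.", "spans": ['
    '{"start": 4, "end": 10, "token_start": 1, "token_end": 1, "label": "DISEASE"}, '
    '{"start": 12, "end": 25, "token_start": 3, "token_end": 3, "label": "DISEASE"}]}'
)


def to_training_examples(jsonl_text):
    """Turn JSONL annotations into (text, {"entities": [...]}) tuples."""
    examples = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Keep only the character offsets and the label of each span.
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        examples.append((record["text"], {"entities": entities}))
    return examples


def entity_texts(example):
    """Return the substring covered by each entity, as a quick sanity check."""
    text, annotations = example
    return [text[start:end] for start, end, _label in annotations["entities"]]


examples = to_training_examples(SAMPLE_JSONL)
print(entity_texts(examples[0]))  # ['Iritis', 'iridocyclitis']
```

Slicing the text with each span's offsets, as `entity_texts` does, is a cheap way to catch misaligned annotations before any training starts. For the real dataset, the same function could be applied to the downloaded contents of the `.jsonl` file linked above.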
{
"cell_type": "markdown",
"metadata": {},