From 406a94c8ddef06ee9ccc2c8c6152f1b62939a354 Mon Sep 17 00:00:00 2001
From: Vincent Emonet
Date: Mon, 19 Feb 2024 11:55:59 +0100
Subject: [PATCH] lab3

---
 .../Lab3 - KG from unstructured data.ipynb | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/2024/lab3/Lab3 - KG from unstructured data.ipynb b/2024/lab3/Lab3 - KG from unstructured data.ipynb
index f0a7318..af96ba9 100644
--- a/2024/lab3/Lab3 - KG from unstructured data.ipynb
+++ b/2024/lab3/Lab3 - KG from unstructured data.ipynb
@@ -610,6 +610,29 @@
     "# displacy.render(next(doc.sents), style='dep', jupyter=True)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🧫 Train spaCy to recognize diseases\n",
+    "\n",
+    "Out of the box, spaCy does not recognize diseases as entities, so we will train it to do so.\n",
+    "\n",
+    "We will use an existing [dataset of English sentences in which diseases have been annotated](https://raw.githubusercontent.com/MaastrichtU-IDS/prodigy-drug-indication-annotation/master/relation/dailymed_disease3.jsonl). It contains ~500 disease annotations. Sample:\n",
+    "\n",
+    "```json\n",
+    "{\n",
+    "  \"text\":\"    Iritis, iridocyclitis.\",\n",
+    "  \"spans\": [\n",
+    "    {\"start\":4,\"end\":10,\"token_start\":1,\"token_end\":1,\"label\":\"DISEASE\"},\n",
+    "    {\"start\":12,\"end\":25,\"token_start\":3,\"token_end\":3,\"label\":\"DISEASE\"}\n",
+    "  ]\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "First, generate the training data:\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},