Dictionaries: creation from Wikidata
Here's last year's tutorial: https://github.com/petermr/tigr2ess/blob/master/wikimedia/wikidata.md
Make sure you understand items and properties.
A dictionary consists of terms (used for lexical matching). To make them understandable to machines and humans, we link them to Wikidata. In favourable cases you may find that all your terms are already in Wikidata and share a common Wikidata Property. We'll take "funders" as the example.
- Creating a dictionary with the Wikidata Query Service (WDQS) and SPARQL saves time and effort. Wikidata is a knowledge base, and SPARQL is a language for querying such knowledge bases.
- To start, open the WDQS/SPARQL home page: https://www.wikidata.org/wiki/Wikidata:WikiFactMine/Core_SPARQL
- From the list on the left-hand side of the home page, go to Query Service. This takes you to: https://query.wikidata.org/
- This opens the SPARQL query page, where you can enter your query. If you have trouble writing a SPARQL query, see: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Query_Helper#How_to_create_a_Query?
  Alternatively, you can build a query without knowing SPARQL: go to Query Helper on the left side of the Query Editor, apply filters to your search, then test it by running the query.
- Wikidata Query Service provides various tools for building a SPARQL query. One of them is the Wikidata Query Builder: https://wd-query-builder.toolforge.org/
- Enter your search and click 'Show Query'; the tool builds the SPARQL query for you. For example, in my case the property is Crossref Funder ID (P3153).
- Set the limit on the number of results. The default is 20; you can change it by clicking 'Query Options'. I changed it to 20000 and got 13442 results.
- Go to 'Search results': you will get a list of the items you searched for.
- Go to 'link' and then 'SPARQL endpoint', and you are done.
- The dictionary created is: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql%20(4)
- Reference tutorial for creating queries: https://www.youtube.com/watch?v=kJph4q0Im98
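
For readers who prefer to see the query itself, here is a minimal sketch of what the Query Builder steps above amount to for Crossref Funder ID (P3153). It is not the query that produced the dictionary linked above; the class name `FunderQuery`, its method names and the variable names `?item` and `?value` are illustrative assumptions. The sketch only builds the query text and the request URL for the public endpoint `https://query.wikidata.org/sparql`; it does not contact the server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not part of ami): builds a WDQS query and request URL. */
final class FunderQuery {
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    /** SPARQL for every item that has a value for the given property, with its English label. */
    static String buildQuery(String propertyId, int limit) {
        return "SELECT ?item ?itemLabel ?value WHERE {\n"
             + "  ?item wdt:" + propertyId + " ?value .\n"
             + "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
             + "}\n"
             + "LIMIT " + limit;
    }

    /** Appends the URL-encoded query to the endpoint as the "query" parameter. */
    static String buildUrl(String sparql) {
        try {
            return ENDPOINT + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String sparql = buildQuery("P3153", 20000); // Crossref Funder ID, limit as used above
        System.out.println(sparql);
        System.out.println(buildUrl(sparql));
    }
}
```

The printed query can be pasted into the Query Service editor, and the printed URL can be fetched with any HTTP client.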
Being Developed by PMR
PMR: I'm creating the map from:
<variable name='DiseaseLabel'/>
<variable name='instanceofLabel'/>
<variable name='DiseaseAltLabel'/>
<variable name='Disease'/>
<variable name='ICDcode'/>
to
--map wikidataID=DiseaseLabel,w_instanceOf=instanceof,wikidataAltLabel=DiseaseAltLabel,term=DiseaseLabel,name=DiseaseLabel,_ICDCode=ICDCode
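
The `<variable>` elements above come from the header of a SPARQL XML results file, so the names available for mapping can be listed before writing the map. The following is a rough sketch using only the JDK's DOM parser; the class `SparqlVariables` and its method are hypothetical and are not how ami reads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of ami): lists the variable names in SPARQL XML results. */
final class SparqlVariables {
    static List<String> variableNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The default factory is not namespace-aware, so the plain tag name matches.
            NodeList nodes = doc.getElementsByTagName("variable");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SPARQL XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sparql xmlns='http://www.w3.org/2005/sparql-results#'><head>"
                + "<variable name='DiseaseLabel'/>"
                + "<variable name='instanceofLabel'/>"
                + "<variable name='DiseaseAltLabel'/>"
                + "<variable name='Disease'/>"
                + "<variable name='ICDcode'/>"
                + "</head><results/></sparql>";
        for (String name : variableNames(xml)) {
            System.out.println(name);
        }
    }
}
```

SPARQL variable names are case-sensitive, so it is worth checking the listed names against the right-hand sides of the map.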
The uncontrolled names are:
- `w_` for Wikidata Properties (we might change this to add the WD identifier)
- `_` for completely uncontrolled names

My current commandline is:
String cmd = "-vvv"
+ " --dictionary " + dictionary
+ " --directory=" + outputDir
+ " --input=" + inputFile
+ " create"
+ " --informat=wikisparqlxml"
+ " --sparqlmap "
+ "wikidata=Disease,"
+ "w_instanceOf=instanceof,"
+ "term=DiseaseLabel,"
+ "name=DiseaseLabel,"
+ "_ICD10=ICDCode"
+ " --synonyms=DiseaseAltLabel"
;
This maps from Wikidata-SPARQL names to amiNames, which means that the user has to supply the --sparqlmap values.
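
To make the shape of a `--sparqlmap` value concrete, here is a small sketch that splits one into pairs. It assumes, as the example values suggest, that the left-hand side of each `=` is the ami name and the right-hand side is the SPARQL variable. The class `SparqlMapParser` is hypothetical and does not reproduce ami's own parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of ami): parses "amiName=sparqlName,..." pairs. */
final class SparqlMapParser {
    /** Returns an insertion-ordered map keyed by ami name. */
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length != 2 || kv[0].isEmpty() || kv[1].isEmpty()) {
                throw new IllegalArgumentException("Expected amiName=sparqlName but got: " + pair);
            }
            map.put(kv[0], kv[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse(
                "wikidata=Disease,w_instanceOf=instanceof,term=DiseaseLabel,"
                + "name=DiseaseLabel,_ICD10=ICDCode");
        for (Map.Entry<String, String> e : map.entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue());
        }
    }
}
```

Keying the map by ami name lets several ami names share one SPARQL variable, as `term` and `name` do here.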
Note that:
- more than one `amiTerm` can map to a `WikidataSparql` (it's usually `name` and `term`)
- there is a special Option `synonyms` for retrieving the (not very well structured) list of `AltLabel` (it's a mess when they contain commas).
- `p_foo` represents a Wikidata property, e.g. `P31` (instance of)
- `q_foo` represents a Wikidata item
- `_foo` represents a dictionary-specific name that can be ignored by other dictionaries (e.g. `_icd10`)
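
The prefix conventions above can be sketched as a simple classifier. This is only an illustration: the class `AmiNamePrefix`, the enum constants and the `CONTROLLED` fallback for unprefixed names such as `term` are assumptions. Treating `w_` like `p_` follows the command line shown earlier and may not match what ami finally adopts.

```java
/** Hypothetical helper (not part of ami): classifies a name by its prefix. */
final class AmiNamePrefix {
    enum Kind { WIKIDATA_PROPERTY, WIKIDATA_ITEM, DICTIONARY_SPECIFIC, CONTROLLED }

    static Kind classify(String name) {
        // Check the two-character prefixes before the bare underscore.
        if (name.startsWith("p_") || name.startsWith("w_")) {
            return Kind.WIKIDATA_PROPERTY;
        }
        if (name.startsWith("q_")) {
            return Kind.WIKIDATA_ITEM;
        }
        if (name.startsWith("_")) {
            return Kind.DICTIONARY_SPECIFIC;
        }
        return Kind.CONTROLLED; // assumed label for names such as "term" or "name"
    }

    public static void main(String[] args) {
        for (String name : new String[] {"w_instanceOf", "q_foo", "_ICD10", "term"}) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```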