
Dictionaries: creation from Wikidata

petermr edited this page Jul 27, 2020 · 1 revision

What is Wikidata?

Here's last year's tutorial: https://github.com/petermr/tigr2ess/blob/master/wikimedia/wikidata.md

Make sure you understand Wikidata items (identifiers starting with Q) and properties (identifiers starting with P).

Wikidata links

A dictionary consists of terms, which are used for lexical matching. To make those terms understandable to both machines and humans, we link them to Wikidata. In favourable cases you may find that all your terms are already in Wikidata and share a common Wikidata property. We'll take "funders" as an example.
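To make the item/property distinction concrete: a Wikidata item identifier is Q followed by digits, a property identifier is P followed by digits, and an item's entity URI is http://www.wikidata.org/entity/ followed by its identifier. The minimal sketch below assumes a dictionary entry is simply a term plus the item it links to; WikidataEntry and its methods are hypothetical names, not part of ami.

```java
import java.util.regex.Pattern;

/** Hypothetical dictionary entry: a lexical term linked to one Wikidata item. */
final class WikidataEntry {
    // Item IDs look like Q42; property IDs look like P3153 (Crossref Funder ID).
    private static final Pattern ITEM_ID = Pattern.compile("Q[1-9][0-9]*");
    private static final Pattern PROPERTY_ID = Pattern.compile("P[1-9][0-9]*");

    final String term;
    final String wikidataId;

    WikidataEntry(String term, String wikidataId) {
        // A term must link to an item, never to a property.
        if (!isItemId(wikidataId)) {
            throw new IllegalArgumentException("not a Wikidata item ID: " + wikidataId);
        }
        this.term = term;
        this.wikidataId = wikidataId;
    }

    static boolean isItemId(String id) {
        return id != null && ITEM_ID.matcher(id).matches();
    }

    static boolean isPropertyId(String id) {
        return id != null && PROPERTY_ID.matcher(id).matches();
    }

    /** Entity URI in the form used by Wikidata SPARQL results. */
    String uri() {
        return "http://www.wikidata.org/entity/" + wikidataId;
    }
}
```

Keeping the identifier rather than the label as the link means the entry survives label edits on Wikidata.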

Additions by: Vaishali Arora

  • Creating a dictionary with the Wikidata Query Service (WDQS) and SPARQL saves time and effort. Wikidata is a knowledge base, and SPARQL is a language for writing queries against knowledge bases like it.

  • To start, open the WDQS/SPARQL home page: https://www.wikidata.org/wiki/Wikidata:WikiFactMine/Core_SPARQL

  • From the list on the left-hand side of that page, go to Query Service. This takes you to: https://query.wikidata.org/

  • This opens the SPARQL query page, where you can enter your query. If you have trouble writing a SPARQL query, see: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Query_Helper#How_to_create_a_Query?

    Alternatively, you can build a query without knowing SPARQL at all. Open the Query Helper on the left side of the Query Editor, apply filters to your search, and then run the query to test it.

  • There are several tools for building SPARQL queries for the Wikidata Query Service. One of them is the Wikidata Query Builder:

    https://wd-query-builder.toolforge.org/
    
  • Run the relevant search and click 'Show Query'; this builds a SPARQL query for you. For example, in my case the property is Crossref Funder ID (P3153).

  • Set the result limit. The default is 20, and you can change it under 'Query Options'. I changed it to 20000 and got 13442 results.

  • Go to 'Search results': you will get a list of the items you searched for.

  • Go to 'Link' and then 'SPARQL endpoint', and you're done.

  • The resulting dictionary is: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql%20(4)

  • Reference tutorial for creating queries: https://www.youtube.com/watch?v=kJph4q0Im98
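For readers who prefer to script these steps, here is a minimal sketch that writes a comparable query by hand and encodes it as a GET request to the public endpoint https://query.wikidata.org/sparql. The query (items with a Crossref Funder ID, P3153, their funder ID and English label, limited to 20000 rows) is an illustration of what the Query Builder produces, not a copy of its output; FunderQuery and its methods are hypothetical names. The request asks for SPARQL XML results via the Accept header.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a WDQS request for all items that have a given property. */
final class FunderQuery {
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    /** SPARQL for items with the property, its value, and the item's English label. */
    static String sparql(String propertyId, int limit) {
        return "SELECT ?item ?itemLabel ?value WHERE {\n"
             + "  ?item wdt:" + propertyId + " ?value .\n"
             + "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
             + "}\n"
             + "LIMIT " + limit;
    }

    /** GET URI with the query percent-encoded; spaces become %20 rather than '+'. */
    static URI uri(String sparql) {
        // URLEncoder turns a literal '+' into %2B, so any '+' left over was a space.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(ENDPOINT + "?query=" + encoded);
    }

    /** Builds (but does not send) a request asking for SPARQL XML results. */
    static HttpRequest request(String sparql) {
        return HttpRequest.newBuilder(uri(sparql))
                .header("Accept", "application/sparql-results+xml")
                .GET()
                .build();
    }
}
```

To fetch results, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Wikimedia asks automated clients to send a descriptive User-Agent header, which this sketch does not set.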

Syntax for creating a dictionary from Wikidata

Being developed by PMR

PMR: I'm creating the map from:

		<variable name='DiseaseLabel'/>
		<variable name='instanceofLabel'/>
		<variable name='DiseaseAltLabel'/>
		<variable name='Disease'/>
		<variable name='ICDcode'/>

to

--map wikidataID=DiseaseLabel,w_instanceOf=instanceof,wikidataAltLabel=DiseaseAltLabel,term=DiseaseLabel,name=DiseaseLabel,_ICDCode=ICDCode
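To show what the --map value encodes, here is a minimal sketch that splits it into ami-name to SPARQL-variable pairs. It is not ami's actual parser; SparqlMapParser and parse are hypothetical names, and it assumes that neither names nor variables contain ',' or '='.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for "amiName=sparqlVariable,..." mapping strings. */
final class SparqlMapParser {

    /** Returns ami name -> SPARQL variable, preserving the order given. */
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing comma
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected name=variable, got: " + trimmed);
            }
            String amiName = trimmed.substring(0, eq);
            String sparqlVariable = trimmed.substring(eq + 1);
            // Each ami name may appear once; several names may share one variable.
            if (map.put(amiName, sparqlVariable) != null) {
                throw new IllegalArgumentException("duplicate ami name: " + amiName);
            }
        }
        return map;
    }
}
```

Keying on the ami name is what allows term and name to both point at DiseaseLabel.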

The uncontrolled names are:

  • w_ for Wikidata properties (we might change this to add the WD identifier)
  • _ for completely uncontrolled names

My current command line is:
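		// dictionary, outputDir and inputFile are Strings defined earlier
		String cmd = "-vvv"
				+ " --dictionary " + dictionary
				+ " --directory=" + outputDir
				+ " --input=" + inputFile
				+ " create"
				// the input is a Wikidata SPARQL result file in XML
				+ " --informat=wikisparqlxml"
				// comma-separated amiName=SPARQLvariable pairs
				+ " --sparqlmap "
				+ "wikidata=Disease,"
				+ "w_instanceOf=instanceof,"
				+ "term=DiseaseLabel,"
				+ "name=DiseaseLabel,"
				+ "_ICD10=ICDCode"
				// SPARQL variable holding the alternative labels
				+ " --synonyms=DiseaseAltLabel"
				;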

This maps Wikidata SPARQL variable names to ami names, which means the user has to write the --sparqlmap values.
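A sketch of what applying such a map to one row of SPARQL results might look like: each ami name is filled from the SPARQL variable it points to, so term and name can both be copied from DiseaseLabel. RowMapper and applyMap are hypothetical names, not ami's implementation, and the row values in the usage below are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical: turns one SPARQL result row into ami attribute values. */
final class RowMapper {

    /**
     * @param sparqlMap ami name -> SPARQL variable (e.g. term -> DiseaseLabel)
     * @param row       SPARQL variable -> value for a single result
     * @return ami name -> value; variables absent from the row are skipped
     */
    static Map<String, String> applyMap(Map<String, String> sparqlMap, Map<String, String> row) {
        Map<String, String> entry = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sparqlMap.entrySet()) {
            String value = row.get(e.getValue()); // exact, case-sensitive lookup
            if (value != null) {
                entry.put(e.getKey(), value);
            }
        }
        return entry;
    }
}
```

With an exact-match lookup like this one, variable names are case-sensitive, so a map entry pointing at ICDCode would not pick up a variable named ICDcode.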

Note that:

  • more than one ami name can map to the same Wikidata SPARQL variable (usually name and term)
  • there is a special option, --synonyms, for retrieving the (poorly structured) AltLabel list; it gets messy when the labels themselves contain commas
  • p_foo represents a Wikidata property, e.g. P31 (instance of)
  • q_foo represents a Wikidata item
  • _foo represents a dictionary-specific name that other dictionaries can ignore (e.g. _icd10)
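The AltLabel problem can be seen in a small sketch. The WDQS label service appears to return all alternative labels for an item as one string joined with ", ", so splitting on that separator breaks any synonym that itself contains a comma. AltLabelSplitter is a hypothetical helper that only splits on ", "; it illustrates the limitation rather than solving it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter for the label service's AltLabel string. */
final class AltLabelSplitter {

    /** Splits on ", " and trims each part; null or blank input gives an empty list. */
    static List<String> split(String altLabel) {
        List<String> synonyms = new ArrayList<>();
        if (altLabel == null || altLabel.isBlank()) {
            return synonyms;
        }
        for (String part : altLabel.split(", ")) {
            String s = part.trim();
            if (!s.isEmpty()) {
                synonyms.add(s);
            }
        }
        return synonyms;
    }
}
```

For a made-up input such as "influenza, type A, flu", this returns three synonyms, and nothing in the string itself says whether "influenza, type A" was meant as a single label.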