WFD import is a collection of scripts and tools for batch importing of European water data, reported under the Water Framework Directive, to Wikidata.
It currently supports harvesting data on River Basin Districts (RBDs) and Surface Water Bodies (SWBs).
To do a basic run, call either `RBD.py` or `swb_import.py` from the command line together with the `-in_file:<path>` argument. The `-in_file` value should be the URL to an RBDSUCA .xml file for an RBD import, or to an SWB file (there is one per RBD) for an SWB import.
The xml files only include the names of the RBD/SWB in English, if any. To load the names in a local language, use the `-gml_file:<path>` argument together with the path to a .gml file belonging to the RBD or SWB dataset.
You can also use `WfdBot.load_data()` to create a json dump of either the .xml or the .gml files. The local json dumps can then be used together with the `-in_file` or `-gml_file` argument.
To create new items, rather than just enriching pre-existing ones, use the `-new` flag. A new item is one where the unique RBD/SWB code is not yet associated with an item on Wikidata.
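The definition of a new item can be illustrated with a minimal sketch. The function name, the example codes and the placeholder item ids below are hypothetical and are not taken from the scripts; the sketch only shows the idea of splitting codes by whether they are already associated with an item.

```python
def split_by_existence(codes, code_to_item):
    """Partition RBD/SWB codes into those with and without a Wikidata item."""
    existing = {}
    new = []
    for code in codes:
        if code in code_to_item:
            existing[code] = code_to_item[code]
        else:
            new.append(code)
    return existing, new


# Hypothetical lookup: unique code -> id of the item already on Wikidata.
known = {"CODE_A": "Q_PLACEHOLDER_A"}
existing, new = split_by_existence(["CODE_A", "CODE_B"], known)
# "CODE_A" would be enriched; "CODE_B" would only be created when -new is set.
```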
To ensure no data is written to Wikidata you may use the `-simulate` flag, which will raise an error if the scripts attempt to write to Wikidata.
The `-preview_file:<path>` argument also prevents data from being written to Wikidata, instead outputting a preview of the data, in wikitext, to the file at the specified path. See this page for an example.
Preview can be combined with the `-cutoff:<num>` argument, which limits the number of items being processed.
The scripts also support any common Pywikibot flags. Use `-help` to get a list.
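The arguments described above can be combined in a single call. The following is a minimal sketch of a helper that assembles such a call. The helper `build_command` is hypothetical and not part of this repository, the file names are placeholders, and it assumes the scripts are started through the Python interpreter from the repository root.

```python
import sys


def build_command(script, in_file, gml_file=None, new=False,
                  simulate=False, preview_file=None, cutoff=None):
    """Return an argv list for one importer run (hypothetical helper)."""
    if script not in ("RBD.py", "swb_import.py"):
        raise ValueError("unknown script: {}".format(script))
    cmd = [sys.executable, script, "-in_file:{}".format(in_file)]
    if gml_file:
        cmd.append("-gml_file:{}".format(gml_file))
    if new:
        cmd.append("-new")
    if simulate:
        cmd.append("-simulate")
    if preview_file:
        cmd.append("-preview_file:{}".format(preview_file))
    if cutoff is not None:
        cmd.append("-cutoff:{}".format(cutoff))
    return cmd


# Placeholder file names; preview the first 10 items without writing anything.
cmd = build_command("RBD.py", "RBDSUCA.xml",
                    preview_file="preview.txt", cutoff=10)
print(" ".join(cmd[1:]))
# prints: RBD.py -in_file:RBDSUCA.xml -preview_file:preview.txt -cutoff:10
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start the run.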
Sweden is used as an example in all links below:
The data can be found on the Central Data Repository (CDR) under European Union (EU) obligations/Water Framework Directive: River Basin Management Plans - 2016 Reporting.
- The RBDSUCA file (for Competent Authorities and RBDs) is found under National RBDSUCA.
- The SWB files can be found in their respective RBD directories under River Basin Districts.
- Both .gml files (one for RBDs and one for SWBs) are located under National spatial data.
Note that there may be multiple releases of the data, found in separate directories at CDR, so you should always choose the most recent one. For the .gml files it varies from country to country whether either of them is made accessible to the public.
Before investing energy into adding support for a new country, you should ensure that the license of the data is compatible with Wikidata (CC0). The default license on CDR is CC BY, which is not free enough. You therefore need a source indicating that the member country released their data under the more permissive CC0 license.
- Create an item for the Report, and add it to the "dataset" entry in `mappings.json`. See the 2016 Sweden report for an example item.
- Add the country to the "countryCode" entry in `mappings.json`.
- Add a mapping of the three-letter language code to the two-letter code used on Wikidata to the "languageCode" entry in `mappings.json`.
- Create items for each of the Competent Authorities listed in the RBDSUCA file, then add these to the "CompetentAuthority" entry in `mappings.json`. Instructions for the claims to use on a Competent Authorities item can be found on this page.
- If you wish to add support for a new language, add it to `self.langs` in `WfdBot.__init__` and ensure all the required entries in `mappings.json` have been translated.
During the run the above additions are validated to ensure all the required mappings are present. Feel free to add any new mappings as a pull request to this repository.
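As an illustration of this kind of check, the sketch below tests that the four entries named in the list above are present and non-empty. It is not the validation code used by the scripts: the function name, the inner layout of each entry and the placeholder values are assumptions made for the example.

```python
REQUIRED_ENTRIES = ("dataset", "countryCode", "languageCode",
                    "CompetentAuthority")


def missing_entries(mappings, required=REQUIRED_ENTRIES):
    """Return the required top-level entries that are absent or empty."""
    return [key for key in required if not mappings.get(key)]


# Hypothetical, heavily simplified stand-in for the contents of mappings.json.
# The real file may structure each entry differently.
sample = {
    "dataset": {"SE": "Q_PLACEHOLDER_REPORT"},
    "countryCode": {"SE": "Q_PLACEHOLDER_COUNTRY"},
    "languageCode": {"swe": "sv"},  # three-letter code -> Wikidata code
}

problems = missing_entries(sample)
if problems:
    print("Missing mappings: " + ", ".join(problems))
# prints: Missing mappings: CompetentAuthority
```

In practice the dictionary would be loaded from `mappings.json` with `json.load` before being checked.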
To import the data, first run the RBD importer (with `-new` if needed) and then run the SWB importer.
The results are of course limited by the data quality. If a country has not followed the guidelines on e.g. language labeling then imported data will also be wrong.
The full logic of Significant Impact has not been implemented. Specifically, the import does not adapt its output based on pre-existing values for other years. See the property proposal for the full logic.
RBD does not make use of `internationalRBDName` (nor does either SWB or RBD make use of `nameTextInternational`). Although this is supposed to be an international (English) label, the field was found to hold a variety of content.
Issues and bugs are tracked on Phabricator.
If `pip install -r requirements.txt` does not work correctly, you might have to add the `--process-dependency-links` flag to ensure you get the right version of Pywikibot and lokal-profil/wikidata-stuff.
This repository was split off from lokal-profil/wikidata_batches so the history might be a bit mixed up.