Work towards the integration of spatial and textual data processing tools into a modular software package which features preprocessing, geocoding, disambiguation and visualization. Helpers for the construction of gazetteers and basic text processing functions are included. The installation works best with recent Linux and Mac systems (see below for more details).
Current reference: Barbaresi, A. (2017). Towards a toolbox to map historical text collections, Proceedings of 11th Workshop on Geographic Information Retrieval, ACM, Heidelberg.
Data helpers are included to derive geographic data from existing sources such as Geonames, Wikipedia or Wikidata (all under CC BY licenses). For example, to fetch Geonames data for a list of country codes:
>>> from geokelone import data
# decide countries for which Geonames information is downloaded
>>> countries = ['dk', 'fi'] # 2-letter tld-style country code
# go fetch the data
>>> codesdict, metainfo = data.geonames.fetchdata(countries)
# write files for further use
>>> data.geonames.writefile(codesdict, 'geonames-codes.dict')
>>> data.geonames.writefile(metainfo, 'geonames-meta.dict')
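To give an idea of the raw material behind these helpers, here is a minimal sketch that parses one line of a Geonames country dump. The column order follows the readme distributed with the Geonames dumps; the function name `parse_geonames_line` and the sample record are made up for illustration, and this is not necessarily how `data.geonames.fetchdata` processes the data internally.

```python
# Column names follow the Geonames dump readme; the helper below is hypothetical.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def parse_geonames_line(line):
    """Turn one tab-separated line of a Geonames country dump into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GEONAMES_COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(GEONAMES_COLUMNS), len(fields))
        )
    record = dict(zip(GEONAMES_COLUMNS, fields))
    # convert the fields needed for geocoding and filtering
    record["latitude"] = float(record["latitude"])
    record["longitude"] = float(record["longitude"])
    record["population"] = int(record["population"] or 0)
    return record


# Made-up record for illustration only, not copied from the actual dump.
sample = "\t".join([
    "1", "Exampleby", "Exampleby", "", "55.5", "12.5", "P", "PPL", "DK",
    "", "", "", "", "", "1200", "", "10", "Europe/Copenhagen", "2017-01-01",
])
place = parse_geonames_line(sample)
print(place["name"], place["country_code"], place["latitude"], place["longitude"])
```

The population field is kept as an integer because it is a natural criterion for filtering out unlikely candidates later on.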
This tutorial uses a file provided in the tests folder and the information gathered above to go from a tagged sentence to a map:
>>> from geokelone import data, geo, text
# read from a tagged text (one token per line)
>>> splitted = text.readfile.readtagged('tests/data/fontane-stechlin.tagged')
# load default gazetteer info (Geonames, see above)
>>> metainfo = data.load.geonames_meta('geonames-meta.dict')
>>> codesdict = data.load.geonames_codes('geonames-codes.dict', metainfo)
# search for place names and store a list of resolved toponyms with metadata
>>> results = geo.geocoding.search(splitted, codesdict, metainfo)
# write the results to a file
>>> text.outputcontrol.writefile('results.tsv', results, dict())
# load results from a file
>>> results = data.load.results_tsv('results.tsv')
# draw a map
>>> geo.mapping.draw_map('testmap.png', results)
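The "one token per line" input can be pictured with the following sketch. It assumes tab-separated columns (token, part-of-speech tag, lemma), which is a common tagger output layout but an assumption about the test file; `read_tagged` is a hypothetical stand-in and not geokelone's `text.readfile.readtagged`.

```python
import io


def read_tagged(handle):
    """Read one-token-per-line input into a list of column lists."""
    tokens = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        tokens.append(line.split("\t"))
    return tokens


# Illustrative input (assumed layout: token, tag, lemma; tags in STTS style).
sample = io.StringIO(
    "Im\tAPPRART\tin\n"
    "Norden\tNN\tNorden\n"
    "der\tART\tdie\n"
    "Grafschaft\tNN\tGrafschaft\n"
    "Ruppin\tNE\tRuppin\n"
)
tokens = read_tagged(sample)
# proper nouns (tag NE) are natural candidates for a gazetteer lookup
candidates = [columns[0] for columns in tokens if len(columns) > 1 and columns[1] == "NE"]
print(candidates)
```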
Drawing a map requires a file containing the results of a placename extraction. The minimal requirements are a toponym and coordinates, see the example file in the tests folder:
>>> from geokelone import data, geo
>>> results = data.load.results_tsv('tests/data/dummy-results.tsv')
>>> geo.mapping.draw_map('testmap1.png', results)
The map window can be configured using the settings.py file.
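When choosing values for the map window, it can help to derive them from the coordinates to be plotted. The sketch below computes a window with a margin around a list of points; the function `map_window` and its return order (west, east, south, north, the order Cartopy's `set_extent` uses) are assumptions for illustration, and the actual option names in settings.py may differ.

```python
def map_window(coordinates, margin=1.0):
    """Return (west, east, south, north) in degrees around (lat, lon) pairs."""
    if not coordinates:
        raise ValueError("no coordinates given")
    lats = [lat for lat, _ in coordinates]
    lons = [lon for _, lon in coordinates]
    # add a margin and clip to valid geographic ranges
    west = max(min(lons) - margin, -180.0)
    east = min(max(lons) + margin, 180.0)
    south = max(min(lats) - margin, -90.0)
    north = min(max(lats) + margin, 90.0)
    return west, east, south, north


# approximate coordinates of Berlin, Leipzig and Vienna
points = [(52.52, 13.40), (51.34, 12.37), (48.21, 16.37)]
print(map_window(points))
```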
Did you know there was a Jerusalem in Bavaria and a Leipzig in Ukraine?
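Homonymous places like these are the reason disambiguation is needed. One common heuristic, shown here as a sketch and not necessarily the strategy geokelone applies, is to prefer the candidate closest to a reference point that fits the text collection. The function names and the example (two German cities named Frankfurt, with Berlin as reference) are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_candidate(candidates, reference):
    """Pick the candidate (label, lat, lon) nearest to the reference (lat, lon)."""
    return min(
        candidates,
        key=lambda c: haversine_km(c[1], c[2], reference[0], reference[1]),
    )


# approximate coordinates
frankfurts = [("Frankfurt am Main", 50.11, 8.68), ("Frankfurt (Oder)", 52.34, 14.55)]
berlin = (52.52, 13.40)
print(closest_candidate(frankfurts, berlin)[0])
```

For a novel set in Brandenburg, a reference point near Berlin favours Frankfurt (Oder); for another collection the reference point would have to be chosen differently, which is why such heuristics remain error-prone.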
A series of parameters can be set to affect both search and visualization, see the settings.py file.
Allowed values for the filter level are MAXIMUM (conservative setting, recommended), MEDIUM and MINIMUM (better recall comes at a price).
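To illustrate the trade-off behind the filter levels, here is a minimal sketch in which each level maps to a minimum population. The thresholds, the candidate entries and the function `filter_candidates` are hypothetical; the actual criteria are defined in the package and its settings.py file.

```python
# Hypothetical thresholds, for illustration only.
MIN_POPULATION = {"MAXIMUM": 10000, "MEDIUM": 1000, "MINIMUM": 0}


def filter_candidates(candidates, level="MAXIMUM"):
    """Keep candidates (label, population) that reach the level's threshold."""
    if level not in MIN_POPULATION:
        raise ValueError("unknown filter level: %s" % level)
    threshold = MIN_POPULATION[level]
    return [c for c in candidates if c[1] >= threshold]


# made-up entries: a town and a small homonymous hamlet
candidates = [("Neustadt (town)", 25000), ("Neustadt (hamlet)", 40)]
print(filter_candidates(candidates, "MAXIMUM"))  # conservative: fewer, safer matches
print(filter_candidates(candidates, "MINIMUM"))  # permissive: more matches, more noise
```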
Even with a touch of filtering, the token "Berlin" in Geonames resolves to a place north of Germany with a population of 0, see map below:
The helper functions data.load.load_tsv() and data.load.load_csv() allow for additional registers to match particular needs, with particular levels (0 to 3), for example:
>>> from geokelone import data, geo
# read from a TSV-file with three columns: name, latitude, longitude
>>> customized = data.load.load_tsv('file-X.tsv')
# read from a CSV-file with optional level option (additional metadata)
# four columns expected: name, canonical name, latitude, longitude
>>> customized = data.load.load_csv('file-Y.csv', level=1)
>>> results = geo.geocoding.search(splitted, codesdict, metainfo, customized)
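A custom register is useful, for instance, for historical name variants that gazetteers do not list. The sketch below writes and re-reads a three-column TSV file (name, latitude, longitude) with the standard library. It assumes that no header row is expected, which is not confirmed by the description above; `write_register` and `read_register` are hypothetical helpers, not part of geokelone.

```python
import csv
import os
import tempfile


def write_register(path, entries):
    """Write (name, latitude, longitude) tuples as tab-separated lines."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for name, lat, lon in entries:
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                raise ValueError("coordinates out of range for %s" % name)
            writer.writerow([name, lat, lon])


def read_register(path):
    """Read the file back into a dict mapping name to (latitude, longitude)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return {
            row[0]: (float(row[1]), float(row[2]))
            for row in csv.reader(handle, delimiter="\t")
            if len(row) == 3
        }


# a historical variant and a modern name, with approximate coordinates
entries = [("Vindobona", 48.208, 16.373), ("Berlin", 52.52, 13.405)]
path = os.path.join(tempfile.mkdtemp(), "file-X.tsv")
write_register(path, entries)
register = read_register(path)
print(register["Vindobona"])
```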
The module includes helpers to navigate categories, for example the World Heritage Sites in England or the Cultural Landscapes of Japan, and to fetch coordinates for a given list of entries by querying Wikipedia.
>>> from geokelone.data import wikipedia
# chained operations for a list of categories
>>> wikipedia.process_todolist('mytodolist.txt', outputfile='solved.tsv', categories=True)
# discover entries in a category
>>> category_members = wikipedia.navigate_category('XYZ')
# process them one by one
>>> for member in category_members:
...     lat, lon = wikipedia.find_coordinates(member)
...     print(member, lat, lon)
# change language code for search (default is 'en')
>>> wikipedia.find_coordinates('Wien', language='de')
(48.208, 16.373)
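For readers curious about what such a lookup involves, here is a sketch based on the public MediaWiki API (`action=query` with `prop=coordinates`). Whether geokelone uses this exact endpoint is an assumption; the two functions are hypothetical, and a canned response is used so that the example runs offline.

```python
from urllib.parse import urlencode


def coordinates_url(title, language="en"):
    """Build a MediaWiki API URL asking for the coordinates of a page."""
    params = {
        "action": "query",
        "prop": "coordinates",
        "titles": title,
        "format": "json",
    }
    return "https://%s.wikipedia.org/w/api.php?%s" % (language, urlencode(params))


def extract_coordinates(response):
    """Return (lat, lon) from a decoded API response, or None if absent."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for entry in page.get("coordinates", []):
            return entry["lat"], entry["lon"]
    return None


# Canned, abridged response in the shape the API returns (page id is a placeholder).
canned = {
    "query": {
        "pages": {
            "1": {
                "title": "Wien",
                "coordinates": [{"lat": 48.208, "lon": 16.373, "globe": "earth"}],
            }
        }
    }
}
print(coordinates_url("Wien", language="de"))
print(extract_coordinates(canned))
```

Pages without coordinates simply lack the `coordinates` key, which is why the parser returns None instead of raising an error.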
For language-independent solutions in the Python world, see spacy or polyglot.
API-based geocoding solutions for Python: geopy and geocoder.
The instructions below have been tested on Linux with several system settings (see the .travis.yml file). The package works best with recent Linux and Mac systems and Python version >= 3.5.
The cartographic components may need to be installed separately. For detailed instructions, please refer to the Cartopy documentation.
Unofficial Windows binaries for Python packages are available here.
The proj library is needed. There are several ways to install it:
- From a package repository (preferably a version released after 2016). Several package versions exist (libproj0, libproj9 or libproj12); to let the system decide:
apt-get install libproj-dev proj-data proj-bin
- From source:
wget http://download.osgeo.org/proj/proj-5.2.0.tar.gz
tar -xzvf proj-5.2.0.tar.gz
cd proj-5.2.0 && ./configure --prefix=/usr && make && sudo make install
apt-get install libgeos-* libffi-dev libgdal-dev libxslt1-dev
Only Python3 (especially 3.4 onwards) is supported, although the scripts may work for Python 2.7.
Two options, from system repositories or through pip:
- system packages (e.g. via apt-get install): python3-dev python3-shapely python3-gdal python3-matplotlib python3-pyproj
- or simply
pip3 install cairocffi GDAL matplotlib pyproj shapely
For installation on Debian/Ubuntu, simply follow the instructions (before_install:) in the .travis.yml file.
Additional note on GDAL in case problems occur during installation:
gdal-config --version
sudo pip3 install --global-option=build_ext --global-option="-I/usr/include/gdal" GDAL==2.2.3
Finally, cartopy can be installed:
pip3 install Cython  # if not installed already
pip3 install cartopy
- or on newer systems:
apt-get install python3-cartopy
See https://packages.ubuntu.com/source/zesty/python-cartopy or the Cartopy installation guide: http://scitools.org.uk/cartopy/docs/latest/installing.html#installing
Direct installation of the latest version over pip is possible (see build status):
pip3 install git+https://github.com/adbar/geokelone.git
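After installation, a quick way to see which components are in place is to check whether the corresponding modules can be located. This is a small sketch using only the standard library; the list reflects the packages named above (GDAL is imported as `osgeo`), and the helper `check_modules` is hypothetical.

```python
from importlib.util import find_spec

# import names of the packages mentioned above (GDAL is imported as "osgeo")
MODULES = ["cairocffi", "osgeo", "matplotlib", "pyproj", "shapely", "cartopy", "geokelone"]


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_modules(MODULES).items():
    print("%-12s %s" % (name, "ok" if found else "MISSING"))
```

A module reported as found can still fail at import time if a system library such as proj or GEOS is missing, so this check is only a first indication.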
Why geokelone? Because.
Work in progress, see legacy page for more information: https://github.com/adbar/toponyms
- provide map configuration
- integrate named entity recognition tool from Python repositories
- add more import and export filters
- write more tests
- documentation
Uses of the code base so far:
- Barbaresi, A. (2018). Borderlands of text mapping: Experiments on Fontane's Brandenburg. Proceedings of INF-DH-2018 workshop.
- Barbaresi, A. (2018). A constellation and a rhizome: two studies on toponyms in literary texts. In Visual Linguistics, Bubenhofer N. & Kupietz M. (Eds.), Heidelberg University Publishing, pp. 167-184.
- Barbaresi, A. (2018). Toponyms as Entry Points into a Digital Edition: Mapping Die Fackel. Open Information Science, 2(1), De Gruyter, pp. 23-33.
- Barbaresi, A. (2018). Placenames analysis in historical texts: tools, risks and side effects. In Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), Dept. of Geoinformation, TU Vienna, pp. 25-34.
- Barbaresi, A. (2017). Towards a toolbox to map historical text collections, Proceedings of 11th Workshop on Geographic Information Retrieval, ACM, Heidelberg.
- Barbaresi, A. and Biber, H. (2016). Extraction and Visualization of Toponyms in Diachronic Text Corpora. In Digital Humanities 2016: Book of Abstracts, pp. 732-734.
- Barbaresi, A. (2016). Visualisierung von Ortsnamen im Deutschen Textarchiv. In Proceedings of DHd 2016, Digital Humanities im deutschsprachigen Raum e.V., pp. 264-267.