The goal of this analysis was to create an interactive work of data journalism showing the diversity in the street names of European cities.
The analysis covers 30 major cities located in 17 EU member states or candidate countries, including 17 capital cities: Athens, Berlin, Brussels, Bucharest, Budapest, Chisinau, Copenhagen, Kyiv, Lisbon, Madrid, Paris, Prague, Rome, Stockholm, Vienna, Warsaw and Zagreb. The selection aims at covering most regions of Europe; it was influenced by our team’s language skills and by the availability of reliable external sources for the verification of data quality. Besides capital cities, we added the second largest city in several countries and a few other cities inhabited by at least 500,000 people. City borders refer to the borders of the local administrative units (LAU) distributed by Eurostat, except for Brussels and Lisbon where the relevant communes and freguesias were merged.
This project is based on open, crowdsourced data sources. What we did, in essence, was to match the list of streets built by OpenStreetMap (and made available by Geofabrik) with the Wikidata identifier of the entities they are dedicated to. Matching was first done automatically and then manually verified street by street using a dedicated interface. Wikidata is a database associated with Wikipedia that allows information to be extracted in a systematic manner. For example, in Wikidata Marie Curie is associated with the identifier Q7186. From here, we extracted the field 'sex or gender', which was then used in the analysis along with information coming from other fields (occupation, date of birth, etc.). For all persons for whom the Wikidata identifier was missing or incomplete, we attributed the gender ourselves.
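As an illustration of the extraction step described above, here is a minimal sketch in Python. It is not the project's own script: it assumes an entity record shaped like Wikidata's standard entity JSON, and it relies on the Wikidata identifiers P21 ('sex or gender'), Q6581072 (female) and Q6581097 (male). The function name `extract_gender` and the hand-written sample record are hypothetical.

```python
from typing import Optional

# Wikidata identifiers: P21 is 'sex or gender'; the two items are its most common values.
SEX_OR_GENDER = "P21"
GENDER_LABELS = {"Q6581072": "female", "Q6581097": "male"}


def extract_gender(entity: dict) -> Optional[str]:
    """Return the first 'sex or gender' value of a Wikidata entity, or None if absent."""
    for claim in entity.get("claims", {}).get(SEX_OR_GENDER, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip 'unknown value' and 'no value' statements
        qid = snak["datavalue"]["value"]["id"]
        # Other values exist in Wikidata; return the raw identifier for those.
        return GENDER_LABELS.get(qid, qid)
    return None


# Hand-written excerpt in the shape of Wikidata's entity JSON (not a live download).
marie_curie = {
    "id": "Q7186",
    "claims": {
        "P21": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P21",
                    "datavalue": {
                        "value": {"entity-type": "item", "id": "Q6581072"},
                        "type": "wikibase-entityid",
                    },
                }
            }
        ]
    },
}

print(extract_gender(marie_curie))  # female
```

When the function returns None, the record falls into the group for which gender had to be attributed manually.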
The process of data collection is described step-by-step and discussed in these detailed methodological notes [1], [2]. Here we are also sharing the script containing the process used to analyze the raw data in order to obtain the aggregate results.
Meticulous quality control work was carried out at different stages, both automatically and manually, often relying on information provided by external sources for each city (e.g. official lists of street names released by municipalities). However, this is a dataset of considerable size and complexity – just think of how many variants there can be of the same name, or all the cases of homonymy. If you notice something missing, strange, or simply incorrect, you can help us either by editing the data directly at the source, i.e. contributing to OpenStreetMap or Wikidata, or by letting us know at info@europeandatajournalism.eu.
The data is based on the archive of streets that can be travelled by vehicle or on foot according to OpenStreetMap. Streets that have not yet been added to OpenStreetMap are missing from our analysis, as are some public areas that are categorised differently. We estimate that this limitation has a minimal impact on the overall data.
We decided to focus only on streets that are dedicated to specific individuals. We have therefore excluded streets dedicated to collective names, e.g. streets dedicated to blood donors, victims of terrorism, or workers such as blacksmiths or tailors. We have tried to attribute a street to a person only when it is clear that the reference is actually to that person, and not to another entity. For example, we would consider a hypothetical 'St. Francis’s Church road' to be dedicated to the building, and not directly to the saint.
We decided to focus on streets dedicated to individuals in the broad sense: we therefore include all individuals who have existed in the real world, as well as individuals who may or may not have existed (such as mythical kings or poets) and fictional humans such as some protagonists of literary works. Most gendered anthropomorphic figures, e.g. Nordic gods and goddesses, are also included.
Many cases of homonymy were resolved by relying on an external source providing additional details on the exact individual in question. Other cases were resolved by prioritising the individual who is by far the most present on Wikidata in terms of number of statements, provided that the attribution of a street to her/him in a given city is realistic. In cases of homonymy where it was not possible to understand to whom precisely a street is dedicated, the individual was not matched with a Wikidata ID but was only attributed a gender.
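The second rule above (prioritising the individual who is by far the most present on Wikidata) can be sketched as follows. This is only an illustration under stated assumptions: the notes do not say how "by far" was quantified, so the `min_ratio` parameter, the function name `pick_dominant` and the placeholder identifiers are hypothetical. The check that the attribution is realistic for a given city is a manual step and is not modelled here.

```python
from typing import Optional


def pick_dominant(candidates: dict[str, int], min_ratio: float = 5.0) -> Optional[str]:
    """Pick the homonym with by far the most Wikidata statements, or None if unclear.

    `candidates` maps a Wikidata ID to its number of statements.
    `min_ratio` is an assumed cut-off, not a value taken from the project.
    """
    if not candidates:
        return None
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (top_id, top_count), (_, second_count) = ranked[0], ranked[1]
    # Require the leader to have at least min_ratio times the runner-up's statements.
    if top_count >= min_ratio * max(second_count, 1):
        return top_id
    return None  # ambiguous: attribute a gender only, without a Wikidata ID


# Placeholder identifiers and counts, for illustration only.
print(pick_dominant({"Q_A": 200, "Q_B": 12}))  # Q_A
print(pick_dominant({"Q_A": 40, "Q_B": 30}))   # None
```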
In Wikidata, the field 'sex or gender' is not defined by a binary choice (here is a list of the many options available). We have relied on the first name to attribute gender to persons for whom there is currently no such information on Wikidata: this field should be interpreted as 'gender identity assigned at birth'. For a discussion of the limitations of this approach, we suggest this work by Lincoln Mullen and this chapter of the book ‘Data Feminism’ by Catherine D'Ignazio and Lauren Klein.
Some women who give their names to streets across Europe are currently missing from Wikidata, or some key information about them is missing, e.g. year or place of birth, occupation, etc. This is partly a structural problem: some streets are named after local inhabitants or landowners with no historically noteworthy achievements and remain basically unknown, except for their name. In other cases gaps will probably be filled over time. For every analysis of individuals’ profiles and features other than gender, we excluded the cities where Wikidata coverage of women is below 75%, i.e. Madrid, Bucharest, and Zagreb.
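The 75% coverage rule can be expressed as a short filter. The sketch below is illustrative only: the record layout (`gender`, `wikidata_id`), the function names and the sample cities are assumptions, and the numbers are invented rather than taken from the project's data.

```python
from typing import Optional


def women_coverage(streets: list[dict]) -> Optional[float]:
    """Share of streets named after women that are matched to a Wikidata ID."""
    women = [s for s in streets if s["gender"] == "female"]
    if not women:
        return None
    matched = sum(1 for s in women if s["wikidata_id"])
    return matched / len(women)


def cities_for_profile_analysis(data: dict[str, list[dict]], threshold: float = 0.75) -> list[str]:
    """Keep only the cities whose coverage of women reaches the threshold."""
    kept = []
    for city, streets in data.items():
        share = women_coverage(streets)
        if share is not None and share >= threshold:
            kept.append(city)
    return kept


# Invented sample: 'City A' has 4 of 4 women matched, 'City B' only 1 of 2.
sample = {
    "City A": [{"gender": "female", "wikidata_id": "Q_1"}] * 4,
    "City B": [
        {"gender": "female", "wikidata_id": "Q_2"},
        {"gender": "female", "wikidata_id": None},
        {"gender": "male", "wikidata_id": "Q_3"},
    ],
}

print(cities_for_profile_analysis(sample))  # ['City A']
```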
Wikidata specifies the occupation of most individuals. Hundreds of different occupations are present: in order to make sense of the data, we grouped these occupational categories into five broad fields of activity, i.e. politics and government; military; religion; culture, science, arts; other occupations. In the maps of the various cities, the colour of the streets dedicated to women who were active in more than one field depends on the first occupation listed on Wikidata.
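A minimal sketch of this grouping, assuming a lookup table from occupation labels to the five fields. The table below is a tiny hypothetical excerpt, not the project's actual mapping, and `field_of_activity` is an illustrative name. Only the first occupation listed decides the field, mirroring the colouring rule described above.

```python
from typing import Optional

# Hypothetical excerpt of an occupation-to-field lookup; the real mapping covers hundreds of entries.
FIELD_BY_OCCUPATION = {
    "politician": "politics and government",
    "military officer": "military",
    "nun": "religion",
    "writer": "culture, science, arts",
    "physicist": "culture, science, arts",
}


def field_of_activity(occupations: list[str]) -> Optional[str]:
    """Return the broad field for the first listed occupation, or None if none is known."""
    if not occupations:
        return None
    # Anything not in the lookup table falls into the residual category.
    return FIELD_BY_OCCUPATION.get(occupations[0], "other occupations")


print(field_of_activity(["physicist", "politician"]))  # culture, science, arts
print(field_of_activity(["farmer"]))                   # other occupations
```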
Street names are increasingly attracting attention from journalists, activists, and scholars across Europe. Mapping Diversity is the first large-scale journalistic project covering the street names of dozens of cities in multiple European countries; here are a few links to other projects, initiatives, and groups that you may want to check out:
- EqualStreetNames, a project coordinated by Open Knowledge Belgium and launched in 2020. It has been using open data to map the gender gap in street names in a few dozen cities across Europe (mostly located in Belgium and Germany).
- Las calles de las mujeres, a project run by the Geochicas. It maps the gender gap in street names in cities across Spain and Latin America.
- Toponomastica femminile, an Italian association monitoring the gender gap in street names in Italy and carrying out dissemination and awareness-raising activities.
- STNAMES LAB, a research group based in Sevilla focusing on the quantitative analysis of urban toponyms in Europe and the United States, and MILL, a research project looking at street (re)naming in Poland and Germany.
- Several journalistic projects have looked at street names from national or local perspectives. They include, for instance, Las calles de ellas about Spain, Straßenbilder about Germany, Nevek és terek about Budapest, Ruas do Género about Porto, this project by Le Figaro about Paris, and this project by iRozhlas about Czech cities.
This project is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). To cite this project please refer to the European Data Journalism Network, as well as the data providers (OpenStreetMap and Wikidata).