Annif 0.59

juhoinkinen released this 23 Sep 08:39
v0.59.0
278f2c3

This release makes many changes to how Annif handles vocabularies.

First, vocabularies are now multilingual: projects in different languages can share the same vocabulary by using a common vocabulary id in their project configurations. Vocabulary ids should no longer include a language specifier, as was the practice until now. The language of subject suggestion labels is now determined by the project's language setting; a project can override it by giving a language code in parentheses after the vocabulary id (e.g. vocab=lcsh(en) in a Finnish-language project). These changes break backward compatibility with existing projects and vocabularies.
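To make the new vocab setting syntax concrete, here is a minimal Python sketch of how a value such as lcsh(en) could be split into a vocabulary id and a label language, falling back to the project language when no override is given. The function name parse_vocab_spec and the regular expression are illustrative assumptions, not Annif's actual implementation.

```python
import re
from typing import Tuple

# Hypothetical pattern (not taken from Annif's source): an id made of word
# characters or hyphens, optionally followed by a language code in
# parentheses, e.g. "lcsh(en)".
_VOCAB_SPEC = re.compile(r"^([\w-]+)(?:\(([A-Za-z-]+)\))?$")


def parse_vocab_spec(spec: str, project_language: str) -> Tuple[str, str]:
    """Split a vocab setting into (vocab_id, label_language)."""
    match = _VOCAB_SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"cannot parse vocabulary specification: {spec!r}")
    vocab_id, language = match.groups()
    # Without an explicit override, labels use the project's own language.
    return vocab_id, language or project_language


print(parse_vocab_spec("lcsh(en)", "fi"))  # ('lcsh', 'en')
print(parse_vocab_spec("yso", "fi"))       # ('yso', 'fi')
```

In this sketch the id lcsh carries no language of its own, so a Finnish project and an English project can both refer to the same vocabulary and differ only in the label language.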

The CLI command for loading a vocabulary has changed: it is now annif load-vocab, to align with the other annif commands, and its first argument is a vocabulary id instead of a project id. When loading a vocabulary from a TSV file, the --language option must be given to set the language. A new command, annif list-vocabs, lists vocabularies. The old annif loadvoc command still works in this release, but it is deprecated and will be removed in the next Annif release.
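As a rough illustration of the new invocation, the following Python sketch assembles an annif load-vocab command line for use in a migration script. The helper load_vocab_argv, the file name yso-fi.tsv, and the assumption that the subject file is passed as the second positional argument are all illustrative; check the CLI documentation for the exact usage.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def load_vocab_argv(
    vocab_id: str, subject_file: str, language: Optional[str] = None
) -> List[str]:
    """Build an argument list for `annif load-vocab` (hypothetical helper)."""
    # Assumption: the subject file follows the vocabulary id as the second
    # positional argument.
    argv = ["annif", "load-vocab", vocab_id, subject_file]
    if Path(subject_file).suffix.lower() == ".tsv":
        # A TSV file needs the language to be stated explicitly.
        if language is None:
            raise ValueError("a TSV vocabulary file needs an explicit language")
        argv += ["--language", language]
    return argv


# The file name below is a placeholder, not a file shipped with Annif.
print(shlex.join(load_vocab_argv("yso", "yso-fi.tsv", "fi")))
# annif load-vocab yso yso-fi.tsv --language fi
```

The resulting list can be passed to subprocess.run; note that the vocabulary id (yso) no longer carries a language suffix.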

The CLI commands are now documented on a ReadTheDocs page instead of in the Annif wiki. Development installations of Annif now use Poetry for managing Python virtual environments and dependencies. There are also a few other minor changes, including an upgrade to the Simplemma v0.8 series, which introduced support for new languages.

Note also that we are starting to prepare for the Annif 1.0 release. To that end, we have opened issue #616 to discuss expectations for backward compatibility and Semantic Versioning in releases beyond 1.0.

Backward compatibility

The changes in the vocabulary functionality require reloading of previously loaded vocabularies and retraining of existing models.

New features

#559/#600 Make vocabularies multilingual
#602/#614 Implement load-vocab and list-vocabs commands
#603/#610 Store vocabs in AnnifRegistry so they are shared between projects
#597 Include labels without language tag and concepts without labels in vocabulary

Improvements

#617/#618 Upgrade to simplemma 0.8 and disable unnecessary cache
#595/#611 Autogenerated CLI commands documentation on ReadTheDocs
#612 Add Annif logo to ReadTheDocs sidebar
#608 Multilingual SubjectIndex backed by CSV file
#604 Refactor SubjectSuggestion to store subject_id - not uri, label, notation

Maintenance

#607 Remove language suffixes from vocabulary ids in example config
#606 Refactor SubjectSet and Document to store subject IDs instead of URIs and labels
#601/#605 Switch to Poetry for dependency management
#621 Remove curl from Docker image
#622 Remove Poetry cache from Docker image

Fixes

#613 Restore ability to use vocab language different from project language
#619 Allow use of hyphens in vocabulary IDs
#620 Make NN ensemble suggest operations silent