Annif 0.61
The main improvements in this release are internal changes to allow batch processing of documents for better suggestion performance and the streamlining of suggestion result representation by using sparse arrays. Currently batched processing of documents is implemented in the Omikuji, SVC, and all ensemble backends. Also a new REST API method for suggesting subjects for multiple documents has been added.
The new REST API method /v1/projects/{project_id}/suggest-batch accepts at most 32 documents in one POST request; the documents in the batch are processed in parallel when the backend in use supports this. The request body is given in JSON format and, as with the regular single-document suggest method, the limit, threshold and language parameters are optional and can be given as URL query parameters. For details see the interactive OpenAPI documentation of the REST API at annif.org.
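As a rough illustration of the batch method described above, the following Python sketch splits a list of texts into batches of at most 32 and builds one POST request per batch, passing the optional parameters as URL query parameters. The base URL, the project ID and the helper names (chunk, build_batch_request) are hypothetical, and the JSON body layout (a "documents" list of objects with a "text" field) is an assumption that should be checked against the OpenAPI documentation. The requests are only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

MAX_BATCH_SIZE = 32  # per-request limit stated in these release notes


def chunk(texts, size=MAX_BATCH_SIZE):
    """Split a list of texts into consecutive batches of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def build_batch_request(base_url, project_id, texts,
                        limit=None, threshold=None, language=None):
    """Build (but do not send) one suggest-batch POST request.

    Hypothetical helper: the body layout below is an assumption,
    not confirmed by the release notes.
    """
    if len(texts) > MAX_BATCH_SIZE:
        raise ValueError(f"at most {MAX_BATCH_SIZE} documents per request")

    # Optional parameters go into the URL query string, only when set.
    optional = {"limit": limit, "threshold": threshold, "language": language}
    params = {key: value for key, value in optional.items() if value is not None}

    url = f"{base_url}/v1/projects/{project_id}/suggest-batch"
    if params:
        url += "?" + urlencode(params)

    # Assumed request body: a list of documents, each with a "text" field.
    body = {"documents": [{"text": text} for text in texts]}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    texts = [f"Example document number {i}" for i in range(70)]
    for batch in chunk(texts):
        request = build_batch_request(
            "http://localhost:5000", "my-project", batch, limit=5
        )
        print(request.get_method(), request.full_url, len(batch))
```

Each built request could then be sent with urllib.request.urlopen against a running Annif instance. Chunking on the client side keeps every request within the 32-document limit regardless of how many documents need to be processed.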
The annif suggest CLI command now accepts path(s) to file(s) to be processed, in addition to stdin, so that it can operate on multiple documents. The annif optimize command is now much faster than before and supports a --jobs parameter for parallel processing.
The Annif Docker image has been updated to use Python 3.10.
Various maintenance tasks have also been performed: the default branch of the git repository has been renamed from master to main, the Schemathesis tool has been introduced for testing the REST API, and many dependencies have been updated. A bug causing a memory leak in the neural network ensemble backend has been fixed.
The next release of Annif will be version 1.0. To prepare for it, we have opened issue #616 for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.
Backward compatibility:
- Models trained with Annif v0.60 should continue to work; the warnings from scikit-learn are harmless
- LRAP metric has been removed from evaluation results
New features:
#664 Add REST API method /v1/projects/{project_id}/suggest-batch
#663 Support for batch suggest operations for CLI commands
#423/#681 Parallelize optimize command
Improvements:
#678/#681 Represent suggestion results as sparse arrays
#665/#669 Batch suggest in Omikuji backend
#667/#670 Batch suggest in SVC backend
#677 Batch suggest in ensemble backends
#671 Add log message indicating finishing projects initialization
#673 Suppress duplicate log messages from subject module
Maintenance:
#668 Migrate codestyle to Black v23
#679/#680 Switch default git branch to main
#672 Fix slow CI/CD runs for Python 3.10
#675 Refactor and cleanup CLI module
#682/#685 Schemathesis tests for REST API and OpenAPI schema fixes
#683 Update dependencies v0.61
#691 Upgrade Docker image to Python 3.10