Releases: openvinotoolkit/datumaro
Release 1.1.0rc1
What's Changed - Brief Version
Added
- Add with_subset_dirs decorator (Add ImagenetWithSubsetDirsImporter) (#816)
- Add CommonSemanticSegmentationWithSubsetDirsImporter (#826)
- Add DatumaroBinary format (#828, #829, #830, #831)
- Add Searcher CLI documentation (#838)
- Add version to dataset exported as datumaro format (#842)
- Add Ava action data format support (#847)
- Add Shift Analyzer (both covariate and label shifts) (#855)
- Add YOLO Loose format (#856)
- Add Ultralytics YOLO format (#859)
Changed
- Refactor Datumaro format code and test code (#824)
Fixed
- Fix image filenames and anomaly mask appearance in MVTec exporter (#835)
- Fix CIFAR10 and 100 detect function (#836)
- Fix celeba and align_celeba detect function (#837)
- Choose the top priority detect format for all directory depths (#839)
- Fix MVTec format detect function (#843)
- Fix wrong `__len__()` of Subset when the item is removed (#854)
- Fix mask visualization bug (#860)
What's Changed - Full Version
- Add daily/weekly test triggers by @chuneuny-emily in #811
- Raise ImportError on importing malformed COCO directory by @vinnamkim in #812
- Upload data explorer model in public storage by @sooahleex in #813
- Merge back releases/v1.0.0 to develop for taping out v1.0.0.rc by @vinnamkim in #818
- Add with_subset_dirs decorator by @vinnamkim in #816
- Skip some video unit tests on MacOS by @vinnamkim in #825
- Update copyright year in PR template by @vinnamkim in #823
- Refactor Datumaro format code and test code by @vinnamkim in #824
- Add CommonSemanticSegmentationWithSubsetDirsImporter by @vinnamkim in #826
- Develop DatumaroBinaryFormat to export/import the dataset header & DatasetItem by @vinnamkim in #828
- Update weekly_check.yml by @yunchu in #833
- Remove Codacy badge in readme by @chuneuny-emily in #834
- Implement DatumaroBinaryFormat to export/import the image dataset completely by @vinnamkim in #829
- Fix bugs in mvtec exporter by @djdameln in #835
- [Doc] Add documentation for searcher cli by @sooahleex in #838
- Support PointCloud dataset by DatumaroBinary format by @vinnamkim in #830
- [TEST] move test_utils.py to tests package by @yunchu in #841
- Fix CIFAR10 and 100 detect function by @vinnamkim in #836
- Merge back/releases/v1.0.0 by @vinnamkim in #846
- [HOTFIX] Update ipas_default.config by @yunchu in #848
- [HOTFIX] Update ipas_default.config by @yunchu in #849
- [develop] Update .gitattributes by @yunchu in #851
- Choose the top priority detect format for all directory depths by @vinnamkim in #839
- mark datumaro library version when exporting as datumaro by @bonhunko in #842
- Add AVA action data format support by @wonjuleee in #847
- Fix MVTec format detect function by @vinnamkim in #843
- Fix wrong `__len__()` of Subset when the item is removed by @vinnamkim in #854
- Fix celeba and align_celeba detect function by @vinnamkim in #837
- Fix mask visualization bug by @vinnamkim in #860
- Add YOLO Loose format by @vinnamkim in #856
- Add ShiftAnalyzer to compute covariate and label shift between two datasets by @wonjuleee in #855
- Add Ultralytics YOLO format by @vinnamkim in #859
- Add full encryption/decryption functionalities for image datasets to DatumaroBinary format by @vinnamkim in #831
Full Changelog: v1.0.0...v1.1.0rc1
Release v1.0.0
Release v0.5.0
Added
- Add Tile transformation (#790)
- Add Video keyframe extraction (#791)
- Add TileTransform documentation and Jupyter notebook example (#794)
- Add MergeTile transformation (#796)
Changed
- Improved mask_to_rle performance (#770)
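For readers unfamiliar with what a mask-to-RLE conversion produces, the following is a minimal NumPy sketch of uncompressed, COCO-style run-length encoding. It is not Datumaro's implementation of `mask_to_rle`; the function name `mask_to_rle_counts` is hypothetical, and the column-major, zeros-first counting convention is assumed from the COCO format.

```python
from typing import List

import numpy as np


def mask_to_rle_counts(mask: np.ndarray) -> List[int]:
    """Return uncompressed COCO-style RLE counts for a binary mask.

    Hypothetical helper for illustration only. Pixels are read in
    column-major order, and the first count is the number of leading zeros.
    """
    flat = np.asarray(mask, dtype=bool).ravel(order="F")
    if flat.size == 0:
        return []

    # Positions where the pixel value differs from the previous pixel.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    # Run boundaries: start of data, every change point, end of data.
    boundaries = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(boundaries).tolist()

    # If the mask starts with a foreground pixel, the zero run has length 0.
    if flat[0]:
        counts.insert(0, 0)
    return counts


example = np.array([[0, 1],
                    [1, 1]])
print(mask_to_rle_counts(example))
```

Finding the run boundaries with vectorized array operations, rather than looping over pixels in Python, is one common way to speed up this kind of conversion.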
Deprecated
- N/A
Removed
- N/A
Fixed
- Fix auto-documentation for the data_format plugins (#793)
Security
- Add security.md file for the SDL (#798)
Release v0.4.0.1
Added
- Support for label exclusivity with LabelGroup (#742)
- Jupyter samples
- Visualization Python API
- Documentation for Python API (#753)
- dataset handler, visualizer, filter descriptions (#761)
- Support for exporting as CVAT video format (#757)
- Jupyter notebook example rendering to documentation (#758)
- An interface to manipulate 'infos' to store the dataset meta-info (#767)
- 'bbox' annotation when importing a COCO dataset (#772)
Changed
- Wrap title text according to its plot width (#769)
- Get list of subsets and support only Image media type in visualizer (#768)
Deprecated
- N/A
Removed
- N/A
Fixed
- Correcting static type checking (#743)
- Fixing a VOC dataset export error when a label contains 'space' (#771)
Security
- N/A
Release v0.3.1
Added
- Support for custom media types, new `PointCloud` media type, `DatasetItem.media` and `.media_as(type)` members (#539)
- [API] A way to request dataset and extractor media type with `media_type` (#539)
- BraTS format (import-only) (.npy and .nii.gz), new `MultiframeImage` media type (#628)
- Common Semantic Segmentation dataset format (import-only) (#685)
- An option to disable `data/` prefix inclusion in YOLO export (#689)
- New command `describe-downloads` to print information about downloadable datasets (#678)
- Detection for Cityscapes format (#680)
- Maximum recursion `--depth` parameter for `detect-dataset` CLI command (#680)
- An option to save a single subset in the `download` command (#697)
- Common Super Resolution dataset format (import-only) (#700)
- Kinetics 400/600/700 dataset format (import-only) (#706)
- NYU Depth Dataset V2 format (import-only) (#712)
Changed
- `env.detect_dataset()` now returns a list of detected formats at all recursion levels instead of just the lowest one (#680)
- Open Images: allowed to store annotations file in root path as well (#680)
- Improved parsing error messages in COCO, VOC and YOLO formats (#684, #686, #687)
- YOLO format now supports almost any subset names, except `backup`, `names` and `classes` (instead of just `train` and `valid`). The reserved names now raise an error on exporting. (#688)
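The YOLO subset-name rule above can be pictured with a small sketch. This is not Datumaro's code: the constant `RESERVED_SUBSET_NAMES`, the function `check_yolo_subset_name`, and the error type and message are all hypothetical. Only the three reserved names come from the entry itself.

```python
# Reserved names taken from the changelog entry; everything else is illustrative.
RESERVED_SUBSET_NAMES = frozenset({"backup", "names", "classes"})


def check_yolo_subset_name(name: str) -> str:
    """Return the subset name unchanged, or raise if it is reserved.

    Hypothetical validation helper; the real exporter may use a different
    exception type and message.
    """
    if name in RESERVED_SUBSET_NAMES:
        raise ValueError(
            f"Cannot export subset '{name}' in YOLO format: the name is reserved"
        )
    return name


for subset in ("train", "valid", "my_custom_split"):
    print(check_yolo_subset_name(subset))
```

Under this rule, names other than the three reserved ones pass through, which matches the "almost any subset names" wording of the entry.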
Deprecated
- `--save-images` is replaced with `--save-media` in CLI and converter API (#539)
- [API] `image`, `point_cloud` and `related_images` of `DatasetItem` are replaced with `media` and `media_as(type)` members and constructor parameters (#539)
Removed
- N/A
Fixed
- Detection for LFW format (#680)
- Added the image depth value when a dataset is exported in VOC format (#726)
- Added proper handling of numerical labels in task chains (#726)
- Fixed duplication of annotations located inside another annotation (polygon) during import in VOC format (#726)
Security
- N/A
Release v0.3: Video Support
Added
- Ability to import a video as frames with the `video_frames` format and to split a video into frames with the `datum util split_video` command (#555)
- `--subset` parameter in the `image_dir` format (#555)
- `MediaManager` API to control loaded media resources at runtime (#555)
- Command to detect the format of a dataset (#576)
- More comfortable access to library API via `import datumaro` (#630)
- CLI command-like free functions (`export`, `transform`, ...) (#630)
- Reading specific annotation files for train dataset in Cityscapes (#632)
- Random sampling transforms (`random_sampler`, `label_random_sampler`) to create smaller datasets from bigger ones (#636, #640)
- API to report dataset import and export progress; API to report dataset import and export errors and take action (skip, fail) (supported in COCO, VOC and YOLO formats) (#650)
- Support for downloading the ImageNetV2 and COCO datasets (#653, #659)
- A way for formats to signal that they don't support detection (#665)
- Removal transforms to remove items/annotations/attributes from dataset (`remove_items`, `remove_annotations`, `remove_attributes`) (#670)
Changed
- Allowed direct file paths in `datum import`. Such sources are imported like when the `rpath` parameter is specified, however, only the selected path is copied into the project (#555)
- Improved `stats` performance, added new filtering parameters, image stats (`unique`, `repeated`) moved to the `dataset` section, removed `mean` and `std` from the `dataset` section (#621)
- Allowed `Image` creation from just `size` info (#634)
- Added image search in VOC XML-based subformats (#634)
- Added image path equality checks in simple merge, when applicable (#634)
- Supported saving box attributes when downloading the TFDS version of VOC (#668)
- Switched to a `pyproject.toml`-based build (#671)
Deprecated
- TBD
Removed
- Official support of Python 3.6 (due to its EOL) (#617)
- Backward compatibility annotation symbols in `components.extractor` (#630)
Fixed
- Prohibited calling `add`, `import` and `export` commands without a project (#555)
- Calling `make_dataset` on empty project tree now produces the error properly (#555)
- Saving (overwriting) a dataset in a project when rpath is used (#613)
- Output image extension preserving in the `Resize` transform (#606)
- Memory overuse in the `Resize` transform (#607)
- Invalid image pixels produced by the `Resize` transform (#618)
- Numeric warnings that sometimes occurred in `stats` command (e.g. #607) (#621)
- Added missing item attribute merging in simple merge (#634)
- Inability to disambiguate VOC from LabelMe in some cases (#658)
Security
- TBD
Release v0.2.3: Public dataset downloading
Added
- Command to download public datasets (#582)
- Extension autodetection in `ByteImage` (#595)
- MPII Human Pose Dataset (import-only) (.mat and .json) (#584)
- MARS format (import-only) (#585)
Changed
- `smooth_line` from `datumaro.util.annotation_util` - the function is renamed to `approximate_line` and has updated interface (#592)
- The `pycocotools` dependency lower bound is raised to `2.0.4` (#449)
Deprecated
- Python 3.6 support
Fixed
Release v0.2.2
Added
- Video reading API (#521)
- Python API documentation site (#526)
- Mapillary Vistas dataset format (Import-only) (#537)
- Datumaro can now be installed on Windows on Python 3.9 (#547)
- SYNTHIA dataset format (Import-only) (#532)
- Support of `score` attribute in KITTI detection (#571)
- Support for Accuracy Checker dataset meta files in formats (#553, #569, #575)
- VoTT dataset format (Import-only) (#573)
- Image resizing transform (#581)
Changed
- The following formats can now be detected unambiguously: `ade20k2017`, `ade20k2020`, `camvid`, `coco`, `cvat`, `datumaro`, `icdar_text_localization`, `icdar_text_segmentation`, `icdar_word_recognition`, `imagenet_txt`, `kitti_raw`, `label_me`, `lfw`, `mot_seq`, `open_images`, `vgg_face2`, `voc`, `widerface`, `yolo` (#531, #536, #550, #557, #558)
- Allowed export options in the `datum merge` command (#545)
Deprecated
- Using `Image`, `ByteImage` from `datumaro.util.image` - these classes are moved to `datumaro.components.media` (#538)
Removed
- Equality comparison support between `datumaro.components.media.Image` and `numpy.ndarray` (#568)
Fixed
Release v0.2.1
A bugfix release. Relaxes some requirements on formats.
Added
- Import for CelebA dataset format (#484)
Changed
- File `people.txt` became optional in LFW (#509)
- File `image_ids_and_rotation.csv` became optional in Open Images (#509)
- Allowed underscores (`_`) in subset names in COCO (#509)
- Allowed annotation files with arbitrary names in COCO (#509)
- The `icdar_text_localization` format is no longer detected in every directory (#531)
- Updated `pycocotools` version to 2.0.2 (#534)
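As an illustration of why underscores in COCO subset names need care, the sketch below splits an annotation file name of the assumed form `<task>_<subset>.json` into its task and subset parts. It is not Datumaro's parser: the function `split_task_and_subset` and the `KNOWN_TASKS` tuple are hypothetical, and the task list is an assumption made for this example.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Assumed task prefixes, for illustration only. Some contain underscores
# themselves, so splitting on the first "_" would not be reliable.
KNOWN_TASKS = (
    "person_keypoints",
    "image_info",
    "instances",
    "captions",
    "labels",
    "panoptic",
    "stuff",
)


def split_task_and_subset(filename: str) -> Optional[Tuple[str, str]]:
    """Split '<task>_<subset>.json' into (task, subset).

    Returns None when the name does not start with a known task prefix,
    for example an annotation file with an arbitrary name.
    """
    stem = PurePath(filename).stem
    for task in KNOWN_TASKS:
        prefix = task + "_"
        if stem.startswith(prefix) and len(stem) > len(prefix):
            # Everything after the task prefix is the subset name,
            # including any underscores it contains.
            return task, stem[len(prefix):]
    return None


print(split_task_and_subset("instances_train_2017.json"))
print(split_task_and_subset("person_keypoints_val.json"))
print(split_task_and_subset("custom.json"))
```

Matching on the task prefix instead of splitting on the first underscore keeps subset names such as `train_2017` intact.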
Fixed
- Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (#530)
Release v0.2: Dataset versioning
This release adds dataset versioning capabilities and significantly changes the command line.
It also improves CLI and API documentation, and extends the transformations library.
A Datumaro project can contain and manage multiple datasets instead of a single one.
CLI operations can be applied to the whole project, or to separate datasets.
Datasets are now modified in place by default. The project layout is updated. To update
an old project to the new version, use `datum project migrate`.
Added
- A new installation target: `pip install datumaro[default]`, which should be used in most cases by default. The plain `datumaro` target is intended for library users (#238)
- Dataset and project versioning capabilities (Git-like) (#238)
- [CLI] "dataset revpath" concept in CLI, allowing a dataset path to be passed together with the dataset format in `diff`, `merge`, `explain` and `info` CLI commands (#238)
- [CLI] `import`, `remove`, `commit`, `checkout`, `log`, `status`, `info` CLI commands (#238)
- [CLI] `patch` CLI command to patch one dataset from another (#401)
- [CLI, API] `ProjectLabels` transform to change dataset labels for merging etc. (#401, #478)
- [API] Type annotations and docs for `Annotation` classes (#493)
- [formats] Support for custom labels in the KITTI detection format (#481)
- [formats] `Coco*Extractor` classes now have an option to preserve label IDs from the original annotation file (#453)
- [formats] Options to control label loading behavior in `imagenet_txt` import (#434, #489)
- Data collection by telemetry. Check this notice about the details (#495)
Changed
- A project can contain and manage multiple datasets instead of a single one. CLI operations can be applied to the whole project, or to separate datasets. Datasets are modified in place by default (#328)
- [CLI] The `import` command copies datasets by default. Use `add` to add datasets without copying (#508)
- [CLI] Projects use new file layout, incompatible with old projects. An old project can be updated with `datum project migrate` (#238)
- [CLI] `diff` and `ediff` are joined into a single `diff` CLI command (#238)
- [CLI] CLI help for builtin plugins doesn't require project (#328)
- [API] The `Project` class from `datumaro.components` is changed completely (#238)
- [API] Inheriting `CliPlugin` is not required in plugin classes (#238)
- [API] `Importer`s do not create `Project`s anymore and just return a list of extractor configurations (#238)
- [API] Annotation-related classes were moved into a new module, `datumaro.components.annotation` (#439)
- [API] Rollback utilities replaced with Scope utilities (#444)
Removed
- [CLI] `project merge` CLI command (#238)
- Support for project hierarchies. A project cannot be a source anymore (#238)
- A project cannot have independent internal dataset anymore. All the project data must be stored in the project data sources (#238)
- `datumaro_project` format (#238)
- [API] Unused `path` field of `DatasetItem` (#455)
Fixed
- Deprecation warning in `open_images_format.py` (#440)
- `lazy_image` returning unrelated data sometimes (#409)
- Invalid call to `pycocotools.mask.iou` (#450)
- Importing of Open Images datasets without image data (#463)
- Return value type in `Dataset.is_modified` (#401)
- Incorrect remapping of secondary categories in `RemapLabels` (#401)
- VOC dataset patching for classification and segmentation tasks (#478)
- Exported mask label ids in KITTI segmentation (#481)
- Missing `label` for `Points` read in the LFW format (#494)