dhSegment text

This is a fork of the original dhSegment repository. It contains the code used for the experiments of the paper:

Barman, Raphaël, Ehrmann, Maud, Clematide, Simon, Ares Oliveira, Sofia, and Kaplan, Frédéric (2020).
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers.
Journal of Data Mining and Digital Humanities. https://arxiv.org/abs/2002.06144

Modifications

The following modifications were made:

Changing the input pipeline to read embeddings
Creation of embeddings maps with several dimensionality reduction algorithms
Concatenation of the embeddings map inside the encoder or decoder
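This README does not spell out how an embeddings map is built, so the following is only a sketch of the general idea under stated assumptions. Token embeddings are reduced with PCA (chosen here purely as an example; the README does not name the dimensionality reduction algorithms the repository uses). They are then painted onto a per-pixel map through a hypothetical index map in which 0 means "no text" and i refers to the i-th token, and finally concatenated with image features along the channel axis. All function names are hypothetical. Inside the actual network the map would also have to be resized to the resolution of the encoder or decoder feature map it is concatenated with, a step this sketch skips.

```python
import numpy as np


def reduce_embeddings(embeddings, n_components):
    """PCA via SVD: project (n_tokens, dim) embeddings onto the top n_components directions."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def build_embeddings_map(index_map, embeddings):
    """Turn an (H, W) map of token indices into an (H, W, d) embeddings map.

    Hypothetical convention: 0 = no text, i > 0 = embeddings[i - 1].
    """
    d = embeddings.shape[1]
    # Row 0 of the lookup table is an all-zero vector used for background pixels.
    table = np.vstack([np.zeros((1, d), dtype=embeddings.dtype), embeddings])
    return table[index_map]


def concat_features(features, embeddings_map):
    """Concatenate along the channel axis; height and width must already match."""
    if features.shape[:2] != embeddings_map.shape[:2]:
        raise ValueError("features and embeddings map must have the same height and width")
    return np.concatenate([features, embeddings_map], axis=-1)


# Toy example: 5 tokens with 300-dim embeddings, two of them placed on a 4x6 "page".
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(5, 300))
reduced = reduce_embeddings(token_embeddings, n_components=3)

index_map = np.zeros((4, 6), dtype=np.int64)
index_map[1:3, 0:3] = 1  # bounding box of the first token
index_map[1:3, 3:6] = 2  # bounding box of the second token

emb_map = build_embeddings_map(index_map, reduced)
image = rng.random((4, 6, 3))
combined = concat_features(image, emb_map)
print(combined.shape)  # (4, 6, 6)
```

Prepending a zero row to the lookup table keeps the map construction a single NumPy indexing operation while giving text-free pixels a neutral all-zero embedding.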

Usage

For general usage of dhSegment, see the original documentation.

The csv file now needs four columns: image, label, embeddings, embeddings_map.
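A minimal sketch of writing such a CSV with Python's csv module. The file paths and extensions are hypothetical and the column order follows the list above. This README does not say whether the input pipeline expects a header row, so the helper writes none by default; treat that as an assumption and check the input code in dh_segment_text before relying on it.

```python
import csv
import tempfile
from pathlib import Path

# Column order as listed in this README.
COLUMNS = ("image", "label", "embeddings", "embeddings_map")


def write_dataset_csv(samples, csv_path, include_header=False):
    """Write one row per sample; include_header=False is an assumption, not documented behaviour."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        if include_header:
            writer.writerow(COLUMNS)
        for sample in samples:
            writer.writerow([sample[col] for col in COLUMNS])


# Hypothetical paths for a single page.
samples = [{
    "image": "images/page_001.jpg",
    "label": "labels/page_001.png",
    "embeddings": "embeddings/page_001.npy",
    "embeddings_map": "embeddings_maps/page_001.png",
}]
out_path = Path(tempfile.mkdtemp()) / "train.csv"
write_dataset_csv(samples, out_path)

with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[0])
```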
Several configuration options were added for choosing the different hyperparameters; they are defined in dh_segment_text/utils/params_config.py and in the encoder and decoder.
An example config can be found under embeddings_config.json.
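Because configurations are plain JSON files, one convenient way to run several experiments is to derive variants from embeddings_config.json programmatically. The helper below is a hypothetical sketch using only the standard library. It does not know which keys params_config.py actually accepts, so the keys in the demo are placeholders; take the real option names from params_config.py and embeddings_config.json.

```python
import json
import tempfile
from pathlib import Path


def make_config_variant(base_path, overrides, out_path):
    """Load a JSON config, apply top-level overrides and write the result to out_path."""
    with open(base_path) as f:
        config = json.load(f)
    config.update(overrides)  # shallow: nested dictionaries are replaced, not merged
    with open(out_path, "w") as f:
        json.dump(config, f, indent=2)
    return config


# Self-contained demo with a stand-in base config; the keys are hypothetical placeholders.
tmp_dir = Path(tempfile.mkdtemp())
base_path = tmp_dir / "base_config.json"
base_path.write_text(json.dumps({"example_hyperparameter": 1, "example_setting": "a"}))
variant = make_config_variant(base_path, {"example_hyperparameter": 2}, tmp_dir / "variant.json")
print(variant)
```

Since the update is shallow, overriding a value inside a nested section requires passing the whole section, or extending the helper with a recursive merge.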

Training can be launched with the trainer script: python dh_segment_train.py with /path/to/config.json
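The with keyword in this command appears to be Sacred's command-line syntax (the original dhSegment training script is a Sacred experiment); Sacred also accepts key=value updates after the config file. The helper names below are hypothetical: the sketch only builds the command and runs it with subprocess, and it assumes it is called from the repository root.

```python
import subprocess
import sys


def build_train_command(config_path, *updates):
    """Build the command line; 'with' passes the config file (and optional key=value updates) to Sacred."""
    return [sys.executable, "dh_segment_train.py", "with", str(config_path), *updates]


def launch_training(config_path, *updates):
    """Run training and raise CalledProcessError if the script exits with an error."""
    subprocess.run(build_train_command(config_path, *updates), check=True)


cmd = build_train_command("embeddings_config.json")
print(" ".join(cmd[1:]))  # dh_segment_train.py with embeddings_config.json
```

Using sys.executable ensures the training script runs under the same Python interpreter (and thus the same environment) as the caller.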
