DataCatalogue

GitHub organization for the research project DataCatalogue (Inria - BnF - INHA).

📜 Presentation of the Project

DataCatalogue is a research project jointly led by Inria Paris' ALMAnaCH research team, the Bibliothèque nationale de France (BnF), and the Institut national d'histoire de l'art (INHA). It is funded by Inria and the French Ministry of Culture. After an experimental phase in 2021-2022, the project has been renewed for a second phase in 2023-2024.

Our corpus consists of a sample of the sales catalogs from the collections of the BnF and the INHA (over 280,000 documents in total). The 713 catalogs in our sampled corpus are representative in terms of time periods (18th to 21st centuries) and types of sales (numismatics, books, antiquities, works of art, furniture, etc.). The vast majority of the catalogs are in French, but some are in English or German. We aim to design a complete and mostly automated workflow for processing sales catalogs, from their digitization to their online publication as augmented documents that can be queried like a database.

🐈 The DataCatalogue Pipeline

[DIAGRAM COMING SOON]

📂 Repositories

  • .github → README for the DataCatalogue GitHub organization
  • datacat-object-detection-dataset → Development of the object detection model with YOLOv8
  • datacat-tei → TEI customization for sales catalogs
  • extraction-internship → Internship on information extraction with GROBID (Abdel Farhi, 2022)
  • grobid-datacat → GROBID module for catalogs
  • grobid-datacat-TrainingData → Training datasets for the GROBID "catalogues" module
  • publication-internship → Internship on publication with TEI Publisher (Jules Nuguet, 2022)

📝 Bibliography

  • Hugo Scheithauer, Sarah Bénière, Jean-Philippe Moreux, & Laurent Romary. (2023, November 29). DataCatalogue : rétro-structuration automatique des catalogues de vente. Webinaire Culture Inria. https://hal.science/hal-04360229.
  • Thibault Clérice, Juliette Janès, Hugo Scheithauer, Sarah Bénière, Laurent Romary, & Benoît Sagot. (2024, August 6-9). Layout Analysis Dataset with SegmOnto. DH 2024 - Annual Conference of the Alliance of Digital Humanities Organizations, Washington, D.C., United States. https://inria.hal.science/hal-04513725.
  • Hugo Scheithauer, Sarah Bénière, & Laurent Romary. (2024, August 6-9). Automatic Retro-Structuration of Auction Sales Catalogs Layout and Content. DH 2024 - Annual Conference of the Alliance of Digital Humanities Organizations, Washington, D.C., United States. https://hal.science/hal-04547239.

🖌️ Credits

Logo by Alix Chagué, inspired by Loading Artist.

Repositories

  • datacat-tei → TEI Customization for Encoding Sales Catalogues (HTML, CC-BY-4.0, updated Aug 23, 2024)
  • datacat-object-detection-dataset → DataCatalogue Object Detection Dataset (updated May 29, 2024)
  • .github → DataCatalogue project's readme (updated May 29, 2024)
  • grobid-datacat → A GROBID module for extracting and automatically structuring sale catalogues into TEI-XML format (Java, updated Oct 13, 2022)
  • grobid-datacat-TrainingData → Training datasets for GROBID sale catalogues models (Python, CC-BY-4.0, updated Oct 11, 2022)
  • extraction-internship (Jupyter Notebook, updated Jul 28, 2022)
  • publication-internship (XQuery, updated May 30, 2022)
