
Commit

reorganisation of the documentation
lfoppiano committed Sep 7, 2023
1 parent 10f8957 commit 877d05b
Showing 26 changed files with 552 additions and 581 deletions.
470 changes: 5 additions & 465 deletions README.md


49 changes: 49 additions & 0 deletions docs/api-documentation.md
@@ -0,0 +1,49 @@
# API documentation

The application supports a custom `root_path`, which can be configured in the `config.yaml` file. All API endpoints are served
under this `root_path`.
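When the service runs under a custom `root_path`, every endpoint listed below must be prefixed with it. The following is a minimal sketch of how a client might build such URLs; the host, port, and the `/supercon` root path are hypothetical placeholders, not values taken from `config.yaml`.

```python
def build_url(base_url: str, root_path: str, endpoint: str) -> str:
    """Join the server base URL, the configured root_path and an API endpoint.

    base_url and root_path are deployment-specific (hypothetical in the example below).
    """
    # Normalise slashes so that "/supercon/", "supercon" and "" behave consistently
    stripped = root_path.strip("/")
    prefix = "/" + stripped if stripped else ""
    path = "/" + endpoint.lstrip("/")
    return base_url.rstrip("/") + prefix + path


# Example with a hypothetical host and root_path
print(build_url("http://localhost:8080", "/supercon", "/records"))
```

An empty `root_path` simply yields the endpoint directly under the base URL.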

The API documentation is generated by the OpenAPI (Swagger) implementation of APIFlask.

| URL | Description |
|-----------|------------------------------------------------|
| `/spec` | Serve the OpenAPI documentation as YAML |
| `/redoc` | Serve the OpenAPI documentation via redoc |
| `/docs` | Serve the OpenAPI documentation via swagger-UI |

The following table summarises the API:

| URL | Method | Description |
|---------------------------------------------------------|------------|----------------------------------------------------------------------------------|
| `/annotation/<doc_id>` | GET | Return the JSON annotation representation of a document |
| `/biblio/<doc_id>` | GET | Get the bibliographic data of the document by document id |
| `/config` | GET | Get the configuration |
| `/database` | GET | Render the database interface |
| `/database/document/<doc_id>`                           | GET        | Get the tabular data filtered by `doc_id`                                        |
| `/document/<doc_id>` | GET | Render the PDF viewer (PDF document + JSON annotations) |
| `/label/studio/project/<project_id>`                    | GET        | Get information about a Label Studio project                                     |
| `/label/studio/project/<project_id>/record/<record_id>` | POST/PUT   | Send an annotation task to Label Studio                                          |
| `/label/studio/project/<project_id>/records`            | POST/PUT   | Send all annotation tasks to Label Studio                                        |
| `/label/studio/projects`                                | GET        | Get the list of projects from Label Studio (annotation tool)                     |
| `/pdf/<doc_id>` | GET | Return the PDF document corresponding to the identifier |
| `/publishers`                                           | GET        | Get the list of all publishers in the database                                   |
| `/record` | POST | Create a new record |
| `/record/<record_id>` | DELETE | Remove a record by its id |
| `/record/<record_id>`                                   | GET        | Return a single record                                                           |
| `/record/<record_id>` | PUT/PATCH | Update the record |
| `/record/<record_id>/mark_invalid` | PUT/PATCH | Mark a record as invalid |
| `/record/<record_id>/mark_validated` | PUT/PATCH | Mark a record as validated |
| `/record/<record_id>/reset` | PUT/PATCH | Reset record status |
| `/record/<record_id>/status` | GET | Return the flags of a single record |
| `/records` | GET | Return the list of records |
| `/records/<type>` | GET | Return the list of records of a specific type `automatic`/`manual` |
| `/records/<type>/<publisher>/<year>` | GET | Return the list of records of a specific type + publisher + year |
| `/records/<type>/<year>` | GET | Return the list of records of a specific type + year |
| `/stats` | GET | Return statistics |
| `/training/data` | GET | Get the list of all training data stored in the database |
| `/training/data/status/<status>`                        | GET        | Get the training data filtered by status (`new`, `exported`, `corrected`)        |
| `/training/data/<training_data_id>` | GET | Export training data |
| `/training/data/<training_data_id>` | DELETE | Remove training data |
| `/training_data` | GET | Render interface for managing the training data |
| `/version`                                              | GET        | Return the version of the service                                                |
| `/years`                                                | GET        | Get the list of all years in the database                                        |
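As a rough client-side sketch, the record-status endpoints in the table can be called with Python's standard library. The base URL and root path in `API_PREFIX` are hypothetical, and `fetch_records` assumes that the `/records/<type>/<year>` response is a JSON list, which this page does not state.

```python
import json
import urllib.request

# Hypothetical deployment prefix: base URL + root_path from config.yaml
API_PREFIX = "http://localhost:8080/supercon"

# Status actions from the API table (PUT/PATCH endpoints)
STATUS_ACTIONS = {"mark_validated", "mark_invalid", "reset"}


def status_request(record_id: str, action: str, method: str = "PUT") -> urllib.request.Request:
    """Build (but do not send) a request that changes the status of a record."""
    if action not in STATUS_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if method not in ("PUT", "PATCH"):
        raise ValueError("status endpoints accept PUT or PATCH")
    url = f"{API_PREFIX}/record/{record_id}/{action}"
    return urllib.request.Request(url, method=method)


def fetch_records(record_type: str, year: int) -> list:
    """GET /records/<type>/<year>; requires a running server (JSON response assumed)."""
    if record_type not in ("automatic", "manual"):
        raise ValueError("type must be 'automatic' or 'manual'")
    with urllib.request.urlopen(f"{API_PREFIX}/records/{record_type}/{year}") as resp:
        return json.load(resp)
```

`status_request` only builds the request; pass it to `urllib.request.urlopen` to send it to a running instance.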
161 changes: 161 additions & 0 deletions docs/curation_interface.md
@@ -0,0 +1,161 @@
# Curation interface documentation

![record-list.png](images/supercon2-overview.png)

## Overview

The `supercon2` service provides the following features:

- Visualisation of material-properties records as a table, with search/filtering, sorting, and selection of non-empty values
- Visualisation of "augmented" PDFs, with highlighted annotations identifying materials and properties
- Reporting of invalid records (**mark as invalid**): records can be manually marked as invalid
- Record curation (**curation**): users can correct records or add missing records to existing documents
- Automatic collection of training data: when a record is corrected, the sentence, the spans (the
  annotations), and the tokens (including layout information, fonts, and other features) are collected

**Design principles**
- Each document is identified by an 8-character hash code. To save space, we do not store multiple versions of the same paper (papers with the same hash).
- Each record is linked to its document by the document hash.
- Correcting a record generates a new record linked to the original, so that in the future it will be possible to undo modifications or to see what was updated, when, and how.
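The design principles above can be sketched as follows. This is an illustration only: the hash function (a truncated SHA-256 of the PDF bytes), the in-memory store, and the field names `doc_hash`, `status`, and `previous` are assumptions, not the actual `supercon2` schema or implementation.

```python
import hashlib
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # hypothetical record-id generator


def document_hash(pdf_bytes: bytes) -> str:
    """8-character document identifier (assumed: truncated SHA-256 of the PDF)."""
    return hashlib.sha256(pdf_bytes).hexdigest()[:8]


def add_document(store: dict, pdf_bytes: bytes) -> str:
    """Store a PDF once: a paper with an already-known hash is not stored again."""
    doc_id = document_hash(pdf_bytes)
    store.setdefault(doc_id, pdf_bytes)
    return doc_id


@dataclass
class Record:
    doc_hash: str                   # links the record to its document
    material: str
    tc: str
    status: str = "new"
    previous: Optional[int] = None  # id of the record this one supersedes
    id: int = field(default_factory=lambda: next(_ids))


def correct(original: Record, **changes) -> Record:
    """Correcting never edits in place: it creates a new record pointing to the old one."""
    values = {"doc_hash": original.doc_hash, "material": original.material, "tc": original.tc}
    values.update(changes)
    corrected = Record(**values, status="curated", previous=original.id)
    original.status = "obsolete"  # the old record is kept, but no longer shown to users
    return corrected
```

Keeping the superseded record instead of overwriting it is what makes a future undo/redo or edit history possible.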

**NOTE**: The technical details of the curation interface can be found [here](docs/data_workflow).

**Terminology**
- **Incorrect** = wrong (e.g. 3 K extracted instead of 30 K is incorrect) [Ref](https://forum.wordreference.com/threads/invalid-incorrect-wrong.2776284/post-14029941)
- **Invalid** = wrong because it is inappropriate to the situation (e.g. Tm or the Curie temperature extracted as the superconducting critical temperature is invalid) [Ref](https://forum.wordreference.com/threads/invalid-incorrect-wrong.2776284/post-14029941)

**NOTE**: Additional details on record status and error types can be found [here](docs/readme.md)

**Future plans**
- Undo/redo functionality: the ability to revert incorrect edits and modifications to the database
- Document versioning
- ...

## Features

Here is a list of the main features. Note that they **can all be used simultaneously**.

### Table columns filtering

By entering keywords in each column, it is possible to filter records by several criteria at once:

![](images/filter-by-keywords.png)

### Filter by document

There is a shortcut to show only the records belonging to a specific document (see the Document column):

![](images/filter-by-document-1.png)

In the following figure, only the records of document `11d82...` are shown:

![](images/filter-by-document-2.png)


### Change which columns to visualise

The default view does not show all the attributes in the database:

![](images/modify-visualised-columns-1.png)

It is possible to extend the table by using the "select columns" feature:

![](images/modify-visualised-columns-2.png)

### Hide empty/blank values

It is possible to show only **records for which certain column(s) contain non-blank values** (i.e. values that are not made only of spaces, line breaks, tabs, etc.):

![](images/visualise-non-empty-fields-1.png)

In this example, the user sees only records of materials with "Applied pressure":

![](images/visualise-non-empty-fields-2.png)

Such filters can be combined on multiple columns (e.g. formula + applied pressure):

![](images/visualise-non-empty-fields-3.png)

### Multi column sorting

The interface supports multi-column sorting: the number indicates the priority, and the arrow indicates the order (ascending or descending):

![](images/multicolumn-sorting.png)

### Annotated sentence
The annotated sentence highlights the entities in the sentence from which the record was extracted.
The sentence can be expanded by clicking on it (the mouse cursor should change):

![](images/annotated-sentence.png)

### Entity-id / document-id
The document id can be used to:
- show only the records belonging to the selected document (click on the icon next to the document id)
- open the PDF viewer (**note that the current version of the PDF viewer is not integrated with the tabular view: records removed in the tabular view are still shown in the PDF viewer.**)

The entity-id is the unique identifier for each entity. It can be expanded by clicking on it, or copied by clicking on the clipboard icon.

![](images/document-entity-id.png)

### Mark records as validated / invalid

The "Record reporting" feature allows users and curators to quickly mark records as correct (validated) or incorrect (invalid).
The available actions are shown in the following panel:

![flagging_interface.png](images/flagging_interface.png)

The user can reset the status of a record, but not its content.

### Record manipulation (edit/remove/add)

The interface allows records to be manipulated with three actions:
- edit record
- remove record
- add a new record to the same document (the bibliographic data in the edit dialog will be pre-filled)

![](images/record-actions.png)

The record can be edited in the following dialog:

![](images/record-edit-dialog.png)

When adding a new record, the bibliographic data will be pre-filled:

![](images/add-new-record.png)

For any modification (edit, add, or remove), the user must select an error type.
This is mandatory: the modification cannot be saved otherwise.

![](images/error-type-selection.png)

### Look for the extracted value on the paper

You can go to the document page from the database page.

![go-to-document-page.png](images/go-to-document-page.png)

Go to the position of the material on the paper.

![go-to-the-material-position.png](images/go-to-the-material-position.png)

Use the bulb icon to find the extracted value.

![bulb-icon-as-a-guide.png](images/bulb-icon-as-a-guide.png)

Show the details of the material.

![show-the-detail-of-the-one](images/show-the-detail-of-the-one.png)

## Keystrokes

The interface can be managed entirely with the keyboard, which improves the efficiency of the curation work.

The table can be navigated with the arrow keys after selecting a row with the mouse.

The shortcuts are:

| Key | Description |
|--------------|--------------------------------------------------------------|
| n | Add new record (in the same document of the selected record) |
| e | Edit the selected record |
| ⌘ + Enter | Save the record in the edit dialog (Mac) |
| Ctrl + Enter | Save the record in the edit dialog (Windows)                 |
| arrow-up     | Move the selection up one record                             |
| arrow-down   | Move the selection down one record                           |
| enter        | Flag/unflag the selected record                              |
| ?            | Show the keyboard shortcuts dialog                           |
| esc          | Close the open dialog                                        |

50 changes: 50 additions & 0 deletions docs/curation_workflow.md
@@ -0,0 +1,50 @@
# Curation workflow

The curation workflow is summarised in the following schema:

![](images/record-correction.png)

## Workflow control

### Error types

The error types describe the causes for which a material-properties record is incorrect.

They answer the question: "What was the cause?"

| Name | Description |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| From table             | The entities Material -> Tc -> Pressure are identified in a table. Table extraction is currently not performed.                                                                                         |
| Extraction             | The material, temperature, or pressure is not extracted (no box) or is extracted incorrectly.                                                                                                          |
| Linking                | The material is incorrectly linked to the Tc, although the entities are correctly recognised.                                                                                                          |
| Tc classification      | The temperature is not correctly classified as "superconducting critical temperature" (e.g. Curie temperature, magnetic temperature…)                                                                  |
| Composition resolution | The exact composition cannot be resolved (e.g. the stoichiometric values cannot be resolved).                                                                                                          |
| Value resolution       | The extracted formula contains variables that cannot be resolved, even after reading the paper. This includes cases where the data comes from tables. [#125](https://github.com/lfoppiano/supercon2/issues/125) |
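Because the curation interface requires an error type for every edit, add, or remove operation, a backend check could look like the following sketch. The enum member names and string values are hypothetical identifiers; only the six categories come from the table.

```python
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    """The six error types from the table (string values are hypothetical)."""
    FROM_TABLE = "from_table"
    EXTRACTION = "extraction"
    LINKING = "linking"
    TC_CLASSIFICATION = "tc_classification"
    COMPOSITION_RESOLUTION = "composition_resolution"
    VALUE_RESOLUTION = "value_resolution"


MODIFICATIONS = {"edit", "add", "remove"}


def check_modification(action: str, error_type: Optional[str]) -> ErrorType:
    """Reject a modification that does not carry a known error type."""
    if action not in MODIFICATIONS:
        raise ValueError(f"unknown action: {action}")
    if error_type is None:
        raise ValueError("an error type must be selected before saving")
    # Enum lookup by value raises ValueError for unknown error types
    return ErrorType(error_type)
```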

![](images/error_types.png)

### Status flags

The workflow flags are database properties used to mark the different statuses of the data:

- `type` indicates the type of operation that was performed
- `status` indicates the status of the current record

Their values are used as follows:

| Name | Values | Visible to users | Description |
|--------|-----------|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| type | manual | true | The performed operation was manual |
| type | automatic | true | The performed operation was automatic (anomaly detection, loading script) |
| status | valid | true | The record is valid (if `type=automatic` the record might still be wrong) |
| status | invalid   | true             | The record is probably invalid or incorrect                                                                                                          |
| status | obsolete  | false            | The record is obsolete: a new record supersedes it, and the new record points to the old one (assuming that a correction creates a new record)      |
| status | new | true | the record has been added by the automatic process |
| status | curated   | true             | The record has been edited (a curated record also contains the [error type](#error-types))                                                          |
| status | validated | true | the record was validated by a curator as correct (this could have been done from a new or a curated record) |
| status | removed | false | the record was removed |
| status | empty | false | the originating document does not have any extracted information |
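As an illustration of how the "Visible to users" column could be applied when listing records, here is a minimal sketch. Representing records as plain dictionaries with `type` and `status` keys is an assumption; the actual database documents may be structured differently.

```python
# Allowed flag values; STATUS_VISIBLE mirrors the "Visible to users" column
TYPES = {"manual", "automatic"}
STATUS_VISIBLE = {
    "valid": True,
    "invalid": True,
    "obsolete": False,
    "new": True,
    "curated": True,
    "validated": True,
    "removed": False,
    "empty": False,
}


def visible_records(records):
    """Keep only records with known flags whose status is shown to users."""
    result = []
    for rec in records:
        if rec["type"] not in TYPES or rec["status"] not in STATUS_VISIBLE:
            raise ValueError(f"unknown flags: {rec['type']}/{rec['status']}")
        if STATUS_VISIBLE[rec["status"]]:
            result.append(rec)
    return result
```

Obsolete and removed records stay in the database, so they can still be used for history or undo, but they are filtered out of the user-facing views.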

The flags should be used in pairs; the state changes are illustrated as follows:

![](images/status-flags-schema.png)
