diff --git a/docs/docs/technology/assets/hello_world/exclamation.png b/docs/docs/technology/assets/hello_world/exclamation.png
new file mode 100644
index 00000000..1f2e5221
Binary files /dev/null and b/docs/docs/technology/assets/hello_world/exclamation.png differ
diff --git a/docs/docs/technology/assets/hello_world/period.png b/docs/docs/technology/assets/hello_world/period.png
new file mode 100644
index 00000000..147b4357
Binary files /dev/null and b/docs/docs/technology/assets/hello_world/period.png differ
diff --git a/docs/docs/technology/assets/hello_world/question_mark.png b/docs/docs/technology/assets/hello_world/question_mark.png
new file mode 100644
index 00000000..90f08208
Binary files /dev/null and b/docs/docs/technology/assets/hello_world/question_mark.png differ
diff --git a/docs/docs/technology/assets/zurich/Zurich.png b/docs/docs/technology/assets/zurich/Zurich.png
new file mode 100644
index 00000000..323f628c
Binary files /dev/null and b/docs/docs/technology/assets/zurich/Zurich.png differ
diff --git "a/docs/docs/technology/assets/zurich/Z\303\274erich.png" "b/docs/docs/technology/assets/zurich/Z\303\274erich.png"
new file mode 100644
index 00000000..9d8e388d
Binary files /dev/null and "b/docs/docs/technology/assets/zurich/Z\303\274erich.png" differ
diff --git "a/docs/docs/technology/assets/zurich/Z\303\274rich.png" "b/docs/docs/technology/assets/zurich/Z\303\274rich.png"
new file mode 100644
index 00000000..47094182
Binary files /dev/null and "b/docs/docs/technology/assets/zurich/Z\303\274rich.png" differ
diff --git a/docs/docs/technology/introduction.md b/docs/docs/technology/introduction.md
index 1d9707da..cd069e21 100644
--- a/docs/docs/technology/introduction.md
+++ b/docs/docs/technology/introduction.md
@@ -15,24 +15,24 @@ Each node represents a different module or function in the pipeline, with a link
 ```mermaid
 flowchart TD
- A0[Spoken Language Audio] --> A1(Spoken Language Text)
- A1[Spoken Language Text] --> B[Language Identification]
- A1 --> C(Normalized Text)
- B --> C
- C & B --> Q(Sentence Splitter)
- Q & B --> D(SignWriting)
- C -.-> M(Glosses)
- M -.-> E
- D --> E(Pose Sequence)
- D -.-> I(Illustration)
- N --> H(3D Avatar)
- N --> G(Skeleton Viewer)
- N --> F(Human GAN)
- H & G & F --> J(Video)
- J --> K(Share Translation)
- D -.-> L(Description)
- O --> N(Fluent Pose Sequence)
- E --> O(Pose Appearance Transfer)
+ A0[Spoken Language Audio] --> A1(Spoken Language Text)
+ A1[Spoken Language Text] --> B[Language Identification]
+ A1 --> C(Normalized Text)
+ B --> C
+ C & B --> Q(Sentence Splitter)
+ Q & B --> D(SignWriting)
+ C -.-> M(Glosses)
+ M -.-> E
+ D --> E(Pose Sequence)
+ D -.-> I(Illustration)
+ N --> H(3D Avatar)
+ N --> G(Skeleton Viewer)
+ N --> F(Human GAN)
+ H & G & F --> J(Video)
+ J --> K(Share Translation)
+ D -.-> L(Description)
+ O --> N(Fluent Pose Sequence)
+ E --> O(Pose Appearance Transfer)

 linkStyle default stroke:green;
 linkStyle 3,5,7 stroke:lightgreen;
@@ -53,7 +53,7 @@ The dictionary-based translation approach aims to simplify the translation but s
 ```mermaid
 flowchart LR
- a[Spoken Language Text] --> b[Glosses] --> c[Pose Sequence] --> d[Video]
+ a[Spoken Language Text] --> b[Glosses] --> c[Pose Sequence] --> d[Video]
 ```

 ![Visualization of one example through the dictionary-based translation pipeline](./assets//dictionary-pipeline.png)
@@ -80,7 +80,7 @@ The machine translation approach aims to achieve similar translation quality to
 ```mermaid
 flowchart LR
- a[Spoken Language Text] --> b[SignWriting] --> c[Pose Sequence] --> d[Video]
+ a[Spoken Language Text] --> b[SignWriting] --> c[Pose Sequence] --> d[Video]
 ```

 ![Visualization of one example through the SignWriting-based translation pipeline](./assets/sign-tube-example.png)
@@ -97,9 +97,33 @@ flowchart LR
 By combining a relatively small dataset of transcribed single signs (~100k) with a relatively small dataset of segmented continuous signs, and leveraging large video/text sign language datasets, we can automatically transcribe the latter. This process will generate large synthesized datasets for both **text-to-SignWriting** and **SignWriting-to-pose** conversions.

-#### **Potential Quality:**
+#### **Potential Quality**
+
+The system aims to accurately represent sign language grammar and structure, allowing for a good translation of both lexical and non-lexical signs, expressions, and classifiers.
+Potentially, the system can be as good as a deaf human translator, given quality data.
+
+#### **Motivating Examples**
+
+##### Robustness to minor inconsequential changes
+
+Here is an example where a minor, inconsequential, and possibly even **wrong** modification to the spoken language yields the same correct translation in SignWriting (the sign for the city of Zurich), while the dictionary yields different ones.
+
+| Text | Machine Translation | Dictionary Translation |
+| ------------------------------------------------------------ | ---------------------------------------------------------------------------------- | ----------------------------------------------- |
+| [Zürich](https://sign.mt/?spl=de&sil=sgg&text=Z%C3%BCrich) | ![SignWriting for Zurich in Swiss-German Sign Language](assets/zurich/Zürich.png) | The sign for Zurich (correct) |
+| [Zurich](https://sign.mt/?spl=de&sil=sgg&text=Zurich) | ![SignWriting for Zurich in Swiss-German Sign Language](assets/zurich/Zurich.png) | Spelling the city name without an umlaut (strange) |
+| [Züerich](https://sign.mt/?spl=de&sil=sgg&text=Z%C3%BCerich) | ![SignWriting for Zurich in Swiss-German Sign Language](assets/zurich/Züerich.png) | Spelling out the misspelled city name (strange) |
+
+##### Adaptivity to minor important changes
+
+Here is an example where a minor but important modification to the spoken language (an exclamation mark) yields different, correct translations in SignWriting (reflecting the emotion), while the dictionary yields the same one.
+Changing to a question mark, the facial expression correctly becomes questioning (even though the SignWriting is not perfect).

-The system aims to accurately represent sign language grammar and structure, allowing for a good translation of both lexical and non-lexical signs, expressions, and classifiers. Potentially, the system can be as good as a deaf human translator, given quality data.
+| Text | Machine Translation | Dictionary Translation |
+| --------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | ------------------------------------------------- |
+| [Hello world.](https://sign.mt/?spl=en&sil=ase&text=Hello%20world.) | ![SignWriting for "Hello World." in American Sign Language](assets/hello_world/period.png) | The sign for Hello followed by the sign for World |
+| [Hello world!](https://sign.mt/?spl=en&sil=ase&text=Hello%20world!) | ![SignWriting for "Hello World!" in American Sign Language](assets/hello_world/exclamation.png) | The sign for Hello followed by the sign for World |
+| [Hello world?](https://sign.mt/?spl=en&sil=ase&text=Hello%20world%3F) | ![SignWriting for "Hello World?" in American Sign Language](assets/hello_world/question_mark.png) | The sign for Hello followed by the sign for World |

 ## Signed to Spoken Language Translation

@@ -107,16 +131,16 @@ Following, is a flowchart of the current translation pipeline from signed to spo
 ```mermaid
 flowchart TD
- A0[Upload Sign Language Video] --> A3[Video]
- A1[Camera Sign Language Video] --> A3
- A3 --> B(Pose Estimation)
- B --> C(Segmentation)
- C & B --> D(SignWriting Transcription)
- A2[Language Selector] --> E(Spoken Language Text)
- D --> E
- E --> F(Spoken Language Audio)
- E --> G(Share Translation)
- C -.-> H(Sign Image)
+ A0[Upload Sign Language Video] --> A3[Video]
+ A1[Camera Sign Language Video] --> A3
+ A3 --> B(Pose Estimation)
+ B --> C(Segmentation)
+ C & B --> D(SignWriting Transcription)
+ A2[Language Selector] --> E(Spoken Language Text)
+ D --> E
+ E --> F(Spoken Language Audio)
+ E --> G(Share Translation)
+ C -.-> H(Sign Image)

 linkStyle 1,2 stroke:orange;
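The sign.mt links added in the tables above all follow one query-string scheme: `spl` (spoken language code), `sil` (signed language code), and `text` (the input text). A minimal sketch of building such links with standard URL encoding; the helper name is ours, not part of sign.mt:

```python
from urllib.parse import urlencode

def sign_mt_url(spoken_lang: str, signed_lang: str, text: str) -> str:
    """Build a sign.mt translation link (hypothetical helper).

    spl = spoken language code, sil = signed language code, text = input text,
    matching the query parameters used by the table links above.
    """
    query = urlencode({"spl": spoken_lang, "sil": signed_lang, "text": text})
    return "https://sign.mt/?" + query

# Reproduces the Swiss-German Sign Language link for "Zürich":
print(sign_mt_url("de", "sgg", "Zürich"))
# → https://sign.mt/?spl=de&sil=sgg&text=Z%C3%BCrich
```

Note that `urlencode` emits `+` for spaces while the table links use `%20`; both decode to the same text.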