Check if input is USFM itself before attempting to parse #189

Open
kavitharaju opened this issue Oct 29, 2022 · 2 comments

@kavitharaju
Collaborator

How would usfm_grammar 3.x behave if we gave it a random text file?
Could we add a check such as: if no \id is found in the first 3 content lines of the file, then bail?
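
As a rough illustration of the check proposed above, here is a minimal Python sketch. The function name `looks_like_usfm`, the `max_lines` parameter, and the assumption that a book code is three uppercase letters or digits are all illustrative choices and not part of usfm_grammar.

```python
import re

# Hypothetical helper, NOT part of usfm_grammar: a cheap sanity check to run
# before handing the text to the parser.
# Assumption: a USFM file carries "\id <3-character book code>" near the top.
ID_MARKER = re.compile(r"^\\id\s+[A-Z0-9]{3}\b")


def looks_like_usfm(text: str, max_lines: int = 3) -> bool:
    """Return True if an \\id marker appears in the first `max_lines` content lines."""
    # Drop a leading byte-order mark, then keep only non-blank ("content") lines.
    content_lines = [
        ln.strip() for ln in text.lstrip("\ufeff").splitlines() if ln.strip()
    ]
    return any(ID_MARKER.match(ln) for ln in content_lines[:max_lines])
```

A caller could bail out early with something like `if not looks_like_usfm(text): raise ValueError("input does not look like USFM")` before invoking the parser.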

@cmahte

cmahte commented Oct 29, 2022

Here's what I use:

\id GEN ENG-US (p.sfm) - [GTP] Galilee Translation Project 2021[CC0] Hackett [7]
\id AAA BBB-CC (DDDD)  - [EEE] Fffffff Fffffffffff Fffffff 2021[CC0] Kkkkkkk [L]

Where the ID line is (theoretically) parsed into variables

Var      Example  Definition [spec] (data form)
id (&)   (all)    Project ID-complete field
id0 (&)  AAA      Project ID-Book ID [USFM]
id1 (&)  BBB      Project ID-ISO639 Language [p.sfm]
id2 (&)  CC       Project ID-ISO3166 Country [p.sfm]
id3 (&)  DDD      Project ID-Tagging Language [p.sfm]
id4 (&)  EEE      Project ID-Acronym (3Letter) [p.sfm]
id5 (&)  FFF      Project ID Title [p.sfm]
id6 (&)  GGG      Project ID Text Freeze Date [p.sfm]
id7 (&)  HHH      Project ID Rights Code [p.sfm] (creative commons)
id8 (&)  KKK      Project ID Rights Owner [p.sfm] (of final work)
id9 (&)  LLL      Project ID Status Level [p.sfm] (1-7 p.sfm publishing status, not 1-3 USFM community acceptance.)
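
For illustration, the field split above could be sketched in Python roughly as follows. The pattern is inferred only from the two example \id lines in this comment, so the exact separators, the four-digit freeze date, and the name `parse_id_line` are assumptions rather than a documented format.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern inferred from the two example \id lines only.
# Assumptions: fields appear in this fixed order, the freeze date is a
# four-digit year, and bracketed fields contain no nested brackets.
ID_LINE = re.compile(
    r"^\\id\s+(?P<id0>\S{3})"                   # book ID
    r"\s+(?P<id1>[A-Za-z]+)-(?P<id2>[A-Za-z]+)"  # language-country
    r"\s+\((?P<id3>[^)]*)\)"                    # tagging language
    r"\s+-\s+\[(?P<id4>[^\]]*)\]"               # project acronym
    r"\s+(?P<id5>.*?)"                          # title
    r"\s+(?P<id6>\d{4})"                        # text freeze date
    r"\[(?P<id7>[^\]]*)\]"                      # rights code
    r"\s+(?P<id8>.*?)"                          # rights owner
    r"\s+\[(?P<id9>[^\]]*)\]\s*$"               # status level
)


def parse_id_line(line: str) -> Optional[Dict[str, str]]:
    """Split an \\id line into id, id0..id9, or return None if it does not fit."""
    stripped = line.strip()
    match = ID_LINE.match(stripped)
    if match is None:
        return None
    fields = match.groupdict()
    # "id" holds the complete field: everything after the \id marker itself.
    fields["id"] = stripped[len("\\id"):].strip()
    return fields
```

Returning None for a non-matching line lets a caller treat a plain `\id GEN` line as ordinary USFM instead of failing.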

So, specifically to USFM conformance:

If exactly "(USFM)" is found before the first dash on the ID line, then the work should conform to the USFM standard listed with the \usfm tag, or USFM 2.5 if no \usfm tag is found.

This affects linking, tables and images.

  • Links \jmp are only intra-document.
  • Tables (\tr#) will have no preceding definition \rem or closing \b tags, and will fail somewhat gracefully as blue text paragraphs instead.
  • Images will have no print/display size information embedded into them.
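
The "(USFM)" conformance rule described above could be sketched in Python like this. It assumes "the first dash" means the standalone " - " separator and not the hyphen inside a code such as ENG-US, which is an interpretation of the examples; the name `declared_usfm_version` is hypothetical.

```python
import re
from typing import Optional


def declared_usfm_version(text: str) -> Optional[str]:
    """Return the USFM version a file claims to follow, or None if it makes no claim.

    Assumption: "(USFM)" must appear on the \\id line before the first
    whitespace-delimited dash. Hyphens inside codes like ENG-US are ignored.
    """
    id_line = next(
        (ln for ln in text.splitlines() if ln.lstrip().startswith("\\id ")), None
    )
    if id_line is None:
        return None
    before_dash = re.split(r"\s+-\s+", id_line, maxsplit=1)[0]
    if "(USFM)" not in before_dash:
        return None
    # Use the version given with the \usfm tag, falling back to 2.5 if absent.
    version = re.search(r"^\\usfm[ \t]+(\S+)", text, re.MULTILINE)
    return version.group(1) if version else "2.5"
```

A result of None here only means the file does not declare itself as plain USFM under this convention; it says nothing about whether the content would actually parse.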

@joelthe1
Collaborator

joelthe1 commented Nov 18, 2022

Thank you @cmahte for sharing this! I could not, however, find any official documentation to support this syntax specification. Is this something Paratext (or some such software) does? If so, could you point to the documentation?
