Annotating tone #92

Open
iljackb opened this issue Nov 3, 2019 · 1 comment
Labels
help wanted linguistic issues pertaining to linguistic description to-do

Comments

iljackb commented Nov 3, 2019

By default, I annotate the orthography. Because Mixtec orthography often leaves certain features unmarked, these annotations are underspecified as to what actually expresses a given feature. E.g., in the example below the verb "sketa" is present tense and 1sg, neither of which shows up in the orthography, so the entire form is simply tagged for those features:

            <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               ......
            </u>
            <spanGrp type="annotations">
                ....
               <span type="translation" target="#d1e114" xml:lang="en" ana="#INFL">I run</span>
               <span type="translation" target="#d1e114" xml:lang="es" ana="#INFL">corro</span>
               <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
               .........
            </spanGrp>
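
To make the default scheme concrete, here is a minimal sketch of how a script could resolve each "gram" span to the orthographic token it targets. It assumes a trimmed, namespace-free copy of the example above wrapped in a hypothetical `<div>` root; the function name `gram_features` is illustrative only and not part of the project.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the example above; the <div> wrapper is an assumption.
SAMPLE = """
<div>
  <u who="#TS" xml:id="d1e112" xml:lang="mix">
    <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
      <w xml:id="d1e114" synch="#T14">sketa</w>
      <w xml:id="d1e116" synch="#T19">ntikii</w>
    </seg>
  </u>
  <spanGrp type="annotations">
    <span type="translation" target="#d1e114" xml:lang="en" ana="#INFL">I run</span>
    <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
  </spanGrp>
</div>
"""


def gram_features(root):
    """Map the text of each targeted token to the feature labels of its gram span."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    result = {}
    for span in root.iter("span"):
        if span.get("type") != "gram":
            continue  # skip translations and other annotation types
        target = by_id.get(span.get("target", "").lstrip("#"))
        if target is not None:
            labels = [a.lstrip("#") for a in span.get("ana", "").split()]
            result["".join(target.itertext())] = labels
    return result


features = gram_features(ET.fromstring(SAMPLE))
print(features)
```

Note that the result only says the whole form "sketa" carries these features; nothing in it identifies which part of the form expresses which feature.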

If, however, a phonetic transcription is included, I tag the orthographic forms (as above) and also explicitly tag the tone contours (encoded as <m> with @xml:id), so that the annotation points at exactly what expresses the linguistic feature.

            <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
               </seg>
            </u>
            <spanGrp type="annotations">
                 ....
               <span type="translation" target="#d1e114" xml:lang="en" ana="#INFL">I run</span>
               <span type="translation" target="#d1e114" xml:lang="es" ana="#INFL">corro</span>
               <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
               <span type="gram" target="#d1e225" ana="#INCOMPL"/>
               <span type="gram" target="#d1e120" ana="#1PERS #SG"/>
                 ....
            </spanGrp>

However, I'm not sure what value of <span @type> to give these. I'm currently labeling them "gram", the same as the general grammatical annotations, but I'm wondering if I should call them "tone" or something similar, so that a retrieval script can just check the <span @type> value rather than checking whether the target is an <m> that is a descendant of //seg[@notation='ipa'].
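
To illustrate the trade-off, here is a rough sketch of the two retrieval strategies. It assumes a trimmed, namespace-free copy of the example wrapped in a hypothetical `<div>` root, and it assumes the span annotating the high tone points at `#d1e225`, the id carried by the corresponding `<m>`. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the example; the <div> wrapper is an assumption.
SAMPLE = """
<div>
  <u xml:id="d1e112">
    <seg xml:id="d1e113" notation="orth"><w xml:id="d1e114">sketa</w></seg>
    <seg xml:id="d1e118" notation="ipa">
      <w xml:id="d1e119">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
    </seg>
  </u>
  <spanGrp type="annotations">
    <span type="gram" target="#d1e114" ana="#V #INTRANS #INCOMPL #1PERS #SG"/>
    <span type="gram" target="#d1e225" ana="#INCOMPL"/>
    <span type="gram" target="#d1e120" ana="#1PERS #SG"/>
  </spanGrp>
</div>
"""


def tone_spans_by_structure(root):
    """Spans whose target is an <m> inside seg[@notation='ipa']."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    hits = []
    for span in root.iter("span"):
        el = by_id.get(span.get("target", "").lstrip("#"))
        if el is None or el.tag != "m":
            continue
        node = parent.get(el)
        while node is not None:  # walk up looking for the IPA segment
            if node.tag == "seg" and node.get("notation") == "ipa":
                hits.append(span)
                break
            node = parent.get(node)
    return hits


def tone_spans_by_type(root, value="tone"):
    """Spans labelled directly via @type (the alternative being considered)."""
    return [s for s in root.iter("span") if s.get("type") == value]


root = ET.fromstring(SAMPLE)
tone_targets = [s.get("target") for s in tone_spans_by_structure(root)]
print(tone_targets)
print(tone_spans_by_type(root))
```

With the current "gram" labeling, only the structural lookup finds the tone annotations; the lookup by @type would start working once the spans carry a dedicated value.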

@laurent, what do you think?

iljackb added help wanted to-do linguistic issues pertaining to linguistic description labels Nov 3, 2019

iljackb commented Nov 4, 2019

The solution is to use <span type="gram" @subtype>. This requires a schema alteration: <span> has to be added to att.typed.

I am thinking that there should be at least two possible values of @subtype: "tone" (for the case discussed above in this issue) and possibly "morph", for pointing to a morphological unit in an inflected, or maybe derived, form.

Here is an example showing both uses of @subtype, tagging:

  1. the presence of the future/potentive prefix "kun-" (realized phonetically as "ũː↗↘") in front of the verb, which is tagged only in the phonetic transcription (annotated below as <span type="gram" subtype="morph" target="#d1e157" ana="#FUT"/>);
  2. the presence of the tone inflection marking 1st person singular on the verb, which isn't marked in the orthography (annotated below as <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"/>):

               <seg xml:lang="mix" xml:id="d1e41" notation="orth" type="phrase">
                  <w xml:id="d1e42" synch="#T2">kunkanta</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e46" notation="ipa" type="phrase" sameAs="#d1e41">
                  <w xml:id="d1e47" synch="#T1" sameAs="#d1e42"><m xml:id="d1e157">ũː↗↘</m>k̬a˩nd̪a<m xml:id="d1e172">˩</m></w>
               </seg>
            </u>
            <spanGrp type="annotations">
               <span type="translation" target="#d1e42" xml:lang="en" ana="#INFL">I will jump</span>
               <span type="translation" target="#d1e42" xml:lang="es" ana="#INFL">saltaré</span>
               <span type="translation" target="#d1e42" xml:lang="es" ana="#INFL">voy a saltar</span>
               <span type="gram" target="#d1e42" ana="#V #INTRANS #FUT #1PERS #SG">
                  <gloss type="igt">fut- jump\1s</gloss>
               </span>
               <span type="gram" subtype="morph" target="#d1e157" ana="#FUT"/>
               <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"/>
            </spanGrp>

Note that (in relation to issue #93), the <gloss type="igt"> will still only be placed in the <span> elements annotating the orthographic content.
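
With @subtype in place, retrieval reduces to a simple attribute filter. The following is a minimal sketch over a trimmed, namespace-free copy of the <spanGrp> above (the <gloss> child is omitted for brevity); `spans_by_subtype` is an illustrative name, not an existing function in the project.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <spanGrp> above, without namespaces or the <gloss> child.
SAMPLE = """
<spanGrp type="annotations">
  <span type="translation" target="#d1e42" xml:lang="en" ana="#INFL">I will jump</span>
  <span type="gram" target="#d1e42" ana="#V #INTRANS #FUT #1PERS #SG"/>
  <span type="gram" subtype="morph" target="#d1e157" ana="#FUT"/>
  <span type="gram" subtype="tone" target="#d1e172" ana="#1PERS #SG"/>
</spanGrp>
"""


def spans_by_subtype(root, subtype):
    """Return the gram spans carrying the given @subtype value."""
    return root.findall(f".//span[@type='gram'][@subtype='{subtype}']")


root = ET.fromstring(SAMPLE)

by_subtype = {
    st: [(s.get("target"), s.get("ana")) for s in spans_by_subtype(root, st)]
    for st in ("tone", "morph")
}

# Gram spans without @subtype are the whole-word (orthographic) annotations.
whole_word = [
    s.get("target")
    for s in root.iter("span")
    if s.get("type") == "gram" and s.get("subtype") is None
]

print(by_subtype)
print(whole_word)
```

Keeping @type="gram" on all three spans means existing queries for grammatical annotations keep working, while @subtype separates the tone and morph cases without inspecting the target element.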
