problems with tagging <m> within strings #90

Open
iljackb opened this issue Oct 29, 2019 · 11 comments
Labels
decisions for open questions where a key decision is needed enhancement final output goals Goals for tasks to do to achieve best possible output of project and contribution to community help wanted

Comments

@iljackb
Owner

iljackb commented Oct 29, 2019

In issue #88 we concluded that, to make the content more searchable and usable, we would not keep the <c>'s from the transcriptions. All <c>'s would be removed except those on a morpho-semantically significant tone, and these would be changed to <m>, leaving the following structure:

           <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
               </seg>
            </u>

While this is an improvement, it is still problematic for anyone searching for phonological content: wherever there is an <m> (which also means that the tone encoded therein is particularly significant), it is not possible to search for full phonetic strings.

So there are three possible solutions I can envision:

  1. Live with it

  2. Copy the string into an attribute like @orig and search for phonetics in the attribute values (though that contradicts the usage in this project in which I'm using these to keep track of where I've normalized)

  3. Make another copy of the IPA contents without the <m>'s.
    However, this raises two questions:

    • These copies would have to be linked to either the orthographic or the original IPA contents.
      Which would be best to point to? Could we instead also have the orth <seg> point to it?

    • They would have to be typed, which is a problem given that @type is already used to classify the type of segment (thus @subtype wouldn't be consistent) and @notation is still ="ipa"

Below is an example in which I use @function="full" on the <seg> and which also points to the orthographic <seg>:

           <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
              <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                 <w xml:id="d1e114" synch="#T14">sketa</w>
                 <w xml:id="d1e116" synch="#T19">ntikii</w>
              </seg>
              <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                 <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                 <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
              </seg>
              <seg xml:lang="mix" xml:id="d1e128" notation="ipa" type="S" sameAs="#d1e113" function="full">
                 <w xml:id="d1e129" synch="#T14" sameAs="#d1e114">skɛ˥t̪a↘</w>
                 <w xml:id="d1e142" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
              </seg>            
           </u>

Using this, a search for all phonetic strings would have to match both @notation="ipa" and @function="full". To get the full phonetic string (to copy into a dictionary, for example), it would have to match the same and also point to the @xml:id of a <w> which is a child of <seg notation="orth">.
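
To make that lookup concrete, here is a minimal sketch in Python (standard library only) rather than XQuery. It assumes the elements are in no namespace, as in the snippets above; real TEI files would normally need the TEI namespace added to the paths. The function name `orth_to_full_ipa` is hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace, so xml:id parses to this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Cut-down copy of the example above (assumption: no TEI namespace declared).
SAMPLE = """<u who="#TS" xml:id="d1e112" n="2" xml:lang="mix">
  <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
    <w xml:id="d1e114" synch="#T14">sketa</w>
    <w xml:id="d1e116" synch="#T19">ntikii</w>
  </seg>
  <seg xml:lang="mix" xml:id="d1e128" notation="ipa" type="S" sameAs="#d1e113" function="full">
    <w xml:id="d1e129" synch="#T14" sameAs="#d1e114">skɛ˥t̪a↘</w>
    <w xml:id="d1e142" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
  </seg>
</u>"""


def orth_to_full_ipa(root):
    """Pair each orthographic <w> with the full IPA <w> that points to it."""
    # Index the orthographic words by their xml:id.
    orth = {}
    for w in root.findall(".//seg[@notation='orth']/w"):
        orth[w.get(XML_ID)] = "".join(w.itertext())

    pairs = []
    for seg in root.findall(".//seg[@notation='ipa']"):
        # Only the flattened copies carry function="full".
        if seg.get("function") != "full":
            continue
        for w in seg.findall("w"):
            target = (w.get("sameAs") or "").lstrip("#")
            if target in orth:
                pairs.append((orth[target], "".join(w.itertext())))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = orth_to_full_ipa(root)
for orth_form, ipa_form in pairs:
    print(orth_form, ipa_form)
```

The pairing relies only on @sameAs resolving to an @xml:id inside the orthographic <seg>, so it would work the same way whichever attribute ends up marking the flattened copy.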

What do you think @laurent?

@iljackb iljackb added decisions for open questions where a key decision is needed enhancement final output goals Goals for tasks to do to achieve best possible output of project and contribution to community help wanted labels Oct 29, 2019
@laurentromary

Now that I think about it, hadn't we managed to implement an XSLT search that flattens strings?

@iljackb
Owner Author

iljackb commented Oct 30, 2019

I already have done it myself! But the problem isn't how to do it; it's how to encode and annotate it in a way that allows for easy access but also maximally accurate annotation.

@iljackb
Owner Author

iljackb commented Oct 30, 2019

Actually, I remember what you were talking about: it was something to retrieve the content, but it was based on searching for the translations. The goal, and the basis of this issue, is to figure out a way to search the Mixtec, specifically the phonetic and/or orthographic strings.

@laurentromary

That's what I mean: if we can manage to search in decent conditions, I would not delete too much fine-grained markup...

@laurentromary

laurentromary commented Oct 30, 2019 via email

@iljackb
Owner Author

iljackb commented Oct 30, 2019

Sorry, I originally misunderstood your first comment. What I said I did was just to make a flat copy to convert the phonetics with the <c>'s for every character.

So the only thing I do to search the strings is basic XQuery (I generally use XQuery to search and only use XSLT to convert into another format). I search as follows, e.g.: //seg[@notation='ipa']/w[contains(.,'skɛ˥t̪a↘')] (which isn't possible unless I make that flattened copy)

@laurentromary

laurentromary commented Oct 30, 2019 via email

@iljackb
Owner Author

iljackb commented Oct 30, 2019

I wouldn't know how to do that. I assume this is with XSLT, not XQuery? I like making things XQuery friendly because in Oxygen you can do 'search whole project' and it gathers from files in different folders, but in XSLT you have to specify a single directory (unless I'm mistaken).

@iljackb
Owner Author

iljackb commented Oct 30, 2019

I'm thinking it may also be possible to search using "string-join" in XQuery but I'm not sure yet...
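
The idea of joining the text nodes of each <w> before matching can be sketched outside XQuery as follows. This is a minimal Python illustration using only the standard library, with no namespace assumed (real TEI files would need the TEI namespace); `flatten` and `find_words` are hypothetical helper names. For what it's worth, in XPath the string value of an element is defined as the concatenation of all its descendant text nodes, so a test on `.` may already behave this way, but that should be checked in the actual query environment.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The IPA <seg> from the first example, with <m> around the significant tones.
SAMPLE = """<seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
</seg>"""


def flatten(w):
    """Join every text node under <w>, including those inside <m>."""
    return "".join(w.itertext())


def find_words(root, needle):
    """Return the xml:id of each <w> whose flattened text contains needle."""
    return [w.get(XML_ID) for w in root.iter("w") if needle in flatten(w)]


root = ET.fromstring(SAMPLE)
first = root.find("w")

# Only the text before the first <m>; this is why a naive text search falls short.
leading_text = first.text
# The full phonetic string, rebuilt without touching the markup.
flat = flatten(first)
# A needle that straddles an <m> boundary still matches after flattening.
hits = find_words(root, "˥t̪a")
print(leading_text, flat, hits)
```

If this holds up, the <m> markup could stay in place and no second, flattened <seg> would be needed just for searching.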

@laurentromary

laurentromary commented Oct 30, 2019 via email

@laurentromary

laurentromary commented Oct 30, 2019 via email
