problems with tagging <m> within strings #90

iljackb · 2019-10-29T17:05:38Z

In issue #88 we concluded that, rather than keeping all the <c>'s from the transcriptions, we would remove them in order to make the content more searchable and usable, except where a <c> is on a morpho-semantically significant tone; those would be changed to <m>, leaving the structure as follows:

           <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
               </seg>
           </u>

However, while this is an improvement, it is still problematic: when searching for phonological content, wherever there is an <m> (which also means that the tone encoded therein is particularly significant) it is not possible to search for full phonetic strings.

So there are three possible solutions I can envision:

1. Live with it.
2. Copy the string into an attribute like @orig and search for phonetics in the attribute values (though that contradicts the usage in this project, in which I'm using these attributes to keep track of where I've normalized).
3. Make another copy of the IPA contents that doesn't include the <m>'s.

However, the third option raises these questions:
- The copies would have to be linked to either the orthographic or the original IPA contents. Which would be best to point to? Could we instead also have the orth <seg> point to the copy?
- They would have to be typed, which is a problem given that @type is already used to classify the type of segment (thus @subtype wouldn't be consistent) and @notation is still ="ipa".

Below is an example in which I use @function="full" on the <seg> and which also points to the orthographic <seg>:

           <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
              <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                 <w xml:id="d1e114" synch="#T14">sketa</w>
                 <w xml:id="d1e116" synch="#T19">ntikii</w>
              </seg>
              <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                 <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                 <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
              </seg>
              <seg xml:lang="mix" xml:id="d1e128" notation="ipa" type="S" sameAs="#d1e113" function="full">
                 <w xml:id="d1e129" synch="#T14" sameAs="#d1e114">skɛ˥t̪a↘</w>
                 <w xml:id="d1e142" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
              </seg>            
           </u>

With this encoding, a search over all phonetic strings would have to match both @notation="ipa" and @function="full"; and to get the full phonetic string (to copy into a dictionary, for example), it would have to match the same attributes and also point to the @xml:id of a <w> that is a child of <seg notation="orth">.
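To make the lookup described above concrete, here is a minimal sketch. It uses Python's standard-library ElementTree rather than XQuery, purely as an illustration; the function name `find_full_ipa` is hypothetical, and the sample is a trimmed copy of the example above that keeps only the orthographic <seg> and the flat copy. It selects only the <w> elements under a <seg> carrying both @notation="ipa" and @function="full", then follows @sameAs back to the orthographic <w>.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """
<u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
  <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
    <w xml:id="d1e114" synch="#T14">sketa</w>
    <w xml:id="d1e116" synch="#T19">ntikii</w>
  </seg>
  <seg xml:lang="mix" xml:id="d1e128" notation="ipa" type="S" sameAs="#d1e113" function="full">
    <w xml:id="d1e129" synch="#T14" sameAs="#d1e114">skɛ˥t̪a↘</w>
    <w xml:id="d1e142" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
  </seg>
</u>
"""


def find_full_ipa(root, needle):
    """Return (orthographic form, flat IPA form) pairs whose IPA contains needle."""
    # Index the orthographic words by xml:id so @sameAs pointers can be resolved.
    orth_by_id = {
        w.get(XML_ID): w.text
        for w in root.findall("./seg[@notation='orth']/w")
    }
    hits = []
    # Only the flat copies: both @notation="ipa" and @function="full" must match.
    for w in root.findall("./seg[@notation='ipa'][@function='full']/w"):
        ipa = w.text or ""
        if needle in ipa:
            target = w.get("sameAs", "").lstrip("#")
            hits.append((orth_by_id.get(target), ipa))
    return hits


root = ET.fromstring(SAMPLE)
print(find_full_ipa(root, "skɛ˥t̪a↘"))
```

The double predicate is the cost of this encoding: every phonetic search has to remember @function="full", otherwise it also hits the tagged <seg>.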

What do you think @laurent?

laurentromary · 2019-10-30T09:31:29Z

Now that I think about it, hadn't we managed to implement an XSLT search that flattens strings?

iljackb · 2019-10-30T09:54:23Z

I have already done it myself! But the problem isn't how to do it; it's how to encode and annotate it in a way that allows for easy access but also maximally accurate annotation.

iljackb · 2019-10-30T10:44:16Z

Actually, I remember what you were talking about: it was something to retrieve the content, but it was based on searching the translations. The goal, and the basis of this issue, is to figure out a way to search the Mixtec, specifically the phonetic and/or orthographic strings.

laurentromary · 2019-10-30T10:56:13Z

That's what I mean: if we can manage to search in decent conditions, I would not delete too much of the fine-grained markup...

laurentromary · 2019-10-30T11:05:18Z

It should be feasible to adapt the search function to flatten the content. I can see several techniques. Can you show me how you do it currently?

iljackb · 2019-10-30T11:12:24Z

Sorry, I misunderstood your first comment. What I said I had done was just to make a flat copy, converting the phonetics that had a <c> for every character.

So the only thing I do to search the strings is basic XQuery (I generally use XQuery to search and only use XSLT to convert into another format). I search as follows, e.g.: //seg[@notation='ipa']/w[contains(.,'skɛ˥t̪a↘')] (which isn't possible unless I make that flattened copy).

laurentromary · 2019-10-30T11:21:41Z

So there is a possibility: replace the "." with a function that flattens the content of <w>. This is where I see a technical solution. Do you know how to write a function? It would call <xsl:value-of select="xxx" separator=""/> (the empty string is significant, since the default separator is a white space).
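As a rough illustration of the flattening idea, here is a minimal sketch in Python's standard-library ElementTree, standing in for the XSLT function suggested above. The helper names `flatten` and `search_words` are hypothetical, and the sample is a trimmed copy of the tagged IPA <seg> from the issue. The point it shows is that the <m> markup can stay in the file while the search runs against the concatenated text of each <w>, joined with an empty separator.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<seg notation="ipa" type="S">
  <w synch="#T14">skɛ<m>˥</m>t̪a<m>↘</m></w>
  <w synch="#T19">nd̪i↘kiː↘↗ꜛ</w>
</seg>
"""


def flatten(elem):
    """Concatenate all descendant text of elem with no separator."""
    return "".join(elem.itertext())


def search_words(seg, needle):
    """Return the flattened form of every <w> whose text contains needle."""
    return [flat for flat in map(flatten, seg.iter("w")) if needle in flat]


seg = ET.fromstring(SAMPLE)
print(search_words(seg, "skɛ˥t̪a↘"))
```

The closest XPath 2.0 equivalent would be string-join(.//text(), '') in place of ".", which should work in both XQuery and XSLT; whether it behaves as wanted on the project files depends on there being no stray whitespace inside <w>, which is an assumption here.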

iljackb · 2019-10-30T11:38:01Z

I wouldn't know how to do that. I assume this is with XSLT, not XQuery? I like making things XQuery-friendly because in Oxygen you can do 'search whole project' and it gathers results from files in different folders, whereas in XSLT you have to specify a single directory (unless I'm mistaken).

iljackb · 2019-10-30T13:37:18Z

I'm thinking it may also be possible to search using "string-join" in XQuery but I'm not sure yet...

laurentromary · 2019-10-30T13:53:15Z

That would be XPath, which is both XQuery and XSLT friendly.

laurentromary · 2019-10-30T13:53:45Z

I have not mastered XQuery, but I could easily check.

iljackb added the labels decisions (for open questions where a key decision is needed), enhancement, final output goals (Goals for tasks to do to achieve best possible output of project and contribution to community), and help wanted on Oct 29, 2019
