simplify Getty queries #84

Open
VladimirAlexiev opened this issue Aug 6, 2015 · 7 comments

@VladimirAlexiev

Hi! We developed and maintain the Getty endpoint. luc:term should only include terms (prefLabels and altLabels, minus any " (qualifier)" suffix). You can see the properties that are navigated to collect FTS text here: http://vocab.getty.edu/doc/#FTS_Insert_Queries.

You write "The full text index matches on fields besides the term, so we filter to ensure the match is in the term" and do a REGEX on pref|altLabel, and then DISTINCT since there are multiple altLabels. This query is quite complex and a bit more expensive than it needs to be.

If you provide some testing examples, we'll fix the problem "matches on fields besides the term".

For AAT, you seem to want prefLabel only. I wrote in the support forum: "I think that if we make an index by prefLabels only, that would resolve most problems. But is this what you need? E.g. it won't find "frostbiting" aka "frostbite boating"."
If you want an extra index by prefLabel only, let me know (but it'll also have more languages than EN).

@VladimirAlexiev
Author

BTW excellent project, I'll add it to "Getty usage stories"

@VladimirAlexiev
Author

If you need to filter by regex, it would be faster to return 1 row per concept and use GROUP_CONCAT to put all altLabels in that row. This way you'll avoid multiple regex() checks per concept, as well as DISTINCT. E.g.:

SELECT ?s ?name ?bio {
  # Inner query: one row per concept, with all altLabels concatenated into ?labels
  { SELECT ?s ?name ?bio (CONCAT(?name, ' ', GROUP_CONCAT(?alt)) AS ?labels) {
      ?s a skos:Concept; luc:term "#{search}";
         skos:inScheme <http://vocab.getty.edu/ulan/> ;
         gvp:prefLabelGVP [skosxl:literalForm ?name] ;
         foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
         skos:altLabel ?alt .
    } GROUP BY ?s ?name ?bio }
  # Outer filter: a single regex check per concept, so DISTINCT is not needed
  FILTER(regex(?labels, "#{search}", "i"))
}
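
To illustrate why grouping helps, here is a rough Ruby sketch that mimics both shapes on in-memory rows. The sample rows, the `ex:` identifiers, and the method names are made up for illustration; this is not code from the endpoint or from questioning_authority.

```ruby
# Made-up rows shaped like "one row per (concept, altLabel)" query results.
def sample_rows
  [
    { s: "ex:1", name: "Leonardo da Vinci", alt: "Leonardo da Vinchi" },
    { s: "ex:1", name: "Leonardo da Vinci", alt: "Lionardo da Vinci" },
    { s: "ex:2", name: "Some Other Artist", alt: "Another Name" }
  ]
end

# REGEX + DISTINCT shape: check every row, then remove duplicate concepts.
def match_per_row(rows, search)
  pattern = Regexp.new(Regexp.escape(search), Regexp::IGNORECASE)
  rows.select { |r| pattern.match?(r[:name]) || pattern.match?(r[:alt]) }
      .map { |r| r[:s] }
      .uniq
end

# GROUP_CONCAT shape: join all labels of a concept, then check once per concept.
def match_grouped(rows, search)
  pattern = Regexp.new(Regexp.escape(search), Regexp::IGNORECASE)
  rows.group_by { |r| r[:s] }
      .select do |_s, group|
        labels = ([group.first[:name]] + group.map { |r| r[:alt] }).join(" ")
        pattern.match?(labels)
      end
      .keys
end
```

Both methods return the same concepts for these rows, but the grouped version runs one regex per concept. One caveat of joining labels with a space (in this sketch and in the query above): a pattern could in principle match across the boundary between two labels.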

@jcoyne
Member

jcoyne commented Aug 6, 2015

@VladimirAlexiev Thanks so much for the feedback. I'm not currently working on questioning_authority, but I'm hoping that one of our other consortium members will be able to incorporate your suggestions.

@mjgiarlo modified the milestone: Backlog (Mar 21, 2017)
@elrayle
Contributor

elrayle commented Mar 4, 2019

@geekscruff Can you take a look at this issue and comment on whether or not it still applies? I know there have been changes to the Getty processing since this issue was opened.

@VladimirAlexiev
Author

VladimirAlexiev commented Mar 4, 2019 via email

@ghost

ghost commented Mar 5, 2019

Thanks for the feedback @VladimirAlexiev

I think the following regex-free query returns the same results, but is much simpler. Would you mind having a look and seeing if you agree? The following example uses vinchi from the alt label.

SELECT DISTINCT ?s ?name ?bio {
  ?s a skos:Concept; 
      luc:term "leonardo AND da AND vinchi"; 
      skos:inScheme ulan: ;
      gvp:prefLabelGVP [xl:literalForm ?name];
      foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
      skos:altLabel ?alt .
} order by asc(lcase(str(?name)))
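
For context, a minimal Ruby sketch of how the `luc:term` value in such a query could be built from user input. The helper names (`and_query`, `sparql_string`, `luc_term_clause`) are hypothetical, and the sketch only escapes characters for the SPARQL string literal; it makes no attempt to escape Lucene query syntax.

```ruby
# Hypothetical helper: join the words of the input with AND,
# e.g. "leonardo da vinchi" -> "leonardo AND da AND vinchi".
def and_query(search)
  search.split.join(" AND ")
end

# Escape backslashes and double quotes so the value can sit safely
# inside a double-quoted SPARQL string literal.
def sparql_string(value)
  value.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
end

# Build the luc:term clause used in the query above.
def luc_term_clause(search)
  %(luc:term "#{sparql_string(and_query(search))}")
end
```

Escaping in one place also avoids stray backslashes ending up in the generated query text.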

@VladimirAlexiev
Author

I like that it doesn't have regex, but AND gives too much freedom imho: the words can match in any order, anywhere in the term.
I'd use the FTS query from http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query
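
A toy Ruby model of the difference, as an assumption-laden sketch: it is not how Lucene or the Getty endpoint actually tokenizes or matches, and it does not reproduce the documented exact-match query. The method names and labels are made up.

```ruby
# Simplified AND matching: every search word occurs somewhere in the label.
def and_match?(label, search)
  label_words = label.downcase.split
  search.downcase.split.all? { |word| label_words.include?(word) }
end

# Simplified exact matching: the whole label equals the search, ignoring case.
def exact_match?(label, search)
  label.downcase.strip == search.downcase.strip
end

search = "leonardo da vinchi"
and_match?("Vinchi da Leonardo", search)              # true: word order is ignored
and_match?("Leonardo da Vinchi the Younger", search)  # true: extra words are allowed
exact_match?("Vinchi da Leonardo", search)            # false
exact_match?("Leonardo da Vinchi", search)            # true
```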
