This document will help you understand how to access the information associated with an OM::XML::Document object. We will explain some of the methods provided by the OM::XML::Document module and its related modules OM::XML::TermXPathGenerator & OM::XML::TermValueOperators
Note: In your code, don’t worry about including OM::XML::TermXPathGenerator and OM::XML::TermValueOperators into your classes. OM::XML::Document handles that for you.
These examples use the Document class defined in OM::Samples::ModsArticle
Download hydrangea_article1.xml (sample xml) into your working directory, then run this in irb:
require "om/samples" sample_xml = File.new("hydrangea_article1.xml") doc = OM::Samples::ModsArticle.from_xml(sample_xml)
Querying the OM::XML::Document
The OM::XML::Terminology" declared by OM::Samples::ModsArticle maps the defined Terminology structure to xpath queries. It will also run the queries for you in most cases.
xpath_for method of OM::XML::Terminology" retrieves xpath expressions for OM terms
The xpath_for method retrieves the xpath used by the OM::XML::Terminology"
Examples of xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology builder:
OM::Samples::ModsArticle.terminology.xpath_for(:name) => "//oxns:name" OM::Samples::ModsArticle.terminology.xpath_for(:person) => "//oxns:name[@type=\"personal\"]" OM::Samples::ModsArticle.terminology.xpath_for(:organization) => "//oxns:name[@type=\"corporate\"]"
To retrieve the values of xml nodes, use the term_values method:
doc.term_values(:person, :first_name) doc.term_values(:person, :last_name)
The term_values method is defined in the OM::XML::TermValueOperators module, which is included in OM::XML::Document
Not that if a term’s xpath mapping points to XML nodes that contain other nodes, the response to term_values will be Nokogiri::XML::Node objects instead of text values:
doc.term_values(:name)
More examples of using term_values and find_by_terms (defined in OM::XML::Document):
doc.find_by_terms(:organization).to_xml doc.term_values(:organization, :role) => ["\n Funder\n "] doc.term_values(:organization, :namePart) => ["NSF"]
To retrieve the values of nested terms, create a sequence of terms, from outermost to innermost:
OM::Samples::ModsArticle.terminology.xpath_for(:journal, :issue, :pages, :start) => "//oxns:relatedItem[@type=\"host\"]/oxns:part/oxns:extent[@unit=\"pages\"]/oxns:start" doc.term_values(:journal, :issue, :pages, :start) => ["195"]
If you get one of the term names wrong in the sequence, OM will tell you which one is causing problems. See what happens when you put :page instead of :pages in your argument to term_values.
doc.term_values(:journal, :issue, :page, :start) OM::XML::Terminology::BadPointerError: You attempted to retrieve a Term using this pointer: [:journal, :issue, :page] but no Term exists at that location. Everything is fine until ":page", which doesn't exist.
(Another way to put this: the xpath statement for a term can be ambiguous.)
In our MODS document, we have two distinct uses of the title XML element:
- the title of the published article
- the title of the journal it was published in.
How can we distinguish between these two uses?
doc.term_values(:title_info, :main_title) => ["ARTICLE TITLE", "VARYING FORM OF TITLE", "TITLE OF HOST JOURNAL"] doc.term_values(:mods, :title_info, :main_title) => ["ARTICLE TITLE", "VARYING FORM OF TITLE"] OM::Samples::ModsArticle.terminology.xpath_for(:title_info, :main_title) => "//oxns:titleInfo/oxns:title"
The solution: include the root node in your term pointer.
OM::Samples::ModsArticle.terminology.xpath_for(:mods, :title_info, :main_title) => "//oxns:mods/oxns:titleInfo/oxns:title" doc.term_values(:mods, :title_info, :main_title) => ["ARTICLE TITLE", "VARYING FORM OF TITLE"]
We can still access the Journal title by its own pointers:
doc.term_values(:journal, :title_info, :main_title) => ["TITLE OF HOST JOURNAL"]
If you use a nested term often, you may want to avoid typing the whole sequence of term names by defining a proxy term.
As you can see in OM::Samples::ModsArticle, we have defined a few proxy terms for convenience.
t.publication_url(:proxy=>[:location,:url]) t.peer_reviewed(:proxy=>[:journal,:origin_info,:issuance], :index_as=>[:facetable]) t.title(:proxy=>[:mods,:title_info, :main_title]) t.journal_title(:proxy=>[:journal, :title_info, :main_title])
You can use proxy terms just like any other term when querying the document.
OM::Samples::ModsArticle.terminology.xpath_for(:peer_reviewed) => "//oxns:relatedItem[@type=\"host\"]/oxns:originInfo/oxns:issuance" OM::Samples::ModsArticle.terminology.xpath_for(:title) => "//oxns:mods/oxns:titleInfo/oxns:title" OM::Samples::ModsArticle.terminology.xpath_for(:journal_title) => "//oxns:relatedItem[@type=\"host\"]/oxns:titleInfo/oxns:title"