
add functionality for processing based on presence of margin attribute #2

Open
tcatapano opened this issue Oct 25, 2019 · 3 comments
Labels
enhancement New feature or request

Comments

@tcatapano
Member

Sample query: find all occurrences of "gold" in the left-middle margin.

@tcatapano tcatapano added the enhancement New feature or request label Oct 25, 2019
@matthewkumar
Collaborator

In the most recent push to master, the method `search_margins()` was added to the `BnF` class. Using this example:

```python
manuscript = BnF()
manuscript.search_margins('tl', 'gold', 'left-middle')
```

yields the following:

```
['004v_1', '004v_3', '016r_1', '024v_1', '036v_3', '039r_2', '057v_1', '072r_2', '081v_4', '085v_4', '094r_1', '098r_2', '098r_3', '104r_2', '110v_2', '112v_1', '118v_1', '126v_3', '139r_1', '141r_1', '144r_1', '145v_1', '151v_1', '152r_1', '155v_1', '156v_1', '169r_1']
```
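For readers unfamiliar with the code, the query above amounts to a filter over marginal text blocks. The sketch below is not the project's `BnF.search_margins`; it assumes a hypothetical `Block` record (entry ID, margin position, and the text in one version such as `'tl'`) and uses a plain, case-sensitive substring test, which is presumably how the search behaved before the lowercase change mentioned in the next comment.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical record for one text block (not the project's real data model)."""
    entry_id: str  # e.g. "004v_1"
    margin: str    # e.g. "left-middle"; empty string if the block is not marginal
    text: str      # transcription in one version, e.g. 'tl'


def search_margins(blocks, term, margin):
    """Return IDs of entries with a block in `margin` whose text contains `term`.

    The match is a case-sensitive substring test.
    """
    hits = []
    for block in blocks:
        if block.margin == margin and term in block.text:
            hits.append(block.entry_id)
    return hits


blocks = [
    Block("004v_1", "left-middle", "Take gold leaf"),
    Block("004v_2", "left-top", "gold again"),
    Block("016r_1", "left-middle", "Gold is soft"),
]
print(search_margins(blocks, "gold", "left-middle"))  # ['004v_1']
```

Note that `016r_1` is missed only because "Gold" is capitalized.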

@matthewkumar
Collaborator

This function was updated to convert all text to lowercase. Notice that there are a couple of new entries in the updated output (after removing duplicates):

```
['', '010r_1', '029v_6', '032v_3', '100v_2', '106r_2', '111v_4', '116v_2', '120r_2', '120v_6', '121v_1', '121v_2', '123r_1', '124v_2', '126v_1', '128r_1', '128v_1', '131v_1', '134v_2', '135r_1', '135v_1', '139r_1', '145v_2', '152v_1', '154r_2', '154r_3', '156v_1']
```
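The lowercase change and duplicate removal described above can be sketched as follows. This assumes blocks are hypothetical `(entry_id, margin, text)` tuples rather than the project's real structures. `dict.fromkeys` is used instead of `set()` so the de-duplicated IDs keep their original order.

```python
def search_margins_ci(blocks, term, margin):
    """Case-insensitive margin search with order-preserving de-duplication.

    `blocks` is an iterable of hypothetical (entry_id, margin, text) tuples.
    """
    needle = term.lower()
    hits = (
        entry_id
        for entry_id, block_margin, text in blocks
        if block_margin == margin and needle in text.lower()
    )
    # dict.fromkeys keeps the first occurrence of each ID, in order
    return list(dict.fromkeys(hits))


blocks = [
    ("004v_1", "left-middle", "Take gold leaf"),
    ("004v_1", "left-middle", "more GOLD here"),
    ("016r_1", "left-middle", "Gold is soft"),
    ("020r_1", "left-top", "gold"),
]
print(search_margins_ci(blocks, "gold", "left-middle"))  # ['004v_1', '016r_1']
```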

I noticed that some terms, like 'goldsmith', register as matches because 'gold' is a substring. There are three possible ways to deal with this:

  1. Do nothing, and let scholars discard such matches manually on close analysis.
  2. Use a regex to require that whitespace or punctuation surround the term.
  3. Specify the property type in the function call.

I recommend the first, but all are possible and easily implemented.
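For comparison, option 2 could look roughly like this. It is a sketch using Python's `re` module, not code from the repository, and the helper name `contains_word` is hypothetical. It uses `\b` word boundaries rather than an explicit whitespace/punctuation class. Note that `\b` also treats a hyphen as a boundary, so "gold-leaf" would still match.

```python
import re


def contains_word(text, term):
    r"""True if `term` occurs as a whole word in `text`, ignoring case.

    \b matches between a word character and a non-word character (or the
    start/end of the string), so 'gold' matches in 'Gold,' but not in 'goldsmith'.
    re.escape guards against regex metacharacters in the search term.
    """
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


for sample in ["gold leaf", "Gold, silver", "a goldsmith", "gilded"]:
    print(sample, contains_word(sample, "gold"))
```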

@njr2128 njr2128 changed the title add support for <ab>'s add functionality for processing based on presence of margin attribute Oct 29, 2020
@njr2128 njr2128 assigned gschare and unassigned matthewkumar Oct 29, 2020
@gschare
Contributor

gschare commented Oct 29, 2020

I can add a new function to entry.py to do this with lxml.etree.
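A rough sketch of such a function follows. It makes several assumptions. The entry markup and the `margin` attribute on `<ab>` elements are hypothetical, so the real schema's names may differ. The function name `marginal_blocks` is also illustrative. To run without extra dependencies, it uses the standard library's `xml.etree.ElementTree`. `lxml.etree` provides the same `fromstring`, `findall`, `get`, and `itertext` calls, and adds full XPath through `.xpath()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry markup; element and attribute names are assumptions.
ENTRY_XML = """
<entry id="004v_1">
  <ab>Main text without a margin attribute.</ab>
  <ab margin="left-middle">Apply the <m>gold</m> leaf.</ab>
  <ab margin="right-top">Notes on silver.</ab>
</entry>
""".strip()


def marginal_blocks(root, margin=None):
    """Return (margin, text) pairs for every <ab> that has a margin attribute.

    If `margin` is given, keep only blocks whose attribute equals it.
    itertext() joins text from nested child elements such as <m>.
    """
    path = ".//ab[@margin]" if margin is None else f".//ab[@margin='{margin}']"
    return [(ab.get("margin"), "".join(ab.itertext())) for ab in root.findall(path)]


root = ET.fromstring(ENTRY_XML)
print(marginal_blocks(root))
print(marginal_blocks(root, "left-middle"))  # [('left-middle', 'Apply the gold leaf.')]
```

The returned text could then be passed to whichever matching rule the thread settles on.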
