Capturing vague units of time #27

Open
JenPreciado opened this issue Jun 14, 2017 · 1 comment

@JenPreciado

I recently annotated a cryology-related blog post and noticed that grobid doesn't capture vague or inexplicit time expressions. Examples include: late July, early August, end of the month, this week, through April, recent decades, etc.

I also noticed that it ignores mentions of seasons such as spring, fall, summer, summertime, winter and wintertime. Cryology also has its own terms for seasons, such as melt season or ice growth period.

It would be very useful if grobid 1) could capture these vague time expressions, 2) could link them to the publication date of the document, blog post or article, and 3) could capture prototypical seasons (and ideally the cryology-specific season terms too) as a kind of time expression.

@lfoppiano
Owner

Dear @JenPreciado, apologies for this very late answer; it has indeed been a long time since you opened this issue.

In general, the amount of training data available for grobid-quantities is, as of today, still quite limited and covers mostly papers from astronomy, health and physics. That said, we are quite happy with the current performance, although we are aware that more data is needed.
We haven't really focused on recognising vague expressions because of the subsequent problem of how to normalise them (e.g. expressions like "in recent time"). We decided to leave the vague part of the expression out of the annotation, e.g. later <date>July</date>, and to focus on resolving the expression itself.

By the way, the annotation guidelines (https://grobid-quantities.readthedocs.io/en/latest/guidelines.html) have been improved over the last year with more examples and special cases. In brief, we annotate time/date expressions as <date when="2001-08">2001 August</date> (https://grobid-quantities.readthedocs.io/en/latest/guidelines.html#additional-items).
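For illustration, the two conventions described here (the vague modifier stays outside the element, the normalised value goes in the when attribute) can be sketched in Python. This is a minimal, hypothetical sketch, not grobid-quantities code: the annotate function is made up, it only handles "[modifier] [YYYY] Month" expressions, and it assumes that a month without an explicit year belongs to the year of the publication date.

```python
import calendar
from datetime import date

# Month name -> month number; calendar.month_name[0] is "", so it is skipped.
# Note: names follow the current locale (English under the default C locale).
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}


def annotate(expression: str, published: date) -> str:
    """Hypothetical helper: wrap the date part of an expression in <date when="YYYY-MM">.

    Any leading vague modifier ("late", "through", ...) is left outside the element.
    Assumption: a bare month is resolved to the year of the publication date.
    """
    words = expression.split()
    if not words:
        return expression
    month = MONTHS.get(words[-1].lower())
    if month is None:
        return expression  # not a "[modifier] [YYYY] Month" expression: leave as-is
    prev = words[-2] if len(words) >= 2 else ""
    if len(prev) == 4 and prev.isdigit():
        # Explicit year, e.g. "2001 August": the year is part of the annotated text.
        year, date_words, modifier_words = int(prev), words[-2:], words[:-2]
    else:
        # Bare month, e.g. "late July": borrow the year from the publication date.
        year, date_words, modifier_words = published.year, words[-1:], words[:-1]
    element = f'<date when="{year:04d}-{month:02d}">{" ".join(date_words)}</date>'
    modifier = " ".join(modifier_words)
    return f"{modifier} {element}" if modifier else element
```

With a publication date of 14 June 2017, "late July" becomes late <date when="2017-07">July</date>, "2001 August" becomes <date when="2001-08">2001 August</date>, and expressions with no month such as "this week" are returned unchanged, which reflects why such vague expressions are hard to normalise.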

If you are still working on it and would like to share what you have done, we can see whether our needs are complementary.
