Add XML Harvard to html citation generation #4499

Merged
merged 7 commits into from
Dec 12, 2024

Conversation

flooie
Contributor

@flooie flooie commented Sep 26, 2024

Add XML Harvard to html_with_citations generation

Also, some of the XML carries an XML encoding declaration, which is a problem for lxml, used in eyecite.
The declaration is stripped out in some cases, and a test is added.

Add Harvard XML to get_and_clean_opinion_text so that we can find and add annotations in the Harvard corpus.

Some Harvard XML has encoding info, which throws lxml for a loop. Remove it and add a test to ensure proper parsing.
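
For illustration, the encoding cleanup described above could look roughly like the sketch below. This is not the code merged in this PR: the helper name `strip_xml_declaration` and the sample markup are hypothetical, and the sketch relies only on the standard library. The motivation is that lxml raises a ValueError when asked to parse a Python str that still carries an encoding declaration, so removing the declaration first lets the text be parsed as a string.

```python
import re

# Hypothetical helper for illustration only; the PR's actual implementation may differ.
# Matches a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>
# along with any whitespace around it.
_XML_DECLARATION = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def strip_xml_declaration(text: str) -> str:
    """Return text without a leading XML declaration, if one is present."""
    return _XML_DECLARATION.sub("", text, count=1)


# Made-up sample input, standing in for a Harvard XML opinion.
sample = '<?xml version="1.0" encoding="utf-8"?>\n<opinion><p>410 U.S. 113</p></opinion>'
cleaned = strip_xml_declaration(sample)
```

Text without a declaration passes through unchanged, so a helper like this could be applied to every source field before citation parsing without special-casing the Harvard XML.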

sentry-io bot commented Sep 26, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: cl/citations/annotate_citations.py

Function: get_and_clean_opinion_text
Unhandled Issue: AttributeError: 'bytes' object has no attribute 'encode' cl.citations.tasks.find_citations_and_parentheticals_for_opinio...
Event Count: 1

@flooie flooie requested review from albertisfu and mlissner and removed request for albertisfu September 26, 2024 17:35
@mlissner mlissner removed their request for review November 19, 2024 16:17
@flooie flooie assigned grossir and unassigned flooie Dec 12, 2024
@grossir grossir self-requested a review December 12, 2024 17:02
Contributor

@grossir grossir left a comment

This looks good. I left a couple of minor comments, @flooie.

cl/citations/tests.py (outdated)
cl/citations/annotate_citations.py
Add comment on order
and tweak test file to reflect XML ingestion
@flooie flooie merged commit 0771925 into main Dec 12, 2024
15 checks passed
@flooie flooie deleted the add-harvard-to-citation-parsing branch December 12, 2024 20:09