Handle combined opinions in columbia merger #3799

Open
wants to merge 46 commits into
base: main
from

Conversation

quevon24
Member

@quevon24 quevon24 commented Feb 16, 2024

This small change excludes combined opinions when the file contains multiple opinions and CL holds a combined opinion. This is useful when we try to map and merge opinions in the columbia merger.

We need to exclude those opinions because the columbia dataset contains split opinions, while some of the matched clusters have combined opinions.
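To illustrate the idea, here is a minimal sketch of the exclusion step, not the code from this PR. It assumes opinions are plain dicts with a `type` key and that combined opinions carry the type code `"010combined"`; the function name `opinions_to_match` is hypothetical.

```python
# Assumed type code for a combined opinion; the real constant may differ.
COMBINED = "010combined"


def opinions_to_match(cl_opinions, file_opinion_count):
    """Return the CL opinions that should take part in matching.

    When the file holds several opinions, a combined opinion in CL
    cannot correspond to any single one of them, so it is left out.
    With a single opinion in the file, nothing is excluded.
    """
    if file_opinion_count <= 1:
        return list(cl_opinions)
    return [op for op in cl_opinions if op["type"] != COMBINED]


cl_opinions = [
    {"id": 1, "type": "010combined"},
    {"id": 2, "type": "020lead"},
    {"id": 3, "type": "040dissent"},
]
kept = opinions_to_match(cl_opinions, file_opinion_count=2)
print([op["id"] for op in kept])  # [2, 3]
```

Filtering before matching keeps the matcher itself unchanged: it simply never sees a candidate that could not pair with a single split opinion.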

I also added a new exception type to catch string values in the volume number, the issue mentioned in #3824.
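A rough sketch of what such an exception could look like follows. The names `InvalidVolumeError`, `parse_volume`, and `safe_volume` are hypothetical and are not taken from the PR; the sketch only shows the general pattern of raising a dedicated error for a non-numeric volume and catching it so one bad record does not stop the run.

```python
class InvalidVolumeError(ValueError):
    """Hypothetical exception for a volume that is not a plain number."""


def parse_volume(raw):
    """Convert a raw volume value to int, or raise InvalidVolumeError."""
    text = str(raw).strip()
    # isdecimal() accepts only decimal digits, which int() can always parse.
    if not text.isdecimal():
        raise InvalidVolumeError(f"volume is not numeric: {raw!r}")
    return int(text)


def safe_volume(raw):
    """Return the parsed volume, or None when the value must be skipped."""
    try:
        return parse_volume(raw)
    except InvalidVolumeError:
        return None


print(safe_volume(" 12 "))  # 12
print(safe_volume("12-A"))  # None
```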


sentry-io bot commented Feb 16, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: cl/corpus_importer/management/commands/columbia_merge.py

Function: process_cluster
Unhandled Issue: FileNotFoundError: [Errno 2] No such file or directory: '/storage/columbia/opinions/alabama/court_opinions/documents... ...
Event Count: 1


@quevon24 quevon24 changed the title Exclude combined opinions from columbia merger Handle combined opinions in columbia merger Feb 26, 2024
@quevon24
Member Author

For this PR, the change is minimal and avoids conflicts when trying to match combined opinions from CL when the XML has multiple opinions. The only thing that could be improved is the FILED_TAGS list, which contains the valid text strings that identify a date as date_filed.
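As a sketch of how a tag list like this might be used, assuming each date in the XML comes with a short text label: the list contents and the helper `is_date_filed` below are hypothetical, and the real FILED_TAGS entries may differ.

```python
# Hypothetical contents; the real FILED_TAGS list may differ.
FILED_TAGS = ["filed", "opinion filed", "date filed"]


def is_date_filed(label):
    """Return True when a date label identifies the date as date_filed."""
    # Lowercase, collapse runs of whitespace, drop trailing punctuation.
    normalized = " ".join(label.lower().split()).rstrip(":.")
    return any(normalized.startswith(tag) for tag in FILED_TAGS)


print(is_date_filed("Filed:"))             # True
print(is_date_filed("  Opinion   Filed"))  # True
print(is_date_filed("Argued"))             # False
```

Normalizing the label first means the list only needs one entry per phrase instead of one per capitalization or punctuation variant.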

Labels
None yet
Projects
Status: General Backlog
2 participants