Handle combined opinions in columbia merger #3799
for more information, see https://pre-commit.ci
…lumbia-merger # Conflicts: # cl/corpus_importer/management/commands/columbia_merge.py
# Conflicts: # cl/corpus_importer/management/commands/columbia_merge.py
update comments
# Conflicts: # cl/corpus_importer/utils.py
The change in this PR is minimal: it avoids conflicts when matching combined opinions from cl against an xml file that contains multiple opinions. The one thing that could still be improved is the FILED_TAGS list, which holds the text strings that identify a date as date_filed.
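As a rough illustration of how a list like FILED_TAGS can be used, here is a minimal sketch. The tag values and the helper name `is_date_filed` are hypothetical; the actual list in `columbia_merge.py` is not shown in this conversation and may differ.

```python
# Hypothetical tag values, for illustration only; the real FILED_TAGS
# list in the merger may contain different strings.
FILED_TAGS = ["filed", "opinion filed", "date filed"]


def is_date_filed(label: str, filed_tags=FILED_TAGS) -> bool:
    """Return True if the text around a date marks it as date_filed.

    The label is lowercased and stripped, then checked for any of the
    known tag strings as a substring.
    """
    normalized = label.strip().lower()
    return any(tag in normalized for tag in filed_tags)


print(is_date_filed("Opinion Filed June 3, 1999"))
print(is_date_filed("Argued May 1, 1999"))
```

With a substring check like this, improving coverage is a matter of adding more strings to the list, which is why the list itself is the part most worth revisiting.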
This small change excludes combined opinions when the file contains multiple opinions and the matched cluster in cl has a combined opinion. This is useful when we try to map and merge the opinions in the columbia merger.
We need to exclude those opinions because the columbia dataset has split opinions, while some of the matched clusters have combined opinions.
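The exclusion rule described above can be sketched roughly as follows. This is a standalone illustration, not the code from the PR: `ClOpinion`, `opinions_to_match`, and the `"010combined"` type value are assumptions made for the example, and the real merger works with Django model instances.

```python
from dataclasses import dataclass

# Assumed type code for a combined opinion; treat it as a placeholder.
COMBINED = "010combined"


@dataclass
class ClOpinion:
    """Hypothetical stand-in for an opinion already stored in cl."""

    id: int
    type: str


def opinions_to_match(cl_opinions, xml_opinion_count):
    """Pick the cl opinions that should be mapped against the xml file.

    When the xml file has more than one opinion, any combined opinion in
    the cluster is skipped, since it cannot correspond to a single split
    opinion from the columbia dataset.
    """
    if xml_opinion_count > 1:
        return [op for op in cl_opinions if op.type != COMBINED]
    return list(cl_opinions)


cluster = [
    ClOpinion(1, COMBINED),
    ClOpinion(2, "020lead"),
    ClOpinion(3, "040dissent"),
]
print([op.id for op in opinions_to_match(cluster, xml_opinion_count=3)])
```

When the xml file has a single opinion, the sketch leaves the cluster untouched, so a combined opinion can still be matched in that case.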
I also added a new exception type to catch the issue with strings in the volume number, mentioned in #3824.
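One way such an exception could look is sketched below. The names `InvalidVolumeError` and `parse_volume` are hypothetical; the PR does not show the actual exception class or where it is raised.

```python
class InvalidVolumeError(ValueError):
    """Hypothetical exception for a volume that is not a plain number."""


def parse_volume(raw: str) -> int:
    """Convert a citation volume to an int, or raise InvalidVolumeError.

    Surrounding whitespace is ignored; anything int() cannot parse,
    such as "12-A", is reported with the offending value.
    """
    try:
        return int(raw.strip())
    except ValueError as exc:
        raise InvalidVolumeError(f"Volume is not a number: {raw!r}") from exc


try:
    parse_volume("12-A")
except InvalidVolumeError as err:
    # The caller can log the problem and skip the file instead of crashing.
    print(f"Skipping file: {err}")
```

A dedicated exception type lets the command tell this specific data problem apart from other `ValueError`s and handle it without aborting the whole run.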