Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add address1 Regex for remaining countries #282

Merged
merged 1 commit into from
Sep 19, 2024

Conversation

gabypancu
Copy link
Contributor

@gabypancu gabypancu commented Sep 13, 2024

What are you trying to accomplish?

Follow up to #281

Adds address1 regex for remaining countries with additional address fields in address1: Belgium, Chile, Mexico, Spain, Brazil, Israel

What approach did you choose and why?

The regexes are as conservative as possible to avoid false matches.

Across the board, we don't match:

  • street names that have numbers in them
  • street names that have commas in them
  • address lines with unit/extra information
Country Match % Examples Limitations
Belgium 78% https://rubular.com/r/7dkUcKmH6PDXzv, https://rubular.com/r/9KwlnO2a2sVp5d
Netherlands 75% https://rubular.com/r/DGaeGiC8Z9pMA1
Israel 62% https://rubular.com/r/Jz8qmaEqluehmU, https://rubular.com/r/MFbVRBxF9kjsy7 We should confirm that the RTL parsing behaves as expected with addresses coming from checkout
Chile 59% https://rubular.com/r/XTHbB4OhyZwQAt Many addresses contain unit/extra info
Spain 44% https://rubular.com/r/ecfetJ9cyTiCEZ Many addresses contain unit/extra info
Mexico 40% https://rubular.com/r/uRXKYK522CUuGU Many addresses contain unit/extra info
Brazil 32% https://rubular.com/r/wtB2VwRpbMQUpo Neighbourhood often in address1

What should reviewers focus on?

Can the regex be simplified? Have I overlooked any false/incorrect matches that could occur?

Checklist

  • I have added a CHANGELOG entry for this change (or determined that it isn't needed)

@gabypancu gabypancu changed the base branch from main to parse-additional-fields September 13, 2024 15:05
@@ -31,6 +31,9 @@ format:
show: "{firstName} {lastName}_{company}_{address1}_{address2}_{zip} {city}_{country}_{phone}"
format_extended:
edit: "{country}_{firstName}{lastName}_{company}_{streetName}{streetNumber}_{address2}_{zip}{city}_{phone}"
address1_regex:
- "^(?<streetName>[^\\d,]+),? (?<streetNumber>\\d+(?: ?[A-Za-z])?)$"
- "^(?<streetNumber>\\d+(?: ?[A-Za-z])?),? (?<streetName>[^\\d,]+)$"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second pattern matches french addresses, which typically have the street number first

@@ -16,6 +16,8 @@ format:
show: "{firstName} {lastName}_{company}_{address1}_{address2}_{zip} {city}_{province}_{country}_{phone}"
format_extended:
edit: "{country}_{firstName}{lastName}_{company}_{streetName}{streetNumber}_{line2}{neighborhood}_{zip}{city}_{province}_{phone}"
address1_regex:
- "^(?<streetName>[^\\d,]+(?<!\\s[nN])(?<!\\s[nN]\\.)(?<!\\s[Nn]º)(?<!\\s[nN]úmero)(?<!\\s[Nn]o\\.)(?<!\\s[Nn]o)(?<!\\s#)),? (?<streetNumber>(?:n|N|n\\.|N\\.|Nº|nº|número|Número|no\\.|No\\.|no|No|#)? ?\\d+(?: ?[A-Za-z])?)$"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't allow any street-number-designators in the street name

@gabypancu gabypancu changed the title Regex parsing more countries Add address1 Regex for remaining countries Sep 18, 2024
@gabypancu gabypancu marked this pull request as ready for review September 18, 2024 17:19
Copy link
Contributor

@rochlefebvre rochlefebvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I suggest you update the shared rubular links, using the newest regex patterns.

@gabypancu gabypancu merged commit 0714399 into parse-additional-fields Sep 19, 2024
10 checks passed
@gabypancu gabypancu deleted the regex-parsing-more-countries branch September 19, 2024 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants