The data cleaning service.
Checks that addresses have minimum required parts and optionally normalizes them.
Checks for duplicate features.
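A minimal sketch of how duplicates can be detected by grouping identical rows. It assumes each feature is a hypothetical (object_id, attributes, geometry_wkt) tuple. This is an illustration rather than sweeper's own implementation, which may compare fields and geometries differently.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group object IDs whose attributes and geometry are identical.

    rows: iterable of (object_id, attributes, geometry_wkt) tuples, where
    attributes is a tuple of field values (hypothetical input shape).
    Returns a list of ID groups that contain more than one member.
    """
    groups = defaultdict(list)
    for object_id, attributes, geometry_wkt in rows:
        # Rows with the same attribute values and geometry share one key.
        groups[(attributes, geometry_wkt)].append(object_id)
    return [ids for ids in groups.values() if len(ids) > 1]


rows = [
    (1, ("Main St", 84101), "POINT (1 2)"),
    (2, ("Main St", 84101), "POINT (1 2)"),
    (3, ("State St", 84111), "POINT (3 4)"),
]
print(find_duplicates(rows))  # [[1, 2]]
```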
Checks for empty geometries.
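A minimal sketch of the empty-geometry check, assuming geometries are available as WKT strings (a hypothetical representation). A missing geometry, a blank string, or WKT such as POINT EMPTY is treated as empty. Sweeper's own check may inspect geometry objects directly instead.

```python
def find_empty_geometries(rows):
    """Return the IDs of rows whose geometry is missing or empty.

    rows: iterable of (object_id, geometry_wkt) tuples (hypothetical shape).
    """
    empty_ids = []
    for object_id, geometry_wkt in rows:
        # None, "", and WKT like "POLYGON EMPTY" all count as empty here.
        if geometry_wkt is None:
            empty_ids.append(object_id)
            continue
        text = geometry_wkt.strip().upper()
        if not text or text.endswith("EMPTY"):
            empty_ids.append(object_id)
    return empty_ids


rows = [(1, "POINT (1 2)"), (2, None), (3, "POLYGON EMPTY"), (4, "")]
print(find_empty_geometries(rows))  # [2, 3, 4]
```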
Checks to make sure that the metadata meets the Basic SGID Metadata Requirements.
Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, other than known abbreviations (e.g. UGRC, BLM) and articles (e.g. a, the, of).
This check also verifies that the data set contains a tag that matches the database name (e.g. SGID) and the schema (e.g. Cadastre).
--try-fix adds missing required tags and title-cases any existing tags.
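The casing rule and the --try-fix behavior can be sketched as follows. The abbreviation and article lists are hypothetical examples, and capitalizing an article when it starts a tag is an assumption borrowed from ordinary title case; sweeper's real word lists and rules may differ.

```python
# Hypothetical word lists; sweeper's real lists may differ.
ABBREVIATIONS = {"UGRC", "BLM", "SGID"}
ARTICLES = {"a", "an", "the", "of"}


def title_case_tag(tag):
    """Title-case a tag while keeping abbreviations upper case and articles lower case."""
    cased = []
    for index, word in enumerate(tag.split()):
        if word.upper() in ABBREVIATIONS:
            # Known abbreviations stay fully upper case.
            cased.append(word.upper())
        elif index > 0 and word.lower() in ARTICLES:
            # Articles stay lower case unless they start the tag (assumption).
            cased.append(word.lower())
        else:
            cased.append(word.capitalize())
    return " ".join(cased)


def add_missing_tags(tags, database, schema):
    """Title-case existing tags, then append the database and schema tags if absent."""
    fixed = [title_case_tag(tag) for tag in tags]
    for required in (database, schema):
        if required not in fixed:
            fixed.append(required)
    return fixed


print(title_case_tag("bureau of land management blm"))  # Bureau of Land Management BLM
print(add_missing_tags(["utah", "parcels"], "SGID", "Cadastre"))
# ['Utah', 'Parcels', 'SGID', 'Cadastre']
```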
Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
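A minimal sketch of the two summary rules described above: under 2048 characters, and shorter than the description. The function name and the list-of-problems return value are hypothetical choices for illustration.

```python
AGOL_SUMMARY_LIMIT = 2048  # characters; the summary must be less than this


def check_summary(summary, description):
    """Return a list of problems with the summary; an empty list means it passes."""
    problems = []
    if len(summary) >= AGOL_SUMMARY_LIMIT:
        problems.append(
            f"summary is {len(summary)} characters; it must be under {AGOL_SUMMARY_LIMIT}"
        )
    if len(summary) >= len(description):
        problems.append("summary must be shorter than the description")
    return problems


print(check_summary("Parcels", "Parcel polygons for the state of Utah"))  # []
```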
Checks to make sure that the description contains a link to a data page on gis.utah.gov.
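A minimal sketch of the link check using a regular expression. It assumes that any http(s) link to a non-empty path on gis.utah.gov counts as a data page; sweeper's actual check may require a more specific URL pattern.

```python
import re

# Assumption: any http(s) link to a path on gis.utah.gov counts as a data page.
DATA_PAGE_PATTERN = re.compile(r"https?://gis\.utah\.gov/[^\s\"'<>]+", re.IGNORECASE)


def has_data_page_link(description):
    """Return True if the description contains a link to a page on gis.utah.gov."""
    return DATA_PAGE_PATTERN.search(description) is not None


print(has_data_page_link('<a href="https://gis.utah.gov/data/cadastre/parcels/">Parcels</a>'))  # True
```

Requiring at least one character after the trailing slash means a bare link to the site's home page does not satisfy the check.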
Checks to make sure that the text in this section matches the official text for UGRC.
--try-fix updates the text to match the official text.
This project contains a module, sweeper.address_parser, that can be used as a standalone address parser. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.
from sweeper.address_parser import Address
address = Address('123 South Main Street')
print(address)
'''
--> Parsed Address:
{'address_number': '123',
'normalized': '123 S MAIN ST',
'prefix_direction': 'S',
'street_name': 'MAIN',
'street_type': 'ST'}
'''
All properties default to None if there is no parsed value.
address_number
address_number_suffix
prefix_direction
street_name
street_direction
street_type
unit_type
unit_id
If no unit_type is found, this property is prefixed with # (e.g. # 3). If unit_type is found, # is stripped from this property.
city
zip_code
po_box
The PO Box number if a PO Box-type address was entered (e.g. po_box would be 1 for p.o. box 1).
normalized
A normalized string representing the entire address that was passed into the constructor. PO Boxes are normalized in the format PO BOX <number>.
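The unit_id and PO Box formatting rules above can be mirrored in a couple of small helpers. Both functions are hypothetical illustrations of the documented output format, not part of sweeper.address_parser; use the Address class for real parsing.

```python
def format_unit_id(unit_id, unit_type):
    """Apply the documented unit_id rule (hypothetical helper, not sweeper's API)."""
    if unit_id is None:
        return None
    bare_id = unit_id.lstrip("#").strip()
    if unit_type is None:
        # No unit type was found, so the id is prefixed with "#", e.g. "# 3".
        return f"# {bare_id}"
    # A unit type was found, so "#" is stripped.
    return bare_id


def normalize_po_box(po_box):
    """Format a PO Box number the way the normalized property is documented."""
    return f"PO BOX {po_box}"


print(format_unit_id("#3", None))   # # 3
print(format_unit_id("#3", "APT"))  # 3
print(normalize_po_box("1"))        # PO BOX 1
```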
- clone arcgis conda environment
conda create --name sweeper --clone arcgispro-py3
- activate environment
activate sweeper
- install sweeper
pip install ugrc-sweeper
- Optionally, copy config.sample.json to config.json in the folder where you will run sweeper.
Caution
This is required for the following functions:
- the --scheduled argument (required for sending emails)
- the --change-detect argument
- using user-specific connection files via the CONNECTIONS_FOLDER config value
Tables can be skipped by adding values to the EXCLUSIONS.<sweeper_key> config array. These values are matched against table names using fnmatch. Note that these do not apply when using the --table-name argument.
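A minimal sketch of how fnmatch-style exclusions filter a list of tables. The config contents, the "metadata" sweeper key, and the filter_tables helper are hypothetical examples, not copied from config.sample.json.

```python
from fnmatch import fnmatch

# Hypothetical config.json contents; the sweeper key and patterns are examples only.
config = {
    "EXCLUSIONS": {
        "metadata": ["SGID.Boundaries.*", "*Temp*"],
    }
}


def filter_tables(table_names, sweeper_key, config):
    """Drop tables whose names match any exclusion pattern for this sweeper."""
    patterns = config.get("EXCLUSIONS", {}).get(sweeper_key, [])
    return [
        name
        for name in table_names
        if not any(fnmatch(name, pattern) for pattern in patterns)
    ]


tables = [
    "SGID.Boundaries.Municipalities",
    "SGID.Cadastre.Parcels",
    "SGID.Cadastre.TempParcels",
]
print(filter_tables(tables, "metadata", config))  # ['SGID.Cadastre.Parcels']
```

Note that fnmatch.fnmatch follows the operating system's case rules (case-insensitive on Windows); fnmatch.fnmatchcase always compares case-sensitively.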
- clone arcgis conda environment
conda create --name sweeper --clone arcgispro-py3
- activate environment
activate sweeper
- install required dependencies to work on sweeper
pip install -e ".[tests]"
test_metadata.py uses a SQL database that needs to be restored from src/sweeper/tests/data/Sweeper.bak to your local SQL Server.
- run sweeper:
sweeper
- test:
pytest
- lint:
ruff check .
- format:
ruff format .