This repository contains (mostly) static data that is directly used by the Bichard7 service.
The data contained within the `output-data` folder is automatically published as both an npm package and a Maven artefact by the Release GitHub Actions workflow in this repository. These packages are then referenced as dependencies from other repositories within the Bichard code base (such as bichard7-next-core and bichard7-next).
All merges and commits to the `main` branch will cause the `release` GitHub Action to run, which will:
1. Increment the patch version number in `output-data/package.json` (using `npm version patch`)
2. Replace the `pom.xml` file with `pom.template.xml`, substituting the `$PACKAGE_JSON_VERSION` variable in the template with the new version number (using `envsubst`)
3. Commit the changes to `output-data/package.json`, `output-data/package-lock.json` and `pom.xml` (which will be version number bumps) as the @bichard7 user
4. Create a git tag labelled with the version number that points at the commit that was just made
5. Push the commit directly to the `main` branch. The @bichard7 user has been added as an exception for the branch protection/PR requirements in the settings for this repo, which means it can push commits directly to `main`.
6. Build and publish the npm and Maven packages
7. Check out a copy of the bichard7-next-core repo, update the npm standing data dependency to the exact version number from step 1 (using `npm install bichard7-next-data@<version>`), and create a PR with the results
8. Check out a copy of the bichard7-next repo, update the Gradle standing data dependency to the exact version number from step 1 (using `sed` on the `bichard-backend/build.gradle` file), and create a PR with the results
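To make the first two steps concrete, here is a minimal sketch of their effect on the version number and the POM template. It is only an illustration: the workflow itself uses `npm version patch` and `envsubst`, and the function names and the template fragment below are hypothetical.

```typescript
// Illustrative only: the real workflow runs `npm version patch` and `envsubst`.

/** Increment the patch component of a plain MAJOR.MINOR.PATCH version string. */
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  return `${major}.${minor}.${Number(patch) + 1}`;
}

/** Replace every $PACKAGE_JSON_VERSION placeholder in the template text. */
function renderPom(pomTemplate: string, version: string): string {
  return pomTemplate.split("$PACKAGE_JSON_VERSION").join(version);
}

// Hypothetical template fragment; the real pom.template.xml is not shown here.
const examplePomTemplate = "<version>$PACKAGE_JSON_VERSION</version>";
const nextVersion = bumpPatch("1.2.3"); // "1.2.4"
const examplePom = renderPom(examplePomTemplate, nextVersion); // "<version>1.2.4</version>"
```

Unlike `envsubst`, this sketch only handles the `$PACKAGE_JSON_VERSION` form of the placeholder, not `${PACKAGE_JSON_VERSION}`.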
This means that every time new commits are added to `main`, new Maven and npm packages will automatically be published.
If breaking changes are introduced to `main`, it's advisable to manually bump the major version release number as part of those changes. This follows Semantic Versioning principles.
With the exception of the offence code data and organisation unit data, the data in this repository is static and will not be updated automatically. Manual changes can be made to this data directly in the `output-data` folder, and when the PR containing these changes is merged to `main`, the `release` GitHub Actions workflow described above will publish the changes.
The offence code data and organisation unit data are the output of a build process that combines data from multiple sources to produce the final version that is packaged and used by Bichard. These data files in the `output-data` folder should not be edited directly; these files are just the output of the 'build' process.
The output offence code data (`output-data/data/offence-code.json`) is generated by combining a number of different data sources into one set of offence codes. The input data is:
- b7-overrides. Any offence code that is referenced in this file must have its offence category set to "B7" so that it is ignored by Bichard. NB: This exists for compatibility with the legacy dataset and should be removed in future.
- cjs-offences. Offence code data exported and published by the data standards team.
- pnc-ccjs-cjs-offences. Data exported from the PNC.
- pnld-offences. Data exported from the PNLD.
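The combination described above can be sketched roughly as follows. This is not the repository's merge script: the record shape, the field names (`code`, `offenceCategory`, `source`) and the rule that later sources win on conflicts are all assumptions made for illustration. Only the "B7" category value comes from the description above.

```typescript
// Hypothetical record shape: only the "B7" category value comes from the README.
interface OffenceCode {
  code: string;
  offenceCategory: string;
  source: string;
}

/**
 * Merge several sources into one record per offence code.
 * Assumption: when two sources share a code, the later source wins.
 * Every code listed in the overrides is then forced to category "B7".
 */
function mergeOffenceCodes(sources: OffenceCode[][], b7Overrides: string[]): OffenceCode[] {
  const merged = new Map<string, OffenceCode>();
  for (const source of sources) {
    for (const offence of source) {
      merged.set(offence.code, offence);
    }
  }
  for (const code of b7Overrides) {
    const offence = merged.get(code);
    if (offence !== undefined) {
      merged.set(code, { ...offence, offenceCategory: "B7" });
    }
  }
  // Sort by code so that the output order is stable between runs.
  return Array.from(merged.values()).sort((a, b) =>
    a.code < b.code ? -1 : a.code > b.code ? 1 : 0
  );
}
```

Override codes that do not appear in any source are ignored here; whether the real build treats that case as an error is not stated.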
To rebuild the offence code data:
# Download the latest external offence code data sources to input-data/
$ npm run download-offence-code-data
# Combine all sources into the final output in output-data/
$ npm run merge-offence-data
We use Puppeteer to interact with the PNLD in the browser. From time to time the site changes (for example, updated HTML or broken links), so it is helpful to be able to debug the process in a browser. To access the PNLD in a browser, go to the PNLD site and look in the table below to find out where to get the credentials.
The function `PnldFileDownloader` is called by the script `download-offence-code-data.ts` and needs four environment variables in order to access the PNLD service.
| Environment variable | Description |
|---|---|
| `PNLD_USERNAME` | User name; can be found in 1Password in the shared vault |
| `PNLD_PASSWORD` | Password; can be found in 1Password in the shared vault |
| `PNLD_LOGIN_URL` | `https://www.pnld.co.uk/standard-offence-wording-extracts/` |
| `PNLD_DOWNLOAD_URL` | Depends on the zip file we want to download. Full extract: `https://www.pnld.co.uk/standard-offence-wording-extracts/full-extract`; Monthly Update (Current Month): `https://www.pnld.co.uk/standard-offence-wording-extracts/monthly-delta-extract`; Monthly Updates (Last Month): `https://www.pnld.co.uk/standard-offence-wording-extracts/monthly-delta-extract-1-month-prior`; Monthly Updates (2 Months Ago): `https://www.pnld.co.uk/standard-offence-wording-extracts/monthly-delta-extract-2-months-prior` |
Here is an example of the command to download the full extract:
PNLD_DOWNLOAD_URL="https://www.pnld.co.uk/standard-offence-wording-extracts/full-extract" PNLD_LOGIN_URL="https://www.pnld.co.uk/standard-offence-wording-extracts/" PNLD_PASSWORD="<PASSWORD>" PNLD_USERNAME="<USERNAME>" npx ts-node src/download-offence-code-data.ts
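Because the download needs all four variables, a small pre-flight check can save a failed browser session. The sketch below is a hypothetical helper, not part of `PnldFileDownloader`; it takes the environment as a plain object so that it can be tried without any real credentials.

```typescript
// The four variable names come from the table above; the helper is hypothetical.
const REQUIRED_PNLD_VARS = [
  "PNLD_USERNAME",
  "PNLD_PASSWORD",
  "PNLD_LOGIN_URL",
  "PNLD_DOWNLOAD_URL",
];

/** Return the names of the required variables that are unset or blank. */
function missingPnldVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_PNLD_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node script this could be called as `missingPnldVars(process.env)`, stopping with a clear message if the returned list is not empty.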
The organisation unit data (`output-data/data/organisation-unit.json`) is generated by combining Police Service and Court data:
- Court Organisation Units, updated daily by the `update standing data` GitHub Actions workflow. The source spreadsheet is downloaded from the criminal justice system data standards page. The spreadsheet does not include `thirdLevelPsaCode`, so this data is backfilled from the existing OU records or added manually.
- Police Organisation Units, generated from the PNC spreadsheet.
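The backfilling of `thirdLevelPsaCode` could look something like the sketch below. The record shape and the use of a `code` field to match new records against existing ones are assumptions; only the `thirdLevelPsaCode` attribute is named above. Records that cannot be backfilled are reported so that they can be completed manually.

```typescript
// Hypothetical record shape: only `thirdLevelPsaCode` is named in the README.
interface OrganisationUnit {
  code: string; // assumed unique key for matching new and existing records
  name: string;
  thirdLevelPsaCode?: string;
}

interface BackfillResult {
  units: OrganisationUnit[];
  unresolved: string[]; // codes that still need a manual thirdLevelPsaCode
}

/** Copy thirdLevelPsaCode from existing records onto freshly downloaded ones. */
function backfillPsaCodes(fresh: OrganisationUnit[], existing: OrganisationUnit[]): BackfillResult {
  const known = new Map<string, string>();
  for (const unit of existing) {
    if (unit.thirdLevelPsaCode !== undefined) {
      known.set(unit.code, unit.thirdLevelPsaCode);
    }
  }
  const unresolved: string[] = [];
  const units = fresh.map((unit): OrganisationUnit => {
    const psaCode = unit.thirdLevelPsaCode ?? known.get(unit.code);
    if (psaCode === undefined) {
      unresolved.push(unit.code);
      return unit;
    }
    return { ...unit, thirdLevelPsaCode: psaCode };
  });
  return { units, unresolved };
}
```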
To rebuild the organisation unit data:
# Download the latest external organisation unit data sources to input-data/
# and combine all sources into the final output in output-data/
$ npm run download-organisation-unit-data
In order to make differences between versions of data easy to read, the data should be sorted before committing. This can be done by running `./data-formatter/format.sh` from the root of the repository. This will sort the arrays of data by alphabetical attribute name and then output them with their attributes sorted.
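The reason sorting helps is that a stable record order and a stable attribute order keep diffs limited to real changes. The sketch below shows one way to get that effect. It does not describe what `format.sh` actually does; the function names and the choice of a single sort attribute are assumptions.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

/** Recursively rebuild objects with their keys in alphabetical order. */
function sortKeys(value: JsonValue): JsonValue {
  if (Array.isArray(value)) {
    return value.map((item) => sortKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: JsonValue } = {};
    for (const key of Object.keys(value).sort()) {
      const child = value[key];
      if (child !== undefined) {
        sorted[key] = sortKeys(child);
      }
    }
    return sorted;
  }
  return value;
}

/** Order records by one attribute, then serialise them with sorted keys. */
function formatRecords(records: { [key: string]: JsonValue }[], sortBy: string): string {
  const ordered = records.slice().sort((a, b) => {
    const left = String(a[sortBy]);
    const right = String(b[sortBy]);
    return left < right ? -1 : left > right ? 1 : 0;
  });
  return JSON.stringify(ordered.map((record) => sortKeys(record)), null, 2);
}
```

Running the same input through `formatRecords` twice gives identical text, which is the property that keeps version-to-version diffs readable.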
Follow these steps to import an updated data export from the PNC.
- Ask Ben to request a new export (note: at the moment this needs sending from Ben's CJSM as that's the only approved address)
- Receive the files
- Unzip the files and place the xlsx docs in the root of this project (note: the files are normally named the same way each time, but they should end in `.CJS.xlsx` and `FSCODE.xlsx` for them to be found automatically. You can ignore the file ending in `ACPO.xlsx`)
- Run `npm run import-pnc-data` to convert these xlsx files to JSON
- Run `npm run merge-offence-data` to regenerate the standing data based on these new input files
- Make a PR
- Delete the xlsx files
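The file-naming rule in the steps above can be expressed as a simple suffix match. The sketch below is a hypothetical helper, not the logic of `import-pnc-data`, and the file names in the usage note are made up. It is meant as a quick way to check that the unzipped files will be picked up.

```typescript
interface PncFiles {
  cjs: string | undefined; // first file ending in ".CJS.xlsx"
  fsCode: string | undefined; // first file ending in "FSCODE.xlsx"
}

/** Pick the two expected PNC spreadsheets out of a directory listing. */
function findPncFiles(fileNames: string[]): PncFiles {
  return {
    cjs: fileNames.find((name) => name.endsWith(".CJS.xlsx")),
    fsCode: fileNames.find((name) => name.endsWith("FSCODE.xlsx")),
  };
}
```

For a listing such as `["EXPORT.CJS.xlsx", "EXPORT.ACPO.xlsx", "EXPORT.FSCODE.xlsx"]`, the ACPO file is left out because it matches neither suffix.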