Automated ADP Reference Ingest Capability

About

The Automated ADP Reference Ingest Capability is a GitHub Actions workflow that automatically adds references to an ADP container. It can pull ADP references from multiple source repositories on GitHub, de-duplicate those entries, and maintain a master list of references.

Data Flow

Data flows through a multi-stage GitHub Action. The first two stages can be repeated for as many data source repositories as desired.

Stage 1 - Check out source repository

During the first stage, the source repository is "checked out" onto the GitHub Actions runner. This requires either that the source repository is public or that the GitHub Action is configured with permission to access it. See GitHub's documentation for more information about configuring actions to use private repositories.

Stage 2 - Identify new data in source repository

This stage performs two major operations:

  1. Identifies the new files that have been added to the source data repository since the last run of the GitHub Action.
  2. Creates copies of the new files in the main reference list, which is kept in the references folder of this repository.

To perform these operations, the Automated ADP Reference Ingest Capability relies on the last_run_shas directory. This directory records the last Git commit SHA from each data source repository that was processed by the GitHub Action. This stage uses that SHA as the starting point and the most recent SHA as the ending point. The previous SHA is retrieved through the GitHub API, which returns it base64-encoded, and is decoded by the read-file-via-api.py Python helper script.
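As a rough illustration of the decoding step, the sketch below assumes the response body has the shape returned by GitHub's "get repository content" endpoint, where the file body is carried in a base64 `content` field alongside an `encoding` field. The function name is hypothetical and this is not the actual contents of read-file-via-api.py.

```python
import base64
import json


def sha_from_contents_response(response_body: str) -> str:
    """Return the SHA stored in a last_run_shas file.

    Assumes `response_body` is the JSON text of a GitHub contents API
    response (hypothetical helper, not the project's real script).
    """
    payload = json.loads(response_body)
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # The encoded text may contain line breaks; b64decode skips
    # characters outside the base64 alphabet by default.
    raw = base64.b64decode(payload["content"])
    return raw.decode("utf-8").strip()
```

Stripping whitespace keeps a trailing newline in the stored file from leaking into the SHA passed to later git commands.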

Once the previous SHA and current SHA have been determined, git diff is called to list the files that changed between the two SHAs. These files are then copied to the references folder of the cve-reference-ingest repository by the create-file-via-api.py Python helper script for processing in a later stage.
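The diff step could look something like the sketch below. It assumes git is available on the runner and that only newly added files matter, so it passes `--diff-filter=A`; that filter, the function names, and the `repo_dir` parameter are assumptions made for illustration rather than the workflow's actual invocation.

```python
import subprocess


def parse_name_only(diff_output: str) -> list[str]:
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in diff_output.splitlines() if line]


def added_files(repo_dir: str, previous_sha: str, current_sha: str) -> list[str]:
    """List files added between two commits (hypothetical helper)."""
    result = subprocess.run(
        [
            "git", "-C", repo_dir, "diff",
            "--name-only",
            "--diff-filter=A",  # assumption: only added files are wanted
            previous_sha, current_sha,
        ],
        check=True,           # raise if git exits non-zero
        capture_output=True,
        text=True,
    )
    return parse_name_only(result.stdout)
```

Using `check=True` means a bad SHA or a missing checkout raises an exception instead of silently yielding an empty file list, which matters because this stage is responsible for keeping state consistent.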

Stage 2 - Configuration

Stage 2 makes requests to the GitHub REST API, which requires an access token for authentication.

  • Details on how to create access tokens can be found in GitHub's documentation.
  • Details on how secrets are used in GitHub Actions can be found in GitHub's documentation.
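One common way for a helper script to receive the token is through an environment variable that the workflow populates from a secret. The sketch below assumes a variable named `INGEST_GITHUB_TOKEN`, which is a hypothetical name and not one taken from this repository.

```python
import os


def github_headers(env_var: str = "INGEST_GITHUB_TOKEN") -> dict[str, str]:
    """Build request headers for the GitHub REST API.

    `env_var` is a hypothetical variable name; the workflow would need to
    map a repository secret onto it.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an
        # unauthenticated request.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Reading the token at call time, rather than hard-coding it, keeps the credential out of the repository and out of the logs.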

Stage 2 - Error Management

Stage 2 is a crucial point for maintaining state and must run to completion, in order. If Stage 2 fails, Stage 4 (the primary stage of the GitHub Action) will complete incorrectly and leave the workflow in an invalid state that requires manual repair.

Therefore, if Stage 2 fails, the GitHub Action fails and triggers an email to the team.

If a failed state is reached, a debug message is written to the logs for later review. However, almost all failures in this stage are network-based failures in reaching the GitHub API. These are retried automatically the next time the GitHub Action runs, which should allow the action to retry copying the files.

If this stage fails several times in a row over a 24-hour period, a member of the team should investigate.

Stage 3 - Check out GitHub Action repository

During the third stage, the cve-reference-ingest repository is "checked out" onto the GitHub Actions runner, providing the action with the crucial last_run_shas folder and references folder.

Stage 4 - CVE Services

Stage 4 is responsible for 3 major operations:

  1. Determining the new references in the cve-reference-ingest repository that need to be processed.
  2. Writing the references to CVEs using CVE services.
  3. Updating and committing the last_run_shas for any sources that data was pulled from.

For step 1 listed above, the GitHub Action compares its current Git SHA against the SHA saved in last_run_shas. A call to git diff then determines which files changed between those two points.

For step 2 listed above, the GitHub Action passes each reference file to the adp.py Python helper script. The script checks that the CVE the reference belongs to exists, confirms that the reference is not already present in the ADP container (if the CVE has one), and finally writes the new reference to the ADP container.
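The duplicate check can be sketched as follows. It assumes the CVE record is a dictionary in which `containers.adp` is a list of containers, each with a `references` list of objects carrying a `url` field, and that two references count as duplicates when their URLs match exactly. Both the function name and that matching rule are assumptions and may differ from what adp.py does.

```python
def adp_has_reference(cve_record: dict, url: str) -> bool:
    """Return True if any ADP container already lists `url`.

    Hypothetical helper: a record with no ADP container simply has no
    duplicates, so the function returns False in that case.
    """
    adp_containers = cve_record.get("containers", {}).get("adp", [])
    for container in adp_containers:
        for reference in container.get("references", []):
            if reference.get("url") == url:
                return True
    return False
```

A caller would write the new reference only when this returns False. Exact string comparison is the simplest rule; normalizing trailing slashes or URL case would be a separate design decision.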

For step 3 listed above, after processing all the files, the GitHub Action updates the appropriate last_run_shas files and commits the changes to the cve-reference-ingest repository.

Stage 4 - Configuration

Stage 4 requires a CVE Services account and API key. Speak to your organization's CNA to have an account created. The API key then needs to be added as a secret, as described in Stage 2's configuration.

Stage 4 - Error Handling

While a file is being processed by adp.py, any network request to CVE Services that fails is automatically retried once. If the second attempt also fails, the file is copied to the retry folder, where it is queued to be re-attempted at a later time.
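The retry-once-then-queue behavior described above might be sketched like this. The `write_reference` callable stands in for whatever adp.py does to contact CVE Services, and `OSError` stands in for a network failure; both are assumptions for illustration, as are the function and parameter names.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_with_retry(
    reference_file: Path,
    write_reference: Callable[[Path], None],
    retry_dir: Path,
    attempts: int = 2,  # first try plus one automatic retry
) -> bool:
    """Try to write a reference; queue the file for later on repeated failure."""
    for _ in range(attempts):
        try:
            write_reference(reference_file)
            return True
        except OSError:
            # Assumed to represent a failed network request; try again.
            continue
    # Every attempt failed: copy the file into the retry folder.
    retry_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(reference_file, retry_dir / reference_file.name)
    return False
```

The return value lets the caller tell a written reference apart from a queued one without inspecting the retry folder.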

Stage 5 - Retry

This stage runs only if there is at least one reference file in the retry directory. Files are added to the retry directory when they fail in Stage 4. The GitHub Action attempts to write each queued reference to its CVE's ADP container. If the write fails, the file remains in the retry folder to be tried again during the next run of the GitHub Action. If the write succeeds, the file is removed from the retry folder.
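A minimal sketch of this retry pass is shown below, again using a hypothetical `write_reference` callable and treating `OSError` as the failure signal. It illustrates the keep-on-failure, delete-on-success rule rather than the workflow's real code.

```python
from pathlib import Path
from typing import Callable


def drain_retry_dir(
    retry_dir: Path,
    write_reference: Callable[[Path], None],
) -> list[Path]:
    """Re-attempt every queued file; return the files that still failed."""
    remaining: list[Path] = []
    if not retry_dir.is_dir():
        return remaining  # nothing queued, so the stage has no work
    for queued in sorted(retry_dir.iterdir()):
        if not queued.is_file():
            continue
        try:
            write_reference(queued)
        except OSError:
            remaining.append(queued)  # leave it for the next run
        else:
            queued.unlink()  # success: remove from the retry folder
    return remaining
```

Because a file is deleted only after a successful write, an interrupted run leaves unprocessed files in place for the next run.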