Upgrade bulk extractor #1381

Merged (10 commits) on Nov 1, 2023

HolzmanoLagrene
Contributor

Description of the change

Although the issue resolved by this pull request (#1263) was initially about the Strings job, it became clear that bulk-extractor can do all of the desired work. Therefore, in accordance with the discussion on the issue, several changes were made to the bulk-extractor wrapper.

This pull request changes several things about bulk-extractor in Turbinia. The following list gives an overview:

  • Adding new evidence types: The task now supports Directory and CompressedDirectory as well. In both cases the -R flag is used to scan the evidence recursively.
  • Adding a new TASK_CONFIG parameter: regex_pattern_files lets you list one or more files that contain regular expressions, one per line. These files are passed to bulk-extractor using the -F flag, and the results are listed in find.txt. This does not break the ability to pass other parameters to bulk-extractor using bulk_extractor_args, including other regex pattern files passed directly to the tool.
  • Writing a report file: The existing function generate_summary_report already generates a report. This report is now written to a file in the output directory.
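
To illustrate how the first two points fit together, here is a minimal sketch of assembling the command line. The helper name build_bulk_extractor_cmd and its parameters are hypothetical and are not taken from turbinia/workers/bulk_extractor.py; only the -o, -F and -R flags are actual bulk_extractor options.

```python
import shlex
from typing import Iterable, List, Optional


def build_bulk_extractor_cmd(
    evidence_path: str,
    output_dir: str,
    is_directory: bool = False,
    regex_pattern_files: Optional[Iterable[str]] = None,
    bulk_extractor_args: Optional[str] = None,
) -> List[str]:
  """Builds an argv list for bulk_extractor (hypothetical helper)."""
  cmd = ['bulk_extractor', '-o', output_dir]
  # User-supplied arguments are kept as-is, so pattern files passed
  # directly with -F in bulk_extractor_args still reach the tool.
  if bulk_extractor_args:
    cmd.extend(shlex.split(bulk_extractor_args))
  # One -F flag per pattern file listed in the task config.
  for pattern_file in regex_pattern_files or []:
    cmd.extend(['-F', pattern_file])
  # Directory and CompressedDirectory evidence is scanned recursively.
  if is_directory:
    cmd.append('-R')
  cmd.append(evidence_path)
  return cmd
```

Returning a list rather than a single string avoids shell quoting problems when a pattern file path contains spaces.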

An example recipe would be a YAML file with the following content:

globals:
  jobs_allowlist:
   - BulkExtractorJob

plaso_base:
  task: "BulkExtractorTask"
  regex_pattern_files: [
  "/evidence/regex_pattern_file"
  ]

An example file at the path /evidence/regex_pattern_file would look like this:

[a-fA-F0-9]{64}
\b(?:\d{1,3}\.){3}\d{1,3}\b
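
As a rough illustration of what these two patterns are meant to catch (64-character hex strings such as SHA-256 digests, and IPv4-like dotted quads), the sketch below runs them with Python's re module. This is only an approximation: bulk-extractor uses its own regex engine, whose syntax support may differ from Python's, and find_hits is a hypothetical helper.

```python
import re

# The two lines from the example pattern file above.
PATTERNS = [
    r'[a-fA-F0-9]{64}',
    r'\b(?:\d{1,3}\.){3}\d{1,3}\b',
]


def find_hits(text):
  """Returns all matches, grouped in the order of PATTERNS."""
  hits = []
  for pattern in PATTERNS:
    hits.extend(match.group(0) for match in re.finditer(pattern, text))
  return hits
```

Note that the second pattern does not validate octet ranges, so a string like 999.1.1.1 is also reported as a hit.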

Applicable issues

Additional information

Apart from extending the functionality of bulk-extractor, this pull request includes code that creates an enhanced report. This includes writing the report to a file at the end and displaying the hits as a table instead of a bulleted list.
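
A minimal sketch of the table idea, assuming the hits are available as a mapping from feature file name to hit count. The functions hits_to_table and write_report are hypothetical and do not reproduce generate_summary_report; the Markdown table layout is likewise an assumption.

```python
from typing import Dict, List


def hits_to_table(hits: Dict[str, int]) -> List[str]:
  """Formats hit counts as Markdown table rows, most hits first."""
  rows = ['| Feature file | Hits |', '|:---|---:|']
  # Sort by descending count, then by name for a stable order.
  for name, count in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0])):
    rows.append(f'| {name} | {count} |')
  return rows


def write_report(lines: List[str], path: str) -> None:
  """Writes the report lines to a file in the output directory."""
  with open(path, 'w', encoding='utf-8') as report_file:
    report_file.write('\n'.join(lines) + '\n')
```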

Checklist

  • All tests were successful.
  • Documentation updated.
  • Applied Google Python Style Guide

Collaborator

@hacktobeer hacktobeer left a comment

LGTM, small nits. As @aarontp is the official lead for Turbinia I will let him hit the final approval button.

turbinia/workers/bulk_extractor.py (outdated, resolved)
turbinia/workers/bulk_extractor.py (outdated, resolved)
Collaborator

@hacktobeer hacktobeer left a comment

LGTM @aarontp PTAL

Member

@aarontp aarontp left a comment

Looks good, just a couple small things. Thanks for the contribution!

turbinia/workers/bulk_extractor.py (resolved)
turbinia/jobs/bulk_extractor.py (resolved)
Member

aarontp commented Nov 1, 2023

LGTM, thanks!

@aarontp aarontp merged commit d02a35a into google:master Nov 1, 2023
5 checks passed