API assistance? #878
Comments
We'll have a look at it this week. Thanks for bringing this up, it's a good opportunity to improve our documentation.
Issue is on us, not you. This issue is documented at #554, I'll see if we can prioritize it. On another note, one improvement you can make is to use the `HashReport` generated for each task result:

```diff
--- extractor.py	2024-06-17 09:10:51.035069581 +0200
+++ extractor.py	2024-06-17 08:57:03.218147812 +0200
@@ -8,6 +8,7 @@
 from pathlib import Path
 from typing import Dict, List
 from unblob.processing import ExtractionConfig, process_file
+from unblob.report import HashReport
 from unblob.logging import configure_logger
 from unblob import report
@@ -40,7 +41,7 @@
 for task_result in unblob_results.results:
     task_file = task_result.task.path
     task_id = task_result.task.blob_id
-
+    hash_report = [report for report in task_result.reports if isinstance(report, HashReport)]
     for subtask in task_result.subtasks:
         if subtask.blob_id not in known_tasks:
             # XXX: We'll see the same subtask.blob_id for each time we extract more data from a blob. E.g., we could have
```

Something along those lines. I leave the hash_report usage to you.
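One possible way to act on that hint (a minimal sketch, not part of the maintainer's comment; it assumes `HashReport` exposes a `sha256` field, which should be verified against `unblob.report` in the installed version) is to pull the content hash out of a task result:

```python
from typing import Optional

from unblob.report import HashReport


def blob_sha256(task_result) -> Optional[str]:
    """Return the blob's SHA-256 digest if a HashReport was produced for it.

    Assumption: HashReport carries md5/sha1/sha256 fields; verify against
    unblob.report in the installed unblob version.
    """
    for rep in task_result.reports:
        if isinstance(rep, HashReport):
            return rep.sha256
    return None
```

One use would be to key `known_tasks` by this digest instead of `blob_id`, so repeated extractions of identical data collapse into a single entry.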
You can think of
Not at the moment. We have some auto-generated documentation at https://unblob.org/api/ but it's clearly insufficient.
Thanks so much! Really appreciate the guidance.
I'm trying to use the unblob API for what I think should be a fairly straightforward task, but I'm having some difficulties and hoping to get some help. I haven't found many examples of API usage, so I'm hoping this issue might also help other users get started with the API, learning from the code I have and from my mistakes.
My goal here is to use unblob to do a recursive extraction of a blob but to produce a clean copy of each extraction without any sub-extractions (i.e., have no `_extract` files within any of my output directories). Instead I want each of the extractions to be stored one directory deep within an output directory (e.g., output/extraction1, output/extraction2). I've previously implemented something like this by just running unblob and then parsing the generated outputs looking for files named `*_extract`, but I think it should be much cleaner to do this with the API.

I've written the code below which successfully logs a lot of information about the extraction process and almost gets me what I want, but I find that extraction files sometimes still end up in my output, so I suspect I'm doing something wrong or missing something obvious here.
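For context, the API surface involved is roughly the following. This is a minimal sketch based on the imports shown in the diff above, not the full script, and the `ExtractionConfig` parameter names, the input filename, and the output path are assumptions to verify against the installed unblob version:

```python
from pathlib import Path

from unblob.processing import ExtractionConfig, process_file

# Assumed parameter names -- check ExtractionConfig in the installed unblob version.
extract_root = Path("output/extracted")
config = ExtractionConfig(extract_root=extract_root, entropy_depth=0)

# process_file drives the recursive extraction and returns the collected task results.
results = process_file(config, Path("firmware.bin"))

for task_result in results.results:
    # Each task result carries the task's path, its blob_id, and a list of reports.
    print(task_result.task.path, task_result.task.blob_id, len(task_result.reports))
```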
I have 3 specific questions, but any advice or guidance would be much appreciated! Thanks.

Are `blob_id` values supposed to be unique per blob? It seems like the same `blob_id` will show up with distinct paths; for example, if a blob is carved into 2 files, both the base blob and the 2 generated files will have the same `blob_id`. Am I just misunderstanding this interface?

Example usage, after installing unblob dependencies, unblob itself, and saving the below script as extractor.py:
In the generated output directory I see that one of the extracted directories contains two `_extract` directories. It's almost right, but the `part0_extract` and `part1_extract` directories within `output/extracted/31c56af333e9f4652626f6e0e10418e27dd1af33.unblob` shouldn't be there!