This repository has been archived by the owner on Jun 15, 2023. It is now read-only.
I use `pdfx -v path_to_pdf_file` to gather URLs from a PDF, and it works great on its own.
I would love to see pdfx extended to extract URLs recursively across a directory tree, skipping any files that are not PDFs as it goes along.
Right now I use `find /path/to/folder/ -type f -name '*.pdf' -exec pdfx -v {} \; > foo.txt`
This works well (someone more skilled than I am helped me with the command), but I wonder whether a recursive option could be integrated directly into pdfx. Or maybe that would be redundant, since Unix already has tools that accomplish the same thing, as the command above shows.
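For anyone landing here with the same need, the one-liner above can be wrapped in a small shell function. This is only a sketch, not a pdfx feature: the name `extract_urls_recursive` is hypothetical, and it assumes a `find` that supports `-iname` (GNU and BSD `find` do, but it is not in POSIX), which also picks up files ending in `.PDF`.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdfx): run a command on every PDF
# found under a directory tree, one invocation per file.
# Usage: extract_urls_recursive DIR [COMMAND [ARGS...]]
extract_urls_recursive() {
    dir=$1
    shift
    # Default to `pdfx -v`, the same command used in the one-liner above.
    if [ "$#" -eq 0 ]; then
        set -- pdfx -v
    fi
    # -type f skips directories; -iname matches .pdf, .PDF, .Pdf, ...
    # Non-PDF files are never handed to the command.
    find "$dir" -type f -iname '*.pdf' -exec "$@" {} \;
}
```

After sourcing the function, `extract_urls_recursive /path/to/folder > foo.txt` should behave like the original command, apart from the case-insensitive match on the extension. Passing a different command after the directory (for example `echo`) is a quick way to check which files would be processed.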
I really like this tool and am using it for a personal project that I will share freely once it becomes voluminous enough. Basically, it's a filetype miner/downloader that pulls specific file types from the Wayback Machine, a digital archaeological tool of sorts. I use old books and magazines from archive.org as sources for URLs, and those URLs are then used to query the Wayback Machine downloader for the file types I'm after.
Thanks for this really easy-to-use and powerful tool!