Can `pdfplumber` read URLs? #913

petermr · 2023-06-26T06:55:43Z

I'd like to read the contents of a PDF at a URL directly, without first saving it to a file. How can this be done?
My initial approach would be to wrap the response body in a BytesIO:

    b = BytesIO(response.content)

and pass this to pdfplumber.open(), but I'd appreciate a working example.

Answered by petermr

Jun 26, 2023

petermr · 2023-06-26T08:45:56Z

It seems to work:

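    from io import BytesIO

    import pdfplumber
    import requests

    def test_read_urls(self):
        url = "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_SPM.pdf"
        response = requests.get(url)
        response.raise_for_status()  # fail on an HTTP error rather than parsing an error page
        bytes_io = BytesIO(response.content)  # in-memory file-like object holding the PDF bytes
        with pdfplumber.open(bytes_io) as pdf:
            pages = pdf.pages
            assert len(pages) == 40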

I don't know whether this scales to very large PDFs or whether there are buffering issues, since response.content holds the whole download in memory, but this may be useful for others.
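For the large-PDF concern, one possible variation is to stream the download in chunks into a buffer that spills to disk once it grows past a threshold. This is only a sketch, not something tested in this thread: `spool_chunks`, `count_pages`, the 16 MiB threshold, the 64 KiB chunk size and the 60 s timeout are all made-up names and values, and it assumes pdfplumber.open() will take any seekable binary file object the way it takes a BytesIO.

```python
import tempfile
from typing import BinaryIO, Iterable


def spool_chunks(chunks: Iterable[bytes], max_in_memory: int = 16 * 1024 * 1024) -> BinaryIO:
    """Copy byte chunks into a seekable buffer that spills to disk when large."""
    # SpooledTemporaryFile keeps data in memory up to max_size, then rolls
    # over to a real temporary file (hypothetical threshold: 16 MiB).
    buffer = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the PDF parser starts at the first byte
    return buffer


def count_pages(url: str) -> int:
    """Download a PDF in chunks and return its page count."""
    # Imported here so spool_chunks() works without third-party packages.
    import pdfplumber
    import requests

    # stream=True defers the body download until iter_content() is consumed.
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        buffer = spool_chunks(response.iter_content(chunk_size=64 * 1024))
    # Assumption: pdfplumber.open() accepts this buffer like a BytesIO.
    with buffer, pdfplumber.open(buffer) as pdf:
        return len(pdf.pages)
```

Calling `count_pages(url)` with the IPCC URL above should then give the same page count as the BytesIO version, while keeping at most the threshold's worth of the download in memory. The PDF parser still needs random access to the whole file, so this limits memory use during the download but does not make parsing itself incremental.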
