Problems extracting complete set of text from a pdf #491

Answered by samkit-jain
nigelkiernan asked this question in Q&A

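Hi @nigelkiernan, I appreciate your interest in the library. Allow me to answer your query on behalf of Jeremy. The text is being extracted from only a single page because you are using [220-222]. Python evaluates 220-222 as a subtraction, giving -2, so pdf.pages[220-222] is just the page at index -2, i.e. the second-to-last page of the PDF. To slice the list, use a colon instead, as in [220:222]. Keep in mind that pdf.pages is 0-indexed and the stop index of a slice is exclusive, so pages 220 to 222 (counting from 1) correspond to pdf.pages[219:222]. To fix it, you can refer to the following code: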

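import pdfplumber

start_page = 220  # first page to extract, counting from 1
end_page = 222    # last page to extract, inclusive

with pdfplumber.open("XXXX.pdf") as pdf:
    # pdf.pages is 0-indexed and the slice stop is exclusive,
    # so [start_page - 1:end_page] covers pages 220, 221 and 222
    for page in pdf.pages[start_page - 1:end_page]:
        # Do operations on page, e.g. extract its text
        text = page.extract_text()
        print(text)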
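To see why [220-222] returned only one page, here is a small sketch that uses a plain Python list as a stand-in for pdf.pages, so it runs without pdfplumber or a real PDF. The 300-page document and the page_slice helper are hypothetical illustrations, not part of pdfplumber.

```python
# A plain list stands in for pdf.pages, so this runs without pdfplumber or a PDF.
# Hypothetical 300-page document: pages[i] holds the label of page i + 1.
pages = [f"page {n}" for n in range(1, 301)]

# 220-222 is arithmetic, not a range: it evaluates to -2, so this is a single
# negative index that picks the second-to-last page.
single = pages[220 - 222]
print(single)  # page 299

# A slice uses a colon. Indices are 0-based and the stop index is exclusive,
# so [220:222] yields indices 220 and 221, i.e. pages 221 and 222.
sliced = pages[220:222]
print(sliced)  # ['page 221', 'page 222']


def page_slice(start_page, end_page):
    """Hypothetical helper: turn a 1-based, inclusive page range into a slice."""
    return slice(start_page - 1, end_page)


# Pages 220 through 222 inclusive, counting from 1
wanted = pages[page_slice(220, 222)]
print(wanted)  # ['page 220', 'page 221', 'page 222']
```

The same slice object works directly on pdf.pages, since it is an ordinary Python list.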


Answer selected by samkit-jain
Category: Q&A, Labels: none, Participants: 2