
How to extract internal links using PyPDF #2911

Closed. Answered by stefan6419846
swathiJayav asked this question in Q&A

The following should work:

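```python
# Explicit destination array, e.g. [page_ref /XYZ left top zoom]
dest = annot_obj["/Dest"]
pg_ref = dest[0]  # indirect reference to the target page
# reader.pages is used here because reader.flattened_pages is only populated
# once pypdf has flattened the page tree
pg_num = [p.indirect_reference for p in reader.pages].index(pg_ref)
```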

For the corresponding syntax, see table 149 of the PDF 2.0 specification.
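The PDF specification allows a Link annotation's destination to take more than one form: an explicit array whose first element is a page reference (the case handled above), or a name or byte string that refers to a named destination. A link may also carry an `/A` action dictionary with `/S /GoTo` and a `/D` entry instead of `/Dest`. A minimal sketch, with hypothetical helper names `link_destination` and `destination_kind`, that finds the destination and classifies it before `dest[0]` is used:

```python
def link_destination(annot):
    """Return the destination of a Link annotation dictionary, or None.

    Checks /Dest first, then falls back to a /GoTo action stored in /A.
    Uses `key in d` plus `d[key]` because pypdf's DictionaryObject resolves
    indirect objects in __getitem__; plain dicts behave the same way.
    """
    if "/Dest" in annot:
        return annot["/Dest"]
    if "/A" in annot:
        action = annot["/A"]
        if "/S" in action and action["/S"] == "/GoTo" and "/D" in action:
            return action["/D"]
    return None  # e.g. a /URI action (external link) or no target at all


def destination_kind(dest):
    """Classify a destination as 'explicit', 'named', or None."""
    if isinstance(dest, (list, tuple)):
        # pypdf's ArrayObject subclasses list: [page_ref /XYZ left top zoom]
        return "explicit"
    if isinstance(dest, (str, bytes)):
        # NameObject/TextStringObject subclass str, ByteStringObject subclasses bytes
        return "named"
    return None
```

For named destinations, pypdf's `reader.named_destinations` maps names to `Destination` objects, and `reader.get_destination_page_number(...)` converts one into a 0-based page index; this assumes the dictionary key matches the name or string stored in the link.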

Full code (slightly modified):

```python
import pypdf

internal_links = []
with pypdf.PdfReader('Boeing.pdf') as reader:
    # enumerate already yields each page, so no extra reader.pages[page_num] lookup
    for page_num, page in enumerate(reader.pages):
        if "/Annots" in page:
            for annot in page["/Annots"]:
                annot_obj = annot.get_object()
                if annot_obj["/Subtype"] == "/Link":
                    if "/Dest" in annot_obj:
                        dest = annot_obj["/Dest"][0]
     …
```
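The loop above is cut off in the post, so its remainder is not reproduced here. As a separate, self-contained sketch (not the author's elided code), the function below collects `(source_page, target_page)` pairs for links with explicit `/Dest` arrays. The name `collect_internal_links` and the pair output format are hypothetical; `reader` is assumed to behave like `pypdf.PdfReader`, and named destinations and `/GoTo` actions are skipped.

```python
def collect_internal_links(reader):
    """Return (source_page, target_page) pairs for Link annotations that have
    an explicit /Dest array; target_page is None if the page reference is unknown.

    Assumes `reader.pages` yields pages supporting `"/Annots" in page`,
    `page["/Annots"]`, and `page.indirect_reference`, as pypdf's PageObject does.
    """
    pages = list(reader.pages)
    # Build the reference-to-index map once instead of calling list.index per link;
    # (idnum, generation) tuples are used as keys because they are plain ints.
    lookup = {
        (p.indirect_reference.idnum, p.indirect_reference.generation): i
        for i, p in enumerate(pages)
    }
    links = []
    for page_num, page in enumerate(pages):
        if "/Annots" not in page:
            continue
        for annot in page["/Annots"]:
            # Entries may be indirect references or direct dictionaries
            annot_obj = annot.get_object() if hasattr(annot, "get_object") else annot
            if annot_obj["/Subtype"] != "/Link" or "/Dest" not in annot_obj:
                continue
            dest = annot_obj["/Dest"]
            if not isinstance(dest, (list, tuple)) or not dest:
                continue  # named destination: needs a separate lookup
            pg_ref = dest[0]
            key = (getattr(pg_ref, "idnum", None), getattr(pg_ref, "generation", None))
            links.append((page_num, lookup.get(key)))
    return links
```

With pypdf this would be called as `collect_internal_links(pypdf.PdfReader("Boeing.pdf"))`. Building the map once keeps the lookup constant-time per link, whereas rebuilding the reference list and calling `.index()` for every link grows with the page count.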

Replies: 3 comments 9 replies

Answer selected by swathiJayav
3 participants

This discussion was converted from issue #2910 on October 19, 2024 06:40.