Commit

Merge pull request #832 from user202729/patch-2
Several small changes
JorjMcKie authored Jan 13, 2021
2 parents 407805a + 50caae0 commit b0babae
Showing 7 changed files with 25 additions and 24 deletions.
12 changes: 6 additions & 6 deletions docs/document.rst
@@ -219,7 +219,7 @@ For details on **embedded files** refer to Appendix 3.

>>> for item in doc.layer_configs: print(item)
{'number': 0, 'name': 'my-config', 'creator': ''}
- >>> # use 'number' as config identifyer in add_ocg
+ >>> # use 'number' as config identifier in add_ocg

.. method:: add_layer_config(name, creator=None, on=None)

@@ -1307,27 +1307,27 @@ For details on **embedded files** refer to Appendix 3.

*(New in version 1.16.8)*

- PDF only: Return the definition of a PDF object. For details please refer to :meth:`Document.xrefObject`.
+ PDF only: Return the definition of a PDF object.

.. method:: PDFCatalog()

*(New in version 1.16.8)*

- PDF only: Return the :data:`xref` of the PDF catalog (or root) object. For details please refer to :meth:`Document._getPDFroot`.
+ PDF only: Return the :data:`xref` of the PDF catalog (or root) object.


.. method:: PDFTrailer(compressed=False)

*(New in version 1.16.8)*

- PDF only: Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end. For details please refer to :meth:`Document._getTrailerString`.
+ PDF only: Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end.


.. method:: metadataXML()

*(New in version 1.16.8)*

- PDF only: Return the :data:`xref` of the document's XML metadata. For details please refer to :meth:`Document._getXmlMetadataXref`.
+ PDF only: Return the :data:`xref` of the document's XML metadata.

.. method:: xrefStream(xref)

@@ -1517,7 +1517,7 @@ Clear metadata information. If you do this out of privacy / data protection conc
{'producer': 'none', 'format': 'PDF 1.4', 'encryption': None, 'author': 'none',
'modDate': 'none', 'keywords': 'none', 'title': 'none', 'creationDate': 'none',
'creator': 'none', 'subject': 'none'}
- >>> doc._delXmlMetadata() # clear any XML metadata
+ >>> doc.del_xml_metadata() # clear any XML metadata
>>> doc.save("anonymous.pdf", garbage = 4) # save anonymized doc

:meth:`setToC` Demonstration
18 changes: 9 additions & 9 deletions docs/faq.rst
@@ -1952,12 +1952,12 @@ If it is *False* or if you want to be on the safe side, pick one of the followin

* **Prepend** the missing stacking command by executing *fitz.TOOLS._insert_contents(page, b"q\n", False)*.
* **Append** an unstacking command by executing *fitz.TOOLS._insert_contents(page, b"\nQ", True)*.
- * Alternatively, just use :meth:`Page._wrapContents`, which executes the previous two functions.
+ * Alternatively, just use :meth:`Page.wrap_contents`, which executes the previous two functions.

.. note:: If small incremental update deltas are a concern, this approach is the most effective. Other contents objects are not touched. The utility method creates two new PDF :data:`stream` objects and inserts them before, resp. after the page's other :data:`contents`. We therefore recommend the following snippet to get this situation under control:

>>> if not page._isWrapped:
-     page._wrapContents()
+     page.wrap_contents()
>>> # start inserting text, images or annotations here

--------------------------
@@ -2034,7 +2034,7 @@ How to Handle Object Streams
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some object types contain additional data apart from their object definition. Examples are images, fonts, embedded files or commands describing the appearance of a page.

- Objects of these types are called "stream objects". PyMuPDF allows reading an object's stream via method :meth:`Document.xrefStream` with the object's :data:`xref` as an argument. And it is also possible to write back a modified version of a stream using :meth:`Document.updatefStream`.
+ Objects of these types are called "stream objects". PyMuPDF allows reading an object's stream via method :meth:`Document.xrefStream` with the object's :data:`xref` as an argument. And it is also possible to write back a modified version of a stream using :meth:`Document.updateStream`.

Assume that the following snippet wants to read all streams of a PDF for whatever reason::

@@ -2044,9 +2044,9 @@ Assume that the following snippet wants to read all streams of a PDF for whateve
# do something with it (it is a bytes object or None)
# e.g. just write it back:
if stream:
-     doc.updatefStream(xref, stream)
+     doc.updateStream(xref, stream)

- :meth:`Document.xrefStream` automatically returns a stream decompressed as a bytes object -- and :meth:`Document.updatefStream` automatically compresses it (where beneficial).
+ :meth:`Document.xrefStream` automatically returns a stream decompressed as a bytes object -- and :meth:`Document.updateStream` automatically compresses it (where beneficial).

----------------------------------

@@ -2125,11 +2125,11 @@ ID array File identifier consisting of two byte strings.
XRefStm int Offset of a cross-reference stream. See :ref:`AdobeManual` p. 109.
======= =========== ===================================================================================

- Access this information via PyMuPDF with :meth:`Document._getTrailerString`.
+ Access this information via PyMuPDF with :meth:`Document.PDFTrailer`.

>>> import fitz
>>> doc=fitz.open("PyMuPDF.pdf")
- >>> trailer=doc._getTrailerString()
+ >>> trailer=doc.PDFTrailer()
>>> print(trailer)
<</Size 5535/Info 5275 0 R/Root 5274 0 R/ID[(\340\273fE\225^l\226\232O|\003\201\325g\245)(}#1,\317\205\000\371\251wO6\352Oa\021)]>>
>>>
@@ -2159,7 +2159,7 @@ PyMuPDF has no way to **interpret or change** this information directly, because
Using some XML package, the XML data can be interpreted and / or modified and then stored back::

>>> # write back modified XML metadata:
- >>> doc.updatefStream(metaxref, xmlmetadata)
+ >>> doc.updateStream(metaxref, xmlmetadata)
>>>
>>> # if these data are not wanted, delete them:
- >>> doc._delXmlMetadata()
+ >>> doc.del_xml_metadata()
10 changes: 5 additions & 5 deletions docs/functions.rst
@@ -20,7 +20,7 @@ Yet others are handy, general-purpose utilities.
:meth:`ConversionTrailer` return trailer string for *getText* methods
:meth:`Document.del_xml_metadata` PDF only: remove XML metadata
:meth:`Document.set_xml_metadata` PDF only: remove XML metadata
- :meth:`Document.delete_object` PDF only: delete an object
+ :meth:`Document._deleteObject` PDF only: delete an object
:meth:`Document.get_new_xref` PDF only: create and return a new :data:`xref` entry
:meth:`Document._getOLRootNumber` PDF only: return / create :data:`xref` of */Outline*
:meth:`Document.pdf_catalog` PDF only: return the :data:`xref` of the catalog
@@ -346,7 +346,7 @@ Yet others are handy, general-purpose utilities.

-----

- .. method:: Document.delete_object(xref)
+ .. method:: Document._deleteObject(xref)

PDF only: Delete an object given by its cross reference number.

@@ -410,7 +410,7 @@ Yet others are handy, general-purpose utilities.

.. method:: Document.xml_metadata_xref()

- Return the XML-based metadata :data:`xref` of the PDF if present -- also refer to :meth:`Document._delXmlMetadata`. You can use it to retrieve the content via :meth:`Document.xrefStream` and then work with it using some XML software.
+ Return the XML-based metadata :data:`xref` of the PDF if present -- also refer to :meth:`Document.del_xml_metadata`. You can use it to retrieve the content via :meth:`Document.xrefStream` and then work with it using some XML software.

:rtype: int
:returns: :data:`xref` of PDF file level XML metadata -- or 0 if none exists.
@@ -521,9 +521,9 @@ Yet others are handy, general-purpose utilities.

PDF only: Clean and concatenate all :data:`contents` objects associated with this page. "Cleaning" includes syntactical corrections, standardizations and "pretty printing" of the contents stream. Discrepancies between :data:`contents` and :data:`resources` objects will also be corrected if sanitize is true. See :meth:`Page.getContents` for more details.

- Changed in version 1.16.0 Annotations are no longer implicitely cleaned by this method. Use :meth:`Annot._cleanContents` separately.
+ Changed in version 1.16.0 Annotations are no longer implicitly cleaned by this method. Use :meth:`Annot.cleanContents` separately.

- :arg bool sanitize: *(new in v1.17.6)* if true, synchronization between resources and their actual use in the contents object is snychronized. For example, if a font is not actually used for any text of the page, then it will be deleted from the ``/Resources/Font`` object.
+ :arg bool sanitize: *(new in v1.17.6)* if true, synchronization between resources and their actual use in the contents object is synchronized. For example, if a font is not actually used for any text of the page, then it will be deleted from the ``/Resources/Font`` object.

.. warning:: This is a complex function which may generate large amounts of new data and render old data unused. It is **not recommended** using it together with the **incremental save** option. Also note that the resulting singleton new */Contents* object is **uncompressed**. So you should save to a **new file** using options *"deflate=True, garbage=3"*.
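
Reviewer note: the method-name substitutions made in the docs/faq.rst hunks (one of which, `_delXmlMetadata` to `del_xml_metadata`, recurs in docs/functions.rst above) can be collected into a lookup table. What follows is only a sketch under stated assumptions: `FAQ_RENAMES` and `apply_renames` are hypothetical names introduced here for illustration and are not part of PyMuPDF or of this commit. Only the old and new spellings are copied from the hunks.

```python
import re

# Old spelling -> new spelling, copied from the docs/faq.rst hunks of this diff.
FAQ_RENAMES = {
    "_wrapContents": "wrap_contents",
    "updatefStream": "updateStream",
    "_getTrailerString": "PDFTrailer",
    "_delXmlMetadata": "del_xml_metadata",
}

# Match whole identifiers only, so names that merely contain an old
# spelling as a substring are left untouched.
_OLD_NAME = re.compile(
    r"(?<![A-Za-z0-9_])(?:"
    + "|".join(re.escape(name) for name in FAQ_RENAMES)
    + r")(?![A-Za-z0-9_])"
)


def apply_renames(text: str) -> str:
    """Hypothetical helper: replace old method names in a docs snippet."""
    return _OLD_NAME.sub(lambda match: FAQ_RENAMES[match.group(0)], text)


if __name__ == "__main__":
    print(apply_renames("doc._delXmlMetadata()  # clear any XML metadata"))
```

The table is deliberately limited to spellings that change the same way wherever they appear in this diff. A name such as `_cleanContents`, which the functions.rst hunk changes only for `Annot`, would need a class-qualified key instead.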

4 changes: 2 additions & 2 deletions docs/page.rst
@@ -96,7 +96,7 @@ In a nutshell, this is what you can do with PyMuPDF:
:meth:`Page.showPDFpage` PDF only: display PDF page image
:meth:`Page.updateLink` PDF only: modify a link
:meth:`Page.widgets` return a generator over the fields on the page
- :meth:`Page.writeText` write one or more :ref:`Textwriter` objects
+ :meth:`Page.writeText` write one or more :ref:`TextWriter` objects
:attr:`Page.CropBox` the page's :data:`CropBox`
:attr:`Page.CropBoxPosition` displacement of the :data:`CropBox`
:attr:`Page.firstAnnot` first :ref:`Annot` on the page
@@ -472,7 +472,7 @@ In a nutshell, this is what you can do with PyMuPDF:

*(New in version 1.16.18)*

- PDF only: Write the text of one or more :ref:`Textwriter` ojects to the page.
+ PDF only: Write the text of one or more :ref:`TextWriter` objects to the page.

:arg rect_like rect: where to place the text. If omitted, the rectangle union of the text writers is used.
:arg sequence writers: a non-empty tuple / list of :ref:`TextWriter` objects or a single :ref:`TextWriter`.
2 changes: 1 addition & 1 deletion docs/rect.rst
@@ -33,7 +33,7 @@ Hence some useful classification:
:meth:`Rect.morph` transform with a point and a matrix
:meth:`Rect.norm` the Euclidean norm
:meth:`Rect.normalize` makes a rectangle finite
- :meth:`Rect.round` create smallest :ref:`Irect` containing rectangle
+ :meth:`Rect.round` create smallest :ref:`IRect` containing rectangle
:meth:`Rect.transform` transform rectangle with a matrix
:attr:`Rect.bottom_left` bottom left point, synonym *bl*
:attr:`Rect.bottom_right` bottom right point, synonym *br*
2 changes: 1 addition & 1 deletion docs/tools.rst
@@ -8,7 +8,7 @@ This class is a collection of utility methods and attributes, mainly around memo
====================================== =================================================
**Method / Attribute** **Description**
====================================== =================================================
- :meth:`Tools.gen_id` generate a unique identifyer
+ :meth:`Tools.gen_id` generate a unique identifier
:meth:`Tools.image_profile` report basic image properties
:meth:`Tools.store_shrink` shrink the storables cache [#f1]_
:meth:`Tools.mupdf_warnings` return the accumulated MuPDF warnings
1 change: 1 addition & 0 deletions fitz/fitz.i
@@ -10002,6 +10002,7 @@ struct TextPage {
"""Return simple, bare text on the page."""
return self._extractText(0)

extractTEXT = extractText

def extractHTML(self) -> str:
"""Return page content as a HTML string."""
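
Reviewer note: the `extractTEXT = extractText` line in the fitz/fitz.i hunk appears to bind a second name to the same method inside the class body. A minimal sketch of that aliasing pattern follows. `TextPageSketch` is a hypothetical stand-in written for this note, not the real `TextPage` class, and its constructor argument is an assumption made only so the example runs on its own.

```python
class TextPageSketch:
    """Hypothetical stand-in for a text page; illustrates method aliasing only."""

    def __init__(self, text: str) -> None:
        self._text = text

    def extractText(self) -> str:
        """Return simple, bare text on the page."""
        return self._text

    # Class-level alias: both names refer to the same function object,
    # so callers may use either spelling.
    extractTEXT = extractText


if __name__ == "__main__":
    page = TextPageSketch("hello")
    print(page.extractTEXT() == page.extractText())
```

Because the alias is an ordinary class attribute assignment, it costs nothing at call time, but a subclass that overrides `extractText` would not automatically change what `extractTEXT` does.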
