10 changes: 10 additions & 0 deletions docs/about-feature-matrix.rst
Expand Up @@ -55,6 +55,10 @@
:width: 0
:height: 0

.. image:: images/icons/icon-md.svg
:width: 0
:height: 0

.. raw:: html


Expand Down Expand Up @@ -181,6 +185,11 @@
background-size: 40px 40px;
}

#feature-matrix .icon.md {
background: url("_images/icon-md.svg") 0 0 transparent no-repeat;
background-size: 40px 40px;
}

</style>


Expand All @@ -207,6 +216,7 @@
<span class="icon cbz"><cite>CBZ</cite></span>
<span class="icon svg"><cite>SVG</cite></span>
<span class="icon txt"><cite>TXT</cite></span>
<span class="icon md"><cite>MD</cite></span>
<span class="icon image"><cite id="transFM3">Image</cite></span>
<hr/>
<span class="icon docx"><cite>DOCX</cite></span>
4 changes: 3 additions & 1 deletion docs/about.rst
Expand Up @@ -60,6 +60,8 @@ The following table illustrates how |PyMuPDF| compares with other typical soluti

Therefore input files are mostly in a form that's useful for text extraction.

If faithful reproduction of layout is important, then consider using :ref:`PyMuPDF Pro <pymupdf-pro>`.


----

Expand Down Expand Up @@ -97,7 +99,7 @@ The following table illustrates what features the products offer:
- PyMuPDF Pro
- PyMuPDF4LLM
* - **Input Documents**
- `PDF`, `XPS`, `EPUB`, `CBZ`, `MOBI`, `FB2`, `SVG`, `TXT`, Images (*standard document types*)
- `PDF`, `XPS`, `EPUB`, `CBZ`, `MOBI`, `FB2`, `SVG`, `TXT`, `MD`, Images (*standard document types*)
- *as PyMuPDF* and:
`DOC`/`DOCX`, `XLS`/`XLSX`, `PPT`/`PPTX`, `HWP`/`HWPX`
- *as PyMuPDF*
82 changes: 82 additions & 0 deletions docs/app3.rst
Expand Up @@ -421,6 +421,88 @@ Typical document page sizes are **ISO A4** and **Letter**. A **Letter** page has




.. _CSS_Support:

CSS Support
--------------------------------------------

For now, only a subset of CSS properties is supported.

The underlying C library MuPDF supports a subset of HTML4 and CSS2. The primary goal of the HTML/CSS support is to serve as a popular and convenient way to style text, not to faithfully reproduce websites in PDF.

What Works
~~~~~~~~~~~~~

The following list shows the supported properties, grouped by category.

Box Model & Layout
""""""""""""""""""

``margin``, ``margin-top``, ``margin-right``, ``margin-bottom``, ``margin-left``, ``padding``, ``padding-top``, ``padding-right``, ``padding-bottom``, ``padding-left``, ``width``, ``height``, ``display``, ``position``, ``top``, ``right``, ``bottom``, ``left``, ``inset``, ``overflow-wrap``, ``columns``

.. note::

The properties ``position`` and ``display`` have very limited support: only the values ``position: relative`` and ``display: block`` work.


Border
""""""""""""""""""

``border``, ``border-top``, ``border-right``, ``border-bottom``, ``border-left``, ``border-color``, ``border-style``, ``border-width``, ``border-spacing``, ``border-collapse``, ``border-top-color``, ``border-right-color``, ``border-bottom-color``, ``border-left-color``, ``border-top-style``, ``border-right-style``, ``border-bottom-style``, ``border-left-style``, ``border-top-width``, ``border-right-width``, ``border-bottom-width``, ``border-left-width``

Background
""""""""""""""""""

``background``, ``background-color``

.. note::

Background images are not supported, but the ``background`` property can be used to set a background color for a text block, which is then rendered as a filled rectangle behind the text.

Font
""""""""""""""""""

``font``, ``font-family``, ``font-size``, ``font-style``, ``font-variant``, ``font-weight``

Text
""""""""""""""""""

``color``, ``letter-spacing``, ``line-height``, ``text-align``, ``text-decoration``, ``text-indent``, ``text-transform``, ``word-spacing``, ``white-space``, ``vertical-align``, ``direction``, ``hyphens``

List
""""""""""""""""""

``list-style``, ``list-style-image``, ``list-style-position``, ``list-style-type``

Page
""""""""""""""""""

``page-break-before``, ``page-break-after``, ``orphans``, ``widows``

Visibility
""""""""""""""""""""""""""""""""""""

``visibility``

MuPDF-specific / WebKit extensions
""""""""""""""""""""""""""""""""""""

``-mupdf-leading``, ``-webkit-text-fill-color``, ``-webkit-text-stroke-color``, ``-webkit-text-stroke-width``

Other
""""""""""""""""""

``src`` (for @font-face), ``overflow-wrap``




What Doesn't Work
~~~~~~~~~~~~~~~~~~~~~~~~~~

Modern CSS (CSS3 and later) is not supported: there is no ``flexbox``, ``grid``, custom properties (``--vars``), ``calc()``, transitions, animations, ``position: absolute`` / ``fixed``, ``float``, ``clear`` and so on.
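
Since unsupported properties are easy to overlook, it can help to check a stylesheet against the lists above before applying it. The following is a minimal sketch in plain Python, not part of PyMuPDF: the names ``SUPPORTED_PROPERTIES`` and ``unsupported_properties`` are hypothetical, and the check looks at property names only, so an unsupported value of a supported property (for example ``display: flex``) is not detected.

```python
import re

# Property names copied from the "What Works" lists above.
SUPPORTED_PROPERTIES = set("""
margin margin-top margin-right margin-bottom margin-left
padding padding-top padding-right padding-bottom padding-left
width height display position top right bottom left inset overflow-wrap columns
border border-top border-right border-bottom border-left
border-color border-style border-width border-spacing border-collapse
border-top-color border-right-color border-bottom-color border-left-color
border-top-style border-right-style border-bottom-style border-left-style
border-top-width border-right-width border-bottom-width border-left-width
background background-color
font font-family font-size font-style font-variant font-weight
color letter-spacing line-height text-align text-decoration text-indent
text-transform word-spacing white-space vertical-align direction hyphens
list-style list-style-image list-style-position list-style-type
page-break-before page-break-after orphans widows
visibility
-mupdf-leading -webkit-text-fill-color -webkit-text-stroke-color
-webkit-text-stroke-width
src
""".split())


def unsupported_properties(css: str) -> list[str]:
    """Return property names not in SUPPORTED_PROPERTIES, in order of first use."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop CSS comments
    found = []
    # Look at the declarations inside each innermost { ... } block.
    for body in re.findall(r"\{([^{}]*)\}", css):
        for declaration in body.split(";"):
            name, colon, _value = declaration.partition(":")
            name = name.strip().lower()
            if colon and name and name not in SUPPORTED_PROPERTIES and name not in found:
                found.append(name)
    return found


css = "h1 {color: red; float: left;} p {margin: 0; display: flex; transition: all 1s}"
print(unsupported_properties(css))  # ['float', 'transition']
```
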

.. rubric:: Footnotes

.. [#f1] MuPDF supports "deep-copying" objects between PDF documents. To avoid duplicate data in the target, it uses so-called "graftmaps", like a form of scratchpad: for each object to be copied, its :data:`xref` number is looked up in the graftmap. If found, copying is skipped. Otherwise, the new :data:`xref` is recorded and the copy takes place. PyMuPDF makes use of this technique in two places so far: :meth:`Document.insert_pdf` and :meth:`Page.show_pdf_page`. This process is fast and very efficient, because it prevents multiple copies of typically large and frequently referenced data, like images and fonts. However, you may still want to consider using garbage collection (option 4) in any of the following cases:
2 changes: 1 addition & 1 deletion docs/archive-class.rst
Expand Up @@ -10,7 +10,7 @@ Archive

This class represents a generalization of file folders and container files like ZIP and TAR archives. Archives allow accessing arbitrary collections of file folders, ZIP / TAR files and single binary data elements as if they all were part of one hierarchical tree of folders.

In PyMuPDF, archives are currently only used by :ref:`Story` objects to specify where to look for fonts, images and other resources.
In PyMuPDF, archives are currently only used by :ref:`Story` objects and as an :ref:`option when opening files <Full_Options_for_Opening_a_File>` to specify where to look for fonts, images and other resources.

================================ ===================================================
**Method / Attribute** **Short Description**
120 changes: 102 additions & 18 deletions docs/converting-files.rst
Expand Up @@ -11,7 +11,7 @@ Converting Files
Files to PDF
~~~~~~~~~~~~~~~~~~

:ref:`Document types supported by PyMuPDF<HowToOpenAFile>` can easily be converted to |PDF| by using the :meth:`Document.convert_to_pdf` method. This method returns a buffer of data which can then be utilized by |PyMuPDF| to create a new |PDF|.
:ref:`Document types supported by PyMuPDF <HowToOpenAFile>` can easily be converted to |PDF| by using the :meth:`Document.convert_to_pdf` method. This method returns a buffer of data which can then be utilized by |PyMuPDF| to create a new |PDF|.



Expand All @@ -20,38 +20,97 @@ Files to PDF
.. code-block:: python

import pymupdf


# Convert Markdown to PDF
md_doc = pymupdf.open("example.md")
pdfdata = md_doc.convert_to_pdf()
pdf_doc = pymupdf.open(stream=pdfdata)
pdf_doc.save("example.pdf")

# Convert XPS to PDF
xps = pymupdf.open("input.xps")
pdfbytes = xps.convert_to_pdf()
pdf = pymupdf.open("pdf", pdfbytes)
pdfdata = xps.convert_to_pdf()
pdf = pymupdf.open(stream=pdfdata)
pdf.save("output.pdf")

.. _Markdown_to_PDF:

Markdown to PDF
~~~~~~~~~~~~~~~~~

PDF to SVG
~~~~~~~~~~~~~~~~~~
Because Markdown is a supported input format, Markdown files can be converted to PDF with the :meth:`Document.convert_to_pdf` method.

Technically, as SVG files cannot be multipage, we must export each page as an SVG.
In the simplest case, open the Markdown file and call this method to get a PDF representation of its content.

To get an SVG representation of a page use the :meth:`Page.get_svg_image` method.

**Example**
Defining paper size
"""""""""""""""""""

The default paper size is a 400 x 600 :doc:`rect`. To use a custom paper size, pass the `rect` parameter when opening the file, for example:

.. code-block:: python

import pymupdf
md_doc = pymupdf.open("example.md", rect=pymupdf.paper_rect("A4")) # A4 size

doc = pymupdf.open("input.pdf")
page = doc[0]

# Convert page to SVG
svg_content = page.get_svg_image()
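
To illustrate what a paper-size rectangle contains, the sketch below converts millimetre dimensions into a ``(x0, y0, x1, y1)`` tuple. It assumes page dimensions are measured in points (1/72 inch), which the text above does not state explicitly, and the helper names ``mm_to_points`` and ``page_rect_from_mm`` are hypothetical rather than part of PyMuPDF.

```python
POINTS_PER_INCH = 72   # assumption: page dimensions are in points
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def page_rect_from_mm(width_mm: float, height_mm: float) -> tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) for a page of the given size, rounded to whole points."""
    return (0, 0, round(mm_to_points(width_mm)), round(mm_to_points(height_mm)))


print(page_rect_from_mm(210, 297))  # ISO A4 -> (0, 0, 595, 842)
```

For standard formats, ``pymupdf.paper_rect("A4")`` as used above is the simpler choice; a calculation like this is only of interest for non-standard page sizes.
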
Defining CSS
""""""""""""

By default, the Markdown content is converted to PDF using a default CSS stylesheet. To customize the appearance of the resulting PDF, define your own `css` and apply it to the document.

For example, to make all ``h1`` headers (a single ``#`` in Markdown) red, you could do the following:

.. code-block:: python

md_doc = pymupdf.open( # open the Markdown document in A4 size
"example.md",
rect=pymupdf.paper_rect("A4")
)

css = "h1 {color:red;}"
md_doc.apply_css(css)

pdf_doc = pymupdf.open(stream=md_doc.convert_to_pdf())
pdf_doc.ez_save("red-colored-header.pdf")

.. note::

The :ref:`support for CSS <CSS_Support>` is currently limited.


Defining Fonts
"""""""""""""""""

Custom fonts are supplied through the `archive` parameter: pass an :ref:`Archive` containing the font files when opening the Markdown file. The CSS can then refer to these fonts by their names as defined in the archive.

For example, assuming you have the font files for "Comic Sans", you could use that font for all text as follows:

.. code-block:: python

# Global CSS instructions to use the "Comic Sans" font for all text. The font files must be provided in the archive.
css = """
@font-face {font-family: sans-serif; src: url(comic.ttf);}
@font-face {font-family: sans-serif; src: url(comicbd.ttf); font-weight: bold;}
@font-face {font-family: sans-serif; src: url(comicz.ttf); font-weight: bold; font-style: italic;}
@font-face {font-family: sans-serif; src: url(comici.ttf); font-style: italic;}
"""

archive = pymupdf.Archive("C:/Windows/Fonts") # the fonts are here
archive.add(".") # we've stored the archive image in this script's folder

md_file = "sample.md"
md_doc = pymupdf.open( # open the Markdown document
md_file,
archive=archive, # where to look for resources (fonts, images)
rect=pymupdf.paper_rect("A4"), # page dimension ISO A4
)

md_doc.apply_css(css)


# Save to file
with open("output.svg", "w", encoding="utf-8") as f:
f.write(svg_content)

doc.close()
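
The ``@font-face`` rules in the fonts example follow a regular pattern: one rule per weight and style combination. The sketch below generates such rules from a mapping; it is plain string handling, the helper ``font_face_rules`` is hypothetical and not a PyMuPDF function, and the file names are the ones from the example above.

```python
def font_face_rules(family: str, files: dict[tuple[bool, bool], str]) -> str:
    """Build @font-face rules; `files` maps (bold, italic) to a font file name."""
    rules = []
    for (bold, italic), filename in files.items():
        parts = [f"font-family: {family};", f"src: url({filename});"]
        if bold:
            parts.append("font-weight: bold;")
        if italic:
            parts.append("font-style: italic;")
        rules.append("@font-face {" + " ".join(parts) + "}")
    return "\n".join(rules)


comic_sans = {
    (False, False): "comic.ttf",    # regular
    (True, False): "comicbd.ttf",   # bold
    (True, True): "comicz.ttf",     # bold italic
    (False, True): "comici.ttf",    # italic
}
css = font_face_rules("sans-serif", comic_sans)
print(css)
```

The resulting string can be used in place of the hand-written ``css`` value, provided the font files are present in the archive.
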


PDF to Markdown
Expand All @@ -72,6 +131,31 @@ By utlilizing the :doc:`PyMuPDF4LLM API <pymupdf4llm/api>` we are able to conver
pathlib.Path("4llm-output.md").write_bytes(md_text.encode())


PDF to SVG
~~~~~~~~~~~~~~~~~~

Technically, as SVG files cannot be multipage, we must export each page as an SVG.

To get an SVG representation of a page use the :meth:`Page.get_svg_image` method.

**Example**

.. code-block:: python

import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]

# Convert page to SVG
svg_content = page.get_svg_image()

# Save to file
with open("output.svg", "w", encoding="utf-8") as f:
f.write(svg_content)

doc.close()

PDF to DOCX
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
