Tutorial 2: Altering Markdown Rendering
While many extensions to Python-Markdown add new syntax, occasionally you may simply want to alter the way Markdown renders the existing syntax. For example, you may want to display some images inline, but render externally hosted images as plain links that point to the image.
Suppose the following Markdown was provided:
```markdown
![a local image](/path/to/image.jpg)

![a remote image](http://example.com/image.jpg)
```
We would like Python-Markdown to return the following HTML:
```html
<p><img alt="a local image" src="/path/to/image.jpg" /></p>
<p><a href="http://example.com/image.jpg">a remote image</a></p>
```
Note: This tutorial is very generic and assumes a basic Python 3 development environment. A basic understanding of Python development is expected.
Let's consider the options available to us:
1. Override the image-related inline patterns.

   While this would work, we don't need to alter the existing patterns. The parser already recognizes the syntax correctly; all we need to do is alter the HTML output. We also want to support both inline image links and reference-style image links, which would require redefining both inline patterns and so double the work.

2. Leave the existing patterns alone and use a `Treeprocessor` to alter the HTML.

   This does not alter the tokenization of the Markdown syntax in any way. We can be sure that anything which represents an image will be included, even any new image syntax added by other third-party extensions.
Given the above, let's use option two.
To begin, let's create a new `Treeprocessor`:
```python
from markdown.treeprocessors import Treeprocessor

class InlineImageProcessor(Treeprocessor):
    def run(self, root):
        # Modify the HTML here
        pass
```
The `run` method of a `Treeprocessor` receives a `root` argument, which is the root ElementTree `Element` of the parsed document. We need to iterate over all of the `img` elements within that tree and alter those that point to external URLs. Because we modify the tree in place, `run` does not need to return anything. Add the following code to the `run` method:
```python
# Iterate over img elements only
for element in root.iter('img'):
    # Copy the element's attributes for later use
    # (element.attrib alone would only be a reference, not a copy)
    attrib = dict(element.attrib)
    # Check for links to external images
    if attrib['src'].startswith('http'):
        # Save the tail
        tail = element.tail
        # Reset the element
        element.clear()
        # Change the element to a link
        element.tag = 'a'
        # Copy src to href
        element.set('href', attrib.pop('src'))
        # Copy alt to the link text
        element.text = attrib.pop('alt')
        # Reassign tail
        element.tail = tail
        # Copy all remaining attributes to element
        for k, v in attrib.items():
            element.set(k, v)
```
A few things to note about the above code:

- We make a copy of the element's attributes so that we don't lose them when we later reset the element with `element.clear()`. The same applies to the `tail`. As `img` elements don't have `text`, we don't need to worry about that.
- We explicitly set the `href` attribute and `element.text`, because an `a` element stores the URL and the label in different places than an `img` element does (`src` becomes `href`, and `alt` becomes the link text). When doing so, we `pop` the `src` and `alt` attributes from `attrib` so that they are no longer present when we copy all remaining attributes in the last step.
- We don't need to make changes to `img` elements that point to internal images, so there is no need to reference them in the code (they are simply skipped).
- The test for external links (`startswith('http')`) could be improved and is left as an exercise for the reader. For example, it also matches a relative path such as `httpdocs/image.jpg`.
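You can see why the attributes and the tail are saved first using only the standard library. The sketch below builds a small tree by hand. It stands in for the tree Python-Markdown passes to `run`: the `div` root mirrors the wrapper Python-Markdown uses internally, but treat the exact shape as an assumption. The sketch then walks the tree with `root.iter('img')` and shows what `clear()` discards.

```python
import xml.etree.ElementTree as etree

# Hand-built stand-in for the tree passed to run() (shape is an assumption):
# <div><p>Before <img src="..." alt="..." /> after</p></div>
root = etree.Element('div')
p = etree.SubElement(root, 'p')
p.text = 'Before '
img = etree.SubElement(p, 'img', {'src': 'http://example.com/image.jpg',
                                  'alt': 'a remote image'})
img.tail = ' after'

# iter() walks the whole subtree, so images nested inside <p> are found too
images = list(root.iter('img'))

# Take independent copies before resetting the element
attrib = dict(img.attrib)
tail = img.tail
img.clear()

# clear() removed the attributes and the tail from the element itself
print(img.attrib, img.tail)  # {} None
# ...but the copies taken beforehand are intact
print(attrib['src'], repr(tail))
```

The tail (`' after'`) is the text that follows the image inside the paragraph. Without the saved copy, that text would silently disappear from the output.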
Now we need to inform `Markdown` of our new `Treeprocessor` with an `Extension` subclass:
```python
from markdown.extensions import Extension

class ImageExtension(Extension):
    def extendMarkdown(self, md):
        # Register the new treeprocessor
        md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)
```
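We register the `Treeprocessor` with a priority of `15`. Treeprocessors with a higher priority run first, and Python-Markdown's built-in `inline` treeprocessor is registered at priority `20`, so this ensures that ours runs after all inline processing is done.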
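To confirm where the new treeprocessor lands relative to the built-in ones, you can query the registry by name. In a `Registry`, a lower index means the item runs earlier. This minimal sketch uses a do-nothing processor, since only the registration matters here, and assumes the built-in `inline` and `prettify` treeprocessors are present, as they are in Python-Markdown 3.x.

```python
import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

class InlineImageProcessor(Treeprocessor):
    def run(self, root):
        # Placeholder body: this sketch only inspects the registration order
        return None

class ImageExtension(Extension):
    def extendMarkdown(self, md):
        md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)

md = markdown.Markdown(extensions=[ImageExtension()])

# Indexes follow run order: the higher the priority, the lower the index
order = [md.treeprocessors.get_index_for_name(name)
         for name in ('inline', 'inlineimageprocessor', 'prettify')]
print(order)
```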
Let's see that all together:
ImageExtension.py

```python
from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension


class InlineImageProcessor(Treeprocessor):
    def run(self, root):
        for element in root.iter('img'):
            attrib = dict(element.attrib)
            if attrib['src'].startswith('http'):
                tail = element.tail
                element.clear()
                element.tag = 'a'
                element.set('href', attrib.pop('src'))
                element.text = attrib.pop('alt')
                element.tail = tail
                for k, v in attrib.items():
                    element.set(k, v)


class ImageExtension(Extension):
    def extendMarkdown(self, md):
        md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)
```
Now, pass our extension to Markdown:
Test.py

```python
import markdown
from ImageExtension import ImageExtension

# Blank lines keep each image in its own paragraph
text = """
![a local image](/path/to/image.jpg "A title.")

![a remote image](http://example.com/image.jpg "A title.")
"""

html = markdown.markdown(text, extensions=[ImageExtension()])
print(html)
```
And running `python Test.py` correctly returns the following output:

```html
<p><img alt="a local image" src="/path/to/image.jpg" title="A title."/></p>
<p><a href="http://example.com/image.jpg" title="A title.">a remote image</a></p>
```
Success! Note that we included a `title` for each image, which was also properly retained.
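One reason given earlier for choosing a treeprocessor was that reference-style images are handled without extra work. The sketch below repeats the extension from `ImageExtension.py` as one self-contained script and feeds it a reference-style image. The reference name `logo` and its URL are made-up test values.

```python
import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

class InlineImageProcessor(Treeprocessor):
    def run(self, root):
        for element in root.iter('img'):
            attrib = dict(element.attrib)  # independent copy of the attributes
            if attrib['src'].startswith('http'):
                tail = element.tail
                element.clear()
                element.tag = 'a'
                element.set('href', attrib.pop('src'))
                element.text = attrib.pop('alt')
                element.tail = tail
                for k, v in attrib.items():
                    element.set(k, v)

class ImageExtension(Extension):
    def extendMarkdown(self, md):
        md.treeprocessors.register(InlineImageProcessor(md), 'inlineimageprocessor', 15)

# A reference-style image whose definition carries a title
source = """
![a reference image][logo]

[logo]: http://example.com/logo.jpg "A title."
"""

html = markdown.markdown(source, extensions=[ImageExtension()])
print(html)
```

The treeprocessor never needed to know which syntax produced the `img` element. By the time it runs, both image forms have become the same kind of element in the tree.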
Suppose we want to allow the user to provide a list of known image hosts. Any `img` tags that point at images on those hosts may be inlined, but any other images should be rendered as external links. Of course, we want to keep the existing behavior for internal (relative) links.
First we need to add the configuration option to our `Extension` subclass:
```python
class ImageExtension(Extension):
    def __init__(self, **kwargs):
        # Define a config with defaults; this must happen before calling
        # the base class, which applies any keyword arguments to it
        self.config = {'hosts': [[], 'List of approved hosts']}
        super().__init__(**kwargs)
```
We defined a `hosts` configuration setting, which defaults to an empty list. Now we need to pass that option on to our treeprocessor in the `extendMarkdown` method:
```python
def extendMarkdown(self, md):
    # Pass hosts to the treeprocessor
    md.treeprocessors.register(InlineImageProcessor(md, hosts=self.getConfig('hosts')), 'inlineimageprocessor', 15)
```
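The configuration handling comes from the `Extension` base class. Keyword arguments passed to the constructor override the defaults in `self.config`, and `getConfig()` / `getConfigs()` read the current values back. The minimal sketch below, which uses only the `__init__` shown above, makes that round trip visible.

```python
from markdown.extensions import Extension

class ImageExtension(Extension):
    def __init__(self, **kwargs):
        # Each entry maps a setting name to [default value, description]
        self.config = {'hosts': [[], 'List of approved hosts']}
        super().__init__(**kwargs)

default_ext = ImageExtension()
custom_ext = ImageExtension(hosts=['example.com'])

print(default_ext.getConfig('hosts'))  # []
print(custom_ext.getConfig('hosts'))   # ['example.com']
print(custom_ext.getConfigs())         # {'hosts': ['example.com']}
```

Because `self.config` is built inside `__init__`, each extension instance gets its own default list rather than sharing one.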
Next, we need to modify our treeprocessor to accept the new setting:
```python
class InlineImageProcessor(Treeprocessor):
    def __init__(self, md, hosts):
        # Let the base class store md as self.md
        super().__init__(md)
        # Assign the setting to the hosts attribute of the class instance
        self.hosts = hosts
```
Then, we can add a method which uses the setting to test a URL:
```python
from urllib.parse import urlparse

class InlineImageProcessor(Treeprocessor):
    ...

    def is_unknown_host(self, url):
        url = urlparse(url)
        # Return False if the network location is empty or a known host
        return bool(url.netloc) and url.netloc not in self.hosts
```
Finally, we can make use of the test method by replacing the `if attrib['src'].startswith('http'):` line of the `run` method with `if self.is_unknown_host(attrib['src']):`.
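`urlparse()` does the real work in `is_unknown_host()`. Its `netloc` attribute is the raw host text, including any port and the original capitalization. So with `hosts=['example.com']`, a URL such as `http://example.com:8080/...` would still be treated as unknown. The sketch below prints `netloc` and `hostname` for a few sample URLs. It then defines `is_unknown_hostname()`, a hypothetical variant (not part of the tutorial's extension) that compares the lower-cased `hostname` instead. Whether ports should matter is a design choice that depends on your use case.

```python
from urllib.parse import urlparse

sample_urls = [
    '/path/to/image.jpg',                  # relative: no network location
    'http://example.com/image.jpg',
    'https://EXAMPLE.com:8080/image.jpg',  # upper-case host plus an explicit port
    '//cdn.example.com/image.jpg',         # protocol-relative URL
]

for url in sample_urls:
    parts = urlparse(url)
    # netloc is the raw "host[:port]" text; hostname is lower-cased, port removed
    print(f'{url!r}: netloc={parts.netloc!r}, hostname={parts.hostname!r}')


def is_unknown_hostname(url, hosts):
    """Hypothetical variant of is_unknown_host() that compares hostnames.

    Assumes every entry in ``hosts`` is already lower-case.
    """
    hostname = urlparse(url).hostname
    # Relative URLs have no hostname and are never treated as unknown
    return hostname is not None and hostname not in hosts
```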
The final result should look like this:
ImageExtension.py

```python
from urllib.parse import urlparse

from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension


class InlineImageProcessor(Treeprocessor):
    def __init__(self, md, hosts):
        super().__init__(md)
        self.hosts = hosts

    def is_unknown_host(self, url):
        url = urlparse(url)
        return bool(url.netloc) and url.netloc not in self.hosts

    def run(self, root):
        for element in root.iter('img'):
            attrib = dict(element.attrib)
            if self.is_unknown_host(attrib['src']):
                tail = element.tail
                element.clear()
                element.tag = 'a'
                element.set('href', attrib.pop('src'))
                element.text = attrib.pop('alt')
                element.tail = tail
                for k, v in attrib.items():
                    element.set(k, v)


class ImageExtension(Extension):
    def __init__(self, **kwargs):
        self.config = {'hosts': [[], 'List of approved hosts']}
        super().__init__(**kwargs)

    def extendMarkdown(self, md):
        md.treeprocessors.register(InlineImageProcessor(md, hosts=self.getConfig('hosts')), 'inlineimageprocessor', 15)
```
Let's test that out:
Test.py

```python
import markdown
from ImageExtension import ImageExtension

# Blank lines keep each image in its own paragraph
text = """
![a local image](/path/to/image.jpg)

![a remote image](http://example.com/image.jpg)

![an excluded remote image](http://exclude.com/image.jpg)
"""

html = markdown.markdown(text, extensions=[ImageExtension(hosts=['example.com'])])
print(html)
```
And running `python Test.py` returns the following output:

```html
<p><img alt="a local image" src="/path/to/image.jpg"/></p>
<p><img alt="a remote image" src="http://example.com/image.jpg"/></p>
<p><a href="http://exclude.com/image.jpg">an excluded remote image</a></p>
```
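The treeprocessor only manipulates an ElementTree, so its logic can also be unit-tested without the Markdown parser. This is a minimal sketch built on two assumptions. First, `link_unknown_images()` is a hypothetical standalone function that holds the same steps as `run()`. Second, a hand-built tree stands in for the one Python-Markdown would produce.

```python
import xml.etree.ElementTree as etree
from urllib.parse import urlparse


def link_unknown_images(root, hosts):
    """Hypothetical standalone copy of InlineImageProcessor.run()."""
    for element in root.iter('img'):
        attrib = dict(element.attrib)
        netloc = urlparse(attrib['src']).netloc
        # Same rule as is_unknown_host(): external and not on an approved host
        if netloc and netloc not in hosts:
            tail = element.tail
            element.clear()
            element.tag = 'a'
            element.set('href', attrib.pop('src'))
            element.text = attrib.pop('alt')
            element.tail = tail
            for k, v in attrib.items():
                element.set(k, v)


# Hand-built stand-in for the parsed document: one image per paragraph
root = etree.Element('div')
for src, alt in [('/path/to/image.jpg', 'a local image'),
                 ('http://example.com/image.jpg', 'a remote image'),
                 ('http://exclude.com/image.jpg', 'an excluded remote image')]:
    p = etree.SubElement(root, 'p')
    etree.SubElement(p, 'img', {'alt': alt, 'src': src})

link_unknown_images(root, hosts=['example.com'])
out = etree.tostring(root, encoding='unicode')
print(out)
```

Keeping the tree logic in a plain function like this is one way to test edge cases, such as unusual URLs, without constructing Markdown input for each one.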
Wrapping the above extension up into a package for distribution is left as an exercise for the reader.