In the previous article we looked at what static sites are, and how they work.
Now we will look at how to convert a single markdown file into an HTML file.
The conversion process
This diagram from the previous article shows the basic process for converting a set of markdown files into the required HTML files for a complete website:
This time we will look in more detail at what is involved in converting a single page of markdown into the corresponding HTML file:
Here is an example markdown file, test.md:
This actually isn't a pure markdown file. The top part of the file is meta-data for the page, in a format called yaml. Many static site generators use a similar system. The yaml is contained between the two '---' markers. The rest of the file (after the second '---') is the markdown content of the file. But for brevity we will call the entire file a markdown file.
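The example file itself is not reproduced here, so the sketch below writes a hypothetical stand-in. The field values, the link, and the image path are all assumptions, chosen only to match the description in this article: a yaml header with a title, author, date, and a list of tags, followed by markdown containing bold and italic text, a hyperlink, and an image.

```python
from pathlib import Path

# Hypothetical contents for test.md: a yaml header between two '---'
# markers, followed by the markdown body. All values are made up.
SAMPLE = """---
title: Test page
author: Jane Doe
date: 2020-01-15
tags: [python, markdown]
---
This is **bold** and this is *italic*.

Here is a [link](http://example.com) and an image:

![A sample image](sample.png)
"""


def write_sample(path="test.md"):
    """Write the sample file to disk and return its path."""
    target = Path(path)
    target.write_text(SAMPLE, encoding="utf-8")
    return target
```

Calling write_sample() creates test.md in the current directory, which can then be used to try out the steps that follow.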
Converting this page to HTML actually involves 4 separate tasks:
- Split the file into yaml and markdown parts.
- Extract the meta-data from the YAML.
- Convert the markdown to an HTML fragment (the page content).
- Combine the meta-data and page content with the HTML template to create a complete HTML file.
Fortunately, if we use the right Python libraries, each of these steps is very easy.
Splitting the file
This part is fairly standard Python. We read the markdown file in, line by line, and create two strings: ym, which contains the yaml text, and md, which contains the markdown text.
Python allows us to treat a text file as a sequence of lines of text, which we can loop through using a for loop.
The first loop discards strings until we find the first '---'. The second loop reads all the strings until the next '---'. Those are the yaml_lines. Finally, all the remaining lines after the second '---' are the markdown data.
We join all the yaml_lines to form a string ym. We join all the lines of markdown data to form the string md.
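The two loops described above can be sketched as follows. This is one possible reading of the description rather than the original listing; the function name split_front_matter and the sample text are assumptions, and io.StringIO stands in for an open file so the example runs without touching the disk.

```python
import io


def split_front_matter(lines):
    """Split an iterable of text lines into (ym, md) strings."""
    it = iter(lines)  # one shared iterator, so each loop resumes where the last stopped

    # First loop: discard lines until we find the first '---'.
    for line in it:
        if line.strip() == "---":
            break

    # Second loop: collect lines until the next '---'.
    yaml_lines = []
    for line in it:
        if line.strip() == "---":
            break
        yaml_lines.append(line)

    ym = "".join(yaml_lines)  # the yaml text
    md = "".join(it)          # everything left over is markdown
    return ym, md


# A file object yields lines the same way this StringIO does.
sample = io.StringIO("---\ntitle: Test page\n---\nHello *world*\n")
ym, md = split_front_matter(sample)
print(repr(ym))  # 'title: Test page\n'
print(repr(md))  # 'Hello *world*\n'
```

With a real file, the call would look like `with open("test.md") as f: ym, md = split_front_matter(f)`.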
Parsing the yaml data
We will use the Python yaml library to parse the yaml data, like this:
This parses a block of yaml text and creates a dictionary with the result. Here is what it prints:
This is the same data as we had in the test.md file, but now in the form of a Python dictionary.
Notice that the tags element has a list of values. That is because the yaml header uses a syntax for tags that allows for multiple values.
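A minimal sketch of this step, assuming the yaml library meant here is PyYAML (imported as yaml). The header text is a made-up stand-in for the ym string, and safe_load is used on the assumption that the header contains only plain data; the original listing may have used a different loader.

```python
import yaml  # PyYAML, installed with: pip install pyyaml

# Hypothetical yaml header text, standing in for the ym string.
ym = """title: Test page
author: Jane Doe
tags: [python, markdown]
"""

# safe_load parses the text into plain Python objects;
# for a header like this one the result is a dictionary.
info = yaml.safe_load(ym)
print(info)
# {'title': 'Test page', 'author': 'Jane Doe', 'tags': ['python', 'markdown']}
```

The square-bracket syntax is what turns tags into a Python list. An unquoted date such as 2020-01-15 would come back as a datetime.date object rather than a string, which is worth knowing before passing it to a template.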
Converting the markdown data
Here we convert the second part of the file, the markdown data, into an html fragment, like this:
We are using the markdown library to do the conversion. This takes a markdown format string and returns an html string. Based on the markdown code above, the html content string will be:
As you can see, it correctly marked up the bold and italic text, hyperlink, and image. The markdown function accepts several extensions, for example to provide syntax highlighting, but we aren't using those here.
The output is an html fragment. It places each paragraph inside its own paragraph tags, but it doesn't provide higher level tags such as a body tag. It is assumed that the html fragment will be placed within a full html document (which we will do next).
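The conversion described in this section can be sketched as follows, assuming the Python-Markdown package (imported as markdown). The input text is a made-up stand-in for the md string, not the original example.

```python
import markdown  # Python-Markdown, installed with: pip install markdown

# Hypothetical markdown text, standing in for the md string.
md = (
    "This is **bold** and this is *italic*.\n"
    "\n"
    "Here is a [link](http://example.com).\n"
)

# markdown.markdown takes markdown text and returns an html fragment.
content = markdown.markdown(md)
print(content)
```

Each paragraph of the input comes back wrapped in its own p tags, with strong, em, and a tags for the inline markup, and with no surrounding html or body tags.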
Creating the full html
We create our final html using a template like this:
This template is just a basic html page. For a real website, you would probably want to use something more sophisticated, maybe a responsive template and some CSS styling.
But the basic method is the same. You use a full html page template, but with placeholders for variable content such as the title of the page, the author's name, and the main content itself.
The placeholders are enclosed in double curly brackets, for example {{title}}. We use the pystache module to substitute real values for the placeholders to create the final html. Here is the code:
The render function accepts the html template, plus a dictionary that maps the template names on to their values.
Notice that the info dictionary we are using comes straight from the yaml parser. It already contains entries for the title, author and date. The trick here is to make sure that each tag in the html template exactly matches the equivalent field in the yaml header. That way, pystache will be looking for the same tags that the yaml parser stored.
Well, that isn't quite true. The info dictionary doesn't yet have an entry for content, because the content comes from the markdown. So we add an extra element to the dictionary, called 'content', containing the processed markdown content.
The other thing to notice is that we use triple brackets for content - {{{content}}}. The reason for this is that the content is raw html data:
- For {{value}}, pystache renders the value assuming it is text that you want to display. If it contains html characters such as <, it will use escape characters so that the symbol is displayed as a < in the browser. That is what you would want in the page title, for instance.
- For {{{value}}}, pystache renders the text unaltered, so if the text contains <p>, it will cause a paragraph break. This is what you want for the page content, which does include paragraph breaks.
Putting it all together
This has taken a bit of explaining, but if you actually look at the code to convert the yaml plus markdown into a final html page, it is remarkably simple:
In the next article we will look at how to build a complete site.
Note
The documentation for this module is an excerpt of the documentation available on the markdown2 project page on GitHub. Minor edits have been made to make the documentation fit on one page and to remove passages that are not relevant on iOS. Also, only a summary of available extras is included here; for more information on individual extras, please refer to the project page wiki.
Markdown is a light text markup format and a processor to convert that to HTML. The originator describes it as follows:
“Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).”
– http://daringfireball.net/projects/markdown/
This (markdown2) is a fast and complete Python implementation of Markdown. It was written to closely match the behaviour of the original Perl-implemented Markdown.pl. markdown2 also comes with a number of extensions (called “extras”) for things like syntax coloring, tables, header-ids. See the “Extras” section below.
Quick usage:
Extras
By default markdown2's processing attempts to produce output exactly as defined by http://daringfireball.net/projects/markdown/syntax – the “Markdown core.” However, a few optional extras are also provided.
Implemented Extras
- code-friendly: Disable _ and __ for em and strong.
- code-color: (DEPRECATED Use fenced-code-blocks extra instead.) Pygments-based syntax coloring of <code> sections.
- cuddled-lists: Allow lists to be cuddled to the preceding paragraph.
- fenced-code-blocks: Allows a code block to not have to be indented by fencing it with '```' on a line before and after. Based on http://github.github.com/github-flavored-markdown/ with support for syntax highlighting.
- footnotes: Support footnotes as in use on daringfireball.net and implemented in other Markdown processors (though not in Markdown.pl v1.0.1).
- header-ids: Adds “id” attributes to headers. The id value is a slug of the header text.
- html-classes: Takes a dict mapping html tag names (lowercase) to a string to use for a “class” tag attribute. Currently only supports “pre” and “code” tags. Add an issue if you require this for other tags.
- link-patterns: Auto-link given regex patterns in text (e.g. bug number references, revision number references).
- markdown-in-html: Allow the use of markdown="1" in a block HTML tag to have markdown processing be done on its contents. Similar to http://michelf.com/projects/php-markdown/extra/#markdown-attr but with some limitations.
- metadata: Extract metadata from a leading '---'-fenced block.
- nofollow: Add rel="nofollow" to all <a> tags with an href. See http://en.wikipedia.org/wiki/Nofollow.
- pyshell: Treats unindented Python interactive shell sessions as <code> blocks.
- smarty-pants: Fancy quote, em-dash and ellipsis handling similar to http://daringfireball.net/projects/smartypants/. See old issue 42 for discussion.
- toc: The returned HTML string gets a new “toc_html” attribute which is a Table of Contents for the document. (experimental)
- wiki-tables: Google Code Wiki table syntax support.
- xml: Passes one-liner processing instructions and namespaced XML tags.
How to turn on extras
Extras are all off by default and turned on as follows:
(New in v1.0.1.2) You can also now specify extras via the “markdown-extras” emacs-style local variable in the markdown text:
Functions
Convert the markdown-formatted text to html with the given options.
Same as markdown(), but use the text in a given file as input.