How to use pandoc to convert files on Linux

You can use pandoc on Linux to convert between more than 40 file formats. You can also use it to build a simple docs-as-code system by writing in Markdown, tracking changes with git and exporting to any supported format.

Convert text and docs-as-code format

If you have a document of any file format supported by pandoc, converting the format will be extremely simple.

The real power of pandoc becomes clearer when you use it as the platform for a simple docs-as-code system. The premise of docs-as-code is to take some software development techniques and principles and apply them to writing documents, especially documentation for software projects. You can, however, apply it to any kind of document.

Install pandoc

To install pandoc on Ubuntu, use the command:

sudo apt-get install pandoc

On Fedora, use the command:

sudo dnf install pandoc

On Manjaro, use the command:

sudo pacman -Syu pandoc

You can check which version you have installed using the --version option:

pandoc --version

Use pandoc without files

If you run pandoc without any command line options or file names, it reads the text you type in the terminal. Press Ctrl + D to signal that you have finished typing. By default, pandoc expects Markdown input and produces HTML output.

See the example below:

pandoc

Enter some Markdown lines and press Ctrl + D.

[Picture 1: Enter some Markdown lines]

As soon as you press Ctrl + D, pandoc generates the equivalent HTML output.

[Picture 2: Output HTML equivalent]

However, to make pandoc really useful, you need to work with files.
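Because pandoc reads standard input when no file is named, you can also pipe text into it instead of typing and pressing Ctrl + D. The following is a minimal sketch, assuming pandoc is on your PATH; the sample text is only illustrative.

```shell
# Markdown text to convert (illustrative sample, not taken from the article)
markdown_text='# Title

Some *emphasized* text.'

# With no input file, pandoc reads standard input, treats it as Markdown
# and writes HTML to standard output.
if command -v pandoc >/dev/null 2>&1; then
    printf '%s\n' "$markdown_text" | pandoc
else
    echo "pandoc is not installed" >&2
fi
```

This is handy in scripts, where there is nobody at the keyboard to signal the end of the input.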

Basic markdown

Markdown is a lightweight markup language in which certain characters are given special meaning. You can use a simple text editor to create a Markdown file.

Markdown is easy to read because there is no visual clutter of tags to distract from the text. The formatting in a Markdown document resembles the formatting it represents. Here are some of the basics:

  1. To italicize text, wrap it in single asterisks, like *this*.
  2. For boldface, wrap the text in double asterisks, like **this**.
  3. Headings are indicated by a hash sign (#). Separate the heading text from the hash with a space. Use one hash for top-level headings, two for second-level headings, and so on.
  4. To create a bulleted list, start each line of the list with an asterisk and insert a space before the text.
  5. To create a numbered list, start each line with a number followed by a period, then insert a space before the text.
  6. To create a hyperlink, put the link text in square brackets and the URL in parentheses immediately after it, like this: [link text](URL).
  7. To insert an image, type an exclamation point immediately before the square brackets. Put the alt text for the picture in the square brackets and the path to the image in the parentheses, like this: ![alt text](path/to/image).
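The elements listed above can be collected into one small file to experiment with. This is only a sketch: the file name syntax-demo.md, the URL and the image path are placeholders invented for illustration.

```shell
# Write a small Markdown file that uses each element from the list above.
# The file name, URL and image path are placeholders for illustration.
cat > syntax-demo.md <<'EOF'
# Top-level heading

## Second-level heading

This is *italic* and this is **bold**.

* First bullet
* Second bullet

1. First step
2. Second step

[Example site](https://example.com)

![Alt text for the picture](images/picture.png)
EOF

# Show the file that was written
cat syntax-demo.md
```

The quoted 'EOF' marker stops the shell from expanding anything inside the text, so the asterisks and brackets reach the file unchanged.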

Convert files

File conversion is very simple. pandoc can usually work out the file formats from the file names. For example, the command below creates an HTML file from a Markdown file. The -o (output) option tells pandoc the name of the file we want to create:

pandoc -o sample.html sample.md

The sample Markdown file, sample.md, contains the short piece of Markdown shown in the image below.

[Picture 3: Sample Markdown file]

A file named sample.html is created. When you double click on the file, the default browser will open it.

Now, create an OpenDocument text (ODT) file that can be opened in LibreOffice Writer:

pandoc -o sample.odt sample.md

The ODT file has the same content as the HTML file.

[Picture 4: ODT file]

Specify the file format

The -f (from) and -t (to) options tell pandoc which file format to convert from and which to convert to. This can be helpful if you work with file formats that share a file extension with other related formats. For example, both TeX and LaTeX use the '.tex' extension.
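pandoc can list the format names that -f and -t accept. The sketch below assumes a pandoc release recent enough to provide the --list-input-formats and --list-output-formats options; the sample text is illustrative. The last command shows a case where naming the formats matters: with input from a pipe and output to the terminal, there is no file extension to guess from.

```shell
# Illustrative one-line Markdown sample
sample_text='Some *emphasized* text.'

if command -v pandoc >/dev/null 2>&1; then
    # Format names accepted by -f (from)
    pandoc --list-input-formats
    # Format names accepted by -t (to)
    pandoc --list-output-formats
    # No file names here, so the formats are stated explicitly
    printf '%s\n' "$sample_text" | pandoc -f markdown -t latex
else
    echo "pandoc is not installed" >&2
fi
```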

The -s (standalone) option makes pandoc generate the LaTeX preamble that a document needs to be a complete, self-contained and well-formed LaTeX document. Without the -s option, the output would still be LaTeX that could be included in another LaTeX document, but it would not parse as a standalone LaTeX document.

Type the following command:

pandoc -f markdown -t latex -s -o sample.tex sample.md

If you open sample.tex in a text editor, you will see the generated LaTeX. If you have a LaTeX editor, open the TEX file in it to see a preview of how the LaTeX typesetting commands are interpreted. The window was shrunk to fit the image below, which makes the display look cramped, but in practice it works fine.

[Picture 5: LaTeX text editor]

The LaTeX editor shown is Texmaker. To install it on Ubuntu, type the following command:

sudo apt-get install texmaker

On Fedora, type the following command:

sudo dnf install texmaker

On Manjaro, use the command:

sudo pacman -Syu texmaker

Convert files with templates

With templates, you can dictate which styles pandoc uses when creating documents. For example, you can ask pandoc to use styles defined in a Cascading Style Sheets (CSS) file with the --css option.

The small CSS file below, saved as sample.css, changes the spacing above and below level-one headings. It also changes their text color to white and their background color to blue:

h1 { color: #FFFFFF; background-color: #3C33FF; margin-top: 0px; margin-bottom: 1px; }

The full command is here:

pandoc -o sample.html -s --css sample.css sample.md

Another way to tweak the result when working with HTML files is to include HTML markup in your Markdown file. It is passed through to the generated HTML file as standard HTML markup.

This technique should only be used when you output HTML. If you generate several file formats from the same source, pandoc does not carry the HTML markup over to non-HTML files, and the affected content is output as plain text.
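To see how embedded HTML behaves in different outputs, you can convert the same line to HTML and to plain text and compare the two. This is a rough sketch: the file name html-demo.md and its content are invented for illustration, and the exact output may vary between pandoc versions.

```shell
# A Markdown file with a piece of inline HTML (hypothetical example content)
cat > html-demo.md <<'EOF'
# Status

This build is <span style="color: red;">not ready</span> for release.
EOF

if command -v pandoc >/dev/null 2>&1; then
    # HTML output: check whether the span and its style are carried through
    pandoc -t html html-demo.md
    # Plain-text output: compare how the same line is rendered
    pandoc -t plain html-demo.md
else
    echo "pandoc is not installed" >&2
fi
```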

We can also specify which styles to use when the ODT file is created. Open a blank LibreOffice Writer document and adjust the heading styles and fonts to suit your needs. The example here also adds a header and a footer. Then save the document as 'odt-template.odt'.

We can now use this document as a template with the --reference-doc option:

pandoc -o sample.odt --reference-doc=odt-template.odt sample.md

Compare this to the previous ODT example. This document uses a different font and has colored headings and page numbers. However, it was created from the same 'sample.md' Markdown file.

Reference templates can be used to indicate the different stages of a document's production.
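As an alternative to starting from a blank Writer document, pandoc can export its own built-in reference ODT, which you can then restyle. A small sketch, assuming a pandoc version that supports --print-default-data-file; the name custom-reference.odt is just an example.

```shell
# Name for the exported reference document (example name)
reference=custom-reference.odt

if command -v pandoc >/dev/null 2>&1; then
    # Export pandoc's default reference ODT as a starting point
    pandoc -o "$reference" --print-default-data-file reference.odt
    # After restyling it in LibreOffice Writer, apply it like any other template:
    #   pandoc -o sample.odt --reference-doc=custom-reference.odt sample.md
else
    echo "pandoc is not installed" >&2
fi
```

Starting from a file that pandoc produced means the style names in the template already match the ones pandoc writes into its output.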

Create a PDF file

By default, pandoc uses the LaTeX PDF engine (pdflatex) to create PDF files. The easiest way to make sure LaTeX is available is to install a LaTeX editor, such as Texmaker.

TeX and LaTeX installations are both quite large. If you are short of disk space, or you know you will never use TeX or LaTeX, create an ODT file instead. Then open it in LibreOffice Writer and save it as a PDF.
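Both routes can be run from the command line. The sketch below assumes the sample.md and sample.odt files from the earlier steps; it also assumes that LibreOffice is available as the soffice command, which may be named differently on some systems.

```shell
src=sample.md   # Markdown source used throughout the article
odt=sample.odt  # ODT file created in the earlier conversion step

if command -v pandoc >/dev/null 2>&1 && command -v pdflatex >/dev/null 2>&1 && [ -f "$src" ]; then
    # Route 1: pandoc writes the PDF through LaTeX (pdflatex by default)
    pandoc -o sample.pdf "$src"
elif command -v soffice >/dev/null 2>&1 && [ -f "$odt" ]; then
    # Route 2: no LaTeX; let LibreOffice convert the ODT without opening a window
    soffice --headless --convert-to pdf "$odt"
else
    echo "Nothing converted: need pandoc with pdflatex, or LibreOffice with $odt" >&2
fi
```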

Docs-as-Code

A few benefits of writing in Markdown with a docs-as-code approach:

  1. You work fast in simple text files. Many editors, including gedit, Vim and Emacs, offer syntax highlighting for Markdown text.
  2. You have a timeline of all document versions. If you store your documents in a version control system (VCS), such as Git, you can easily see the difference between two versions of the same file. However, this only works well when the files are plain text, because that is what a VCS is designed to compare.
  3. A VCS records who made each change and when. This is especially useful if you regularly work in teams on large projects. It also provides a central repository for documents. There are many cloud Git hosting services, such as GitHub, GitLab and Bitbucket, all of which have free and paid plans.
  4. You can create your documents in many formats. With just a few simple shell scripts, you can apply styles from CSS files and reference documents. If you store your documents in a VCS repository integrated with a CI/CD platform, they can be generated automatically whenever the software is built.
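The last point can be sketched as a short build script. Everything here is hypothetical: the script name, the target_for and build_all helpers, and the assumption that sample.css and odt-template.odt sit next to the Markdown files. It is not a ready-made CI configuration.

```shell
#!/bin/sh
# build-docs.sh: hypothetical build script; all file names are placeholders.

# Derive the output file name from a Markdown source and a target extension,
# for example "sample.md" and "odt" give "sample.odt".
target_for() {
    printf '%s.%s\n' "${1%.md}" "$2"
}

# Convert every Markdown file in the current directory to HTML and ODT.
build_all() {
    for src in *.md; do
        [ -f "$src" ] || continue   # the pattern matched no files
        pandoc -s --css sample.css -o "$(target_for "$src" html)" "$src"
        if [ -f odt-template.odt ]; then
            # Styled ODT, using the reference document created earlier
            pandoc --reference-doc=odt-template.odt -o "$(target_for "$src" odt)" "$src"
        else
            # No reference document found: fall back to the default styles
            pandoc -o "$(target_for "$src" odt)" "$src"
        fi
    done
}

if command -v pandoc >/dev/null 2>&1; then
    build_all
else
    echo "pandoc is not installed" >&2
fi
```

A CI/CD job could run a script like this after each commit, so the exported documents always match the Markdown in the repository.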