Markdown for science and academia – options and commands

In the last few posts on Markdown, we learned why we might want to use it, how to prepare study notes, and how to add basic formatting (bold, italic, lists, tables, superscript, subscript, etc.).

Markdown is simple to write, and many websites, such as GitHub, can render Markdown files. That is, they display the content of a Markdown file as formatted text rather than as plain text. For example, the introduction (README.md) document of one of my coding projects looks like this on GitHub.

But for scientists and academics, the beauty and power of Markdown shine when it is combined with Pandoc. Pandoc lets us control how our document is converted and formatted, either through command-line options when we run the Pandoc program or through options embedded directly in our document in YAML format.

Command-line options

Pandoc has lots of command-line options. While these may seem intimidating at first, you will grow to love them. You can read a description of all the command-line options in the Pandoc manual. Alternatively, you can type pandoc --help on the command line to list all the available options (the first few are shown below):

pandoc [OPTIONS] [FILES]
-f FORMAT, -r FORMAT  --from=FORMAT, --read=FORMAT                    
-t FORMAT, -w FORMAT  --to=FORMAT, --write=FORMAT                     
-o FILE               --output=FILE                                   
                      --data-dir=DIRECTORY                            
-M KEY[:VALUE]        --metadata=KEY[:VALUE]                          
                      --metadata-file=FILE                            
-d FILE               --defaults=FILE                                 
                      --file-scope                                    
-s                    --standalone                                    
                      --template=FILE                                 
-V KEY[:VALUE]        --variable=KEY[:VALUE]                          
                      --wrap=auto|none|preserve                       
                      --ascii                                         
                      --toc, --table-of-contents                      
                      --toc-depth=NUMBER                              
-N                    --number-sections                               
                      --number-offset=NUMBERS   
...      

Let’s have a look at some simple command-line options and how we might use them. To illustrate the effect of these command-line options, we will be working with a sample Markdown file, which is accessible here.

Let’s start by generating a simple PDF document by running the following command:

pandoc notes.md --output=notes.pdf

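The above command uses a single command-line option, --output, which tells Pandoc the name of our output file. Pandoc infers the output format from the file extension; for .pdf, it generates LaTeX behind the scenes and compiles it, so a LaTeX distribution must be installed.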

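Now let’s add a table of contents to our file using the --toc command-line option. We can also use the --bibliography command-line option we saw in a previous post to format our references properly (with Pandoc 2.11 or later, also add --citeproc; older versions process citations automatically when --bibliography is given):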

pandoc notes.md --toc --bibliography ref_list.bib --output=notes.pdf

That was easy and looks nice, but we might want to limit the table of contents to the first two heading levels. That is, we don’t want the lower-level Interview heading to appear. This is easily achieved using the --toc-depth command-line option:

pandoc notes.md --toc --toc-depth=2 --bibliography ref_list.bib --output=notes.pdf

Command-line options – passing LaTeX-specific options

When preparing a LaTeX document, we often want to specify options, like the size of the page, whether we want to use one or two columns, etc. These options can be passed to Pandoc using the --variable command-line option.

For example, let’s number our sections (--number-sections), add a running header to each page, specify that we want two columns, and set our paper size to A4:

pandoc notes.md \
--toc \
--toc-depth=2 \
--number-sections \
--bibliography ref_list.bib \
--variable pagestyle=headings \
--variable classoption=twocolumn \
--variable papersize=a4 \
--output=notes.pdf

NOTE: To make long Pandoc commands easier to read, you can put each command-line option on its own line by ending every line except the last with a \ (this works in Unix-style shells such as bash; make sure nothing, not even a space, follows the \).

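To provide a final example, we will apply one of the options described in the document we have been generating. Specifically, we will add --variable fontfamily=arev to our command-line options to generate a document that uses a sans serif font (this relies on the arev LaTeX package being installed):

pandoc notes.md \
--toc \
--toc-depth=2 \
--number-sections \
--bibliography ref_list.bib \
--variable pagestyle=headings \
--variable classoption=twocolumn \
--variable papersize=a4 \
--variable fontfamily=arev \
--output=notes.pdf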

YAML header

Rather than typing our options on the command line, we can include them in the Markdown file itself as a YAML header. YAML is a plain-text format often used for configuration files. When used in a Markdown file, the header goes at the very top of the file and starts and ends with three dashes (---).

Here is what it would look like to include the various command-line options we used in our last example:

---
classoption:
  - twocolumn
pagestyle: headings
papersize: a4
toc: true
toc-depth: 2
bibliography: ref_list.bib
numbersections: true
fontfamily: arev
---

# Sans serif fonts

Without using another PDF engine (which would require using Pandoc's `--pdf-engine` option), there are a few ways to obtain sans serif fonts.

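With all these options specified in the Markdown file itself, we can run Pandoc with a much shorter command:

pandoc notes.md --output=notes.pdf

Note that a bibliography set in the header is only used if Pandoc is told to process citations: add --citeproc (Pandoc 2.11 or later) or --filter pandoc-citeproc (earlier versions) to the command above.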

YAML file

Another way to work with Markdown and Pandoc is to put our options in a dedicated YAML file, which Pandoc calls a defaults file. This keeps our Markdown files clean and lets us reuse option combinations (e.g. draft manuscript, study notes, letters, webpage).

So, let’s move the options from our YAML header to a dedicated notes.yaml file:

---
variables:
  classoption:
    - twocolumn
  pagestyle: headings
  papersize: a4
  fontfamily: arev
number-sections: true
toc: true
toc-depth: 2
bibliography: ref_list.bib
...

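A few things to note here: the file starts with --- and ends with ..., and the LaTeX-specific options are nested under variables. With Pandoc 2.11 or later, also add citeproc: true to the file so that the bibliography is used to format citations. To apply the file, we pass it to Pandoc with the --defaults (or -d) option:

pandoc notes.md --defaults=notes.yaml --output=notes.pdf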

Summary

We just learned how to use various Pandoc (and LaTeX) options to modify how our document is rendered. These options can be passed on the command line, in a YAML header, or in a dedicated YAML (defaults) file.

Remember, too, that we have focused on writing a Markdown file and converting it to PDF (or HTML) using Pandoc and LaTeX. However, Pandoc can convert between many file formats, so the same document could be rendered to .docx or .odt if you want to share it for editing with colleagues who are intimidated by Markdown files (not sure why they would be; they are so straightforward!).
