papermate.py: A tool for academic writers

If you have been following along, we have been exploring how we can use Markdown to do most of our academic and scientific writing (1, 2, 3, 4, 5, 6). While Markdown is extremely simple, incorporating it into the workflow of writing a scientific paper is not necessarily straightforward. How do we take the Markdown text and turn it into a PDF that you can submit? What about revisions? Can we generate a marked-up version with our changes? What if the journal wants a Word (.docx) file for the final submission?

Thankfully for you, I have too much time on my hands (not really, but I enjoy figuring out these types of problems): let me introduce you to papermate.

papermate

papermate is a Python package that helps us write and revise scientific papers. We run papermate from the command line with simple inputs and command-line flags, similar to what was covered in a recent Python post.

If we want to get the most out of papermate, we should also use git to keep track of our changes and additions. This has two obvious benefits. First, we have a record of our changes and who made them, so we can revert to a previous version or retrieve a deleted sentence or paragraph. Second, we can use git tags to mark key versions of our manuscript, for example ‘v1_MH_draft’, ‘v2_JD_draft’, ‘v3_submitted’, ‘v4_revised’. We can then use papermate to generate a marked-up PDF of our manuscript that highlights the differences between any two tagged versions. This is useful to see what changes a co-author has made (e.g. ‘v1_MH_draft’ versus ‘v2_JD_draft’), or to prepare the marked-up PDF we submit with our revisions to a journal (‘v3_submitted’ versus ‘v4_revised’). We can also use papermate to retrieve a PDF of any previously tagged version of our manuscript.
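papermate's internals are not covered here, but retrieving an older tagged version rests on a standard git feature: `git show <tag>:<path>` prints a file exactly as it was at that tag. A minimal Python sketch of that step follows; the function names are hypothetical and this is not papermate's own code.

```python
import subprocess


def show_args(tag: str, path: str = "manuscript.md") -> list[str]:
    """git command that prints `path` as it existed at `tag`.

    Without a leading './', git resolves `path` from the repository root,
    so this assumes manuscript.md sits at the root of the repository.
    """
    return ["git", "show", f"{tag}:{path}"]


def read_tagged_version(tag: str, path: str = "manuscript.md") -> str:
    """Return the text of `path` at `tag`; must run inside the git repository."""
    result = subprocess.run(show_args(tag, path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `read_tagged_version("v3_submitted")` would return the submitted text, which could then be written to its own .md file and rendered like any other manuscript.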

When we submit the final version of our manuscript to the publisher, papermate allows us to output our manuscript to either LaTeX or .docx.

Submitting a .docx version is counterintuitive given that no publisher typesets articles in Microsoft Word; the first thing they do is extract our text and do away with all the painstaking point-and-click formatting we did to match the journal style!

Regardless, given that .docx is still the dominant format, papermate can output our manuscript in this format. The formatting in .docx might not be perfect, but it will only take a few moments to make it acceptable for submission; this is a much better alternative to worrying about formatting all the way through the writing process.

papermate requirements

Writing a paper with papermate is simple. We can start with the template that was introduced in our last post, use the various tips it contains, and write a complete paper. However, to convert our paper from Markdown to PDF (or LaTeX or .docx), we will need to have Pandoc installed (plus a LaTeX distribution for PDF output). We will also need git installed to use the features that depend on it.
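papermate hands the actual conversion to Pandoc. To give a sense of what that involves, the sketch below builds a Pandoc command for a manuscript with citations. It is an illustration, not papermate's exact call, and the file names are the ones used in the example later in this post. `--citeproc` needs Pandoc 2.11 or later (earlier versions used `--filter pandoc-citeproc`).

```python
import shlex


def pandoc_pdf_command(md: str, bib: str, csl: str,
                       output: str = "manuscript.pdf") -> list[str]:
    """Build a Pandoc command that renders `md` to PDF with formatted citations."""
    return [
        "pandoc", md,
        "--citeproc",           # process [@key] citations (Pandoc 2.11 or later)
        "--bibliography", bib,  # BibTeX database with the cited entries
        "--csl", csl,           # CSL style that controls how references look
        "-o", output,           # the .pdf extension makes Pandoc go through LaTeX
    ]


cmd = pandoc_pdf_command("manuscript.md", "bib/refs.bib",
                         "bib/vancouver-author-date.csl")
print(shlex.join(cmd))
# pandoc manuscript.md --citeproc --bibliography bib/refs.bib --csl bib/vancouver-author-date.csl -o manuscript.pdf
```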

papermate in action

To demonstrate how we might use papermate to write a scientific paper, we will draft a toy example of a manuscript. The first thing we do is set up the basic file structure that papermate expects:

manuscript
├── bib
├── img
├── tex
└── manuscript.md

The bib folder is where we need to put our BibTeX file if we want to cite items. For the present example, let’s say we have a file called refs.bib that contains some references. The bib folder is also where we need to put our .csl file, which tells Pandoc how to format our citations and reference list. For this toy example we will use vancouver-author-date.csl.

This means our file structure now looks like this:

manuscript
├── bib
│   ├── refs.bib
│   └── vancouver-author-date.csl
├── img
├── tex
└── manuscript.md

The img folder is where we will store figures for our paper.

The tex folder is where we would store modified versions of the three .tex files that Pandoc uses to render our manuscript in LaTeX. These files were discussed in the document that was generated as part of a previous post. We don’t need to provide these files, but having the option to replace the default versions allows greater flexibility in preparing our manuscript.
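Since papermate expects this layout, it can be worth checking it before rendering. The helper below is hypothetical (it is not part of papermate): it reports which expected folders and files are missing, treats tex as optional, and only asks for .bib and .csl files when the manuscript cites references.

```python
from pathlib import Path

# Folders expected next to the manuscript (tex/ is optional, so it is not listed).
REQUIRED_DIRS = ("bib", "img")


def missing_entries(root, manuscript: str = "manuscript.md",
                    cites: bool = True) -> list[str]:
    """List the expected folders and files that are absent under `root`."""
    root = Path(root)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    if not (root / manuscript).is_file():
        missing.append(manuscript)
    bib_dir = root / "bib"
    if cites and bib_dir.is_dir():
        # Citing needs both a BibTeX database and a CSL style in bib/.
        if not any(bib_dir.glob("*.bib")):
            missing.append("bib/*.bib")
        if not any(bib_dir.glob("*.csl")):
            missing.append("bib/*.csl")
    return missing
```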

Drafting our manuscript

Next, we start to draft our manuscript in the file manuscript.md. Because we know the importance of saving versions of our work, we commit changes to our manuscript as we write it. For example, we may commit a version when we have finished drafting our Methods, and commit again after finishing each of the Results, Discussion, and Introduction. We can then do a final pass through the manuscript, remembering to commit the changes made during this pass.

We are now ready to send our manuscript to our co-author for feedback. At this point we will want to add a git tag to mark this specific commit. For example, we might do the following:

$ git add manuscript.md
$ git commit -m "Marty's first full draft of manuscript"
$ git tag -a v1_marty_first_draft -m "Marty's first full draft of manuscript"

Then, if we look at our git log, we would see something like this:

$ git log --pretty=oneline
d76f1b5ae8a7e68d9f65301330264fd9c7e64f70 (HEAD -> master, tag: v1_marty_first_draft) Marty's first full draft of manuscript
d56bda77b2d56a5123404687b28bb5605f6bd0ec First draft of Introduction
b02a1d3edd5aeeb45129690950a6388896e6b046 First draft of Discussion
b0994be6d1077b52254b41781cba6395711f310c First draft of Results
cd6ae8453479a9d58e573b6a878112aded4adf8a First draft of Methods
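This one-line format is easy to parse if we ever want to list tagged versions programmatically, for example to pick the two tags to compare. The helper below is hypothetical, not part of papermate. When capturing the output from a script, add `--decorate` to the git command, because git may only print the "(HEAD -> master, tag: ...)" part when writing to a terminal.

```python
import re

# One line of `git log --decorate --pretty=oneline`:
# <40-char hash> [(<refs>)] <subject>
LINE_RE = re.compile(
    r"^(?P<hash>[0-9a-f]{40})(?: \((?P<refs>[^)]*)\))? (?P<subject>.*)$"
)


def tags_by_commit(log_text: str) -> dict[str, list[str]]:
    """Map each commit hash to the tag names attached to it."""
    tags = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or not match.group("refs"):
            continue  # unparseable line or commit without decorations
        names = [
            ref.strip()[len("tag: "):]
            for ref in match.group("refs").split(",")
            if ref.strip().startswith("tag: ")
        ]
        if names:
            tags[match.group("hash")] = names
    return tags
```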

We are now ready to share our manuscript with our co-author. They will make their changes and commit them using git.

When we receive these changes, we may want to add a git tag as this will allow us to clearly identify this version of the paper. It will also allow us to generate a marked-up PDF comparing our version and that of our co-author.

By checking the git log, we can find the identifier (hash) of the commit made by our co-author. In the command below, <commit-hash> stands for that identifier.

So we can run the following command to add a git tag:

$ git tag -a v2_jo_revisions <commit-hash> -m "Jo's revisions"
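Rather than copying the hash out of the log by hand, we can ask git for the newest commit by a given author. The helper names below are hypothetical. In the git command, `-1` keeps only the newest match, `--author` matches a pattern against the author name and email, and `--format=%H` prints the full hash.

```python
import subprocess


def latest_commit_by(author: str) -> str:
    """Full hash of the newest commit whose author matches `author`."""
    out = subprocess.run(
        ["git", "log", "-1", f"--author={author}", "--format=%H"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not out:
        raise LookupError(f"no commit found for author {author!r}")
    return out


def tag_command(tag: str, commit: str, message: str) -> list[str]:
    """Annotated-tag command; passing -m stops git from opening an editor."""
    return ["git", "tag", "-a", tag, commit, "-m", message]
```

For example, `subprocess.run(tag_command("v2_jo_revisions", latest_commit_by("Jo"), "Jo's revisions"), check=True)` would tag the co-author's latest commit, assuming their git author name contains "Jo".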

We now make a final pass through the paper and address any comments made by our co-author. Again, we commit this version and add a git tag:

$ git add manuscript.md
$ git commit -m "Version for submission"
$ git tag -a v3_submitted -m "Version for submission"

Generating a PDF of our manuscript

Great, we are now ready to render a PDF version of our manuscript and submit it. With papermate, this could not be any easier! By default, papermate will search the current directory for a Markdown file. If it finds one, it will be used to render our manuscript. Similarly, papermate will search for .bib and .csl files in the bib folder. However, we can specify these ourselves at the command line. Let’s see a few examples.
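The default search just described can be sketched in a few lines before we turn to the commands themselves. This is an assumption about how such a lookup might work, not papermate's code: it takes the alphabetically first Markdown file in the current directory and the first .bib and .csl files in bib/, and lets any value given on the command line take precedence.

```python
from pathlib import Path
from typing import Optional


def first_match(folder: Path, pattern: str) -> Optional[Path]:
    """Alphabetically first file in `folder` matching `pattern`, or None."""
    matches = sorted(folder.glob(pattern))  # a missing folder yields no matches
    return matches[0] if matches else None


def resolve_inputs(root=".", md=None, bib=None, csl=None) -> dict:
    """Fill in any input not given on the command line by searching `root`."""
    root = Path(root)
    return {
        "input": Path(md) if md else first_match(root, "*.md"),
        "bib": Path(bib) if bib else first_match(root / "bib", "*.bib"),
        "csl": Path(csl) if csl else first_match(root / "bib", "*.csl"),
    }
```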

First, the simplest case:

$ papermate

Note that the above assumes papermate is installed as a command on your PATH.

Alternatively, we can run the script directly with Python, provided papermate.py is in the current directory (otherwise, give its full path):

$ python papermate.py

Executing this command will generate a file called manuscript.pdf, which is the rendered version of our manuscript. The title page will look something like this:

The more explicit way of calling papermate would be as follows:

$ python papermate.py --input manuscript.md --csl bib/vancouver-author-date.csl --bib bib/refs.bib

Rather than remembering and typing this command, we could have it stored in a bash shell script that we can run. We could even have different versions of the command that we comment and uncomment, depending on whether we want to generate a marked-up version of our paper, a clear PDF, a .docx file, etc.
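The same idea works in Python as well as bash. In the sketch below the preset names are hypothetical, while the flags are the ones used in this post; picking a preset by name replaces commenting lines in and out.

```python
import sys

# Hypothetical presets; each lists papermate flags shown in this post.
PRESETS = {
    "pdf": [],
    "explicit": ["--input", "manuscript.md",
                 "--csl", "bib/vancouver-author-date.csl",
                 "--bib", "bib/refs.bib"],
    "diff": ["--tags", "v4_revised", "v3_submitted"],
    "docx": ["--docx"],
    "tex": ["--tex"],
}


def build_command(preset: str) -> list[str]:
    """Full command line for a named preset, run with the current interpreter."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    return [sys.executable, "papermate.py", *PRESETS[preset]]
```

For example, after `import subprocess`, calling `subprocess.run(build_command("diff"), check=True)` would regenerate the marked-up PDF, assuming papermate.py sits in the current directory.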

Revising our manuscript and generating a marked-up version, a .docx version, and a .tex version

Great, our manuscript received positive reviews. We only need to make minor revisions to the text and respond to a few comments. Once we have completed this work, we can commit and tag our work:

$ git add manuscript.md
$ git commit -m 'Revised version for submission'
$ git tag -a v4_revised -m "Revised version for submission"

We can now generate a final/clean PDF version of our manuscript by simply calling papermate with no command-line options.

We can also generate a marked-up PDF version by specifying the git tags of the two versions we want to compare. In our current example, we would run the following command:

$ python papermate.py --tags v4_revised v3_submitted

This will generate a file titled v4_revised_v3_submitted_diff.pdf. An example of what this file will look like is included below:
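The post does not describe how papermate builds the marked-up PDF. A common approach, assumed here purely for illustration, is to render both tagged versions to LaTeX and compare them with latexdiff, which writes a .tex file with insertions and deletions highlighted. The sketch below only plans the shell commands (it runs nothing), assumes tag names without spaces or shell-special characters, and reproduces the output name described above with the newer tag first.

```python
def diff_name(new_tag: str, old_tag: str) -> str:
    """Marked-up PDF name used in the post: <new>_<old>_diff.pdf."""
    return f"{new_tag}_{old_tag}_diff.pdf"


def diff_plan(new_tag: str, old_tag: str, md: str = "manuscript.md",
              bib: str = "bib/refs.bib",
              csl: str = "bib/vancouver-author-date.csl") -> list[str]:
    """Shell commands for a latexdiff-based marked-up PDF (a sketch only)."""
    steps = []
    for tag in (old_tag, new_tag):
        # Recover the tagged Markdown, then render it to a standalone .tex file.
        steps.append(f"git show {tag}:{md} > {tag}.md")
        steps.append(f"pandoc {tag}.md --standalone --citeproc "
                     f"--bibliography {bib} --csl {csl} -o {tag}.tex")
    stem = diff_name(new_tag, old_tag).removesuffix(".pdf")
    # latexdiff prints the marked-up LaTeX to standard output.
    steps.append(f"latexdiff {old_tag}.tex {new_tag}.tex > {stem}.tex")
    steps.append(f"pdflatex {stem}.tex")
    return steps
```

Running the steps in order from the manuscript folder, for example with `subprocess.run(step, shell=True, check=True)`, would produce v4_revised_v3_submitted_diff.pdf, provided git, Pandoc, latexdiff, and pdflatex are installed.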

The last thing we need to do is determine the type of file the publisher needs for final submissions. If the publisher needs a .docx, we can run the following command:

$ python papermate.py --docx

This will generate a file called manuscript.docx. The formatting won’t be perfect (see example below), but it should only take a minute or two to fix up.

Alternatively, the publisher might accept LaTeX files for the final submission. In this case, we would run the following command:

$ python papermate.py --tex

So simple.
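Both options plausibly come down to a different output file for Pandoc, which picks the output format from the extension given to `-o`. The sketch below is an assumption about how that could be wired up, not papermate's code. One detail matters for .tex: Pandoc only writes a complete, compilable LaTeX document when asked for standalone output, whereas .docx output is always a complete document.

```python
# Hypothetical mapping from output options to file extensions.
EXTENSIONS = {"pdf": ".pdf", "docx": ".docx", "tex": ".tex"}


def output_command(md: str, fmt: str, bib: str, csl: str) -> list[str]:
    """Pandoc command for one output format; the -o extension selects the writer."""
    stem = md.rsplit(".", 1)[0]
    cmd = ["pandoc", md, "--citeproc", "--bibliography", bib, "--csl", csl,
           "-o", stem + EXTENSIONS[fmt]]
    if fmt == "tex":
        # Without --standalone, Pandoc writes only the document body,
        # which could not be compiled on its own.
        cmd.insert(2, "--standalone")
    return cmd
```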

Conclusion

papermate is still a work in progress, but it works just fine for the workflow we just saw. The goal is to add some tests, documentation, and a setup.py file, and then publish papermate as a Python package on PyPI. For now, papermate can be downloaded directly from GitHub here.


Since writing this post, I have modified the code slightly and made papermate available on PyPI (https://pypi.org/project/papermate/).

This means you can simply run ‘pip install papermate’.
