Python: Working with text files & PubMed references part 4

I am assuming you have followed the previous tutorials in this short series on how to manipulate PubMed references using Python (1, 2, 3).

You should have a list variable called refs4 in your Python console that contains the cleaned references. Because we will be working with this cleaned version of the references from now on, it would be a good idea to save them to a text file. In this way, any other work we want to do with the references will start by loading the text file with the cleaned references, rather than starting from the uncleaned references each time.

Saving cleaned references to a text file

The code below will save our cleaned references into a new text file called cleaned_pubmed_results.txt.

f_out = open('cleaned_pubmed_results.txt', 'w')
for line in refs4:
    f_out.write(line)
    f_out.write('\n')
f_out.close()

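Line 1. We start by opening our new file for writing; this is specified by the 'w' mode argument. Note that if a file with this name already exists, opening it in 'w' mode overwrites its contents.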

Lines 2-4. Next we loop through each item in the refs4 list and write it to our new text file using f_out.write(). Because write() does not add a line ending by itself, and we want each part of the references to appear on a separate line in our text file, we write an end-of-line character ('\n') after each item.

Line 5. After writing all our references to our new text file, we need to close it using f_out.close(). Closing the file ensures that any text still held in the buffer is actually written to disk.
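As an aside, the same steps can be written with Python's with statement, which closes the file for us when the block ends, even if an error occurs part-way through writing. The sketch below is one possible variant, not a replacement for the code above. The function name save_lines and the example_refs list are made up for illustration; example_refs holds placeholder text that stands in for refs4.

```python
def save_lines(lines, filename):
    """Write each item of lines to filename, one item per line."""
    # The with statement closes the file automatically when the block ends.
    with open(filename, 'w') as f_out:
        for line in lines:
            f_out.write(line + '\n')


# Placeholder data for illustration only; in your console you would pass refs4.
example_refs = ['Placeholder authors', 'Placeholder title', 'PMID: 00000000']
save_lines(example_refs, 'example_cleaned_refs.txt')
```

To save the real references you would call save_lines(refs4, 'cleaned_pubmed_results.txt').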

Summary

Finally! We have finished cleaning our PubMed references and we have saved them into a new text file. From now on we will work with this new text file. If anything ever goes wrong and we accidentally delete the file containing our cleaned references, we can always recreate it by running the code we have written thus far. We might also realize later that we want the parts of the references to be saved in a different order (e.g., PMID first and authors last); this can easily be done by modifying our code and re-running it. So the most important thing is to ensure we have a backup of the raw data and of our code (it is even better if the code is kept under version control). With these two things, we can always recreate any variables and files.
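Since later work will start from the saved file, here is a minimal sketch of how the cleaned references might be read back into a list in a new Python session. The function name load_lines is made up for illustration, and the sketch assumes the file was written one item per line as shown earlier.

```python
def load_lines(filename):
    """Return the lines of a text file as a list of strings."""
    with open(filename) as f_in:
        # Remove the end-of-line character that was added when saving.
        return [line.rstrip('\n') for line in f_in]


# Example call, assuming the file from this tutorial exists:
# refs = load_lines('cleaned_pubmed_results.txt')
```

The returned list should match the list that was saved, so the rest of our work can carry on from there.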

The next tutorial in this series will show us how to have a user select which references to keep, and save the kept and rejected references in different text files.

 
