File in/out: How to import CSV files into Python using Pandas

Posted on March 2, 2017 by Joanna Diong

Comma-separated values (CSV) files are a type of text file commonly used to store data. In a CSV file, each line of text contains values separated by commas. CSV files can be imported into Python in different ways (e.g. csv.reader, numpy.loadtxt). One useful method is to import CSV files into Pandas dataframes.
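For comparison, here is a minimal sketch of the two other import routes mentioned above. The three-column text is made up for illustration and stands in for a real CSV file.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content standing in for a file on disk
text = "0,1.5,2.5\n1,1.6,2.4\n2,1.7,2.3\n"

# csv.reader yields each row as a list of strings
rows = list(csv.reader(io.StringIO(text)))

# numpy.loadtxt parses the values into a 2-D array of floats
arr = np.loadtxt(io.StringIO(text), delimiter=',')

print(rows[0])
print(arr.shape)
```

Both routes work, but neither gives labelled columns, which is one reason a Pandas dataframe can be more convenient.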

Pandas package. Pandas is a Python package that structures data as a dataframe and provides functions to manipulate numeric and time series data, similar to the way statistical packages such as R and Stata structure data. The name comes from “panel data”, an econometrics term for structured, multidimensional data sets.

Let’s write a function called readfile to import an example CSV file into a Pandas dataframe. The CSV file contains 5 channels of data for a calibration routine, but we only want data from the first 3 channels. The CSV file is available for download here.

import pandas as pd
import numpy as np

def readfile(file, fq, timescale):
    df = pd.read_csv(file, sep=',', header=None)
    df.rename(columns={0: 'sample', 1: 'thumb', 2: 'index',
                       3: 'channel3', 4: 'channel4', 5: 'channel5'}, inplace=True)
    # create time and customise units of time
    if timescale == 'second':
        time = np.arange(0, len(df['sample']) / fq, 1 / fq)
        xlab = 'Time (sec)'
    elif timescale == 'minute':
        time = np.arange(0, len(df['sample']) / fq, 1 / fq) / 60
        xlab = 'Time (min)'
    elif timescale == 'hour':
        time = np.arange(0, len(df['sample']) / fq, 1 / fq) / (60*60)
        xlab = 'Time (hour)'
    # fix uneven column lengths between time and samples, if needed
    if len(time) > len(df['sample']):
        time = time[:-(len(time)-len(df['sample']))]
    # assign dataframe values to variables
    sample = df['sample']
    thumb = df['thumb']
    index = df['index']
    return sample, thumb, index, time, xlab

Lines 1-2. Import the Pandas and NumPy libraries.

Line 4. Define the readfile function and have it take the arguments file (name of the data file), fq (frequency at which data were sampled, in Hz) and timescale (whether we report and plot data in seconds, minutes or hours).

Line 5. Call the function read_csv to read in the CSV file, where values are separated by commas. Setting header=None tells Pandas not to read column names from the first row of data. Assign the data to the dataframe df.
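To see what header=None does, here is a small sketch that reads made-up CSV text from memory instead of from a file. The values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content with no header row
text = "0,1.5,2.5\n1,1.6,2.4\n2,1.7,2.3\n"

# With header=None the first row is treated as data,
# and the columns are labelled with integers starting at 0
df = pd.read_csv(io.StringIO(text), sep=',', header=None)

print(list(df.columns))
print(df.shape)
```

Without header=None, Pandas would use the first row of values as column names and the dataframe would be one row short.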

Lines 6-7. Use rename to create column names for the data. Sample numbers are stored in column 0, and data from the thumb and index finger are stored in columns 1 and 2 respectively.
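The sketch below shows rename on a small made-up dataframe, alongside one possible alternative: passing the column names straight to read_csv through its names parameter. The alternative is not what readfile does; it is shown only for comparison.

```python
import io

import pandas as pd

# Hypothetical CSV content: sample number, thumb, index finger
text = "0,1.5,2.5\n1,1.6,2.4\n"

# Approach used in readfile: read first, then rename the integer labels.
# Without inplace=True, rename returns a new dataframe.
df = pd.read_csv(io.StringIO(text), header=None)
renamed = df.rename(columns={0: 'sample', 1: 'thumb', 2: 'index'})

# Alternative: name the columns while reading
named = pd.read_csv(io.StringIO(text), header=None,
                    names=['sample', 'thumb', 'index'])

print(list(renamed.columns))
print(list(named.columns))
```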

Lines 9-17. Write conditional statements using if and elif to customise whether time and the x-axis label xlab are reported in seconds, minutes or hours.
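Note that if timescale is none of 'second', 'minute' or 'hour', no branch runs and time is never created. One way to guard against this is a lookup table with an explicit error, sketched below. The helper name time_axis and the SCALES dictionary are hypothetical and not part of readfile.

```python
import numpy as np

# Seconds per unit and the matching axis label (hypothetical lookup table)
SCALES = {
    'second': (1, 'Time (sec)'),
    'minute': (60, 'Time (min)'),
    'hour': (60 * 60, 'Time (hour)'),
}

def time_axis(n_samples, fq, timescale):
    """Return a time array and x-axis label for n_samples sampled at fq Hz."""
    if timescale not in SCALES:
        raise ValueError(f"timescale must be one of {sorted(SCALES)}")
    divisor, xlab = SCALES[timescale]
    # Time of each sample in seconds, then converted to the requested unit
    time = np.arange(n_samples) / fq / divisor
    return time, xlab

time, xlab = time_axis(5, 10, 'second')
print(time, xlab)
```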

Lines 19-20. If the time array has more elements than the sample column, trim the extra elements so that both are the same length. (np.arange can return one extra element when the step is not an integer, and some data acquisition systems inadvertently record an extra sample in one channel. Columns of uneven lengths cannot be combined in a dataframe or plotted against each other.)
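As a sketch of the length fix, the trimming can also be written as a plain slice, or avoided altogether by building the time array from integer sample numbers. The values of n and fq below are arbitrary examples.

```python
import numpy as np

n, fq = 7, 10  # hypothetical: 7 samples recorded at 10 Hz

# Float steps: depending on rounding, this has n or n + 1 elements
time = np.arange(0, n / fq, 1 / fq)

# Slicing to n drops any extra element and does nothing otherwise
time = time[:n]

# Alternative: count samples as integers, then convert to seconds.
# This always has exactly n elements.
alt = np.arange(n) / fq

print(len(time), len(alt))
```

The integer-based version sidesteps the rounding issue, so no trimming step is needed.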

Lines 22-25. Assign data from different columns to variables, and have the function return these variables.
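Since only the first three columns are returned, another option is to read just those columns using the usecols parameter of read_csv. This is a sketch of an alternative, not what readfile does, and the six-column text is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content: sample number plus 5 channels of data
text = ("0,1.5,2.5,3.5,4.5,5.5\n"
        "1,1.6,2.4,3.6,4.4,5.6\n")

# Read only columns 0-2, then label them
df = pd.read_csv(io.StringIO(text), header=None, usecols=[0, 1, 2])
df = df.rename(columns={0: 'sample', 1: 'thumb', 2: 'index'})

print(df.shape)
print(list(df.columns))
```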

See these posts for refreshers on functions and conditional statements. Now, let’s use this function to extract and plot data from the CSV file:

import matplotlib.pyplot as plt 
%matplotlib inline

sample, thumb, index, time, xlab = readfile(file="001.csv", fq=10, timescale='second')

fig = plt.figure(figsize=(11, 7))
plt.plot(time, thumb)
plt.xlabel(xlab)
plt.ylabel('Thumb (a.u.)')
plt.savefig('fig1.png')

Figure 1: Thumb data (a.u.) plotted against time (sec).

Play around with setting timescale to 'minute' or 'hour', and see what this does to the plot.

Summary

To import data from a CSV file into a Pandas dataframe, we use the read_csv function to get the data in and use rename to label our columns. Data from each column are assigned to variables for further analysis.
