Python: Data structures

Python has a number of built-in data structures. Two of the most common are lists and dictionaries. A list is an ordered sequence of values and is constructed with square brackets, e.g. [1, 5, 'abc', 34.8]. The values in a list can be of any type, including numbers and strings, and a single list can mix types. A dictionary is a mapping of keys to values and is constructed with curly brackets, e.g. {'Jack': 18, 'Jill': 16, 'Half': 0.5, 24: 'potatoes', 1: 'start'}. The keys of a dictionary (e.g. 'Jack', 'Jill', 'Half', 24, 1) must be hashable, which includes integers and strings. The values (e.g. 18, 16, 0.5, 'potatoes', 'start') can be any data type.
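The sketch below builds both structures from the example values in this paragraph. It shows that list items are retrieved by position and dictionary values by key. The variable names `values` and `lookup` are illustrative.

```python
# A list is an ordered sequence of values; items can be of mixed types
values = [1, 5, 'abc', 34.8]

# A dictionary maps keys to values; keys must be hashable (e.g. int, str)
lookup = {'Jack': 18, 'Jill': 16, 'Half': 0.5, 24: 'potatoes', 1: 'start'}

print(values[2])               # index a list by position -> abc
print(lookup['Jill'])          # look up a dictionary value by key -> 16
print(lookup[24])              # integer keys work too -> potatoes
print(len(values), len(lookup))  # -> 4 5
```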

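We can index elements in a list by position and values in a dictionary by key. Importantly, before Python 3.7, key-value pairs in a dictionary were not guaranteed to be stored (or retrieved) in the order they were inserted. This is because Python implements dictionaries as hash tables, which give very fast access to keys and values. Since Python 3.7, plain dictionaries are guaranteed to preserve insertion order.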
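Here is a short sketch of the ordering behaviour, assuming Python 3.7 or later. It also shows one difference that remains between the two types. Two OrderedDict objects compare equal only if their items are in the same order, while comparisons between plain dicts ignore order.

```python
from collections import OrderedDict

# Plain dicts remember insertion order (guaranteed from Python 3.7)
d = {}
d['c'] = 3
d['a'] = 1
d['b'] = 2
print(list(d))  # -> ['c', 'a', 'b']

# Lookup is by key, not by position, so order does not affect retrieval
print(d['a'])   # -> 1

# Equality of plain dicts ignores order ...
print({'x': 1, 'y': 2} == {'y': 2, 'x': 1})  # -> True

# ... but equality between two OrderedDicts is order-sensitive
od1 = OrderedDict([('x', 1), ('y', 2)])
od2 = OrderedDict([('y', 2), ('x', 1)])
print(od1 == od2)  # -> False
```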

What is good data analysis planning? For any research project, we need to consider how to organise collected data so that automated analysis routines can be applied across data from all samples. This means that data files analysed by the same routine need to have the same structure (so that all files are analysed in the same way), and file names need to be consistent across trials. A record of trials and exceptions also needs to be kept.
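One way to keep file names consistent across trials is to build every name from the same template rather than typing it by hand. The sketch below assumes a hypothetical pattern `<subjectID>_trial<NN>.csv` and hypothetical helpers `make_filename` and `parse_filename`; none of these come from the original analysis. Parsing a name back with the matching pattern is a quick way to check whether a file follows the convention.

```python
import re

# Hypothetical naming template: subject ID, then a zero-padded trial number
TEMPLATE = '{subject_id}_trial{trial:02d}.csv'
PATTERN = re.compile(r'^(?P<subject_id>ID\d{3})_trial(?P<trial>\d{2})\.csv$')

def make_filename(subject_id, trial):
    """Build a file name from the (hypothetical) template."""
    return TEMPLATE.format(subject_id=subject_id, trial=trial)

def parse_filename(name):
    """Return (subject_id, trial) if name follows the template, else None."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return match.group('subject_id'), int(match.group('trial'))

print(make_filename('ID001', 3))            # -> ID001_trial03.csv
print(parse_filename('ID001_trial03.csv'))  # -> ('ID001', 3)
print(parse_filename('id1-trial3.csv'))     # -> None (does not follow the convention)
```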

I am analysing data from people who have had a stroke, and I needed to structure the analysis to reference each person’s identification number and their stroke-affected side. This can be done using Python lists and dictionaries. The dictionaries need to be built so that each person’s information is retrieved in the same order, so I used ordered dictionaries (OrderedDict) from the collections module in Python’s standard library.

# Define a function to create an ordered dictionary of keys (subject index)
# and values (affected side and subject ID)
from collections import OrderedDict

def gen():
    """Generate an OrderedDict of subject index, stroke-affected side and subject ID.

    Returns an OrderedDict with the subject index as key and a list of
    [affected side, subject ID] as value, e.g.
    OrderedDict([(1, ['l', 'ID001']), (2, ['l', 'ID002']), (3, ['r', 'ID003'])])
    """
    id_key = OrderedDict()
    id_key[1] = ['l', 'ID001']
    id_key[2] = ['l', 'ID002']
    id_key[3] = ['r', 'ID003']

    return id_key

# Generate the ordered dictionary and assign it to the variable id
# (note: this name shadows Python's built-in id() function)
id = gen()
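Because the keys are subject indices, one person’s details can be retrieved directly and unpacked into named variables. The sketch below rebuilds the same three entries as gen() so that it runs on its own. It assumes Python 3.7+ for the final plain-dict comparison. The names `subjects`, `side`, `subject_id` and `plain` are illustrative.

```python
from collections import OrderedDict

# Same entries as gen(): index -> [affected side, subject ID]
subjects = OrderedDict()
subjects[1] = ['l', 'ID001']
subjects[2] = ['l', 'ID002']
subjects[3] = ['r', 'ID003']

# Retrieve one person's record by index and unpack the list
side, subject_id = subjects[3]
print(side, subject_id)  # -> r ID003

# From Python 3.7, a plain dict built in the same order iterates identically
plain = {1: ['l', 'ID001'], 2: ['l', 'ID002'], 3: ['r', 'ID003']}
print(list(plain.items()) == list(subjects.items()))  # -> True
```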

A neat thing about dictionaries is that we can loop over keys and values at the same time by unpacking each key-value pair into two loop variables. For example, to print the keys and values of the dictionary id, we can write:

for k, v in id.items():
    print(k, v)

In the first line of the for loop, the variables k and v are assigned each key and its value in turn. Writing id.items() means “for the ordered dictionary id, call the method .items(), which returns a view of the dictionary’s key-value pairs”. The following output is produced:

1 ['l', 'ID001']
2 ['l', 'ID002']
3 ['r', 'ID003']
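The key-value loop is also where per-subject operations go. The sketch below collects the IDs of subjects whose affected side is 'l'; this filter is an illustration, not part of the original analysis. It also shows that .items() returns a live view rather than a copy, so a key added after the view is created still appears in it.

```python
from collections import OrderedDict

subjects = OrderedDict([(1, ['l', 'ID001']), (2, ['l', 'ID002']), (3, ['r', 'ID003'])])

# Unpack each (key, value) pair, then unpack the value list as well
left_ids = []
for index, (side, subject_id) in subjects.items():
    if side == 'l':
        left_ids.append(subject_id)
print(left_ids)  # -> ['ID001', 'ID002']

# .items() is a view: it reflects later changes to the dictionary
pairs = subjects.items()
subjects[4] = ['r', 'ID004']
print(len(pairs))  # -> 4
```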

The same output can be obtained using lists. This approach converts the keys and values of our id dictionary into lists using the list() function and then loops over both lists together with zip(). It is useful when we want to perform operations on the dictionary’s keys or values:

# create lists of keys and values from dictionary id
keys = list(id.keys())
values = list(id.values())

# loop over lists called keys and values simultaneously
for k, v in zip(keys, values):
    print(k, v)
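Because zip() pairs items by position, it can also go the other way. Zipping the key and value lists back together and passing the pairs to OrderedDict() (or dict()) rebuilds the original mapping. The sketch below uses illustrative variable names.

```python
from collections import OrderedDict

subjects = OrderedDict([(1, ['l', 'ID001']), (2, ['l', 'ID002']), (3, ['r', 'ID003'])])

keys = list(subjects.keys())
values = list(subjects.values())

# zip() yields the same (key, value) pairs, in the same order, as .items()
print(list(zip(keys, values)) == list(subjects.items()))  # -> True

# Zipping the lists back together rebuilds an equal OrderedDict
rebuilt = OrderedDict(zip(keys, values))
print(rebuilt == subjects)  # -> True

# Lists also support positional indexing, which dictionary views do not
print(keys[0], values[-1])  # -> 1 ['r', 'ID003']
```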


We learned how to generate an ordered dictionary of keys and values, and how to use the .items() method to loop over keys and values simultaneously. The .keys() method returns a view of the dictionary’s keys and the .values() method returns a view of its values. Either can be converted into a list with the list() function. We also learned an equivalent way to get the same output by looping over two lists with the zip() function.
