Pandas: Tips to clean and describe data

In a previous post we used binary data to demonstrate sampling error and calculate 95% confidence intervals (CI). Now, suppose that data can take many values; for example, normal body temperature has many values and varies continuously over a physiological range. How can we measure this variability in body temperature? For continuous data, variability can be quantified as the standard

Raw data are not always pretty. Data can have different patterns, be noisy, and vary from trial to trial. We usually collect data to measure, or more accurately, estimate an effect or phenomenon. At times, it is necessary to fit a mathematical expression to raw data in order to estimate the underlying effect or phenomenon. We can then use this

In the previous post, when repeated measures data from 10 subjects in 2 conditions were compared, it seemed that subjects who took drug 1 slept fewer hours compared to when they took drug 2. How might we test whether the median number of hours of sleep after drug 1 is less than after drug 2? We can calculate the difference

In the previous post we plotted repeated measures data from 10 subjects under 2 conditions. There are different ways to analyse small datasets. We could apply parametric methods to analyse the data values, such as describing the data with means and standard deviations, and calculating a paired difference. Or, we could also apply non-parametric methods by analysing data values based

Experimental studies are often based on relatively small samples. It is always better to test more subjects where possible, but even if the final dataset is comparatively small we should still strive to analyse the data properly. Let’s look at one way to analyse a small dataset. We will analyse data from the sleep dataset available in R. The sleep
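
The parametric description mentioned in the excerpts above (means, standard deviations, and a paired difference for repeated measures data) can be sketched in pandas roughly as follows. This is only a minimal illustration: the values are hypothetical and are not the R sleep dataset, and the column names `drug1` and `drug2` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical repeated-measures values (NOT the R sleep dataset):
# hours of sleep for 5 subjects, each measured under two conditions.
df = pd.DataFrame({
    "subject": [1, 2, 3, 4, 5],
    "drug1": [6.0, 5.5, 7.0, 6.5, 5.0],
    "drug2": [7.0, 6.5, 7.5, 8.0, 6.0],
})

# Describe each condition with its mean and sample standard deviation.
# pandas uses ddof=1 for std by default, i.e. the sample SD.
summary = df[["drug1", "drug2"]].agg(["mean", "std"])

# Paired difference within each subject (drug2 minus drug1).
df["diff"] = df["drug2"] - df["drug1"]
mean_diff = df["diff"].mean()
median_diff = df["diff"].median()

print(summary)
print("mean paired difference:", mean_diff)
print("median paired difference:", median_diff)
```

Working with the within-subject difference keeps each subject as their own control, which is why the paired difference is computed per row before it is summarised.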