Badly behaved data, part 3: Log-transforming to compare two means

When we try to interpret findings from a study, we often like to understand whether an effect (of a treatment or test condition) might differ in subjects with different characteristics. If there is substantial variability among subjects, it may mask a treatment effect that is present in only a subset of them. How can we understand effects in select groups of subjects that …
In the first and second posts of this series, we performed simple linear regression of a continuous outcome on a single continuous predictor, but we also learned that it is possible to include binary or categorical predictors in such regression models. How is this done? The hsb2.csv dataset we have been using also contains the variable female, where male participants …
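A minimal sketch of regressing a continuous outcome on a single 0/1 predictor follows. The data here are synthetic stand-ins for hsb2.csv's science scores and female indicator (an assumption for illustration, not the real dataset); with one dummy predictor, the intercept is the mean of the 0 group and the slope is the difference in group means.

```python
import numpy as np

# Synthetic stand-ins for hsb2.csv (assumption): 200 students with a
# 0/1 female indicator and a continuous science score.
rng = np.random.default_rng(0)
female = rng.integers(0, 2, size=200)                 # 0 = male, 1 = female
science = 52.0 - 2.0 * female + rng.normal(0, 8, size=200)

# OLS with an intercept column: science = b0 + b1 * female + error.
X = np.column_stack([np.ones(200), female])
b0, b1 = np.linalg.lstsq(X, science, rcond=None)[0]

# b0 equals the male-group mean; b1 equals the female-minus-male
# difference in mean science scores.
print(round(b0, 3), round(b1, 3))
```

This equivalence between the dummy-variable slope and the difference in group means is why regression with a binary predictor reproduces a two-sample comparison of means.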
In the previous post, we performed simple linear regression of science scores on reading scores from 200 students using ordinary least squares (OLS) estimation. This was done using Python's Statsmodels package. What does the OLS output show, and how should it be interpreted? Here is the figure of the individual subject data and the line of best fit, as well …
In a previous blog, we applied simple linear regression to an interesting problem: how well a measure of wine density accounts for alcohol content. This was considered simple linear regression because we had one outcome variable (alcohol content) and one predictor variable (wine density). We can extend this approach to have more than one predictor. Specifically, we can use …
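Extending to multiple regression can be sketched as follows. The density and sugar variables are synthetic stand-ins for wine features (assumptions, not the actual wine dataset); the point is that one outcome is fitted jointly on several predictors.

```python
import numpy as np

# Synthetic stand-ins for wine data (assumption): density and residual
# sugar jointly predicting alcohol content.
rng = np.random.default_rng(2)
n = 150
density = rng.normal(0.996, 0.002, size=n)
sugar = rng.normal(6.0, 3.0, size=n)
alcohol = 400 - 380 * density + 0.05 * sugar + rng.normal(0, 0.3, size=n)

# Multiple regression: design matrix with an intercept plus BOTH
# predictors, solved by least squares in one fit.
X = np.column_stack([np.ones(n), density, sugar])
coef, *_ = np.linalg.lstsq(X, alcohol, rcond=None)
intercept, b_density, b_sugar = coef
print(round(b_density, 1), round(b_sugar, 3))
```

Each slope is now interpreted as the change in the outcome per unit change in that predictor, holding the other predictor fixed.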
We introduced simple linear regression in a previous series and learned how to perform it in R (1, 2). What is the theory behind simple linear regression? How is it used to understand relationships between variables? What is another way to perform it in Python? The hsb2.csv dataset (available here) contains demographic and academic test scores data from 200 students.
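The theory behind the fit can be summarised in closed form: the least-squares line minimises the sum of squared residuals, giving slope = cov(x, y) / var(x) and intercept = mean(y) − slope · mean(x). A minimal sketch, using synthetic scores standing in for the hsb2.csv data (an assumption):

```python
import numpy as np

# Synthetic stand-in for hsb2.csv scores (assumption).
rng = np.random.default_rng(3)
x = rng.normal(50, 10, size=200)
y = 10 + 0.65 * x + rng.normal(0, 8, size=200)

# Closed-form least-squares solution.
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Cross-check against NumPy's degree-1 polynomial fit, which solves
# the same least-squares problem.
fit_slope, fit_intercept = np.polyfit(x, y, 1)
print(round(slope, 3), round(intercept, 3))
```

The two routes agree exactly, which is a useful sanity check when moving between hand-derived formulas and a library fit such as Statsmodels.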