Associate Professor Sama Low Choy
Designing, validating and then implementing survey questions that use a Likert scale is relatively straightforward, though sometimes time-consuming. Analysing the answers to such questions, however, can be complex. It is therefore not surprising that many researchers still rely on simple statistical tests (often nearly a century old), such as the Chi-squared tests of homogeneity or of independence, or the Kruskal-Wallis test, a non-parametric version of one-way Analysis of Variance (ANOVA) for ranked data. However, such bivariate methods consider only two variables at a time, and are limited in other ways. A major issue is that with very large datasets, such tests are almost guaranteed to find statistically significant effects, even when those effects are too small to matter in practice. A bigger issue is that, in the modern research landscape, researchers are routinely expected to run more complex analyses that consider the role of several variables simultaneously. Indeed, there is an extensive literature on the need to adjust models of the relationship between a few key variables for important moderators, such as demographics and pre-existing relevant conditions. Otherwise Simpson's paradox may occur, where "lurking" or hidden variables substantially bias results, sometimes even reversing the apparent direction of a relationship. This workshop introduces several more modern methods for the analysis of survey data.
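The point about very large datasets can be illustrated with a minimal sketch in Python (standard library only). The counts are invented for illustration, and the helper name chi_squared_2x2 is hypothetical. It computes the Pearson Chi-squared statistic for a 2x2 table without a continuity correction, then repeats the calculation with every count multiplied by 100, so that the proportions are identical but the sample is larger.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic, p-value (df = 1) and Cramér's V for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Effect size; for a 2x2 table, min(rows, cols) - 1 = 1.
    cramers_v = math.sqrt(chi2 / n)
    return chi2, p_value, cramers_v

# Invented counts: rows are two groups, columns are agree / disagree.
small = [[52, 48], [48, 52]]                       # n = 200
large = [[100 * x for x in row] for row in small]  # n = 20,000, same proportions

chi2_small, p_small, v_small = chi_squared_2x2(small)
chi2_large, p_large, v_large = chi_squared_2x2(large)

print(f"n = 200:    chi2 = {chi2_small:.2f}, p = {p_small:.3g}, V = {v_small:.3f}")
print(f"n = 20,000: chi2 = {chi2_large:.2f}, p = {p_large:.3g}, V = {v_large:.3f}")
```

The test statistic grows in proportion to the sample size while the effect size (Cramér's V) stays the same, which is one reason to report effect sizes alongside p-values.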
We take a chronological approach, starting with the analyses that you use for introducing yourself (and others) to your data: summary statistics, probing the design and sampling underlying the data collection process, and histograms to represent the range of responses. We revisit how the demographics of a surveyed population are reported, ensuring they are contextualised. For example, if your sample is 70% women, is that unusual or expected? What is the corresponding percentage in an appropriate reference population?
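One way to put a sample percentage in context is to compare it with the reference population while allowing for sampling uncertainty. The sketch below uses a Wilson score interval, which is only one of several reasonable choices. The sample size of 200 and the reference share of 51% are invented for illustration, and the interval assumes simple random sampling, which many surveys do not satisfy.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical values, for illustration only.
n_respondents = 200      # assumed sample size
n_women = 140            # 70% of the sample
reference_share = 0.51   # assumed share of women in the reference population

lower, upper = wilson_interval(n_women, n_respondents)
sample_is_typical = lower <= reference_share <= upper

print(f"Sample share: {n_women / n_respondents:.0%}, "
      f"interval: {lower:.1%} to {upper:.1%}, "
      f"reference inside interval: {sample_is_typical}")
```

If the reference share falls well outside the interval, women are over-represented relative to that reference population, and this is worth reporting alongside the raw percentage.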
We then progress to consider:
• Decision trees (classification and regression trees) for exploring the make-up of the sample, including under- or over-representation or other unexpected structure in observational data;
• Cross-tabulation of a single variable, a pair or a triplet of variables, including row and column summary statistics;
• Simpson's paradox, which occurs when a hidden (aka lurking) variable is affecting the relationship between two other variables; and
• The shortcomings of bivariate statistical techniques that seek only to test whether the data are consistent with a hypothesis that the two variables have absolutely no (linear) relationship.
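Cross-tabulation and Simpson's paradox can be sketched together in a few lines of Python. The counts below are invented: two hypothetical programs (X and Y), a hidden variable (level of study), and a satisfied / not satisfied outcome. The sketch computes the satisfaction rate within each level and then again after collapsing over level.

```python
# Invented counts: counts[program][level] = (satisfied, respondents).
counts = {
    "X": {"undergraduate": (90, 100), "postgraduate": (120, 200)},
    "Y": {"undergraduate": (170, 200), "postgraduate": (50, 100)},
}

# Satisfaction rate within each level of study (the three-way table).
within_level = {
    program: {level: satisfied / respondents
              for level, (satisfied, respondents) in levels.items()}
    for program, levels in counts.items()
}

# Satisfaction rate after collapsing over level (the two-way table).
overall = {}
for program, levels in counts.items():
    satisfied = sum(s for s, _ in levels.values())
    respondents = sum(r for _, r in levels.values())
    overall[program] = satisfied / respondents

for program in counts:
    rates = ", ".join(f"{level}: {rate:.0%}"
                      for level, rate in within_level[program].items())
    print(f"Program {program}: {rates}, overall: {overall[program]:.1%}")
```

In these invented data, program X has the higher satisfaction rate among undergraduates and among postgraduates, yet program Y has the higher rate overall, because most of X's respondents are postgraduates, who are less satisfied in both programs. Ignoring level of study would reverse the conclusion.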
We then turn to regression, to better describe how two variables are related:
• Clarify assumptions required to fit linear regression to Likert scaled data.
• Demonstrate loglinear regression when those assumptions are not valid.
• If time permits, we may show the link between regression and structural equation modelling.
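As a rough illustration of the loglinear idea, and assuming that "loglinear regression" here refers to the Poisson loglinear model for cross-tabulated counts, the sketch below fits the simplest such model (independence) to a table of group by Likert response. The counts are invented. Under independence the fitted counts have a closed form (row total times column total, divided by the grand total), so no statistics package is needed. This simple model treats the Likert categories as unordered.

```python
import math

# Invented counts: rows are two groups, columns are Likert responses 1 to 5.
observed = [
    [10, 20, 30, 25, 15],
    [5, 15, 30, 30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Independence loglinear model: log(mu_ij) = lambda + lambda_row_i + lambda_col_j.
# Its maximum-likelihood fitted counts are row_total * col_total / n.
fitted = [[r * c / n for c in col_totals] for r in row_totals]

# Deviance (likelihood-ratio statistic G^2) comparing observed with fitted counts.
g2 = 2 * sum(
    o * math.log(o / e)
    for obs_row, fit_row in zip(observed, fitted)
    for o, e in zip(obs_row, fit_row)
    if o > 0  # a cell with zero count contributes nothing to the sum
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"G^2 = {g2:.2f} on {df} degrees of freedom")
```

A small deviance relative to a Chi-squared distribution with df degrees of freedom suggests that the independence model describes the table adequately. Richer loglinear models add interaction terms, or scores that respect the ordering of the Likert categories.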
Software: For most of the workshop, the focus will be on concepts and on interpreting the outputs that many packages provide. In the last hour, we will help participants interpret output obtained using a software package of their choice, but we will provide no support for running a package (except R).
Prerequisite: This workshop will make sense to people who have already imported their survey data into a stats package or, at a minimum, Excel.
Please bring a paper, relevant to your research, that conducts a statistical analysis of survey data.
We will contact you two weeks before the workshop and ask you to upload the paper's citation details and the name of the statistical method it uses.
We hope that sharing such information will help students prepare in this online environment.
Helpful for: Design and/or quantitative validation of (closed) survey questions.
Format: Small groups with computing
Companion workshops: You may find the following training helpful for getting data into a stats package: R Induction and Graphical Excellence in R (RED), Research Bazaar Introduction to R (eResearch, 4 online tutorials); tutorials in SPSS and LimeSurvey (Library).