Analysis of Survey Data beyond Chi-squared


Principal speaker

Daniela Vasco

Other speakers

Basilio Goncalves, Tim Newans


It is a relatively straightforward (though sometimes time-consuming) process to design, validate and then implement survey questions that use a Likert scale. However, analysing the answers to these kinds of questions can be complex. For this reason, it is not surprising that many researchers still rely on simple statistical tests (often nearly a century old), such as the Chi-squared tests of homogeneity or of independence, or the Kruskal-Wallis test, the non-parametric, rank-based analogue of one-way Analysis of Variance (ANOVA). However, such bivariate methods only consider two variables at a time, and they are limited in other ways. A major issue is that with very large datasets, such tests are almost guaranteed to find statistically significant effects, however small. A bigger issue is that, in the modern research landscape, researchers are routinely expected to run more complex analyses that consider the role of several variables simultaneously. Indeed, there is an extensive literature on the need to adjust models relating a few key variables for important moderators, such as demographics and pre-existing relevant conditions; otherwise Simpson's paradox may occur, where "lurking" or hidden variables can substantially bias results. This workshop introduces several more modern methods for the analysis of survey data.
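To make the large-sample issue concrete, here is a minimal R sketch (not taken from the workshop materials; the group sizes, proportions and seed are invented for illustration) showing that a Chi-squared test on a very large sample flags a negligible difference as highly significant:

# Hypothetical two-group survey item: 51% vs 49% agreement, a negligible difference.
set.seed(1)
simulate_p <- function(n_per_group) {
  group <- rep(c("A", "B"), each = n_per_group)
  agree <- c(rbinom(n_per_group, 1, 0.51), rbinom(n_per_group, 1, 0.49))
  chisq.test(table(group, agree))$p.value
}
simulate_p(100)      # small survey: p is usually well above 0.05
simulate_p(100000)   # very large survey: p is usually tiny, despite the same trivial effect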

We take a chronological approach, starting with the analyses used to introduce yourself (and others) to your data: summary statistics, probing the design and sampling underlying the data collection process, and histograms to represent the range of responses. We revisit how the demographics of a surveyed population are reported, ensuring they are contextualised. For example, if your sample is 70% women, is that unusual or expected? What is this percentage in an appropriate reference population?
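As a minimal illustration of these first-look steps (the variable names, simulated sample and the 51% reference figure below are hypothetical), in R this might look like:

# Simulated responses to one Likert item and one demographic question.
set.seed(1)
likert <- sample(1:5, 500, replace = TRUE)   # 1 = strongly disagree ... 5 = strongly agree
gender <- sample(c("Woman", "Man"), 500, replace = TRUE, prob = c(0.7, 0.3))

summary(likert)                                   # summary statistics for the item
barplot(table(likert), main = "Responses to Q1")  # distribution of responses (a histogram of the numeric codes would also do)

# Contextualise the demographics: is roughly 70% women unusual against, say, a 51% reference proportion?
prop.test(sum(gender == "Woman"), length(gender), p = 0.51)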

We then progress to consider:
• Decision trees (classification and regression trees) for exploring the make-up of the sample, including under- or over-representation or other unexpected structure in observational data;
• Cross-tabulation of a single variable, a pair or a triplet of variables, including row and column summary statistics (see the R sketch after this list);
• Simpson's paradox, which occurs when a hidden (aka lurking) variable affects the relationship between two other variables; and
• the shortcomings of bivariate statistical techniques that seek only to test whether the data are consistent with a hypothesis that the two variables have absolutely no (linear) relationship.
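The cross-tabulation and Simpson's paradox points above can be sketched with R's built-in UCBAdmissions data (graduate admissions cross-classified by admission outcome, gender and department); this is an illustration only, not the workshop's dataset:

data(UCBAdmissions)   # 3-way table: Admit x Gender x Dept

# Two-way cross-tabulation, collapsed over department:
# overall, men appear to be admitted at a higher rate.
two_way <- margin.table(UCBAdmissions, c(1, 2))   # margins 1 and 2 are Admit and Gender
prop.table(two_way, margin = 2)                   # admission proportions within each gender

# Three-way cross-tabulation: within most departments the gap shrinks or reverses,
# because the lurking variable (department) drives both application and admission rates.
ftable(round(prop.table(UCBAdmissions, margin = c(2, 3)), 2))

# A classification tree on the case-level data, e.g. with rpart::rpart(Admit ~ Gender + Dept, ...),
# would be one way to explore such structure; that call is indicative only, not workshop code.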

We then use regression to better describe how two or more variables are related:
• Clarify the assumptions required to fit linear regression to Likert-scaled data.
• Demonstrate log-linear regression when those assumptions do not hold (both approaches are sketched in R after this list).
• If time permits, we may show the link between regression and structural equation modelling.
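The following hedged sketch contrasts the two regression approaches using invented variables (the workshop's actual models and data will differ); whether 1-5 Likert codes can be treated as interval-scaled is itself one of the assumptions to be clarified:

set.seed(2)
n <- 400
age_group    <- factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE))
gender       <- factor(sample(c("Woman", "Man"), n, replace = TRUE))
satisfaction <- sample(1:5, n, replace = TRUE)   # Likert item, 1 = low ... 5 = high

# (a) Linear regression, treating the 1-5 codes as numeric (interval-scale assumption).
summary(lm(satisfaction ~ age_group + gender))

# (b) Log-linear model: a Poisson GLM fitted to the cross-tabulated counts, which models
#     cell frequencies and so avoids the interval-scale assumption.
counts <- as.data.frame(table(satisfaction = factor(satisfaction),
                              age_group = age_group, gender = gender))
summary(glm(Freq ~ satisfaction + age_group + gender, family = poisson, data = counts))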

Software: For most of the workshop the focus will be on the concepts and on interpreting output, which many packages can produce. In the last hour, we will support participants in interpreting output obtained from a software package of their choice, but we will not provide support for running any package other than R.

Prerequisite: You should be able to read survey data into Excel or a statistical package. At Griffith, several workshops cover this: Induction in R and Graphics in R (RED), Research Bazaar Introduction to R (eResearch), and tutorials in SPSS and LimeSurvey (Library).
Helpful for: Design and/or quantitative validation of (closed) survey questions
Format: Small groups with computing
Companion workshops: You may also find it useful to do Quantitative Validation of Surveys (RED) or any of the statistical modelling workshops that use R (RED).



RSVP

RSVP on or before Friday 28 February 2020, by email RED@griffith.edu.au, by phone 07 5552 9107, or via http://events.griffith.edu.au/d/znq1ms/4W
