Patterns of Missing Education Data

Important education data contain missing values that do not appear to be missing at random. Two areas are of particular concern: recorded parental demographics (language, employment status, education level) and National Assessment Program – Literacy and Numeracy (NAPLAN) results.

This project will investigate the patterns of missing education data and the implications for analyses based on these data. For the demographic data, the first step is to understand where, and for which students, values are missing. Some remote locations have up to 50% of these values missing. Community-level analogues of these variables are available. The project will investigate the relative effectiveness of data imputation and of modelling using the community-level alternatives.
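As a rough illustration of the two steps described above (measuring missingness by group, then substituting a community-level value), the following minimal sketch may help. The records, field names, community labels and lookup values are all hypothetical and are not drawn from the project data. Simple substitution is only one of several imputation approaches the project could compare.

```python
from collections import Counter

# Hypothetical student records; None marks a missing parental education level.
students = [
    {"id": 1, "community": "A", "parent_education": "year12"},
    {"id": 2, "community": "A", "parent_education": None},
    {"id": 3, "community": "B", "parent_education": None},
    {"id": 4, "community": "B", "parent_education": "degree"},
    {"id": 5, "community": "B", "parent_education": None},
]

# Hypothetical community-level analogue: the most common education level
# recorded for each community in some external source.
community_mode = {"A": "year10", "B": "year12"}


def missing_rate_by_group(records, group_key, field):
    """Return the share of records with a missing `field` in each group."""
    totals = Counter()
    missing = Counter()
    for r in records:
        totals[r[group_key]] += 1
        if r[field] is None:
            missing[r[group_key]] += 1
    return {g: missing[g] / totals[g] for g in totals}


def impute_from_community(records, field, lookup):
    """Fill missing values of `field` from a community-level lookup.

    A flag column records which values were imputed, so later analyses
    can compare observed and imputed cases.
    """
    out = []
    for r in records:
        filled = dict(r)
        filled[field + "_imputed"] = r[field] is None
        if r[field] is None:
            filled[field] = lookup.get(r["community"])
        out.append(filled)
    return out


rates = missing_rate_by_group(students, "community", "parent_education")
imputed = impute_from_community(students, "parent_education", community_mode)
```

Keeping an explicit imputed flag is a design choice that makes it possible to check later whether conclusions change when imputed cases are excluded.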

The missing NAPLAN test data appear to follow a similar pattern, with results missing at higher rates for remote and/or Indigenous students. These students also have lower attendance rates overall, which may account for the differences. The project will build a logistic regression model to predict NAPLAN test completion based on attendance, ethnicity, remoteness, language background and other demographic factors. It will also investigate students who completed some but not all NAPLAN tests in a given year.
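To make the modelling idea concrete, here is a minimal sketch of a logistic regression for test completion, fitted by plain gradient descent so that it runs without external libraries. The data, the two predictors (attendance rate and a remote indicator) and the learning settings are invented for illustration. A real analysis would presumably use a statistical package and the full set of predictors listed above.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear term
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a student completes the test."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical data: [attendance rate, remote (1) or not (0)].
X = [
    [0.95, 0], [0.90, 0], [0.85, 1], [0.80, 0], [0.92, 1],
    [0.60, 1], [0.55, 1], [0.70, 0], [0.50, 1], [0.65, 0],
]
# Hypothetical outcome: 1 = completed the test, 0 = did not.
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

w, b = fit_logistic(X, y)
p_high = predict_proba(w, b, [0.95, 0])  # high-attendance, non-remote student
p_low = predict_proba(w, b, [0.50, 0])   # low-attendance, non-remote student
```

Comparing predicted probabilities for students who differ only in attendance is one simple way to examine whether attendance accounts for the differences in completion.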

Contact: John McKenzie, Menzies School of Health Research.
Phone: (08) 8946 8433