Archived: Keeping it real: the potential of big data in simulation modelling

9 May 2016

The tremendous volume of data being collected from social media, environmental sensors, lab tests, smartphones and browsing patterns is providing vital insights into real-life human behaviour and transforming the modelling of public health interventions, a seminar co-hosted by the Prevention Centre has heard.

Professor Nate Osgood, an internationally respected health data science expert from the University of Saskatchewan in Canada, said dynamic simulation modelling could make sense of the cacophony of data and assess the implications of big data for policy.

Professor Nate Osgood

Dynamic simulation models are computer representations of the real world that enable researchers to test policy interventions. Incorporating information derived from big data enables researchers to take into account how real human behaviour plays into the complex interactions that influence health, he said.

“We are facing policy challenges which involve complex systems and often cannot be simply reduced to individual components – in the real world there are multiple levels of context,” he told the seminar, co-hosted by the Prevention Centre and the Sax Institute.

Science of the whole

“What’s needed is a science of the whole. Simulation modelling enables us to understand all the pieces, their interactions and how behaviour of the whole comes out of these connected parts.”

A key program of work at the Prevention Centre, simulation modelling provides a ‘what-if’ tool to test the outcomes of different policy scenarios.
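As a rough illustration of what a 'what-if' comparison looks like in practice, the Python sketch below runs the same toy model twice, once as a baseline and once under a hypothetical policy. It is not one of the Prevention Centre's models: the textbook SIR (susceptible, infected, recovered) structure, the function name `simulate_peak_infected` and every parameter value are assumptions chosen only to show the pattern.

```python
def simulate_peak_infected(transmission_rate, recovery_rate=0.1,
                           population=10_000, initial_infected=10,
                           days=365, dt=0.1):
    """Step a toy SIR model forward in time and return the peak number infected.

    All defaults are illustrative assumptions, not values from the article.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    peak = infected
    for _ in range(int(days / dt)):
        # Flows between compartments over one small time step (Euler method).
        new_infections = transmission_rate * susceptible * infected / population * dt
        new_recoveries = recovery_rate * infected * dt
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return peak


# Scenario 1: no intervention (hypothetical transmission rate).
baseline = simulate_peak_infected(transmission_rate=0.3)
# Scenario 2: a hypothetical policy that lowers the transmission rate.
with_policy = simulate_peak_infected(transmission_rate=0.2)

print(f"Peak infected, baseline: {baseline:.0f}")
print(f"Peak infected, with hypothetical policy: {with_policy:.0f}")
```

The point of the sketch is the workflow rather than the numbers: the model stays fixed, one assumption changes per scenario, and the outcomes are compared side by side.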

Used in engineering, defence and business for decades, simulation models map complex problems by bringing together a variety of evidence sources, such as research, datasets, expert knowledge, practice experience and administrative data.

Adding big data to the mix was key to understanding human behaviour, Professor Osgood said. It offered information that was larger in volume, faster to arrive, more varied and more accurate than what was currently available through the literature or self-reporting.

For example, self-reported physical activity was often vastly different to information gained when people’s movements were tracked through GPS or Fitbit data, while self-reported dietary intake could vary greatly from what was indicated by objective measures such as photos or glucose monitoring via a smartphone.

Clues to improvement

“Big data can fill in the gaps in our simulation models,” he said. “Putting big data and models together allows us to much more reliably learn from our interventions, what worked and what didn’t work, and gives us a clue as to how we might do it better next time.”

He cited a project in Canada, Ethica iEpi, which used big data to fill gaps in a simulation model developed to combat foodborne diseases. Data collected from a smartphone app were used to understand when and where people ate from food vendors, which of them developed stomach upsets (whether or not these required medical attention), and how long they stayed sick.

The project found that gleaning this information from just 4% of smartphone users acting as sentinels could reduce the amount of foodborne illness by 60%, by enabling contaminated restaurants to be identified more quickly.
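The reasoning behind a result like this can be sketched with some simple arithmetic: the more cases that get reported, the fewer people fall ill before a contaminated vendor is flagged. The Python example below is only a back-of-envelope illustration and does not reproduce the project's model or its figures. The function name, the baseline reporting rate and the three-report threshold are hypothetical, and it assumes that every sentinel who gets sick reports it and that the two reporting channels do not overlap.

```python
def expected_cases_before_detection(report_rate, reports_needed=3):
    """Expected number of illnesses before a vendor has been reported enough times.

    If each case is reported independently with probability `report_rate`,
    the expected number of cases needed to collect `reports_needed` reports
    is reports_needed / report_rate.
    """
    if not 0 < report_rate <= 1:
        raise ValueError("report_rate must be greater than 0 and at most 1")
    return reports_needed / report_rate


# Hypothetical share of cases that reach health authorities without the app.
BASELINE_REPORT_RATE = 0.05
# Share of smartphone users acting as sentinels (the 4% mentioned in the text),
# assuming every sick sentinel reports and none is already counted above.
SENTINEL_SHARE = 0.04

without_sentinels = expected_cases_before_detection(BASELINE_REPORT_RATE)
with_sentinels = expected_cases_before_detection(BASELINE_REPORT_RATE + SENTINEL_SHARE)
reduction = 1 - with_sentinels / without_sentinels

print(f"Expected cases before detection, no sentinels: {without_sentinels:.1f}")
print(f"Expected cases before detection, with sentinels: {with_sentinels:.1f}")
print(f"Relative reduction under these assumptions: {reduction:.0%}")
```

Under these assumptions the size of the reduction depends almost entirely on how low the baseline reporting rate is, which is one reason a small group of sentinels can have a large effect.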

“With more nimble adversarial corporate actors who are running circles around health authorities by tapping into social networking and the latest technologies, this is urgently needed,” he said.

“It is critical to understand how the system behaves to be able to develop reliable interventions and policy regimes that will yield long-term gains – fixes that stay fixed.”

Helen Signy, Senior Communications Officer


Where do the data come from?

Sources of big data that are revolutionising dynamic simulation modelling:

  • Twitter (feeds)
  • Facebook (status updates)
  • Environmental sensors (weather, municipal, building)
  • Lab test results
  • Point of sale records
  • Administrative data
  • Questionnaire responses (mobile, web)
  • Voice audio
  • Incoming/outgoing calls
  • Communication infrastructure proximity data
  • Health information browsing behaviour
  • Sequence data
  • Consumer electronic device sensors (physical activity, proximity, location, etc.)