Bigger data = better science?
Some lament that modern-day research – systematic study for the purpose of establishing facts – has become a number-obsessed Cyclops. Admittedly, research is now first and foremost characterized by an insatiable hunger for data, empirical evidence and statistical estimates. Accordingly, researchers spend vast amounts of time and money in a furious race to develop new technologies that help collect oodles of data. Perhaps with good reason, many fear that a focus on numbers comes at a high price: the neglect of theory.
Theory and data are not mutually exclusive, nor have they ever been. Both are necessary for research that produces valuable insights: Theory drives the questions that data then answer – while begetting more questions. Interesting questions are easy to find: each of us, scientist or not, can name an infinite number of big unknowns that are important to resolve (for a finite example see the most important questions in social science). But high-quality data are hard to come by, especially when we ask ‘soft science’ questions about the mind, about behaviour and society, about human individuality.
Measurement is the marrow of all science. Not surprisingly, then, good science demands data derived from accurate measurement – but ‘big data’ is more than that. The term ‘big data’ refers on the one hand to the quantity of data, and on the other to their complexity – which, in turn, requires increasingly advanced analytic methods.
Take, for example, an old, deceptively simple question: Does breastfeeding improve children’s IQ?
Here’s the theory: External factors, like dietary provision and intake, affect our physiological and psychological functioning and development. Specifically, breast milk is thought to enhance neurodevelopment, because it contains long-chain polyunsaturated fatty acids, unlike animal milk or formula. The corresponding hypothesis, then, is that breastfed babies will on average achieve higher IQ scores than children who were raised on formula. To date, however, the findings are inconclusive, because studies on breastfeeding yield only ‘small data’.
The first problem is that women are understandably preoccupied with a thousand and one different things when they have a baby. If you mail them a 20-page survey a couple of years after their child’s birth, as most studies do, asking how long they breastfed their babies, they simply don’t remember. The second problem? Not all breast milk is the same: Mums eat and drink differently, and they have different circadian rhythms, different blood types and different metabolisms.
These are differences that occur between mothers. But it gets even more complicated: The same mum will eat and sleep differently on different days, and if we believe our own theory that external factors influence our functioning, we need to consider the possibility that these differences will affect mothers’ breast milk. And so far we have only talked about the mothers – what about the babies and their differences?
No study to date has been equipped to take breast milk samples from a representative group of mothers and monitor their babies’ feeding in real time over the first months of life. Although the available small data are inadequate to address our question, researchers (including me) have readily analysed them. Not surprisingly, the results are inconclusive: Judgements about whether breastfeeding benefits your child’s IQ currently have roughly the same predictive accuracy as British opinion polls had for the Brexit vote.
Scientific questions about human individuality never have simple answers, except false ones. It is important not to scold scientists as anti-theoretical data mongers when they focus their efforts on producing big data, which are the key to better science. And until we have, and understand, big data, we must be careful not to bully young mothers with ‘breast is best’ slogans into feeding practices that are hyped for ideological virtues rather than based on empirical evidence.