Child development research needs open science

fujikama, Pixabay.com, CC0 1.0

Where does language come from? What are the origins of morality? What are the roots of logical reasoning? Research on babies helps us address some of the most fascinating questions in the behavioral sciences. By adopting open science practices that encourage transparency and collaboration, developmental researchers can ensure that our answers are replicable, robust, and generalizable.

Babies are adorable creatures and remarkable learners, but they are also terrible research subjects. They are often sick and can't come to your lab, or they arrive hours late after an impromptu nap in the car. Once present, they are hungry, shy, tired, or inexplicably inconsolable. Doing a study with only a dozen babies can take months even in a big, well-funded laboratory, and our scientific literature is built on just these kinds of small studies.

Making broad generalizations about human development from a handful of children is statistically perilous. Not only can results be due to chance fluctuations, but conclusions may also fail to generalize to different samples. So baby scientists are increasingly realizing that, to make progress on the science of early childhood, they have to team up to share data and collaborate with one another.

These steps are part of a broader movement by researchers towards what’s known as “Open Science.” At the core of Open Science is the idea that science is built on transparent foundations. Open scientists share data, analysis code, and materials with other scientists, so that the field can repeat and build on their work.

Developmental researchers who embrace openness are finding that it can help them overcome some of the fundamental challenges of studying young children. In fact, some developmentalists have been leaders in this area since before the term “Open Science” existed. The Child Language Data Exchange System (CHILDES), the premier resource for learning about child language, grew out of a group of researchers informally sharing transcripts of kids’ talk with any researcher who needed them. It’s time to bring the norms of open science to the rest of the developmental community.

Materials sharing. From a puppet show about a novel word to a movie of a surprising action, many experiments with children are notable because they create beautiful materials that engage their young participants. For researchers to replicate and extend this work, they need access to these materials. Open sharing of materials enables replication, and it can also pay unexpected dividends, as experimental stimuli get repurposed for new studies and creative applications.

Data sharing. Sharing data allows researchers' original analyses to be verified, a cornerstone of the scientific process. But sharing also allows creative data reuse, including in computational models and statistical meta-analyses that aggregate insights across studies. Anonymized, tabular data can easily be shared; now resources like Databrary allow sharing of rich video data as well. Further, when standardized tasks are used by many researchers, we can do even more. For example, Wordbank, a website that I run, archives data about children's vocabulary, allowing researchers to explore language development across dozens of languages. Sharing data raises privacy concerns for developmentalists, but in many cases, data can be safely anonymized. With appropriate safeguards in place, data sharing can accelerate the research process.

Preregistration. When analyzing data from children, it is always important to explore the data as fully as possible so as to maximize the value from these costly datasets. But such exploration runs the risk of chance discoveries – which need confirmation in new experiments – being presented as though they were pre-planned hypotheses. The practice of preregistration, in which researchers document and timestamp their hypotheses and design decisions prior to running a study, provides an important remedy to this issue. Preregistration creates transparency around which discoveries are truly confirmatory tests of a hypothesis and which are simply observations due to data exploration.

Collaboration. Developmentalists are increasingly embracing collaborative work, where multiple labs converge on an experimental protocol and collect data across different sites. This model, which is widespread in medical research, allows teams to gather much larger and more representative samples, so they can answer more detailed questions with higher precision. Further, the inclusion of samples from different linguistic, cultural, and national backgrounds broadens the generalizability of developmental findings.

Replication. Replication – the verification of findings in an independent laboratory – is one of the cornerstones of science. But replication has been relatively rare in developmental research, in part because all work with children is costly. Replications have also historically been hard to publish, perhaps due to the perception that replication was somehow “uninteresting.” On the contrary: independent replication is a key activity in building strong theories. Developmental journals and funding agencies must actively acknowledge the importance of replication in building a cumulative science of early childhood.

To speed the adoption of these principles and to address the challenges facing the developmental field, my collaborators and I have started a group we call the ManyBabies Consortium. We lead large, systematic replication studies with infants and young children. But we don’t just redo previous small-scale studies; instead, we try to create best-practices templates that build on previous work. Further, by carrying out these studies across many labs around the US and around the world, we can use our data to measure variability across populations of children, leading to more robust and more representative findings.

Our first ManyBabies study, on infants' preference for infant-directed speech ("babytalk"), is just wrapping up, and – although we're still analyzing the data – we believe this is the largest experimental study of infants ever conducted, with 68 different laboratories contributing data from more than 2,000 infants around the world. With a host of future studies in the pipeline, we are excited to explore the potential for this new collaborative model of developmental research at scale.

In sum, developmental research addresses many important questions. But this research builds on a weak basis if our only evidence is small studies run in single labs. By embracing open science, researchers can pool their resources – sharing data, sharing materials, replicating one another’s work, and collaborating – to create a robust foundation for understanding child development.