Every classroom has several children who struggle with learning. They may have difficulties in specific areas, e.g. reading and maths. Or they may have general difficulties, such as trouble keeping things in mind or paying attention in class. If these difficulties are severe, the child may receive a diagnosis through community mental health services. The most common diagnoses include dyslexia and attention deficit hyperactivity disorder (ADHD).

The diagnosis informs the support that the child will receive, for instance intensive educational instruction for dyslexia or medication for ADHD. However, in many cases the diagnosis does not capture the specific difficulties of the individual child. This makes it difficult to find the best treatment options for individual children. For example, a child with ADHD may have problems with reading in addition to the core problems of inattention or hyperactivity. These reading problems may not be addressed through ADHD interventions.

There are even differences within seemingly homogeneous diagnostic groups. For instance, one child with ADHD may struggle with sitting still, while another child will daydream. Each of these behaviours may need a different intervention. The current diagnostic system is based on the consensus of expert panels. As an alternative, I’m trying to identify a data-driven grouping of behavioural problems.

For this, I’m making use of two recent advances in research: big data and machine learning. Big data means that there are now datasets of behavioural problems from hundreds to thousands of children. To make sense of the data, I use machine learning.

Machine learning comprises statistical procedures that can pick up patterns in complex data. For example, take the problem of grouping apples and oranges. Machine learning would ignore features that are common to both apples and oranges, e.g. roundness or size. Instead, it would create groups based on features that are very distinct, like colour or texture. Importantly, this does not depend on a programmer deciding which groups to create or which features to use. Instead, this will emerge from the data.
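To make the apples-and-oranges idea concrete, here is a minimal sketch in Python. It is an illustration only and not the method used in the study: the fruit measurements are synthetic, the feature scales are invented, and the grouping uses a simple two-cluster k-means written from scratch. All function and variable names are hypothetical. The algorithm is never told which fruit is which, yet the two groups it finds differ mainly in colour and texture and hardly at all in roundness or size.

```python
import random

def make_fruit(rng, kind):
    """Return [roundness, size, colour, texture] for one synthetic fruit.

    All values are invented and sit on a 0-1 scale (assumption).
    """
    roundness = rng.gauss(0.8, 0.05)  # similar for apples and oranges
    size = rng.gauss(0.5, 0.05)       # similar for apples and oranges
    if kind == "apple":
        colour = rng.gauss(0.1, 0.05)   # red end of the scale
        texture = rng.gauss(0.2, 0.05)  # smooth skin
    else:
        colour = rng.gauss(0.9, 0.05)   # orange end of the scale
        texture = rng.gauss(0.8, 0.05)  # dimpled skin
    return [roundness, size, colour, texture]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return [sum(column) / len(points) for column in zip(*points)]

def two_means(points, n_iter=20):
    """Group points into two clusters with a basic k-means loop."""
    # Deterministic start: the first point and the point farthest from it.
    first = points[0]
    farthest = max(points, key=lambda p: squared_distance(p, first))
    centroids = [first, farthest]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean_point(members)
    return labels, centroids

rng = random.Random(0)
kinds = ["apple"] * 50 + ["orange"] * 50
rng.shuffle(kinds)
fruits = [make_fruit(rng, kind) for kind in kinds]

# The true kinds are never passed to the clustering step.
labels, centroids = two_means(fruits)

# How far apart are the two clusters on each feature?
feature_names = ["roundness", "size", "colour", "texture"]
gaps = {
    name: abs(a - b)
    for name, a, b in zip(feature_names, centroids[0], centroids[1])
}
for name, gap in gaps.items():
    print(f"{name:10s} gap between clusters: {gap:.2f}")
```

Real behavioural data are far messier than this toy example, and the choice of algorithm and number of groups matters a great deal, so this should be read as a sketch of the principle rather than a recipe.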

In a recent study, my colleagues and I used this approach to identify subtypes of struggling learners. The results indicated three clusters with distinct behavioural problems. One group of children had difficulties with aggression and peer relationships. Another group had problems with learning, such as reading, writing, or maths. Yet another group had problems related to hyperactivity and inattention.

These groups were similar to the traditional diagnostic categories of conduct disorder, specific learning difficulties, and ADHD. However, children with the same diagnosis were not always grouped in the same cluster. For instance, while there were more children with an ADHD diagnosis in the hyperactivity/inattention cluster, a large proportion of them were assigned to the learning problems or conduct problems cluster.

The results of this study show that we can use big data and machine learning to identify subgroups of behavioural problems. These groups do not overlap and are very consistent across children. Using this consistent grouping, we no longer have to compare apples and oranges. Instead, research can focus on just apples or just oranges. This will make it easier to investigate what causes specific behavioural problems in children. In addition, the consistent and specific grouping makes it easier to devise interventions that are targeted to the problems of individual children.


The purpose of the biannual IMBES Conference is to facilitate cross-cultural collaborations in biology, education and the cognitive and developmental sciences. Our objectives are to improve the state of knowledge in and dialogue between education, biology, and the developmental and cognitive sciences; create and develop resources for scientists, practitioners, public policy makers, and the public; and create and identify useful information, research directions, and promising educational practices. The 2018 conference took place in Los Angeles, California.

The author of this blog post, Joe Bathelt, was among the presenters at the conference.
