Journal Club: Meta-analysis oversells popular autism screen

A meta-analysis published in February in JAMA Pediatrics suggests that the Modified Checklist for Autism in Toddlers (M-CHAT) can accurately identify children with autism and those without the condition. The analysis pooled the results of 50 M-CHAT studies and found an overall sensitivity of 83 percent and an overall specificity of 94 percent, meaning the tool identifies 83 percent of autistic children and correctly categorizes 94 percent of non-autistic children.
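To make the two measures concrete, here is a minimal sketch of how they are computed from screening counts. The counts are hypothetical, chosen only so the arithmetic lands on the pooled figures quoted above; they are not data from the meta-analysis.

```python
def sensitivity(true_pos, false_neg):
    """Share of children with the condition whom the screen flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of children without the condition whom the screen clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumption, for illustration only).
autistic_flagged = 83        # true positives
autistic_missed = 17         # false negatives
non_autistic_cleared = 940   # true negatives
non_autistic_flagged = 60    # false positives

sens = sensitivity(autistic_flagged, autistic_missed)
spec = specificity(non_autistic_cleared, non_autistic_flagged)

print(f"sensitivity = {sens:.0%}")
print(f"specificity = {spec:.0%}")
```

With these made-up counts, 83 of 100 autistic children are flagged and 940 of 1,000 non-autistic children are cleared, which corresponds to 83 percent sensitivity and 94 percent specificity.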

But the results run counter to prior assessments that indicate the M-CHAT has poor sensitivity and specificity — and so fails to identify many autistic children and misidentifies many non-autistic children as autistic. Spectrum asked two experts for their thoughts: Roald Øien, professor of special education and developmental psychology at UiT – the Arctic University of Norway in Tromsø and the Yale Child Study Center; and Kristin Sainani, teaching professor of epidemiology and population health at Stanford University in California.

Roald Øien:

There’s a lot here to grasp, but my main takeaway is that the persistent effort to prop up the M-CHAT and other screening instruments hinders progress toward early detection. Instead of continuing to examine this screener, we should look at new ways to find more children. We might even need to rethink screening and checklist usage overall.

One of the difficulties in studying the psychometrics of the M-CHAT and other screening tests is that the results are often restricted to the time period in which the studies are conducted. Children who screen positive receive a follow-up or diagnostic assessment that is reported in the study. But most of those who screen negative are regarded as not having autism and are thrown out of the analysis, with no follow-up or diagnostic assessment.

This practice risks missing children who have autism even though they screened negative. Indeed, the new JAMA Pediatrics study does not provide any information on the diagnostic outcomes of the children who screened negative on the M-CHAT.

If we test a screening instrument in a prospective study, we can see who later received a diagnosis and who did not, which might show that the screening instrument has a lot of false negatives in the long term. However, that approach isn’t perfect either because we can’t say what the diagnosis would be at the time of screening. All it really determines is that some children are not detected.

My comments come from reading many of the studies included in this meta-analysis with an eye for details. For instance, the M-CHAT tends to be best at picking up children with the most prominent autism traits, and also those who have intellectual disability without autism. So it might detect just the cases of autism that co-occur with intellectual disability. This doesn’t mean the screener has no value, but it’s not a solution to earlier detection.

The M-CHAT’s creators have done a whole lot for this field, and the M-CHAT has value. However, studies show it has great limitations in terms of picking up the majority of autistic children. So my suggestion? I hope that all of us who have been working in this field would put our heads together for the future of early autism identification and come up with something that is completely novel.

Kristin Sainani:

Previous criticisms of the M-CHAT have reported that if you apply the test at about 18 months, the sensitivity is below 40 percent, meaning the test misses most children with autism at that age. By contrast, the JAMA Pediatrics paper reports a pooled sensitivity of approximately 80 percent. These results aren’t necessarily conflicting, however, as the new paper includes data from older children. Children grow and learn at different rates, and it may be hard to differentiate usual behavioral variation from autism at 18 months. This doesn’t mean that the M-CHAT is a bad screening tool — it just means that if you apply it too early, and if you use it only once, you’re going to miss some cases of autism.

This paper’s more favorable results may also stem from flaws in the underlying studies. When the researchers behind the meta-analysis looked at only the studies that tracked children for more than six months after they tested negative, sensitivity dropped to below 60 percent. This means that other studies may have overestimated sensitivity by not accounting for children who were diagnosed with autism later. Other biases in the underlying studies could also have led to overly optimistic results.
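The arithmetic behind that drop can be sketched as follows. This is an illustration only, not the meta-analysis's method or data: every count is hypothetical, and the variable names are invented for the example. The point is simply that autistic children who screened negative and were never followed up are absent from the denominator, which makes sensitivity look higher than it is.

```python
def sensitivity(true_pos, false_neg):
    """Share of children with the condition whom the screen flags."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (assumption, for illustration only).
true_positives = 20             # screened positive, diagnosed with autism
missed_found_during_study = 4   # screened negative, diagnosed within the study
missed_found_at_follow_up = 12  # screened negative, diagnosed only later

# Without follow-up, the later diagnoses are invisible to the study.
apparent = sensitivity(true_positives, missed_found_during_study)

# With follow-up, every missed child counts as a false negative.
with_follow_up = sensitivity(
    true_positives,
    missed_found_during_study + missed_found_at_follow_up,
)

print(f"apparent sensitivity:       {apparent:.0%}")
print(f"sensitivity with follow-up: {with_follow_up:.0%}")
```

In this made-up scenario, the same screen goes from about 83 percent sensitivity to about 56 percent once the children diagnosed later are counted.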

We should interpret the pooled sensitivity and specificity cautiously because they are artificial. In fact, there isn’t a single “overall” sensitivity or specificity. Rather, the accuracy of the M-CHAT varies depending on who is being screened. For example, the researchers found that the test yields more false positives (i.e., lower specificity) when given to children who already have an increased likelihood of having autism, such as those flagged by their pediatricians. These children may have other developmental conditions that are difficult to distinguish from autism. This finding highlights a key takeaway of the paper: The M-CHAT’s accuracy depends on the characteristics of the screening population.
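A small sketch shows why a single pooled figure can describe no real group of children. The two populations and all counts below are hypothetical, and the pooling is a plain sum of counts, which is an assumption made for simplicity rather than the statistical model a meta-analysis would typically use.

```python
def specificity(true_neg, false_pos):
    """Share of children without the condition whom the screen clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical non-autistic children in two screening populations
# (assumption, for illustration only).
groups = {
    "general population": {"true_neg": 864, "false_pos": 36},
    "flagged by pediatrician": {"true_neg": 70, "false_pos": 30},
}

# Specificity within each population.
per_group = {name: specificity(**counts) for name, counts in groups.items()}

# Naive pooled specificity: add up the counts across populations.
pooled = specificity(
    sum(counts["true_neg"] for counts in groups.values()),
    sum(counts["false_pos"] for counts in groups.values()),
)

for name, value in per_group.items():
    print(f"{name}: {value:.0%}")
print(f"pooled: {pooled:.1%}")
```

Here the pooled value of 93.4 percent sits between the two groups (96 percent and 70 percent) and matches neither, and it would shift if the mix of children screened were different.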

Unfortunately, the researchers couldn’t quantify how much screening age impacts test accuracy, because of a lack of detailed data in the underlying studies. So it is unclear how much the difference between this paper’s results and those of more critical studies is explained by screening age. Despite these limitations, the paper suggests that the M-CHAT remains a useful screening tool for autism. The paper also provides an excellent summary of the body of work on this topic.

Corrections

This article has been updated to correct the definition of specificity. It is the proportion of children without the condition whom the test correctly identifies as negative (true negatives), not the proportion of true positives as originally stated.