The accuracy and reproducibility of research studies are a major concern of the scientific community. Researchers at Baylor College of Medicine have examined this problem in the field of multisensory integration to understand how it affects both basic research and the development of therapies. They determined that sample size (the number of individuals examined for a study) is the most important factor determining the accuracy of the study results. They report in the journal PLOS ONE that studies with sample sizes of 20 individuals overestimate true effects and that sample sizes of 100 or more individuals generally are necessary to reliably measure population differences or experimental effects.
“It started when we tried to reproduce our own work,” said first author Dr. John Magnotti, assistant professor of neurosurgery at Baylor. “We had conducted multisensory integration experiments about the McGurk effect using common sample sizes, but when we tried to replicate our results, we were unable to do so. We were also unable to replicate other groups’ results.”
Multisensory integration studies allow scientists to better understand how integrating information captured by different senses helps us perceive the world around us. For example, in speech perception, integrating auditory information from the talker’s voice and visual information from the talker’s face enhances the accuracy of speech recognition.
But there are instances when the integration of auditory and visual information does not work as expected. The result is the perception of a sound that is different from what is actually being said. For example, the visual “ga” combined with the auditory “ba” results in the perception of “da.” This is called the McGurk effect.
Multisensory integration researchers have been studying whether susceptibility to the McGurk effect varies by gender, between people with typical and atypical development, and across cultural or linguistic backgrounds.
“Recent studies on the McGurk effect have shown large variability in their results; some studies report completely opposite results,” said senior author Dr. Michael Beauchamp, professor of neurosurgery and neuroscience, vice chair of neurosurgery basic research and director of the Core for Advanced MRI at Baylor. “There have been failures to replicate reported differences in the McGurk effect between different cultures, different genders and children with or without developmental disorders. So we started to suspect that maybe this was a bigger issue than just a couple of isolated studies.”
“We tried to understand how the way we do our experiments could be causing these highly variable and conflicting results in the literature,” Magnotti said. “We took a large data set that we had collected for the McGurk effect and asked, what would the consequences be if we modeled group differences using a variety of sample sizes?”
The researchers estimated group differences in the McGurk effect using groups of 150 subjects, and also using smaller groups of 20, 40 and so on. Magnotti and Beauchamp found that with small sample sizes, the magnitude of the differences between groups could be greatly exaggerated.
“We showed that a true difference of 10 percent between populations, as determined with the largest group, would be vastly overestimated using smaller samples,” said Beauchamp. “In one case, the differences estimated for the McGurk effect between groups of 25 subjects were three times larger than those determined with the largest group. Also, the inflation of the results is most extreme when true population differences are small.”
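The kind of inflation Beauchamp describes can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes two populations whose McGurk susceptibility differs by a true 10 percentage points, models individual scores as Gaussian with a standard deviation of 0.30 (a simplification, since real proportions are bounded between 0 and 1), and keeps only simulated studies that pass an approximate two-sided significance test, as if only significant findings were reported. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def inflation_ratio(n_per_group, true_diff=0.10, sd=0.30,
                    n_studies=2000, seed=0):
    """Mean estimated difference among 'significant' simulated studies,
    divided by the true difference (1.0 would mean no inflation)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    significant = []
    for _ in range(n_studies):
        # Assumed populations: baseline mean 0.50 and 0.50 + true_diff.
        a = [rng.gauss(0.50, sd) for _ in range(n_per_group)]
        b = [rng.gauss(0.50 + true_diff, sd) for _ in range(n_per_group)]
        diff = statistics.fmean(b) - statistics.fmean(a)
        # Standard error of the difference between two group means.
        se = ((statistics.variance(a) + statistics.variance(b))
              / n_per_group) ** 0.5
        # Approximate two-sided test at the 5 percent level (z cutoff).
        if abs(diff / se) > 1.96:
            significant.append(diff)
    if not significant:
        return float("nan")
    return statistics.fmean(significant) / true_diff


if __name__ == "__main__":
    for n in (20, 40, 100, 150):
        print(f"n = {n:3d} per group: estimated / true difference = "
              f"{inflation_ratio(n):.2f}")
```

Under these assumptions, the ratio of the estimated difference to the true difference is largest for the smallest groups and moves toward 1 as the groups grow, because small studies reach significance only when chance pushes their estimate well above the true value.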
“We were surprised that just the normal way people conduct their studies in multisensory integration could be problematic,” Magnotti said. “We concluded that reducing inflated results and increasing replicability requires increasing the number of participants compared with current practice. For our studies, a sample of 100 to 200 subjects is a good sample size to detect a difference for the McGurk effect.”
These findings have medical implications. If a study conducted with small groups of subjects – for instance comparing subjects with an autism spectrum disorder to individuals without the disorder – shows a large difference between the groups, then it would seem worth attempting to develop therapies to correct the difference. But if the difference looks large only because the study relied on small samples, and in reality the difference is not significant, then trying to develop treatments to correct it would likely not benefit patients.
“In this study, we contribute specific suggestions to improve accuracy and reproducibility in multisensory integration studies,” Beauchamp said. “We provide general guidelines about using good stimuli, releasing your data and collecting adequate sample sizes. We think that our findings would help researchers to do better studies. Our study supports 15 years of general scientific literature and research specific to multisensory integration that warns about inaccurate results produced by sample sizes that are too small.”