Speech Recognition in Education: The Powers and Perils

Dr. Yaacov Petscher and Dr. Nicole Patton Terry

As education researchers focused on children’s language and reading, concepts like “equity” and “disparity” are not new to us. In fact, we obsess about them. 

We analyze data to understand and monitor them. We design interventions to change them. We labor (sometimes painfully) to better explain the forces that contribute to inequity in schools so that educators, parents, and policymakers can use that knowledge to improve outcomes for all students. 

Like a growing number of education researchers, we have embraced the potential of emergent technologies like voice tech, virtual assistants, and artificial intelligence in the classroom to unearth new insights about a child's language and reading development -- and to translate the science of learning into practices that can affect outcomes at scale.

So it was with keen interest that we read a recent New York Times article about the risk of racial bias in speech recognition technologies. The authors reported up to a 16% difference in the rate of misidentified words between Black and white speakers of the same speech recognition systems. Authors of another recent study concluded that racial disparities in speech recognition were due, in part, to insufficient audio data from Black speakers when training the models. 

We agree. 

In fact, this assertion is not surprising. Researchers in a range of fields lament the inadequate representation of diverse participant samples and the consequences for the interpretation of our findings. 

As AI-enabled technologies like speech recognition systems become more and more prevalent in everyday life, biases like these are not just annoying -- they’re dangerous. In the classroom, those dynamics present both profound risk -- and opportunity.

Disparate educational outcomes between racial-, ethnic-, and linguistic-minority groups and their majority white peers have been well documented since the 1960s. But attempts to mitigate these effects by creating curricula or interventions to "fix" Black children's achievement are no different from asking more Black people to sign up for speech recognition studies to "fix" the AI. 

The reality is, Black children are not broken and neither is the AI. The AI algorithms operate exactly as they are programmed. Perhaps a problem lies in aspects of the design itself. 

Just as a lack of diversity in the datasets used to "train" technology can lead to bias, bias is often evident in speech recognition performed by humans who have been exposed to a limited dataset of voices (or who have been taught to believe that there is only one correct way to speak).

We know from research on dialect variation and code-switching that the idea that there is one right way to speak is a biased perspective. But we also know that bias in people's perceptions of linguistic differences can have real consequences for the decisions we make about them.

For adults, it can mean the difference between getting and losing a job offer. For children, it can mean the difference between low or high expectations for academic performance in school. That’s the tricky thing about bias -- its insidious nature seeps into all aspects of life, requiring intentional effort to name it, accept it, address it, and ultimately stamp it out. 

Speech verification is a form of speech recognition that evaluates what a child actually said against what the system expected them to say. Such systems can actually improve objectivity in schools because they hold the potential to challenge unintentional biases that creep into classroom practices.
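In rough terms, the comparison step in speech verification can be sketched as aligning the recognizer's transcript against the expected text and flagging the words that differ. The sketch below is purely illustrative -- the function name and the word-by-word comparison are our simplification, and real systems score probabilistic acoustic matches rather than plain strings -- but it shows the basic idea of "what was said" versus "what was expected":

```python
def word_mismatches(expected: str, recognized: str) -> list:
    """Return (position, expected_word, recognized_word) for each mismatch.

    Illustrative only: real speech verification aligns audio against a
    pronunciation model and produces confidence scores, not exact matches.
    """
    exp = expected.lower().split()
    rec = recognized.lower().split()
    mismatches = []
    # Compare position by position where both transcripts have a word.
    for i, (e, r) in enumerate(zip(exp, rec)):
        if e != r:
            mismatches.append((i, e, r))
    # Words missing from one transcript at the end also count as mismatches.
    for i in range(min(len(exp), len(rec)), max(len(exp), len(rec))):
        e = exp[i] if i < len(exp) else ""
        r = rec[i] if i < len(rec) else ""
        mismatches.append((i, e, r))
    return mismatches


# A child reads "the cat sat on the mat"; the system hears "sad" for "sat".
print(word_mismatches("the cat sat on the mat", "the cat sad on the mat"))
# [(2, 'sat', 'sad')]
```

The point of the sketch is the design question it raises: whether a mismatch like "sat"/"sad" is treated as a reading error or as legitimate pronunciation variation depends entirely on what the underlying model was trained to expect.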

The digital breadcrumbs left by millions of students can help us spot new patterns among young learners, long before traditional tools like tests can. Making good on the promise of such technologies in the classroom requires that we not only mitigate the potential for bias but also proactively design for diversity. 

For example, for these AI-enabled technologies, the challenge cannot be solved by simply increasing the number of Black speakers included within the speech samples used in speech recognition systems; it requires the development of training models that reflect linguistic patterns common among diverse speakers, such as African American English. 

More than 40 years of research has documented African American English (AAE). Like other dialects of American English, AAE is systematic and rule-governed. It is not bad or poor English, and it is no more incorrect than the speech of Bostonians who "pahk the cah." 

When it comes to speech verification systems, that kind of linguistic variation must be accounted for both in the actual speech samples and in the models that train the systems. It also requires specialized research to calibrate the systems' probabilistic scores to accommodate linguistic variation, reducing susceptibility to bias in scoring.

As educational researchers and consumers of technology, we embrace the opportunity to leverage speech recognition technology in our work in the hope of informing the next generation of scientific inquiry. But we also recognize the importance of not letting the technology run faster than the science permits.

Over the last year, we embarked on an ambitious, multiyear research study to understand whether voice tech can be used to address challenges inherent in human scoring. We are engaging researchers and entrepreneurs who have already labored over the challenge of making voice tech work with the complexity and variability of children's voices. If we are successful, we may unlock a new frontier of assessment -- and inform instruction.

The good news is that early indications suggest that research on dialect variation, coupled with advances in machine learning, may pave the way for technologies that address the flaws pointed out by our colleagues at Stanford University in the recent NYT article. When used appropriately, modern speech recognition systems trained on diverse data sets can become powerful tools -- for assessment in the classroom and for finding a new recipe on your kitchen counter.

Yaacov Petscher and Nicole Patton Terry are associate directors of the Florida Center for Reading Research at Florida State University. Dr. Petscher is an associate professor at the FSU College of Social Work and Dr. Patton Terry is a professor at the FSU College of Education.

Wednesday, August 19, 2020 - 03:12 PM