The Unreasonable Effectiveness of Deep Learning in Language Learning

Special Focus: Well-Being for Better Learning Outcomes
Learning and Behavioural Sciences, March 12, 2019

As language learning shifts online, users will come to expect personalized learning experiences. China’s AI-powered online education market alone reached $568 million in 2017 and is expected to surpass $26 billion in 2022. At Sana Labs, we build AI technologies that power these learning experiences through easy-to-integrate APIs. This means that machine learning models for personalization, as well as for pronunciation, grammar, and overall fluency feedback, can be production-ready in days rather than months. In this article, I’ll highlight why deep learning will power this shift.

Pronunciation Feedback

Recent breakthroughs in speech recognition are making their way into language learning. Traditionally, machine learning models for automated speech scoring have relied on handcrafted rules and heuristics. This approach struggles because it is difficult to build such a model that copes with real-world background noise as well as with the wide range of non-native speech.

More recently, end-to-end systems that use neural networks to encode acoustic cues and learn predictive features automatically have achieved state-of-the-art performance, on par with human graders. This progress stems equally from larger datasets, greater computational capacity (for both training and inference), and algorithmic breakthroughs. In practice, this means that scalable feedback on pronunciation, fluency, and intonation can now handle varying learner levels, accents, and background noise.
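
To make the idea of an end-to-end scorer concrete, here is a minimal sketch in Python, assuming each utterance has already been converted into a sequence of per-frame acoustic feature vectors (for example, 13 MFCC-style coefficients per frame). The class name, layer sizes, and random weights are hypothetical; a real system would learn its weights from recordings paired with human grades and would use a far larger network. What the sketch shows is the shape of the computation: every frame passes through the same learned encoder, and pooling turns a variable-length utterance into a single score.

```python
import math
import random

Frame = list[float]  # one vector of acoustic features for a short slice of audio


def relu(x: float) -> float:
    return max(0.0, x)


class TinyPronunciationScorer:
    """Hypothetical end-to-end scorer: acoustic frames in, a score in (0, 1) out.

    Weights are random here for illustration only; in practice they would be
    trained on recordings paired with human pronunciation grades.
    """

    def __init__(self, n_features: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Frame encoder: one dense layer shared across every time step.
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        # Output layer: maps the pooled utterance vector to a single logit.
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def encode_frame(self, frame: Frame) -> list[float]:
        return [relu(sum(w * x for w, x in zip(row, frame)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def score(self, frames: list[Frame]) -> float:
        if not frames:
            raise ValueError("need at least one frame")
        encoded = [self.encode_frame(f) for f in frames]
        # Mean-pool over time so utterances of any length map to one vector.
        pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
        logit = sum(w * h for w, h in zip(self.w_out, pooled)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> score in (0, 1)


# Usage with random stand-in features (a real pipeline would extract these from audio).
scorer = TinyPronunciationScorer(n_features=13, n_hidden=8, seed=1)
rng = random.Random(42)
utterance = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(40)]
print(f"pronunciation score: {scorer.score(utterance):.3f}")
```

Mean pooling is the simplest way to summarize a variable-length utterance; recurrent or attention layers are common alternatives that keep more information about timing and intonation.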

Grammar Correction

Detecting and correcting grammatical errors is a highly complex task: learners can make over 75 different types of errors. Approaches that train a separate classifier for each error type have achieved strong results, but they do not handle the wide variety of orthographic and grammatical errors flexibly. Capturing the nuances of language calls for richer models.

Over the past few years, neural networks have achieved state-of-the-art performance on nearly all language processing tasks. Their power lies in their ability to automatically discover features that are useful for the task, which lets us capture the complexities of language learning and provide significantly more accurate feedback. In practice, this could offload grading from teachers while maintaining the same accuracy, and give every student a real-time writing assistant.
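
Whatever model produces the correction, the feedback a learner sees usually comes down to a list of edits between what they wrote and the corrected sentence. The sketch below recovers token-level edits with Python's standard-library `difflib`, assuming whitespace tokenization and that a model (not shown) has already produced the corrected sentence. The function name and example sentences are illustrative; dedicated grammar-correction tooling also aligns edits linguistically and labels each with an error type, which this sketch does not attempt.

```python
from difflib import SequenceMatcher


def extract_edits(source: str, corrected: str) -> list[tuple[str, str, str]]:
    """Return (operation, source span, corrected span) for each token-level change."""
    src, cor = source.split(), corrected.split()
    matcher = SequenceMatcher(a=src, b=cor, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is one of "replace", "delete", "insert"
            edits.append((op, " ".join(src[i1:i2]), " ".join(cor[j1:j2])))
    return edits


print(extract_edits("She go to school yesterday", "She went to school yesterday"))
# [('replace', 'go', 'went')]
print(extract_edits("I am agree with you", "I agree with you"))
# [('delete', 'am', '')]
```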

Personalized Practice

To personalize practice, one has to model both the user (for example, their proficiency and memory) and the content (for example, how questions relate to one another and how difficult they are). Traditional approaches have relied on hard-coded rules and simple heuristics based on psychological theories of learning. However, these models often fail to capture the complexity of the content and of human knowledge acquisition.
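
For contrast, here is the kind of psychologically motivated model referred to above: an exponential forgetting curve in the form used by half-life regression, a spaced-repetition model published by Duolingo. Recall probability is p = 2^(-delta / h) for delta days since the last review, with a half-life h = 2^(weights · features) computed from a few hand-chosen features. The feature names and weights below are hypothetical. A model like this describes forgetting for one word at a time, but not, for example, how related words or exercises interact, which is where richer models come in.

```python
def half_life_days(features: dict[str, float], weights: dict[str, float]) -> float:
    """Half-life regression form: h = 2 ** (weights . features); missing weights count as 0."""
    return 2.0 ** sum(weights.get(name, 0.0) * value for name, value in features.items())


def recall_probability(days_since_review: float, half_life: float) -> float:
    """Exponential forgetting curve: recall probability halves every `half_life` days."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 2.0 ** (-days_since_review / half_life)


# Hypothetical weights and review history for a single word.
weights = {"bias": 1.0, "times_correct": 0.5, "times_wrong": -0.3}
history = {"bias": 1.0, "times_correct": 4.0, "times_wrong": 1.0}
h = half_life_days(history, weights)
print(f"half-life ~ {h:.1f} days, recall after 3 days ~ {recall_probability(3, h):.2f}")
```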

By combining neural networks with large language learning datasets, we can now capture many aspects of learning, such as how users forget a particular word, how mistakes relate to specific knowledge gaps, and what the optimal difficulty and order of exercises are. This lets us generate personalized study plan recommendations that drive engagement and learning outcomes. Case in point: this approach won all categories in Duolingo’s Second Language Acquisition Modeling (SLAM) competition.
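
Once a model can predict how likely a learner is to get each exercise right, turning those predictions into a study plan can be as simple as preferring exercises that are neither too easy nor too hard. This sketch assumes predicted success probabilities are already available (from a neural model or any other source) and uses a hypothetical target success rate of 0.7; the exercise names, probabilities, and target are illustrative, not values from this article.

```python
def order_exercises(predicted_success: dict[str, float], target: float = 0.7) -> list[str]:
    """Order exercises by how close their predicted success probability is to `target`."""
    return sorted(predicted_success, key=lambda ex: abs(predicted_success[ex] - target))


def pick_next_exercise(predicted_success: dict[str, float], target: float = 0.7) -> str:
    """Return the single exercise whose predicted difficulty best matches the target."""
    if not predicted_success:
        raise ValueError("no candidate exercises")
    return order_exercises(predicted_success, target)[0]


# Hypothetical model outputs for one learner.
predictions = {"past_tense_drill": 0.95, "subjunctive_quiz": 0.35, "phrasal_verbs": 0.68}
print(pick_next_exercise(predictions))  # phrasal_verbs
print(order_exercises(predictions))
```

A fixed target is the simplest choice; a fuller system would also weigh, for example, predicted forgetting and how exercises depend on one another.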

Scalable Assessment

Users of language learning products start at fundamentally different levels of proficiency and come from different backgrounds, which makes placement tests and assessments core components of language learning platforms. To create effective tests, we must derive the assessment value of each content item and how it relates to other items. Effective assessment systems tackle this problem in a data-driven way and provide immediate feedback.
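
One standard way to quantify the assessment value of an item is item response theory (IRT). This is a swapped-in textbook technique, not necessarily what the system described here uses. Under the two-parameter logistic (2PL) model, a learner with ability theta answers an item with discrimination a and difficulty b correctly with probability P = 1 / (1 + exp(-a(theta - b))), and the item's Fisher information, a^2 P (1 - P), measures how much a response would tell us about theta. The item parameters below are hypothetical; in practice they are estimated from response data.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability that a learner with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item bank: item id -> (discrimination a, difficulty b).
items = {"q1": (1.0, -1.0), "q2": (1.5, 0.0), "q3": (0.8, 1.2)}
theta_hat = 0.2  # current ability estimate for this learner (illustrative)

best = max(items, key=lambda q: item_information(theta_hat, *items[q]))
print(best)  # q2
```

An adaptive placement test can then ask, at each step, the item that is most informative at the learner's current ability estimate, which is why adaptive tests can reach a confident placement with fewer questions than a fixed test.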

By applying the breakthroughs highlighted above in speech recognition, grammar error detection, and knowledge tracing, we can assess a student’s knowledge and give feedback more effectively. As a result, we can predict with approximately 90% accuracy whether a user will answer a given question correctly. This enables more effective placement tests as well as adaptive, personalized, real-time assessment of language proficiency.
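
As a much simpler stand-in for the knowledge-tracing models mentioned above, the sketch below shows how a prediction of the form "will this user answer this question correctly?" can be produced. It assumes a two-parameter logistic item response model, P(correct) = 1 / (1 + exp(-a(theta - b))), keeps a posterior over the learner's ability theta on a grid, updates it from the answers so far, and averages the probability of a correct answer over that posterior. The standard normal prior, grid, item parameters, and answer history are assumptions for illustration, and nothing here reproduces the accuracy figure above.

```python
import math

Response = tuple[tuple[float, float], bool]  # ((a, b) item parameters, answered correctly?)


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response model: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def posterior(responses: list[Response], grid_size: int = 81,
              lo: float = -4.0, hi: float = 4.0) -> tuple[list[float], list[float]]:
    """Normalised posterior over a grid of ability values, with a N(0, 1) prior."""
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised standard normal prior
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p  # likelihood of the observed answer
        weights.append(w)
    total = sum(weights)
    return grid, [w / total for w in weights]


def ability_estimate(responses: list[Response]) -> float:
    """Expected a posteriori (EAP) ability: the posterior mean of theta."""
    grid, probs = posterior(responses)
    return sum(t * p for t, p in zip(grid, probs))


def predict_correct(responses: list[Response], item: tuple[float, float]) -> float:
    """Posterior predictive probability that the learner answers `item` correctly."""
    a, b = item
    grid, probs = posterior(responses)
    return sum(p_correct(t, a, b) * p for t, p in zip(grid, probs))


# Hypothetical answer history: ((discrimination, difficulty), correct?)
history = [((1.0, -1.0), True), ((1.2, 0.0), True), ((1.5, 1.0), False)]
p = predict_correct(history, (1.0, 0.5))
print(f"ability ~ {ability_estimate(history):.2f}, P(correct) ~ {p:.2f}")
print("predict: correct" if p >= 0.5 else "predict: incorrect")
```

Thresholding the predicted probability at 0.5 gives a yes/no prediction, and comparing such predictions with learners' actual answers is how an accuracy figure like the one above is measured. For long answer histories, the product of likelihoods should be computed in log space to avoid underflow.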


In China’s Yunnan province alone, 70% of schools lack English teachers, and in the U.S., two-thirds of English teachers report feeling underprepared to adequately help students succeed. With the number of students learning English predicted to grow from 1.5 billion in 2018 to 1.9 billion in 2020, I’m hopeful that deep learning can make language learning more scalable and accessible to all.