*Actually, “assessment” is the correct term to use here, but using “testing” makes the title much more interesting.
Language testing evolved from simple multiple-choice paper tests and one-on-one in-person interviews to fully web-based, adaptive tests with the launch of STAMP, the world’s first web-based language proficiency test, in 2002. Today, artificial intelligence (AI) is changing the world as the World Wide Web did in the 1990s. AI will enable testing to be invisible and embedded in online learning. No longer will language skills always need to be measured by a test: an artificial construct that samples a limited number of topics and levels of the test taker’s language at a single designated point in time.
For decades, language testing has been closely intertwined with technological innovation. From Scantron bubble-sheet scoring in the 1970s to adaptive online testing in the early 2000s, new tools have shaped how learners are evaluated. The COVID-19 pandemic further accelerated the shift to digital testing, making online delivery the norm rather than the exception. More recently, advances in AI—particularly large language models (LLMs), speech recognition, and speech synthesis—have created unprecedented opportunities for both instruction and assessment.
These developments demand a rethinking of language assessment itself. Traditional testing provides only a limited snapshot of a learner’s ability at a specific point in time. In contrast, data-rich learning environments can store, analyze, and track learners’ performance on multiple dimensions over extended periods, yielding a rich longitudinal portrait of development. This approach promises more authentic insights into ability, actionable guidance for highly targeted instruction, and more instructional time, which together increase learning productivity for teachers and schools.
We believe that the future of language assessment lies not in testing as an isolated event but in the merging of learning and assessment through the ongoing analysis of learner performance data embedded within instruction. While tests will likely remain necessary for certification, credentialing, and several other purposes, longitudinal assessment offers a more precise, equitable, and learner-centered way forward.
Historical Evolution of Language Testing and Technology
Language testing has long evolved in tandem with technological change. In the 1970s, optical mark recognition, popularly known through its bubble sheets, enabled large-scale multiple-choice testing by automating scoring and statistical reporting. This shift laid the groundwork for mass testing at national and international levels. By the late 1990s, computational tools such as latent semantic analysis and natural language processing were being applied to automate the scoring of writing. Adaptive testing further improved both the efficiency and, often, the accuracy of measurement, with the 2002 launch of the STAMP test representing an early move toward responsive online assessment.
The COVID-19 pandemic marked a decisive turning point: what had been a gradual shift toward online delivery became a necessity. Today, digital platforms dominate both formative and summative testing, and alternative assessment formats—such as online portfolios, multimedia projects, and recorded presentations—are increasingly common. Each technological wave has not only reshaped how tests are delivered but also how teachers and learners understand what it means to measure language ability.
The AI Revolution in Language Education
Recent advances in AI have accelerated the pace of change in unprecedented ways. LLMs, AI-powered image and video generation, speech recognition, and speech synthesis tools allow educators to generate customized instructional materials in real time, from proficiency-leveled texts and audio passages to culturally relevant images and videos. Teachers no longer need to adapt instruction to fit available resources; instead, resources can be designed to fit the learners’ needs.
The pace of development is so rapid that attempts to define the “current state of AI” risk obsolescence within months. This creates both opportunities and challenges. On the one hand, teachers and test developers can harness generative AI to design tasks that are more relevant and engaging. On the other hand, the speed of change makes it difficult for educational institutions to establish stable pedagogical frameworks or guidelines and forces teachers to constantly adjust to new ways of doing things. Nevertheless, the emergence of AI-driven tools is creating a fundamental shift in how assessment is envisioned, delivered, and understood.
Rethinking Tests: Limitations of Traditional Approaches
Despite their ubiquity, tests are artificial events. They sample at a single point in time and often from a narrower range of topics and constructs than would be possible through direct observation in the real world. Test developers must ensure that these samples reliably estimate underlying ability, yet factors such as test length, fatigue, and test-taker anxiety can affect outcomes. High-stakes tests, often lasting several hours, amplify these risks: a learner’s low score may reflect exhaustion or circumstances rather than actual competence.


In low-stakes situations such as a language classroom, formative assessment that leverages the power of AI provides a practical solution to this challenge. Using shorter, more frequent assessments minimizes fatigue and generates multiple data points that paint a more accurate picture of the learner’s actual language ability. By using AI in creative ways, as Avant’s Mira Stride Formative Assessment has done, it is even possible to provide immediate, detailed, and personalized feedback to the learner and teacher on strengths, weaknesses, and focused actions that can be taken to improve the learner’s language skills.
While this is a significant advance in assessment, it is still just a stepping stone toward an even more powerful method of measuring a learner’s true language ability: a method that enables the integration of assessment within the act of learning itself.
The Merging of Learning with Assessment
The integration of LLMs into learning environments has greatly expanded language practice opportunities. For example, in Avant’s Mira Coach+ product, learners can interact with AI characters through speech or text while receiving corrective feedback based on the principles of second-language acquisition. These interactions are useful not only for language practice but also for generating authentic data on actual language use that is captured over time. The AI used in this and other online language-learning platforms can identify, in very fine-grained and personalized ways, the errors a learner makes, or even language use that is correct but not the most appropriate. It can then provide highly targeted, constructive feedback that helps the learner gently adjust or correct their language and continue practicing to deepen the learning. The data generated from these interactions can be used to trace developmental trajectories, offering teachers and learners real-time insight into progress. In this model, testing ceases to be a separate activity and instead becomes a natural byproduct of instruction.


These learning platforms will capture the language that learners produce in writing and speaking tasks or interpret in reading and listening exercises and store it in databases that create individualized, evolving learner profiles. These profiles can be analyzed longitudinally, providing a detailed picture of development across a wide range of language elements such as vocabulary, syntax, idea development, cohesion and coherence, and pragmatics (i.e., the appropriate use of language in a given context).
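As a rough illustration of what such an evolving profile might look like, here is a minimal sketch in Python. The record structure, dimension names, and 0-to-1 scoring scale are hypothetical, invented for this example rather than drawn from any existing platform; the point is only that many small, dated observations can be aggregated into a longitudinal picture of each dimension.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean

@dataclass
class InteractionRecord:
    # One scored learning interaction (a writing task, a dialogue turn, etc.).
    when: date
    task_type: str              # e.g., "writing", "speaking"
    scores: dict[str, float]    # dimension name -> score in [0, 1] (illustrative scale)

@dataclass
class LearnerProfile:
    learner_id: str
    history: list[InteractionRecord] = field(default_factory=list)

    def add(self, record: InteractionRecord) -> None:
        self.history.append(record)

    def trend(self, dimension: str, window: int = 5) -> tuple[float, float]:
        """Compare the mean of the most recent `window` scores on one dimension
        with the mean of all earlier scores on that dimension."""
        points = [r.scores[dimension]
                  for r in sorted(self.history, key=lambda r: r.when)
                  if dimension in r.scores]
        if not points:
            return 0.0, 0.0
        if len(points) <= window:
            return mean(points), mean(points)
        return mean(points[:-window]), mean(points[-window:])

# Usage: log a few interactions, then read off development on one dimension.
profile = LearnerProfile("learner-001")
profile.add(InteractionRecord(date(2025, 1, 10), "writing", {"vocabulary": 0.42, "cohesion": 0.35}))
profile.add(InteractionRecord(date(2025, 2, 14), "speaking", {"vocabulary": 0.48, "pragmatics": 0.30}))
profile.add(InteractionRecord(date(2025, 3, 20), "writing", {"vocabulary": 0.55, "cohesion": 0.44}))
earlier, recent = profile.trend("vocabulary", window=2)
print(f"vocabulary: earlier mean {earlier:.2f} -> recent mean {recent:.2f}")
```

In a real platform, the scores themselves would come from AI analysis of the learner’s actual language, and a profile would hold thousands of such observations rather than three.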
Toward a Multidimensional Assessment Model
These same elements are what a truly holistic language assessment can use to identify a learner’s language level. Given properly structured profile data, AI will be able to analyze these elements and identify a very specific and accurate proficiency level for the learner. It will be able to calculate correlations with various proficiency standards, such as the globally used Common European Framework of Reference (CEFR) and the US national proficiency standards, to provide scores based on them. Through this process of ongoing alignment with these standards, a new, more nuanced and fine-grained global standard could emerge. That standard will likely be based on a multidimensional matrix with axes for various language elements, ranging from relatively easy-to-measure elements, such as grammar use, to complex and nuanced ones with multiple competing definitions, such as pragmatics or cultural appropriateness. AI will define a learner’s level with a multicolored fine-point pen instead of the wide black Magic Marker that current testing limits us to.
The concept of multidimensionality is central to understanding how AI will be able to define a learner’s language skills through the analysis of that learner’s profile. LLMs map the myriad ways people use words, phrases, and sentences to accomplish specific communicative goals in a variety of sociocultural contexts. This will enable a vastly more precise calculation of each learner’s language skills than is currently possible.
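To make the idea of a multidimensional matrix more concrete, here is a small continuation of the earlier sketch. The dimensions, example scores, and CEFR-style cut points below are assumptions invented for illustration; they are not a validated alignment or any platform’s actual scoring method.

```python
# Illustrative only: these cut points are invented for the sketch, not a real CEFR alignment.
CEFR_BANDS = [(0.15, "A1"), (0.30, "A2"), (0.50, "B1"), (0.70, "B2"), (0.85, "C1")]

def band(score: float) -> str:
    """Map a 0-1 dimension score to an illustrative CEFR-style band."""
    for cutoff, label in CEFR_BANDS:
        if score < cutoff:
            return label
    return "C2"

def multidimensional_report(scores: dict[str, float]) -> dict[str, str]:
    """One band per dimension: the 'multicolored fine-point pen' view,
    rather than a single overall label."""
    return {dimension: band(value) for dimension, value in scores.items()}

profile_scores = {
    "grammar": 0.62,        # relatively easy to measure
    "vocabulary": 0.55,
    "cohesion": 0.48,
    "pragmatics": 0.28,     # harder and more context-dependent to score
}
print(multidimensional_report(profile_scores))
# {'grammar': 'B2', 'vocabulary': 'B2', 'cohesion': 'B1', 'pragmatics': 'A2'}
```

A report like this communicates a profile rather than a single number, which is what distinguishes the multidimensional view from a traditional overall score.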
The assessment process that we have laid out above applies only to individuals who are learning a language in an online environment. There will always be a need to develop and deliver tests of language for specific purposes (LSP) and tests for individuals who are not learning a language online. However, even these tests will be able to use some of the same tools that are used to analyze and measure the language skills of online language learners.
Conclusion: Beyond Testing
The history of language testing demonstrates how tightly it has been bound to the technologies of its time. From bubble-sheet scoring to adaptive online tests, innovations have shaped how teachers measure learning and how learners experience evaluation. Yet the latest wave of AI-driven tools has opened a different path. For the first time, it will be possible to capture and analyze authentic learner performance across time, tasks, and domains, creating a continuous record of development rather than a single snapshot.
This shift does not make tests obsolete, but it does reposition them. Tests are likely to remain necessary for certification, credentialing, admissions, and several other contexts for the foreseeable future. Rich longitudinal data collected in fully online learning environments can offer more precise, valid, and learner-centered insights while reducing stress and freeing up teacher time for instruction. When gaps appear in these records, targeted tests—personalized, adaptive, and generated on demand—can provide complementary evidence.
For teachers, this new paradigm promises tools that integrate assessment with instruction, giving clearer, more personalized, and more actionable information about learner progress. For learners, it offers a less artificial, less stressful, and more empowering way to demonstrate ability. The future of language assessment, then, is not defined by the testing event but by the ongoing story of learning, captured and analyzed as it unfolds. Assessment becomes less about delivering a score at a single moment in time and more about supporting growth throughout the learning journey.
David Bong is co-founder and CEO of both Avant, a pioneer in online adaptive language proficiency tests, and Mira, a leader in AI-based language learning. Previously, he established the Tokyo office of Kroll Associates, the world’s leading investigative and security firm. Later, he founded Earl, developing patented technologies enabling the blind to access and listen to newspapers, magazines, and books on an iPhone. David has a Working Fluency Global Seal in Japanese and lives in Eugene, Oregon.
Dr. Scott Payne is chief learning and research officer and co-founder of Mira, an AI-powered language-learning platform. After 20 years in academia teaching, developing language-learning software, and examining student learning processes and outcomes in technology-mediated learning environments, he transitioned to the private sector, working in learning scientist and research scientist roles before helping launch Mira.