Samuel J. Messick Memorial Lecture (Sponsored by Educational Testing Service)


Reimagining Validity in Accountability Testing: Understanding Consequences in a Social Context

Prof. Micheline Chalhoub-Deville,
University of North Carolina at Greensboro


For the past 25 years, my research and professional endeavors have revolved around accountability testing systems and their implications for language testing. This involvement has resulted in publications addressing topics such as the nature of policy-mandated testing and the conceptualization of validity (e.g., Chalhoub-Deville, 2009a, 2009b, 2016, 2020; Chalhoub-Deville & O’Sullivan, 2020). In my presentation, I examine our current thinking and practices in this field and advocate for the adoption of alternative, socially mediated theories to guide future efforts.

Accountability testing is closely tied to the rise of a pervasive Audit Culture, which became formalized in the U.S. through the enactment of the No Child Left Behind (NCLB) Act. This accountability movement, which extends globally, has been referred to as the Global Education Reform Movement (GERM). GERM systems necessitate a reevaluation of established validity frameworks and methodologies to encompass aggregate scores and socio-educational contexts. With accountability testing, educators and educational institutions are held responsible for student performance. Our validation approaches, however, primarily target individual student scores (e.g., Yumsek, 2023), overlooking aggregate scores and the broader contexts in which crucial interpretations and decisions are made. Consequently, accountability testing underscores the need to adjust validation models so that they incorporate aggregate scores and consider the broader educational and societal contexts of testing (Chalhoub-Deville, 2016, 2020; Hoeve, 2022).

Sources such as the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) and the published literature tend to focus on fairness, which deals broadly with issues of accommodations, differential item functioning, and universal design. While these efforts play a critical role in improving practices and score inferences, the broader implications of accountability testing, i.e., its socio-educational consequences, tend to remain outside the purview of test publishers’ research agendas. Research engagement with the broader socio-educational dimensions of fairness and consequences needs to be conceptualized as a shared responsibility among key stakeholder groups, including test publishers. In conclusion, this paper advocates a shift towards conceptualizing GERM and related accountability testing within a broader socio-educational framework and suggests different models to help achieve that goal.

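AERA, APA, & NCME (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Chalhoub-Deville, M. (2009a). The intersection of test impact, validation, and educational reform policy. Annual Review of Applied Linguistics, 29, 118-131.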

Chalhoub-Deville, M. (2009b). Standards-based assessment in the U.S.: Social and educational impact. In L. Taylor & C. J. Weir (Eds.), Language Testing Matters: Investigating the wider social and educational impact of assessment (pp. 281-300). Studies in Language Testing 31. Cambridge: Cambridge University Press and Cambridge ESOL.

Chalhoub-Deville, M. (2016). Validity theory: Reform policies, accountability testing, and consequences. Language Testing, 33, 453-472.

Chalhoub-Deville, M. (2020). Towards a Model of Validity in Accountability Testing. In M. K. Wolf (Ed.), Assessing English Language Proficiency in U.S. K–12 Schools (pp. 245-264). New York: Routledge.

Chalhoub-Deville, M., & O’Sullivan, B. (2020). Validity: Theoretical Development and Integrated Arguments. British Council Monograph Series 3. Sheffield: Equinox Publishing.

Hoeve, K. B. (2022). A validity framework for accountability: Educational measurement and language testing. Language Testing in Asia, 12, 3.

Yumsek, M. (2023). Educational L2 constructs and diagnostic measurement. Language Testing in Asia, 13, 3.


Micheline Chalhoub-Deville holds a Bachelor's degree from the Lebanese American University and Master's and Ph.D. degrees from The Ohio State University.  She currently serves as a Professor of Educational Research Methodology at the University of North Carolina at Greensboro (UNCG) where she teaches courses on language testing, validity, and research methodology.  Prior to UNCG, she worked at the University of Minnesota and the University of Iowa.  Her professional roles have also included positions such as Distinguished Visiting Professor at the American University in Cairo, Visiting Professor at the Lebanese American University, and UNCG Interim Associate Provost for Undergraduate Education. 

Her contributions to the field include publications, presentations, and consultations on topics such as computer adaptive tests, K-12 academic English language assessment, admissions language exams, and validation. She has over 70 publications, including books, articles, and reports, and has delivered more than 150 talks and workshops. Additionally, she has played key roles in securing and leading research and development programs with total funding exceeding $4 million. Her scholarship has been recognized through awards such as the ILTA Best Article Award, the Educational Testing Service (ETS) TOEFL Outstanding Young Scholar Award, the UNCG School of Education Outstanding Senior Scholar Award, and the national Center for Applied Linguistics Charles A. Ferguson Award for Outstanding Scholarship.

Professor Chalhoub-Deville has served as President of the International Language Testing Association (ILTA). She is the Founder and first President of the Mid-West Association of Language Testers (MwALT) and a founding member of the British Council Assessment Advisory Board-APTIS, the Duolingo English Test (DET) Technical Advisory Board, and the English3 Assessment Board. She is a former Chair of the TOEFL Committee of Examiners as well as a member of the TOEFL Policy Board. She has served on editorial and governing boards, such as those of Language Assessment Quarterly, Language Testing, and the Center for Applied Linguistics. She has co-founded and directed the Coalition for Diversity in Language and Culture, the SOE Access & Equity Committee, and a research group focused on testing and evaluation in educational accountability systems. She has been invited to serve on university accreditation teams in various countries and to participate in a United Nations Educational, Scientific and Cultural Organization (UNESCO) First Experts’ meeting.

Alan Davies Lecture (Sponsored by British Council)


Experimenting with Uncertainty, Advancing Social Justice: Placing Equity, Diversity, Inclusion and Access Centre Stage

Prof. Lynda Taylor, University of Bedfordshire, UK


Language testing has long been a locus for investigating the social context, consequences and power of assessment. During the 1990s, Professor Alan Davies (1931-2015) was among the first to focus our attention on the complex ethical dimensions of how we behave as language testing specialists. Davies drew on moral philosophy as a rich seam for mining core principles to inform and guide good professional conduct (1990, 1997). He was instrumental in helping to inspire and shape the ILTA Code of Ethics in 2000 and the ILTA Guidelines for Practice in 2007, which seek to instantiate ethical principles in terms of actual behaviours and practices. Both the Code of Ethics and the Guidelines for Practice have since been revised and are kept under review in response to needs and changes within the professional field.

Since 2000, the field of language assessment has seen growing interest in matters of fairness, justice, ethics and social responsibility, leading to increased concern for equity, access and inclusion for all test takers, especially those from underserved communities. One such community comprises those with specific language assessment requirements arising from life circumstances or from a disability or condition, whether temporary or permanent. Other communities include speakers of less commonly taught or spoken languages, especially marginalised languages. Principles of fair access and equitable treatment are now well established in education and society, and the rights of minorities are increasingly enshrined in legislation. Despite this, advancing social justice through sound policy and good practice remains a challenge for the language testing profession, particularly in knowing how best to address the aspirations and needs of test takers with special requirements.

In a Special Issue of Language Testing focusing on accommodations for test takers with disabilities, Taylor & Banerjee (2023a) highlighted the sensitive balance that language test providers have to maintain between their professional commitment to test standardization, reliability and validity on the one hand, and their commitment to advancing equity of access and inclusion for all test takers, regardless of their circumstances, on the other. This tension is especially acute where the latter may require a departure from standardized test procedures or risk compromising or undermining established test validity claims. Research findings to inform practical decisions about suitable accommodations (e.g. use of extended time or digital aids) can be scarce because of the challenges of empirical research in this area: population cohorts can be hard to identify or reach, and sample sizes can be too small for quantitative analysis.

Given the LTRC 2024 theme of ‘Reforming language assessment systems – reforming language assessment research’, the conference offers a welcome opportunity to highlight positive advances in language testing ethics over the past two decades, and also to examine how we might need to refresh and reframe our thinking and practice so as to advance social justice for the benefit not just of specific communities but of society as a whole (Taylor & Banerjee, 2023b). In my presentation I will reflect on some specific JEDI-related (justice, equity, diversity and inclusion) challenges that I believe we need to confront, e.g. concerning construct definition, stakeholder engagement and human rights in a digital world. My aim will be to explore how we might address and resolve such challenges in an ethically principled and evidence-based way. Through this 2024 Davies Lecture, I also wish to pay tribute to Professor Alan Davies as a pioneer and leading light in the movement to advance ethical practice and social justice in language testing: someone who was willing to experiment with uncertainty and to live creatively with the tension between speculation and empiricism.

Davies, A (1990) Principles of Language Testing. Basil Blackwell.

Davies, A (1997) Introduction: the limits of ethics in language testing. Language Testing (special issue) 14(3), 235-241.

Taylor, L & Banerjee, J (2023a) Editorial: Accommodations in language testing and assessment: Safeguarding equity, access and inclusion. Language Testing (special issue) 40(4), 847-855.

Taylor, L & Banerjee, J (2023b) Post-script: Language assessment accommodations: issues and challenges for the future. Language Testing (special issue) 40(4), 1000-1006.


Lynda Taylor is Visiting Professor at the Centre for Research in English Language Learning and Assessment (CRELLA) at the University of Bedfordshire, UK. She has worked for many years in the field of language testing and assessment, particularly with IELTS and the full range of Cambridge English qualifications. Her research interests include speaking and writing assessment, test takers with special needs and language assessment literacy. She was formerly Assistant Research Director with Cambridge Assessment English and has advised on test development and validation projects around the world. She has given presentations and workshops internationally, published extensively in academic journals, and authored or edited many of the volumes in CUP’s Studies in Language Testing (SiLT) series. In 2022 she was awarded Fellowship of the UK Academy of Social Sciences and she is currently serving a second term as President of the UK Association for Language Testing and Assessment (UKALTA) (2023-2025).

Cambridge/ILTA Distinguished Achievement Award (Sponsored by Cambridge Assessment English/ILTA)


Integration and Inclusiveness in Language Assessment

Dr Antony John Kunnan, Duolingo &
Carnegie Mellon University, Pittsburgh, US


In considering how to make language assessment more integrated and inclusive, I believe two areas need our attention. They are (a) understanding that language assessment is part of language teaching and learning and (b) including language learners/test takers from the Global South in assessment policies and practices.

In regard to (a), a survey of the research literature shows that over the last decade the focus of the field has moved from fairness and validation of assessments to an understanding of the role of language assessment integrated within language teaching and learning. This underreported area, Learning-Oriented Language Assessment (LOLA), is grounded in substantial research (for general definitions see Black and Wiliam, 1998; Carless, 2007, 2015; Gebril, 2021; Jones and Saville, 2016; Pellegrino et al., 2001; Purpura, 2004, 2014, 2021; Saville, 2021; Turner and Purpura, 2016). And yet, the frameworks and principles of LOLA have not systematically entered contemporary language assessments. For example, following Purpura (2021), very few of the performance moderators (instructional, socio-cognitive, affective, social-interactional, and technological factors) have been conceptualized and operationalized in local, national and international assessments. The primary goal of this reforming activity would be to integrate assessments with learning so that assessments could contribute to building language learners’ capability.

In regard to (b), for the last seven decades, international assessment providers have tested migrants from the Global South (GS) who are forced to take language tests so that institutions in the Global North (GN) can make decisions about school and university admission, employment, residency and citizenship. Illustrating this point from the perspective of international English language assessments, providers have desultorily considered issues related to English as a Lingua Franca (ELF) (for general definitions and discussions of ELF, see Brown, 2014; Canagarajah, 2006; Harding, 2022; Jenkins and Leung, 2014, 2024; Ockey and Hirsch, 2020). Inspirational recent works include the development of an ELF construct (Harding and McNamara, 2018) and a demonstration test (Ockey and Hirsch, 2020). Following the latter study, issues that need to be addressed include rhetorical sensitivity, context sensitivity, international communication competence, grammatical and lexical appropriacy, and discourse sensitivity. Such a plan would ensure a more inclusive and representative language assessment, developed through dialogue with stakeholders such as students, faculty and community members, as well as higher education institutions in the GS. This reforming activity would help GN institutions better understand the language capabilities and needs of GS language learners/test takers.


Dr Antony Kunnan, Principal Assessment Scientist at Duolingo and Senior Research Fellow at Carnegie Mellon University

Dr Antony Kunnan has had a long and distinguished career in language testing and assessment, covering many parts of the globe. This was reflected in nominations we received from a diverse range of contexts. His accomplishments across the five criteria on which the award is judged were clearly outstanding. Some particular highlights include that Dr Antony Kunnan was the founding editor of Language Assessment Quarterly; the founding President of the Asian Association for Language Assessment; the editor of the four-volume Companion to Language Assessment; a former ILTA President, Vice President and Treasurer; and a prominent thinker in conceptualising test fairness. He is a well-known and widely-respected member of our academic community, but beyond that – in the words of one of our committee members – Dr Antony Kunnan has been a “change maker”, opening up new avenues for publication, engagement and thought in our field.

Dr Antony Kunnan graduated from the University of California Los Angeles (UCLA) in 1991, and his career was already marked by distinction when he won the Jacqueline Ross TOEFL Dissertation Award for his research. This study was subsequently published in the Cambridge University Press Studies in Language Testing series as Test-taker characteristics and test performance: A structural modeling approach. Since that time, Dr Antony Kunnan has held academic posts/professorships at California State University, Los Angeles, the University of Hong Kong, the American University in Armenia, Nanyang Technological University, Tunghai University, Guangdong University of Foreign Studies, and the University of Macau, as well as visiting posts at the University of California, Los Angeles, and Chulalongkorn University. He recently took up a position as Principal Assessment Scientist at Duolingo and a Senior Research Fellowship at Carnegie Mellon University. He has published 11 books and over 80 articles or book chapters. His initial work on validity and structural equation modelling has led to a more recent focus on ethics, fairness and policy. During his career, Dr Antony Kunnan has given over 125 invited talks and workshops across 36 countries, demonstrating his considerable international reputation and impact.

In addition to his scholarship, Dr Antony Kunnan is widely recognised for his editorial work. When he founded Language Assessment Quarterly in 2004, it quickly rose to prominence as a prestigious publication venue, and its aims and scope served to broaden the range of enquiry in the field. As one nominator stated, the founding of LAQ in 2004:

"not only made it possible for wider dissemination of our language testing research so we became a distinct field of research … but also has supported generations of scholars and researchers in our field. More importantly, LAQ signals the major paradigm shift in our own field, both theoretically and methodologically, from Language Testing to Language Assessment – a global shift in educational measurement and assessment as well. For this, Antony leads our field and moves the field forward.”

Dr Antony Kunnan’s editorial activities continue as he remains involved with Language Assessment Quarterly, is Editor-in-Chief of The Journal of Asia TEFL, co-editor of the Routledge book series New Perspectives in Language Assessment, and is working on a second edition of the landmark Companion to Language Assessment.

Dr Antony Kunnan’s service to the field is also an exceptional feature of his career. As well as serving in various executive board positions within ILTA, he played a role in establishing the Asian Association for Language Assessment (AALA), a significant contribution that has been valued by practitioners and researchers working in that region. As one nominator wrote, in contributing to setting up the association, he “navigated professionally, diplomatically, and passionately with colleagues across the larger Asian context”. AALA is now a vibrant and sustainable regional organisation. It is also notable that he has chaired or co-chaired three LTRC conferences: Orlando (1997), Temecula (2004) and Hangzhou (2008).

Dr Antony Kunnan has been an outstanding scholar, mentor and leader in language testing and assessment for many decades. We feel the words of one nominator provide the perfect way of summarising Dr Antony Kunnan’s selection for this award: “Antony has tirelessly devoted his career to language assessment, and is still very much involved. His contributions are expansive, and this award would be a fitting recognition of his lifetime achievements.”

We congratulate Dr Antony Kunnan and look forward to presenting him with this award at LTRC in Innsbruck in July.
