Building the CELTEC Corpus: Assessing Lexical Complexity in Chilean Higher Education L2 English Learners

Pamela Saavedra Jeldres; Lucía Ubilla Rosales; Belén Muñoz Muñoz

Keywords: Computational linguistics; Learner corpus research; Lexical complexity; L2 writing; Pre-service language

Abstract

This article introduces the development, construction, and potential applications of a learner corpus, the Chilean English Language Teacher Education Corpus (CELTEC), comprising 404 texts written by English as a foreign language (EFL) pre-service teachers enrolled at nine universities in Chile. The study outlines the methodology used to create this pseudo-longitudinal corpus in order to facilitate replication. The corpus includes three cohorts representing years 3, 4, and 5 of a five-year undergraduate programme. Additionally, the study examines the lexical complexity of the corpus texts, focusing on lexical density, diversity, and sophistication. Data were collected using the corpus query language (CQL) in Sketch Engine and analysed with two freely available tools, TAALES 2.2 and TAALED 1.4, which calculate indices of lexical complexity within a multidimensional framework. The results reveal a slight developmental trend in lexical complexity across the CELTEC cohorts. Lexical density is moderate, averaging between 40% and 49%, yet it increases incrementally with each academic year. Lexical diversity also improves across cohorts; however, this growth does not consistently correspond to greater lexical sophistication in the texts. These findings have implications for English for General Academic Purposes (EGAP) pedagogy, both within the study's context and in broader educational settings. Specifically, they underscore the critical need for more explicit instruction i
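To make the density and diversity figures above more concrete, the following is a minimal sketch of two common textbook definitions: lexical density as the proportion of content-word tokens among all word tokens, and the type-token ratio (TTR) as a basic diversity measure. It is not the implementation used by TAALES or TAALED, which report a broader set of indices (TAALED, for example, includes length-corrected diversity measures such as MTLD). The sketch assumes that texts have already been tokenised and tagged with Universal POS tags by an external tagger, and the choice of which tags count as content words (`CONTENT_TAGS`) is an illustrative assumption. The function names and the sample sentence are hypothetical.

```python
from typing import List, Tuple

# Assumption: input is a list of (token, Universal POS tag) pairs from an external tagger.
# Assumption: these tags are treated as lexical (content) words; AUX, DET, ADP, PRON, etc. are not.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

Tagged = List[Tuple[str, str]]


def word_tokens(tagged: Tagged) -> Tagged:
    """Drop punctuation so that only word tokens are counted."""
    return [(w, t) for w, t in tagged if t != "PUNCT"]


def lexical_density(tagged: Tagged) -> float:
    """Proportion of content-word tokens among all word tokens (0.0 for an empty text)."""
    words = word_tokens(tagged)
    if not words:
        return 0.0
    content = sum(1 for _, t in words if t in CONTENT_TAGS)
    return content / len(words)


def type_token_ratio(tagged: Tagged) -> float:
    """Distinct lower-cased word forms divided by total word tokens (0.0 for an empty text)."""
    words = [w.lower() for w, _ in word_tokens(tagged)]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Hypothetical hand-tagged sample sentence, for illustration only.
SAMPLE: Tagged = [
    ("The", "DET"), ("students", "NOUN"), ("revised", "VERB"), ("the", "DET"),
    ("essays", "NOUN"), ("and", "CCONJ"), ("the", "DET"), ("students", "NOUN"),
    ("shared", "VERB"), ("them", "PRON"), (".", "PUNCT"),
]

if __name__ == "__main__":
    # 5 content words out of 10 word tokens; 7 distinct forms out of 10 tokens.
    print(f"Lexical density: {lexical_density(SAMPLE):.0%}")  # prints "Lexical density: 50%"
    print(f"TTR: {type_token_ratio(SAMPLE):.2f}")             # prints "TTR: 0.70"
```

Because raw TTR falls as texts get longer, it is only comparable across texts of similar length; this is one reason tools such as TAALED favour length-corrected diversity indices when comparing essays of varying length, as in a multi-cohort corpus.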

More information

Volume: 23(1)
Publication date: 2025
First page: 2505
Last page: 2515
Language: English
URL: https://www.pjlss.edu.pk/pdf_files/2025_1/2505-2515.pdf