VOCHITEXT: Vocabulary Corpus of Chilean Children’s Textbooks

Yeniè Norambuena; María Erika Herrera

Keywords: linguistics, Spanish language, vocabulary, elementary school

Abstract

VOCHITEXT is a specialized corpus of 23,297 Chilean Spanish words, curated to represent the core vocabulary used in Chilean primary education. Derived from 58 complete texts (student books) collected in 2022, it covers vocabulary from the official science, history, language, social science, and mathematics textbooks, spanning preschool through 4th grade. For preschool, it includes words from reading materials (children's tales) recommended by the Chilean Ministry of Education. The corpus was originally developed for the LEXIKON App project (ANID Fondef, IT21I0078, University of Concepción) to assess children's lexical development, and it provides a representative sample of school vocabulary. It is a resource for researchers, educators, and developers interested in Chilean primary education, vocabulary acquisition, and the linguistic analysis of educational materials, offering insight into the lexical content that Chilean students encounter across subjects and grade levels.

More information

Journal title: Mendeley Data
Volume: V1
Publication date: 2024
URL: https://data.mendeley.com/datasets/ttb8rk5t53/1
DOI: 10.17632/ttb8rk5t53.1

Notes: WOS