Web Scraping Chilean News Media: A Dataset for Analyzing Social Unrest Coverage (2019-2023)

Molina I.; Morales J.; Keith B.

Keywords: data collection, estallido social, web scraping, Chilean social outburst, news media dataset

Abstract

This paper presents a dataset of Chilean news media coverage during the social unrest and constitutional processes from 2019 to 2023. Using Python-based web scraping with BeautifulSoup and Selenium, we collected articles from 15 Chilean news outlets between 15 November 2019 and 17 December 2023. The initial collection of 1254 articles was filtered to 931 usable data points after removing non-relevant content, duplicates, and articles unrelated to the Chilean social outburst. Each news outlet required specific extraction approaches due to varying HTML structures, with some outlets inaccessible due to paywalls or anti-scraping mechanisms. The dataset is structured in JSON format with standardized fields including title, content, date, author, and source metadata. This resource supports research on media coverage during political events and provides data for Spanish-language processing tasks. The dataset and extraction code are publicly available on GitHub. © 2025 by the authors.
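The filtering and JSON structuring steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' published code: the `Article` class, the `KEYWORDS` tuple, the duplicate key (normalized title plus source), and the sample records are all assumptions introduced here. Only the field names (title, content, date, author, source) come from the abstract.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical keyword list; the paper's actual relevance criteria are not stated here.
KEYWORDS = ("estallido social", "plebiscito", "constitucional")


@dataclass(frozen=True)
class Article:
    title: str
    content: str
    date: str    # ISO 8601 date string, e.g. "2019-11-18"
    author: str
    source: str  # name of the news outlet


def is_relevant(article, keywords=KEYWORDS):
    """Return True if any keyword occurs in the title or content (case-insensitive)."""
    text = f"{article.title} {article.content}".lower()
    return any(keyword in text for keyword in keywords)


def filter_articles(articles):
    """Drop duplicates (same normalized title and source) and non-relevant items."""
    seen = set()
    kept = []
    for article in articles:
        key = (article.title.strip().lower(), article.source.strip().lower())
        if key in seen or not is_relevant(article):
            continue
        seen.add(key)
        kept.append(article)
    return kept


def to_json(articles):
    """Serialize records as a JSON array, keeping Spanish characters readable."""
    return json.dumps([asdict(a) for a in articles], ensure_ascii=False, indent=2)


# Invented sample records, for illustration only.
raw = [
    Article("Estallido social: un mes", "Texto...", "2019-11-18", "Redacción", "Outlet A"),
    Article("Estallido social: un mes", "Texto...", "2019-11-18", "Redacción", "Outlet A"),
    Article("Resultados del fútbol", "Texto deportivo", "2020-01-05", "Redacción", "Outlet B"),
]
clean = filter_articles(raw)
print(to_json(clean))
```

Passing `ensure_ascii=False` keeps accented characters such as "ó" literal in the output, which is convenient for Spanish-language text processing.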

More information

Title according to WOS: Web Scraping Chilean News Media: A Dataset for Analyzing Social Unrest Coverage (2019-2023)
Title according to SCOPUS: Web Scraping Chilean News Media: A Dataset for Analyzing Social Unrest Coverage (2019–2023)
Journal title: Data
Volume: 10
Issue: 11
Publisher: Multidisciplinary Digital Publishing Institute (MDPI)
Publication date: 2025
Language: English
DOI: 10.3390/data10110174

Notes: ISI, SCOPUS