LLM-as-a-Judge Approaches as Proxies for Mathematical Coherence in Narrative Extraction
Keywords: narrative extraction, large language models, narrative evaluation, coherence metrics, LLM-as-a-judge
Abstract
Evaluating the coherence of narrative sequences extracted from large document collections is crucial for applications in information retrieval and knowledge discovery. While mathematical coherence metrics based on embedding similarities provide objective measures, they require substantial computational resources and domain expertise to interpret. We propose using large language models (LLMs) as judges to evaluate narrative coherence and demonstrate that their assessments correlate with mathematical coherence metrics. Through experiments on two data sets (news articles about Cuban protests and scientific papers from visualization conferences), we show that LLM judges achieve Pearson correlations of up to 0.65 with mathematical coherence while maintaining high inter-rater reliability (ICC > 0.92). The simplest evaluation approach performs comparably to the more complex ones: it outperforms them on the focused data set and retains over 90% of their performance on the more diverse data set, while using fewer computational resources. Our findings indicate that LLM-as-a-judge approaches are an effective proxy for mathematical coherence in narrative extraction evaluation.
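To illustrate the kind of comparison the abstract describes, the sketch below computes an embedding-based coherence score for each narrative (here assumed to be the mean cosine similarity of consecutive document embeddings, which may differ from the paper's exact metric) and correlates it with LLM-judge ratings via Pearson's r. The data, the coherence_score helper, and the judge ratings are all hypothetical stand-ins, not the authors' implementation.

```python
# Illustrative sketch (not the paper's exact method): score each narrative
# by the mean cosine similarity of consecutive document embeddings, then
# correlate those scores with LLM-judge ratings using Pearson's r.
import numpy as np
from scipy.stats import pearsonr

def coherence_score(embeddings: np.ndarray) -> float:
    """Mean cosine similarity between consecutive items in a sequence."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.sum(normed[:-1] * normed[1:], axis=1)  # consecutive cosines
    return float(sims.mean())

# Hypothetical data: one embedding matrix per extracted narrative,
# and one averaged LLM-judge rating per narrative.
rng = np.random.default_rng(0)
narratives = [rng.normal(size=(5, 384)) for _ in range(20)]
math_coherence = [coherence_score(e) for e in narratives]
judge_ratings = rng.uniform(1, 5, size=20)  # stand-in for real LLM scores

r, p = pearsonr(math_coherence, judge_ratings)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```

With real judge ratings in place of the random stand-ins, a correlation near the reported 0.65 would indicate that the cheaper LLM-based assessment tracks the embedding-based metric.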
More information
Title (WOS): LLM-as-a-Judge Approaches as Proxies for Mathematical Coherence in Narrative Extraction
Journal: ELECTRONICS
Volume: 14
Issue: 13
Publisher: MDPI
Publication date: 2025
Language: English
DOI: 10.3390/electronics14132735
Notes: ISI