Multilingual Minimal Contrastive Editing

Benoit-Cea, Domingo; Ñanculef, Ricardo; De Ferrari, Joaquín

Keywords: NLP, XAI, Clinical NLP, Hate Speech, Sentiment Analysis

Abstract

We introduce MMiCE, a multilingual, domain-agnostic method for generating contrastive explanations via minimal edits to the inputs of multiclass and multilabel classifiers. Building on MiCE, MMiCE fine-tunes large language models with LoRA adapters and guides edits using attribution and distance constraints, producing fluent, faithful edits that flip model predictions. We demonstrate its effectiveness on English and Spanish datasets from both social media and clinical domains, achieving an average label-flip rate of 99% across datasets. We also propose a new method for counterfactual edit generation in multilabel settings, based on an inverse gradient attribution scheme, and show that it improves the fluency of the generated edits.

More information

Publisher: SCITEPRESS – Science and Technology Publications, Lda
Publication date: 2026
Start/End year: March 2026
First page: 4590
Last page: 4598
Language: English
URL: https://doi.org/10.5220/0014474800004052