Multilingual Minimal Contrastive Editing
Keywords: NLP, XAI, Clinical NLP, Hate Speech, Sentiment Analysis
Abstract
We introduce MMiCE, a multilingual, domain-agnostic method for generating contrastive explanations via minimal edits to multiclass and multilabel inputs. Building on MiCE, MMiCE fine-tunes large language models with LoRA adapters and guides edits using attribution and distance constraints, producing fluent, faithful edits that flip model predictions. We demonstrate its effectiveness on English and Spanish datasets in both social media and clinical domains, achieving an average label-flip rate of 99%. We also propose an inverse gradient attribution scheme for counterfactual edit generation in multilabel settings and show that it improves the fluency of the resulting edits.
More information
| Publisher: | SCITEPRESS – Science and Technology Publications, Lda |
| Publication date: | 2026 |
| Start/End year: | March 2026 |
| First page: | 4590 |
| Last page: | 4598 |
| Language: | English |
| URL: | https://doi.org/10.5220/0014474800004052 |