A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles

Bravo-Marquez, Felipe; Manriquez, Manuel; CalderonBenavides, L; GonzalezCaro, C; Chavez, E; Ziviani, N

Abstract

This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Because large training datasets formed by document clusters and their respective human-made summaries are difficult to obtain, we propose building a training and a testing corpus from Wikinews. Wikinews articles are modeled as "distant" summaries of their cited sources, considering that the first sentences of Wikinews articles tend to summarize the event covered in the news story. Sentences from cited sources are represented as tuples of numerical features and labeled according to a relationship with the given distant summary that is based on Zipf's law. Ranking functions are trained using linear regressions and ranking SVMs, which are also combined using Borda count. Top-ranked sentences are concatenated to build summaries, which are compared with the first sentences of the distant summary using ROUGE evaluation measures. Experimental results show the effectiveness of the proposed method and that combining different ranking techniques improves the quality of the generated summaries.
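The Borda count combination mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `borda_count`, the sentence identifiers, the two example rankings, and the tie-breaking rule are all assumptions made for illustration. Each ranker awards a sentence one point for every sentence it places below it, and the points are summed across rankers.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several rankings (best item first) of the same items.

    Hypothetical helper: with n items, the item at position p in a
    ranking receives n - 1 - p points; points are summed over rankings.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties are broken by item id (an assumption)
    # so that the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings of four sentences from two rankers,
# e.g. a linear regression and a ranking SVM.
regression_ranking = ["s2", "s1", "s3", "s4"]
svm_ranking = ["s1", "s2", "s4", "s3"]

combined = borda_count([regression_ranking, svm_ranking])
summary_ids = combined[:2]  # keep the top-ranked sentences for the summary
```

In this toy input, `s1` and `s2` each collect 5 points and `s3` and `s4` each collect 1, so the order within each pair depends only on the assumed tie-breaking rule.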

More information

Title according to WOS: A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles
Journal title: LEARNING AND INTELLIGENT OPTIMIZATION, LION 15
Volume: 7608
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Publication date: 2012
Start page: 143
End page: 154
Notes: ISI