A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles
Abstract
This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Due to the difficulty of obtaining large training datasets formed by document clusters and their respective human-made summaries, we propose building a training and a testing corpus from Wikinews. Wikinews articles are modeled as "distant" summaries of their cited sources, considering that first sentences of Wikinews articles tend to summarize the event covered in the news story. Sentences from cited sources are represented as tuples of numerical features and labeled according to a relationship with the given distant summary that is based on the Zipf law. Ranking functions are trained using linear regressions and ranking SVMs, which are also combined using Borda count. Top ranked sentences are concatenated and used to build summaries, which are compared with the first sentences of the distant summary using ROUGE evaluation measures. Experimental results obtained show the effectiveness of the proposed method and that the combination of different ranking techniques outperforms the quality of the generated summary.
Más información
Título según WOS: | A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles |
Título de la Revista: | BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II |
Volumen: | 7608 |
Editorial: | SPRINGER INTERNATIONAL PUBLISHING AG |
Fecha de publicación: | 2012 |
Página de inicio: | 143 |
Página final: | 154 |
Notas: | ISI |