Unsupervised query-focused multi-document summarization based on transfer learning from sentence embedding models, BM25 model, and maximal marginal relevance criterion




Publication Details

Output type: Journal article

UM6P affiliated Publication?: Yes

Author list: Lamsiyah, Salima; El Mahdaouy, Abdelkader; Ouatik El Alaoui, Said; Espinasse, Bernard

Publisher: Springer (part of Springer Nature): Springer Open Choice Hybrid Journals

Publication year: 2021

Journal: Journal of Ambient Intelligence and Humanized Computing (1868-5137)

ISSN: 1868-5137

eISSN: 1868-5145

Languages: English (EN-GB)




Abstract

Extractive query-focused multi-document summarization (QF-MDS) is the process of automatically generating an informative summary, from a collection of documents, that answers a given query. Sentence and query representation is a cornerstone that affects the effectiveness of many QF-MDS methods. Transfer learning using pre-trained word embedding models has shown promising performance in many applications. However, most of these representations do not consider the order of, or the semantic relationships between, words in a sentence, and thus they do not carry the meaning of a full sentence. In this paper, to deal with this issue, we propose to leverage transfer learning from pre-trained sentence embedding models to represent documents' sentences and users' queries as embedding vectors that capture the semantic and syntactic relationships between their constituents (words, phrases). Furthermore, the BM25 score and a semantic similarity function are linearly combined to retrieve a subset of sentences based on their relevance to the query. Finally, the maximal marginal relevance criterion is applied to re-rank the selected sentences, maintaining query relevance while minimizing redundancy. The proposed method is unsupervised, simple, efficient, and requires no labeled text summarization training data. Experiments are conducted on three standard datasets from the DUC evaluation campaign (DUC'2005-2007). The overall results show that our method outperforms several state-of-the-art systems and achieves results comparable to the best-performing systems, including supervised deep learning-based methods.
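The retrieval and re-ranking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under toy assumptions, not the authors' implementation: plain bag-of-words vectors stand in for the pre-trained sentence embeddings used in the paper, and names such as `combined_relevance` and `mmr_rerank` are hypothetical.

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors (embeddings in the paper)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    # standard Okapi BM25, treating each sentence as a "document"
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for t in set(query_terms):
        df = sum(1 for d in corpus if t in d)
        if df == 0:
            continue
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc_terms.count(t)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

def combined_relevance(q_vec, s_vec, q_terms, s_terms, corpus, alpha=0.5):
    # linear combination of lexical (BM25) and semantic (cosine) relevance
    return alpha * bm25_score(q_terms, s_terms, corpus) + (1 - alpha) * cosine(q_vec, s_vec)

def mmr_rerank(candidates, rel, sim, lam=0.7, k=3):
    # maximal marginal relevance: greedily pick sentences that are relevant
    # to the query (rel) but not redundant with already-selected ones (sim)
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda i: lam * rel[i]
            - (1 - lam) * max((sim(i, j) for j in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected
```

Lowering `lam` shifts the trade-off toward diversity: a highly relevant but redundant sentence gets demoted in favor of a less relevant, novel one.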






Last updated on 2021-11-26 at 23:16