Journal
57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019)
Volume -, Issue -, Pages 2190-2196
Publisher
ASSOC COMPUTATIONAL LINGUISTICS-ACL
Funding
- U.S. National Science Foundation [IIP-1631674]
- DARPA AIDA Program [FA8750-18-2-0014]
- ARL NS-CTA [W911NF-09-2-0053]
- Tencent AI Lab Rhino-Bird Gift Fund
Transcripts of natural, multi-person meetings differ significantly from documents such as news articles, which can lead natural language generation models to produce unfocused summaries. We develop an abstractive meeting summarizer that draws on both the video and audio of meeting recordings. Specifically, we propose a multi-modal hierarchical attention mechanism across three levels: topic segment, utterance, and word. To narrow the focus to topically relevant segments, we jointly model topic segmentation and summarization. In addition to traditional textual features, we introduce new multi-modal features derived from visual focus of attention, based on the assumption that an utterance is more important if its speaker receives more attention. Experiments show that our model significantly outperforms the state of the art on both BLEU and ROUGE measures.