☆ 4.6 Article

Enriching short text representation in microblog for clustering

FRONTIERS OF COMPUTER SCIENCE (2012)

Journal

FRONTIERS OF COMPUTER SCIENCE

Volume 6, Issue 1, Pages 88-101

Publisher

HIGHER EDUCATION PRESS

DOI: 10.1007/s11704-011-1167-7

Keywords

short texts; text representation; multi-language knowledge; matrix factorization; social media

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Computer Science, Theory & Methods

Funding

ONR [N000141010091]
NSF [0812551]
National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems [0812551]


Abstract

Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbreviations, and coined acronyms and words exacerbate the problems of synonymy and polysemy, and bring about new challenges to data mining applications such as text clustering and classification. To address these issues, we dissect some potential causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. Then we propose a novel framework which performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed approach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. With its significant performance improvement, we further investigate potential factors that contribute to the improved performance.
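The abstract does not give the factorization itself, so the following is only a minimal sketch of the general idea of integrating several language views while reducing features, not the authors' framework. It assumes the machine-translation step has already produced one non-negative term-by-document matrix per language, and it swaps in a plain joint non-negative matrix factorization with a shared document factor. The function name `joint_nmf` and all parameters are hypothetical.

```python
import numpy as np

def joint_nmf(views, k, n_iter=200, eps=1e-9, seed=0):
    """Illustrative joint NMF (an assumption, not the paper's algorithm).

    Each view X_l (terms_l x docs) is approximated as W_l @ H, where the
    k x docs matrix H is shared by all language views.
    """
    rng = np.random.default_rng(seed)
    n_docs = views[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) for X in views]  # per-language term factors
    H = rng.random((k, n_docs))                        # shared document factor

    def total_error():
        # Summed squared Frobenius reconstruction error over all views.
        return sum(np.linalg.norm(X - W @ H) ** 2 for W, X in zip(Ws, views))

    errors = [total_error()]
    for _ in range(n_iter):
        # Multiplicative update for each language-specific factor W_l.
        for i, X in enumerate(views):
            Ws[i] *= (X @ H.T) / (Ws[i] @ (H @ H.T) + eps)
        # Multiplicative update for the shared factor H, pooling all views.
        num = sum(W.T @ X for W, X in zip(Ws, views))
        den = sum(W.T @ W for W in Ws) @ H + eps
        H *= num / den
        errors.append(total_error())
    return Ws, H, errors

# Toy data: 6 short texts, an original-language view and one translated view.
original = np.array([[2, 1, 2, 0, 0, 0],
                     [1, 2, 1, 0, 0, 0],
                     [0, 0, 0, 2, 1, 2],
                     [0, 0, 0, 1, 2, 1]], dtype=float)
translated = np.array([[1, 1, 1, 0, 0, 0],
                       [0, 0, 0, 1, 1, 1],
                       [0, 0, 0, 2, 1, 1]], dtype=float)

Ws, H, errors = joint_nmf([original, translated], k=2)
labels = H.argmax(axis=0)  # one simple way to read cluster assignments from H
```

Each column of `H` is a reduced, k-dimensional representation of one short text that draws on every language view. Taking the largest entry per column is one simple way to obtain cluster labels, and `H.T` could instead be passed to any standard clustering routine.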
