Article

An Unsupervised Feature Selection Framework for Social Media Data

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 26, Issue 12, Pages 2914-2927

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2014.2320728

Keywords

Unsupervised feature selection; linked data; social media; pseudo labels; social dimension regularization

Funding

  1. US National Science Foundation [0812551, IIS-1217466]
  2. ONR [N000141410095]
  3. Division of Information & Intelligent Systems
  4. Directorate for Computer & Information Science & Engineering [1217466, 0812551] (funding source: National Science Foundation)

Abstract

The explosive growth of social media produces massive amounts of unlabeled, high-dimensional data. Feature selection has proven effective in handling high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains challenging because label information, on which feature relevance is usually assessed, is absent. The unique characteristics of social media data further complicate the problem: for example, social media data is inherently linked, which invalidates the independent and identically distributed (i.i.d.) assumption and poses new challenges for unsupervised feature selection algorithms. In this paper, we investigate the novel problem of feature selection for social media data in an unsupervised scenario. In particular, we analyze the differences between social media data and traditional attribute-value data, investigate how the relations extracted from linked data can be exploited to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We design and conduct systematic experiments to evaluate the proposed framework on data sets from real-world social media websites. The empirical study demonstrates the effectiveness and potential of the proposed framework.
