Article

Clustering with t-SNE, Provably

SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE (2019)

Journal

SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE

Volume 1, Issue 2, Pages 313-332

Publisher

SIAM PUBLICATIONS

DOI: 10.1137/18M1216134

Keywords

t-SNE; visualization; spectral clustering; discrete dynamical system; discrete elliptic equation; maximum principle

Category

Mathematics, Applied

Funding

NIH [1R01HG008383-01A1]
U.S. NIH MSTP Training grant [T32GM007205]
Institute of New Economic Thinking [INO15-00038]

Abstract

t-distributed stochastic neighborhood embedding (t-SNE), a clustering and visualization method proposed by van der Maaten and Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations, and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the early exaggeration phase, an optimization technique proposed by van der Maaten and Hinton [J. Mach. Learn. Res., 9 (2008), pp. 2579-2605] and van der Maaten [J. Mach. Learn. Res., 15 (2014), pp. 3221-3245], can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter alpha and step size h. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g., the swiss roll) improves. We also discuss a connection to spectral clustering methods.
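To make the roles of the exaggeration parameter alpha and the step size h concrete, the following is a minimal sketch of one gradient-descent step on the standard t-SNE loss with the input affinities multiplied by alpha. It is an illustration under stated assumptions, not the paper's method: the function names, the toy block-structured affinity matrix P, and the values alpha = 12 and h = 0.1 are all chosen here for the example, momentum is omitted for simplicity, and the parameter rules the abstract refers to are not reproduced.

```python
import numpy as np


def early_exaggeration_step(Y, P, alpha, h):
    """One plain gradient-descent step on the t-SNE loss with P scaled by alpha.

    Y: (n, d) embedding, P: (n, n) symmetric affinities summing to 1.
    alpha and h are the exaggeration factor and step size (illustrative names).
    """
    diff = Y[:, None, :] - Y[None, :, :]            # y_i - y_j for every pair
    w = 1.0 / (1.0 + np.sum(diff ** 2, axis=2))     # Student-t kernel weights
    np.fill_diagonal(w, 0.0)                        # no self-interaction
    Q = w / w.sum()                                 # embedding affinities q_ij
    coeff = (alpha * P - Q) * w                     # attraction minus repulsion
    grad = 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)
    return Y - h * grad


def within_cluster_spread(Y, labels):
    """Mean distance of points to their own cluster centroid."""
    return float(np.mean([
        np.linalg.norm(Y[labels == c] - Y[labels == c].mean(axis=0), axis=1).mean()
        for c in np.unique(labels)
    ]))


# Toy setup (an assumption for this sketch): two perfectly separated clusters,
# with uniform affinity inside each cluster and zero affinity across clusters.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
same = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(same, 0.0)
P = same / same.sum()

Y = 1e-2 * rng.standard_normal((20, 2))             # small random initialization
spread_before = within_cluster_spread(Y, labels)

for _ in range(50):
    Y = early_exaggeration_step(Y, P, alpha=12.0, h=0.1)

spread_after = within_cluster_spread(Y, labels)
print(spread_before, spread_after)
```

In this toy setting the attractive term alpha * p_ij dominates within each cluster, so points contract toward their cluster centroid while the weak repulsive term slowly pushes the two centroids apart. If the product of alpha and h is made too large, the same update overshoots, which is why the two parameters have to be chosen together.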
