Article

RNN based machine translation and transliteration for Twitter data

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY (2020)

Journal

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY

Volume 23, Issue 3, Pages 499-504

Publisher

SPRINGER

DOI: 10.1007/s10772-020-09724-9

Keywords

Long short-term memory (LSTM); Recurrent neural network (RNN); Sequence-to-sequence; Python; Translation; Transliteration; Twitter; Machine translation (MT); BLEU; TensorFlow

Category

Engineering, Electrical & Electronic

Abstract

This work analyzes code-switched social media data and transliterates and translates it into English using a Long Short-Term Memory (LSTM) network, a special kind of recurrent neural network (RNN). The LSTM is implemented in TensorFlow. Twitter data is stored in MongoDB for easy handling and processing. A Python script parses the data into its individual fields, and regular expressions are used to clean it. The LSTM model is trained on 1 M data and then used to transliterate and translate the Twitter data. Translating and transliterating social media data makes the content available in a language understood by the majority of the population. With this, any content that is anti-social or a threat to law and order can be easily verified and blocked at the source.
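The abstract does not list the regular expressions used for cleaning, so the following is only a minimal sketch of what such a step might look like. The patterns, the function names `clean_tweet` and `extract_text`, and the assumption that each stored tweet is a dictionary with a `text` field are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical cleaning rules; the paper's actual patterns are not given.
RETWEET_RE = re.compile(r"^RT\s+@\w+:\s*")       # leading "RT @user:" prefix
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # web links
MENTION_RE = re.compile(r"@\w+")                 # @user mentions
WHITESPACE_RE = re.compile(r"\s+")               # runs of whitespace


def clean_tweet(text: str) -> str:
    """Strip retweet prefixes, URLs, mentions, and hash signs from a tweet."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()


def extract_text(tweet: dict) -> str:
    """Pull the (assumed) `text` field out of a tweet record and clean it."""
    return clean_tweet(tweet.get("text", ""))
```

A record fetched from the database could then be passed to `extract_text` before being fed to the translation model; a record without a `text` field yields an empty string rather than raising an error.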
