☆ 4.1 Article

Empowering Deaf-Hearing Communication: Exploring Synergies between Predictive and Generative AI-Based Strategies towards (Portuguese) Sign Language Interpretation

JOURNAL OF IMAGING (2023)

期刊

JOURNAL OF IMAGING

卷 9, 期 11, 页码 -

出版社

MDPI

DOI: 10.3390/jimaging9110235

关键词

sign language recognition (SLR); Portuguese Sign Language; video-based motion analytics; machine learning (ML); long-short term memory (LSTM); large language models (LLM); generative pre-trained transformer (GPT); deaf-hearing communication; inclusion

类别

Imaging Science & Photographic Technology

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This paper proposes a non-invasive Portuguese Sign Language (LGP) interpretation system using skeletal posture sequence inference and dataset augmentation. It achieves continuous conversations and coherent sentence construction. Users report high intuitiveness and real-time feedback. Promising semantic correlation is observed in the generated sentences.

Communication between Deaf and hearing individuals remains a persistent challenge requiring attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address such issues, this paper proposes a non-invasive Portuguese Sign Language (Lingua Gestual Portuguesa or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long-short term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate LGP terms tokenization. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences based on inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuition when using the buffer-based interaction strategy for terms/words tokenization. Furthermore, tests with an LLM-specifically ChatGPT-demonstrated promising semantic correlation rates in generated sentences, comparable to expected sentences.

Empowering Deaf-Hearing Communication: Exploring Synergies between Predictive and Generative AI-Based Strategies towards (Portuguese) Sign Language Interpretation

期刊

JOURNAL OF IMAGING

出版社

MDPI

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Empowering Deaf-Hearing Communication: Exploring Synergies between Predictive and Generative AI-Based Strategies towards (Portuguese) Sign Language Interpretation

期刊

JOURNAL OF IMAGING

出版社

MDPI

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文