4.6 Article

Voice Orientation Recognition: New Paradigm of Speech-Based Human-Computer Interaction

Related References

Note: Only some of the references are listed; download the original article for the complete reference information.

Article Social Work

Do people intend to use AI Voice Assistants? An empirical study in Vietnam

Thuy Dung Pham Thi et al.

Summary: This study investigates the factors influencing users' intentions to use voice assistants, particularly smartphone voice assistants with anthropomorphic design. The results show that voice cues and message interactivity positively influence social presence and perceived expertise, which in turn affect the intention to use voice assistants.

JOURNAL OF HUMAN BEHAVIOR IN THE SOCIAL ENVIRONMENT (2023)

Article Chemistry, Analytical

The Investigation of Adoption of Voice-User Interface (VUI) in Smart Home Systems among Chinese Older Adults

Yao Song et al.

Summary: Driven by advanced voice interaction technology, the voice-user interface (VUI) has gained popularity, especially among older adults in the context of smart homes. A survey of 420 Chinese older adults revealed that the main factors influencing their adoption of VUI are perceived usefulness, perceived ease of use, and trust.

SENSORS (2022)

Review Acoustics

Unsupervised Automatic Speech Recognition: A review

Hanan Aldarmaki et al.

Summary: This paper reviews the research literature to examine the challenges and potential solutions for achieving fully unsupervised ASR, with the aim of optimizing ASR development for low-resource languages.

SPEECH COMMUNICATION (2022)

Article Telecommunications

Survey of Deep Learning Paradigms for Speech Processing

Kishor Barasu Bhangale et al.

Summary: This paper presents a brief survey of the application of deep learning techniques in various speech processing applications. It covers the use of different deep learning algorithms and evaluation metrics for performance evaluation.

WIRELESS PERSONAL COMMUNICATIONS (2022)

Article Business

An overview and empirical comparison of natural language processing (NLP) models and an introduction to and empirical application of autoencoder models in marketing

Venkatesh Shankar et al.

Summary: This article investigates different NLP models and their applications in marketing, highlighting the advantages and disadvantages of these models and the conditions under which they are appropriate. The latest neural autoencoder NLP models are introduced, and an empirical comparison of these models and statistical NLP models is provided. The insights from the comparison are discussed, and guidelines for researchers are offered.

JOURNAL OF THE ACADEMY OF MARKETING SCIENCE (2022)

Article Multidisciplinary Sciences

Inductive biases for deep learning of higher-level cognition

Anirudh Goyal et al.

Summary: This article presents an intriguing hypothesis that human and animal intelligence can be explained by a few principles. By studying the inductive biases used by humans and animals, we can gain a better understanding of these principles and draw inspiration for AI research and neuroscience theories.

PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES (2022)

Proceedings Paper Computer Science, Artificial Intelligence

TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION

Alexander H. Liu et al.

Summary: Unsupervised speech recognition has the potential to improve Automatic Speech Recognition (ASR) systems for all languages by eliminating pre-processing steps and introducing a self-supervised objective. The wav2vec-U 2.0 method shows improved results in unsupervised recognition across different languages while being conceptually simpler.

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT (2022)

Article Engineering, Electrical & Electronic

Recent Advances in End-to-End Automatic Speech Recognition

Jinyu Li

Summary: Recently, there has been a significant trend in the speech community to shift from hybrid modeling based on deep neural networks to end-to-end models for automatic speech recognition (ASR). While end-to-end models achieve state-of-the-art results in terms of ASR accuracy, hybrid models are still widely used in commercial ASR systems due to practical factors. This paper provides an overview of recent advances in end-to-end models, focusing on technologies that address industry-specific challenges.

APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING (2022)

Proceedings Paper Acoustics

S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT

Shubo Lv et al.

Summary: This paper investigates a deep learning-based approach for super wide band speech denoising. By extending the previous deep complex convolution recurrent neural network and utilizing a cascaded sub-band and full-band processing module, a complex feature encoder and decoder, as well as a learnable spectrum compression method, the proposed model achieves state-of-the-art performance in denoising speech with a sampling rate of 32kHz.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Proceedings Paper Acoustics

PERSONALIZED SPEECH ENHANCEMENT: NEW MODELS AND COMPREHENSIVE EVALUATION

Sefik Emre Eskimez et al.

Summary: Personalized speech enhancement (PSE) models use additional cues to remove background noise and interfering speech in real-time, improving the speech quality of online video conferencing systems. This study proposes two neural networks that outperform previous models and introduces test sets capturing various scenarios. A new metric to measure the target speaker over-suppression problem is proposed, along with multi-task training. Results show that the proposed models yield better performance and multi-task training improves speech recognition accuracy and mitigates over-suppression.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Proceedings Paper Instruments & Instrumentation

Performance Comparison of Omni and Cardioid Directional Microphones for Indoor Angle of Arrival Sound Source Localization

Meng Jiang et al.

Summary: This paper explores the method of sound source localization in indoor enclosed environments. By comparing different types of microphone systems, it is found that the cardioid-directional microphone system has superior accuracy in localization.

2022 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE (I2MTC 2022) (2022)

Article Computer Science, Information Systems

Toward Practical Usage of the Attention Mechanism as a Tool for Interpretability

Martin Tutek et al.

Summary: Natural language processing (NLP) has been greatly influenced by the neural revolution in artificial intelligence. Attention mechanisms provide transparency to previously black-box recurrent neural network (RNN) models, but recent research questions their faithfulness. This study presents a regularization technique to improve the faithfulness of attention-based explanations, showing consistent improvements across various datasets and models.

IEEE ACCESS (2022)

Article Communication

Hey Alexa, why do we use voice assistants? The driving factors of voice assistant technology use

Emily Buteau et al.

Summary: This study identified three factors predicting the use of voice assistants: technological factors, social influence factors, and risk factors, proposing an extended Technology Acceptance Model. Findings indicate positive relationships between perceived usefulness, personal norms, and perceived security with attitudes toward using voice assistants. Privacy concerns showed a negative relationship with attitudes, impacting behavioral intention.

COMMUNICATION RESEARCH REPORTS (2021)

Article Computer Science, Information Systems

Explainable Artificial Intelligence for Tabular Data: A Survey

Maria Sahakyan et al.

Summary: Machine learning techniques are gaining attention, but many suffer from the black-box problem, making it difficult to explain decisions. Interest in Explainable Artificial Intelligence (XAI) is growing, yet many techniques are not suitable for tabular data. Despite a vast literature on XAI, there are still no survey articles specifically focusing on tabular data.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

Model-based Head Orientation Estimation for Smart Devices

Qiang Yang et al.

Summary: Voice interaction is convenient and user-friendly, especially through smart devices like Amazon Echo. Research has shown the importance of incorporating head orientation information to enhance context for voice commands. A new model-based system called HOE has been proposed, which utilizes only two microphone arrays for head orientation estimation, reducing training overhead and achieving high accuracy in real-world experiments.

PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT (2021)

Proceedings Paper Computer Science, Cybernetics

Soundr: Head Position and Orientation Prediction Using a Microphone Array

Jackie (Junrui) Yang et al.

PROCEEDINGS OF THE 2020 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI'20) (2020)

Review Business

Natural language processing (NLP) in management research: A literature review

Yue Kang et al.

JOURNAL OF MANAGEMENT ANALYTICS (2020)

Review Mathematical & Computational Biology

Speech Technology Progress Based on New Machine Learning Paradigm

Vlado Delic et al.

COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE (2019)

Article Computer Science, Artificial Intelligence

User Preferences in Intelligent Environments

Juan Carlos Augusto et al.

APPLIED ARTIFICIAL INTELLIGENCE (2019)

Article Computer Science, Artificial Intelligence

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

Zixing Zhang et al.

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (2018)

Review Mathematical & Computational Biology

Deep Learning for Computer Vision: A Brief Review

Athanasios Voulodimos et al.

COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE (2018)

Article Engineering, Aerospace

Multipath Effects Characterization on Air-to-Air Analogue Voice AM Communications

Antonio Bazan-Sulzberger et al.

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS (2017)

Article Acoustics

A Robust Method to Extract Talker Azimuth Orientation Using a Large-Aperture Microphone Array

Avram Levi et al.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2010)

Article Computer Science, Artificial Intelligence

Extremely randomized trees

P Geurts et al.

MACHINE LEARNING (2006)

Article Automation & Control Systems

Enhanced sound localization

B Mungamuru et al.

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS (2004)

Article Computer Science, Hardware & Architecture

Comparison of different implementations of MFCC

F Zheng et al.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY (2001)