Speech-driven human-machine interaction using Mel-frequency Cepstral coefficients with machine learning and Cymatics Display
Mian Qaisar, Saeed
Date
2024-10
Abstract
When people engage with machines, devices, or programs through spoken language, this is known as speech-driven human-machine interaction (HMI). Speech recognition technology is used in this interaction to interpret spoken commands into a form that the computer can understand and process. This chapter presents an innovative method for automatic speech-command recognition. The concept is to blend efficient analysis and speech-processing techniques. In this context, an effective technique for automatic isolated-word speech recognition is proposed. The input voice segments are enhanced through appropriate preemphasis filtering, noise thresholding, and zero-alignment procedures before further processing. The Mel-frequency cepstral coefficients (MFCCs), Delta, and Delta-Delta coefficients are then extracted from the enhanced speech segments. Machine learning algorithms process these extracted features to classify the intended isolated speech commands automatically. As a case study, the science of Cymatics is applied to convert classification decisions into systematic visual signs. The system's functionality is examined in an experimental setting, and the findings are reported. An average isolated speech recognition accuracy of 98.9% was attained on the intended dataset. The suggested methodology has potential uses in the visual arts, in noisy and industrial settings, in the inclusion of individuals with hearing impairments, and in education.
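The preprocessing and feature steps named in the abstract (preemphasis filtering, noise thresholding, and Delta coefficients over a base feature matrix such as MFCCs) can be sketched in plain NumPy. This is a minimal illustration, not the chapter's implementation: the filter coefficient, threshold ratio, and delta window size below are assumed typical values, not parameters taken from the source.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    # High-pass preemphasis: y[n] = x[n] - alpha * x[n-1]
    # (alpha = 0.97 is a common choice, assumed here).
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def noise_threshold(signal, ratio=0.05):
    # Zero out samples below a fraction of the peak amplitude
    # (a simple amplitude-gating form of noise thresholding).
    thr = ratio * np.max(np.abs(signal))
    out = signal.copy()
    out[np.abs(out) < thr] = 0.0
    return out

def delta(features, N=2):
    # Regression-based Delta coefficients over a 2N+1 frame window,
    # applied to a (frames x coeffs) feature matrix, e.g. MFCCs.
    # Applying delta() twice yields the Delta-Delta coefficients.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.array([
        sum(n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)) / denom
        for t in range(features.shape[0])
    ])
```

On a feature matrix whose value grows by one per frame, the interior Delta coefficients come out as that constant slope, which is a quick sanity check of the regression formula.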
Book title
Artificial Intelligence and Multimodal Signal Processing in Human-Machine Interaction