Speech-driven human-machine interaction using Mel-frequency Cepstral coefficients with machine learning and Cymatics Display
Abstract
When people engage with machines, gadgets, or programs through spoken language, this is known as speech-driven human-machine interaction (HMI). Speech recognition technology is used in this interaction to interpret spoken words and convert them into commands that the computer can understand and process. An innovative method for accomplishing automatic speech command recognition is presented in this chapter. The idea is to combine efficient signal analysis and speech processing techniques. In this context, an effective technique for autonomous isolated speech-based message recognition is proposed. The input voice segments are enhanced through appropriate preemphasis filtering, noise thresholding, and zero-alignment procedures before further processing. The Mel-frequency cepstral coefficients (MFCCs), Delta, and Delta-Delta coefficients are then extracted from the enhanced speech segment. Machine learning algorithms process these extracted features to classify the intended isolated speech commands automatically. As a case study, the science of Cymatics is applied to convert the classification decisions into systematic signs. The system's functionality is examined in an experimental setting, and the findings are reported. An average isolated speech recognition accuracy of 98.9% was attained on the target dataset. The suggested methodology has potential uses in the visual arts, in noisy and industrial settings, in the inclusion of individuals with hearing impairments, and in education.
Department: Electrical and Computer Engineering
Publisher: Elsevier
Book title: Artificial Intelligence and Multimodal Signal Processing in Human-Machine Interaction
https://doi.org/10.1016/B978-0-443-29150-0.00019-6
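To make the pipeline summarized in the abstract concrete, the following is a minimal sketch of MFCC, Delta, and Delta-Delta feature extraction followed by a machine learning classifier. It assumes librosa and scikit-learn are available; the file paths, command labels, time-averaged pooling, and the SVM classifier are illustrative assumptions, not the exact configuration reported in the chapter.

```python
# Sketch: isolated speech command recognition with MFCC + Delta + Delta-Delta
# features and an off-the-shelf classifier (assumed libraries: librosa, sklearn).
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(path, sr=16000, n_mfcc=13):
    """Load one isolated command and return a fixed-length feature vector."""
    y, _ = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y)              # simple preemphasis filtering
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)             # first-order (Delta) coefficients
    delta2 = librosa.feature.delta(mfcc, order=2)   # second-order (Delta-Delta)
    feats = np.vstack([mfcc, delta, delta2])        # shape: (3 * n_mfcc, frames)
    # Average over time frames to obtain one vector per command; this pooling
    # is a common simplification and may differ from the chapter's approach.
    return feats.mean(axis=1)

# Hypothetical dataset: (wav path, command label) pairs; paths are placeholders.
dataset = [
    ("cmds/up_01.wav", "up"),
    ("cmds/down_01.wav", "down"),
    # ... more labeled recordings
]

X = np.array([extract_features(p) for p, _ in dataset])
y = np.array([label for _, label in dataset])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Any classifier with a fit/predict interface could replace the SVM here; the essential point is that each command is reduced to a compact MFCC-plus-derivative feature vector before classification.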