Table 12 Automatic speech recognition

From: An analytical study of information extraction from unstructured and multidimensional big data

 

| Ref | Purpose | Approach | Technique | Dataset | Results/limitations |
|---|---|---|---|---|---|
| [68] | To improve computational power, enhance the training capability of larger models, and ease the training process | ANN | Mariana: GPU and CPU clusters for parallelism; three frameworks were developed: multi-GPU for DNN, multi-GPU for DCNN, and a CPU cluster for large-scale DNN | — | With 6 GPUs, a 4.6× speedup over one GPU was achieved, and the character error rate decreased by 10% compared with existing techniques. The DNN framework with GPUs performed better for ASR |
| [72] | To investigate noise robustness of DNN-based models | ANN | DNN-HMM: DNN-based noise-aware training | Aurora 4, without explicit noise compensation | 7.5% relative improvement; dropout training in DNN shows overlapping gains with feature-space and model-space noise-adaptive training |
| [73] | Bilingual ASR system for the Frisian and Dutch languages | ANN | DNN with language-dependent and language-independent phones | FAME speech database | A bilingual DNN trained on phones of both languages achieved the best performance, yielding a CS-WER of 59.5% and a WER of 38.8%. Code-switching ASR combining phones of two languages outperformed on WER, though switching latency is also an important factor for these systems |
| [75] | To improve performance for large-vocabulary speech recognition | ANN | LSTM RNN | 2800 utterances, each distorted once with held-out noise samples | On a 25k-word vocabulary, 19.5% WER and 14.5% in-vocabulary WER; word-level acoustic models without a language model can achieve reasonable accuracy |
| [74] | To compare the performance of DNN-HMM with CNN-HMM | ANN | CNN with a limited weight-sharing scheme to model speech features | Small-scale phone recognition on TIMIT and a large-vocabulary voice-search task | CNN reduced the error rate by 6% to 10% compared with DNN; ASR performance is sensitive to pooling size but insensitive to overlap between pooling units. Results were better for the voice-search experiment but not for phone recognition |
| [71] | To develop ASR for the Amazigh language | HMM | GMM and tied states; MFCC for feature extraction; phonetic dictionary; language model built with the CMU-Cambridge Statistical Language Modeling Toolkit; HMM-based large-vocabulary system | New corpus with 187 distinct isolated-word speech recordings by 50 speakers | Reduced WER to 8.20%. A new corpus was collected, but results are not compared with existing state-of-the-art techniques |
| [69] | LMS adaptive filters are introduced to preprocess the speech signals and to identify the speaker | Template-based | Adaptive filtering + feature extraction + dimensionality reduction + ensemble classification model using LSTM, ICNN, and SVM | IITG multi-variability speaker recognition database | Achieved 95.69% accuracy on noisy data. Limitations: follows sequential processing, requires memory-bandwidth-bound computation, and requires a large amount of training data for each new speaker |
| [70] | ASR for the Tunisian dialect | Rule-based | G2P rules were defined to build pronunciation dictionaries | TARIC: 9.5 h of speech for training and 43 min for testing | WER of 22.6%, validated on a manually annotated dataset. Higher-quality pronunciation dictionaries can be built using expert knowledge, but strong linguistic skills are required |
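
Several rows above report results as word error rate (WER). As a reference point, WER is the word-level Levenshtein distance between hypothesis and reference (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch in Python (the function name `wer` is illustrative, not taken from the cited works):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                   # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                   # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why some of the code-switching results above look unusually high.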
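
Row [69] preprocesses speech with LMS adaptive filtering before feature extraction. A minimal single-channel LMS sketch; the tap count, step size, and the toy system-identification setup below are illustrative assumptions, not details from [69]:

```python
import random

def lms_filter(x, d, n_taps=4, mu=0.05):
    """Least-mean-squares (LMS) adaptive FIR filter.

    x  -- reference input samples
    d  -- desired (target) samples
    Returns (filter outputs, error signal, final tap weights).
    """
    w = [0.0] * n_taps                 # adaptive tap weights
    buf = [0.0] * n_taps               # recent inputs, newest first
    outputs, errors = [], []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))        # filter output
        e = dn - y                                        # instantaneous error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # LMS weight update
        outputs.append(y)
        errors.append(e)
    return outputs, errors, w

# Toy system identification: the filter should learn the FIR taps [0.5, -0.3].
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
_, errors, w = lms_filter(x, d, n_taps=2, mu=0.1)
```

In a speech-denoising setup, `d` would be the noisy speech and `x` a correlated noise reference; the error signal then approximates the cleaned speech. The step size `mu` trades convergence speed against steady-state misadjustment.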