From: An analytical study of information extraction from unstructured and multidimensional big data
| Ref. | Purpose | Approach | Technique | Dataset | Results/limitations |
|---|---|---|---|---|---|
[68] | To improve computational power, enhance the training capability of larger models, and ease the training process | ANN | Mariana: GPU and CPU clusters for parallelism. Three frameworks were developed: multi-GPU for DNN, multi-GPU for DCNN, and a CPU cluster for large-scale DNN | — | With 6 GPUs, a 4.6× speedup over one GPU was achieved, and the character error rate decreased by 10% compared with existing techniques. The DNN framework with GPUs performed better for ASR |
[72] | To investigate noise robustness of DNN-based models | ANN | DNN-HMM: DNN-based noise-aware training | Aurora 4, without explicit noise compensation | 7.5% relative improvement, with dropout training in the DNN, compared with feature-space and model-space noise-adaptive training |
[73] | Bilingual ASR system for the Frisian and Dutch languages | ANN | DNN with language-dependent and language-independent phones | FAME speech database | The bilingual DNN trained on phones of both languages achieved the best performance, yielding a CS-WER of 59.5% and a WER of 38.8%. Code-switching ASR combining the phones of two languages outperformed on WER, although switching latency is also an important factor for these systems |
[75] | To improve performance for large-vocabulary speech recognition | ANN | LSTM RNN | 2800 utterances, each distorted once with held-out noise samples | On a 25k-word vocabulary, 19.5% WER and 14.5% in-vocabulary WER. Word-level acoustic models without a language model can achieve reasonable accuracy |
[74] | To compare the performance of DNN-HMM with CNN-HMM | ANN | CNN with a limited weight-sharing scheme to model speech features | Small-scale phone recognition on TIMIT; large-vocabulary voice search task | CNN reduced the error rate by 6% to 10% compared with DNN. ASR performance is sensitive to pooling size but insensitive to overlap between pooling units. Results were better for the voice search experiment than for phone recognition |
[71] | To develop ASR for the Amazigh language | HMM | GMM with tied states; MFCC for feature extraction; phonetic dictionary; language model built with the CMU-Cambridge Statistical Language Modeling Toolkit; HMM-based large-vocabulary system | New corpus with 187 distinct isolated-word speech recordings by 50 speakers | Achieved a reduced WER of 8.20%. A new corpus was collected, but results are not compared with existing state-of-the-art techniques |
[69] | LMS adaptive filters are introduced to preprocess speech signals and to identify the speaker | Template based | Adaptive filtering + feature extraction + dimensionality reduction + ensemble classification model using LSTM, ICNN, and SVM | IITG multi-variability speaker recognition database | Achieved 95.69% accuracy on noisy data. Follows sequential processing, requires memory-bandwidth-bound computation, and requires a large amount of training data for each new speaker |
[70] | ASR for the Tunisian dialect | Rule based | G2P rules were defined to build pronunciation dictionaries | TARIC: 9.5 h of speech for training and 43 min for testing | WER of 22.6%, validated on a manually annotated dataset. Improved-quality pronunciation dictionaries can be built using expert knowledge, but strong linguistic skills are required |
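Nearly every row above reports word error rate (WER), the standard ASR metric: the word-level edit distance (substitutions + deletions + insertions) between the recognizer's hypothesis and the reference transcript, divided by the reference length. A minimal sketch of how it is computed (this helper is illustrative, not code from any of the surveyed papers):

```python
# Word error rate: Levenshtein distance over words, normalized by
# the reference length. WER = (S + D + I) / N.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One deletion against a 6-word reference gives WER = 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why low absolute figures such as the 8.20% in [71] and the 22.6% in [70] are meaningful only relative to task difficulty and vocabulary size.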
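The LMS preprocessing in [69] follows the classic adaptive noise cancellation setup: an FIR filter is adapted so that a filtered noise reference tracks the noise component of the primary (noisy speech) input, and the residual error is the cleaned speech estimate. A minimal sketch, assuming a zero-delay noise reference and hand-picked tap count and step size (none of these settings come from [69]):

```python
import numpy as np

def lms_noise_cancel(noisy, noise_ref, n_taps=8, mu=0.01):
    """LMS adaptive noise canceller.

    Adapts FIR weights w so that w @ x approximates the noise in
    `noisy`; the error e = noisy - (w @ x) is the speech estimate.
    """
    w = np.zeros(n_taps)
    out = np.zeros(len(noisy))
    for n in range(n_taps - 1, len(noisy)):
        x = noise_ref[n - n_taps + 1:n + 1][::-1]  # x[0] = current sample
        y = w @ x                 # filter output: noise estimate
        e = noisy[n] - y          # error: cleaned-speech estimate
        w += 2 * mu * e * x       # stochastic-gradient weight update
        out[n] = e
    return out

# Toy check: a sine "speech" signal buried in noise that is a scaled
# copy of the reference; the canceller should recover the sine.
rng = np.random.default_rng(0)
noise_ref = rng.standard_normal(5000)
clean = np.sin(0.05 * np.arange(5000))
noisy = clean + 0.5 * noise_ref
filtered = lms_noise_cancel(noisy, noise_ref)
```

The step size `mu` trades convergence speed against steady-state misadjustment; too large a value makes the weight update diverge, which is one reason [69] pairs the filter with downstream ensemble classifiers rather than relying on it alone.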
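The rule-based approach in [70] builds pronunciation dictionaries by applying grapheme-to-phoneme (G2P) rewrite rules. The core mechanism can be sketched as longest-match rule application over a word's letters; the rules below are illustrative placeholders, not the actual Tunisian-dialect rules defined in [70]:

```python
def g2p(word, rules):
    """Apply G2P rules longest-match-first; unknown letters pass through."""
    max_len = max(map(len, rules))
    phones, i = [], 0
    while i < len(word):
        for length in range(max_len, 0, -1):   # try longest grapheme first
            chunk = word[i:i + length]
            if chunk in rules:
                phones.append(rules[chunk])
                i += length
                break
        else:
            phones.append(word[i])             # no rule: keep the letter
            i += 1
    return phones

# Hypothetical rules: digraphs map to single phones.
rules = {"ch": "S", "ou": "u", "t": "t"}
print(g2p("chout", rules))  # ch -> S, ou -> u, t -> t
```

Longest-match ordering is what lets a digraph rule such as `ch` win over the single-letter rules for `c` and `h`; this is also where the "high linguistic skills" noted in the table come in, since rule ordering and coverage must be curated by experts.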