Deep-Learning-Based Chinese Speech Recognition System

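Uploader: 32595487 | Uploaded: 2024-05-07 18:47:06 | File size: 34.52MB | File type: ZIP

The system consists of two parts, an acoustic model and a language model, both built on neural networks.

Acoustic model (acoustic_model folder):
 - gru_ctc_am.py: a GRU-CTC acoustic model for Chinese speech recognition. All of the code for this model lives in this one file.
 - cnn_ctc_am.py: a CNN-CTC Chinese speech recognition model based on iFlytek's DFCNN. Compared with the GRU model, the network structure is slightly modified.
 - cnn_with_fbank.py: an acoustic model built entirely on the DFCNN framework, with minor changes: some convolutional layers are replaced with inception modules, and the input is a time-frequency spectrogram.
 - cnn_with_full.py (listed as cnn_with_full_data.py in the archive): a newly added model trained on the plus version of the dataset. Training this model directly is recommended.

Language model (language_model folder):
 - language_model\CBHG_lm.py: a newly added language model based on the CBHG structure. This structure was previously used in Google's speech synthesis work and is ported to this project as a neural-network language model.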
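The acoustic models above are all trained with CTC, which emits one label distribution per audio frame. A minimal sketch of greedy (best-path) CTC decoding follows. It is not taken from the project's code: the function name ctc_greedy_decode, the blank index 0, and the three-symbol pinyin vocabulary are assumptions made for illustration only.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (not confirmed by the project)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Step 1: pick the most probable label index for every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    # Steps 2 and 3: skip a label equal to the previous frame's label,
    # and skip the blank symbol.
    decoded: List[int] = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical vocabulary of pinyin syllables; index 0 is the blank.
vocab = ["<blank>", "ni3", "hao3"]
frames = [
    [0.10, 0.80, 0.10],  # ni3
    [0.20, 0.70, 0.10],  # ni3 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # hao3
    [0.10, 0.10, 0.80],  # hao3 again (merged)
]
syllables = [vocab[i] for i in ctc_greedy_decode(frames)]
print(syllables)  # ['ni3', 'hao3']
```

In a two-stage system like the one described here, a syllable sequence of this kind would then be passed to the language model to be converted into Chinese characters. A blank between two identical labels keeps them separate, which is how CTC represents a genuinely repeated syllable.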

Resource details

Deep-Learning-Based Chinese Speech Recognition System (90 files, 34.52MB)

my_ch_speech_recognition-master/
    acoustic_model/
        gru_ctc_am.py (11.11KB)
        cnn_with_full_data.py (8.00KB)
        data/
            primewords/
                dev.wav.lst (436.21KB)
                test.wav.lst (443.48KB)
                train.wav.lst (3.44MB)
                test.syllabel.txt (552.35KB)
                dev.syllabel.txt (546.90KB)
                train.syllabel.txt (4.29MB)
            st-cmds/
                dev.wav.lst (38.67KB)
                test.wav.lst (128.91KB)
                train.wav.lst (6.29MB)
                test.syllabel.txt (145.12KB)
                dev.syllabel.txt (43.71KB)
                train.syllabel.txt (7.06MB)
            thchs30/
                dev.wav.lst (31.47KB)
                test.wav.lst (90.64KB)
                train.wav.lst (371.09KB)
                test.syllabel.txt (420.32KB)
                dev.syllabel.txt (150.51KB)
                train.syllabel.txt (1.64MB)
            aishell/
                dev.wav.lst (909.37KB)
                test.wav.lst (462.52KB)
                train.wav.lst (7.67MB)
                test.syllabel.txt (637.71KB)
                dev.syllabel.txt (1.22MB)
                train.syllabel.txt (10.30MB)
        cnn_ctc_am.py (11.52KB)
        cnn_with_fbank.py (13.52KB)
        extra_utils/
            __init__.py (0B)
            feature_extract.py (1.79KB)
            FSMNCell.py (3.31KB)
            GetData.py (18.13KB)
    .gitattributes (66B)
    some_expriment/
        lm_develop/
            eval.py (1.56KB)
            data_load.py (4.13KB)
            hyperparams.py (600B)
            build_corpus.py (2.69KB)
            modules.py (13.25KB)
            prepro.py (2.62KB)
            train.py (4.37KB)
            README.md (3.14KB)
        gen_data/
            gen_aishell_lable.py (2.04KB)
            gen_thchs_lable.py (2.79KB)
        linshi.py (13.19KB)
        keras_test.py (2.09KB)
        train.wav.lst (3.45MB)
        my_develop.py (2.69KB)
        data_process/
            read_data_prime.py (23.47KB)
            gen_dict.py (13.04KB)
            aishell_pre.py (4.80KB)
            datalist/
                primewords/
                    dev.wav.lst (436.21KB)
                    test.wav.lst (443.48KB)
                    train.wav.lst (3.44MB)
                    test.syllabel.txt (552.35KB)
                    dev.syllabel.txt (546.90KB)
                    train.syllabel.txt (4.29MB)
                    read_prim_data.py (2.13KB)
                st-cmds/
                    test.wav.txt (128.91KB)
                    train.wav.txt (6.29MB)
                    test.syllabel.txt (145.12KB)
                    dev.syllabel.txt (43.71KB)
                    dev.wav.txt (38.67KB)
                    train.syllabel.txt (7.06MB)
                thchs30/
                    dev.wav.lst (31.47KB)
                    test.wav.lst (90.64KB)
                    train.wav.lst (371.09KB)
                    test.syllabel.txt (422.74KB)
                    dev.syllabel.txt (151.37KB)
                    train.syllabel.txt (1.65MB)
                .st-cmds.swp (12.00KB)
                aishell/
                    dev.wav.lst (909.37KB)
                    test.wav.lst (462.52KB)
                    train.wav.lst (7.67MB)
                    test.syllabel.txt (637.71KB)
                    dev.syllabel.txt (1.22MB)
                    train.syllabel.txt (10.30MB)
            read_data_aishell.py (21.82KB)
            dict.txt (32.09KB)
            read_prim_data.py (2.13KB)
    .gitignore (433B)
    __pycache__/
        acoustic_model.cpython-36.pyc (4.78KB)
        text.cpython-36.pyc (3.06KB)
        audio.cpython-36.pyc (2.40KB)
    README.md (3.33KB)
    language_model/
        CBHG_lm.py (15.67KB)
        model_layers.py (13.25KB)
        hyperparams.py (600B)
        data/
            vocab.pkl (158.07KB)
            lable.txt (11.84MB)
            zh.tsv (23.69MB)
