GigaSpeech: a large, modern speech recognition dataset.
Download steps:
1. Put aliyun_ossutil.cfg in the SAFEBOX directory.
2. Edit env_vars.sh: export GIGA_SPEECH_LOCAL_ROOT=/Users/jerry/work/git/GigaSpeech # this path needs to have at least XXX G free space
3. Run download.sh.
2021-07-08 15:40:20 8KB Shell
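The download steps above can be checked before running the download with a small pre-flight function. This is only a sketch: the function name check_gigaspeech_setup is hypothetical, and it assumes that SAFEBOX, env_vars.sh and download.sh all sit directly under the repository root, which the listing does not confirm.

```shell
#!/bin/sh
# Pre-flight check sketch for the GigaSpeech download steps.
# ASSUMED layout (not confirmed by the listing):
#   <repo>/SAFEBOX/aliyun_ossutil.cfg
#   <repo>/env_vars.sh
#   <repo>/download.sh
# check_gigaspeech_setup is a hypothetical helper, not part of GigaSpeech.
check_gigaspeech_setup() {
  repo=$1
  status=0

  # Steps 1-3 each rely on one of these files being present.
  for f in SAFEBOX/aliyun_ossutil.cfg env_vars.sh download.sh; do
    if [ ! -f "$repo/$f" ]; then
      echo "missing: $repo/$f" >&2
      status=1
    fi
  done

  # Step 2: env_vars.sh is expected to export GIGA_SPEECH_LOCAL_ROOT.
  if [ -z "${GIGA_SPEECH_LOCAL_ROOT:-}" ]; then
    echo "GIGA_SPEECH_LOCAL_ROOT is not set (edit and source env_vars.sh)" >&2
    status=1
  elif [ ! -d "$GIGA_SPEECH_LOCAL_ROOT" ]; then
    echo "not a directory: $GIGA_SPEECH_LOCAL_ROOT" >&2
    status=1
  fi

  return "$status"
}
```

Under the same assumptions, a typical call from the repository root would be: . ./env_vars.sh && check_gigaspeech_setup . && ./download.sh. The function only reports what is missing; it does not check free disk space, because the listing leaves the required amount unspecified.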
Automatic Speech Recognition: A Deep Learning Approach, by Dong Yu and Li Deng
2021-07-07 22:12:57 5.82MB speech recognition, deep learning
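The uploaded file is the complete English edition of "解析深度学习:语音识别实践" (Automatic Speech Recognition: A Deep Learning Approach), by Dong Yu and Li Deng (USA), provided for everyone to download and study. Friendly reminder: the Chinese edition of this book is already on sale online.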
More information about this series at http://www.springer.com/series/4748

Dong Yu, Li Deng
Automatic Speech Recognition
A Deep Learning Approach
Springer

Dong Yu, Microsoft Research, Bothell, USA
Li Deng, Microsoft Research, Redmond, WA, USA

ISSN 1860-4862
ISSN 1860-4870 (electronic)
ISBN 978-1-4471-5778-6
ISBN 978-1-4471-5779-3 (eBook)
DOI 10.1007/978-1-4471-5779-3
Library of Congress Control Number: 2014951663
Springer London Heidelberg New York Dordrecht
© Springer-Verlag London 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

To my wife and parents
Dong Yu

To Lih-Yuan, Lloyd, Craig, Lyle, Arie, and Axel
Li Deng

Foreword

This is the first book on automatic speech recognition (ASR) that is focused on the deep learning approach, and in particular, deep neural network (DNN) technology. The landmark book represents a big milestone in the journey of the DNN technology, which has achieved overwhelming successes in ASR over the past few years. Following the authors' recent book on "Deep Learning: Methods and Applications", this new book digs deeply and exclusively into ASR technology and applications, which were only relatively lightly covered in the previous book in parallel with numerous other applications of deep learning. Importantly, the background material of ASR and technical detail of DNNs including rigorous mathematical descriptions and software implementation are provided in this book, invaluable for ASR experts as well as advanced students.

One unique aspect of this book is to broaden the view of deep learning from DNNs, as commonly adopted in ASR by now, to encompass also deep generative models that have advantages of naturally embedding domain knowledge and problem constraints. The background material did justice to the incredible richness of deep and dynamic generative models of speech developed by ASR researchers since early 90s, yet without losing sight of the unifying principles with respect to the recent rapid development of deep discriminative models of DNNs. Comprehensive comparisons of the relative strengths of these two very different types of deep models using the example of recurrent neural nets versus hidden dynamic models are particularly insightful, opening an exciting and promising direction for new development of deep learning in ASR as well as in other signal and information processing applications. From the historical perspective, four generations of ASR technology have been recently analyzed. The 4th Generation technology is really embodied in deep learning elaborated in this book, especially when DNNs are seamlessly integrated with deep generative models that would enable extended knowledge processing in a most natural fashion.

All in all, this beautifully produced book is likely to become a definitive reference for ASR practitioners in the deep learning era of 4th generation ASR. The book masterfully covers the basic concepts required to understand the ASR field as a whole, and it also details in depth the powerful deep learning methods that have shattered the field in 2 recent years. The readers of this book will become articulate in the new state-of-the-art of ASR established by the DNN technology, and be poised to build new ASR systems to match or exceed human performance.

By Sadaoki Furui, President of Toyota Technological Institute at Chicago, and Professor at the Tokyo Institute of Technology

Preface

Automatic Speech Recognition (ASR), which is aimed to enable natural human-machine interaction, has been an intensive research area for decades. Many core technologies, such as Gaussian mixture models (GMMs), hidden Markov models (HMMs), mel-frequency cepstral coefficients (MFCCs) and their derivatives, n-gram language models (LMs), discriminative training, and various adaptation techniques have been developed along the way, mostly prior to the new millennium. These techniques greatly advanced the state of the art in ASR and in its related fields. Compared to these earlier achievements, the advancement in the research and application of ASR in the decade before 2010 was relatively slow and less exciting, although important techniques such as GMM-HMM sequence discriminative training were made to work well in practical systems during this period.

In the past several years, however, we have observed a new surge of interest in ASR. In our opinion, this change was led by the increased demands on ASR in mobile devices and the success of new speech applications in the mobile world such as voice search (VS), short message dictation (SMD), and virtual speech assistants (e.g., Apple's Siri, Google Now, and Microsoft's Cortana). Equally important is the development of the deep learning techniques in large vocabulary continuous speech recognition (LVCSR) powered by big data and significantly increased computing ability. A combination of a set of deep learning techniques has led to more than 1/3 error rate reduction over the conventional state-of-the-art GMM-HMM framework on many real-world LVCSR tasks and helped to pass the adoption threshold for many real-world users. For example, the word accuracy in English or the character accuracy in Chinese in most SMD systems now exceeds 90 % and even 95 % on some systems.

Given the recent surge of interest in ASR in both industry and academia, we, as researchers who have actively participated in and closely witnessed many of the recent exciting deep learning technology development, believe the time is ripe to write a book to summarize the advancements in the ASR field, especially those during the past several years.

Along with the development of the field over the past two decades or so, we have seen a number of useful books on ASR and on machine learning related to ASR, some of which are listed here:

Deep Learning: Methods and Applications, by Li Deng and Dong Yu (June 2014)
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, by Joseph Keshet, Samy Bengio (January 2009)
Speech Recognition Over Digital Channels: Robustness and Standards, by Antonio Peinado and Jose Segura (September 2006)
Pattern Recognition in Speech and Language Processing, by Wu Chou and Biing-Hwang Juang (February 2003)
Speech Processing - A Dynamic and Optimization-Oriented Approach, by Li Deng and Doug O'Shaughnessy (June 2003)
Spoken Language Processing: A Guide to Theory, Algorithm and System Development, by Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon (April 2001)
Digital Speech Processing: Synthesis, and Recognition, Second Edition, by Sadaoki Furui (June 2001)
Speech Communications: Human and Machine, Second Edition, by Douglas O'Shaughnessy (June 2000)
Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James Martin (April 2000)
Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan (April 2000)
Statistical Methods for Speech Recognition, by Fred Jelinek (June 1997)
Fundamentals of Speech Recognition, by Lawrence Rabiner and Biing-Hwang Juang (April 1993)
Acoustical and Environmental Robustness in Automatic Speech Recognition, by Alex Acero (November 1992)

All these books, however, were either published before the rise of deep learning for ASR in 2009 or, as our 2014 overview book, were focused on less technical aspects of deep learning for ASR than is desired. These earlier books did not include the new deep learning techniques developed after 2010 with sufficient technical and mathematical detail as would be demanded by ASR or deep learning specialists. Different from the above books and in addition to some necessary background material, our current book is mainly a collation of research in most recent advances in deep learning or discriminative and hierarchical models, as applied specific to the field of ASR. Our new book presents insights and theoretical foundation of a series of deep learning models such as deep neural network (DNN), restricted Boltzmann machine (RBM), denoising autoencoder, deep belief network, recurrent neural network (RNN) and long short-term memory (LSTM) RNN, and their application in ASR through a variety of techniques including the DNN-HMM
2021-07-07 22:00:19 7.53MB deep learning, automatic speech recognition
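Many people have messaged me privately asking for a tutorial on implementing speech recognition in Qt. I do not have time to write one, so I am uploading the speech recognition source code directly instead.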
2021-07-07 17:12:30 8KB Qt, voice chat, speech recognition, Baidu speech
Android speech recognition (source code)
2021-07-07 14:18:01 216.47MB Android, Kotlin, speech recognition, AI
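A console test program for Microsoft's .NET offline speech recognition engine. It implements the simplest kind of vocabulary grammar; add the words you want recognized to the choice in the code. See the .NET documentation to implement more complex grammars.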
2021-07-07 11:26:33 158KB speech recognition
LD3320语音识别模块客户资料.zip (customer documentation for the LD3320 speech recognition module)
2021-07-06 18:01:36 129.71MB microcontroller, embedded, speech recognition module
CMU Sphinx speech recognition: the Chinese language pack, downloaded from the official site and repackaged as a zip file.
2021-07-06 17:05:40 51.69MB speech recognition, cmusphinx
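A relatively rare Chinese-language book on speech recognition, written by Dong Yu and Li Deng, authorities in the field. Douban rating: 8.0.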
2021-07-06 16:37:42 33.48MB deep learning, speech recognition
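A speech recognition component that wraps the WeChat plugin "同声传译" (simultaneous interpretation). Copy the component into your project, then call it directly by following the steps in the bundled 使用方式.txt (usage instructions).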
2021-07-05 20:07:25 83KB WeChat mini program, component