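The uploaded file is the complete English edition of the book "解析深度学习:语音识别实践" (Automatic Speech Recognition: A Deep Learning Approach) by Dong Yu and Li Deng (USA), offered for download and study. Note: the Chinese edition of this book is already on sale online.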
More information about this series at http://www.springer.com/series/4748

Dong Yu, Li Deng
Automatic Speech Recognition
A Deep Learning Approach
Springer

Dong Yu, Microsoft Research, Bothell, USA
Li Deng, Microsoft Research, Redmond, WA, USA

ISSN 1860-4862
ISSN 1860-4870 (electronic)
ISBN 978-1-4471-5778-6
ISBN 978-1-4471-5779-3 (eBook)
DOI 10.1007/978-1-4471-5779-3
Library of Congress Control Number: 2014951663
Springer London Heidelberg New York Dordrecht
© Springer-Verlag London 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

To my wife and parents
Dong Yu

To Lih-Yuan, Lloyd, Craig, Lyle, Arie, and Axel
Li Deng

Foreword

This is the first book on automatic speech recognition (ASR) that is focused on the deep learning approach, and in particular, deep neural network (DNN) technology. The landmark book represents a big milestone in the journey of the DNN technology, which has achieved overwhelming successes in ASR over the past few years. Following the authors' recent book on "Deep Learning: Methods and Applications", this new book digs deeply and exclusively into ASR technology and applications, which were only relatively lightly covered in the previous book in parallel with numerous other applications of deep learning. Importantly, the background material of ASR and technical detail of DNNs including rigorous mathematical descriptions and software implementation are provided in this book, invaluable for ASR experts as well as advanced students.
One unique aspect of this book is to broaden the view of deep learning from DNNs, as commonly adopted in ASR by now, to encompass also deep generative models that have advantages of naturally embedding domain knowledge and problem constraints.
The background material did justice to the incredible richness of deep and dynamic generative models of speech developed by ASR researchers since the early 90s, yet without losing sight of the unifying principles with respect to the recent rapid development of deep discriminative models of DNNs. Comprehensive comparisons of the relative strengths of these two very different types of deep models using the example of recurrent neural nets versus hidden dynamic models are particularly insightful, opening an exciting and promising direction for new development of deep learning in ASR as well as in other signal and information processing applications. From the historical perspective, four generations of ASR technology have been recently analyzed. The 4th Generation technology is really embodied in deep learning elaborated in this book, especially when DNNs are seamlessly integrated with deep generative models that would enable extended knowledge processing in a most natural fashion.
All in all, this beautifully produced book is likely to become a definitive reference for ASR practitioners in the deep learning era of 4th generation ASR. The book masterfully covers the basic concepts required to understand the ASR field as a whole, and it also details in depth the powerful deep learning methods that have shattered the field in 2 recent years.
The readers of this book will become articulate in the new state-of-the-art of ASR established by the DNN technology, and be poised to build new ASR systems to match or exceed human performance.
By Sadaoki Furui, President of Toyota Technological Institute at Chicago, and Professor at the Tokyo Institute of Technology

Preface

Automatic Speech Recognition (ASR), which is aimed to enable natural human-machine interaction, has been an intensive research area for decades. Many core technologies, such as Gaussian mixture models (GMMs), hidden Markov models (HMMs), mel-frequency cepstral coefficients (MFCCs) and their derivatives, n-gram language models (LMs), discriminative training, and various adaptation techniques have been developed along the way, mostly prior to the new millennium. These techniques greatly advanced the state of the art in ASR and in its related fields. Compared to these earlier achievements, the advancement in the research and application of ASR in the decade before 2010 was relatively slow and less exciting, although important techniques such as GMM-HMM sequence discriminative training were made to work well in practical systems during this period.
In the past several years, however, we have observed a new surge of interest in ASR. In our opinion, this change was led by the increased demands on ASR in mobile devices and the success of new speech applications in the mobile world such as voice search (VS), short message dictation (SMD), and virtual speech assistants (e.g., Apple's Siri, Google Now, and Microsoft's Cortana). Equally important is the development of the deep learning techniques in large vocabulary continuous speech recognition (LVCSR) powered by big data and significantly increased computing ability. A combination of a set of deep learning techniques has led to more than 1/3 error rate reduction over the conventional state-of-the-art GMM-HMM framework on many real-world LVCSR tasks and helped to pass the adoption threshold for many real-world users.
For example, the word accuracy in English or the character accuracy in Chinese in most SMD systems now exceeds 90 % and even 95 % on some systems.
Given the recent surge of interest in ASR in both industry and academia we, as researchers who have actively participated in and closely witnessed many of the recent exciting deep learning technology development, believe the time is ripe to write a book to summarize the advancements in the ASR field, especially those during the past several years.
Along with the development of the field over the past two decades or so, we have seen a number of useful books on ASR and on machine learning related to ASR, some of which are listed here:
Deep Learning: Methods and Applications, by Li Deng and Dong Yu (June 2014)
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, by Joseph Keshet and Samy Bengio (January 2009)
Speech Recognition Over Digital Channels: Robustness and Standards, by Antonio Peinado and Jose Segura (September 2006)
Pattern Recognition in Speech and Language Processing, by Wu Chou and Biing-Hwang Juang (February 2003)
Speech Processing - A Dynamic and Optimization-Oriented Approach, by Li Deng and Doug O'Shaughnessy (June 2003)
Spoken Language Processing: A Guide to Theory, Algorithm and System Development, by Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon (April 2001)
Digital Speech Processing: Synthesis, and Recognition, Second Edition, by Sadaoki Furui (June 2001)
Speech Communications: Human and Machine, Second Edition, by Douglas O'Shaughnessy (June 2000)
Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James Martin (April 2000)
Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan (April 2000)
Statistical Methods for Speech Recognition, by Fred Jelinek (June 1997)
Fundamentals of Speech Recognition, by Lawrence Rabiner and Biing-Hwang Juang (April 1993)
Acoustical and Environmental Robustness in Automatic Speech Recognition, by Alex Acero (November 1992)
All these books, however, were either published before the rise of deep learning for ASR in 2009 or, as our 2014 overview book, were focused on less technical aspects of deep learning for ASR than is desired. These earlier books did not include the new deep learning techniques developed after 2010 with sufficient technical and mathematical detail as would be demanded by ASR or deep learning specialists. Different from the above books and in addition to some necessary background material, our current book is mainly a collation of research in most recent advances in deep learning or discriminative and hierarchical models, as applied specific to the field of ASR. Our new book presents insights and theoretical foundation of a series of deep learning models such as deep neural network (DNN), restricted Boltzmann machine (RBM), denoising autoencoder, deep belief network, recurrent neural network (RNN) and long short-term memory (LSTM) RNN, and their application in ASR through a variety of techniques including the DNN-HMM
2021-07-07 22:00:19 7.53MB deep learning, automatic speech recognition
1
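speech: text-to-speech (TTS) and automatic speech recognition (ASR). Link to the Doxygen-generated documentation: : Installation: installation instructions for building from source can be found at. Usage: for information on how to launch or configure it: If you want to build a new language model, read More examples: to see how other programs call speechRecognition and Espeak and configure them through yarp, you can look at this part of the code. Contributing: Posting issues: read Fork and pull request: following, create a feature branch off the master branch ( git checkout -b my-new-feature ), commit your changes, push to the branch ( git push origin my-new-feature ), and create a new pull request. Status. Similar and related projects.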
2021-04-03 22:05:21 73.11MB text-to-speech automatic-speech-recognition C++
1
DaCiDian is an open-source Mandarin Chinese lexicon for automatic speech recognition (ASR).
2021-03-24 09:53:35 4.53MB Python development - natural language processing
1
An end-to-end automatic speech recognition system implemented in TensorFlow.
2021-02-27 15:52:02 189KB Python development - machine learning
1
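DSpeech is a TTS (text-to-speech) program with integrated ASR (automatic speech recognition) functionality. It can read written text aloud and choose the sentences to pronounce based on the user's spoken answers. It is designed to help you quickly and efficiently while being minimally invasive and consuming few resources. (DSpeech does not install itself, is very light, starts within a second, and writes nothing to the registry.) Some notable features of DSpeech are: 1. It lets you save the output as a .WAV, .MP3, AAC, WMA, or OGG file. 2. It lets you quickly select different voices, and even combine them or juxtapose them to create dialogues between different voices. 3. DSpeech integrates a voice recognition system that lets you create interactive dialogues with the user. 4. It lets you configure each voice independently. 5. Thanks to its use of standard TAGs, it lets you dynamically change the characteristics of a voice during playback (speed, volume, and frequency), insert pauses, emphasize specific words, or even spell them out. 6. It lets you capture and reproduce the contents of the ClipBoard. 7. DSpeech is compatible with all voice engines (SAPI 4-5 compliant). 8. An AI dialogue system. Not very useful, but amusing. It does not work in every language. 9. It can dub movies; this feature synchronizes the reading of subtitles (standard SRT format) with movie playback. The supported players are Media Player Classic and later versions, and VideoLAN VLC Player.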
2020-01-18 03:32:54 3.14MB text-to-speech
1
Automatic Speech Recognition | ASR Lecture. About 18 lectures, plus a couple of extra lectures giving a basic introduction to neural networks. Lecturers: Steve Renals and Hiroshi Shimodaira.
2019-12-21 20:12:35 14.26MB asr speech-recognition ai kaldi
1