Automatic Speech Recognition: A Deep Learning Approach (解析深度学习:语音识别实践) - Lecture Notes and Document Resources

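Uploader: wwyy2010 | Upload time: 2021-07-07 22:00:19 | File size: 7.53MB | File type: -
The uploaded file is the complete English edition of "解析深度学习:语音识别实践" (Automatic Speech Recognition: A Deep Learning Approach) by Dong Yu and Li Deng (USA), shared for download and study. Friendly reminder: the Chinese edition of this book is already on sale online.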
More information about this series at http://www.springer.com/series/4748

Dong Yu · Li Deng
Automatic Speech Recognition
A Deep Learning Approach
Springer

Dong Yu, Microsoft Research, Bothell, USA
Li Deng, Microsoft Research, Redmond, WA, USA

ISSN 1860-4862
ISSN 1860-4870 (electronic)
ISBN 978-1-4471-5778-6
ISBN 978-1-4471-5779-3 (eBook)
DOI 10.1007/978-1-4471-5779-3
Library of Congress Control Number: 2014951663
Springer London Heidelberg New York Dordrecht
© Springer-Verlag London 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my wife and parents
Dong Yu

To Lih-Yuan, Lloyd, Craig, Lyle, Arie, and Axel
Li Deng

Foreword

This is the first book on automatic speech recognition (ASR) that is focused on the deep learning approach, and in particular, deep neural network (DNN) technology. The landmark book represents a big milestone in the journey of the DNN technology, which has achieved overwhelming successes in ASR over the past few years. Following the authors' recent book on "Deep Learning: Methods and Applications", this new book digs deeply and exclusively into ASR technology and applications, which were only relatively lightly covered in the previous book in parallel with numerous other applications of deep learning. Importantly, the background material of ASR and technical detail of DNNs including rigorous mathematical descriptions and software implementation are provided in this book, invaluable for ASR experts as well as advanced students.

One unique aspect of this book is to broaden the view of deep learning from DNNs, as commonly adopted in ASR by now, to encompass also deep generative models that have advantages of naturally embedding domain knowledge and problem constraints. The background material did justice to the incredible richness of deep and dynamic generative models of speech developed by ASR researchers since the early 90s, yet without losing sight of the unifying principles with respect to the recent rapid development of deep discriminative models of DNNs. Comprehensive comparisons of the relative strengths of these two very different types of deep models using the example of recurrent neural nets versus hidden dynamic models are particularly insightful, opening an exciting and promising direction for new development of deep learning in ASR as well as in other signal and information processing applications. From the historical perspective, four generations of ASR technology have been recently analyzed. The 4th Generation technology is really embodied in deep learning elaborated in this book, especially when DNNs are seamlessly integrated with deep generative models that would enable extended knowledge processing in a most natural fashion.

All in all, this beautifully produced book is likely to become a definitive reference for ASR practitioners in the deep learning era of 4th generation ASR. The book masterfully covers the basic concepts required to understand the ASR field as a whole, and it also details in depth the powerful deep learning methods that have shattered the field in 2 recent years. The readers of this book will become articulate in the new state-of-the-art of ASR established by the DNN technology, and be poised to build new ASR systems to match or exceed human performance.

By Sadaoki Furui, President of Toyota Technological Institute at Chicago, and Professor at the Tokyo Institute of Technology

Preface

Automatic Speech Recognition (ASR), which is aimed to enable natural human-machine interaction, has been an intensive research area for decades. Many core technologies, such as Gaussian mixture models (GMMs), hidden Markov models (HMMs), mel-frequency cepstral coefficients (MFCCs) and their derivatives, n-gram language models (LMs), discriminative training, and various adaptation techniques have been developed along the way, mostly prior to the new millennium. These techniques greatly advanced the state of the art in ASR and in its related fields. Compared to these earlier achievements, the advancement in the research and application of ASR in the decade before 2010 was relatively slow and less exciting, although important techniques such as GMM-HMM sequence discriminative training were made to work well in practical systems during this period.

In the past several years, however, we have observed a new surge of interest in ASR. In our opinion, this change was led by the increased demands on ASR in mobile devices and the success of new speech applications in the mobile world such as voice search (VS), short message dictation (SMD), and virtual speech assistants (e.g., Apple's Siri, Google Now, and Microsoft's Cortana). Equally important is the development of the deep learning techniques in large vocabulary continuous speech recognition (LVCSR) powered by big data and significantly increased computing ability. A combination of a set of deep learning techniques has led to more than 1/3 error rate reduction over the conventional state-of-the-art GMM-HMM framework on many real-world LVCSR tasks and helped to pass the adoption threshold for many real-world users. For example, the word accuracy in English or the character accuracy in Chinese in most SMD systems now exceeds 90% and even 95% on some systems.

Given the recent surge of interest in ASR in both industry and academia, we, as researchers who have actively participated in and closely witnessed many of the recent exciting deep learning technology developments, believe the time is ripe to write a book to summarize the advancements in the ASR field, especially those during the past several years.

Along with the development of the field over the past two decades or so, we have seen a number of useful books on ASR and on machine learning related to ASR, some of which are listed here:

  • Deep Learning: Methods and Applications, by Li Deng and Dong Yu (June 2014)
  • Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, by Joseph Keshet and Samy Bengio (January 2009)
  • Speech Recognition Over Digital Channels: Robustness and Standards, by Antonio Peinado and Jose Segura (September 2006)
  • Pattern Recognition in Speech and Language Processing, by Wu Chou and Biing-Hwang Juang (February 2003)
  • Speech Processing: A Dynamic and Optimization-Oriented Approach, by Li Deng and Doug O'Shaughnessy (June 2003)
  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development, by Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon (April 2001)
  • Digital Speech Processing: Synthesis, and Recognition, Second Edition, by Sadaoki Furui (June 2001)
  • Speech Communications: Human and Machine, Second Edition, by Douglas O'Shaughnessy (June 2000)
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James Martin (April 2000)
  • Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan (April 2000)
  • Statistical Methods for Speech Recognition, by Fred Jelinek (June 1997)
  • Fundamentals of Speech Recognition, by Lawrence Rabiner and Biing-Hwang Juang (April 1993)
  • Acoustical and Environmental Robustness in Automatic Speech Recognition, by Alex Acero (November 1992)

All these books, however, were either published before the rise of deep learning for ASR in 2009 or, as our 2014 overview book, were focused on less technical aspects of deep learning for ASR than is desired. These earlier books did not include the new deep learning techniques developed after 2010 with sufficient technical and mathematical detail as would be demanded by ASR or deep learning specialists. Different from the above books and in addition to some necessary background material, our current book is mainly a collation of research in most recent advances in deep learning or discriminative and hierarchical models, as applied specifically to the field of ASR. Our new book presents insights and theoretical foundation of a series of deep learning models such as deep neural network (DNN), restricted Boltzmann machine (RBM), denoising autoencoder, deep belief network, recurrent neural network (RNN) and long short-term memory (LSTM) RNN, and their application in ASR through a variety of techniques including the DNN-HMM

Comments

  • fyl222 :
    Not bad, but I later bought the Chinese edition as well.
    2018-05-03
  • zjk05110106 :
    Not good at all; it is entirely in English.
    2018-04-10
