A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton

arXiv:1504.00941v2 [cs.NE] 7 Apr 2015
Abstract
Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to a standard implementation of LSTMs on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
1 Introduction
Recurrent neural networks (RNNs) are very powerful dynamical systems and they are the natural way of using neural networks to map an input sequence to an output sequence, as in speech recognition and machine translation, or to predict the next term in a sequence, as in language modeling. However, training RNNs by using back-propagation through time [30] to compute error-derivatives can be difficult. Early attempts suffered from vanishing and exploding gradients [15] and this meant that they had great difficulty learning long-term dependencies. Many different methods have been proposed for overcoming this difficulty.
A method that has produced some impressive results [23, 24] is to abandon stochastic gradient descent in favor of a much more sophisticated Hessian-Free (HF) optimization method. HF operates on large mini-batches and is able to detect promising directions in the weight-space that have very small gradients but even smaller curvature. Subsequent work, however, suggested that similar results could be achieved by using stochastic gradient descent with momentum provided the weights were initialized carefully [34] and large gradients were clipped [28]. Further developments of the HF approach look promising [35, 25] but are much harder to implement than popular simple methods such as stochastic gradient descent with momentum [34] or adaptive learning rates for each weight that depend on the history of its gradients [5, 14].
The most successful technique to date is the Long Short Term Memory (LSTM) Recurrent Neural Network, which uses stochastic gradient descent but changes the hidden units in such a way that the backpropagated gradients are much better behaved [16]. LSTM replaces logistic or tanh hidden units with "memory cells" that can store an analog value. Each memory cell has its own input and output gates that control when inputs are allowed to add to the stored analog value and when this value is allowed to influence the output. These gates are logistic units with their own learned weights on connections coming from the input and also from the memory cells at the previous time-step. There is also a forget gate with learned weights that controls the rate at which the analog value stored in the memory cell decays. For periods when the input and output gates are off and the forget gate is not causing decay, a memory cell simply holds its value over time, so the gradient of the error with respect to its stored value stays constant when backpropagated over those periods.
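As a point of reference for the gating mechanism described above, the following is a minimal NumPy sketch of one forward step of a standard LSTM cell; the function names, the concatenated-input parameterization and the packing order of the gate pre-activations are assumptions of this sketch, not details taken from [16]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell: input, forget and output gates
    control what is written to, kept in, and read from the memory cell c."""
    z = W @ np.concatenate([x, h_prev]) + b        # shape (4 * n_hidden,)
    i_g, f_g, o_g, g = np.split(z, 4)              # gate and candidate pre-activations
    i_g, f_g, o_g = sigmoid(i_g), sigmoid(f_g), sigmoid(o_g)
    c = f_g * c_prev + i_g * np.tanh(g)            # update the stored analog value
    h = o_g * np.tanh(c)                           # gated output of the cell
    return h, c
```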
The first major success of LSTMs was for the task of unconstrained handwriting recognition [12]. Since then, they have achieved impressive results on many other tasks including speech recognition [13, 10], handwriting generation [8], sequence to sequence mapping [36], machine translation [22, 1], image captioning [38, 18], parsing [37] and predicting the outputs of simple computer programs [39].
The impressive results achieved using LSTMs make it important to discover which aspects of the rather complicated architecture are actually needed for its success. It seems unlikely that Hochreiter and Schmidhuber's [16] initial design combined with the subsequent introduction of forget gates [6, 7] is the optimal design: at the time, the important issue was to find any scheme that could learn long-range dependencies rather than to find the minimal or optimal scheme. One aim of this paper is to cast light on what aspects of the design are responsible for the success of LSTMs.
Recent research on deep feedforward networks has also produced some impressive results [19, 3] and there is now a consensus that for deep networks, rectified linear units (ReLUs) are easier to train than the logistic or tanh units that were used for many years [27, 40]. At first sight, ReLUs seem inappropriate for RNNs because they can have very large outputs, so they might be expected to be far more likely to explode than units that have bounded values. A second aim of this paper is to explore whether ReLUs can be made to work well in RNNs and whether the ease of optimizing them in feedforward nets transfers to RNNs.
2 The initialization trick
In this paper, we demonstrate that, with the right initialization of the weights, RNNs composed of rectified linear units are relatively easy to train and are good at modeling long-range dependencies. The RNNs are trained by using backpropagation through time to get error-derivatives for the weights and by updating the weights after each small mini-batch of sequences. Their performance on test data is comparable with LSTMs, both for toy problems involving very long-range temporal structures and for real tasks like predicting the next word in a very large corpus of text.
We initialize the recurrent weight matrix to be the identity matrix and the biases to be zero. This means that each new hidden state vector is obtained by simply copying the previous hidden vector, then adding on the effect of the current inputs and replacing all negative states by zero. In the absence of input, an RNN that is composed of ReLUs and initialized with the identity matrix (which we call an IRNN) just stays in the same state indefinitely. The identity initialization has the very desirable property that when the error derivatives for the hidden units are backpropagated through time they remain constant provided no extra error-derivatives are added. This is the same behavior as LSTMs when their forget gates are set so that there is no decay, and it makes it easy to learn very long-range temporal dependencies.
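To make the initialization concrete, here is a minimal NumPy sketch of an IRNN as described above; the function names and the small scale used for the input weights are assumptions of this sketch rather than details taken from the paper:

```python
import numpy as np

def init_irnn(n_hidden, n_input, input_scale=0.001, seed=0):
    """Identity initialization for a ReLU RNN (IRNN): the recurrent matrix is
    the identity and the hidden biases are zero.  The small random scale for
    the input weights is an assumption of this sketch."""
    rng = np.random.RandomState(seed)
    W_hh = np.eye(n_hidden)                              # recurrent weights = identity
    W_xh = rng.randn(n_hidden, n_input) * input_scale    # small random input weights
    b_h = np.zeros(n_hidden)                             # zero biases
    return W_hh, W_xh, b_h

def irnn_step(h_prev, x, W_hh, W_xh, b_h):
    """One forward step: copy the previous hidden state, add the effect of the
    current input, then replace negative values with zero (ReLU)."""
    return np.maximum(0.0, W_hh @ h_prev + W_xh @ x + b_h)
```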
We also find that for tasks that exhibit fewer long-range dependencies, scaling the identity matrix by a small scalar is an effective mechanism to forget long-range effects. This is the same behavior as LSTMs when their forget gates are set so that the memory decays fast.
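The scaled variant only changes the recurrent matrix of the sketch above; the scale value shown here is illustrative, not a value prescribed by the paper:

```python
def init_scaled_irnn(n_hidden, n_input, scale=0.9, **kwargs):
    """Scaled-identity variant: multiplying the identity by a scalar below 1
    makes the copied part of the hidden state decay at each step."""
    W_hh, W_xh, b_h = init_irnn(n_hidden, n_input, **kwargs)
    return scale * W_hh, W_xh, b_h
```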
Our initialization scheme bears some resemblance to the idea of Mikolov et al. [26], where a part of the weight matrix is fixed to identity or close to identity. The main difference of their work to ours is the fact that our network uses rectified linear units and the identity matrix is only used for initialization. A scaled identity initialization was also proposed in Socher et al. [32] in the context of recursive neural networks. Our work is also related to the work of Saxe et al. [31], who study the use of orthogonal matrices as initialization in deep networks.
3 Overview of the experiments
In the first (adding) problem, the input at each time step consists of two units: the first input unit has a real value and the second input unit has a value of 0 or 1, as shown in the figure. The task is to report the sum of the two real values that are marked by having a 1 as the second input [16, 15, 24]. IRNNs can learn to handle sequences with a length of 300, which is a challenging regime for other algorithms.
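For clarity, a small data generator for this kind of task might look like the following; the [0, 1] range of the real values and the uniform choice of the two marked positions are assumptions of this sketch:

```python
import numpy as np

def adding_problem_batch(batch_size, seq_len, rng=None):
    """Generate one batch of the adding problem described above.  Each
    sequence has two input channels: a real value and a 0/1 marker that is 1
    at exactly two positions; the target is the sum of the two marked values."""
    rng = rng or np.random.RandomState(0)
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    targets = np.zeros(batch_size)
    for i in range(batch_size):
        a, b = rng.choice(seq_len, size=2, replace=False)  # two marked positions
        markers[i, a] = markers[i, b] = 1.0
        targets[i] = values[i, a] + values[i, b]
    inputs = np.stack([values, markers], axis=-1)          # (batch, seq_len, 2)
    return inputs, targets
```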