Long short-term memory

The long short-term memory (LSTM) cell can process data sequentially and keep its hidden state through time.

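Long short-term memory (LSTM)^[1] is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition,^[2] speech recognition,^[3]^[4] machine translation,^[5]^[6] robot control,^[7]^[8] video games^[9]^[10] and healthcare.^[11] LSTM has become the most cited neural network of the 20th century.^[12]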

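The name LSTM refers to the analogy that a standard RNN has both "long-term memory" and "short-term memory". The connection weights and biases in the network change once per episode of training. This is analogous to how physiological changes in synaptic strengths store long-term memories. The activation patterns in the network change once per time step. This is analogous to how moment-to-moment changes in the brain's electrical firing patterns store short-term memories.^[13] The LSTM architecture aims to give an RNN a short-term memory that can last thousands of time steps, hence "long short-term memory".^[1]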

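A common LSTM unit is composed of a cell, an input gate, an output gate^[14] and a forget gate.^[15] The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.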

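LSTM networks are well suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem^[16] that can be encountered when training traditional RNNs. In numerous applications, LSTM has an advantage over RNNs, hidden Markov models and other sequence learning methods because it is relatively insensitive to gap length.^{[citation needed]}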

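Idea

In theory, classic (or "vanilla") RNNs can keep track of arbitrary long-term dependencies in the input sequences. In practice, vanilla RNNs have a computational problem. When a vanilla RNN is trained with back-propagation, the long-term gradients that are back-propagated can "vanish" (tend to zero) or "explode" (tend to infinity).^[16] This happens because the computations involved use finite-precision numbers. RNNs using LSTM units partially solve the vanishing gradient problem, because LSTM units also allow gradients to flow unchanged. However, LSTM networks can still suffer from the exploding gradient problem.^[17]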

Variants

In the equations below, lowercase variables represent vectors. The matrices $W_{q}$ and $U_{q}$ contain the weights of the input connections and the recurrent connections, respectively. The subscript $q$ depends on the activation being calculated: it can be the input gate $i$, the output gate $o$, the forget gate $f$ or the memory cell $c$. This section therefore uses "vector notation". For example, $c_{t}\in \mathbb{R}^{h}$ is not just one unit of one LSTM cell; it contains the $h$ units of the LSTM cell.

LSTM with a forget gate

The compact forms of the equations for the forward pass of an LSTM cell with a forget gate are:^[1]^[15]

$$
\begin{aligned}
f_{t} &= \sigma_{g}(W_{f}x_{t} + U_{f}h_{t-1} + b_{f}) \\
i_{t} &= \sigma_{g}(W_{i}x_{t} + U_{i}h_{t-1} + b_{i}) \\
o_{t} &= \sigma_{g}(W_{o}x_{t} + U_{o}h_{t-1} + b_{o}) \\
\tilde{c}_{t} &= \sigma_{c}(W_{c}x_{t} + U_{c}h_{t-1} + b_{c}) \\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \tilde{c}_{t} \\
h_{t} &= o_{t} \odot \sigma_{h}(c_{t})
\end{aligned}
$$

The initial values are $c_{0}=0$ and $h_{0}=0$. The operator $\odot$ denotes the Hadamard product (element-wise product), and the subscript $t$ indexes the time step.

Variables

$x_{t}\in \mathbb{R}^{d}$ : input vector to the LSTM unit
$f_{t}\in (0,1)^{h}$ : forget gate's activation vector
$i_{t}\in (0,1)^{h}$ : input/update gate's activation vector
$o_{t}\in (0,1)^{h}$ : output gate's activation vector
$h_{t}\in (-1,1)^{h}$ : hidden state vector, also known as the output vector of the LSTM unit
$\tilde{c}_{t}\in (-1,1)^{h}$ : cell input activation vector
$c_{t}\in \mathbb{R}^{h}$ : cell state vector
$W\in \mathbb{R}^{h\times d}$, $U\in \mathbb{R}^{h\times h}$ and $b\in \mathbb{R}^{h}$ : weight matrices and bias vector parameters, which are learned during training

Here the superscripts $d$ and $h$ refer to the number of input features and the number of hidden units, respectively.
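These shapes fix the number of trainable parameters in one LSTM layer. Each of the four blocks $q \in \{f, i, o, c\}$ has $W_q$ ($hd$ entries), $U_q$ ($h^2$ entries) and $b_q$ ($h$ entries), which gives $4(hd + h^2 + h)$ in total. The helper below uses a hypothetical name and is only an illustrative sketch. Some libraries store two bias vectors per block; PyTorch's `nn.LSTM`, for example, has `bias_ih` and `bias_hh`. Their reported count is therefore $4(hd + h^2 + 2h)$, which the optional argument covers.

```python
def lstm_param_count(d: int, h: int, bias_vectors_per_gate: int = 1) -> int:
    """Trainable parameters of one LSTM layer with input size d and h hidden units.

    Four blocks (forget, input, output, cell candidate), each holding
    W (h x d), U (h x h) and `bias_vectors_per_gate` bias vectors of length h.
    """
    per_block = h * d + h * h + bias_vectors_per_gate * h
    return 4 * per_block


print(lstm_param_count(3, 2))                               # 4 * (6 + 4 + 2)
print(lstm_param_count(100, 128))                           # 4 * (12800 + 16384 + 128)
print(lstm_param_count(100, 128, bias_vectors_per_gate=2))  # two bias vectors per block
```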

Activation functions

$\sigma_{g}$ : sigmoid function.
$\sigma_{c}$ : hyperbolic tangent function.
$\sigma_{h}$ : hyperbolic tangent function or, as the peephole LSTM paper suggests,^[18]^[19] $\sigma_{h}(x)=x$.
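The following is a minimal, dependency-free sketch of the forget-gate LSTM forward pass described in this section. It assumes $\sigma_g$ = sigmoid and $\sigma_c = \sigma_h$ = tanh, and stores matrices as Python lists of rows. The function names (`lstm_step`, `lstm_forward`, `random_params`) and the random initialisation are illustrative choices, not taken from any particular library.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def gate(params, q, x_t, h_prev, act):
    """act(W_q x_t + U_q h_{t-1} + b_q), element-wise."""
    W, U, b = params[q]
    return [act(a + r + bk) for a, r, bk in zip(matvec(W, x_t), matvec(U, h_prev), b)]


def lstm_step(params, x_t, h_prev, c_prev):
    """One forward step; params maps 'f', 'i', 'o', 'c' to (W_q, U_q, b_q)."""
    f = gate(params, "f", x_t, h_prev, sigmoid)          # forget gate
    i = gate(params, "i", x_t, h_prev, sigmoid)          # input/update gate
    o = gate(params, "o", x_t, h_prev, sigmoid)          # output gate
    c_tilde = gate(params, "c", x_t, h_prev, math.tanh)  # cell input activation
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]     # sigma_h = tanh here
    return h, c


def random_params(d, h, seed=0):
    """Hypothetical small uniform initialisation, zero biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {q: (mat(h, d), mat(h, h), [0.0] * h) for q in "fioc"}


def lstm_forward(params, xs, h):
    """Run the cell over a sequence, starting from c_0 = 0 and h_0 = 0."""
    h_t, c_t = [0.0] * h, [0.0] * h
    outputs = []
    for x_t in xs:
        h_t, c_t = lstm_step(params, x_t, h_t, c_t)
        outputs.append(h_t)
    return outputs, c_t


params = random_params(d=3, h=2)
hs, c_T = lstm_forward(params, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, -1.0, 2.0]], h=2)
for t, h_t in enumerate(hs, start=1):
    print(t, [round(v, 4) for v in h_t])
```

With all weights and biases set to zero, every gate outputs 0.5 and $\tilde{c}_t = 0$. In that case $c_t = 0.5\,c_{t-1}$, which is a convenient sanity check.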

Peephole LSTM

A peephole LSTM unit with input (i.e. $i$), output (i.e. $o$), and forget (i.e. $f$) gates.

The figure on the right is a graphical representation of an LSTM unit with peephole connections (i.e. a peephole LSTM).^[18]^[19] Peephole connections allow the gates to access the constant error carousel (CEC), whose activation is the cell state.^[18] In the equations below, $h_{t-1}$ is not used; $c_{t-1}$ is used in its place.

$$
\begin{aligned}
f_{t} &= \sigma_{g}(W_{f}x_{t} + U_{f}c_{t-1} + b_{f}) \\
i_{t} &= \sigma_{g}(W_{i}x_{t} + U_{i}c_{t-1} + b_{i}) \\
o_{t} &= \sigma_{g}(W_{o}x_{t} + U_{o}c_{t-1} + b_{o}) \\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \sigma_{c}(W_{c}x_{t} + b_{c}) \\
h_{t} &= o_{t} \odot \sigma_{h}(c_{t})
\end{aligned}
$$
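The peephole equations can be sketched the same way. The only changes from the forget-gate LSTM are that the gates read $c_{t-1}$ through $U_q$ instead of $h_{t-1}$, and that the cell candidate has no recurrent term. This is an illustrative sketch with hypothetical names. It defaults $\sigma_h$ to the identity, following the suggestion quoted for the peephole LSTM, and takes $\sigma_c$ = tanh as an assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def peephole_lstm_step(params, x_t, c_prev, sigma_h=lambda z: z):
    """One step of the peephole LSTM; h_{t-1} is not an input.

    params['f'|'i'|'o'] = (W_q, U_q, b_q), where U_q multiplies c_{t-1} (the peephole);
    params['c'] = (W_c, b_c), with no recurrent term.
    sigma_h defaults to the identity; pass math.tanh for the hyperbolic tangent.
    """
    def gate(q):
        W, U, b = params[q]
        return [sigmoid(a + r + bk) for a, r, bk in zip(matvec(W, x_t), matvec(U, c_prev), b)]

    f, i, o = gate("f"), gate("i"), gate("o")
    W_c, b_c = params["c"]
    cand = [math.tanh(a + bk) for a, bk in zip(matvec(W_c, x_t), b_c)]  # sigma_c(W_c x_t + b_c)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, cand)]
    h = [ok * sigma_h(ck) for ok, ck in zip(o, c)]
    return h, c


# Hypothetical hand-picked weights: d = 2 inputs, h = 1 cell unit.
params = {
    "f": ([[0.0, 0.0]], [[1.0]], [0.0]),
    "i": ([[1.0, 0.0]], [[0.0]], [0.0]),
    "o": ([[0.0, 1.0]], [[0.0]], [0.0]),
    "c": ([[1.0, -1.0]], [0.0]),
}
c = [0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):
    h, c = peephole_lstm_step(params, x, c)
    print(h, c)
```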

Each of the gates can be thought of as a "standard" neuron in a feed-forward (or multi-layer) neural network: each computes an activation (using an activation function) of a weighted sum. $i_{t}$, $o_{t}$ and $f_{t}$ represent the activations of the input, output and forget gates, respectively, at time step $t$.

The three exit arrows from the memory cell $c$ to the three gates $i$, $o$ and $f$ represent the peephole connections. These connections carry the activation of the memory cell $c$ at time step $t-1$, i.e. $c_{t-1}$, and not $c_{t}$ as the picture may suggest. In other words, when the gates $i$, $o$ and $f$ compute their activations at time step $t$ ($i_{t}$, $o_{t}$ and $f_{t}$), they also take into account the memory cell activation from time step $t-1$, i.e. $c_{t-1}$.

The single left-to-right arrow exiting the memory cell is not a peephole connection; it denotes $c_{t}$.

The little circles containing a $\times$ symbol represent element-wise multiplication of their inputs. The big circles containing an S-like curve represent the application of a differentiable function (such as the sigmoid function) to a weighted sum.

Peephole convolutional LSTM

Peephole convolutional LSTM.^[20] Here $*$ denotes the convolution operator.

$$
\begin{aligned}
f_{t} &= \sigma_{g}(W_{f} * x_{t} + U_{f} * h_{t-1} + V_{f} \odot c_{t-1} + b_{f}) \\
i_{t} &= \sigma_{g}(W_{i} * x_{t} + U_{i} * h_{t-1} + V_{i} \odot c_{t-1} + b_{i}) \\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \sigma_{c}(W_{c} * x_{t} + U_{c} * h_{t-1} + b_{c}) \\
o_{t} &= \sigma_{g}(W_{o} * x_{t} + U_{o} * h_{t-1} + V_{o} \odot c_{t} + b_{o}) \\
h_{t} &= o_{t} \odot \sigma_{h}(c_{t})
\end{aligned}
$$
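To show where the convolutions and the Hadamard peepholes enter, here is a deliberately tiny sketch: one channel, one spatial dimension, "same" zero padding, scalar biases, and $\sigma_c = \sigma_h$ = tanh. All of these are simplifying assumptions. Real ConvLSTMs use multi-channel 2-D convolutions. As in most deep-learning libraries, the `conv1d_same` helper (a hypothetical name) computes a cross-correlation, i.e. it does not flip the kernel. As in the equations, the output gate's peephole reads $c_t$ rather than $c_{t-1}$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv1d_same(kernel, signal):
    """'Same'-size 1-D cross-correlation with zero padding (odd kernel length)."""
    k, n = len(kernel), len(signal)
    pad = k // 2
    out = []
    for p in range(n):
        acc = 0.0
        for j in range(k):
            idx = p + j - pad
            if 0 <= idx < n:
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def conv_lstm_step(params, x_t, h_prev, c_prev):
    """One step of a single-channel, 1-D peephole ConvLSTM.

    params['f'|'i'|'o'] = (W_q kernel, U_q kernel, V_q per-position weights, b_q scalar)
    params['c']         = (W_c kernel, U_c kernel, b_c scalar)
    x_t, h_prev and c_prev are lists with the same spatial length.
    """
    def pre(q, cell):
        W, U, V, b = params[q]
        return [a + r + v * cv + b for a, r, v, cv in
                zip(conv1d_same(W, x_t), conv1d_same(U, h_prev), V, cell)]

    f = [sigmoid(z) for z in pre("f", c_prev)]
    i = [sigmoid(z) for z in pre("i", c_prev)]
    W_c, U_c, b_c = params["c"]
    cand = [math.tanh(a + r + b_c) for a, r in
            zip(conv1d_same(W_c, x_t), conv1d_same(U_c, h_prev))]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, cand)]
    o = [sigmoid(z) for z in pre("o", c)]   # output-gate peephole uses c_t, not c_{t-1}
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hypothetical weights over a 5-position grid; a "blob" moving one step to the right.
N = 5
params = {q: ([0.1, 0.2, 0.1], [0.0, 0.3, 0.0], [0.05] * N, 0.0) for q in "fio"}
params["c"] = ([0.2, 0.5, 0.2], [0.1, 0.1, 0.1], 0.0)
h, c = [0.0] * N, [0.0] * N
for x in ([0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]):
    h, c = conv_lstm_step(params, x, h, c)
print([round(v, 3) for v in h])
```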

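Training

An RNN using LSTM units can be trained in a supervised fashion on a set of training sequences. Training uses an optimization algorithm such as gradient descent, combined with backpropagation through time to compute the gradients needed during optimization. Each weight of the LSTM network is changed in proportion to the derivative of the error (at the output layer of the LSTM network) with respect to that weight.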
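To make "gradient descent combined with backpropagation through time" concrete, here is a sketch for the smallest possible case. It uses a single LSTM unit with scalar input ($d = h = 1$), $\sigma_h$ = tanh, and a squared-error loss on the final hidden state only. These simplifications, and all function names, are assumptions made for illustration. The backward loop walks the cached time steps in reverse. It accumulates each parameter's gradient and passes error to $h_{t-1}$ through the $U_q$ weights and to $c_{t-1}$ through the forget gate. The test block checks the analytic gradients against central finite differences.

```python
import math

GATES = ("f", "i", "o", "c")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, xs):
    """Run a scalar LSTM over xs; p[q] = [W_q, U_q, b_q]. Returns h_T and a per-step cache."""
    h, c = 0.0, 0.0
    cache = []
    for x in xs:
        z = {q: p[q][0] * x + p[q][1] * h + p[q][2] for q in GATES}
        f, i, o = sigmoid(z["f"]), sigmoid(z["i"]), sigmoid(z["o"])
        g = math.tanh(z["c"])
        c_new = f * c + i * g
        cache.append((x, h, c, f, i, o, g, c_new))   # h, c here are h_{t-1}, c_{t-1}
        h, c = o * math.tanh(c_new), c_new
    return h, cache


def loss_and_grads(p, xs, y):
    """Loss 0.5 * (h_T - y)^2 and its gradient w.r.t. every parameter, via BPTT."""
    h_T, cache = forward(p, xs)
    grads = {q: [0.0, 0.0, 0.0] for q in GATES}
    dh, dc = h_T - y, 0.0                        # error arriving at h_T and c_T
    for x, h_prev, c_prev, f, i, o, g, c in reversed(cache):
        tc = math.tanh(c)
        dc = dc + dh * o * (1.0 - tc * tc)       # total error reaching c_t
        dz = {                                   # error at each pre-activation
            "f": dc * c_prev * f * (1.0 - f),
            "i": dc * g * i * (1.0 - i),
            "o": dh * tc * o * (1.0 - o),
            "c": dc * i * (1.0 - g * g),
        }
        for q in GATES:
            grads[q][0] += dz[q] * x
            grads[q][1] += dz[q] * h_prev
            grads[q][2] += dz[q]
        dh = sum(dz[q] * p[q][1] for q in GATES)  # to h_{t-1} through U_q
        dc = dc * f                               # to c_{t-1} through the forget gate
    return 0.5 * (h_T - y) ** 2, grads


def sgd_step(p, grads, lr):
    """Plain gradient descent update."""
    return {q: [w - lr * gw for w, gw in zip(p[q], grads[q])] for q in GATES}


params = {"f": [0.5, 0.1, 1.0], "i": [-0.3, 0.2, 0.0], "o": [0.4, -0.5, 0.1], "c": [0.9, 0.3, -0.2]}
xs, target = [0.5, -1.0, 0.8], 0.25
for step in range(5):
    loss, grads = loss_and_grads(params, xs, target)
    print(step, round(loss, 6))
    params = sgd_step(params, grads, lr=0.1)
```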

A problem with using gradient descent for standard RNNs is that error gradients vanish exponentially quickly with the size of the time lag between important events. This is because $\lim_{n\to\infty} W^{n}=0$ when the spectral radius of $W$ is smaller than 1.^[16]^[21]
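If the activation-function derivatives are ignored (i.e. the recurrence is treated as linear), an error signal sent back $n$ steps is multiplied by $n$ copies of the (transposed) recurrent matrix. The behaviour of $W^n$ therefore decides whether it vanishes or explodes. The sketch below uses two hypothetical 2×2 matrices, one with spectral radius about 0.76 and one with spectral radius 1.1. It computes the radius from the characteristic polynomial and prints the largest entry of $W^n$ as $n$ grows.

```python
import cmath


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matpow(W, n):
    """W^n by repeated multiplication (n >= 0)."""
    size = len(W)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, W)
    return result


def spectral_radius_2x2(W):
    """Largest |eigenvalue| of a 2x2 matrix, from its characteristic polynomial."""
    tr = W[0][0] + W[1][1]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def max_abs_entry(M):
    return max(abs(v) for row in M for v in row)


contracting = [[0.5, 0.4], [0.1, 0.6]]   # spectral radius about 0.76
expanding = [[1.1, 0.0], [0.0, 0.5]]     # spectral radius 1.1
for W in (contracting, expanding):
    rho = spectral_radius_2x2(W)
    print(f"rho={rho:.3f}", [f"{max_abs_entry(matpow(W, n)):.2e}" for n in (1, 10, 50)])
```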

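With LSTM units, however, when error values are back-propagated from the output layer, the error remains in the LSTM unit's cell. This "error carousel" continuously feeds error back to each of the LSTM unit's gates, until they learn to cut off the value.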
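One way to see the carousel is to compare the per-step factors that multiply a back-propagated error along a single path. For a scalar vanilla RNN $h_t = \tanh(w\,h_{t-1})$, the factor is $w(1 - h_t^2)$, which is at most $|w|$. Along the LSTM's direct cell-state path, $\partial c_t / \partial c_{t-1} = f_t$ if the gate activations are treated as constants. In the original 1997 LSTM, which had no forget gate, this factor is fixed at 1. The sketch below uses hypothetical values ($w = 0.9$, $T = 1000$, $f_t = 0.999$) chosen only for illustration, and it ignores every other backward path.

```python
import math


def backprop_factor(per_step_factors):
    """Product of per-step Jacobian factors along one backward path."""
    return math.prod(per_step_factors)


T = 1000

# Vanilla scalar RNN h_t = tanh(w * h_{t-1}): factor w * (1 - h_t^2) per step.
w, h = 0.9, 0.5
rnn_factors = []
for _ in range(T):
    h = math.tanh(w * h)
    rnn_factors.append(w * (1.0 - h * h))

# LSTM cell-state path: factor f_t per step (f_t = 1 is the original constant error carousel).
cec_factors = [1.0] * T
leaky_cec_factors = [0.999] * T

print(backprop_factor(rnn_factors), backprop_factor(cec_factors), backprop_factor(leaky_cec_factors))
```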

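CTC score function

Many applications use stacks of LSTM RNNs^[22] and train them by connectionist temporal classification (CTC).^[23] Training seeks an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.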
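The probability that CTC maximizes sums over every frame-level path that collapses to the target labels. The collapse merges repeated symbols and then removes blanks. That sum can be computed with a forward (alpha) recursion over the label sequence with blanks inserted between labels. The sketch below makes several assumptions: the blank is symbol 0, the network outputs are already softmax probabilities per time step, and everything is computed in probability space. Because there is no log-space rescaling, it will underflow on long sequences. The brute-force enumeration is included only to check the recursion on a tiny example, and all names are hypothetical.

```python
import itertools
import math


def ctc_probability(probs, label, blank=0):
    """p(label | x) via the CTC forward recursion.

    probs[t][k]: output probability of symbol k at time t (already softmaxed).
    label: list of symbol indices, none equal to `blank`.
    """
    ext = [blank]
    for symbol in label:
        ext += [symbol, blank]                  # l' = blank, l1, blank, l2, ..., blank
    T, S = len(probs), len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]           # skip the blank between distinct labels
            new[s] = total * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def collapse(path, blank=0):
    """CTC's many-to-one map: merge repeated symbols, then drop blanks."""
    return [k for k, _ in itertools.groupby(path) if k != blank]


def brute_force_probability(probs, label, blank=0):
    """Sum over every path that collapses to label (exponential; for checking only)."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        if collapse(path, blank) == list(label):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


probs = [           # T = 4 time steps, K = 3 symbols (index 0 is the blank)
    [0.5, 0.3, 0.2],
    [0.4, 0.2, 0.4],
    [0.3, 0.5, 0.2],
    [0.6, 0.1, 0.3],
]
for label in ([1], [1, 2], [1, 1], [2, 1, 2]):
    print(label, ctc_probability(probs, label), brute_force_probability(probs, label))
```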

Alternatives

Sometimes it can be advantageous to train (parts of) an LSTM by neuroevolution^[24] or by policy gradient methods, especially when there is no "teacher" (that is, no training labels).

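Success

RNNs with LSTM units have been trained successfully without a supervising teacher in several cases.

In 2018, Bill Gates called it a "huge milestone in advancing artificial intelligence" when bots developed by OpenAI were able to beat humans in the game Dota 2.^[9] OpenAI Five consists of five independent but coordinated neural networks. Each network is trained by a policy gradient method without a supervising teacher. Each contains a single-layer, 1024-unit long short-term memory that sees the current game state and emits actions through several possible action heads.^[9]

In 2018, OpenAI also trained a similar LSTM by policy gradients to control a human-like robot hand that manipulates physical objects with unprecedented dexterity.^[8]

In 2019, DeepMind's program AlphaStar used a deep LSTM core to excel at the complex video game StarCraft II.^[10] This was viewed as significant progress towards artificial general intelligence.^[10]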

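Applications

Applications of LSTM include:

Robot control^[7]
Time series prediction^[24]
Speech recognition^[25]^[26]^[27]
Rhythm learning^[19]
Music composition^[28]
Grammar learning^[29]^[18]^[30]
Handwriting recognition^[31]^[32]
Human action recognition^[33]
Sign language translation^[34]
Protein homology detection^[35]
Predicting subcellular localization of proteins^[36]
Time series anomaly detection^[37]
Several prediction tasks in the area of business process management^[38]
Prediction in medical care pathways^[39]
Semantic parsing^[40]
Object co-segmentation^[41]^[42]
Airport passenger management^[43]
Short-term traffic forecast^[44]
Drug design^[45]
Market prediction^[46]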

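Timeline of development

1991: Sepp Hochreiter analyzed the vanishing gradient problem and developed principles of the method in his German diploma thesis,^[16] advised by Jürgen Schmidhuber.

1995: "Long Short-Term Memory (LSTM)" is published in a technical report by Sepp Hochreiter and Jürgen Schmidhuber.^[47]

1996: LSTM is published at NIPS 1996, a peer-reviewed conference.^[14]

1997: The main LSTM paper is published in the journal Neural Computation.^[1] LSTM deals with the vanishing gradient problem by introducing Constant Error Carousel (CEC) units. The initial version of the LSTM block included cells, input gates and output gates.^[48]

1999: Felix Gers, his advisor Jürgen Schmidhuber, and Fred Cummins introduced the forget gate (also called the "keep gate") into the LSTM architecture,^[49] enabling the LSTM to reset its own state.^[48]

2000: Gers, Schmidhuber and Cummins added peephole connections (connections from the cell to the gates) to the architecture.^[15] Additionally, the output activation function was omitted.^[48]

2001: Gers and Schmidhuber trained LSTM to learn languages that traditional models such as hidden Markov models cannot learn.^[18]^[50]

Hochreiter et al. used LSTM for meta-learning (i.e. learning a learning algorithm).^[51]

2004: First application of LSTM to speech, by Schmidhuber's student Alex Graves et al.^[52]^[50]

2005: First publication (Graves and Schmidhuber) of LSTM with full backpropagation through time and of bidirectional LSTM.^[25]^[50]

2005: Daan Wierstra, Faustino Gomez, and Schmidhuber trained LSTM by neuroevolution without a teacher.^[24]

2006: Graves, Fernández, Gomez, and Schmidhuber introduced a new error function for LSTM, Connectionist Temporal Classification (CTC), for simultaneous alignment and recognition of sequences.^[23] CTC-trained LSTM led to breakthroughs in speech recognition.^[26]^[53]^[54]^[55]

Mayer et al. trained LSTM to control robots.^[7]

2007: Wierstra, Foerster, Peters, and Schmidhuber trained LSTM by policy gradients for reinforcement learning without a teacher.^[56]

Hochreiter, Heusel, and Obermayer applied LSTM to protein homology detection.^[35]

2009: An LSTM trained by CTC won the ICDAR connected handwriting recognition competition. Three such models were submitted by a team led by Alex Graves.^[2] One was the most accurate model in the competition and another was the fastest.^[57] This was the first time an RNN won an international competition.^[50]

2009: Justin Bayer et al. introduced neural architecture search for LSTM.^[58]^[50]

2013: Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton used LSTM networks as a major component of a network that achieved a 17.7% phoneme error rate on the TIMIT natural speech dataset.^[27]

2014: Kyunghyun Cho et al. put forward a simplified variant of the forget gate LSTM^[49] called the gated recurrent unit (GRU).^[59]

2015: Google started using an LSTM trained by CTC for speech recognition on Google Voice.^[53]^[54] According to the official blog post, the new model cut transcription errors by 49%.^[60]

2015: Rupesh Kumar Srivastava, Klaus Greff, and Schmidhuber used LSTM principles^[49] to create the Highway network. It is a feedforward neural network with hundreds of layers, much deeper than previous networks.^[61]^[62]^[12] Seven months later, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun won the ImageNet 2015 competition with an open-gated or gateless Highway network variant called the residual neural network.^[63] This has become the most cited neural network of the 21st century.^[12]

2016: Google started using an LSTM to suggest messages in the Allo conversation app.^[64] In the same year, Google released the Google Neural Machine Translation system for Google Translate, which used LSTMs to reduce translation errors by 60%.^[5]^[65]^[66]

Apple announced at its Worldwide Developers Conference that it would start using LSTM for QuickType^[67]^[68]^[69] on the iPhone and for Siri.^[70]^[71]

Amazon released Polly, which generates the voices behind Alexa, using a bidirectional LSTM for the text-to-speech technology.^[72]

2017: Facebook performed some 4.5 billion automatic translations every day using long short-term memory networks.^[6]

Researchers from Michigan State University, IBM Research, and Cornell University published a study at the Knowledge Discovery and Data Mining (KDD) conference.^[73]^[74]^[75] Their Time-Aware LSTM (T-LSTM) performs better than standard LSTM on certain data sets.

Microsoft reported reaching 94.9% recognition accuracy on the Switchboard corpus, with a vocabulary of 165,000 words. The approach used "dialog session-based long-short-term memory".^[55]

2018: OpenAI used LSTM trained by policy gradients to beat humans in the complex video game Dota 2,^[9] and to control a human-like robot hand that manipulates physical objects with unprecedented dexterity.^[8]^[50]

2019: DeepMind used LSTM trained by policy gradients to excel at the complex video game StarCraft II.^[10]^[50]

2021: According to Google Scholar, LSTM was cited over 16,000 times in 2021 alone. This reflects applications of LSTM in many different fields, including healthcare.^[11]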


References

[1] Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
[2] Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (May 2009). "A Novel Connectionist System for Unconstrained Handwriting Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (5): 855–868. CiteSeerX 10.1.1.139.4502. doi:10.1109/tpami.2008.137. ISSN 0162-8828. PMID 19299860. S2CID 14635907.
[3] Sak, Hasim; Senior, Andrew; Beaufays, Francoise (2014). "Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling" (PDF). Archived from the original (PDF) on 2018-04-24.
[4] Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
[5] Wu, Yonghui; Schuster, Mike; Chen, Zhifeng; Le, Quoc V.; Norouzi, Mohammad; Macherey, Wolfgang; Krikun, Maxim; Cao, Yuan; Gao, Qin (2016-09-26). "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL].
[6] Ong, Thuy (4 August 2017). "Facebook's translations are now powered completely by AI". www.allthingsdistributed.com. Retrieved 2019-02-15.
[7] Mayer, H.; Gomez, F.; Wierstra, D.; Nagy, I.; Knoll, A.; Schmidhuber, J. (October 2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 543–548. CiteSeerX 10.1.1.218.3399. doi:10.1109/IROS.2006.282190. ISBN 978-1-4244-0258-8. S2CID 12284900.
[8] "Learning Dexterity". OpenAI Blog. July 30, 2018. Retrieved 2019-01-15.
[9] Rodriguez, Jesus (July 2, 2018). "The Science Behind OpenAI Five that just Produced One of the Greatest Breakthrough in the History of AI". Towards Data Science. Archived from the original on 2019-12-26. Retrieved 2019-01-15.
[10] Stanford, Stacy (January 25, 2019). "DeepMind's AI, AlphaStar Showcases Significant Progress Towards AGI". Medium ML Memoirs. Retrieved 2019-01-15.
[11] Schmidhuber, Jürgen (2021). "The 2010s: Our Decade of Deep Learning / Outlook on the 2020s". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.
[12] Schmidhuber, Jürgen (2021). "The most cited neural networks all build on work done in my labs". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.
[13] Elman, Jeffrey L. (March 1990). "Finding Structure in Time". Cognitive Science. 14 (2): 179–211. doi:10.1207/s15516709cog1402_1.
[14] Hochreiter, Sepp; Schmidhuber, Juergen (1996). LSTM can solve hard long time lag problems. Advances in Neural Information Processing Systems.
[15] Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
[16] Hochreiter, Sepp (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (diploma thesis). Technical University Munich, Institute of Computer Science, advisor: J. Schmidhuber.
[17] Calin, Ovidiu (14 February 2020). Deep Learning Architectures. Cham, Switzerland: Springer Nature. p. 555. ISBN 978-3-030-36720-6.
[18] Gers, F. A.; Schmidhuber, J. (2001). "LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages" (PDF). IEEE Transactions on Neural Networks. 12 (6): 1333–1340. doi:10.1109/72.963769. PMID 18249962.
[19] Gers, F.; Schraudolph, N.; Schmidhuber, J. (2002). "Learning precise timing with LSTM recurrent networks" (PDF). Journal of Machine Learning Research. 3: 115–143.
[20] Xingjian Shi; Zhourong Chen; Hao Wang; Dit-Yan Yeung; Wai-kin Wong; Wang-chun Woo (2015). "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting". Proceedings of the 28th International Conference on Neural Information Processing Systems: 802–810. arXiv:1506.04214. Bibcode:2015arXiv150604214S.
[21] Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). "Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies". In Kremer, S. C.; Kolen, J. F. (eds.). A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
[22] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). "Sequence labelling in structured domains with hierarchical recurrent neural networks". Proc. 20th Int. Joint Conf. On Artificial Intelligence, Ijcai 2007: 774–779. CiteSeerX 10.1.1.79.1887.
[23] Graves, Alex; Fernández, Santiago; Gomez, Faustino; Schmidhuber, Jürgen (2006). "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks". In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376. CiteSeerX 10.1.1.75.6306.
[24] Wierstra, Daan; Schmidhuber, J.; Gomez, F. J. (2005). "Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning". Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh: 853–858.
[25] Graves, A.; Schmidhuber, J. (2005). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures". Neural Networks. 18 (5–6): 602–610. CiteSeerX 10.1.1.331.5800. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
[26] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07. Berlin, Heidelberg: Springer-Verlag. pp. 220–229. ISBN 978-3540746935.
[27] Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (2013). "Speech Recognition with Deep Recurrent Neural Networks". Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649. arXiv:1303.5778. doi:10.1109/ICASSP.2013.6638947. ISBN 978-1-4799-0356-6. S2CID 206741496.
[28] Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure of the Blues. Artificial Neural Networks — ICANN 2002. Lecture Notes in Computer Science. Vol. 2415. Springer, Berlin, Heidelberg. pp. 284–289. CiteSeerX 10.1.1.116.3620. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848.
[29] Schmidhuber, J.; Gers, F.; Eck, D. (2002). "Learning nonregular languages: A comparison of simple recurrent networks and LSTM". Neural Computation. 14 (9): 2039–2041. CiteSeerX 10.1.1.11.7369. doi:10.1162/089976602320263980. PMID 12184841. S2CID 30459046.
[30] Perez-Ortiz, J. A.; Gers, F. A.; Eck, D.; Schmidhuber, J. (2003). "Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets". Neural Networks. 16 (2): 241–250. CiteSeerX 10.1.1.381.1992. doi:10.1016/s0893-6080(02)00219-8. PMID 12628609.
[31] Graves, A.; Schmidhuber, J. (2009). Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, pp. 545–552, Vancouver, MIT Press.
[32] Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber, Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural Networks. Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS'07. USA: Curran Associates Inc. pp. 577–584. ISBN 9781605603520.
[33] Baccouche, M.; Mamalet, F.; Wolf, C.; Garcia, C.; Baskurt, A. (2011). "Sequential Deep Learning for Human Action Recognition". In Salah, A. A.; Lepri, B. (eds.). 2nd International Workshop on Human Behavior Understanding (HBU). Lecture Notes in Computer Science. Vol. 7065. Amsterdam, Netherlands: Springer. pp. 29–39. doi:10.1007/978-3-642-25446-8_4. ISBN 978-3-642-25445-1.
[34] Huang, Jie; Zhou, Wengang; Zhang, Qilin; Li, Houqiang; Li, Weiping (2018-01-30). "Video-based Sign Language Recognition without Temporal Segmentation". arXiv:1801.10111 [cs.CV].
[35] Hochreiter, S.; Heusel, M.; Obermayer, K. (2007). "Fast model-based protein homology detection without alignment". Bioinformatics. 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.
[36] Thireou, T.; Reczko, M. (2007). "Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 4 (3): 441–446. doi:10.1109/tcbb.2007.1015. PMID 17666763. S2CID 11787259.
[37] Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015). "Long Short Term Memory Networks for Anomaly Detection in Time Series" (PDF). European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning — ESANN 2015. Archived from the original (PDF) on 2020-10-30. Retrieved 2018-02-21.
[38] Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. (2017). Predictive Business Process Monitoring with LSTM neural networks. Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE). Lecture Notes in Computer Science. Vol. 10253. pp. 477–492. arXiv:1612.02130. doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1. S2CID 2192354.
[39] Choi, E.; Bahadori, M.T.; Schuetz, E.; Stewart, W.; Sun, J. (2016). "Doctor AI: Predicting Clinical Events via Recurrent Neural Networks". Proceedings of the 1st Machine Learning for Healthcare Conference. 56: 301–318. arXiv:1511.05942. Bibcode:2015arXiv151105942C. PMC 5341604. PMID 28286600.
[40] Jia, Robin; Liang, Percy (2016). "Data Recombination for Neural Semantic Parsing". arXiv:1606.03622 [cs.CL].
[41] Wang, Le; Duan, Xuhuan; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-05-22). "Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation" (PDF). Sensors. 18 (5): 1657. Bibcode:2018Senso..18.1657W. doi:10.3390/s18051657. ISSN 1424-8220. PMC 5982167. PMID 29789447.
[42] Duan, Xuhuan; Wang, Le; Zhai, Changbo; Zheng, Nanning; Zhang, Qilin; Niu, Zhenxing; Hua, Gang (2018). Joint Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation. 25th IEEE International Conference on Image Processing (ICIP). doi:10.1109/icip.2018.8451692. ISBN 978-1-4799-7061-2.
[43] Orsini, F.; Gastaldi, M.; Mantecchini, L.; Rossi, R. (2019). Neural networks trained with WiFi traces to predict airport passenger behavior. 6th International Conference on Models and Technologies for Intelligent Transportation Systems. Krakow: IEEE. arXiv:1910.14026. doi:10.1109/MTITS.2019.8883365. 8883365.
[44] Zhao, Z.; Chen, W.; Wu, X.; Chen, P.C.Y.; Liu, J. (2017). "LSTM network: A deep learning approach for Short-term traffic forecast". IET Intelligent Transport Systems. 11 (2): 68–75. doi:10.1049/iet-its.2016.0208.
[45] Gupta A, Müller AT, Huisman BJH, Fuchs JA, Schneider P, Schneider G (2018). "Generative Recurrent Networks for De Novo Drug Design". Mol Inform. 37 (1–2). doi:10.1002/minf.201700111. PMC 5836943. PMID 29095571.
[46] Saiful Islam, Md.; Hossain, Emam (2020-10-26). "Foreign Exchange Currency Rate Prediction using a GRU-LSTM Hybrid Network". Soft Computing Letters. 3: 100009. doi:10.1016/j.socl.2020.100009. ISSN 2666-2221.
[47] Sepp Hochreiter; Jürgen Schmidhuber (21 August 1995), Long Short Term Memory, Wikidata Q98967430
[48] Klaus Greff; Rupesh Kumar Srivastava; Jan Koutník; Bas R. Steunebrink; Jürgen Schmidhuber (2015). "LSTM: A Search Space Odyssey". IEEE Transactions on Neural Networks and Learning Systems. 28 (10): 2222–2232. arXiv:1503.04069. Bibcode:2015arXiv150304069G. doi:10.1109/TNNLS.2016.2582924. PMID 27411231. S2CID 3356463.
[49] Gers, Felix; Schmidhuber, Jürgen; Cummins, Fred (1999). "Learning to forget: Continual prediction with LSTM". 9th International Conference on Artificial Neural Networks: ICANN '99. Vol. 1999. pp. 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
[50] Schmidhuber, Juergen (10 May 2021). "Deep Learning: Our Miraculous Year 1990-1991". arXiv:2005.05744 [cs.NE].
[51] Hochreiter, S.; Younger, A. S.; Conwell, P. R. (2001). Learning to Learn Using Gradient Descent (PDF). Lecture Notes in Computer Science - ICANN 2001. Lecture Notes in Computer Science. Vol. 2130. pp. 87–94. CiteSeerX 10.1.1.5.323. doi:10.1007/3-540-44668-0_13. ISBN 978-3-540-42486-4. ISSN 0302-9743.
[52] Graves, Alex; Beringer, Nicole; Eck, Douglas; Schmidhuber, Juergen (2004). Biologically Plausible Speech Recognition with LSTM Neural Nets. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland. pp. 175–184.
[53] Beaufays, Françoise (August 11, 2015). "The neural networks behind Google Voice transcription". Research Blog. Retrieved 2017-06-27.
[54] Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise; Schalkwyk, Johan (September 24, 2015). "Google voice search: faster and more accurate". Research Blog. Retrieved 2017-06-27.
[55] Haridy, Rich (August 21, 2017). "Microsoft's speech recognition system is now as good as a human". newatlas.com. Retrieved 2017-08-27.
[56] Wierstra, Daan; Foerster, Alexander; Peters, Jan; Schmidhuber, Juergen (2005). "Solving Deep Memory POMDPs with Recurrent Policy Gradients". International Conference on Artificial Neural Networks ICANN'07.
[57] Märgner, Volker; Abed, Haikal El (July 2009). "ICDAR 2009 Arabic Handwriting Recognition Competition". 2009 10th International Conference on Document Analysis and Recognition: 1383–1387. doi:10.1109/ICDAR.2009.256. ISBN 978-1-4244-4500-4. S2CID 52851337.
[58] Bayer, Justin; Wierstra, Daan; Togelius, Julian; Schmidhuber, Juergen (2009). "Evolving memory cell structures for sequence learning". International Conference on Artificial Neural Networks ICANN'09, Cyprus.
[59] Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].
[60] "Neon prescription... or rather, New transcription for Google Voice". Official Google Blog. 23 July 2015. Retrieved 2020-04-25.
[61] Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
[62] Srivastava, Rupesh K; Greff, Klaus; Schmidhuber, Juergen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems 28. Curran Associates, Inc. 28: 2377–2385.
[63] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
[64] Khaitan, Pranav (May 18, 2016). "Chat Smarter with Allo". Research Blog. Retrieved 2017-06-27.
[65] Metz, Cade (September 27, 2016). "An Infusion of AI Makes Google Translate More Powerful Than Ever WIRED". Wired. Retrieved 2017-06-27.
[66] "A Neural Network for Machine Translation, at Production Scale". Google AI Blog. Retrieved 2020-04-25.
[67] Efrati, Amir (June 13, 2016). "Apple's Machines Can Learn Too". The Information. Retrieved 2017-06-27.
[68] Ranger, Steve (June 14, 2016). "iPhone, AI and big data: Here's how Apple plans to protect your privacy ZDNet". ZDNet. Retrieved 2017-06-27.
[69] "Can Global Semantic Context Improve Neural Language Models? – Apple". Apple Machine Learning Journal. Retrieved 2020-04-30.
[70] Smith, Chris (2016-06-13). "iOS 10: Siri now works in third-party apps, comes with extra AI features". BGR. Retrieved 2017-06-27.
[71] Capes, Tim; Coles, Paul; Conkie, Alistair; Golipour, Ladan; Hadjitarkhani, Abie; Hu, Qiong; Huddleston, Nancy; Hunt, Melvyn; Li, Jiangchuan; Neeracher, Matthias; Prahallad, Kishore (2017-08-20). "Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System". Interspeech 2017. ISCA: 4011–4015. doi:10.21437/Interspeech.2017-1798.
[72] Vogels, Werner (30 November 2016). "Bringing the Magic of Amazon AI and Alexa to Apps on AWS. – All Things Distributed". www.allthingsdistributed.com. Retrieved 2017-06-27.
[73] "Patient Subtyping via Time-Aware LSTM Networks" (PDF). msu.edu. Retrieved 21 Nov 2018.
[74] "Patient Subtyping via Time-Aware LSTM Networks". Kdd.org. Retrieved 24 May 2018.
[75] "SIGKDD". Kdd.org. Retrieved 24 May 2018.

External links

Recurrent Neural Networks, with over 30 LSTM papers by Jürgen Schmidhuber's group at IDSIA
Gers, Felix (2001). "Long Short-Term Memory in Recurrent Neural Networks" (PDF). PhD thesis.
Gers, Felix A.; Schraudolph, Nicol N.; Schmidhuber, Jürgen (Aug 2002). "Learning precise timing with LSTM recurrent networks" (PDF). Journal of Machine Learning Research. 3: 115–143.
Abidogun, Olusola Adeniyi (2005). Data Mining, Fraud Detection and Mobile Telecommunications: Call Pattern Analysis with Unsupervised Neural Networks. Master's Thesis (Thesis). University of the Western Cape. hdl:11394/249. Archived (PDF) from the original on May 22, 2012.
- Original with two chapters devoted to explaining recurrent neural networks, especially LSTM.
Monner, Derek D.; Reggia, James A. (2010). "A generalized LSTM-like training algorithm for second-order recurrent neural networks" (PDF). Neural Networks. 25 (1): 70–83. doi:10.1016/j.neunet.2011.07.003. PMC 3217173. PMID 21803542. High-performing extension of LSTM that has been simplified to a single node type and can train arbitrary architectures
Dolphin, R (12 November 2021). "LSTM Networks – A Detailed Explanation". Article.
Herta, Christian. "How to implement LSTM in Python with Theano". Tutorial.


[shi2015-20] Xingjian Shi; Zhourong Chen; Hao Wang; Dit-Yan Yeung; Wai-kin Wong; Wang-chun Woo (2015). "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting". Proceedings of the 28th International Conference on Neural Information Processing Systems: 802–810. arXiv:1506.04214. Bibcode:2015arXiv150604214S.

[gradf-21] Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). "Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies". In Kremer, S. C.; Kolen, J. F. (eds.). A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.

[fernandez2007ijcai-22] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). "Sequence labelling in structured domains with hierarchical recurrent neural networks". Proc. 20th Int. Joint Conf. on Artificial Intelligence, IJCAI 2007: 774–779. CiteSeerX 10.1.1.79.1887.

[graves2006-23] Graves, Alex; Fernández, Santiago; Gomez, Faustino; Schmidhuber, Jürgen (2006). "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks". In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376. CiteSeerX 10.1.1.75.6306.

[wierstra2005-24] Wierstra, Daan; Schmidhuber, J.; Gomez, F. J. (2005). "Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning". Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh: 853–858.

[graves2005-25] Graves, A.; Schmidhuber, J. (2005). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures". Neural Networks. 18 (5–6): 602–610. CiteSeerX 10.1.1.331.5800. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.

[fernandez2007icann-26] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07. Berlin, Heidelberg: Springer-Verlag. pp. 220–229. ISBN 978-3540746935.

[graves2013-27] Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (2013). "Speech Recognition with Deep Recurrent Neural Networks". Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649. arXiv:1303.5778. doi:10.1109/ICASSP.2013.6638947. ISBN 978-1-4799-0356-6. S2CID 206741496.

[eck2002-28] Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure of the Blues. Artificial Neural Networks — ICANN 2002. Lecture Notes in Computer Science. Vol. 2415. Springer, Berlin, Heidelberg. pp. 284–289. CiteSeerX 10.1.1.116.3620. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848.

[gers2002-29] Schmidhuber, J.; Gers, F.; Eck, D. (2002). "Learning nonregular languages: A comparison of simple recurrent networks and LSTM". Neural Computation. 14 (9): 2039–2041. CiteSeerX 10.1.1.11.7369. doi:10.1162/089976602320263980. PMID 12184841. S2CID 30459046.

[perez2003-30] Perez-Ortiz, J. A.; Gers, F. A.; Eck, D.; Schmidhuber, J. (2003). "Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets". Neural Networks. 16 (2): 241–250. CiteSeerX 10.1.1.381.1992. doi:10.1016/s0893-6080(02)00219-8. PMID 12628609.

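[graves2009nips-31] Graves, A.; Schmidhuber, J. (2009). "Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks". Advances in Neural Information Processing Systems 22, NIPS'22. Vancouver: MIT Press. pp. 545–552.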

[32] Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber, Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural Networks. Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS'07. USA: Curran Associates Inc. pp. 577–584. ISBN 9781605603520.

[baccouche2011-33] Baccouche, M.; Mamalet, F.; Wolf, C.; Garcia, C.; Baskurt, A. (2011). "Sequential Deep Learning for Human Action Recognition". In Salah, A. A.; Lepri, B. (eds.). 2nd International Workshop on Human Behavior Understanding (HBU). Lecture Notes in Computer Science. Vol. 7065. Amsterdam, Netherlands: Springer. pp. 29–39. doi:10.1007/978-3-642-25446-8_4. ISBN 978-3-642-25445-1.

[huang2018-34] Huang, Jie; Zhou, Wengang; Zhang, Qilin; Li, Houqiang; Li, Weiping (2018-01-30). "Video-based Sign Language Recognition without Temporal Segmentation". arXiv:1801.10111 [cs.CV].

[hochreiter2007-35] Hochreiter, S.; Heusel, M.; Obermayer, K. (2007). "Fast model-based protein homology detection without alignment". Bioinformatics. 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.

[thireou2007-36] Thireou, T.; Reczko, M. (2007). "Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 4 (3): 441–446. doi:10.1109/tcbb.2007.1015. PMID 17666763. S2CID 11787259.

[malhotra2015-37] Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015). "Long Short Term Memory Networks for Anomaly Detection in Time Series" (PDF). European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning — ESANN 2015. Archived from the original (PDF) on 2020-10-30. Retrieved 2018-02-21.

[tax2017-38] Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. (2017). Predictive Business Process Monitoring with LSTM neural networks. Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE). Lecture Notes in Computer Science. Vol. 10253. pp. 477–492. arXiv:1612.02130. doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1. S2CID 2192354.

[choi2016-39] Choi, E.; Bahadori, M.T.; Schuetz, E.; Stewart, W.; Sun, J. (2016). "Doctor AI: Predicting Clinical Events via Recurrent Neural Networks". Proceedings of the 1st Machine Learning for Healthcare Conference. 56: 301–318. arXiv:1511.05942. Bibcode:2015arXiv151105942C. PMC 5341604. PMID 28286600.

[jia2016-40] Jia, Robin; Liang, Percy (2016). "Data Recombination for Neural Semantic Parsing". arXiv:1606.03622 [cs.CL].

[Wang_Duan_Zhang_Niu_p=1657-41] Wang, Le; Duan, Xuhuan; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-05-22). "Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation" (PDF). Sensors. 18 (5): 1657. Bibcode:2018Senso..18.1657W. doi:10.3390/s18051657. ISSN 1424-8220. PMC 5982167. PMID 29789447.

[Duan_Wang_Zhai_Zheng_2018_p.-42] Duan, Xuhuan; Wang, Le; Zhai, Changbo; Zheng, Nanning; Zhang, Qilin; Niu, Zhenxing; Hua, Gang (2018). Joint Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation. 25th IEEE International Conference on Image Processing (ICIP). doi:10.1109/icip.2018.8451692. ISBN 978-1-4799-7061-2.

[orsini2019-43] Orsini, F.; Gastaldi, M.; Mantecchini, L.; Rossi, R. (2019). Neural networks trained with WiFi traces to predict airport passenger behavior. 6th International Conference on Models and Technologies for Intelligent Transportation Systems. Krakow: IEEE. arXiv:1910.14026. doi:10.1109/MTITS.2019.8883365. 8883365.

[liu2017-44] Zhao, Z.; Chen, W.; Wu, X.; Chen, P.C.Y.; Liu, J. (2017). "LSTM network: A deep learning approach for Short-term traffic forecast". IET Intelligent Transport Systems. 11 (2): 68–75. doi:10.1049/iet-its.2016.0208.

[pmid29095571-45] Gupta A, Müller AT, Huisman BJH, Fuchs JA, Schneider P, Schneider G (2018). "Generative Recurrent Networks for De Novo Drug Design". Mol Inform. 37 (1–2). doi:10.1002/minf.201700111. PMC 5836943. PMID 29095571.

[saiful2020-46] Saiful Islam, Md.; Hossain, Emam (2020-10-26). "Foreign Exchange Currency Rate Prediction using a GRU-LSTM Hybrid Network". Soft Computing Letters. 3: 100009. doi:10.1016/j.socl.2020.100009. ISSN 2666-2221.

[47] Sepp Hochreiter; Jürgen Schmidhuber (21 August 1995), Long Short Term Memory, Wikidata Q98967430

[ASearchSpaceOdyssey-48] Klaus Greff; Rupesh Kumar Srivastava; Jan Koutník; Bas R. Steunebrink; Jürgen Schmidhuber (2015). "LSTM: A Search Space Odyssey". IEEE Transactions on Neural Networks and Learning Systems. 28 (10): 2222–2232. arXiv:1503.04069. Bibcode:2015arXiv150304069G. doi:10.1109/TNNLS.2016.2582924. PMID 27411231. S2CID 3356463.

[lstm1999-49] Gers, Felix; Schmidhuber, Jürgen; Cummins, Fred (1999). "Learning to forget: Continual prediction with LSTM". 9th International Conference on Artificial Neural Networks: ICANN '99. Vol. 1999. pp. 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.

[miraculous2021-50] Schmidhuber, Juergen (10 May 2021). "Deep Learning: Our Miraculous Year 1990-1991". arXiv:2005.05744 [cs.NE].

[51] Hochreiter, S.; Younger, A. S.; Conwell, P. R. (2001). Learning to Learn Using Gradient Descent (PDF). Lecture Notes in Computer Science - ICANN 2001. Lecture Notes in Computer Science. Vol. 2130. pp. 87–94. CiteSeerX 10.1.1.5.323. doi:10.1007/3-540-44668-0_13. ISBN 978-3-540-42486-4. ISSN 0302-9743.

[graves2004-52] Graves, Alex; Beringer, Nicole; Eck, Douglas; Schmidhuber, Juergen (2004). Biologically Plausible Speech Recognition with LSTM Neural Nets. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland. pp. 175–184.

[Beau15-53] Beaufays, Françoise (August 11, 2015). "The neural networks behind Google Voice transcription". Research Blog. Retrieved 2017-06-27.

[GoogleVoiceSearch-54] Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise; Schalkwyk, Johan (September 24, 2015). "Google voice search: faster and more accurate". Research Blog. Retrieved 2017-06-27.

[microsoft2017-55] Haridy, Rich (August 21, 2017). "Microsoft's speech recognition system is now as good as a human". newatlas.com. Retrieved 2017-08-27.

[wierstra2007-56] Wierstra, Daan; Foerster, Alexander; Peters, Jan; Schmidhuber, Juergen (2007). "Solving Deep Memory POMDPs with Recurrent Policy Gradients". International Conference on Artificial Neural Networks ICANN'07.

[maergner2009-57] Märgner, Volker; Abed, Haikal El (July 2009). "ICDAR 2009 Arabic Handwriting Recognition Competition". 2009 10th International Conference on Document Analysis and Recognition: 1383–1387. doi:10.1109/ICDAR.2009.256. ISBN 978-1-4244-4500-4. S2CID 52851337.

[bayer2009-58] Bayer, Justin; Wierstra, Daan; Togelius, Julian; Schmidhuber, Juergen (2009). "Evolving memory cell structures for sequence learning". International Conference on Artificial Neural Networks ICANN'09, Cyprus.

[cho2014-59] Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].

[googleblog2015-60] "Neon prescription... or rather, New transcription for Google Voice". Official Google Blog. 23 July 2015. Retrieved 2020-04-25.

[highway2015-61] Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].

[highway2015neurips-62] Srivastava, Rupesh K; Greff, Klaus; Schmidhuber, Juergen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems 28. Curran Associates, Inc. 28: 2377–2385.

[resnet2015-63] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.

[GoogleAllo-64] Khaitan, Pranav (May 18, 2016). "Chat Smarter with Allo". Research Blog. Retrieved 2017-06-27.

[WiredGoogleTranslate-65] Metz, Cade (September 27, 2016). "An Infusion of AI Makes Google Translate More Powerful Than Ever". Wired. Retrieved 2017-06-27.

[googleblog2016-66] "A Neural Network for Machine Translation, at Production Scale". Google AI Blog. Retrieved 2020-04-25.

[AppleQuicktype-67] Efrati, Amir (June 13, 2016). "Apple's Machines Can Learn Too". The Information. Retrieved 2017-06-27.

[AppleQuicktype2-68] Ranger, Steve (June 14, 2016). "iPhone, AI and big data: Here's how Apple plans to protect your privacy". ZDNet. Retrieved 2017-06-27.

[69] "Can Global Semantic Context Improve Neural Language Models? – Apple". Apple Machine Learning Journal. Retrieved 2020-04-30.

[AppleSiri-70] Smith, Chris (2016-06-13). "iOS 10: Siri now works in third-party apps, comes with extra AI features". BGR. Retrieved 2017-06-27.

[capes2017-71] Capes, Tim; Coles, Paul; Conkie, Alistair; Golipour, Ladan; Hadjitarkhani, Abie; Hu, Qiong; Huddleston, Nancy; Hunt, Melvyn; Li, Jiangchuan; Neeracher, Matthias; Prahallad, Kishore (2017-08-20). "Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System". Interspeech 2017. ISCA: 4011–4015. doi:10.21437/Interspeech.2017-1798.

[AmazonAlexa-72] Vogels, Werner (30 November 2016). "Bringing the Magic of Amazon AI and Alexa to Apps on AWS. – All Things Distributed". www.allthingsdistributed.com. Retrieved 2017-06-27.

[73] "Patient Subtyping via Time-Aware LSTM Networks" (PDF). msu.edu. Retrieved 21 Nov 2018.

[74] "Patient Subtyping via Time-Aware LSTM Networks". Kdd.org. Retrieved 24 May 2018.

[75] "SIGKDD". Kdd.org. Retrieved 24 May 2018.
