Gated recurrent unit

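Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.^[1] The GRU resembles a long short-term memory (LSTM) with a forget gate,^[2] but it has fewer parameters than the LSTM because it lacks an output gate.^[3] On certain tasks in polyphonic music modeling, speech signal modeling and natural language processing, GRU performance was found to be similar to that of the LSTM.^[4]^[5] GRUs have also been shown to perform better on certain smaller and less frequent datasets.^[6]^[7]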
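The parameter saving can be checked with simple arithmetic. The sketch below assumes the textbook formulations in which every gate or candidate block owns one input matrix, one recurrent matrix and one bias vector, with three such blocks in a GRU and four in an LSTM without peephole connections. The helper name `recurrent_params` and the layer sizes are illustrative only.

```python
def recurrent_params(n_in, n_hid, n_blocks):
    # Each block holds W (n_hid x n_in), U (n_hid x n_hid) and b (n_hid).
    return n_blocks * (n_hid * n_in + n_hid * n_hid + n_hid)

n_in, n_hid = 128, 256  # arbitrary example sizes
gru = recurrent_params(n_in, n_hid, n_blocks=3)   # update, reset, candidate
lstm = recurrent_params(n_in, n_hid, n_blocks=4)  # input, forget, output, candidate
print(gru, lstm, gru / lstm)
```

Under these assumptions the ratio is 3/4 for any choice of sizes. Library implementations may count differently, for example when they keep separate input and recurrent biases.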

Architecture

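There are several variations on the fully gated unit, in which gating is done using the previous hidden state and the bias in various combinations, as well as a simplified form called the minimal gated unit.^[8]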

In the following, the operator $\odot$ denotes the Hadamard product.

Fully gated unit

Figure: Gated recurrent unit, fully gated version.

Initially, for $t=0$, the output vector is $h_{0}=0$.

$\begin{aligned}
z_{t}&=\sigma _{g}(W_{z}x_{t}+U_{z}h_{t-1}+b_{z})\\
r_{t}&=\sigma _{g}(W_{r}x_{t}+U_{r}h_{t-1}+b_{r})\\
{\hat {h}}_{t}&=\phi _{h}(W_{h}x_{t}+U_{h}(r_{t}\odot h_{t-1})+b_{h})\\
h_{t}&=z_{t}\odot {\hat {h}}_{t}+(1-z_{t})\odot h_{t-1}
\end{aligned}$

Variables

$x_{t}$: input vector
$h_{t}$: output vector
${\hat {h}}_{t}$: candidate activation vector
$z_{t}$: update gate vector
$r_{t}$: reset gate vector
$W$, $U$ and $b$: parameter matrices and vector

Activation functions

$\sigma _{g}$: originally a sigmoid function.
$\phi _{h}$: originally a hyperbolic tangent.

Alternative activation functions are possible, provided that $\sigma _{g}(x)\in [0,1]$.
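As a minimal sketch of the fully gated unit, the following pure-Python function evaluates one time step with the original sigmoid and hyperbolic tangent activations. Vectors are plain lists and matrices are lists of rows. The parameter dictionary `p`, its key names and the helper `zero_params` are assumptions made for illustration and do not come from any library.

```python
import math

def sigmoid(v):
    # Elementwise logistic function, the original choice for sigma_g.
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, x, U, h, b):
    # Computes W x + U h + b for list-of-rows matrices and list vectors.
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]

def gru_step(x, h_prev, p):
    """One step of the fully gated unit; p is a hypothetical parameter dict."""
    z = sigmoid(affine(p["W_z"], x, p["U_z"], h_prev, p["b_z"]))  # update gate
    r = sigmoid(affine(p["W_r"], x, p["U_r"], h_prev, p["b_r"]))  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t (Hadamard) h_{t-1}
    h_hat = [math.tanh(a) for a in affine(p["W_h"], x, p["U_h"], gated, p["b_h"])]
    # h_t = z_t * h_hat_t + (1 - z_t) * h_{t-1}, all elementwise
    return [zi * c + (1.0 - zi) * hi for zi, c, hi in zip(z, h_hat, h_prev)]

def zero_params(n_in, n_hid):
    # All-zero parameters, only useful for checking the arithmetic by hand.
    p = {}
    for g in "zrh":
        p["W_" + g] = [[0.0] * n_in for _ in range(n_hid)]
        p["U_" + g] = [[0.0] * n_hid for _ in range(n_hid)]
        p["b_" + g] = [0.0] * n_hid
    return p

params = zero_params(n_in=3, n_hid=2)
h = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], params)
print(h)
```

With all parameters at zero, both gates equal 0.5 and the candidate is zero, so the step simply halves the previous state. A strongly negative update-gate bias pushes $z_{t}$ toward 0 and the unit copies $h_{t-1}$ forward almost unchanged.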

Figures: Type 1, Type 2 and Type 3 gate variants.

Alternate forms can be created by changing $z_{t}$ and $r_{t}$.

Type 1: each gate depends only on the previous hidden state and the bias.
$\begin{aligned}
z_{t}&=\sigma _{g}(U_{z}h_{t-1}+b_{z})\\
r_{t}&=\sigma _{g}(U_{r}h_{t-1}+b_{r})
\end{aligned}$
Type 2: each gate depends only on the previous hidden state.
$\begin{aligned}
z_{t}&=\sigma _{g}(U_{z}h_{t-1})\\
r_{t}&=\sigma _{g}(U_{r}h_{t-1})
\end{aligned}$
Type 3: each gate is computed using only the bias.
$\begin{aligned}
z_{t}&=\sigma _{g}(b_{z})\\
r_{t}&=\sigma _{g}(b_{r})
\end{aligned}$
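The three variants differ only in which terms enter the gate pre-activation. The sketch below makes this explicit for a single gate and counts the trainable scalars each variant needs. The function names `gate` and `gate_param_count` and the string labels for the variants are hypothetical, chosen for this example.

```python
import math

def gate(kind, W, U, b, x, h_prev):
    """Gate activation for the full form or the Type 1-3 variants."""
    n = len(b)
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    Uh = [sum(u * hi for u, hi in zip(row, h_prev)) for row in U]
    if kind == "full":      # W x_t + U h_{t-1} + b
        pre = [Wx[i] + Uh[i] + b[i] for i in range(n)]
    elif kind == "type1":   # U h_{t-1} + b
        pre = [Uh[i] + b[i] for i in range(n)]
    elif kind == "type2":   # U h_{t-1}
        pre = Uh
    elif kind == "type3":   # b only
        pre = list(b)
    else:
        raise ValueError(kind)
    return [1.0 / (1.0 + math.exp(-a)) for a in pre]

def gate_param_count(kind, n_in, n_hid):
    # Number of trainable scalars one gate needs under each variant.
    return {
        "full": n_hid * n_in + n_hid * n_hid + n_hid,
        "type1": n_hid * n_hid + n_hid,
        "type2": n_hid * n_hid,
        "type3": n_hid,
    }[kind]

for kind in ("full", "type1", "type2", "type3"):
    print(kind, gate_param_count(kind, n_in=3, n_hid=2))
```

A Type 3 gate is a constant vector during inference, since it ignores both the input and the previous hidden state.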

Minimal gated unit

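The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also means that the equation for the output vector must be changed:^[10]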

$\begin{aligned}
f_{t}&=\sigma _{g}(W_{f}x_{t}+U_{f}h_{t-1}+b_{f})\\
{\hat {h}}_{t}&=\phi _{h}(W_{h}x_{t}+U_{h}(f_{t}\odot h_{t-1})+b_{h})\\
h_{t}&=(1-f_{t})\odot h_{t-1}+f_{t}\odot {\hat {h}}_{t}
\end{aligned}$

Variables

$x_{t}$: input vector
$h_{t}$: output vector
${\hat {h}}_{t}$: candidate activation vector
$f_{t}$: forget vector
$W$, $U$ and $b$: parameter matrices and vector
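A minimal sketch of one MGU step follows, again in pure Python with lists as vectors. The single forget gate `f` plays both roles: it gates the previous state inside the candidate and it interpolates between the previous state and the candidate. The parameter dictionary `p` and the scalar example values are hypothetical.

```python
import math

def affine(W, x, U, h, b):
    # Computes W x + U h + b for list-of-rows matrices and list vectors.
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]

def mgu_step(x, h_prev, p):
    """One step of the minimal gated unit; p is a hypothetical parameter dict."""
    pre_f = affine(p["W_f"], x, p["U_f"], h_prev, p["b_f"])
    f = [1.0 / (1.0 + math.exp(-a)) for a in pre_f]  # forget gate
    gated = [fi * hi for fi, hi in zip(f, h_prev)]   # f_t (Hadamard) h_{t-1}
    h_hat = [math.tanh(a) for a in affine(p["W_h"], x, p["U_h"], gated, p["b_h"])]
    # h_t = (1 - f_t) * h_{t-1} + f_t * h_hat_t, all elementwise
    return [(1.0 - fi) * hi + fi * c for fi, hi, c in zip(f, h_prev, h_hat)]

# One-dimensional example: a large positive forget bias drives f_t toward 1,
# so the new state is almost exactly the candidate tanh(x).
p = {"W_f": [[0.0]], "U_f": [[0.0]], "b_f": [50.0],
     "W_h": [[1.0]], "U_h": [[0.0]], "b_h": [0.0]}
h = mgu_step([0.3], [0.9], p)
print(h)
```

Compared with the fully gated unit, this step needs two blocks of parameters ($W$, $U$, $b$) instead of three.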

Content-adaptive recurrent unit

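Figure: The complete CARU architecture. The direction of data flow is indicated by arrows, the functions involved by yellow rectangles, and the various gates (operations) by blue circles.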

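The content-adaptive recurrent unit (CARU) is a variant of the GRU, introduced in 2020 by Ka-Hou Chan et al.^[11] Like the GRU, the CARU contains an update gate, but it introduces a content-adaptive gate in place of the reset gate. CARU is designed to alleviate the long-term dependence problem of RNN models. It was found to give a slight performance improvement on NLP tasks while having fewer parameters than the GRU.^[12]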

In the following equations, lowercase variables represent vectors and $\left[W;B\right]$ denotes the training parameters, which are linear layers consisting of weights and biases. Initially, for $t=0$, CARU directly returns $h^{(1)}\gets W_{vn}v^{(0)}+B_{vn}$; then, for $t>0$, the complete architecture is given by:

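$\begin{aligned}
x^{(t)}&=W_{vn}v^{(t)}+B_{vn}\\
n^{(t)}&=\phi \left(\left(W_{hn}h^{(t)}+B_{hn}\right)+x^{(t)}\right)\\
z^{(t)}&=\sigma \left(W_{hz}h^{(t)}+B_{hz}+W_{vz}v^{(t)}+B_{vz}\right)\\
l^{(t)}&=\sigma \left(x^{(t)}\right)\odot z^{(t)}\\
h^{(t+1)}&=\left(1-l^{(t)}\right)\odot h^{(t)}+l^{(t)}\odot n^{(t)}
\end{aligned}$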

Note that $t\gets t+1$ is applied at the end of each recurrent loop. The operator $\odot$ denotes the Hadamard product, and $\sigma$ and $\phi$ denote the sigmoid and hyperbolic tangent activation functions, respectively.

Variables

$x^{(t)}$: The current word $v^{(t)}$ is first projected into $x^{(t)}$ as the input feature. This result is used in the next hidden state and is passed to the proposed content-adaptive gate.
$n^{(t)}$: Compared with the GRU, the reset gate has been removed. This step simply combines the parameters related to $h^{(t)}$ and $x^{(t)}$ to produce a new hidden state $n^{(t)}$.
$z^{(t)}$: The same as the update gate in the GRU; it is used for the transition of the hidden state.
$l^{(t)}$: A Hadamard product combines the update gate with the weight of the current feature. This gate is named the content-adaptive gate; it influences the amount of gradual transition rather than diluting the current hidden state.
$h^{(t+1)}$: The next hidden state, formed by combining $h^{(t)}$ and $n^{(t)}$.
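The variable descriptions above can be sketched as one CARU transition plus the special handling of the first word. This is an illustrative pure-Python version, not the authors' implementation. The parameter dictionary `p`, its key names (which follow the $W_{vn}$, $B_{vn}$ naming pattern, with the update-gate names `W_hz`, `B_hz`, `W_vz`, `B_vz` assumed) and the helper `zero_params` are hypothetical.

```python
import math

def linear(W, v, B):
    # A linear layer [W; B]: computes W v + B.
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(W, B)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def caru_step(v, h, p):
    """One CARU transition for t > 0; p is a hypothetical parameter dict."""
    x = linear(p["W_vn"], v, p["B_vn"])            # projected word feature
    hn = linear(p["W_hn"], h, p["B_hn"])
    n = [math.tanh(a + b) for a, b in zip(hn, x)]  # content state
    hz = linear(p["W_hz"], h, p["B_hz"])
    vz = linear(p["W_vz"], v, p["B_vz"])
    z = sigmoid([a + b for a, b in zip(hz, vz)])   # update gate (content weight)
    l = [s * zi for s, zi in zip(sigmoid(x), z)]   # content-adaptive gate
    # h_next = (1 - l) * h + l * n, all elementwise
    return [(1.0 - li) * hi + li * ni for li, hi, ni in zip(l, h, n)]

def caru_sequence(words, p):
    # t = 0: the first word is projected directly into the hidden state.
    h = linear(p["W_vn"], words[0], p["B_vn"])
    for v in words[1:]:
        h = caru_step(v, h, p)
    return h

def zero_params(n_in, n_hid):
    # All-zero parameters, only useful for checking the arithmetic by hand.
    p = {}
    for name, cols in (("vn", n_in), ("hn", n_hid), ("hz", n_hid), ("vz", n_in)):
        p["W_" + name] = [[0.0] * cols for _ in range(n_hid)]
        p["B_" + name] = [0.0] * n_hid
    return p

h_next = caru_step([1.0, 2.0], [4.0, -8.0], zero_params(2, 2))
print(h_next)
```

With all parameters at zero, the word weight and the update gate are both 0.5, so the content-adaptive gate is 0.25 and the state moves a quarter of the way toward the zero content state. Because the word weight lies strictly between 0 and 1, each component of the content-adaptive gate is smaller than the corresponding component of the update gate.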

Data flow

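Another feature of CARU is that, instead of using a reset gate to alleviate the dependence on long-term content, it weights the hidden state according to the current word and introduces a content-adaptive gate. CARU processes three trunks of data flow: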

Content-state: produces a new hidden state $n^{(t)}$ by means of a linear layer. This part is equivalent to a simple RNN network.
Word-weight: produces the weight $\sigma (x^{(t)})$ of the current word. It has a capability similar to that of the GRU reset gate, but it is based only on the current word instead of the entire content. More specifically, it can be regarded as a tagging task that links the weight to parts of speech.
Content-weight: produces the weight $z^{(t)}$ of the current content. Its form is the same as that of the GRU update gate, but its purpose is to overcome long-term dependence.

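In contrast to the GRU, CARU does not intend to process these data flows directly; instead, it dispatches the word-weight to the content-adaptive gate and multiplies it by the content-weight. In this way, the content-adaptive gate takes both the word and the content into account.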

References

[1] Cho, Kyunghyun; van Merrienboer, Bart; Bahdanau, Dzmitry; Bengio, Yoshua (2014). "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches". arXiv:1409.1259.
[2] Gers, Felix; Schmidhuber, Jürgen; Cummins, Fred (1999). "Learning to Forget: Continual Prediction with LSTM". Proc. ICANN'99, IEE, London. 1999: 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
[3] "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. 2015-10-27. Archived from the original on 2021-11-10. Retrieved May 18, 2016.
[4] Ravanelli, Mirco; Brakel, Philemon; Omologo, Maurizio; Bengio, Yoshua (2018). "Light Gated Recurrent Units for Speech Recognition". IEEE Transactions on Emerging Topics in Computational Intelligence. 2 (2): 92–102. arXiv:1803.10225. doi:10.1109/TETCI.2017.2762739. S2CID 4402991.
[5] Su, Yuanhang; Kuo, Jay (2019). "On extended long short-term memory and dependent bidirectional recurrent neural network". Neurocomputing. 356: 151–161. arXiv:1803.01686. doi:10.1016/j.neucom.2019.04.044. S2CID 3675055.
[6] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
[7] Gruber, N.; Jockisch, A. (2020). "Are GRU cells more specific and LSTM cells more sensitive in motive classification of text?". Frontiers in Artificial Intelligence. 3: 40. doi:10.3389/frai.2020.00040. PMC 7861254. PMID 33733157. S2CID 220252321.
[8] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
[9] Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
[10] Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].
[11] Chan, Ka-Hou; Ke, Wei; Im, Sio-Kei (2020). Yang, Haiqin; Pasupa, Kitsuchart; Leung, Andrew Chi-Sing; Kwok, James T. (eds.). "CARU: A Content-Adaptive Recurrent Unit for the Transition of Hidden State in NLP". Neural Information Processing. Cham: Springer International Publishing. vol. 12532, pp. 693–703. doi:10.1007/978-3-030-63830-6_58. ISBN 978-3-030-63829-0. S2CID 227075832. Retrieved 2022-02-18.
[12] Ke, Wei; Chan, Ka-Hou (2021-11-30). "A Multilayer CARU Framework to Obtain Probability Distribution for Paragraph-Based Sentiment Analysis". Applied Sciences. 11 (23): 11344. doi:10.3390/app112311344. ISSN 2076-3417.
