Rectifier (neural networks)

Figure: plot of the ReLU rectifier (blue) and GELU (green) functions near x = 0.

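In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function^[1]^[2] is an activation function defined as the positive part of its argument: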

f(x)=x^{+}=\max(0,x)

where x is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering.
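The definition above can be written directly in code. The following is a minimal sketch in plain Python; the function name `relu` is chosen here for illustration and is not taken from any particular library.

```python
def relu(x: float) -> float:
    """Rectifier: the positive part of x, i.e. max(0, x)."""
    return max(0.0, x)


# Negative inputs are clamped to zero, positive inputs pass through unchanged.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, relu(x))
```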

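This activation function started appearing in the context of visual feature extraction in hierarchical neural networks in the late 1960s.^[3]^[4] It was later argued to have strong biological motivations and mathematical justifications.^[5]^[6] In 2011 it was found to enable better training of deeper networks,^[7] compared to the activation functions widely used before 2011, e.g., the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical^[8] counterpart, the hyperbolic tangent. As of 2017, the rectifier is the most popular activation function for deep neural networks.^[9]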

Rectified linear units are used in computer vision^[7] and speech recognition^[10]^[11] with deep neural networks, and in computational neuroscience.^[12]^[13]^[14]

Advantages

Sparse activation: for example, in a randomly initialized network, only about 50% of hidden units are activated (have a non-zero output).
Better gradient propagation: fewer vanishing gradient problems compared to sigmoidal activation functions that saturate in both directions.^[7]
Efficient computation: only comparison, addition and multiplication are required.
Scale-invariant: $\max(0,ax)=a\max(0,x)$ for $a\geq 0$.
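The sparsity and scale-invariance properties listed above can be checked numerically. The sketch below assumes that the pre-activations of a randomly initialized network are roughly symmetric around zero, and uses standard-normal samples as a stand-in; the sample size, seed and helper names are illustrative assumptions.

```python
import random


def relu(x: float) -> float:
    return max(0.0, x)


# Assumption: standard-normal values stand in for the pre-activations
# of hidden units in a randomly initialized network.
random.seed(0)
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Sparse activation: fraction of units with a non-zero output.
active_fraction = sum(1 for z in pre_activations if relu(z) > 0.0) / len(pre_activations)
print(f"active fraction: {active_fraction:.3f}")  # close to one half


def is_scale_invariant(x: float, a: float, tol: float = 1e-12) -> bool:
    """Check max(0, a*x) == a*max(0, x), which holds for a >= 0."""
    return abs(relu(a * x) - a * relu(x)) <= tol


print(all(is_scale_invariant(z, 2.5) for z in pre_activations))
```

The identity fails for negative `a`, because multiplying by a negative number swaps which side of zero the input falls on.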

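Rectifying activation functions were used to separate specific excitation and unspecific inhibition in the neural abstraction pyramid, which was trained in a supervised way to learn several computer vision tasks.^[15] In 2011,^[7] using the rectifier as a non-linearity was shown to enable training deep supervised neural networks without requiring unsupervised pre-training. Compared to the sigmoid function or similar activation functions, rectified linear units allow faster and more effective training of deep neural architectures on large and complex datasets.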

Potential problems

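Non-differentiable at zero: it is differentiable everywhere else, and the value of the derivative at zero can be arbitrarily chosen to be 0 or 1.
Not zero-centered.
Unbounded.
Dying ReLU problem: ReLU (rectified linear unit) neurons can sometimes be pushed into states in which they become inactive for essentially all inputs. In this state, no gradients flow backward through the neuron, so the neuron becomes stuck in a perpetually inactive state and "dies". This is a form of the vanishing gradient problem. In some cases, large numbers of neurons in a network can become stuck in dead states, effectively decreasing the model capacity. This problem typically arises when the learning rate is set too high. It may be mitigated by using leaky ReLUs instead, which assign a small positive slope for x < 0; however, the performance is reduced.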
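The first and last items in the list above can be illustrated with a toy example. The sketch below uses a hypothetical single neuron y = relu(w*x + b) whose bias has been pushed far into the negative range; the weights, inputs and function names are assumptions made for illustration only.

```python
def relu_grad(x: float, at_zero: float = 0.0) -> float:
    """Derivative of max(0, x); the value at x == 0 is a convention (0 or 1)."""
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return at_zero


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    """Derivative of a leaky ReLU with a small positive slope for x <= 0."""
    return 1.0 if x > 0 else slope


# Hypothetical neuron whose pre-activation is negative for every input.
w, b = 0.5, -10.0
inputs = [-1.0, 0.0, 1.0, 2.0, 3.0]

# d(output)/dw = activation'(w*x + b) * x
weight_grads = [relu_grad(w * x + b) * x for x in inputs]
leaky_grads = [leaky_relu_grad(w * x + b) * x for x in inputs]

is_dead = all(g == 0.0 for g in weight_grads)
print(is_dead)      # no input produces a gradient, so w never updates
print(leaky_grads)  # small but non-zero gradients keep the neuron trainable
```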

Variants

Piecewise-linear variants

Leaky ReLU

Leaky ReLUs allow a small, positive gradient when the unit is not active.^[11]

f(x)=\begin{cases}x&{\text{if }}x>0,\\0.01x&{\text{otherwise}}.\end{cases}

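Parametric ReLU

Parametric ReLUs (PReLUs) take this idea further by making the coefficient of leakage a parameter that is learned along with the other neural-network parameters.^[16]

f(x)=\begin{cases}x&{\text{if }}x>0,\\ax&{\text{otherwise}}.\end{cases}

Note that for a ≤ 1, this is equivalent to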

f(x)=\max(x,ax)

and thus has a relation to "maxout" networks.^[16]
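The equivalence between the piecewise form and max(x, ax) can be checked directly. In the sketch below, `prelu` and `prelu_as_max` are illustrative names; a leaky ReLU corresponds to the fixed choice a = 0.01, while in an actual PReLU the coefficient would be learned during training rather than passed in by hand.

```python
def prelu(x: float, a: float) -> float:
    """Piecewise definition: x for x > 0, a*x otherwise."""
    return x if x > 0 else a * x


def prelu_as_max(x: float, a: float) -> float:
    """Equivalent form max(x, a*x), valid only when a <= 1."""
    return max(x, a * x)


samples = [-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0]

# For a <= 1 the two forms agree on every sample.
for a in (0.0, 0.01, 0.25, 1.0):
    assert all(prelu(x, a) == prelu_as_max(x, a) for x in samples)

# For a > 1 they differ: max picks a*x on the positive side.
print(prelu(1.0, 2.0), prelu_as_max(1.0, 2.0))
```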

Other non-linear variants

Gaussian Error Linear Unit (GELU)

GELU is a smooth approximation to the rectifier. It has a non-monotonic "bump" when x < 0, and it serves as the default activation for models such as BERT.^[17]

f(x)=x\cdot \Phi (x)

where Φ(x) is the cumulative distribution function of the standard normal distribution.

This activation function is illustrated in the figure at the start of this article.
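A minimal sketch of GELU follows, using the identity Φ(x) = (1 + erf(x/√2))/2 to evaluate the standard normal CDF with Python's `math.erf`. The function names are illustrative; libraries often also offer faster approximations, which are not shown here.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x), the CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """GELU: the input weighted by the probability mass below it."""
    return x * std_normal_cdf(x)


# The negative side dips below zero and then returns toward it (the "bump").
for x in (-3.0, -1.0, -0.5, 0.0, 1.0, 3.0):
    print(x, round(gelu(x), 4))
```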

SiLU

The SiLU (Sigmoid Linear Unit) or swish function^[18] is another smooth approximation, first coined in the GELU paper.^[17]

f(x)=x\cdot \operatorname {sigmoid} (x)

where $\operatorname {sigmoid} (x)$ is the sigmoid function.
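As a rough sketch, SiLU can be implemented with a sigmoid that branches on the sign of its argument, so that `math.exp` is never called with a large positive value. The branching is an implementation choice made here, not part of the definition.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """SiLU / swish: the input gated by its own sigmoid."""
    return x * sigmoid(x)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(silu(x), 4))
```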

Softplus

A smooth approximation to the rectifier is the analytic function

f(x)=\ln(1+e^{x}),

which is called the softplus^[19]^[7] or SmoothReLU function.^[20] For large negative $x$ it is roughly $\ln(1)$, so just above 0, while for large positive $x$ it is roughly $\ln(e^{x})$, so just above $x$.

A sharpness parameter $k$ may be included:

f(x)={\frac {\ln \left(1+e^{kx}\right)}{k}}

The derivative of softplus is the logistic function. Starting from the parametric version,

f'(x)={\frac {e^{kx}}{1+e^{kx}}}={\frac {1}{1+e^{-kx}}}

The logistic sigmoid function is a smooth approximation of the derivative of the rectifier, the Heaviside step function.
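The relationship between softplus and the logistic function can be verified numerically. The sketch below evaluates ln(1 + e^{kx})/k through the rewriting max(kx, 0) + ln(1 + e^{-|kx|}), a common way to avoid overflow that is an implementation choice rather than something stated in the text, and compares a central finite difference with the closed-form derivative.

```python
import math


def softplus(x: float, k: float = 1.0) -> float:
    """ln(1 + e^{kx}) / k, rewritten so exp never sees a large argument."""
    z = k * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / k


def logistic(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus_derivative(x: float, k: float = 1.0) -> float:
    """Closed form: 1 / (1 + e^{-kx})."""
    return logistic(k * x)


h = 1e-6
for k in (1.0, 3.0):
    for x in (-2.0, 0.0, 0.4):
        numeric = (softplus(x + h, k) - softplus(x - h, k)) / (2 * h)
        print(k, x, round(numeric, 6), round(softplus_derivative(x, k), 6))
```

Larger values of `k` pull the curve closer to the rectifier: the gap between softplus and max(0, x) at any fixed x shrinks as `k` grows.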

The multivariable generalization of single-variable softplus is the LogSumExp with the first argument set to zero:

\operatorname {LSE_{0}} ^{+}(x_{1},\dots ,x_{n}):=\operatorname {LSE} (0,x_{1},\dots ,x_{n})=\log \left(1+e^{x_{1}}+\cdots +e^{x_{n}}\right).

The LogSumExp function is

\operatorname {LSE} (x_{1},\dots ,x_{n})=\log \left(e^{x_{1}}+\cdots +e^{x_{n}}\right),

and its gradient is the softmax; the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning.
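These relationships can be sketched in a few lines. The code below subtracts the maximum before exponentiating, a standard stabilization trick that is an implementation choice here, and the name `lse0_plus` is a hypothetical label for LogSumExp with a leading zero argument.

```python
import math


def logsumexp(xs: list[float]) -> float:
    """log(sum(exp(x))) computed with the maximum factored out."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax(xs: list[float]) -> list[float]:
    """Gradient of logsumexp with respect to each argument."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def lse0_plus(xs: list[float]) -> float:
    """LogSumExp with the first argument fixed at zero."""
    return logsumexp([0.0, *xs])


x = 1.3
print(lse0_plus([x]), math.log(1.0 + math.exp(x)))       # single-variable softplus
print(softmax([0.0, x])[1], 1.0 / (1.0 + math.exp(-x)))  # logistic function
```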

ELU

Exponential linear units try to make the mean activations closer to zero, which speeds up learning. It has been shown that ELUs can obtain higher classification accuracy than ReLUs.^[21]

f(x)=\begin{cases}x&{\text{if }}x>0,\\a\left(e^{x}-1\right)&{\text{otherwise}},\end{cases}

where $a$ is a hyper-parameter to be tuned, and $a\geq 0$ is a constraint.

The ELU can be viewed as a smoothed version of a shifted ReLU (SReLU), which has the form $f(x)=\max(-a,x)$, given the same interpretation of $a$.
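The comparison between ELU and the shifted ReLU can be made concrete. In the sketch below the default a = 1.0 is an arbitrary illustrative value, and `math.expm1` is used for e^x - 1 to keep precision near zero.

```python
import math


def elu(x: float, a: float = 1.0) -> float:
    """x for x > 0, a*(e^x - 1) otherwise; requires a >= 0."""
    return x if x > 0 else a * math.expm1(x)


def shifted_relu(x: float, a: float = 1.0) -> float:
    """SReLU: max(-a, x), the piecewise-linear counterpart of ELU."""
    return max(-a, x)


# Both functions are the identity for positive inputs and level off at -a
# for very negative ones; ELU approaches that floor smoothly.
for x in (-50.0, -2.0, -0.5, 0.0, 2.0):
    print(x, round(elu(x), 4), shifted_relu(x))
```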

Mish

The mish function can also be used as a smooth approximation of the rectifier.^[18] It is defined as

f(x)=x\tanh(\operatorname {softplus} (x)),

where $\tanh(x)$ is the hyperbolic tangent and $\operatorname {softplus} (x)$ is the softplus function.

Mish is non-monotonic and self-gated.^[22] It was inspired by Swish, itself a variant of ReLU.^[22]
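A minimal sketch of mish, composed from the definition above, is given below. The overflow-safe form of softplus is an implementation choice and the helper names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """ln(1 + e^x), evaluated without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: the input gated by tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Non-monotonic on the negative side: the output dips, then returns toward 0.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(mish(x), 4))
```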


References

[1] Brownlee, Jason (8 January 2019). "A Gentle Introduction to the Rectified Linear Unit (ReLU)". Machine Learning Mastery. Retrieved 8 April 2021.
[2] Liu, Danqing (30 November 2017). "A Practical Guide to ReLU". Medium. Retrieved 8 April 2021.
[3] Fukushima, K. (1969). "Visual feature extraction by a multilayered network of analog threshold elements". IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322–333. doi:10.1109/TSSC.1969.300225.
[4] Fukushima, K.; Miyake, S. (1982). "Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition". In Competition and Cooperation in Neural Nets. Lecture Notes in Biomathematics. Springer. 45: 267–285. doi:10.1007/978-3-642-46466-9_18. ISBN 978-3-540-11574-8.
[5] Hahnloser, R.; Sarpeshkar, R.; Mahowald, M. A.; Douglas, R. J.; Seung, H. S. (2000). "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit". Nature. 405 (6789): 947–951. Bibcode:2000Natur.405..947H. doi:10.1038/35016072. PMID 10879535. S2CID 4399014.
[6] Hahnloser, R.; Seung, H. S. (2001). Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks. NIPS 2001.
[7] Xavier Glorot, Antoine Bordes and Yoshua Bengio (2011). Deep sparse rectifier neural networks (PDF). AISTATS. Rectifier and softplus activation functions. The second one is a smooth version of the first.
[8] Yann LeCun, Leon Bottou, Genevieve B. Orr and Klaus-Robert Müller (1998). "Efficient BackProp" (PDF). In G. Orr; K. Müller (eds.). Neural Networks: Tricks of the Trade. Springer.
[9] Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (October 16, 2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE].
[10] László Tóth (2013). Phone Recognition with Deep Sparse Rectifier Neural Networks (PDF). ICASSP.
[11] Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014). Rectifier Nonlinearities Improve Neural Network Acoustic Models.
[12] Hansel, D.; van Vreeswijk, C. (2002). "How noise contributes to contrast invariance of orientation tuning in cat visual cortex". J. Neurosci. 22 (12): 5118–5128. doi:10.1523/JNEUROSCI.22-12-05118.2002. PMC 6757721. PMID 12077207.
[13] Kadmon, Jonathan; Sompolinsky, Haim (2015-11-19). "Transition to Chaos in Random Neuronal Networks". Physical Review X. 5 (4): 041030. arXiv:1508.06486. Bibcode:2015PhRvX...5d1030K. doi:10.1103/PhysRevX.5.041030. S2CID 7813832.
[14] Engelken, Rainer; Wolf, Fred; Abbott, L. F. (2020-06-03). "Lyapunov spectra of chaotic recurrent neural networks". arXiv:2006.02427 [nlin.CD].
[15] Behnke, Sven (2003). Hierarchical Neural Networks for Image Interpretation. Lecture Notes in Computer Science. Vol. 2766. Springer. doi:10.1007/b11963. ISBN 978-3-540-40722-5. S2CID 1304548.
[16] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv:1502.01852 [cs.CV].
[17] Hendrycks, Dan; Gimpel, Kevin (2016). "Gaussian Error Linear Units (GELUs)". arXiv:1606.08415 [cs.LG].
[18] Diganta Misra (23 Aug 2019), Mish: A Self Regularized Non-Monotonic Activation Function (PDF), arXiv:1908.08681v1, retrieved 26 March 2022
[19] Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (2000-01-01). "Incorporating second-order functional knowledge for better option pricing" (PDF). Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00). MIT Press: 451–457. Since the sigmoid h has a positive first derivative, its primitive, which we call softplus, is convex.
[20] "Smooth Rectifier Linear Unit (SmoothReLU) Forward Layer". Developer Guide for Intel Data Analytics Acceleration Library. 2017. Retrieved 2018-12-04.
[21] Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp (2015). "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)". arXiv:1511.07289 [cs.LG].
[22] Shaw, Sweta (2020-05-10). "Activation Functions Compared with Experiments". W&B. Retrieved 2022-07-11.


