Learning curve (machine learning)

Figure: a learning curve showing the training score and the cross-validation score.

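In machine learning, a learning curve (or training curve) plots the optimal value of a model's loss function on a training set against the same loss function evaluated on a validation data set, using the parameters that produced the optimal function. It is a tool for finding out how much a model benefits from additional training data, and whether the estimator suffers more from variance error or from bias error. If both the validation score and the training score converge to a value that is too low as the training set grows, the model will not benefit much from more training data.^[1]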

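Learning curves are useful for many purposes, including comparing different algorithms,^[2] choosing model parameters during design,^[3] adjusting the optimization to improve convergence, and determining the amount of data used for training.^[4]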

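In the machine learning domain, the term has two meanings that differ in the x-axis of the curve: the model's experience is graphed either as the number of training examples used for learning or as the number of iterations used to train the model.^[5]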

Formal definition

One model of machine learning is producing a function $f(x)$ which, given some information $x$, predicts some variable $y$, using training data $X_{\text{train}}$ and $Y_{\text{train}}$. It is distinct from mathematical optimization because $f$ should also predict well for $x$ outside of $X_{\text{train}}$.

We often constrain the possible functions to a parameterized family of functions $\{f_{\theta}(x) : \theta \in \Theta\}$, so that our function is more generalizable,^[6] so that it has certain properties (such as those that make finding a good $f$ easier), or because we have some a priori reason to think that these properties hold.^[6] (p. 172)

Given that it is not possible to produce a function that fits the data perfectly, it is necessary to define a loss function $L(f_{\theta}(X), Y')$ that measures how good our prediction is. We then define an optimization process that finds a $\theta$ which minimizes $L(f_{\theta}(X), Y)$; this minimizer is referred to as $\theta^{*}(X, Y)$.
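As an illustration of these definitions, the following minimal sketch picks one concrete case. The choices are assumptions made for the example and do not come from the cited sources: the family is the set of straight lines $f_{\theta}(x) = \theta_0 + \theta_1 x$, the loss is the mean squared error, and $\theta^{*}(X, Y)$ is computed with the closed-form least-squares solution. The function names `f`, `loss`, and `theta_star` are hypothetical.

```python
import random


def f(theta, x):
    """One member of the parameterized family: f_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def loss(theta, xs, ys):
    """Mean squared error L(f_theta(X), Y) (an assumed choice of loss)."""
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def theta_star(xs, ys):
    """Closed-form least-squares minimizer of the loss over the line family."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)


# Synthetic data: y = 1 + 2x plus Gaussian noise (illustrative values only).
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

best = theta_star(xs, ys)
print("theta* =", best)
print("training loss at theta* =", loss(best, xs, ys))
```

Because `theta_star` minimizes the training loss over the whole family, no other line, including the one used to generate the data, can achieve a lower training loss on `xs`, `ys`.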

Training curve for amount of data

If our training data is $\{x_{1},x_{2},\dots ,x_{n}\},\{y_{1},y_{2},\dots ,y_{n}\}$ and our validation data is $\{x_{1}',x_{2}',\dots ,x_{m}'\},\{y_{1}',y_{2}',\dots ,y_{m}'\}$, a learning curve is the plot of the two curves

$i\mapsto L(f_{\theta^{*}(X_{i},Y_{i})}(X_{i}),Y_{i})$
$i\mapsto L(f_{\theta^{*}(X_{i},Y_{i})}(X_{i}'),Y_{i}')$

where $X_{i}=\{x_{1},x_{2},\dots ,x_{i}\}$.
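The two curves can be computed directly from their definitions. The sketch below is illustrative and rests on stated assumptions: a straight-line model fitted by least squares, a mean squared error loss, synthetic data, and evaluation of the validation loss on the full validation set at every $i$ (a common practical choice, rather than on a growing subset). The helper names `fit_line`, `mse`, and `make_data` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares minimizer theta* = (intercept, slope) for f_theta(x) = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (my - slope * mx, slope)


def mse(theta, xs, ys):
    """Mean squared error of the line theta on the pairs (xs, ys)."""
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, n):
    """Synthetic samples from y = 1 + 2x plus Gaussian noise (illustrative only)."""
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(rng, 40)
x_val, y_val = make_data(rng, 20)

# For each i, fit on the first i training examples (X_i, Y_i), then record
# the training loss on those examples and the loss on the validation set.
curve = []
for i in range(2, len(x_train) + 1):
    theta_i = fit_line(x_train[:i], y_train[:i])
    train_loss = mse(theta_i, x_train[:i], y_train[:i])
    val_loss = mse(theta_i, x_val, y_val)
    curve.append((i, train_loss, val_loss))

for i, train_loss, val_loss in curve[::6]:
    print(f"{i:3d}  train={train_loss:.3f}  val={val_loss:.3f}")
```

The loop starts at $i = 2$ because a line cannot be determined from a single point; at $i = 2$ the line passes through both training points, so the training loss is essentially zero there.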

Training curve for number of iterations

Many optimization processes are iterative, repeating the same step until the process converges to an optimal value. Gradient descent is one such algorithm. If we define $\theta_{i}^{*}$ as the approximation of the optimal $\theta$ after $i$ steps, a learning curve is the plot of

$i\mapsto L(f_{\theta_{i}^{*}(X,Y)}(X),Y)$
$i\mapsto L(f_{\theta_{i}^{*}(X,Y)}(X'),Y')$
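A minimal sketch of this second kind of learning curve, using plain gradient descent on a straight-line model with a mean squared error loss. The model, the synthetic data, the learning rate of 0.1, and the 200 steps are all assumptions chosen for illustration, and the helper names `mse`, `gradient`, and `make_data` are hypothetical.

```python
import random


def mse(theta, xs, ys):
    """Mean squared error of the line theta[0] + theta[1] * x."""
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(theta, xs, ys):
    """Gradient of the mean squared error with respect to (theta[0], theta[1])."""
    n = len(xs)
    residuals = [theta[0] + theta[1] * x - y for x, y in zip(xs, ys)]
    g0 = 2.0 * sum(residuals) / n
    g1 = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return (g0, g1)


def make_data(rng, n):
    """Synthetic samples from y = 1 + 2x plus Gaussian noise (illustrative only)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(rng, 50)
x_val, y_val = make_data(rng, 25)

theta = (0.0, 0.0)    # theta_0^*: the starting point before any step
learning_rate = 0.1   # assumed step size, small enough for this problem

# history[i] holds (i, training loss, validation loss) for theta_i^*.
history = [(0, mse(theta, x_train, y_train), mse(theta, x_val, y_val))]
for i in range(1, 201):
    g0, g1 = gradient(theta, x_train, y_train)
    theta = (theta[0] - learning_rate * g0, theta[1] - learning_rate * g1)
    history.append((i, mse(theta, x_train, y_train), mse(theta, x_val, y_val)))

for i, train_loss, val_loss in history[::40]:
    print(f"{i:3d}  train={train_loss:.4f}  val={val_loss:.4f}")
```

Only the training data enters the gradient; the validation loss is recorded at each step but never used to update $\theta$, which is what makes the second curve an estimate of how well the model generalizes.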

References

[1] scikit-learn developers. "Validation curves: plotting scores to evaluate models - scikit-learn 0.20.2 documentation". Retrieved February 15, 2019.
[2] Madhavan, P.G. (1997). "A New Recurrent Neural Network Learning Algorithm for Time Series Prediction" (PDF). Journal of Intelligent Systems. p. 113, Fig. 3.
[3] "Machine Learning 102: Practical Advice". Tutorial: Machine Learning for Astronomy with Scikit-learn.
[4] Meek, Christopher; Thiesson, Bo; Heckerman, David (Summer 2002). "The Learning-Curve Sampling Method Applied to Model-Based Clustering". Journal of Machine Learning Research. 2 (3): 397. Archived from the original on 2013-07-15.
[5] Sammut, Claude; Webb, Geoffrey I. (Eds.) (28 March 2011). Encyclopedia of Machine Learning (1st ed.). Springer. p. 578. ISBN 978-0-387-30768-8.
[6] Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016-11-18). Deep Learning. MIT Press. p. 108. ISBN 978-0-262-03561-3.
