Stochastic approximation

Stochastic approximation methods are a family of iterative methods typically used for root-finding problems or for optimization problems. The recursive update rules of stochastic approximation methods can be used, among other things, for solving linear systems when the collected data is corrupted by noise, or for approximating extreme values of functions which cannot be computed directly, but only estimated via noisy observations.

In a nutshell, stochastic approximation algorithms deal with a function of the form $f(\theta)=\operatorname{E}_{\xi}[F(\theta,\xi)]$, which is the expected value of a function depending on a random variable $\xi$. The goal is to recover properties of such a function $f$ without evaluating it directly. Instead, stochastic approximation algorithms use random samples of $F(\theta,\xi)$ to efficiently approximate properties of $f$ such as zeros or extrema.

Recently, stochastic approximations have found extensive applications in statistics and machine learning, especially in settings with big data. These applications range from stochastic optimization methods and algorithms to online forms of the EM algorithm, reinforcement learning via temporal differences, deep learning, and others.^[1] Stochastic approximation algorithms have also been used in the social sciences to describe collective dynamics: fictitious play in learning theory and consensus algorithms can be studied using their theory.^[2]

The earliest, and prototypical, algorithms of this kind are the Robbins–Monro and Kiefer–Wolfowitz algorithms, introduced in 1951 and 1952 respectively.

Robbins–Monro algorithm

The Robbins–Monro algorithm, introduced in 1951 by Herbert Robbins and Sutton Monro,^[3] presented a methodology for solving a root-finding problem in which the function is represented as an expected value. Assume that we have a function $M(\theta)$ and a constant $\alpha$ such that the equation $M(\theta)=\alpha$ has a unique root at $\theta^{*}$. It is assumed that while we cannot directly observe the function $M(\theta)$, we can instead obtain measurements of the random variable $N(\theta)$, where $\operatorname{E}[N(\theta)]=M(\theta)$. The structure of the algorithm is then to generate iterates of the form:

$$\theta_{n+1}=\theta_{n}-a_{n}(N(\theta_{n})-\alpha)$$

Here, $a_{1},a_{2},\dots$ is a sequence of positive step sizes. Robbins and Monro proved^[3]^{, Theorem 2} that $\theta_{n}$ converges in $L^{2}$ (and hence also in probability) to $\theta^{*}$, and Blum^[4] later proved that the convergence is actually with probability one, provided that:

$N(\theta)$ is uniformly bounded,
$M(\theta)$ is nondecreasing,
$M'(\theta^{*})$ exists and is positive, and
the sequence $a_{n}$ satisfies the following requirements:

$$\sum_{n=0}^{\infty}a_{n}=\infty \quad {\mbox{ and }}\quad \sum_{n=0}^{\infty}a_{n}^{2}<\infty$$

A particular sequence of steps which satisfies these conditions, and was suggested by Robbins and Monro, has the form $a_{n}=a/n$ for $a>0$. Other series are possible, but in order to average out the noise in $N(\theta)$, the above conditions must be met.
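The iteration above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `robbins_monro`, the test function $M(\theta)=2\theta$, and the uniform noise are assumptions made here, not part of the original presentation (and this linear $N(\theta)$ is not uniformly bounded, so it only illustrates the mechanics of the update).

```python
import random


def robbins_monro(noisy_measurement, alpha, theta0, a=1.0, n_steps=10_000):
    """Iterate theta_{n+1} = theta_n - a_n * (N(theta_n) - alpha) with a_n = a / n."""
    theta = theta0
    for n in range(1, n_steps + 1):
        step = a / n  # sum of a_n diverges, sum of a_n**2 converges
        theta = theta - step * (noisy_measurement(theta) - alpha)
    return theta


# Hypothetical test problem: M(theta) = 2 * theta, observed with zero-mean noise.
rng = random.Random(0)


def noisy_linear(theta):
    return 2.0 * theta + rng.uniform(-1.0, 1.0)


# Solve M(theta) = 3; the true root is theta* = 1.5.
root_estimate = robbins_monro(noisy_linear, alpha=3.0, theta0=0.0)
print(root_estimate)
```

Only noisy evaluations of $N(\theta)$ are used; the function $M$ itself is never evaluated directly.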

Complexity results

If $f(\theta)$ is twice continuously differentiable and strongly convex, and the minimizer of $f(\theta)$ belongs to the interior of $\Theta$, then the Robbins–Monro algorithm achieves the asymptotically optimal convergence rate with respect to the objective function, namely $\operatorname{E}[f(\theta_{n})-f^{*}]=O(1/n)$, where $f^{*}$ is the minimal value of $f(\theta)$ over $\theta\in\Theta$.
Conversely, in the general convex case, where both the smoothness and the strong convexity assumptions are lacking, Nemirovski and Yudin^[7] have shown that the asymptotically optimal convergence rate, with respect to the objective function values, is $O(1/{\sqrt{n}})$. They have also proven that this rate cannot be improved.

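Subsequent developments and Polyak–Ruppert averaging

While the Robbins–Monro algorithm is theoretically able to achieve $O(1/n)$ under the assumption of twice continuous differentiability and strong convexity, it can perform quite poorly in practice. This is primarily because the algorithm is very sensitive to the choice of the step size sequence, and the supposedly asymptotically optimal step size policy can be quite harmful in the beginning.^[6]^[8]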

Chung^[9] (1954) and Fabian^[10] (1968) showed that the optimal convergence rate $O(1/{\sqrt{n}})$ is achieved with $a_{n}=\nabla^{2}f(\theta^{*})^{-1}/n$ (or $a_{n}={\frac{1}{nM'(\theta^{*})}}$). Lai and Robbins^[11]^[12] designed adaptive procedures to estimate $M'(\theta^{*})$ such that $\theta_{n}$ has minimal asymptotic variance. However, the application of such optimal methods requires much a priori information which is hard to obtain in most situations. To overcome this shortfall, Polyak^[13] (1991) and Ruppert^[14] (1988) independently developed a new optimal algorithm based on the idea of averaging the trajectories. Polyak and Juditsky^[15] also presented a method of accelerating Robbins–Monro for linear and non-linear root-searching problems through the use of longer steps and averaging of the iterates. The algorithm has the following structure:

$$\theta_{n+1}-\theta_{n}=a_{n}(\alpha -N(\theta_{n})),\qquad {\bar{\theta}}_{n}={\frac{1}{n}}\sum_{i=0}^{n-1}\theta_{i}$$

The convergence of ${\bar{\theta}}_{n}$ to the unique root $\theta^{*}$ relies on the condition that the step sequence $\{a_{n}\}$ decreases sufficiently slowly. That is,

A1) $$a_{n}\rightarrow 0,\qquad {\frac{a_{n}-a_{n+1}}{a_{n}}}=o(a_{n})$$

Therefore, the sequence $a_{n}=n^{-\alpha}$ with $0<\alpha<1$ satisfies this restriction, but $\alpha=1$ does not, hence the longer steps. Under the assumptions outlined for the Robbins–Monro algorithm, the resulting modification yields the same asymptotically optimal convergence rate $O(1/{\sqrt{n}})$, yet with a more robust step size policy.^[15] Prior to this, the idea of using longer steps and averaging the iterates had already been proposed by Nemirovski and Yudin^[16] for solving stochastic optimization problems with continuous convex objectives and for convex-concave saddle point problems. These algorithms were observed to attain the nonasymptotic rate $O(1/{\sqrt{n}})$.
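A minimal sketch of iterate averaging, under illustrative assumptions, is given below. The names `polyak_ruppert` and `exponent` (the step-size exponent written $\alpha$ in $a_{n}=n^{-\alpha}$, renamed to avoid a clash with the target level $\alpha$) and the linear test problem are hypothetical choices made here.

```python
import random


def polyak_ruppert(noisy_measurement, alpha, theta0, exponent=0.7, n_steps=20_000):
    """Run theta_{n+1} = theta_n + a_n * (alpha - N(theta_n)) with a_n = n**(-exponent).

    Returns the last iterate and the average of theta_0, ..., theta_{n_steps-1}.
    """
    theta = theta0
    average = 0.0
    for n in range(1, n_steps + 1):
        # theta holds theta_{n-1}; fold it into the running mean of n iterates.
        average += (theta - average) / n
        step = n ** (-exponent)  # 0 < exponent < 1 gives the "longer steps"
        theta = theta + step * (alpha - noisy_measurement(theta))
    return theta, average


# Hypothetical test problem: M(theta) = 2 * theta with zero-mean noise, root at 1.5.
rng = random.Random(0)


def noisy_linear(theta):
    return 2.0 * theta + rng.uniform(-1.0, 1.0)


last_iterate, averaged = polyak_ruppert(noisy_linear, alpha=3.0, theta0=0.0)
print(last_iterate, averaged)
```

The individual iterates fluctuate more than they would with $a_{n}=a/n$, and it is the averaged sequence that is used as the estimate.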

A more general result is given in Chapter 11 of Kushner and Yin^[17] by defining the interpolated time $t_{n}=\sum_{i=0}^{n-1}a_{i}$, the interpolated process $\theta^{n}(\cdot)$ and the interpolated normalized process $U^{n}(\cdot)$ as

$$\theta^{n}(t)=\theta_{n+i},\quad U^{n}(t)=(\theta_{n+i}-\theta^{*})/{\sqrt{a_{n+i}}}\quad {\mbox{for}}\quad t\in [t_{n+i}-t_{n},\,t_{n+i+1}-t_{n}),\ i\geq 0$$

Let the iterate average be $\Theta_{n}={\frac{a_{n}}{t}}\sum_{i=n}^{n+t/a_{n}-1}\theta_{i}$ and the associated normalized error be ${\hat{U}}^{n}(t)={\frac{\sqrt{a_{n}}}{t}}\sum_{i=n}^{n+t/a_{n}-1}(\theta_{i}-\theta^{*})$.

Suppose that assumption A1) and the following assumption A2) are satisfied:

A2) There is a Hurwitz matrix $A$ and a symmetric and positive-definite matrix $\Sigma$ such that $\{U^{n}(\cdot)\}$ converges weakly to $U(\cdot)$, where $U(\cdot)$ is the stationary solution to

$$dU=AU\,dt+\Sigma^{1/2}\,dw,$$

where $w(\cdot)$ is a standard Wiener process.

Define ${\bar{V}}=(A^{-1})'\Sigma (A')^{-1}$. Then for each $t$,

$${\hat{U}}^{n}(t){\stackrel{\mathcal{D}}{\longrightarrow}}{\mathcal{N}}(0,V_{t}),\quad {\text{where}}\quad V_{t}={\bar{V}}/t+O(1/t^{2}).$$

The success of the averaging idea is due to the time scale separation between the original sequence $\{\theta_{n}\}$ and the averaged sequence $\{\Theta_{n}\}$, with the time scale of the former being faster.

Application in stochastic optimization

Suppose we want to solve the following stochastic optimization problem:

$$g(\theta^{*})=\min_{\theta \in \Theta}\operatorname{E}[Q(\theta,X)],$$

where $g(\theta)=\operatorname{E}[Q(\theta,X)]$ is differentiable and convex. This problem is then equivalent to finding the root $\theta^{*}$ of $\nabla g(\theta)=0$.

Here $Q(\theta,X)$ can be interpreted as some "observed" cost as a function of the chosen $\theta$ and random effects $X$. In practice, it might be hard to obtain an analytical form of $\nabla g(\theta)$. The Robbins–Monro method manages to generate a sequence $(\theta_{n})_{n\geq 0}$ that approximates $\theta^{*}$ if one can generate $(X_{n})_{n\geq 0}$ such that the conditional expectation of $H(\theta,X_{n})$ given $\theta_{n}$ is exactly $\nabla g(\theta_{n})$, i.e. $X_{n}$ is simulated from a conditional distribution defined by

$$\operatorname{E}[H(\theta,X)\mid \theta =\theta_{n}]=\nabla g(\theta_{n}).$$

Here $H(\theta,X)$ is an unbiased estimator of $\nabla g(\theta)$. If $X$ depends on $\theta$, there is in general no natural way of generating a random outcome $H(\theta,X)$ that is an unbiased estimator of the gradient. In some special cases, when either IPA or likelihood ratio methods are applicable, one is able to obtain an unbiased gradient estimator $H(\theta,X)$. If $X$ is viewed as some "fundamental" underlying random process that is generated independently of $\theta$, then, under some regularity conditions for interchanging differentiation and expectation so that $\operatorname{E}{\Big[}{\frac{\partial}{\partial \theta}}Q(\theta,X){\Big]}=\nabla g(\theta)$, the choice $H(\theta,X)={\frac{\partial}{\partial \theta}}Q(\theta,X)$ gives the fundamental unbiased gradient estimate. However, for some applications we have to use finite-difference methods, in which $H(\theta,X)$ has a conditional expectation close to $\nabla g(\theta)$ but not exactly equal to it.

We then define a recursion analogous to Newton's method in the deterministic setting:

$$\theta_{n+1}=\theta_{n}-\varepsilon_{n}H(\theta_{n},X_{n+1}).$$

Convergence of the algorithm

The following result gives sufficient conditions on $\theta_{n}$ for the algorithm to converge:^[18]

C1) $\varepsilon_{n}\geq 0,\ \forall \;n\geq 0.$

C2) $\sum_{n=0}^{\infty}\varepsilon_{n}=\infty$

C3) $\sum_{n=0}^{\infty}\varepsilon_{n}^{2}<\infty$

C4) $|X_{n}|\leq B$, for a fixed bound $B$.

C5) $g(\theta)$ is strictly convex, i.e.

$$\inf_{\delta \leq |\theta -\theta^{*}|\leq 1/\delta}\langle \theta -\theta^{*},\nabla g(\theta)\rangle >0,\quad {\text{for every }}0<\delta <1.$$

Then $\theta_{n}$ converges to $\theta^{*}$ almost surely.

Here are some intuitive explanations of these conditions. Suppose $H(\theta_{n},X_{n+1})$ is a uniformly bounded random variable. If C2) is not satisfied, i.e. $\sum_{n=0}^{\infty}\varepsilon_{n}<\infty$, then

$$\theta_{n}-\theta_{0}=-\sum_{i=0}^{n-1}\varepsilon_{i}H(\theta_{i},X_{i+1})$$

is a bounded sequence, so the iteration cannot converge to $\theta^{*}$ if the initial guess $\theta_{0}$ is too far away from $\theta^{*}$. As for C3), note that if $\theta_{n}$ converges to $\theta^{*}$ then

$$\theta_{n+1}-\theta_{n}=-\varepsilon_{n}H(\theta_{n},X_{n+1})\rightarrow 0,\quad {\text{as }}n\rightarrow \infty,$$

so we must have $\varepsilon_{n}\downarrow 0$, and condition C3) ensures this. A natural choice would be $\varepsilon_{n}=1/n$. Condition C5) is a fairly stringent condition on the shape of $g(\theta)$; it gives the search direction of the algorithm.
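The role of C2) can be illustrated with a small noise-free sketch. Everything named here (`bounded_gradient`, `run`, the starting point 10) is a hypothetical choice for illustration: the gradient is that of a convex function with minimiser 0, clipped to $[-1,1]$ so that $H$ is uniformly bounded.

```python
import math


def bounded_gradient(theta):
    # Gradient of a hypothetical convex g with minimiser 0, clipped to [-1, 1].
    return max(-1.0, min(1.0, theta))


def run(step_size, theta0=10.0, n_steps=100_000):
    """Noise-free recursion theta_{n+1} = theta_n - eps_n * H(theta_n)."""
    theta = theta0
    for n in range(1, n_steps + 1):
        theta -= step_size(n) * bounded_gradient(theta)
    return theta


# eps_n = 1/n**2 is summable (the sum is pi**2 / 6), so C2) fails and the
# iterate can travel at most pi**2 / 6 away from theta_0 = 10.
stuck = run(lambda n: 1.0 / n**2)

# eps_n = 1/n satisfies both C2) and C3).
moving = run(lambda n: 1.0 / n)

print(stuck, 10.0 - math.pi**2 / 6)
print(moving)
```

With the summable step sizes the iterate stalls far from the minimiser, while with $\varepsilon_{n}=1/n$ it keeps approaching 0.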

Example (where the stochastic gradient method is appropriate)^[8]

Suppose $Q(\theta,X)=f(\theta)+\theta^{T}X$, where $f$ is differentiable and $X\in \mathbb{R}^{p}$ is a random vector. Then $g(\theta)=\operatorname{E}[Q(\theta,X)]=f(\theta)+\theta^{T}\operatorname{E}X$ depends on the mean of $X$, and the stochastic gradient method is appropriate for this problem. We can choose $H(\theta,X)={\frac{\partial}{\partial \theta}}Q(\theta,X)={\frac{\partial}{\partial \theta}}f(\theta)+X.$
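A sketch of this example under one concrete, hypothetical choice, $f(\theta)={\tfrac{1}{2}}\|\theta\|^{2}$, follows. With that $f$ the minimiser is $\theta^{*}=-\operatorname{E}X$; the distribution of $X$ and all function names are assumptions made for illustration.

```python
import random


def grad_f(theta):
    # Hypothetical f(theta) = 0.5 * ||theta||^2, so grad f(theta) = theta.
    return list(theta)


def H(theta, x):
    # H(theta, X) = d/dtheta Q(theta, X) = grad f(theta) + X
    return [g + xi for g, xi in zip(grad_f(theta), x)]


def stochastic_gradient(sample_x, theta0, n_steps=50_000):
    """theta_{n+1} = theta_n - eps_n * H(theta_n, X_{n+1}) with eps_n = 1/n."""
    theta = list(theta0)
    for n in range(1, n_steps + 1):
        eps = 1.0 / n
        h = H(theta, sample_x())
        theta = [t - eps * hi for t, hi in zip(theta, h)]
    return theta


rng = random.Random(1)
mean_x = [1.0, -2.0]  # E[X]; the algorithm only ever sees samples of X


def sample_x():
    # Bounded noise around the mean, in the spirit of condition C4).
    return [m + rng.uniform(-1.0, 1.0) for m in mean_x]


theta_hat = stochastic_gradient(sample_x, theta0=[0.0, 0.0])
print(theta_hat)  # should be near -E[X] = [-1.0, 2.0]
```

For this particular $f$ and $\varepsilon_{n}=1/n$, the recursion reduces to a running sample mean of $-X$, which makes the connection between stochastic approximation and averaging explicit.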

Kiefer–Wolfowitz algorithm

The Kiefer–Wolfowitz algorithm was introduced in 1952 by Jacob Wolfowitz and Jack Kiefer,^[19] and was motivated by the publication of the Robbins–Monro algorithm. However, the algorithm was presented as a method which would stochastically estimate the maximum of a function. Let $M(x)$ be a function which has a maximum at the point $\theta$. It is assumed that $M(x)$ is unknown; however, certain observations $N(x)$, where $\operatorname{E}[N(x)]=M(x)$, can be made at any point $x$. The structure of the algorithm follows a gradient-like method, with the iterates being generated as

$$x_{n+1}=x_{n}+a_{n}{\bigg(}{\frac{N(x_{n}+c_{n})-N(x_{n}-c_{n})}{2c_{n}}}{\bigg)}$$

where $N(x_{n}+c_{n})$ and $N(x_{n}-c_{n})$ are independent, and the gradient of $M(x)$ is approximated using finite differences. The sequence $\{c_{n}\}$ specifies the finite difference widths used for the gradient approximation, while the sequence $\{a_{n}\}$ specifies the positive step sizes taken along that direction. Kiefer and Wolfowitz proved that, if $M(x)$ satisfies certain regularity conditions, then $x_{n}$ converges to $\theta$ in probability as $n\to \infty$, and later Blum^[4] in 1954 showed that $x_{n}$ converges to $\theta$ almost surely, provided that:

$\operatorname{Var}(N(x))\leq S<\infty$ for all $x$.
The function $M(x)$ has a unique point of maximum (minimum) and is strongly concave (convex).
- The algorithm was first presented with the requirement that the function $M(\cdot)$ maintains strong global convexity (concavity) over the entire feasible space. Given that this condition is too restrictive to impose over the entire domain, Kiefer and Wolfowitz proposed that it is sufficient to impose the condition over a compact set $C_{0}\subset \mathbb{R}^{d}$ which is known to include the optimal solution.
The function $M(x)$ satisfies the following regularity conditions:
- There exist $\beta >0$ and $B>0$ such that $|x'-\theta |+|x''-\theta |<\beta \quad \Longrightarrow \quad |M(x')-M(x'')|<B|x'-x''|$
- There exist $\rho >0$ and $R>0$ such that $|x'-x''|<\rho \quad \Longrightarrow \quad |M(x')-M(x'')|<R$
- For every $\delta >0$, there exists some $\pi (\delta)>0$ such that $|z-\theta |>\delta \quad \Longrightarrow \quad \inf_{\delta /2>\varepsilon >0}{\frac{|M(z+\varepsilon)-M(z-\varepsilon)|}{\varepsilon}}>\pi (\delta)$
The selected sequences $\{a_{n}\}$ and $\{c_{n}\}$ must be infinite sequences of positive numbers such that
- $c_{n}\rightarrow 0\quad {\text{as}}\quad n\to \infty$
- $\sum_{n=0}^{\infty}a_{n}=\infty$
- $\sum_{n=0}^{\infty}a_{n}c_{n}<\infty$
- $\sum_{n=0}^{\infty}a_{n}^{2}c_{n}^{-2}<\infty$

A suitable choice of sequences, as recommended by Kiefer and Wolfowitz, would be $a_{n}=1/n$ and $c_{n}=n^{-1/3}$.
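With $a_{n}=1/n$ and $c_{n}=n^{-1/3}$, both $a_{n}c_{n}$ and $a_{n}^{2}c_{n}^{-2}$ equal $n^{-4/3}$, which is summable, while $\sum a_{n}$ diverges. The one-dimensional iteration with these sequences might be sketched as follows; the function names and the noisy parabola used as a test problem are hypothetical.

```python
import random


def kiefer_wolfowitz(noisy_observation, x0, n_steps=20_000):
    """Iterate x_{n+1} = x_n + a_n * (N(x_n + c_n) - N(x_n - c_n)) / (2 c_n)."""
    x = x0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n
        c_n = n ** (-1.0 / 3.0)
        # Two independent noisy observations give a finite-difference slope.
        slope = (noisy_observation(x + c_n) - noisy_observation(x - c_n)) / (2.0 * c_n)
        x = x + a_n * slope  # ascent step: the algorithm seeks a maximum
    return x


rng = random.Random(2)


def noisy_parabola(x):
    # Hypothetical M(x) = -(x - 2)^2 with maximum at theta = 2, plus noise.
    return -((x - 2.0) ** 2) + rng.gauss(0.0, 0.1)


x_hat = kiefer_wolfowitz(noisy_parabola, x0=0.0)
print(x_hat)
```

Each iteration costs two noisy function evaluations, and no derivative of $M$ is ever computed.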

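Subsequent developments and important issues

The Kiefer–Wolfowitz algorithm requires that, for each gradient computation, at least $d+1$ different parameter values be simulated in every iteration of the algorithm, where $d$ is the dimension of the search space. This means that when $d$ is large, the Kiefer–Wolfowitz algorithm requires substantial computational effort per iteration, leading to slow convergence.
- To address this problem, Spall proposed the use of simultaneous perturbations to estimate the gradient. This method requires only two simulations per iteration, regardless of the dimension $d$.^[20]
Among the conditions required for convergence, a predetermined compact set that fulfills strong convexity (or concavity) and contains the unique solution can be difficult to specify. In real-world applications, if the domain is quite large, these assumptions can be fairly restrictive and highly unrealistic.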
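The simultaneous perturbation idea can be sketched as follows. This is an illustrative sketch rather than Spall's reference implementation: the symmetric $\pm 1$ perturbations are the standard textbook choice, while the gain offset `n + 10`, the quadratic test loss, and all names are assumptions made here. The sketch minimises a loss, so it steps against the estimated gradient.

```python
import random


def spsa_gradient(noisy_loss, theta, c, rng):
    """Gradient estimate from two loss evaluations, whatever the dimension."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # random +/-1 directions
    plus = noisy_loss([t + c * d for t, d in zip(theta, delta)])
    minus = noisy_loss([t - c * d for t, d in zip(theta, delta)])
    return [(plus - minus) / (2.0 * c * d) for d in delta]


def spsa_minimize(noisy_loss, theta0, n_steps=20_000, seed=3):
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(1, n_steps + 1):
        a_n = 1.0 / (n + 10)  # offset is an assumption, damping the first steps
        c_n = n ** (-1.0 / 3.0)
        grad = spsa_gradient(noisy_loss, theta, c_n, rng)
        theta = [t - a_n * g for t, g in zip(theta, grad)]
    return theta


target = [1.0, -1.0, 0.5, 2.0, -0.5]
noise_rng = random.Random(4)


def noisy_quadratic(theta):
    # Hypothetical loss: squared distance to `target` plus observation noise.
    return sum((t - s) ** 2 for t, s in zip(theta, target)) + noise_rng.gauss(0.0, 0.1)


theta_hat = spsa_minimize(noisy_quadratic, theta0=[0.0] * 5)
print(theta_hat)
```

All five coordinates are perturbed at once, so each iteration uses two loss evaluations instead of the $d+1$ or more needed by coordinate-wise finite differences.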

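Further developments

An extensive theoretical literature has grown up around these algorithms, concerning conditions for convergence, rates of convergence, multivariate and other generalizations, proper choice of step size, possible noise models, and so on.^[21]^[22] These methods are also applied in control theory, in which case the unknown function which we wish to optimize or find the zero of may vary in time. In this case, the step size $a_{n}$ should not converge to zero but should be chosen so as to track the function.^[21]^{, 2nd ed., chapter 3}

C. Johan Masreliez and R. Douglas Martin were the first to apply stochastic approximation to robust estimation.^[23]

The main tool for analyzing stochastic approximation algorithms (including the Robbins–Monro and the Kiefer–Wolfowitz algorithms) is a theorem by Aryeh Dvoretzky published in the proceedings of the third Berkeley Symposium on Mathematical Statistics and Probability.^[24]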

References

[1] Toulis, Panos; Airoldi, Edoardo (2015). "Scalable estimation strategies based on stochastic approximations: classical results and new insights". Statistics and Computing. 25 (4): 781–795. doi:10.1007/s11222-015-9560-y. PMC 4484776. PMID 26139959.
[2] Le Ny, Jerome. "Introduction to Stochastic Approximation Algorithms" (PDF). Polytechnique Montreal. Teaching Notes. Retrieved 16 November 2016.
[3] Robbins, H.; Monro, S. (1951). "A Stochastic Approximation Method". The Annals of Mathematical Statistics. 22 (3): 400. doi:10.1214/aoms/1177729586.
[4] Blum, Julius R. (1954-06-01). "Approximation Methods which Converge with Probability one". The Annals of Mathematical Statistics. 25 (2): 382–386. doi:10.1214/aoms/1177728794. ISSN 0003-4851.
[5] Sacks, J. (1958). "Asymptotic Distribution of Stochastic Approximation Procedures". The Annals of Mathematical Statistics. 29 (2): 373–405. doi:10.1214/aoms/1177706619. JSTOR 2237335.
[6] Nemirovski, A.; Juditsky, A.; Lan, G.; Shapiro, A. (2009). "Robust Stochastic Approximation Approach to Stochastic Programming". SIAM Journal on Optimization. 19 (4): 1574. doi:10.1137/070704277.
[7] Problem Complexity and Method Efficiency in Optimization, A. Nemirovski and D. Yudin, Wiley-Intersci. Ser. Discrete Math 15, John Wiley, New York (1983).
[8] Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control, J.C. Spall, John Wiley, Hoboken, NJ (2003).
[9] Chung, K. L. (1954-09-01). "On a Stochastic Approximation Method". The Annals of Mathematical Statistics. 25 (3): 463–483. doi:10.1214/aoms/1177728716. ISSN 0003-4851.
[10] Fabian, Vaclav (1968-08-01). "On Asymptotic Normality in Stochastic Approximation". The Annals of Mathematical Statistics. 39 (4): 1327–1332. doi:10.1214/aoms/1177698258. ISSN 0003-4851.
[11] Lai, T. L.; Robbins, Herbert (1979-11-01). "Adaptive Design and Stochastic Approximation". The Annals of Statistics. 7 (6): 1196–1221. doi:10.1214/aos/1176344840. ISSN 0090-5364.
[12] Lai, Tze Leung; Robbins, Herbert (1981-09-01). "Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes". Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 56 (3): 329–360. doi:10.1007/BF00536178. ISSN 0044-3719. S2CID 122109044.
[13] Polyak, B. T. (1990-01-01). "New stochastic approximation type procedures. (In Russian.)". 7 (7).
[14] Ruppert, D. "Efficient estimators from a slowly converging Robbins-Monro process".
[15] Polyak, B. T.; Juditsky, A. B. (1992). "Acceleration of Stochastic Approximation by Averaging". SIAM Journal on Control and Optimization. 30 (4): 838. doi:10.1137/0330046.
[16] On Cezari's convergence of the steepest descent method for approximating saddle points of convex-concave functions, A. Nemirovski and D. Yudin, Dokl. Akad. Nauk SSR 2939 (1978, Russian), Soviet Math. Dokl. 19 (1978, English).
[17] Kushner, Harold; George Yin, G. (2003-07-17). Stochastic Approximation and Recursive Algorithms and Applications. Springer. ISBN 9780387008943. Retrieved 2016-05-16.
[18] Bouleau, N.; Lepingle, D. (1994). Numerical Methods for stochastic Processes. New York: John Wiley. ISBN 9780471546412.
[19] Kiefer, J.; Wolfowitz, J. (1952). "Stochastic Estimation of the Maximum of a Regression Function". The Annals of Mathematical Statistics. 23 (3): 462. doi:10.1214/aoms/1177729392.
[20] Spall, J. C. (2000). "Adaptive stochastic approximation by the simultaneous perturbation method". IEEE Transactions on Automatic Control. 45 (10): 1839–1853. doi:10.1109/TAC.2000.880982.
[21] Kushner, H. J.; Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications. doi:10.1007/978-1-4899-2696-8. ISBN 978-1-4899-2698-2.
[22] Stochastic Approximation and Recursive Estimation, Mikhail Borisovich Nevel'son and Rafail Zalmanovich Has'minskiĭ, translated by Israel Program for Scientific Translations and B. Silver, Providence, RI: American Mathematical Society, 1973, 1976. ISBN 0-8218-1597-0.
[23] Martin, R.; Masreliez, C. (1975). "Robust estimation via stochastic approximation". IEEE Transactions on Information Theory. 21 (3): 263. doi:10.1109/TIT.1975.1055386.
[24] Dvoretzky, Aryeh (1956-01-01). "On Stochastic Approximation". The Regents of the University of California.


