James–Stein estimator

The James–Stein estimator is a biased estimator of the mean, ${\boldsymbol {\theta }}$, of (possibly) correlated Gaussian distributed random vectors $Y=\{Y_{1},Y_{2},...,Y_{m}\}$ with unknown means $\{{\boldsymbol {\theta }}_{1},{\boldsymbol {\theta }}_{2},...,{\boldsymbol {\theta }}_{m}\}$.

It arose in two main published papers. The earlier version of the estimator was developed by Charles Stein in 1956,^[1] who reached the relatively shocking conclusion that the then-usual estimate of the mean, the sample mean, which estimates each ${\boldsymbol {\theta }}_{i}$ by the corresponding observation $Y_{i}$, is admissible when $m\leq 2$ but inadmissible when $m\geq 3$. Stein proposed a possible improvement that shrinks the sample means toward a more central mean vector ${\boldsymbol {\nu }}$ (which can be chosen a priori or, commonly, as the "average of averages" of the sample means when all samples share the same size). This result is commonly referred to as Stein's example or paradox. It was improved in 1961 by Willard James and Charles Stein, who simplified the original process.^[2]

It can be shown that the James–Stein estimator dominates the "ordinary" least squares approach, meaning that its mean squared error is lower than or equal to that of the "ordinary" least squares estimator.

Setting

Let ${\mathbf {Y} }\sim N_{m}({\boldsymbol {\theta }},\sigma ^{2}I)$, where the vector ${\boldsymbol {\theta }}$ is the unknown mean of ${\mathbf {Y} }$, which is $m$-variate normally distributed with known covariance matrix $\sigma ^{2}I$.

We are interested in obtaining an estimate, ${\widehat {\boldsymbol {\theta }}}$, of ${\boldsymbol {\theta }}$, based on a single observation, ${\mathbf {y} }$, of ${\mathbf {Y} }$.

In real-world applications, this is a common situation: a set of parameters is sampled, and the samples are corrupted by independent Gaussian noise. Since this noise has zero mean, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the least squares estimator, ${\widehat {\boldsymbol {\theta }}}_{LS}={\mathbf {y} }$.

Stein demonstrated that in terms of mean squared error $\operatorname {E} \left[\left\|{\boldsymbol {\theta }}-{\widehat {\boldsymbol {\theta }}}\right\|^{2}\right]$, the least squares estimator, ${\widehat {\boldsymbol {\theta }}}_{LS}$, is sub-optimal compared with shrinkage-based estimators such as the James–Stein estimator, ${\widehat {\boldsymbol {\theta }}}_{JS}$.^[1] The paradoxical result, that there is a (possibly) better and never any worse estimate of ${\boldsymbol {\theta }}$ in mean squared error than the sample mean, became known as Stein's example.

The James–Stein estimator

Figure: MSE (R) of the least squares estimator (ML) vs. the James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero.

If $\sigma ^{2}$ is known, the James–Stein estimator is given by

$${\widehat {\boldsymbol {\theta }}}_{JS}=\left(1-{\frac {(m-2)\sigma ^{2}}{\|{\mathbf {y} }\|^{2}}}\right){\mathbf {y} }.$$

James and Stein showed that the above estimator dominates ${\widehat {\boldsymbol {\theta }}}_{LS}$ for any $m\geq 3$, meaning that the James–Stein estimator always achieves a lower mean squared error (MSE) than the maximum likelihood estimator, which in this setting coincides with the least squares estimator.^[2]^[3] By definition, this makes the least squares estimator inadmissible when $m\geq 3$.
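The dominance claim can be checked numerically. The following is a minimal Monte Carlo sketch, not a proof: it assumes the basic estimator that shrinks toward the origin with multiplier $1-(m-2)\sigma ^{2}/\|{\mathbf {y} }\|^{2}$, and the true mean vector `theta`, the trial count, and the seed are hypothetical choices made only for illustration.

```python
import random

def james_stein(y, sigma2):
    """Basic James-Stein estimate shrinking y toward the origin (needs len(y) >= 3)."""
    m = len(y)
    norm2 = sum(v * v for v in y)
    factor = 1.0 - (m - 2) * sigma2 / norm2
    return [factor * v for v in y]

def squared_error(estimate, theta):
    """Total squared error summed over all components."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))

def compare_mse(theta, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the total MSE of LS (y itself) and of James-Stein."""
    rng = random.Random(seed)
    ls_total = js_total = 0.0
    for _ in range(trials):
        # One noisy observation of the whole mean vector.
        y = [rng.gauss(t, sigma) for t in theta]
        ls_total += squared_error(y, theta)
        js_total += squared_error(james_stein(y, sigma ** 2), theta)
    return ls_total / trials, js_total / trials

theta = [1.0, -0.5, 2.0, 0.0, 0.5]  # hypothetical true means, m = 5
mse_ls, mse_js = compare_mse(theta)
print(f"LS MSE ~ {mse_ls:.3f}")  # close to m * sigma^2 = 5
print(f"JS MSE ~ {mse_js:.3f}")  # smaller than the LS value
```

The LS figure should land near $m\sigma ^{2}$, while the James–Stein figure is smaller; how much smaller depends on how far `theta` lies from the origin.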

Notice that if $(m-2)\sigma ^{2}<\|{\mathbf {y} }\|^{2}$ then this estimator simply takes the natural estimator $\mathbf {y}$ and shrinks it towards the origin $\mathbf {0}$. In fact this is not the only direction of shrinkage that works. Let ${\boldsymbol {\nu }}$ be an arbitrary fixed vector of dimension $m$. Then there exists an estimator of the James–Stein type that shrinks toward ${\boldsymbol {\nu }}$, namely

$${\widehat {\boldsymbol {\theta }}}_{JS}=\left(1-{\frac {(m-2)\sigma ^{2}}{\|{\mathbf {y} }-{\boldsymbol {\nu }}\|^{2}}}\right)({\mathbf {y} }-{\boldsymbol {\nu }})+{\boldsymbol {\nu }},\qquad m\geq 3.$$

The James–Stein estimator dominates the usual estimator for any ${\boldsymbol {\nu }}$. A natural question is whether the improvement over the usual estimator is independent of the choice of ${\boldsymbol {\nu }}$. The answer is no: the improvement is small if $\|{{\boldsymbol {\theta }}-{\boldsymbol {\nu }}}\|$ is large. Thus, to get a very great improvement, some knowledge of the location of ${\boldsymbol {\theta }}$ is necessary. Of course this is the quantity we are trying to estimate, so we do not have this knowledge a priori. But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective, as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that any finite guess ${\boldsymbol {\nu }}$ improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite ${\boldsymbol {\nu }}$, surely a poor guess.
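The dependence on ${\boldsymbol {\nu }}$ can be made explicit with the standard risk identity for the James–Stein estimator. It is stated here without proof as a supplementary sketch, since it is not derived in the text above, and it assumes $m\geq 3$ and ${\mathbf {Y} }\sim N_{m}({\boldsymbol {\theta }},\sigma ^{2}I)$:

$$
\begin{aligned}
R({\boldsymbol {\theta }},{\widehat {\boldsymbol {\theta }}}_{JS})
&=\operatorname {E} \left[\|{\widehat {\boldsymbol {\theta }}}_{JS}-{\boldsymbol {\theta }}\|^{2}\right]
=m\sigma ^{2}-(m-2)^{2}\sigma ^{4}\,\operatorname {E} \left[{\frac {1}{\|{\mathbf {Y} }-{\boldsymbol {\nu }}\|^{2}}}\right],\\
{\boldsymbol {\theta }}={\boldsymbol {\nu }}:\quad
&\operatorname {E} \left[{\frac {1}{\|{\mathbf {Y} }-{\boldsymbol {\nu }}\|^{2}}}\right]={\frac {1}{(m-2)\sigma ^{2}}}
\;\Longrightarrow \;R=m\sigma ^{2}-(m-2)\sigma ^{2}=2\sigma ^{2},\\
\|{\boldsymbol {\theta }}-{\boldsymbol {\nu }}\|\to \infty :\quad
&\operatorname {E} \left[{\frac {1}{\|{\mathbf {Y} }-{\boldsymbol {\nu }}\|^{2}}}\right]\to 0
\;\Longrightarrow \;R\to m\sigma ^{2}.
\end{aligned}
$$

The middle line uses the fact that $\|{\mathbf {Y} }-{\boldsymbol {\nu }}\|^{2}/\sigma ^{2}$ is chi-squared with $m$ degrees of freedom when ${\boldsymbol {\theta }}={\boldsymbol {\nu }}$, whose reciprocal has mean $1/(m-2)$. The risk therefore ranges from $2\sigma ^{2}$ for a perfect guess up toward, but never reaching, the least squares risk $m\sigma ^{2}$ for a guess that is far off.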

Interpretation

Seeing the James–Stein estimator as an empirical Bayes method gives some intuition for this result: one assumes that ${\boldsymbol {\theta }}$ itself is a random variable with prior distribution $\sim N(0,A)$, where $A$ is estimated from the data itself. Estimating $A$ only gives an advantage over the maximum-likelihood estimator when the dimension $m$ is large enough; hence it does not work for $m\leq 2$. The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.^[4]
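The empirical Bayes reading can be sketched in a few lines. This is a supplementary derivation under the assumption that the prior is ${\boldsymbol {\theta }}\sim N_{m}(0,A\,I)$, with $A$ a common prior variance for all components, and that ${\mathbf {y} }\mid {\boldsymbol {\theta }}\sim N_{m}({\boldsymbol {\theta }},\sigma ^{2}I)$:

$$
\begin{aligned}
\operatorname {E} [{\boldsymbol {\theta }}\mid {\mathbf {y} }]&=\left(1-{\frac {\sigma ^{2}}{A+\sigma ^{2}}}\right){\mathbf {y} }
&&{\text{(posterior mean for known }}A{\text{)}},\\
{\mathbf {y} }&\sim N_{m}\left(0,(A+\sigma ^{2})I\right)
&&{\text{(marginal distribution of the data)}},\\
\operatorname {E} \left[{\frac {m-2}{\|{\mathbf {y} }\|^{2}}}\right]&={\frac {1}{A+\sigma ^{2}}}
&&{\text{(since }}\|{\mathbf {y} }\|^{2}/(A+\sigma ^{2})\sim \chi _{m}^{2},\ m\geq 3{\text{)}}.
\end{aligned}
$$

Replacing the unknown $1/(A+\sigma ^{2})$ in the posterior mean by its unbiased estimate $(m-2)/\|{\mathbf {y} }\|^{2}$ gives $\left(1-(m-2)\sigma ^{2}/\|{\mathbf {y} }\|^{2}\right){\mathbf {y} }$, which is the James–Stein form. The expectation in the last line is finite only for $m\geq 3$, which matches the dimension condition.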

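A consequence of the above discussion is the following counterintuitive result: when three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator, whereas when each parameter is estimated separately, the least squares (LS) estimator is admissible. A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the total MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, no single component dominates the corresponding component of the LS estimator.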
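The gap between total and per-component behaviour can be seen in a small simulation. The sketch below assumes the basic estimator shrinking toward the origin with $\sigma ^{2}=1$; the mean vector (one large component and nine zeros), the trial count, and the seed are hypothetical values chosen to make the effect visible.

```python
import random

def james_stein(y, sigma2=1.0):
    """Basic James-Stein estimate shrinking y toward the origin."""
    m = len(y)
    norm2 = sum(v * v for v in y)
    factor = 1.0 - (m - 2) * sigma2 / norm2
    return [factor * v for v in y]

def componentwise_mse(theta, sigma=1.0, trials=20000, seed=1):
    """Monte Carlo MSE of each component, for LS (y itself) and James-Stein."""
    rng = random.Random(seed)
    m = len(theta)
    ls = [0.0] * m
    js = [0.0] * m
    for _ in range(trials):
        y = [rng.gauss(t, sigma) for t in theta]
        est = james_stein(y, sigma ** 2)
        for i in range(m):
            ls[i] += (y[i] - theta[i]) ** 2
            js[i] += (est[i] - theta[i]) ** 2
    return [v / trials for v in ls], [v / trials for v in js]

theta = [4.0] + [0.0] * 9  # hypothetical: one large mean, nine means at zero
ls_mse, js_mse = componentwise_mse(theta)
print("total LS:", round(sum(ls_mse), 3), " total JS:", round(sum(js_mse), 3))
print("component 0  LS:", round(ls_mse[0], 3), " JS:", round(js_mse[0], 3))
```

In this configuration the total MSE drops, but the first component, whose true mean is far from the shrinkage target, is estimated worse than by LS; the other nine components account for the overall gain.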

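The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a telecommunication setting, it is reasonable to combine channel tap measurements in a channel estimation scenario, as the goal is to minimize the total channel estimation error. Conversely, there could be objections to combining channel estimates of different users, since no user would want their channel estimate to deteriorate in order to improve the average network performance.^{[citation needed]}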

The James–Stein estimator has also found use in fundamental quantum theory, where it has been used to improve the theoretical bounds of the entropic uncertainty principle for three or more measurements.^[5]

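An intuitive derivation and interpretation is given by the Galtonian perspective.^[6] Under this interpretation, we aim to predict the population means using the imperfectly measured sample means. The equation of the OLS estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when the OLS intercept is forced to equal 0) or the Efron–Morris estimator (when the intercept is allowed to vary).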

Improvements

Despite the intuition that the James–Stein estimator shrinks the maximum-likelihood estimate ${\mathbf {y} }$ toward ${\boldsymbol {\nu }}$, the estimate actually moves away from ${\boldsymbol {\nu }}$ for small values of $\|{\mathbf {y} }-{\boldsymbol {\nu }}\|$, as the multiplier on ${\mathbf {y} }-{\boldsymbol {\nu }}$ is then negative. This can be easily remedied by replacing this multiplier by zero when it is negative. The resulting estimator is called the positive-part James–Stein estimator and is given by

$${\widehat {\boldsymbol {\theta }}}_{JS+}=\left(1-{\frac {(m-2)\sigma ^{2}}{\|{\mathbf {y} }-{\boldsymbol {\nu }}\|^{2}}}\right)^{+}({\mathbf {y} }-{\boldsymbol {\nu }})+{\boldsymbol {\nu }},\qquad m\geq 3.$$

This estimator has a smaller risk than the basic James–Stein estimator. It follows that the basic James–Stein estimator is itself inadmissible.^[7]
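The effect of clipping the multiplier can be illustrated as follows. This is a sketch under stated assumptions: the multiplier is taken to be $1-(m-2)\sigma ^{2}/\|{\mathbf {y} }-{\boldsymbol {\nu }}\|^{2}$, as in the estimator that shrinks toward a fixed ${\boldsymbol {\nu }}$, and the vectors `nu`, `y_near`, the true mean used in `risks`, and the seed are hypothetical.

```python
import random

def shrink_factor(y, nu, sigma2):
    """Multiplier 1 - (m - 2) * sigma^2 / ||y - nu||^2 applied to (y - nu)."""
    m = len(y)
    dist2 = sum((a - b) ** 2 for a, b in zip(y, nu))
    return 1.0 - (m - 2) * sigma2 / dist2

def js(y, nu, sigma2):
    """Basic James-Stein estimate shrinking y toward nu."""
    c = shrink_factor(y, nu, sigma2)
    return [b + c * (a - b) for a, b in zip(y, nu)]

def js_plus(y, nu, sigma2):
    """Positive-part variant: a negative multiplier is replaced by zero."""
    c = max(0.0, shrink_factor(y, nu, sigma2))
    return [b + c * (a - b) for a, b in zip(y, nu)]

def risks(theta, nu, sigma=1.0, trials=20000, seed=2):
    """Monte Carlo total MSE of the basic and the positive-part estimator."""
    rng = random.Random(seed)
    total_js = total_plus = 0.0
    for _ in range(trials):
        y = [rng.gauss(t, sigma) for t in theta]
        est_js = js(y, nu, sigma ** 2)
        est_plus = js_plus(y, nu, sigma ** 2)
        total_js += sum((e - t) ** 2 for e, t in zip(est_js, theta))
        total_plus += sum((e - t) ** 2 for e, t in zip(est_plus, theta))
    return total_js / trials, total_plus / trials

nu = [0.0, 0.0, 0.0, 0.0]
y_near = [0.5, 0.2, -0.3, 0.1]   # ||y - nu||^2 = 0.39 is below (m - 2) * sigma^2 = 2
print(js(y_near, nu, 1.0))       # negative multiplier: the estimate overshoots past nu
print(js_plus(y_near, nu, 1.0))  # multiplier clipped to zero: the estimate equals nu

risk_js, risk_plus = risks([0.5, 0.5, 0.5, 0.5], nu)
print(round(risk_js, 3), round(risk_plus, 3))
```

For the observation close to `nu`, the basic estimator flips the sign of every component, while the positive-part version returns `nu` itself. The simulated risks show the positive-part estimator doing better on average for this choice of true mean.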

It turns out, however, that the positive-part estimator is also inadmissible.^[3] This follows from a general result which requires admissible estimators to be smooth.

Extensions

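The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect: the "ordinary" or least squares estimator is often inadmissible for simultaneous estimation of several parameters.^{[citation needed]} This effect has been called Stein's phenomenon, and has been demonstrated for several different problem settings, some of which are briefly outlined below.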

James and Stein demonstrated that the estimator presented above can still be used when the variance $\sigma ^{2}$ is unknown, by replacing it with the standard estimator of the variance, ${\widehat {\sigma }}^{2}={\frac {1}{n}}\sum (y_{i}-{\overline {y}})^{2}$. The dominance result still holds under the same condition, namely $m>2$.^[2]
The results in this article are for the case when only a single observation vector ${\mathbf {y} }$ is available. For the more general case when $n$ vectors are available, the results are similar:^{[citation needed]}

$${\widehat {\boldsymbol {\theta }}}_{JS}=\left(1-{\frac {(m-2){\frac {\sigma ^{2}}{n}}}{\|{\overline {\mathbf {y} }}\|^{2}}}\right){\overline {\mathbf {y} }},$$

where ${\overline {\mathbf {y} }}$ is the $m$-length average of the $n$ observations.
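The $n$-observation form amounts to applying the basic estimator to the sample mean, with $\sigma ^{2}/n$ in place of $\sigma ^{2}$. A minimal sketch follows; the function name `james_stein_from_sample` and the data are hypothetical, and $\sigma ^{2}$ is assumed known.

```python
def james_stein_from_sample(observations, sigma2):
    """James-Stein estimate from n observation vectors of length m (m >= 3).

    Shrinks the sample mean toward the origin, using sigma2 / n as the
    variance of each component of the mean.
    """
    n = len(observations)
    m = len(observations[0])
    # Component-wise average of the n observation vectors.
    y_bar = [sum(obs[i] for obs in observations) / n for i in range(m)]
    norm2 = sum(v * v for v in y_bar)
    factor = 1.0 - (m - 2) * (sigma2 / n) / norm2
    return [factor * v for v in y_bar]

observations = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # hypothetical data: n = 2, m = 3
estimate = james_stein_from_sample(observations, sigma2=4.0)
print(estimate)
```

Here the sample mean is (2, 2, 2), its squared norm is 12, and the multiplier is 1 - (1)(4/2)/12 = 5/6, so each component of the estimate is 5/3.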

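The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.^[8] A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a linear regression technique which outperforms the standard application of the LS estimator.^[8]
Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.^[9] It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.^[3]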

References

[1] Stein, C. (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 197–206, MR 0084922, Zbl 0073.35602
[2] James, W.; Stein, C. (1961), "Estimation with quadratic loss", Proc. Fourth Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 361–379, MR 0133191
[3] Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), New York: Springer
[4] Efron, B.; Morris, C. (1973). "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach". Journal of the American Statistical Association. American Statistical Association. 68 (341): 117–130. doi:10.2307/2284155. JSTOR 2284155.
[5] Stander, M. (2017), Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements, arXiv:1702.02440, Bibcode:2017arXiv170202440S
[6] Stigler, Stephen M. (1990-02-01). "The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators". Statistical Science. 5 (1). doi:10.1214/ss/1177012274. ISSN 0883-4237.
[7] Anderson, T. W. (1984), An Introduction to Multivariate Statistical Analysis (2nd ed.), New York: John Wiley & Sons
[8] Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", Annals of Statistics, 3 (1): 209–218, doi:10.1214/aos/1176343009, MR 0381064, Zbl 0314.62005
[9] Brown, L. D. (1966), "On the admissibility of invariant estimators of one or more location parameters", Annals of Mathematical Statistics, 37 (5): 1087–1136, doi:10.1214/aoms/1177699259, MR 0216647, Zbl 0156.39401

Further reading

Judge, George G.; Bock, M. E. (1978). The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. New York: North Holland. pp. 229–257. ISBN 0-7204-0729-X.
