Maximum a posteriori estimation

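In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity that equals the mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of maximum likelihood (ML) estimation, but employs an augmented optimization objective that incorporates a prior distribution over the quantity one wants to estimate (the prior quantifies the additional information available through prior knowledge of a related event). MAP estimation can therefore be seen as a regularization of maximum likelihood estimation.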

Description

Assume that we want to estimate an unobserved population parameter $\theta$ on the basis of observations $x$. Let $f$ be the sampling distribution of $x$, so that $f(x \mid \theta)$ is the probability of $x$ when the underlying population parameter is $\theta$. Then the function

$$\theta \mapsto f(x \mid \theta)$$

is known as the likelihood function, and the estimate

$$\hat{\theta}_{\mathrm{MLE}}(x) = \underset{\theta}{\operatorname{arg\,max}}\ f(x \mid \theta)$$

is the maximum likelihood estimate of $\theta$.
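As a minimal sketch of the definitions above, the following Python code approximates $\hat{\theta}_{\mathrm{MLE}}$ by a grid search. It assumes, for illustration only, that $x$ consists of $k$ successes in $n$ independent Bernoulli($\theta$) trials; the function names are hypothetical. For this model the maximizer is known to be $k/n$, which the grid search should reproduce.

```python
import math

def bernoulli_log_likelihood(theta, k, n):
    """log f(x | theta) for k successes in n independent Bernoulli(theta) trials
    (the binomial coefficient is omitted because it does not depend on theta)."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def mle_by_grid(k, n, steps=10_000):
    """Approximate argmax_theta f(x | theta) over a grid of interior points of (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    return max(grid, key=lambda t: bernoulli_log_likelihood(t, k, n))

theta_hat = mle_by_grid(k=7, n=10)
print(round(theta_hat, 3))  # close to the closed-form MLE k/n = 0.7
```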

Now assume that a prior distribution $g$ over $\theta$ exists. This allows us to treat $\theta$ as a random variable, as in Bayesian statistics. We can calculate the posterior distribution of $\theta$ using Bayes' theorem:

$$\theta \mapsto f(\theta \mid x) = \frac{f(x \mid \theta)\, g(\theta)}{\displaystyle\int_{\Theta} f(x \mid \vartheta)\, g(\vartheta)\, d\vartheta}$$

where $g$ is the density function of $\theta$ and $\Theta$ is the domain of $g$.
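The posterior can be evaluated numerically by applying Bayes' theorem on a grid over $\Theta = (0, 1)$. The sketch below assumes a hypothetical Bernoulli model ($k$ successes in $n$ trials) together with a Beta($a$, $b$) prior for $g$; both choices are illustrative and not part of the text. The integral in the denominator is approximated with the midpoint rule, and the resulting posterior mean can be checked against the known conjugate result $(k + a)/(n + a + b)$.

```python
def likelihood(theta, k, n):
    # f(x | theta) for k successes in n Bernoulli(theta) trials; the binomial
    # coefficient is omitted because it cancels in the normalization
    return theta ** k * (1.0 - theta) ** (n - k)

def beta_prior(theta, a, b):
    # Unnormalized Beta(a, b) prior density g(theta); its constant also cancels
    return theta ** (a - 1) * (1.0 - theta) ** (b - 1)

def posterior_on_grid(k, n, a, b, steps=20_000):
    """Return grid points, f(theta | x) on the grid, and the grid spacing."""
    d = 1.0 / steps
    grid = [(i + 0.5) * d for i in range(steps)]
    numer = [likelihood(t, k, n) * beta_prior(t, a, b) for t in grid]
    evidence = sum(numer) * d  # midpoint-rule approximation of the denominator
    return grid, [v / evidence for v in numer], d

grid, post, d = posterior_on_grid(k=7, n=10, a=2, b=2)
print(sum(post) * d)                               # ≈ 1.0
print(sum(t * p for t, p in zip(grid, post)) * d)  # ≈ (k + a) / (n + a + b) = 9/14
```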

The method of maximum a posteriori estimation then estimates $\theta$ as the mode of the posterior distribution of this random variable:

$$\begin{aligned}
\hat{\theta}_{\mathrm{MAP}}(x) &= \underset{\theta}{\operatorname{arg\,max}}\ f(\theta \mid x)\\
&= \underset{\theta}{\operatorname{arg\,max}}\ \frac{f(x \mid \theta)\, g(\theta)}{\displaystyle\int_{\Theta} f(x \mid \vartheta)\, g(\vartheta)\, d\vartheta}\\
&= \underset{\theta}{\operatorname{arg\,max}}\ f(x \mid \theta)\, g(\theta)
\end{aligned}$$

The denominator of the posterior distribution (the so-called marginal likelihood) is always positive and does not depend on $\theta$, and therefore plays no role in the optimization. Observe that the MAP estimate of $\theta$ coincides with the ML estimate when the prior $g$ is uniform (that is, when $g$ is a constant function).
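Because the marginal likelihood does not affect the maximizer, a MAP estimate can be found by maximizing $\log f(x \mid \theta) + \log g(\theta)$ directly. This sketch assumes a hypothetical Bernoulli likelihood and a Beta($a$, $b$) prior (illustrative choices, with hypothetical function names). It shows that the uniform prior Beta(1, 1) reproduces the ML estimate $k/n$, while a Beta(2, 2) prior moves the estimate to $(k+1)/(n+2)$.

```python
import math

def log_unnormalized_posterior(theta, k, n, a, b):
    # log f(x | theta) + log g(theta), up to additive constants; the marginal
    # likelihood is omitted because it is a positive constant in theta
    return (k + a - 1) * math.log(theta) + (n - k + b - 1) * math.log(1.0 - theta)

def map_by_grid(k, n, a, b, steps=10_000):
    """Approximate the MAP estimate by maximizing over interior grid points of (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    return max(grid, key=lambda t: log_unnormalized_posterior(t, k, n, a, b))

k, n = 7, 10
uniform_map = map_by_grid(k, n, a=1, b=1)  # Beta(1, 1) is the uniform prior on (0, 1)
beta22_map = map_by_grid(k, n, a=2, b=2)   # a prior that pulls the estimate toward 0.5
print(round(uniform_map, 3), round(beta22_map, 3))
```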

When the loss function is of the form

$$L(\theta, a) = \begin{cases} 0, & \text{if } |a - \theta| < c,\\ 1, & \text{otherwise},\end{cases}$$

then, as $c$ goes to 0, the Bayes estimator approaches the MAP estimator, provided that the distribution of $\theta$ is quasi-concave.[1] In general, however, a MAP estimator is not a Bayes estimator unless $\theta$ is discrete.
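The limit can be illustrated numerically. Under the loss above, the expected posterior loss of an action $a$ is $1 - P(|a - \theta| < c \mid x)$, so the Bayes action maximizes the posterior probability of the interval $(a - c, a + c)$. The sketch assumes a hypothetical log-normal posterior with log-scale parameters `MU` and `SIGMA`, chosen because it is continuous, unimodal (hence quasi-concave) and skewed, and because its CDF can be written with `math.erf`; its mode is `exp(MU - SIGMA**2)`.

```python
import math

MU, SIGMA = 0.0, 0.5  # hypothetical log-normal posterior for theta

def cdf(t):
    """CDF of the LogNormal(MU, SIGMA) posterior; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - MU) / (SIGMA * math.sqrt(2.0))))

def bayes_action(c, steps=20_000, upper=4.0):
    """Minimize the expected loss 1 - P(|a - theta| < c) over a grid of actions a."""
    grid = [upper * (i + 1) / steps for i in range(steps)]
    return max(grid, key=lambda a: cdf(a + c) - cdf(a - c))

mode = math.exp(MU - SIGMA ** 2)  # closed-form mode of the log-normal
for c in (1.0, 0.3, 0.1, 0.01):
    print(c, round(bayes_action(c), 4))
print("posterior mode:", round(mode, 4))
```

As $c$ shrinks, the printed Bayes actions should move toward the posterior mode; for a fixed, non-negligible $c$ they differ from it, in line with the remark that a MAP estimator is generally not a Bayes estimator for continuous $\theta$.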

Computation

MAP estimates can be computed in several ways:

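Analytically, when the mode(s) of the posterior distribution can be given in closed form. This is the case when conjugate priors are used.
Via numerical optimization, such as the conjugate gradient method or Newton's method. This usually requires first or second derivatives, which have to be evaluated analytically or numerically.
Via a modification of an expectation-maximization algorithm. This does not require derivatives of the posterior density.
Via a Monte Carlo method using simulated annealing.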
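The first two approaches can be compared on a case where both apply. Assuming, for illustration, a Bernoulli likelihood with a conjugate Beta prior, the posterior is again a Beta distribution whose mode has a closed form; the sketch below also finds that mode with Newton's method on the log posterior, using its analytic first and second derivatives. The function name is hypothetical, and the step-halving guard that keeps iterates inside $(0, 1)$ is an added safeguard rather than part of the method as described.

```python
def map_newton(a_shape, b_shape, theta0=0.5, tol=1e-12, max_iter=50):
    """Newton's method on l(theta) = (a_shape - 1) log theta + (b_shape - 1) log(1 - theta),
    the log density (up to a constant) of a Beta(a_shape, b_shape) posterior.
    Assumes a_shape > 1 and b_shape > 1 so that the mode lies inside (0, 1)."""
    p, q = a_shape - 1.0, b_shape - 1.0
    theta = theta0
    for _ in range(max_iter):
        grad = p / theta - q / (1.0 - theta)             # l'(theta)
        hess = -p / theta ** 2 - q / (1.0 - theta) ** 2  # l''(theta) < 0, so l is concave
        step = -grad / hess
        # Halve the step until the iterate stays inside the support (0, 1)
        while not 0.0 < theta + step < 1.0:
            step /= 2.0
        theta += step
        if abs(step) < tol:
            break
    return theta

# Beta(2, 2) prior with k = 7 successes in n = 10 trials gives a Beta(9, 5) posterior
k, n, a, b = 7, 10, 2, 2
numeric = map_newton(a + k, b + n - k)
closed_form = (a + k - 1) / (a + b + n - 2)  # conjugate-prior mode in closed form
print(numeric, closed_form)
```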

Limitations

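While only mild conditions are required for MAP estimation to be a limiting case of Bayes estimation (under the 0–1 loss function),[1] it is not very representative of Bayesian methods in general. This is because MAP estimates are point estimates, whereas Bayesian methods are characterized by the use of distributions to summarize data and draw inferences; thus, Bayesian methods tend to report the posterior mean or median instead, together with credible intervals. This is both because these estimators are optimal under squared-error and linear-error loss respectively, which are more representative of typical loss functions, and because for a continuous posterior distribution there is no loss function under which the MAP is the optimal point estimator. In addition, the posterior distribution often does not have a simple analytic form: in this case, the distribution can be simulated using Markov chain Monte Carlo techniques, while optimization to find its mode(s) may be difficult or impossible.[citation needed]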
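To see how far apart these summaries can be, the following sketch computes the mode (the MAP estimate), the mean and the median of a skewed posterior on a grid. The Beta(2, 8) posterior and the function name are hypothetical choices; for this distribution the mode is $1/8$ and the mean is $2/10$, with the median lying in between.

```python
def summarize_posterior(density, steps=100_000):
    """Grid approximations of the mode, mean and median of a density on (0, 1)."""
    d = 1.0 / steps
    grid = [(i + 0.5) * d for i in range(steps)]
    w = [density(t) for t in grid]
    total = sum(w)
    mode = grid[max(range(steps), key=w.__getitem__)]     # MAP: limit of 0-1 loss
    mean = sum(t * wi for t, wi in zip(grid, w)) / total  # optimal under squared error
    acc, median = 0.0, grid[-1]
    for t, wi in zip(grid, w):                            # optimal under absolute error
        acc += wi
        if acc >= total / 2:
            median = t
            break
    return mode, mean, median

# Hypothetical skewed posterior: unnormalized Beta(2, 8) density
mode, mean, median = summarize_posterior(lambda t: t * (1.0 - t) ** 7)
print(round(mode, 3), round(mean, 3), round(median, 3))
```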

Figure: An example of a bimodal density in which the highest mode is uncharacteristic of the majority of the distribution.

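In many types of models, such as mixture models, the posterior may be multimodal. In such a case, the usual recommendation is to choose the highest mode. This is not always feasible (global optimization is a difficult problem), nor in some cases even possible (for example, when identifiability issues arise). Furthermore, the highest mode may be uncharacteristic of the majority of the posterior.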
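The following sketch illustrates both problems on a hypothetical two-component normal mixture: a narrow spike holding 10% of the posterior mass and a broad component holding 90%. The spike is the highest mode, yet little mass lies near it, and a naive hill climb (used here only as a stand-in for a generic local optimizer; all names are hypothetical) ends at a different mode depending on where it starts.

```python
import math

def normal_pdf(t, m, s):
    return math.exp(-0.5 * ((t - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

# Hypothetical bimodal posterior: a narrow, tall spike with 10% of the mass at 0
# and a broad component with 90% of the mass centred at 5
def posterior(t):
    return 0.1 * normal_pdf(t, 0.0, 0.01) + 0.9 * normal_pdf(t, 5.0, 1.0)

def hill_climb(t, step=1e-3, iters=20_000):
    """Naive local ascent: move toward the higher neighbour until neither is higher."""
    for _ in range(iters):
        up, down = posterior(t + step), posterior(t - step)
        if max(up, down) <= posterior(t):
            break
        t = t + step if up > down else t - step
    return t

d = 1e-3
grid = [-5.0 + i * d for i in range(15_001)]  # covers [-5, 10]
global_mode = max(grid, key=posterior)
mass_near_mode = sum(posterior(t) for t in grid if abs(t - global_mode) < 0.5) * d
print(round(global_mode, 3), round(mass_near_mode, 3))  # highest mode, but little mass nearby
print(round(hill_climb(3.0), 2), round(hill_climb(-0.05), 3))  # different starts, different modes
```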

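Finally, unlike ML estimators, the MAP estimate is not invariant under reparameterization. Switching from one parameterization to another involves introducing a Jacobian that affects the location of the maximum.[2]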
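The effect of the Jacobian can be seen with a hypothetical Beta(3, 5) posterior for a probability $\theta$. Re-expressing it in terms of $\varphi = \operatorname{logit}(\theta)$ multiplies the density by $\theta(1 - \theta)$, which moves the mode: in $\theta$ it is $2/6 = 1/3$, but the mode in $\varphi$, mapped back to $\theta$, is $3/8$. A likelihood carries no such Jacobian factor, which is why the ML estimate is unaffected by reparameterization.

```python
import math

A, B = 3.0, 5.0  # hypothetical Beta(A, B) posterior for a probability theta

def density_theta(t):
    # Unnormalized Beta(A, B) density in the theta parameterization
    return t ** (A - 1) * (1.0 - t) ** (B - 1)

def density_phi(phi):
    # The same distribution expressed in phi = logit(theta); the change of variables
    # multiplies by the Jacobian |d theta / d phi| = theta * (1 - theta)
    t = 1.0 / (1.0 + math.exp(-phi))
    return density_theta(t) * t * (1.0 - t)

steps = 100_000
theta_grid = [(i + 0.5) / steps for i in range(steps)]
phi_grid = [-10.0 + 20.0 * i / steps for i in range(steps + 1)]

map_theta = max(theta_grid, key=density_theta)
map_phi = max(phi_grid, key=density_phi)
map_phi_in_theta = 1.0 / (1.0 + math.exp(-map_phi))  # map the phi-mode back to theta

print(round(map_theta, 4), round(map_phi_in_theta, 4))  # about 1/3 versus about 3/8
```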

As an example of the difference between the Bayes estimators mentioned above (the mean and median estimators) and a MAP estimate, consider the case where an input $x$ must be classified as either positive or negative (for example, a loan as risky or safe). Suppose there are just three possible hypotheses about the correct method of classification, $h_1$, $h_2$ and $h_3$, with posterior probabilities 0.4, 0.3 and 0.3 respectively. Suppose further that, for a new instance $x$, $h_1$ classifies it as positive while the other two classify it as negative. Using the MAP estimate for the correct classifier, $h_1$, $x$ is classified as positive, whereas the Bayes estimators would average over all hypotheses and classify $x$ as negative.
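The arithmetic of this example can be written out directly. In the sketch below, the Bayes prediction is taken to be the posterior-weighted vote of the three hypotheses, which is one way of making "average over all hypotheses" concrete; the variable names are illustrative.

```python
# Posterior probabilities and predictions for the new instance x, as in the text
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "positive", "h2": "negative", "h3": "negative"}

# MAP: pick the single most probable hypothesis and use its prediction
map_hypothesis = max(posteriors, key=posteriors.get)
map_label = predictions[map_hypothesis]

# Bayes: weight each hypothesis' vote by its posterior probability
p_positive = sum(p for h, p in posteriors.items() if predictions[h] == "positive")
bayes_label = "positive" if p_positive > 0.5 else "negative"

print(map_hypothesis, map_label)  # h1 positive
print(p_positive, bayes_label)    # 0.4 negative
```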

Example

Suppose that we are given a sequence $(x_1, \dots, x_n)$ of IID $N(\mu, \sigma_v^2)$ random variables and that the prior distribution of $\mu$ is $N(\mu_0, \sigma_m^2)$. We wish to find the MAP estimate of $\mu$. Because the normal distribution is its own conjugate prior, we can find a closed-form solution analytically.

The function to be maximized is then given by

$$g(\mu)\, f(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\left(-\frac{1}{2}\left(\frac{\mu - \mu_0}{\sigma_m}\right)^2\right) \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_v} \exp\left(-\frac{1}{2}\left(\frac{x_j - \mu}{\sigma_v}\right)^2\right)$$

Taking $-2\log$ and dropping the terms that do not depend on $\mu$, this is equivalent to minimizing the following function of $\mu$:

$$\sum_{j=1}^{n} \left(\frac{x_j - \mu}{\sigma_v}\right)^2 + \left(\frac{\mu - \mu_0}{\sigma_m}\right)^2.$$

Setting the derivative with respect to $\mu$ to zero, the MAP estimator for $\mu$ is given by

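$$\hat{\mu}_{\mathrm{MAP}} = \frac{\sigma_m^2\, n}{\sigma_m^2\, n + \sigma_v^2}\left(\frac{1}{n}\sum_{j=1}^{n} x_j\right) + \frac{\sigma_v^2}{\sigma_m^2\, n + \sigma_v^2}\,\mu_0 = \frac{\sigma_m^2\left(\sum_{j=1}^{n} x_j\right) + \sigma_v^2\,\mu_0}{\sigma_m^2\, n + \sigma_v^2},$$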

which turns out to be a linear interpolation between the prior mean $\mu_0$ and the sample mean.
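As a numerical check of the derivation, the sketch below compares the closed-form $\hat{\mu}_{\mathrm{MAP}}$ with a direct golden-section minimization of the function $\sum_j ((x_j - \mu)/\sigma_v)^2 + ((\mu - \mu_0)/\sigma_m)^2$, and with the interpolation between the sample mean and $\mu_0$. The synthetic data, the values of $\mu_0$, $\sigma_v$ and $\sigma_m$, and the function names are arbitrary assumptions.

```python
import random

def mu_map_closed_form(xs, mu0, sigma_v, sigma_m):
    """MAP estimate of mu for IID N(mu, sigma_v^2) data and an N(mu0, sigma_m^2) prior."""
    n = len(xs)
    return (sigma_m ** 2 * sum(xs) + sigma_v ** 2 * mu0) / (sigma_m ** 2 * n + sigma_v ** 2)

def objective(mu, xs, mu0, sigma_v, sigma_m):
    """Data misfit plus prior penalty, i.e. -2 log posterior up to a constant."""
    return sum(((x - mu) / sigma_v) ** 2 for x in xs) + ((mu - mu0) / sigma_m) ** 2

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):  # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:            # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

random.seed(0)
xs = [random.gauss(2.0, 1.5) for _ in range(20)]  # synthetic data with true mu = 2
mu0, sigma_v, sigma_m = 0.0, 1.5, 0.5

closed = mu_map_closed_form(xs, mu0, sigma_v, sigma_m)
numeric = golden_section_min(lambda m: objective(m, xs, mu0, sigma_v, sigma_m), -10.0, 10.0)

# Interpolation weight on the sample mean; (1 - w) goes to the prior mean mu0
n = len(xs)
w = sigma_m ** 2 * n / (sigma_m ** 2 * n + sigma_v ** 2)
print(closed, numeric, w * (sum(xs) / n) + (1 - w) * mu0)
```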

The case $\sigma_m \to \infty$ is called a non-informative prior; in this limit, $\hat{\mu}_{\mathrm{MAP}} \to \hat{\mu}_{\mathrm{ML}}$.
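The limit can be observed by evaluating the closed-form estimate for increasing $\sigma_m$ on a few hypothetical observations, with the prior mean set to $\mu_0 = 0$ as an assumption: for small $\sigma_m$ the estimate stays near $\mu_0$, and as $\sigma_m$ grows it approaches the sample mean, which is the ML estimate.

```python
def mu_map(xs, mu0, sigma_v, sigma_m):
    # Closed-form MAP estimate from the normal-normal example
    n = len(xs)
    return (sigma_m ** 2 * sum(xs) + sigma_v ** 2 * mu0) / (sigma_m ** 2 * n + sigma_v ** 2)

xs = [1.8, 2.4, 1.1, 2.9, 2.3]  # hypothetical observations
mu_ml = sum(xs) / len(xs)       # ML estimate: the sample mean
for sigma_m in (0.1, 1.0, 10.0, 1000.0):
    print(sigma_m, mu_map(xs, mu0=0.0, sigma_v=1.0, sigma_m=sigma_m))
print("ML:", mu_ml)
```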

References

[1] Bassett, Robert; Deride, Julio (2018-01-30). "Maximum a posteriori estimators as a limit of Bayes estimators". Mathematical Programming: 1–16. arXiv:1611.05917. doi:10.1007/s10107-018-1241-0. ISSN 0025-5610.
[2] Murphy, Kevin P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts: MIT Press. pp. 151–152. ISBN 978-0-262-01802-9.

DeGroot, M. (1970). Optimal Statistical Decisions. McGraw-Hill. ISBN 0-07-016242-5.
Sorenson, Harold W. (1980). Parameter Estimation: Principles and Problems. Marcel Dekker. ISBN 0-8247-6987-2.
Hald, Anders (2007). "Gauss's Derivation of the Normal Distribution and the Method of Least Squares, 1809". A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713–1935. New York: Springer. pp. 55–61. ISBN 978-0-387-46409-1.
