Grouped Dirichlet distribution

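In statistics, the grouped Dirichlet distribution (GDD) is a multivariate generalization of the Dirichlet distribution. It was first described by Ng et al. (2008).^[1] The grouped Dirichlet distribution arises in the analysis of categorical data in which some observations are known only to fall into one of a set of 'crisp' categories rather than into a single category. For example, a data set may consist of cases and controls under two different conditions. With complete data, the cross-classification of disease status forms a 2(case/control)-x-2(condition/no condition) table with cell probabilities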

	Treatment	No treatment
Controls	θ₁	θ₂
Cases	θ₃	θ₄

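If, however, the data include non-respondents who are known to be controls or cases but whose treatment status is missing, the cross-classification of disease status forms a 2-x-3 table. The probability in the last column is the sum of the probabilities in the first two columns of the same row, e.g.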

	Treatment	No treatment	Missing
Controls	θ₁	θ₂	θ₁+θ₂
Cases	θ₃	θ₄	θ₃+θ₄

The GDD allows full estimation of the cell probabilities under such aggregation conditions.^[1]
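To make the role of the aggregated column concrete, here is a small worked sketch. The counts $n_{1},\ldots ,n_{4}$ (subjects observed in the four complete cells) and $m_{1},m_{2}$ (controls and cases with missing treatment status) are hypothetical symbols introduced only for illustration, and the sketch assumes the missingness mechanism can be ignored. Under those assumptions the multinomial likelihood is proportional to

L(\theta )\propto \theta _{1}^{n_{1}}\,\theta _{2}^{n_{2}}\,\theta _{3}^{n_{3}}\,\theta _{4}^{n_{4}}\cdot \left(\theta _{1}+\theta _{2}\right)^{m_{1}}\cdot \left(\theta _{3}+\theta _{4}\right)^{m_{2}}

The last two factors are powers of sums of cell probabilities, one per row. An ordinary Dirichlet kernel, which contains only powers of the individual $\theta _{i}$, has no such terms.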

Probability distribution

Consider the closed simplex set ${\mathcal {T}}_{n}=\left\{\left(x_{1},\ldots ,x_{n}\right)\left|x_{i}\geq 0,i=1,\cdots ,n,\sum _{i=1}^{n}x_{i}=1\right.\right\}$ and $\mathbf {x} \in {\mathcal {T}}_{n}$. Writing $\mathbf {x} _{-n}=\left(x_{1},\ldots ,x_{n-1}\right)$ for the first $n-1$ elements of a member of ${\mathcal {T}}_{n}$, the distribution of $\mathbf {x}$ for two partitions, $\left(x_{1},\ldots ,x_{s}\right)$ and $\left(x_{s+1},\ldots ,x_{n}\right)$, has a density function given by

\operatorname {GD} _{n,2,s}\left(\left.\mathbf {x} _{-n}\right|\mathbf {a} ,\mathbf {b} \right)={\frac {\left(\prod _{i=1}^{n}x_{i}^{a_{i}-1}\right)\cdot \left(\sum _{i=1}^{s}x_{i}\right)^{b_{1}}\cdot \left(\sum _{i=s+1}^{n}x_{i}\right)^{b_{2}}}{\operatorname {\mathrm {B} } \left(a_{1},\ldots ,a_{s}\right)\cdot \operatorname {\mathrm {B} } \left(a_{s+1},\ldots ,a_{n}\right)\cdot \operatorname {\mathrm {B} } \left(b_{1}+\sum _{i=1}^{s}a_{i},b_{2}+\sum _{i=s+1}^{n}a_{i}\right)}}

where $\operatorname {\mathrm {B} } \left(\mathbf {a} \right)$ is the multivariate beta function.
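As a consistency check on the two-partition density, consider the special case $b_{1}=b_{2}=0$. This is a sketch derived from the formula itself, not a statement quoted from Ng et al. Write $A_{1}=\sum _{i=1}^{s}a_{i}$ and $A_{2}=\sum _{i=s+1}^{n}a_{i}$ (shorthand introduced here). Expanding each multivariate beta function into gamma functions, the normalizing constant telescopes:

\operatorname {\mathrm {B} } \left(a_{1},\ldots ,a_{s}\right)\cdot \operatorname {\mathrm {B} } \left(a_{s+1},\ldots ,a_{n}\right)\cdot \operatorname {\mathrm {B} } \left(A_{1},A_{2}\right)={\frac {\prod _{i=1}^{s}\Gamma (a_{i})}{\Gamma (A_{1})}}\cdot {\frac {\prod _{i=s+1}^{n}\Gamma (a_{i})}{\Gamma (A_{2})}}\cdot {\frac {\Gamma (A_{1})\,\Gamma (A_{2})}{\Gamma (A_{1}+A_{2})}}={\frac {\prod _{i=1}^{n}\Gamma (a_{i})}{\Gamma \left(\sum _{i=1}^{n}a_{i}\right)}}=\operatorname {\mathrm {B} } \left(\mathbf {a} \right)

With both group exponents equal to zero, the numerator is just $\prod _{i=1}^{n}x_{i}^{a_{i}-1}$, so the density reduces to the ordinary Dirichlet density with parameter $\mathbf {a}$.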

Ng et al.^[1] went on to define an $m$-partition grouped Dirichlet distribution, with the density of $\mathbf {x} _{-n}$ given by

\operatorname {GD} _{n,m,\mathbf {s} }\left(\left.\mathbf {x} _{-n}\right|\mathbf {a} ,\mathbf {b} \right)=c_{m}^{-1}\cdot \left(\prod _{i=1}^{n}x_{i}^{a_{i}-1}\right)\cdot \prod _{j=1}^{m}\left(\sum _{k=s_{j-1}+1}^{s_{j}}x_{k}\right)^{b_{j}}

where $\mathbf {s} =\left(s_{1},\ldots ,s_{m}\right)$ is a vector of integers with $0=s_{0}<s_{1}\leqslant \cdots \leqslant s_{m}=n$. The normalizing constant is given by

c_{m}=\left\{\prod _{j=1}^{m}\operatorname {\mathrm {B} } \left(a_{s_{j-1}+1},\ldots ,a_{s_{j}}\right)\right\}\cdot \operatorname {\mathrm {B} } \left(b_{1}+\sum _{k=1}^{s_{1}}a_{k},\ldots ,b_{m}+\sum _{k=s_{m-1}+1}^{s_{m}}a_{k}\right)
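The density and normalizing constant above can be evaluated numerically on the log scale. The following is a minimal sketch using only the Python standard library. The names `log_mv_beta` and `gdd_logpdf` are hypothetical helpers, not part of any published package. The sketch assumes that `s` holds the 1-based partition end points $s_{1},\ldots ,s_{m}$, that every group is non-empty (strictly increasing end points), and that `x` lies in the interior of the simplex.

```python
import math
from typing import Sequence


def log_mv_beta(alpha: Sequence[float]) -> float:
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(v) for v in alpha) - math.lgamma(sum(alpha))


def gdd_logpdf(x: Sequence[float], a: Sequence[float],
               b: Sequence[float], s: Sequence[int]) -> float:
    """Log density of the m-partition grouped Dirichlet distribution.

    x: point on the simplex (length n, entries > 0, summing to 1)
    a: length-n parameter vector
    b: length-m parameter vector
    s: partition end points (s_1, ..., s_m), 1-based, with s_m = n
    """
    n, m = len(a), len(b)
    if len(x) != n or len(s) != m or s[-1] != n:
        raise ValueError("inconsistent dimensions")
    if abs(sum(x) - 1.0) > 1e-9 or min(x) <= 0.0:
        raise ValueError("x must lie in the interior of the simplex")

    bounds = [0, *s]  # prepend s_0 = 0
    # Assumption of this sketch: every group contains at least one element.
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("partition end points must be strictly increasing")

    # Kernel: prod_i x_i^(a_i - 1) times prod_j (group sum)^(b_j).
    log_kernel = sum((ai - 1.0) * math.log(xi) for ai, xi in zip(a, x))
    log_c = 0.0          # accumulates log c_m
    group_params = []    # b_j + sum of a over group j
    for j in range(m):
        lo, hi = bounds[j], bounds[j + 1]
        log_kernel += b[j] * math.log(sum(x[lo:hi]))
        log_c += log_mv_beta(a[lo:hi])
        group_params.append(b[j] + sum(a[lo:hi]))
    log_c += log_mv_beta(group_params)
    return log_kernel - log_c


# Usage: a four-cell table split into two rows of two cells each.
theta = [0.1, 0.2, 0.3, 0.4]
print(gdd_logpdf(theta, a=[2.0, 3.0, 4.0, 5.0], b=[1.0, 2.0], s=[2, 4]))
```

Working with `math.lgamma` rather than the gamma function itself avoids overflow for large parameters. Taking $m=2$ reproduces the two-partition density, and setting every $b_{j}=0$ gives the ordinary Dirichlet log density, which makes a convenient sanity check.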

The authors went on to use these distributions in the context of three different applications in medical science.

References

[1] Ng, Kai Wang; et al. (2008). "Grouped Dirichlet distribution: A new tool for incomplete categorical data analysis". Journal of Multivariate Analysis. 99: 490–509.
