Influential observation

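In Anscombe's quartet, the two datasets in the bottom row both contain influential points. All four sets are identical when examined using simple summary statistics, but vary considerably when graphed. If one point is removed, the fitted line would look very different.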

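In statistics, an influential observation is an observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation.^[1] In particular, in regression analysis an influential observation is one whose deletion has a large effect on the parameter estimates.^[2]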
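The deletion-based definition can be illustrated with a minimal sketch: recompute a statistic with each observation left out in turn and see which deletion moves it most. The data are the x and y columns of the DFBETA table in this article (the third Anscombe dataset); the helper names `leave_one_out_change` and `pearson_r` are hypothetical, and the choice of the correlation coefficient as the statistic is only an assumption made for illustration.

```python
import numpy as np

# Third Anscombe dataset, copied from the x and y columns of the DFBETA table.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

def leave_one_out_change(stat, x, y):
    """Return stat(full data) - stat(data without i) for every observation i."""
    full = stat(x, y)
    keep = ~np.eye(len(x), dtype=bool)  # row i is a mask that drops observation i
    return np.array([full - stat(x[m], y[m]) for m in keep])

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

delta_r = leave_one_out_change(pearson_r, x, y)

# Index of the observation whose deletion changes the statistic the most.
most_influential = int(np.argmax(np.abs(delta_r)))
print(most_influential, x[most_influential], y[most_influential])
```

Any other statistic with the same `(x, y)` signature, such as a fitted slope, could be passed in place of `pearson_r`.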

Assessment

Various methods have been proposed for measuring influence.^[3]^[4] Assume an estimated regression $\mathbf {y} =\mathbf {X} \mathbf {b} +\mathbf {e}$, where $\mathbf {y}$ is an n×1 column vector for the response variable, $\mathbf {X}$ is the n×k design matrix of explanatory variables (including a constant), $\mathbf {e}$ is the n×1 residual vector, and $\mathbf {b}$ is a k×1 vector of estimates of some population parameter $\mathbf {\beta } \in \mathbb {R} ^{k}$. Also define $\mathbf {H} \equiv \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}$, the projection matrix of $\mathbf {X}$. Then we have the following measures of influence:

${\text{DFBETA}}_{i}\equiv \mathbf {b} -\mathbf {b} _{(-i)}={\frac {\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {x} _{i}^{\mathsf {T}}e_{i}}{1-h_{i\cdot }}}$, where $\mathbf {b} _{(-i)}$ denotes the coefficients estimated with the i-th row $\mathbf {x} _{i}$ of $\mathbf {X}$ deleted, and $h_{i\cdot }=\mathbf {x} _{i}\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {x} _{i}^{\mathsf {T}}$ denotes the i-th diagonal element of $\mathbf {H}$. Thus DFBETA measures the difference in each parameter estimate with and without the influential point. There is a DFBETA for each variable and each observation (if there are n observations and k variables, there are n·k DFBETAs).^[5] The table shows DFBETAs for the third dataset from Anscombe's quartet (bottom left chart in the figure).

x	y	intercept	slope
10.0	7.46	-0.005	-0.044
8.0	6.77	-0.037	0.019
13.0	12.74	-357.910	525.268
9.0	7.11	-0.033	0
11.0	7.81	0.049	-0.117
14.0	8.84	0.490	-0.667
6.0	6.08	0.027	-0.021
4.0	5.39	0.241	-0.209
12.0	8.15	0.137	-0.231
7.0	6.42	-0.020	0.013
5.0	5.73	0.105	-0.087
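The closed-form expression for DFBETA can be checked numerically against a brute-force refit with each observation deleted. The following is a minimal NumPy sketch under the notation above, using the x and y columns of the table; the variable names and the helper `refit_without` are hypothetical. It computes raw, unscaled differences $\mathbf {b} -\mathbf {b} _{(-i)}$, so its output need not match the tabulated values, which may be on a standardized scale.

```python
import numpy as np

# Third Anscombe dataset, copied from the x and y columns of the table above.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

X = np.column_stack([np.ones_like(x), x])      # n x k design matrix with a constant
XtX_inv = np.linalg.inv(X.T @ X)

b = XtX_inv @ X.T @ y                          # OLS estimates (intercept, slope)
e = y - X @ b                                  # residual vector
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # diagonal elements of H

# Closed form: row i holds (X'X)^{-1} x_i' e_i / (1 - h_i).
# (X @ XtX_inv)[i] equals that vector's numerator direction because (X'X)^{-1} is symmetric.
dfbeta = (X @ XtX_inv) * (e / (1.0 - h))[:, None]

def refit_without(i):
    """OLS coefficients with observation i deleted (hypothetical helper)."""
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    return np.linalg.lstsq(Xi, yi, rcond=None)[0]

# Brute force: b - b_(-i) for every observation.
dfbeta_refit = np.array([b - refit_without(i) for i in range(len(y))])

assert np.allclose(dfbeta, dfbeta_refit)       # both routes agree
print(np.round(dfbeta, 3))                     # columns: intercept, slope
```

The closed form needs only one matrix inversion, whereas the refit loop solves n separate regressions; the loop is kept here purely as a check.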

DFFITS - difference in fits
Cook's D measures the effect of removing a data point on all the parameters combined.^[2]
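The article does not give formulas for these two measures. As a hedged sketch, the code below uses the common textbook closed forms, which are an assumption here rather than something stated in the text: Cook's D as $e_{i}^{2}h_{i}/\left(k\,s^{2}(1-h_{i})^{2}\right)$ and DFFITS as the externally studentized residual times $\sqrt{h_{i}/(1-h_{i})}$, where $s^{2}$ is the residual variance of the full fit. The data are again the third Anscombe dataset from the table, and all variable names are illustrative.

```python
import numpy as np

# Third Anscombe dataset, copied from the x and y columns of the DFBETA table.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

X = np.column_stack([np.ones_like(x), x])      # design matrix with a constant
n, k = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                          # OLS estimates
e = y - X @ b                                  # residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # diagonal of H (leverages)

s2 = e @ e / (n - k)                           # residual variance of the full fit

# Residual variance with observation i deleted, via the leave-one-out identity
# (assumed textbook formula; avoids refitting n times).
s2_del = ((n - k) * s2 - e**2 / (1.0 - h)) / (n - k - 1)

# Cook's distance: effect of deleting i on all coefficients jointly.
cooks_d = e**2 * h / (k * s2 * (1.0 - h) ** 2)

# DFFITS: scaled change in the fitted value at x_i when i is deleted.
t_ext = e / np.sqrt(s2_del * (1.0 - h))        # externally studentized residuals
dffits = t_ext * np.sqrt(h / (1.0 - h))

print(int(np.argmax(cooks_d)), int(np.argmax(np.abs(dffits))))
```

Both measures combine the size of the residual with the leverage $h_{i}$, so a point can score highly through either one.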

Outliers, leverage and influence

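An outlier may be defined as a data point that differs significantly from other observations.^[6]^[7] A high-leverage point is an observation made at extreme values of the independent variables.^[8] Both types of atypical observations will force the regression line to be close to the point.^[2] In Anscombe's quartet, the bottom right image has a point with high leverage and the bottom left image has an outlying point.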
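The distinction can be made concrete with a small sketch that computes leverages (the diagonal of the projection matrix) and residuals. The third dataset comes from the DFBETA table in this article; the x values of the fourth dataset are not listed in this article and are taken from the standard published quartet, which is an assumption of this example. The helper names `leverages` and `residuals` are hypothetical.

```python
import numpy as np

def leverages(x):
    """Diagonal of H = X (X'X)^{-1} X' for a simple regression with a constant."""
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

def residuals(x, y):
    """Residuals from an ordinary least squares line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

# Third dataset (bottom left), from the DFBETA table.
x3 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

# Fourth dataset (bottom right): x values only, assumed from the published quartet.
# Leverage depends on x alone, so the y values are not needed here.
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)

h3, h4 = leverages(x3), leverages(x4)
r3 = residuals(x3, y3)

# Dataset 4: one point sits at an extreme x, so its leverage dominates.
print("dataset 4 max leverage at index", int(np.argmax(h4)), "value", round(float(h4.max()), 3))

# Dataset 3: the outlying point has the largest residual but not the largest leverage.
print("dataset 3 largest residual at index", int(np.argmax(np.abs(r3))))
print("dataset 3 leverage of that point", round(float(h3[np.argmax(np.abs(r3))]), 3))
```

When a leverage value is numerically equal to 1, as can happen for an isolated x value, the denominator $1-h_{i\cdot }$ in the DFBETA formula vanishes, so deletion diagnostics for that point should be handled separately.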

See also

References

[1] Burt, James E.; Barber, Gerald M.; Rigby, David L. (2009). Elementary Statistics for Geographers. Guilford Press. p. 513. ISBN 9781572304840.
[2] Everitt, Brian (1998). The Cambridge Dictionary of Statistics. Cambridge, UK; New York: Cambridge University Press. ISBN 0-521-59346-8.
[3] Winner, Larry (March 25, 2002). "Influence Statistics, Outliers, and Collinearity Diagnostics".
[4] Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. pp. 11–16. ISBN 0-471-05856-4.
[5] "Outliers and DFBETA" (PDF). Archived (PDF) from the original on May 11, 2013.
[6] Grubbs, F. E. (February 1969). "Procedures for detecting outlying observations in samples". Technometrics. 11 (1): 1–21. doi:10.1080/00401706.1969.10490657. An outlying observation, or "outlier," is one that appears to deviate markedly from other members of the sample in which it occurs.
[7] Maddala, G. S. (1992). "Outliers". Introduction to Econometrics (2nd ed.). New York: MacMillan. p. 89. ISBN 978-0-02-374545-4. An outlier is an observation that is far removed from the rest of the observations.
[8] Everitt, B. S. (2002). Cambridge Dictionary of Statistics. Cambridge University Press. ISBN 0-521-81099-X.

Further reading

Dehon, Catherine; Gassner, Marjorie; Verardi, Vincenzo (2009). "Beware of 'Good' Outliers and Overoptimistic Conclusions". Oxford Bulletin of Economics and Statistics. 71 (3): 437–452. doi:10.1111/j.1468-0084.2009.00543.x.
Kennedy, Peter (2003). "Robust Estimation". A Guide to Econometrics (Fifth ed.). Cambridge: The MIT Press. pp. 372–388. ISBN 0-262-61183-X.
