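Kernel density estimation

Kernel density estimation of 100 normally distributed random numbers using different smoothing bandwidths.

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem in which inferences about the population are made from a finite data sample. In some fields, such as signal processing and econometrics, it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form.^[1]^[2] One of the well-known applications of kernel density estimation is estimating the class-conditional marginal densities of data when using a naive Bayes classifier,^[3]^[4] which can improve its prediction accuracy.^[3]

Definition

Let (x₁, x₂, ..., xₙ) be independent and identically distributed samples drawn from some univariate distribution with an unknown density ƒ at any given point x. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is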

{\widehat {f}}_{h}(x)={\frac {1}{n}}\sum _{i=1}^{n}K_{h}(x-x_{i})={\frac {1}{nh}}\sum _{i=1}^{n}K{\Big (}{\frac {x-x_{i}}{h}}{\Big )},

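where K is the kernel (a non-negative function) and h > 0 is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and is defined as K_h(x) = (1/h) K(x/h). Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.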
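The estimator can be written down almost verbatim in code. The following is a minimal sketch using only the Python standard library; the function names `gaussian_kernel` and `kde` and the five sample values are illustrative assumptions, not part of any library or of the article.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h, kernel=gaussian_kernel):
    """Evaluate f_hat_h(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    return sum(kernel((x - xi) / h) for xi in samples) / (n * h)

# Made-up illustrative data, not taken from the article.
samples = [-1.2, -0.3, 0.1, 0.8, 2.4]
print(kde(0.0, samples, h=0.5))
```

Because each scaled kernel integrates to one, the average of n of them does too, so the result is itself a probability density.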

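A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense,^[5] though the loss of efficiency is small for the other kernels listed.^[6] Because of its convenient mathematical properties, the normal kernel is often used, which means K(x) = ϕ(x), where ϕ is the standard normal density function.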
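For reference, the kernels named above can be sketched as plain functions. The formulas below are the standard textbook forms (they are not spelled out in this article), and the `KERNELS` dictionary and `integrate` helper are names chosen for this illustration only.

```python
import math

# Standard forms of the kernels; all except the normal kernel live on [-1, 1].
KERNELS = {
    "uniform":      lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "triangular":   lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "biweight":     lambda u: 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triweight":    lambda u: 35 / 32 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "normal":       lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
}

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

# Each kernel should integrate to (approximately) one.
for name, K in KERNELS.items():
    print(f"{name:13s} integral = {integrate(K, -8, 8):.4f}")
```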

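The construction of a kernel density estimate finds interpretations in fields outside of density estimation.^[7] For example, in thermodynamics, it is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point location x_i. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion maps).

Example

Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below, based on these 6 data points, illustrates this relationship:

Sample	1	2	3	4	5	6
Value	-2.1	-1.3	-0.4	1.9	5.1	6.2

For the histogram, the horizontal axis is first divided into sub-intervals or bins which cover the range of the data: in this case, six bins each of width 2. Whenever a data point falls inside a bin, a box of height 1/12 is placed there. If more than one data point falls inside the same bin, the boxes are stacked on top of each other.

For the kernel density estimate, normal kernels with standard deviation 2.25 (indicated by the red dashed lines) are placed on each of the data points x_i. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables.^[8]

Comparison of the histogram (left) and kernel density estimate (right) constructed using the same data. The six individual kernels are the red dashed curves, the kernel density estimate the blue curve. The data points are the rug plot on the horizontal axis.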
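The two constructions in this example can be reproduced numerically. The sketch below uses the six data points from the table; the bin edges −4, −2, ..., 8 are an assumption (the text only says there are six bins of width 2), and the kernel standard deviation 2.25 is taken from the text as stated.

```python
import math

data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
n = len(data)

# Histogram: six bins of width 2. The edges -4, -2, ..., 8 are assumed.
edges = [-4 + 2 * k for k in range(7)]
heights = [0.0] * 6
for x in data:
    k = int((x - edges[0]) // 2)   # index of the bin containing x
    heights[k] += 1 / 12           # one box of height 1/(n * bin width)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kde(x, sigma=2.25):
    """Average of one normal kernel centred on each data point."""
    return sum(normal_pdf(x, xi, sigma) for xi in data) / n

print([round(hgt, 4) for hgt in heights])   # stacked box heights per bin
print(round(kde(0.0), 4))                   # smooth estimate at x = 0
```

The histogram changes if the bin edges are shifted, whereas the kernel estimate depends only on the data and the bandwidth.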

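Bandwidth selection

Kernel density estimate (KDE) with different bandwidths of a random sample of 100 points from a standard normal distribution. Grey: true density (standard normal). Red: KDE with h=0.05. Black: KDE with h=0.337. Green: KDE with h=2.

The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution (plotted as the blue spikes in the rug plot on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is undersmoothed, since it contains too many spurious data artifacts arising from using a bandwidth h = 0.05, which is too small. The green curve is oversmoothed, since using the bandwidth h = 2 obscures much of the underlying structure. The black curve with a bandwidth of h = 0.337 is considered to be optimally smoothed, since its density estimate is close to the true density. An extreme situation is encountered in the limit $h\to 0$ (no smoothing), where the estimate is a sum of n delta functions centered at the coordinates of the analyzed samples. In the other extreme limit $h\to \infty$ the estimate retains the shape of the kernel used, centered on the mean of the samples (completely smooth).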
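The effect of the three bandwidths can be checked numerically by measuring how far each estimate is from the true standard normal density. The sketch below is an illustration under assumptions: the sample is freshly simulated with an arbitrary seed (it is not the sample behind the figure), and the integrated squared error is approximated by a Riemann sum on [−6, 6].

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kde(x, samples, h):
    """Gaussian-kernel density estimate with bandwidth h."""
    return sum(normal_pdf(x, xi, h) for xi in samples) / len(samples)

def integrated_squared_error(samples, h, lo=-6.0, hi=6.0, step=0.02):
    """Riemann-sum approximation of the integral of (f_hat - f)^2."""
    steps = int(round((hi - lo) / step))
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * step
        total += (kde(x, samples, h) - normal_pdf(x)) ** 2
    return total * step

random.seed(0)  # arbitrary seed, chosen only for reproducibility
samples = [random.gauss(0.0, 1.0) for _ in range(100)]

ise = {h: integrated_squared_error(samples, h) for h in (0.05, 0.337, 2.0)}
for h, value in ise.items():
    print(f"h = {h:<5} ISE = {value:.4f}")
```

Both the undersmoothed (h = 0.05) and oversmoothed (h = 2) estimates should show a clearly larger error than the intermediate bandwidth.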

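The most common optimality criterion used to select this parameter is the expected L₂ risk function, also termed the mean integrated squared error:

\operatorname {MISE}(h)=\operatorname {E}\!\left[\,\int ({\hat {f}}_{h}(x)-f(x))^{2}\,dx\right]

Under weak assumptions on ƒ and K (ƒ is the, generally unknown, real density function),^[1]^[2]

\operatorname {MISE}(h)=\operatorname {AMISE}(h)+{\mathcal {o}}((nh)^{-1}+h^{4})

where o is the little o notation and n is the sample size (as above). The AMISE is the asymptotic MISE, which consists of the two leading terms,

\operatorname {AMISE}(h)={\frac {R(K)}{nh}}+{\frac {1}{4}}m_{2}(K)^{2}h^{4}R(f'')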

where $R(g)=\int g(x)^{2}\,dx$ for a function g, $m_{2}(K)=\int x^{2}K(x)\,dx$ and $f''$ is the second derivative of $f$. The AMISE is minimized where its derivative with respect to h vanishes:

{\frac {\partial }{\partial h}}\operatorname {AMISE}(h)=-{\frac {R(K)}{nh^{2}}}+m_{2}(K)^{2}h^{3}R(f'')=0

or

h_{\operatorname {AMISE}}={\frac {R(K)^{1/5}}{m_{2}(K)^{2/5}R(f'')^{1/5}n^{1/5}}}=Cn^{-1/5}.

Neither the AMISE nor the h_AMISE formula can be used directly, since they involve the unknown density function $f$ or its second derivative $f''$, so a variety of automatic, data-based methods have been developed for selecting the bandwidth. Many review studies have been carried out to compare their efficacy,^[9]^[10]^[11]^[12]^[13]^[14]^[15] with the general consensus that the plug-in selectors^[7]^[16]^[17] and cross-validation selectors^[18]^[19]^[20] are the most useful over a wide range of data sets.
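As one concrete instance of a data-based selector, the sketch below implements least-squares cross-validation for a Gaussian kernel and picks the bandwidth from a grid. This is an illustrative choice, not the specific procedure of any one cited paper; the function names, the simulated sample, and the grid are assumptions. The criterion is LSCV(h) = ∫ f̂² dx − (2/n) Σᵢ f̂₋ᵢ(xᵢ), where f̂₋ᵢ is the estimate computed without the i-th point.

```python
import math
import random

def normal_pdf(x, sigma):
    """Zero-mean normal density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def lscv_score(samples, h):
    """Least-squares cross-validation criterion for a Gaussian kernel."""
    n = len(samples)
    sq_integral = 0.0     # accumulates n^2 * integral of f_hat^2
    leave_one_out = 0.0   # accumulates sum over i != j of K_h(x_i - x_j)
    for i, xi in enumerate(samples):
        for j, xj in enumerate(samples):
            d = xi - xj
            # Two Gaussian kernels of width h convolve to one of width h*sqrt(2).
            sq_integral += normal_pdf(d, h * math.sqrt(2))
            if i != j:
                leave_one_out += normal_pdf(d, h)
    return sq_integral / n ** 2 - 2.0 * leave_one_out / (n * (n - 1))

random.seed(0)  # arbitrary seed for reproducibility
samples = [random.gauss(0.0, 1.0) for _ in range(150)]

candidates = [0.1 * k for k in range(1, 21)]   # grid 0.1, 0.2, ..., 2.0
h_best = min(candidates, key=lambda h: lscv_score(samples, h))
print("selected bandwidth:", round(h_best, 2))
```

The double loop costs O(n²) per candidate bandwidth, which is acceptable for a sketch but not for large samples.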

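Substituting any bandwidth h which has the same asymptotic order $n^{-1/5}$ as h_AMISE into the AMISE gives AMISE(h) = O($n^{-4/5}$), where O is the big O notation. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator.^[21] Note that the $n^{-4/5}$ rate is slower than the typical $n^{-1}$ convergence rate of parametric methods.

If the bandwidth is not held fixed, but is varied depending on the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation.

Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult.^[22]

A rule-of-thumb bandwidth estimator

If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for h (that is, the bandwidth that minimizes the mean integrated squared error) is:^[23]

h=\left({\frac {4{\hat {\sigma }}^{5}}{3n}}\right)^{\frac {1}{5}}\approx 1.06\,{\hat {\sigma }}\,n^{-1/5},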
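The constant 1.06 follows from the AMISE-optimal bandwidth once both the kernel and the density are taken to be Gaussian. The sketch below fills in the intermediate steps; the values of $R(K)$, $m_{2}(K)$ and $R(f'')$ for normal densities are standard results supplied here, not stated in the text above.

```latex
\begin{aligned}
h_{\operatorname{AMISE}}^{5} &= \frac{R(K)}{m_{2}(K)^{2}\,R(f'')\,n},\\
R(K) &= \frac{1}{2\sqrt{\pi}},\qquad m_{2}(K)=1 \qquad (K=\phi),\\
R(f'') &= \frac{3}{8\sqrt{\pi}\,\sigma^{5}} \qquad (f \text{ normal with standard deviation } \sigma),\\
h^{5} &= \frac{1}{2\sqrt{\pi}}\cdot\frac{8\sqrt{\pi}\,\sigma^{5}}{3n}=\frac{4\sigma^{5}}{3n}
\;\Longrightarrow\;
h=\left(\frac{4}{3}\right)^{1/5}\sigma\,n^{-1/5}\approx 1.06\,\sigma\,n^{-1/5}.
\end{aligned}
```

In practice the unknown $\sigma$ is replaced by the sample estimate ${\hat {\sigma }}$.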

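To make the value of h more robust, so that it fits long-tailed and skewed distributions as well as bimodal mixture distributions, it is better to replace ${\hat {\sigma }}$ with another parameter A, given by:

A = min(standard deviation, interquartile range/1.34)

Comparison between the rule-of-thumb and solve-the-equation bandwidths.

Another modification that improves the model is to reduce the factor from 1.06 to 0.9. The final formula is then:

h=0.9\,\min \left({\hat {\sigma }},{\frac {IQR}{1.34}}\right)\,n^{-{\frac {1}{5}}}

where ${\hat {\sigma }}$ is the standard deviation of the samples, n is the sample size, and IQR is the interquartile range. This approximation is termed the normal distribution approximation, Gaussian approximation, or Silverman's rule of thumb.^[23] While this rule of thumb is easy to compute, it should be used with caution, as it can yield widely inaccurate estimates when the density is not close to being normal. For example, when estimating the bimodal Gaussian mixture model

\textstyle {\frac {1}{2{\sqrt {2\pi }}}}e^{-{\frac {1}{2}}(x-10)^{2}}+{\frac {1}{2{\sqrt {2\pi }}}}e^{-{\frac {1}{2}}(x+10)^{2}}

from a sample of 200 points, the figure shows the true density and two kernel density estimates: one using the rule-of-thumb bandwidth, and the other using a solve-the-equation bandwidth.^[7]^[17] The estimate based on the rule-of-thumb bandwidth is significantly oversmoothed.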
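The rule of thumb and its failure on this mixture can be reproduced in a few lines. The sketch below is illustrative: `silverman_bandwidth` is a name chosen here, the quartiles come from `statistics.quantiles` with its default method (other quantile conventions give slightly different values), and the 200 points are simulated with an arbitrary seed rather than taken from the figure.

```python
import random
import statistics

def silverman_bandwidth(samples):
    """h = 0.9 * min(sigma_hat, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sigma_hat = statistics.stdev(samples)
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    a = min(sigma_hat, (q3 - q1) / 1.34)
    return 0.9 * a * n ** (-1 / 5)

random.seed(1)  # arbitrary seed for reproducibility
# Equal mixture of N(-10, 1) and N(10, 1), as in the example above.
mixture = [random.gauss(random.choice((-10.0, 10.0)), 1.0) for _ in range(200)]

h = silverman_bandwidth(mixture)
print("rule-of-thumb bandwidth:", round(h, 2))
```

Each mixture component has standard deviation 1, but the rule sees an overall spread of about 10 and returns a bandwidth several times wider than either component, which is why the resulting estimate is oversmoothed.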

Relation to the characteristic function density estimator

Given the sample (x₁, x₂, ..., xₙ), it is natural to estimate the characteristic function φ(t) = E[e^{itX}] as

{\widehat {\varphi }}(t)={\frac {1}{n}}\sum _{j=1}^{n}e^{itx_{j}}

Knowing the characteristic function, it is possible to find the corresponding probability density function through the Fourier transform formula. One difficulty with applying this inversion formula is that it leads to a diverging integral, since the estimate $\scriptstyle {\widehat {\varphi }}(t)$ is unreliable for large t. To circumvent this problem, the estimator $\scriptstyle {\widehat {\varphi }}(t)$ is multiplied by a damping function ψ_h(t) = ψ(ht), which is equal to 1 at the origin and then falls to 0 at infinity. The "bandwidth parameter" h controls how fast we try to dampen the function $\scriptstyle {\widehat {\varphi }}(t)$. In particular, when h is small, ψ_h(t) will be approximately one for a large range of t, which means that $\scriptstyle {\widehat {\varphi }}(t)$ remains practically unaltered in the most important region of t.

The most common choice for the function ψ is either the uniform function ψ(t) = 1{−1 ≤ t ≤ 1}, which effectively means truncating the interval of integration in the inversion formula to [−1/h, 1/h], or the Gaussian function ψ(t) = e^{−πt²}. Once the function ψ has been chosen, the inversion formula may be applied, and the density estimator will be

{\begin{aligned}{\widehat {f}}(x)&={\frac {1}{2\pi }}\int _{-\infty }^{+\infty }{\widehat {\varphi }}(t)\psi _{h}(t)e^{-itx}\,dt={\frac {1}{2\pi }}\int _{-\infty }^{+\infty }{\frac {1}{n}}\sum _{j=1}^{n}e^{it(x_{j}-x)}\psi (ht)\,dt\\[5pt]&={\frac {1}{nh}}\sum _{j=1}^{n}{\frac {1}{2\pi }}\int _{-\infty }^{+\infty }e^{-i(ht){\frac {x-x_{j}}{h}}}\psi (ht)\,d(ht)={\frac {1}{nh}}\sum _{j=1}^{n}K{\Big (}{\frac {x-x_{j}}{h}}{\Big )},\end{aligned}}

where K is the Fourier transform of the damping function ψ. Thus the kernel density estimator coincides with the characteristic function density estimator.
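The equivalence can be verified numerically for the Gaussian damping function ψ(t) = e^{−πt²}. With the convention used in the formula above, K(u) = (1/2π) ∫ e^{−isu} ψ(s) ds = e^{−u²/(4π)} / (2π), which is a normal density with variance 2π. The sketch below compares the damped inversion integral (trapezoidal rule on a truncated range) with the direct kernel sum; the six data values, the bandwidth, the truncation point `t_max` and the function names are assumptions made for illustration.

```python
import cmath
import math

def ecf(t, samples):
    """Empirical characteristic function: mean of exp(i * t * x_j)."""
    return sum(cmath.exp(1j * t * xj) for xj in samples) / len(samples)

def density_by_inversion(x, samples, h, t_max=40.0, steps=8000):
    """(1/2pi) * integral of ecf(t) * psi(h*t) * exp(-i*t*x) dt, trapezoidal rule."""
    dt = 2 * t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = -t_max + k * dt
        weight = 0.5 if k in (0, steps) else 1.0     # trapezoid end weights
        psi = math.exp(-math.pi * (h * t) ** 2)      # Gaussian damping psi(h*t)
        total += weight * (ecf(t, samples) * psi * cmath.exp(-1j * t * x)).real
    return total * dt / (2 * math.pi)

def density_by_kernel(x, samples, h):
    """Kernel sum with K(u) = exp(-u^2 / (4*pi)) / (2*pi), the transform of psi."""
    n = len(samples)
    return sum(
        math.exp(-((x - xj) / h) ** 2 / (4 * math.pi)) / (2 * math.pi)
        for xj in samples
    ) / (n * h)

data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
for x in (0.0, 2.0, 5.0):
    print(x, density_by_inversion(x, data, 0.5), density_by_kernel(x, data, 0.5))
```

The two columns should agree to within numerical integration error.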

Geometric and topological features

We can extend the definition of the (global) mode to a local sense and define the local modes:

M=\{x:g(x)=0,\lambda _{1}(x)<0\}

Namely, $M$ is the collection of points at which the density function is locally maximized; here $g(x)$ denotes the gradient of the density and $\lambda _{1}(x)$ the largest eigenvalue of its Hessian matrix. A natural estimator of $M$ is a plug-in from KDE,^[24]^[25] denoted $M_{c}$, in which $g(x)$ and $\lambda _{1}(x)$ are replaced by their KDE versions. Under mild assumptions, $M_{c}$ is a consistent estimator of $M$. Note that one can use the mean shift algorithm^[26]^[27]^[28] to compute the estimator $M_{c}$ numerically.
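A one-dimensional sketch of the mean shift iteration for a Gaussian kernel is given below. Each step moves the current point to the kernel-weighted average of the samples, which climbs the kernel density estimate until it reaches a local mode. The function names, the simulated two-component sample, the bandwidth and the starting points are assumptions chosen for illustration.

```python
import math
import random

def mean_shift_step(x, samples, h):
    """Move x to the Gaussian-kernel-weighted mean of the samples."""
    weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples]
    return sum(w * xi for w, xi in zip(weights, samples)) / sum(weights)

def find_local_mode(x0, samples, h, tol=1e-8, max_iter=1000):
    """Iterate mean shift from x0 until the update is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = mean_shift_step(x, samples, h)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

random.seed(2)  # arbitrary seed for reproducibility
samples = ([random.gauss(-3.0, 1.0) for _ in range(200)]
           + [random.gauss(3.0, 1.0) for _ in range(200)])

left_mode = find_local_mode(-2.0, samples, h=0.8)
right_mode = find_local_mode(2.0, samples, h=0.8)
print(round(left_mode, 2), round(right_mode, 2))
```

Starting the iteration from every data point and collecting the distinct limits gives an estimate of the whole set of local modes.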

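Statistical implementation

A non-exhaustive list of software implementations of kernel density estimators includes:

In Analytica release 4.4, the Smoothing option for PDF results uses KDE, and from expressions it is available via the built-in Pdf function.
In C/C++, FIGTree is a library that can be used to compute kernel density estimates using normal kernels. A MATLAB interface is available.
In C++, libagf is a library for variable kernel density estimation.
In C++, mlpack is a library that can compute KDE using many different kernels. It allows an error tolerance to be set for faster computation. Python and R interfaces are available.
In C# and F#, Math.NET Numerics is an open-source library for numerical computation which includes kernel density estimation.
In CrimeStat, kernel density estimation is implemented using five different kernel functions: normal, uniform, quartic, negative exponential, and triangular. Both single- and dual-kernel density estimate routines are available. Kernel density estimation is also used in interpolating a Head Bang routine, in estimating a two-dimensional Journey-to-crime density function, and in estimating a three-dimensional Bayesian Journey-to-crime estimate.
In ELKI, kernel density functions can be found in the package de.lmu.ifi.dbs.elki.math.statistics.kernelfunctions.
In ESRI products, kernel density mapping is managed out of the Spatial Analyst toolbox and uses the quartic (biweight) kernel.
In Excel, the Royal Society of Chemistry has created an add-in to run kernel density estimation based on their Analytical Methods Committee Technical Brief 4.
In gnuplot, kernel density estimation is implemented by the smooth kdensity option. The data file can contain a weight and bandwidth for each point, or the bandwidth can be set automatically^[29] according to "Silverman's rule of thumb" (see above).
In Haskell, kernel density is implemented in the statistics package.
In IGOR Pro, kernel density estimation is implemented by the StatsKDE operation (added in Igor Pro 7.00). The bandwidth can be user-specified or estimated by means of Silverman, Scott, or Bowmann and Azzalini. Kernel types are: Epanechnikov, bi-weight, tri-weight, triangular, Gaussian, and rectangular.
In Java, the Weka (machine learning) package provides weka.estimators.KernelEstimator, among others.
In JavaScript, the visualization package D3.js offers a KDE package in its science.stats package.
In JMP, the Graph Builder platform utilizes kernel density estimation to provide contour plots and high density regions (HDRs) for bivariate densities, and violin plots and HDRs for univariate densities. Sliders allow the user to vary the bandwidth. Bivariate and univariate kernel density estimates are also provided by the Fit Y by X and Distribution platforms, respectively.
In Julia, kernel density estimation is implemented in the KernelDensity.jl package.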
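In MATLAB, kernel density estimation is implemented through the ksdensity function (Statistics Toolbox). As of the 2018a release of MATLAB, both the bandwidth and the kernel smoother can be specified, along with other options such as specifying the range of the kernel density.^[30] Alternatively, a free MATLAB software package which implements an automatic bandwidth selection method^[7] is available from the MATLAB Central File Exchange for
- 1-dimensional data
- 2-dimensional data
- n-dimensional data
  A free MATLAB toolbox with implementations of kernel regression, kernel density estimation, kernel estimation of hazard function and many others is also available (this toolbox is part of the book^[31]).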
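In Mathematica, numeric kernel density estimation is implemented by the function SmoothKernelDistribution^[32] and symbolic estimation is implemented using the function KernelMixtureDistribution,^[33] both of which provide data-driven bandwidths.
In Minitab, the Royal Society of Chemistry has created a macro to run kernel density estimation based on their Analytical Methods Committee Technical Brief 4.^[34]
In the NAG Library, kernel density estimation is implemented via the g10ba routine (available in both the Fortran^[35] and the C^[36] versions of the Library).
In Nuklei, C++ kernel density methods focus on data from the special Euclidean group $SE(3)$.
In Octave, kernel density estimation is implemented by the kernel_density option (econometrics package).
In Origin, a 2D kernel density plot can be made from the user interface, and two functions, Ksdensity for 1D and Ks2density for 2D, can be used from LabTalk, Python, or C code.
In Perl, an implementation can be found in the Statistics-KernelEstimation module.
In PHP, an implementation can be found in the MathPHP library.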
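In Python, many implementations exist: the pyqt_fit.kde module in the PyQt-Fit package, SciPy (scipy.stats.gaussian_kde), Statsmodels (KDEUnivariate and KDEMultivariate), and Scikit-learn (KernelDensity) (see comparison^[37]). KDEpy supports weighted data, and its FFT implementation is orders of magnitude faster than the other implementations. The commonly used pandas library offers support for KDE plotting through the plot method (df.plot(kind='kde')). The getdist package for weighted and correlated MCMC samples supports optimized bandwidth, boundary correction, and higher-order methods for 1D and 2D distributions. A more recent package used for kernel density estimation is seaborn (import seaborn as sns, sns.kdeplot()).^[38] A GPU implementation of KDE also exists.^[39]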
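In R, it is implemented through density in the base distribution, and the bw.nrd0 function in the stats package uses the optimized formula from Silverman's book. Other implementations are bkde in the KernSmooth library, ParetoDensityEstimation in the DataVisualizations library (for Pareto distribution density estimation), kde in the ks library, dkden and dbckden in the evmix library (the latter for boundary-corrected kernel density estimation on bounded support), npudens in the np library (numeric and categorical data), and sm.density in the sm library. For an implementation that does not require installing any packages or libraries, see the kde.R function. The btb library, dedicated to urban analysis, implements kernel density estimation through kernel_smoothing.
In SAS, proc kde can be used to estimate univariate and bivariate kernel densities.
In Apache Spark, the KernelDensity() class.^[40]
In Stata, it is implemented through kdensity;^[41] for example, histogram x, kdensity. Alternatively, a free Stata module KDENS is available, allowing a user to estimate 1D or 2D density functions.
In Swift, it is implemented through SwiftStats.KernelDensityEstimation in the open-source statistics library SwiftStats.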

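See also

Kernel (statistics)
Kernel smoothing
Kernel regression
Density estimation (with presentation of other examples)
Mean-shift
Scale space: the triplets {(x, h, KDE with bandwidth h evaluated at x) : all x, h > 0} form a scale space representation of the data.
Multivariate kernel density estimation
Variable kernel density estimation
Head/tail breaks

Further reading

Härdle, Müller, Sperlich, Werwatz, Nonparametric and Semiparametric Methods, Springer-Verlag Berlin Heidelberg 2004, pp. 39–83

References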

[1] Rosenblatt, M. (1956). "Remarks on Some Nonparametric Estimates of a Density Function". The Annals of Mathematical Statistics. 27 (3): 832–837. doi:10.1214/aoms/1177728190.
[2] Parzen, E. (1962). "On Estimation of a Probability Density Function and Mode". The Annals of Mathematical Statistics. 33 (3): 1065–1076. doi:10.1214/aoms/1177704472. JSTOR 2237880.
[3] Piryonesi S. Madeh; El-Diraby Tamer E. (2020-06-01). "Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems". Journal of Transportation Engineering, Part B: Pavements. 146 (2): 04020022. doi:10.1061/JPEODX.0000175.
[4] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome H. (2001). The Elements of Statistical Learning : Data Mining, Inference, and Prediction : with 200 full-color illustrations. New York: Springer. ISBN 0-387-95284-5. OCLC 46809224.
[5] Epanechnikov, V.A. (1969). "Non-parametric estimation of a multivariate probability density". Theory of Probability and Its Applications. 14: 153–158. doi:10.1137/1114019.
[6] Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman & Hall/CRC. ISBN 978-0-412-55270-0.
[7] Botev, Zdravko (2007). Nonparametric Density Estimation via Diffusion Mixing (Technical report). University of Queensland.
[8] Scott, D. (1979). "On optimal and data-based histograms". Biometrika. 66 (3): 605–610. doi:10.1093/biomet/66.3.605.
[9] Park, B.U.; Marron, J.S. (1990). "Comparison of data-driven bandwidth selectors". Journal of the American Statistical Association. 85 (409): 66–72. CiteSeerX 10.1.1.154.7321. doi:10.1080/01621459.1990.10475307. JSTOR 2289526.
[10] Park, B.U.; Turlach, B.A. (1992). "Practical performance of several data driven bandwidth selectors (with discussion)". Computational Statistics. 7: 251–270.
[11] Cao, R.; Cuevas, A.; Manteiga, W. G. (1994). "A comparative study of several smoothing methods in density estimation". Computational Statistics and Data Analysis. 17 (2): 153–176. doi:10.1016/0167-9473(92)00066-Z.
[12] Jones, M.C.; Marron, J.S.; Sheather, S. J. (1996). "A brief survey of bandwidth selection for density estimation". Journal of the American Statistical Association. 91 (433): 401–407. doi:10.2307/2291420. JSTOR 2291420.
[13] Sheather, S.J. (1992). "The performance of six popular bandwidth selection methods on some real data sets (with discussion)". Computational Statistics. 7: 225–250, 271–281.
[14] Agarwal, N.; Aluru, N.R. (2010). "A data-driven stochastic collocation approach for uncertainty quantification in MEMS" (PDF). International Journal for Numerical Methods in Engineering. 83 (5): 575–597. Bibcode:2010IJNME..83..575A. doi:10.1002/nme.2844.
[15] Xu, X.; Yan, Z.; Xu, S. (2015). "Estimating wind speed probability distribution by diffusion-based kernel density method". Electric Power Systems Research. 121: 28–37. doi:10.1016/j.epsr.2014.11.029.
[16] Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. (2010). "Kernel density estimation via diffusion". Annals of Statistics. 38 (5): 2916–2957. arXiv:1011.2602. doi:10.1214/10-AOS799. S2CID 41350591.
[17] Sheather, S.J.; Jones, M.C. (1991). "A reliable data-based bandwidth selection method for kernel density estimation". Journal of the Royal Statistical Society, Series B. 53 (3): 683–690. doi:10.1111/j.2517-6161.1991.tb01857.x. JSTOR 2345597.
[18] Rudemo, M. (1982). "Empirical choice of histograms and kernel density estimators". Scandinavian Journal of Statistics. 9 (2): 65–78. JSTOR 4615859.
[19] Bowman, A.W. (1984). "An alternative method of cross-validation for the smoothing of density estimates". Biometrika. 71 (2): 353–360. doi:10.1093/biomet/71.2.353.
[20] Hall, P.; Marron, J.S.; Park, B.U. (1992). "Smoothed cross-validation". Probability Theory and Related Fields. 92: 1–20. doi:10.1007/BF01205233. S2CID 121181481.
[21] Wahba, G. (1975). "Optimal convergence properties of variable knot, kernel, and orthogonal series methods for density estimation". Annals of Statistics. 3 (1): 15–29. doi:10.1214/aos/1176342997.
[22] Buch-Larsen, TINE (2005). "Kernel density estimation for heavy-tailed distributions using the Champernowne transformation". Statistics. 39 (6): 503–518. CiteSeerX 10.1.1.457.1544. doi:10.1080/02331880500439782. S2CID 219697435.
[23] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC. p. 45. ISBN 978-0-412-24620-3.
[24] Chen, Yen-Chi; Genovese, Christopher R.; Wasserman, Larry (2016). "A comprehensive approach to mode clustering". Electronic Journal of Statistics. 10 (1): 210–241. doi:10.1214/15-ejs1102. ISSN 1935-7524.
[25] Chazal, Frédéric; Fasy, Brittany Terese; Lecci, Fabrizio; Rinaldo, Alessandro; Wasserman, Larry (2014). "Stochastic Convergence of Persistence Landscapes and Silhouettes". Annual Symposium on Computational Geometry - SOCG'14. New York, New York, USA: ACM Press. 6 (2): 474–483. doi:10.1145/2582112.2582128. ISBN 978-1-4503-2594-3. S2CID 6029340.
[26] Fukunaga, K.; Hostetler, L. (January 1975). "The estimation of the gradient of a density function, with applications in pattern recognition". IEEE Transactions on Information Theory. 21 (1): 32–40. doi:10.1109/tit.1975.1055330. ISSN 0018-9448.
[27] Yizong Cheng (1995). "Mean shift, mode seeking, and clustering". IEEE Transactions on Pattern Analysis and Machine Intelligence. 17 (8): 790–799. doi:10.1109/34.400568. ISSN 0162-8828.
[28] Comaniciu, D.; Meer, P. (May 2002). "Mean shift: a robust approach toward feature space analysis". IEEE Transactions on Pattern Analysis and Machine Intelligence. 24 (5): 603–619. doi:10.1109/34.1000236. ISSN 0162-8828.
[29] Janert, Philipp K (2009). Gnuplot in action : understanding data with graphs. Connecticut, USA: Manning Publications. ISBN 978-1-933988-39-9. See section 13.2.2, "Kernel Density Estimates".
[30] "Kernel smoothing function estimate for univariate and bivariate data - MATLAB ksdensity". www.mathworks.com. Retrieved 2020-11-05.
[31] Horová, I.; Koláček, J.; Zelinka, J. (2012). Kernel Smoothing in MATLAB: Theory and Practice of Kernel Smoothing. Singapore: World Scientific Publishing. ISBN 978-981-4405-48-5.
[32] "SmoothKernelDistribution—Wolfram Language Documentation". reference.wolfram.com. Retrieved 2020-11-05.
[33] "KernelMixtureDistribution—Wolfram Language Documentation". reference.wolfram.com. Retrieved 2020-11-05.
[34] "Software for calculating kernel densities". www.rsc.org. Retrieved 2020-11-05.
[35] The Numerical Algorithms Group. "NAG Library Routine Document: nagf_smooth_kerndens_gauss (g10baf)" (PDF). NAG Library Manual, Mark 23. Retrieved 2012-02-16.
[36] The Numerical Algorithms Group. "NAG Library Routine Document: nag_kernel_density_estim (g10bac)" (PDF). NAG Library Manual, Mark 9. Archived from the original (PDF) on 2011-11-24. Retrieved 2012-02-16.
[37] Vanderplas, Jake (2013-12-01). "Kernel Density Estimation in Python". Retrieved 2014-03-12.
[38] "seaborn.kdeplot — seaborn 0.10.1 documentation". seaborn.pydata.org. Retrieved 2020-05-12.
[39] "Kde-gpu: We implemented nadaraya waston kernel density and kernel conditional probability estimator using cuda through cupy. It is much faster than cpu version but it requires GPU with high memory".
[40] "Basic Statistics - RDD-based API - Spark 3.0.1 Documentation". spark.apache.org. Retrieved 2020-11-05.
[41] https://www.stata.com/manuals15/rkdensity.pdf

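External links

Introduction to kernel density estimation: a short tutorial which motivates kernel density estimators as an improvement over histograms.
Kernel Bandwidth Optimization: a free online tool that generates an optimized kernel density estimate.
Free Online Software (Calculator): computes the kernel density estimate for a data series according to the following kernels: Gaussian, Epanechnikov, rectangular, triangular, biweight, cosine, and optcosine.
Kernel Density Estimation Applet: an online interactive example of kernel density estimation. Requires .NET 3.0 or later.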
