켄달 타우 거리

Kendall tau 순위 거리는 두 순위 목록 사이의 쌍별 불일치 수를 계산하는 미터법(거리 함수)이다.거리가 클수록 두 명단의 차이점은 더 크다.Kendall tau 거리는 버블 정렬 알고리즘이 한 리스트를 다른 리스트와 같은 순서로 배치하기 위해 걸리는 스왑 수와 같기 때문에 버블 정렬 거리라고도 불린다.Kendall tau 거리는 Maurice Kendall에 의해 만들어졌다.

정의

두 $\tau _{2}$ 목록 사이의 Kendall tau 순위 거리는 $\tau _{1}$ 1 ${\$ {1} $\tau _{1}$ 및 $\tau _{1}$ $\tau _{2}$ 2 ${\$

K_{d}(\tau _{1},\tau _{2})= \{(i,j):i<j,[\tau _{1}(i)<\tau _{1}(j)\wedge \tau _{2}(i)>\tau _{2}(j)]\vee [\tau _{1}(i)>\tau _{1}(j)\wedge \tau _{2}(i)<\tau _{2}(j)]\} .

어디에

$\tau _{1}(i)$ $\tau _{1}(i)$ ( $\tau _{1}(i)$ ) ${\displaystyle \tau_{1}($ $i$ $)}$ 및 $\tau _{2}(i)$ 2 ( $\tau _{2}$ $) {\displaystyle \tau_$ ${$ $1$ }}는 $\tau _{1}$ 각각 $\tau _{2}$ $\tau _{1}$ ${\$ $\tau_{$ 2}}의 $요소$ i $i$ ${\$ i $}$ 의 순위다 $\tau _{2}(i)$ .

$K_{d}(\tau _{1},\tau _{2})$ will be equal to 0 if the two lists are identical and ${\frac {1}{2}}n(n-1)$ (where $n$ is the list size) if one list is the reverse of the other.

Kendall tau 거리는 또한 다음과 같이 정의될 수 있다.

{\displaystyle K_{1}(\tau _{1},\tau _{2}=\sum _{\i,j\}\in P,i<i>{\bar{K}_{k}_{i,j}(\tau _{1},\tau _{2})

어디에

P는 $\tau _{1}$ $\tau _{1}$ ${\$ 및 $\tau _{1}$ $\tau _{2}$ $\tau _{2}$ ${\$ }}의 순서가 없는 별개의 원소들의 집합이다.
${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ $\tau _{1}$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ 의 ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ( ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ , ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ 2 ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ) ${\displaystyle {\bar{K}_{i,j}(\tau_{1},\tau_{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ = $\tau _{1}$ ${\$ $\tau _{2}$ $\tau _{2}$ ${\$ 0
${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ K ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ 의 ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ( ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ , ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ ) ${\displaystyle {\bar{K}_{i,j}(\tau_{1},\tau_{2})$ ${\bar {K}}_{i,j}(\tau _{1},\tau _{2})$ = i와 j가 $\tau _{1}$ $\tau _{1}$ ${\$ $τ$ 2 $\tau _{2}.$ . ${\displaystytau_{2}.$ $}$

Kendall tau 거리는 불협화음 쌍의 총 수로 정의될 수도 있다.

순위 내 Kendall tau 거리: 순열(또는 순위)은 0과 N-1 사이의 각 정수가 정확히 한 번 나타나는 N 정수의 배열이다.

두 순위 사이의 Kendall tau 거리는 두 순위에서 순서가 다른 쌍의 수입니다.예를 들어 0 3 1 6 2 5 4와 1 0 3 6 4 2 5 사이의 켄달 타우 거리는 4인데, 0-1, 3-1, 2-4, 5-4는 두 순위에서 순서가 다르지만 다른 쌍은 모두 순서가 같기 때문이다.^[1]

The normalized Kendall tau distance $K_{n}$ is ${\frac {K_{d}}{{\frac {1}{2}}n(n-1)}}={\frac {2K_{d}}{n(n-1)}}$ and therefore lies in the interval [0,1].

If Kendall tau distance function is performed as $K(L1,L2)$ instead of $K(\tau _{1},\tau _{2})$ (where $\tau _{1}$ and $\tau _{2}$ are the rankings of $L1$ and ${\displ$ $aystyle L2}$ 원소 각각 $L2$ ), 그러면 삼각 불평등이 보장되지 않는다.삼각 불평등은 리스트에 반복이 있는 경우에도 실패하는 경우가 있다.그래서 우리는 더 이상 미터법을 다루지 않는다.

일반화된 Kendall tau 거리 버전은 다른 항목과 순위 내 다른 위치에 가중치를 부여하기 위해 제안되었다.^[2]

Kendall tau 순위 상관 계수와 비교

켄들 타우 거리(Kd{\displaystyle K_{d}})는 통계에 사용되는 켄들 타우안 된다 혼동해서는상관 계수(Kc{\displaystyle K_{c})와 순위.

They are related by $K_{c}=1-4K_{d}/(n(n-1))$ , $K_{d}=(1-K_{c})(n(n-1))/4$

Or simpler by $K_{c}=1-2K_{n},K_{n}=(1-K_{c})/2$ where $K_{n}$ is the normalised distance $2K_{d}/(n(n-1))$ see above)

여전히 그것들은 근본적으로 다른 개념들이다.

거리는 0과 $n(n-1)/2$ 사이의 값 $n(n-1)/2$ - 1 ) $n(n-1)/2$ / $n(n-1)/2$ $n(n-1)/2$ 이다 $n(n-1)/2$ (정상화된 거리는 0과 1)

상관관계는 -1과 1사이에 있다.

등가 사이의 거리는 0이고, 등가와의 상관관계는 1이다.

역행 간 거리는 $n(n-1)/2$ ( $n(n-1)/2$ - 1 $n(n-1)/2$ ) $n(n-1)/2$ / 2 $n(n-1)/2$ 이고 $n(n-1)/2$ 역행 간 상관관계는 -1이다.

예를 들어 순위 A>B>C>D와 A>B>C>D를 비교하면 거리는 0이다. 상관관계는 1이다.

순위 A>B>C>D와 D>C>B>A의 거리를 비교하면 -1이다.

A>B>C>D와 B>D>의 순위 비교A>C 거리는 3이고 상관관계는 0이다.

예

한 그룹이 다섯 명으로 이루어진 그룹을 키와 몸무게별로 순위를 매긴다고 가정합시다.

사람	A	B	C	D	E	순위
높이를 기준으로 순위 지정	1	2	3	4	5	A>B>C>D>E
무게별 순위 매기기	3	4	1	2	5	C>D>A>B>E

여기서 A는 키가 크고 세 번째로 무겁고, B는 키가 두 번째로 크고 네 번째로 무겁다.

Kendall tau 거리를 계산하려면 각 사람을 다른 모든 사람과 짝을 지어 목록 1의 값이 목록 2의 값과 반대 순서로 있는 횟수를 세십시오.

짝을	높이	무게	카운트
(A,B)	1 < 2	3 < 4
(A,C)	1 < 3	3 > 1	X
(A,D)	1 < 4	3 > 2	X
(A,E)	1 < 5	3 < 5
(B,C)	2 < 3	4 > 1	X
(B,D)	2 < 4	4 > 2	X
(B,E)	2 < 5	4 < 5
(C,D)	3 < 4	1 < 2
(C,E)	3 < 5	1 < 5
(D,E)	4 < 5	2 < 5

값이 반대 순서로 되어 있는 네 쌍이 있기 때문에 켄달 타우 거리는 4이다.표준화된 Kendall tau 거리는

{\frac {4}{5(5-1)/2}}=0.4.

값이 0.4이면 두 리스트 사이의 순서에서 쌍의 40%가 다르다는 것을 나타낸다.

Kendall tau 거리 계산

Python(NumPy 사용)에서 순진한 구현은 다음과 같다.

수입하다 불결한 로서 np  반항하다 normalized_kendall_tau_distance(값1, 값2):     """켄달 타우 거리를 계산하십시오."""     n = 렌(값1)     주장하다 렌(값2) == n, "두 목록 모두 길이가 같아야 함"     i, j = np.메쉬그리드(np.아랑주름하다(n), np.아랑주름하다(n))     a = np.아그소트(값1)     b = np.아그소트(값2)     무질서한 = np.논리_또는(np.논리_및(a[i] < a[j], b[i] > b[j]), np.논리_및(a[i] > a[j], b[i] < b[j])).합계를 내다()     돌아오다 무질서한 / (n * (n - 1))

그러나 이를 위해서는 n ${\$ n $^{2}}$ 개의 $n^{2}$ 메모리가 필요하므로 대형 어레이에서는 비효율적이다.

Given two rankings $\tau _{1},\tau _{2}$ , it is possible to rename the items such that $\tau _{1}=(1,2,3,...)$ . Then, the problem of computing the Kendall tau distance reduces to computing the number of inversions in $\tau _{2}$ $\tau _{2}$ — $i,j$ pair $수$ $i<j$ i , j {\ $displaystyle i<j},$ $\tau _{2}(i)>\tau _{2}(j)$ 2 $\tau _{2}(i)>\tau _{2}(j)$ ( $\tau _{2}(i)>\tau _{2}(j)$ ) $\tau _{2}(i)>\tau _{2}(j)$ $\tau _{2}(i)>\tau _{2}(j)$ ( $\tau _{2}(i)>\tau _{2}(j)$ ) ${\displaystyle \tau _{2}(i)\tau _{2}(j$ 이 숫자를 계산하는 알고리즘은 여러 가지가 있다.

병합 정렬을 기반으로 하는 간단한 알고리즘에는 $O(n\log n)$ O $O(n\log n)$ log $O(n\log n)$ ) ${\displaystyle$ O(n $\log$ n $)}$ 이(가) 필요하다 $O(n\log n)$ ^[3]
보다 고급 알고리즘에는 시간 $O(n{\sqrt {\log {n}}})$ $O(n{\sqrt {\log {n}}})$ $){\displaystyle O(n{\sqrt {\log{n}}})$ 가 필요하다 $O(n{\sqrt {\log {n}}})$ ^[4]

여기에 기본적인 C 구현이 있다.

#include <stdbool.h>  인트로 켄달토우(키가 작은 x[], 키가 작은 y[], 인트로 렌) {     인트로 i, j, v = 0;     바가지 긁다 a, b;      을 위해 (i = 0; i < 렌; i++) {         을 위해 (j = i + 1; j < 렌; j++) {              a = x[i] < x[j] && y[i] > y[j];             b = x[i] > x[j] && y[i] < y[j];              만일 (a    b)                 v++;          }     }      돌아오다 복근(v); }  둥둥 뜨다 정상화하다(인트로 kt, 인트로 렌) {     돌아오다 kt / (렌 * (렌 - 1) / 2.0); }

참고 항목

참조

^ "Sorting Applications".
^ Ravi Kumar and Sergei Vassilvitskii (2010). Generalized Distances between Rankings (PDF).
^ Ionescu, Vlad. "calculating the number of "inversions" in a permutation". Stack overflow. Retrieved 24 February 2017.
^ Chan, Timothy M.; Pătraşcu, Mihai (2010). "Counting Inversions, Offline Orthogonal Range Counting, and Related Problems". Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. p. 161. CiteSeerX 10.1.1.208.2715. doi:10.1137/1.9781611973075.15. ISBN 978-0-89871-701-3.

Fagin, R.; Kumar, R.; Sivakumar, D. (2003). "Comparing top k lists". SIAM Journal on Discrete Mathematics. 17 (1): 134–160. CiteSeerX 10.1.1.86.3234. doi:10.1137/S0895480102412856. S2CID 6249357.
Kendall, M. (1948). "Rank Correlation Methods". Charles Griffin & Company Limited. {{cite journal}}:Cite 저널은 필요로 한다. journal=(도움말)
Kendall, M. (1938). "A New Measure of Rank Correlation". Biometrika. 30 (1/2): 81–89. doi:10.2307/2332226. JSTOR 2332226.

외부 링크

온라인 소프트웨어: Kendall의 tau 순위 상관 관계 계산

[1] "Sorting Applications".

[2] Ravi Kumar and Sergei Vassilvitskii (2010). Generalized Distances between Rankings (PDF).

[3] Ionescu, Vlad. "calculating the number of "inversions" in a permutation". Stack overflow. Retrieved 24 February 2017.

[4] Chan, Timothy M.; Pătraşcu, Mihai (2010). "Counting Inversions, Offline Orthogonal Range Counting, and Related Problems". Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. p. 161. CiteSeerX 10.1.1.208.2715. doi:10.1137/1.9781611973075.15. ISBN 978-0-89871-701-3.

[1]

[2]

[3]

[4]

Search

켄달 타우 거리

네임스페이스

더

목차

정의

Kendall tau 순위 상관 계수와 비교

예

Kendall tau 거리 계산

참고 항목

참조

외부 링크