콜모고로프 구조함수

1973년 안드레이 콜모고로프는 통계와 모델 선택에 대한 비확률론적 접근방식을 제안했다.각 기준점을 유한 이항 문자열로 하고 모델은 유한 이항 문자열 집합으로 한다.주어진 최대 Kolmogorov 복잡성의 모델로 구성된 모델 클래스를 고려하십시오.개별 데이터 문자열의 Kolmogorov 구조 함수는 모델 클래스의 복잡성 수준 제약 조건과 데이터를 포함하는 클래스에서 모델의 최소 로그 카디널리티 사이의 관계를 나타낸다.구조 함수는 개별 데이터 문자열의 모든 확률적 특성을 결정한다. 즉, 모든 제약된 모델 클래스에 대해 실제 모델이 고려된 모델 클래스에 있는지 여부에 관계없이 클래스에서 개별 가장 적합한 모델을 결정한다.고전적인 사례에서 우리는 확률 분포를 가진 일련의 데이터에 대해 이야기하며, 그 속성은 기대치의 그것들이다.이와는 대조적으로, 여기서는 개별 데이터 문자열과 개별 문자열의 속성을 중점적으로 다룬다.이 설정에서 속성은 고전적인 경우와 같이 높은 확률보다는 확실성을 가지고 있다.Kolmogorov 구조 함수는 개별 데이터에 대한 개별 모델의 적합도를 정밀하게 정량화한다.

Kolmogorov 구조 함수는 Kolmogorov 복잡성 이론으로도 알려진 알고리즘 정보 이론에서 복잡성이 증가하는 모델을 이용하여 문자열의 구조를 기술하는 데 사용된다.

콜모고로프의 정의

콜모고로프(왼쪽)는 (탈린, 1973년)에서 구조 기능에 대해 이야기한다.

구조함수는 원래 1973년 탈린에서 열린 소련 정보이론 심포지엄에서 콜모고로프가 제안했지만, 이러한 결과는 182페이지에 발표되지^[1] 않았다.그러나 그 결과는 1974년에^[2] 발표되었는데, 콜모고로프 자신이 작성한 유일한 기록이다.그의 마지막 과학적 진술 중 하나는 (L.A에 의해 러시아 원문에서 번역된) 것이다.레빈:

각각의 건설적 객체에 $\Phi _{x}(k)$ $\Phi _{x}(k)$ x ( k $\Phi _{x}(k)$ ) $\Phi _{x}(k)$ 함수 즉, 최대 k에서 복잡성의 정의를 허용하는 x-content 세트의 최소 카디널리티 로그에 해당한다.요소 x 자체가 간단한 정의를 허용하면, $\Phi$ k에도 $display$ {\displaystyle \ $Phi }$ 함수가 0으로 떨어진다 $\Phi$ .그러한 정의가 결여되어 있는 원소는 부정적인 의미에서 "랜덤"이다.But it is positively "probabilistically random" only when function $\Phi$ having taken the value $\Phi _{0}$ at a relatively small $k=k_{0}$ , then changes approximately as $\Phi (k)=\Phi _{0}-(k-k_{0})$ .
— Kolmogorov, announcement cited above

현대적 정의

그것은 Cover와 Thomas에서 논의된다.^[1]베레쉬차긴과 비타니에서^[3] 광범위하게 연구되어 주요 성질도 해결된다.Kolmogorov 구조 함수는 다음과 같이 쓸 수 있다.

h_{x}(\alpha )=\min _{S}\{\log S :x\in S,K(S)\leq \alpha \}}

where $x$ is a binary string of length $n$ with $x\in S$ where $S$ is a contemplated model (set of n-length strings) for $x$ , $K(S)$ is the Kolmogorov complexity of $S$ and $\alpha$ $\alpha$ 은 $S$ 는) 사색된 $S$ $S$ 의 복잡성을 경계하는 음이 아닌 정수 값이다 $\alpha$ .Clearly, this function is nonincreasing and reaches $\log \{x\} =0$ for $\alpha =K(x)+c$ where $c$ is the required number of bits to change $x$ into $\{x\}$ and ${\displaystyl$ $e K(x)}$ 는 $K(x)$ $x$ ${\displaystyle x$ 의 Kolmogorov 복잡도 입니다.

알고리즘 충분한 통계량

$x$ $x$ 을(를) 포함하는 $S$ 세트 $S$ $S$ 을(를) 정의하여 $x$

K(S)+K(x|S)=K(x)+O(1)

( S

K(S)+K(x|S)=K(x)+O(1)

)

K(S)+K(x|S)=K(x)+O(1)

+ K

K(S)+K(x|S)=K(x)+O(1)

(

K(S)+K(x|S)=K(x)+O(1)

K(S)+K(x|S)=K(x)+O(1)

)

K(S)+K(x|S)=K(x)+O(1)

=

K(S)+K(x|S)=K(x)+O(1)

( x

K(S)+K(x|S)=K(x)+O(1)

)

K(S)+K(x|S)=K(x)+O(1)

+

K(S)+K(x|S)=K(x)+O(1)

( 1

K(S)+K(x|S)=K(x)+O(1)

)

{\displaystyle K(S)+K(x

S

)=K(x)+O(

1

K(S)+K(x|S)=K(x)+O(1)

함수 $h_{x}(\alpha )$ $h_{x}(\alpha )$ ( $h_{x}(\alpha )$ ) $h_{x}(\alpha )$ 은(는) 다음에 의해 정의된 대각선이라고 하는 대각선 아래에 고정된 독립 상수 이상을 결코 감소시키지 않는다 $h_{x}(\alpha )$ .

L(\alpha )+\alpha =K(x)

(

L(\alpha )+\alpha =K(x)

)

L(\alpha )+\alpha =K(x)

+

L(\alpha )+\alpha =K(x)

=

L(\alpha )+\alpha =K(x)

( x

L(\alpha )+\alpha =K(x)

)

{\displaystyle L(\alpha )+\alpha

= K

(x)}

.

특정 인수(예: $\alpha =K(x)+c$ = $\alpha =K(x)+c$ ( $\alpha =K(x)+c$ ) $\alpha =K(x)+c$ + $\alpha =K(x)+c$ ${\displaystyle \alpha$ = $K(x)+c})$ 에 $h_{x}$ $h_{x}$ $h_{x}$ x $h_{x}$ {\ $displaystyle h_{x}$ 그래프에 의해 일정한 거리 내에 접근한다. $\alpha =K(x)+c$ For these $\alpha$ 's we have $\alpha +h_{x}(\alpha )=K(x)+O(1)$ and the associated model $S$ (witness for $h_{x}(\alpha )$ ) is called an optimal set for $x$ , and its description o따라서 f $K(S)\leq \alpha$ ( $K(S)\leq \alpha$ ) $K(S)\leq \alpha$ $K(S)\leq \alpha$ 비트는 $K(S)\leq \alpha$ 알고리즘적으로 충분한 통계량이다.우리는 관습에 의해 '콜모고로프 복잡성'에 대한 '알고리즘'을 쓴다.알고리즘 충분한 통계량의 주요 속성은 다음과 같다: $S$ ${\displaystyle$ $S$ $}$ 이 $S$ (가) $x$ ${\displaystyle$ x $}$ 에 대한 알고리즘 충분한 통계라면 $x$

K(S)+\log |S|=K(x)+O(1)

(

K(S)+\log |S|=K(x)+O(1)

)

K(S)+\log |S|=K(x)+O(1)

+

K(S)+\log |S|=K(x)+O(1)

K(S)+\log |S|=K(x)+O(1)

K(S)+\log |S|=K(x)+O(1)

=

K(S)+\log |S|=K(x)+O(1)

( x

K(S)+\log |S|=K(x)+O(1)

)

K(S)+\log |S|=K(x)+O(1)

+

K(S)+\log |S|=K(x)+O(1)

( 1

K(S)+\log |S|=K(x)+O(1)

)

{\displaystyle K(S)+\log S

=

K(x)+O(1)}

.

즉, $\log |S|$ $S$ $S$ 을 $S$ (를) 사용하는 $x$ x $x$ 의 2부 코드로서 로그 $displaystyle \log$ S $}$ 비트에 $S$ $\log |S|$ $있는$ S $S$ 의 열거에 있는 $x$ $x$ ${\displaystystyle$ 의 $가장$ 짧은 1부 코드만큼 간결하다. $yle x}($ $K(x)$ ( $K(x)$ ) $K(x)$ 비트 $x$ $K(x)$ .이는 다음과 같이 쉽게 알 수 있다.

K(x)\leq K(x,S)+O(1)\leq K(S)+K(x S)+O(1)\leq K(S)+\log  S +O(1)\leq K(x)+O(1)

,

using straightforward inequalities and the sufficiency property, we find that $K(x S)=\log S +O(1)$ . (For example, given $S\ni x$ , we can describe $x$ self-delimitingly (you can determine its end) in $\log |S|+O(1)$ ${\displaystyle \log$ S $+O($ 1) $비트$ 따라서 임의성 결핍 $\log |S|-K(x|S)$ $\log |S|-K(x|S)$ $\log |S|-K(x|S)$ - $\log |S|-K(x|S)$ ( x $\log |S|-K(x|S)$ ) $\log S(x S)$ 은 S $S$ 의 $x$ x $x$ 의 $\log |S|-K(x|S)$ 상수로서 $S$ , $x$ ${\displaystytle x}$ 은 S의 전형적인 (임의) 요소라는 $x$ 뜻이다.그러나 $충분한$ 통계가 없는 $x$ x $x$ 을(를) 포함하는 $S$ $모델$ S $S$ 이(가) 있을 수 있다.An algorithmic sufficient statistic $S$ for $x$ has the additional property, apart from being a model of best fit, that $K(x,S)=K(x)+O(1)$ and therefore by the Kolmogorov complexity symmetry of information (the information about ${\displays$ $tyle x}$ in $S$ is about the same as the information about $S$ in x) we have $K(S x^{*})=O(1)$ : the algorithmic sufficient statistic $S$ is a model of best fit that is almost completely determined by $x$ . ( $∗{\$ x $^{*})$ 는 $x^{*}$ x $x$ 를 위한 최단 시간 프로그램이다. $x$ 그러한 최소한의 $\alpha$ $\alpha$ 과(와) 관련된 알고리즘적 충분 통계량을 알고리즘적 최소 충분 통계량이라고 한다 $\alpha$ .

그림 관련:MDL 구조함수 ∆ $\lambda _{x}(\alpha )$ ( $\lambda _{x}(\alpha )$ ) $\lambda _{x}(\alpha )$ 는 아래에 설명되어 $\lambda _{x}(\alpha )$ 있다.The Goodness-of-fit structure function $\beta _{x}(\alpha )$ is the least randomness deficiency (see above) of any model $S\ni x$ for $x$ such that $K(S)\leq \alpha$ . This structure function gives the goodness-of-fit of a model $문자열$ x에 대한 S ${\displaystyle$ S $}($ x 포함)모형이 낮으면 모형이 잘 맞고, 높으면 모형이 잘 맞지 않는다.If $\beta _{x}(\alpha )=0$ for some $\alpha$ then there is a typical model $S\ni x$ for $x$ such that $K(S)\leq \alpha$ and $x$ is typical (random) for S.즉, $S$ $S$ 이(가) x에 가장 적합한 모델이다 $S$ . 자세한 내용은 특히 및 를^[3] 참조하십시오^[1].^[4]

속성 선택

그래프가 최소 45도의 각도로 내려가는 제약조건 내에서, 그래프는 $K(x)$ 에서 시작하여 K $K(x)$ ( x $K(x)$ ) $K(x)$ 에서 대략적으로 끝나는 것으로 $K(x)$ 모든 그래프( $O(\log n)$ O $O(\log n)$ ( $O(\log n)$ $O(\log n)$ n $O(\log n)$ ) $O(\log n)$ 인수와 값의 가법 $O(\log n)$ )는 일부 데이터 x의 구조에 의해 실현되며 그 반대의 경우도 실현된다.그래프가 대각선에 먼저 도달하는 경우 인수(복잡성)는 최소 충분 통계량의 인수(복잡성)이다.이곳을 결정하는 것은 무리다.봐봐^[3]

주재산

구조 함수가 복잡성의 각 $\alpha$ 수준 $\alpha$ $\alpha$ 에서 $O(\log n)$ $O(\log n)$ ) ${\displaystyle O(\log$ n $)}$ 의 스트립 내에서 개별 문자열 x에 대해 $S$ 가장 $O(\log n)$ 적합한 모델 $S$ $S$ 을(으)^[3]로 선택할 수 있다는 것이 입증되었다.

MDL 변종

최소 설명 길이(MDL) 기능:The length of the minimal two-part code for x consisting of the model cost K(S) and the length of the index of x in S, in the model class of sets of given maximal Kolmogorov complexity $\alpha$ , the complexity of S upper bounded by $\alpha$ , is given by the MDL function or constrained MDL estimator:

\lambda _{x}(\alpha )=\min _{S}\{\Lambda(S):S\ni x,\;K(S)\leq \alpha \}

여기서 $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ( $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ) $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ = $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ + K $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ( $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ) $≥$ $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ( ) - $($ $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ( ) - $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ ( 1 ) \ $displaystyle \Lambda$ ( $S)=\log$ S +K $(S)\geq K($ x $\Lambda (S)=\log |S|+K(S)\geq K(x)-O(1)$ )-O(1)는 모델 S의 도움을 받아 x의 총 길이 이다.

주재산

구조 함수는 복잡성의 각 $\alpha$ 수준 $\alpha$ $\alpha$ 에서 $O(\log n)$ $){\displaystyle O(\log n)}$ 의 스트립 내에서 개별 문자열 x에 대해 가장 $O(\log n)$ 좋은 모델 S를 선택하도록 허용하지만 큰 확률은 아니다.^[3]

통계적용

위에서 개발된 수학은 발명가 조르마 리사넨에 의해 MDL의 기초가 되었다.^[5]

확률 모형

모든 계산 가능한 확률 분포 $P$ $P$ 에 $P$ 대해 다음 사항을 증명할^[6] 수 있다.

-\log P(x)=\log |S|+O(\log n)

-\log P(x)=\log |S|+O(\log n)

-\log P(x)=\log |S|+O(\log n)

-\log P(x)=\log |S|+O(\log n)

( x

-\log P(x)=\log |S|+O(\log n)

)

-\log P(x)=\log |S|+O(\log n)

=

-\log P(x)=\log |S|+O(\log n)

-\log P(x)=\log |S|+O(\log n)

-\log P(x)=\log |S|+O(\log n)

+ O

-\log P(x)=\log |S|+O(\log n)

(

-\log P(x)=\log |S|+O(\log n)

-\log P(x)=\log |S|+O(\log n)

n

-\log P(x)=\log |S|+O(\log n)

)

{\displaystyle -\log P(x)=\log S +O(\log

n

)}

.

For example, if $P$ is some computable distribution on the set $S$ of strings of length $n$ , then each $x\in S$ has probability ${\displaystyle P(x)=\exp(O(\log n))/ S$ $=n^{O(1)/S }$ . 콜모고로프의 구조함수는

h'_{x}(\alpha )=\min _{P}\{-\log P(x):P(x)>0,K(P)\leq \alpha \}}

길이 n의− 로그 ⁡ P())>와 함께 있습니다 x은 이진 문자열에 P의 x{\displaystyle)}, K){K(P)\displaystyle}은 콜모고로프 복잡도{년에 P{P\displaystyle}은 심사 숙고되는 모델(n의 계산할 수 있는 확률{\displaystyle n}-length 문자열)0{\displaystyle -\log P())>0}.displa $ystyle$ $P}$ 및 $P$ $\alpha$ $\alpha$ 은 사색된 P $P$ 의 복잡성을 경계하는 $P$ 정수 값이다 $\alpha$ .Clearly, this function is non-increasing and reaches $\log \{x\} =0$ for $\alpha =K(x)+c$ where c is the required number of bits to change $x$ into $\{x\}$ and $K(x)$ is the Kolmogor $x$ 의 난형 복잡성 ${\displaystyle$ x $x$ Then $h'_{x}(\alpha )=h_{x}(\alpha )+O(\log n)$ . For every complexity level $\alpha$ the function $h'_{x}(\alpha )$ is the Kolmogorov complexity version of the maximum likelihood (ML).

주재산

구조 함수가 복잡성의 각 $\alpha$ 수준 $\alpha$ $\alpha$ 에서 $($ $O(\log n)$ $O(\log n)$ ${\displaystyle x$ $}$ 에 대한 최상의 $S$ $x$ 모델 $S$ $S$ 을(를 $)$ 큰 확률은 아니지만 확실성 있게 $O(\log n)$ 선택할 수 있다는 것이 입증되었다.^[3]

MDL 변종 및 확률 모델

MDL 함수:The length of the minimal two-part code for x consisting of the model cost K(P) and the length of $-\log P(x)$ , in the model class of computable probability mass functions of given maximal Kolmogorov complexity $\alpha$ , the complexity of P upper bounded by ${\displaystyle \alph$ $}$ 은 $\alpha$ 는) MDL 함수 또는 제한된 MDL 추정기에 의해 제공됨:

\lambda '_{x}(\alpha )=\min _{P}\\lambda(P):P(x)>0,\;K(P)\leq \alpha \}

where $\Lambda (P)=-\log P(x)+K(P)\geq K(x)-O(1)$ is the total length of two-part code of x with help of model P.

주재산

MDL 함수는 복잡성의 각 $\alpha$ 수준 $\alpha$ $\alpha$ 에서 $O(\log n)$ $){\displaystyle$ O $(\log n)}$ 의 스트립 내에서 개별 문자열 x에 대해 가장 $O(\log n)$ 적합한 모델 P를 선택할 수 있다는 것이 입증되었다.^[3]

변형률 및 변성률로 확장

Kolmogorov 복잡성을 이용하여 개별 유한 시퀀스의 속도 왜곡과 개별 유한 시퀀스의^[7] 변성 이론으로 확장될 수 있는 것으로 나타났다.실제 압축기 프로그램을 이용한 실험은 성공리에 진행되었다.^[8]여기서 가정은 자연 데이터의 경우 Kolmogorov 복잡성이 양호한 압축기를 사용하는 압축 버전의 길이에서 멀지 않다는 것이다.

참조

^ ^a ^b ^c Cover, Thomas M.; Thomas, Joy A. (1991). Elements of information theory. New York: Wiley. pp. 175–178. ISBN 978-0471062592.
^ Uspekhi Mat의 모스크바 수학 학회를 위한 연설의 추상화.나우크 29권, 모스크바 수학 학회의 통신 155페이지의 이슈 4(178) (러시아어 판에서 영어로 번역되지 않음)
^ ^a ^b ^c ^d ^e ^f ^g Vereshchagin, N.K.; Vitanyi, P.M.B. (1 December 2004). "Kolmogorov's Structure Functions and Model Selection". IEEE Transactions on Information Theory. 50 (12): 3265–3290. arXiv:cs/0204037. doi:10.1109/TIT.2004.838346.
^ Gacs, P.; Tromp, J.T.; Vitanyi, P.M.B. (2001). "Algorithmic statistics". IEEE Transactions on Information Theory. 47 (6): 2443–2463. arXiv:math/0006233. doi:10.1109/18.945257.
^ Rissanen, Jorma (2007). Information and complexity in statistical modeling (Online-Ausg. ed.). New York: Springer. ISBN 978-0-387-36610-4.
^ A.Kh. Shen, Kolmogorov 감각에서의 (α, β)-stopchasticity의 개념과 그 속성인 소비에트 수학.독, 28:1(1983년), 295--299년
^ Vereshchagin, Nikolai K.; Vitanyi, Paul M.B. (1 July 2010). "Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity". IEEE Transactions on Information Theory. 56 (7): 3438–3454. arXiv:cs/0411014. doi:10.1109/TIT.2010.2048491.
^ de Rooij, Steven; Vitanyi, Paul (1 March 2012). "Approximating Rate-Distortion Graphs of Individual Data: Experiments in Lossy Compression and Denoising". IEEE Transactions on Computers. 61 (3): 395–407. arXiv:cs/0609121. doi:10.1109/TC.2011.25.

문학

Cover, T.M.; P. Gacs; R.M. Gray (1989). "Kolmogorov's contributions to Information Theory and Algorithmic Complexity". Annals of Probability. 17 (3): 840–865. doi:10.1214/aop/1176991250. JSTOR 2244387.
Kolmogorov, A. N.; Uspenskii, V. A. (1 January 1987). "Algorithms and Randomness". Theory of Probability and Its Applications. 32 (3): 389–412. doi:10.1137/1132060.
Li, M., Vitányi, P.M.B. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). New York: Springer. ISBN 978-0387339986., 특히 Kolmogorov 구조 기능에 대한 페이지 401–431과 개별 시퀀스의 속도 왜곡 및 변위에 대한 페이지 613–629.
Shen, A. (1 April 1999). "Discussion on Kolmogorov Complexity and Statistical Analysis". The Computer Journal. 42 (4): 340–342. doi:10.1093/comjnl/42.4.340.
V'yugin, V.V. (1987). "On Randomness Defect of a Finite Object Relative to Measures with Given Complexity Bounds". Theory of Probability and Its Applications. 32 (3): 508–512. doi:10.1137/1132071.
V'yugin, V. V. (1 April 1999). "Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences". The Computer Journal. 42 (4): 294–317. doi:10.1093/comjnl/42.4.294.

[CT91-1] Cover, Thomas M.; Thomas, Joy A. (1991). Elements of information theory. New York: Wiley. pp. 175–178. ISBN 978-0471062592.

[2] Uspekhi Mat의 모스크바 수학 학회를 위한 연설의 추상화.나우크 29권, 모스크바 수학 학회의 통신 155페이지의 이슈 4(178) (러시아어 판에서 영어로 번역되지 않음)

[VV04-3] ^ ^a ^b ^c ^d ^e ^f ^g Vereshchagin, N.K.; Vitanyi, P.M.B. (1 December 2004). "Kolmogorov's Structure Functions and Model Selection". IEEE Transactions on Information Theory. 50 (12): 3265–3290. arXiv:cs/0204037. doi:10.1109/TIT.2004.838346.

[4] Gacs, P.; Tromp, J.T.; Vitanyi, P.M.B. (2001). "Algorithmic statistics". IEEE Transactions on Information Theory. 47 (6): 2443–2463. arXiv:math/0006233. doi:10.1109/18.945257.

[5] Rissanen, Jorma (2007). Information and complexity in statistical modeling (Online-Ausg. ed.). New York: Springer. ISBN 978-0-387-36610-4.

[6] A.Kh. Shen, Kolmogorov 감각에서의 (α, β)-stopchasticity의 개념과 그 속성인 소비에트 수학.독, 28:1(1983년), 295--299년

[7] Vereshchagin, Nikolai K.; Vitanyi, Paul M.B. (1 July 2010). "Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity". IEEE Transactions on Information Theory. 56 (7): 3438–3454. arXiv:cs/0411014. doi:10.1109/TIT.2010.2048491.

[8] Rooij, Steven; Vitanyi, Paul (1 March 2012). "Approximating Rate-Distortion Graphs of Individual Data: Experiments in Lossy Compression and Denoising". IEEE Transactions on Computers. 61 (3): 395–407. arXiv:cs/0609121. doi:10.1109/TC.2011.25.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

Search