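Rough set

In computer science, a rough set, first described by the Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., a conventional set) in terms of a pair of sets which give the lower and the upper approximation of the original set. In the standard version of rough set theory (Pawlak 1991), the lower- and upper-approximation sets are crisp sets, but in other variations the approximating sets may be fuzzy sets.

Definitions

The following section contains an overview of the basic framework of rough set theory, as originally proposed by Zdzisław I. Pawlak, along with some of the key definitions. More formal properties and boundaries of rough sets can be found in Pawlak (1991) and the references cited there. The initial and basic theory of rough sets is sometimes referred to as "Pawlak rough sets" or "classical rough sets", as a means of distinguishing it from more recent extensions and generalizations.

Information system framework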

Let $I=(\mathbb {U} ,\mathbb {A} )$ be an information system (attribute-value system), where $\mathbb {U}$ is a non-empty, finite set of objects (the universe) and $\mathbb {A}$ is a non-empty, finite set of attributes such that $a:\mathbb {U} \rightarrow V_{a}$ for every $a\in \mathbb {A}$. $V_{a}$ is the set of values that attribute $a$ may take. The information table assigns a value $a(x)$ from $V_{a}$ to each attribute $a$ and object $x$ in the universe $\mathbb {U}$.

With any $P\subseteq \mathbb {A}$ there is an associated equivalence relation $\mathrm {IND} (P)$:

\mathrm {IND} (P)=\left\{(x,y)\in \mathbb {U} ^{2}\mid \forall a\in P,\ a(x)=a(y)\right\}

The relation $\mathrm {IND} (P)$ is called a $P$-indiscernibility relation. The partition of $\mathbb {U}$ is a family of all equivalence classes of $\mathrm {IND} (P)$ and is denoted by $\mathbb {U} /\mathrm {IND} (P)$ (or $\mathbb {U} /P$).

If $(x,y)\in \mathrm {IND} (P)$, then $x$ and $y$ are indiscernible (or indistinguishable) by attributes from $P$.

The equivalence classes of the $P$-indiscernibility relation are denoted $[x]_{P}$.

Example: equivalence-class structure

For example, consider the following information table:

Sample information system

Object	$P_{1}$	$P_{2}$	$P_{3}$	$P_{4}$	$P_{5}$
$O_{1}$	1	2	0	1	1
$O_{2}$	1	2	0	1	1
$O_{3}$	2	0	0	1	0
$O_{4}$	0	0	1	2	1
$O_{5}$	2	1	0	2	1
$O_{6}$	0	0	1	2	2
$O_{7}$	2	0	0	1	0
$O_{8}$	0	1	2	2	1
$O_{9}$	2	1	0	2	2
$O_{10}$	2	0	0	1	0

When the full set of attributes $P=\{P_{1},P_{2},P_{3},P_{4},P_{5}\}$ is considered, we see that we have the following seven equivalence classes:

{\begin{cases}\{O_{1},O_{2}\}\\\{O_{3},O_{7},O_{10}\}\\\{O_{4}\}\\\{O_{5}\}\\\{O_{6}\}\\\{O_{8}\}\\\{O_{9}\}\end{cases}}

Thus, the two objects within the first equivalence class, $\{O_{1},O_{2}\}$, cannot be distinguished from each other based on the available attributes, and the three objects within the second equivalence class, $\{O_{3},O_{7},O_{10}\}$, cannot be distinguished from one another based on the available attributes. The remaining five objects are each discernible from all other objects.

It is apparent that different attribute subset selections will in general lead to different indiscernibility classes. For example, if attribute $P=\{P_{1}\}$ alone is selected, we obtain the following, much coarser, equivalence-class structure:

{\begin{cases}\{O_{1},O_{2}\}\\\{O_{3},O_{5},O_{7},O_{9},O_{10}\}\\\{O_{4},O_{6},O_{8}\}\end{cases}}
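The two partitions of the sample information system can be reproduced with a few lines of code. The following is a minimal sketch, not a reference implementation; the names `ATTRS`, `TABLE` and `partition` are illustrative choices and do not come from the rough set literature.

```python
from collections import defaultdict

# Sample information system from the table above: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}


def partition(table, subset):
    """Return U/IND(subset): objects grouped by their values on `subset`."""
    cols = [ATTRS.index(a) for a in subset]
    groups = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical value tuples are indiscernible.
        groups[tuple(row[c] for c in cols)].add(obj)
    return {frozenset(g) for g in groups.values()}


full = partition(TABLE, ATTRS)     # all five attributes
coarse = partition(TABLE, ["P1"])  # attribute P1 only

for name, classes in (("P1..P5", full), ("P1", coarse)):
    print(name, sorted(sorted(c) for c in classes))
```

Grouping by the tuple of attribute values is one straightforward way to realise the indiscernibility relation: two objects land in the same group exactly when they agree on every selected attribute.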

Definition of a rough set

Let $X\subseteq \mathbb {U}$ be a target set that we wish to represent using attribute subset $P$; that is, we are told that an arbitrary set of objects $X$ comprises a single class, and we wish to express this class (i.e., this subset) using the equivalence classes induced by attribute subset $P$. In general, $X$ cannot be expressed exactly, because the set may include and exclude objects which are indistinguishable on the basis of attributes $P$.

For example, consider the target set $X=\{O_{1},O_{2},O_{3},O_{4}\}$, and let attribute subset $P=\{P_{1},P_{2},P_{3},P_{4},P_{5}\}$, the full available set of features. The set $X$ cannot be expressed exactly, because in $[x]_{P}$ the objects $\{O_{3},O_{7},O_{10}\}$ are indiscernible. Thus, there is no way to represent any set $X$ which includes $O_{3}$ but excludes objects $O_{7}$ and $O_{10}$.

However, the target set $X$ can be approximated using only the information contained within $P$ by constructing the $P$-lower and $P$-upper approximations of $X$:

{\underline {P}}X=\{x\mid [x]_{P}\subseteq X\}

{\overline {P}}X=\{x\mid [x]_{P}\cap X\neq \emptyset \}

Lower approximation and positive region

The $P$-lower approximation, or positive region, is the union of all equivalence classes in $[x]_{P}$ which are contained by (i.e., are subsets of) the target set. In the example, ${\underline {P}}X=\{O_{1},O_{2}\}\cup \{O_{4}\}$, the union of the two equivalence classes in $[x]_{P}$ which are contained in the target set. The lower approximation is the complete set of objects in $\mathbb {U} /P$ that can be positively (i.e., unambiguously) classified as belonging to target set $X$.

Upper approximation and negative region

The $P$-upper approximation is the union of all equivalence classes in $[x]_{P}$ which have non-empty intersection with the target set. In the example, ${\overline {P}}X=\{O_{1},O_{2}\}\cup \{O_{4}\}\cup \{O_{3},O_{7},O_{10}\}$, the union of the three equivalence classes in $[x]_{P}$ that have non-empty intersection with the target set. The upper approximation is the complete set of objects in $\mathbb {U} /P$ that cannot be positively (i.e., unambiguously) classified as belonging to the complement ($\overline {X}$) of the target set $X$. In other words, the upper approximation is the complete set of objects that are possibly members of the target set $X$.

The set $\mathbb {U} -{\overline {P}}X$ therefore represents the negative region, containing the set of objects that can be definitely ruled out as members of the target set.

Boundary region

The boundary region, given by the set difference ${\overline {P}}X-{\underline {P}}X$, consists of those objects that can neither be ruled in nor ruled out as members of the target set $X$.

In summary, the lower approximation of a target set is a conservative approximation consisting of only those objects which can positively be identified as members of the set. (These objects have no indiscernible "clones" which are excluded by the target set.) The upper approximation is a liberal approximation which includes all objects that might be members of the target set. (Some objects in the upper approximation may not be members of the target set.) From the perspective of $\mathbb {U} /P$, the lower approximation contains objects that are members of the target set with certainty (probability = 1), while the upper approximation contains objects that are members of the target set with non-zero probability (probability > 0).
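The lower approximation, upper approximation, boundary region and negative region for the running example, $X=\{O_{1},O_{2},O_{3},O_{4}\}$ with all five attributes, can be computed directly from the definitions. This is a small sketch under the assumption that the information table fits in memory; the helper names `classes` and `approximations` are illustrative.

```python
from collections import defaultdict

# Sample information system: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}


def classes(subset):
    """Equivalence classes of the indiscernibility relation on `subset`."""
    cols = [ATTRS.index(a) for a in subset]
    groups = defaultdict(set)
    for obj, row in TABLE.items():
        groups[tuple(row[c] for c in cols)].add(obj)
    return list(groups.values())


def approximations(target, subset):
    """Return (lower, upper) approximations of `target` with respect to `subset`."""
    lower, upper = set(), set()
    for cls in classes(subset):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


X = {"O1", "O2", "O3", "O4"}
lower, upper = approximations(X, ATTRS)
boundary = upper - lower        # neither ruled in nor ruled out
negative = set(TABLE) - upper   # definitely not in X

print("lower   ", sorted(lower))
print("upper   ", sorted(upper))
print("boundary", sorted(boundary))
print("negative", sorted(negative))
```

Here the boundary region is the single class $\{O_{3},O_{7},O_{10}\}$, which straddles the target set because only $O_{3}$ belongs to $X$.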

The rough set

The tuple $\langle {\underline {P}}X,{\overline {P}}X\rangle$ composed of the lower and upper approximation is called a rough set; thus, a rough set is composed of two crisp sets, one representing a lower boundary of the target set $X$, and the other representing an upper boundary of the target set $X$.

The accuracy of the rough-set representation of the set $X$ can be given (Pawlak 1991) by the following:

\alpha _{P}(X)={\frac {\left|{\underline {P}}X\right|}{\left|{\overline {P}}X\right|}}

That is, the accuracy of the rough set representation of $X$, $\alpha _{P}(X)$, $0\leq \alpha _{P}(X)\leq 1$, is the ratio of the number of objects which can positively be placed in $X$ to the number of objects that can possibly be placed in $X$; this provides a measure of how closely the rough set approximates the target set. Clearly, when the upper and lower approximations are equal (i.e., the boundary region is empty), then $\alpha _{P}(X)=1$ and the approximation is perfect; at the other extreme, whenever the lower approximation is empty, the accuracy is zero (regardless of the size of the upper approximation).
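As a worked instance of the accuracy measure, the sketch below evaluates it for the running example, where the lower approximation of $X=\{O_{1},O_{2},O_{3},O_{4}\}$ has three objects and the upper approximation has six. The function name `accuracy` and the choice to raise an error for an empty upper approximation are assumptions made for illustration.

```python
from fractions import Fraction


def accuracy(lower, upper):
    """alpha_P(X) = |lower| / |upper|; assumes a non-empty upper approximation."""
    if not upper:
        raise ValueError("upper approximation is empty (the target set is empty)")
    return Fraction(len(lower), len(upper))


# Approximations of X = {O1, O2, O3, O4} on the full attribute set.
lower = {"O1", "O2", "O4"}
upper = {"O1", "O2", "O3", "O4", "O7", "O10"}

alpha = accuracy(lower, upper)
print(alpha, float(alpha))  # 1/2 0.5
```

An accuracy of 0.5 reflects that half of the objects that might belong to $X$ can be placed in it with certainty.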

Objective analysis

Rough set theory is one of many methods that can be employed to analyse uncertain (including vague) systems, although it is less common than the more traditional methods of probability, statistics, entropy and Dempster-Shafer theory. However, a key difference, and a unique strength, of using classical rough set theory is that it provides an objective form of analysis (Pawlak et al. 1995). Unlike the other methods mentioned above, classical rough set analysis requires no additional information, external parameters, models, functions, grades or subjective interpretations to determine set membership; instead it uses only the information presented within the given data (Düntsch and Gediga 1995). More recent adaptations of rough set theory, such as dominance-based, decision-theoretic and fuzzy rough sets, have introduced more subjectivity to the analysis.

Definability

In general, the upper and lower approximations are not equal; in such cases, we say that target set $X$ is undefinable or roughly definable on attribute set $P$. When the upper and lower approximations are equal (i.e., the boundary is empty), ${\overline {P}}X={\underline {P}}X$, then the target set $X$ is definable on attribute set $P$. We can distinguish the following special cases of undefinability:

Set $X$ is internally undefinable if ${\underline {P}}X=\emptyset$ and ${\overline {P}}X\neq \mathbb {U}$. This means that on attribute set $P$, there are no objects which we can be certain belong to target set $X$, but there are objects which we can definitively exclude from set $X$.
Set $X$ is externally undefinable if ${\underline {P}}X\neq \emptyset$ and ${\overline {P}}X=\mathbb {U}$. This means that on attribute set $P$, there are objects which we can be certain belong to target set $X$, but there are no objects which we can definitively exclude from set $X$.
Set $X$ is totally undefinable if ${\underline {P}}X=\emptyset$ and ${\overline {P}}X=\mathbb {U}$. This means that on attribute set $P$, there are no objects which we can be certain belong to target set $X$, and there are no objects which we can definitively exclude from set $X$. Thus, on attribute set $P$, we cannot decide whether any object is, or is not, a member of $X$.
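The case distinction above can be written as a small decision function. This is a sketch only: the function name `definability` and its string labels are illustrative, and the label "roughly definable" is used here for the remaining case in which the lower approximation is non-empty and the upper approximation is not the whole universe.

```python
def definability(lower, upper, universe):
    """Classify a target set from its approximations on some attribute set."""
    if lower == upper:
        return "definable"
    if not lower and upper == universe:
        return "totally undefinable"
    if not lower:
        return "internally undefinable"
    if upper == universe:
        return "externally undefinable"
    return "roughly definable"


U = {f"O{i}" for i in range(1, 11)}

# X = {O1, O2, O3, O4} approximated with all five attributes of the sample table.
with_all = definability({"O1", "O2", "O4"},
                        {"O1", "O2", "O3", "O4", "O7", "O10"}, U)

# The same X approximated with P1 alone: the classes are {O1, O2},
# {O3, O5, O7, O9, O10} and {O4, O6, O8}, so only {O1, O2} lies inside X
# while every class overlaps X.
with_p1 = definability({"O1", "O2"}, U, U)

print(with_all)  # roughly definable
print(with_p1)   # externally undefinable
```

The second call shows how discarding attributes can move a target set into one of the special cases: with $P_{1}$ alone, no object can be excluded from $X$.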

Reduct and core

An interesting question is whether there are attributes in the information system (attribute-value table) which are more important to the knowledge represented in the equivalence-class structure than other attributes. Often, we wonder whether there is a subset of attributes which can, by itself, fully characterize the knowledge in the database; such an attribute set is called a reduct.

Formally, a reduct is a subset of attributes $\mathrm {RED} \subseteq P$ such that

$[x]_{\mathrm {RED} }=[x]_{P}$, that is, the equivalence classes induced by the reduced attribute set $\mathrm {RED}$ are the same as the equivalence-class structure induced by the full attribute set $P$.
the attribute set $\mathrm {RED}$ is minimal, in the sense that $[x]_{(\mathrm {RED} -\{a\})}\neq [x]_{P}$ for any attribute $a\in \mathrm {RED}$; in other words, no attribute can be removed from set $\mathrm {RED}$ without changing the equivalence classes $[x]_{P}$.

A reduct can be thought of as a sufficient set of features; sufficient, that is, to represent the category structure. In the example table above, attribute set $\{P_{3},P_{4},P_{5}\}$ is a reduct: the information system projected on just these attributes possesses the same equivalence-class structure as that expressed by the full attribute set:

{\begin{cases}\{O_{1},O_{2}\}\\\{O_{3},O_{7},O_{10}\}\\\{O_{4}\}\\\{O_{5}\}\\\{O_{6}\}\\\{O_{8}\}\\\{O_{9}\}\end{cases}}

Attribute set $\{P_{3},P_{4},P_{5}\}$ is a reduct because eliminating any of these attributes causes a collapse of the equivalence-class structure, with the result that $[x]_{\mathrm {RED} }\neq [x]_{P}$.

The reduct of an information system is not unique: there may be many subsets of attributes which preserve the equivalence-class structure (i.e., the knowledge) expressed in the information system. In the example information system above, another reduct is $\{P_{1},P_{2},P_{5}\}$, producing the same equivalence-class structure as $[x]_{P}$.

The set of attributes which is common to all reducts is called the core: the core is the set of attributes possessed by every reduct, and therefore consists of attributes which cannot be removed from the information system without causing a collapse of the equivalence-class structure. The core may be thought of as the set of necessary attributes; necessary, that is, for the category structure to be represented. In the example, the only such attribute is $\{P_{5}\}$; any one of the other attributes can be removed singly without damaging the equivalence-class structure, and hence these are all dispensable. However, removing $\{P_{5}\}$ by itself does change the equivalence-class structure, and thus $\{P_{5}\}$ is the indispensable attribute of this information system, and hence the core.

It is possible for the core to be empty, which means that there is no indispensable attribute: any single attribute in such an information system can be deleted without altering the equivalence-class structure. In such cases, there is no essential or necessary attribute which is required for the class structure to be represented.
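For a table as small as the sample information system, the reducts and the core can be found by exhaustive search. The sketch below is a brute-force illustration, not a practical algorithm: it tests every attribute subset, so its cost grows exponentially with the number of attributes. The helper names are illustrative.

```python
from itertools import combinations

# Sample information system: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}


def partition(subset):
    """Equivalence classes induced by the attributes in `subset`."""
    cols = [ATTRS.index(a) for a in subset]
    groups = {}
    for obj, row in TABLE.items():
        groups.setdefault(tuple(row[c] for c in cols), set()).add(obj)
    return {frozenset(g) for g in groups.values()}


def reducts():
    """Enumerate all reducts by testing subsets in order of increasing size."""
    target = partition(ATTRS)
    found = []
    for size in range(1, len(ATTRS) + 1):
        for subset in combinations(ATTRS, size):
            candidate = set(subset)
            # A superset of a known reduct cannot be minimal.
            if any(r <= candidate for r in found):
                continue
            if partition(subset) == target:
                found.append(candidate)
    return found


all_reducts = reducts()
core = set.intersection(*all_reducts)  # attributes shared by every reduct

for r in all_reducts:
    print(sorted(r))
print("core:", sorted(core))
```

Because subsets are visited smallest first and supersets of an already found reduct are skipped, every set that is kept is minimal. On the sample table the search returns the two reducts named in the text, along with any others that exist, and a core of $\{P_{5}\}$.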

Attribute dependency

One of the most important aspects of database analysis or data acquisition is the discovery of attribute dependencies; that is, we wish to discover which variables are strongly related to which other variables. Generally, it is these strong relationships that will warrant further investigation, and that will ultimately be of use in predictive modeling.

In rough set theory, the notion of dependency is defined very simply. Let us take two (disjoint) sets of attributes, set $P$ and set $Q$, and inquire what degree of dependency obtains between them. Each attribute set induces an (indiscernibility) equivalence-class structure, the equivalence classes induced by $P$ given by $[x]_{P}$, and the equivalence classes induced by $Q$ given by $[x]_{Q}$.

Let $[x]_{Q}=\{Q_{1},Q_{2},Q_{3},\dots ,Q_{N}\}$, where $Q_{i}$ is a given equivalence class from the equivalence-class structure induced by attribute set $Q$. Then, the dependency of attribute set $Q$ on attribute set $P$, $\gamma _{P}(Q)$, is given by

\gamma _{P}(Q)={\frac {\sum _{i=1}^{N}\left|{\underline {P}}Q_{i}\right|}{\left|\mathbb {U} \right|}}\leq 1

That is, for each equivalence class $Q_{i}$ in $[x]_{Q}$, we add up the size of its lower approximation by the attributes in $P$, i.e., ${\underline {P}}Q_{i}$. The size of this approximation (as above, for an arbitrary set $X$) is the number of objects which on attribute set $P$ can be positively identified as belonging to target set $Q_{i}$. Added across all equivalence classes in $[x]_{Q}$, the numerator above represents the total number of objects which, based on attribute set $P$, can be positively categorized according to the classification induced by attributes $Q$. The dependency ratio therefore expresses the proportion (within the entire universe) of such classifiable objects. The dependency $\gamma _{P}(Q)$ can be interpreted as the proportion of objects in the information system for which it suffices to know the values of the attributes in $P$ to determine the values of the attributes in $Q$.

Another, intuitive, way to consider dependency is to take the partition induced by $Q$ as the target class $C$, and consider $P$ as the attribute set we wish to use in order to "re-construct" the target class $C$. If $P$ can completely reconstruct $C$, then $Q$ depends totally upon $P$; if $P$ results in a poor and perhaps random reconstruction of $C$, then $Q$ does not depend upon $P$ at all.

Thus, this measure of dependency expresses the degree of functional (i.e., deterministic) dependency of attribute set $Q$ on attribute set $P$; it is not symmetric. The relationship of this notion of attribute dependency to more traditional information-theoretic (i.e., entropic) notions of attribute dependence has been discussed in a number of sources (e.g., Pawlak, Wong, & Ziarko 1988; Yao & Yao 2002; Wong, Ziarko, & Ye 1986; Quafafou & Boussouf 2000).
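The dependency measure can be evaluated on the sample information system to see both a total dependency and the lack of symmetry. The sketch below is illustrative; the choice of attribute sets ($\{P_{1},P_{2},P_{3}\}$, $\{P_{1}\}$ and $\{P_{4}\}$) is an assumption made for the example and is not taken from the cited sources.

```python
from fractions import Fraction

# Sample information system: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}


def partition(subset):
    """Equivalence classes induced by the attributes in `subset`."""
    cols = [ATTRS.index(a) for a in subset]
    groups = {}
    for obj, row in TABLE.items():
        groups.setdefault(tuple(row[c] for c in cols), set()).add(obj)
    return list(groups.values())


def dependency(P, Q):
    """gamma_P(Q): share of objects whose Q-class is fully determined by P."""
    p_classes = partition(P)
    positive = 0
    for q_cls in partition(Q):
        # Size of the P-lower approximation of this Q-class.
        positive += sum(len(c) for c in p_classes if c <= q_cls)
    return Fraction(positive, len(TABLE))


g_total = dependency(["P1", "P2", "P3"], ["P4"])  # P4 is determined by P1..P3
g_partial = dependency(["P1"], ["P4"])            # P1 alone decides half the objects
g_reverse = dependency(["P4"], ["P1"])            # P4 decides no P1-class

print(g_total, g_partial, g_reverse)  # 1 1/2 0
```

The last two values illustrate the asymmetry: knowing $P_{1}$ settles the value of $P_{4}$ for five of the ten objects, whereas knowing $P_{4}$ never settles the value of $P_{1}$.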

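Rule extraction

The category representations discussed above are all extensional in nature; that is, a category or complex class is simply the sum of all its members. To represent a category is, then, just to be able to list or identify all the objects belonging to that category. However, extensional category representations have very limited practical use, because they provide no insight for deciding whether novel (never-before-seen) objects are members of the category.

What is generally desired is an intensional description of the category, a representation of the category based on a set of rules that describe the scope of the category. The choice of such rules is not unique, and therein lies the issue of inductive bias. See version space and model selection for more about this issue.

There are a few rule-extraction methods. We will start from a rule-extraction procedure based on Ziarko & Shan (1995).

Decision matrices

Let us say that we wish to find the minimal set of consistent rules (logical implications) that characterize our sample system. For a set of condition attributes ${\mathcal {P}}=\{P_{1},P_{2},P_{3},\dots ,P_{n}\}$ and a decision attribute $Q,Q\notin {\mathcal {P}}$, these rules should have the form $P_{i}^{a}P_{j}^{b}\dots P_{k}^{c}\to Q^{d}$, or, spelled out,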

(P_{i}=a)\land (P_{j}=b)\land \dots \land (P_{k}=c)\to (Q=d)

where $\{a,b,c,\dots \}$ are legitimate values from the domains of their respective attributes. This is a form typical of association rules, and the number of items in $\mathbb {U}$ which match the condition/antecedent is called the support for the rule. The method for extracting such rules given in Ziarko & Shan (1995) is to form a decision matrix corresponding to each individual value $d$ of decision attribute $Q$. Informally, the decision matrix for value $d$ of decision attribute $Q$ lists all attribute-value pairs that differ between objects having $Q=d$ and objects having $Q\neq d$.

This is best explained by example (which also avoids a lot of notation). Consider the table above, and let $P_{4}$ be the decision variable (i.e., the variable on the right side of the implications) and let $\{P_{1},P_{2},P_{3}\}$ be the condition variables (on the left side of the implication). We note that the decision variable $P_{4}$ takes on two different values, namely $\{1,2\}$. We treat each case separately.

First, we look at the case $P_{4}=1$, and we divide up $\mathbb {U}$ into objects that have $P_{4}=1$ and those that have $P_{4}\neq 1$. (Note that objects with $P_{4}\neq 1$ in this case are simply the objects that have $P_{4}=2$, but in general, $P_{4}\neq 1$ would include all objects having any value for $P_{4}$ other than $P_{4}=1$, and there may be several such classes of objects, for example those having $P_{4}=2,3,4,\dots$.) In this case, the objects having $P_{4}=1$ are $\{O_{1},O_{2},O_{3},O_{7},O_{10}\}$ while the objects which have $P_{4}\neq 1$ are $\{O_{4},O_{5},O_{6},O_{8},O_{9}\}$. The decision matrix for $P_{4}=1$ lists all the differences between the objects having $P_{4}=1$ and those having $P_{4}\neq 1$; that is, the decision matrix lists all the differences between $\{O_{1},O_{2},O_{3},O_{7},O_{10}\}$ and $\{O_{4},O_{5},O_{6},O_{8},O_{9}\}$. We put the "positive" objects ($P_{4}=1$) as the rows, and the "negative" objects ($P_{4}\neq 1$) as the columns.

Decision matrix for $P_{4}=1$

Object	$O_{4}$	$O_{5}$	$O_{6}$	$O_{8}$	$O_{9}$
$O_{1}$	$P_{1}^{1},P_{2}^{2},P_{3}^{0}$	$P_{1}^{1},P_{2}^{2}$	$P_{1}^{1},P_{2}^{2},P_{3}^{0}$	$P_{1}^{1},P_{2}^{2},P_{3}^{0}$	$P_{1}^{1},P_{2}^{2}$
$O_{2}$	$P_{1}^{1},P_{2}^{2},P_{3}^{0}$	$P_{1}^{1},P_{2}^{2}$	$P_{1}^{1},P_{2}^{2},P_{3}^{0}$	$P_{1}^{1},P_{2}^{2},P_{3}^{0}$	$P_{1}^{1},P_{2}^{2}$
$O_{3}$	$P_{1}^{2},P_{3}^{0}$	$P_{2}^{0}$	$P_{1}^{2},P_{3}^{0}$	$P_{1}^{2},P_{2}^{0},P_{3}^{0}$	$P_{2}^{0}$
$O_{7}$	$P_{1}^{2},P_{3}^{0}$	$P_{2}^{0}$	$P_{1}^{2},P_{3}^{0}$	$P_{1}^{2},P_{2}^{0},P_{3}^{0}$	$P_{2}^{0}$
$O_{10}$	$P_{1}^{2},P_{3}^{0}$	$P_{2}^{0}$	$P_{1}^{2},P_{3}^{0}$	$P_{1}^{2},P_{2}^{0},P_{3}^{0}$	$P_{2}^{0}$

To read this decision matrix, look, for example, at the intersection of row $O_{3}$ and column $O_{6}$, showing $P_{1}^{2},P_{3}^{0}$ in the cell. This means that with regard to decision value $P_{4}=1$, object $O_{3}$ differs from object $O_{6}$ on attributes $P_{1}$ and $P_{3}$, and the particular values on these attributes for the positive object $O_{3}$ are $P_{1}=2$ and $P_{3}=0$. This tells us that the correct classification of $O_{3}$ as belonging to decision class $P_{4}=1$ rests on attributes $P_{1}$ and $P_{3}$; although one or the other might be dispensable, we know that at least one of these attributes is indispensable.

Next, from each decision matrix we form a set of Boolean expressions, one expression for each row of the matrix. The items within each cell are aggregated disjunctively, and the individual cells are then aggregated conjunctively. Thus, for the above table we have the following five Boolean expressions:

{\begin{cases}(P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2})\land (P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2})\\(P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2})\land (P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2})\\(P_{1}^{2}\lor P_{3}^{0})\land (P_{2}^{0})\land (P_{1}^{2}\lor P_{3}^{0})\land (P_{1}^{2}\lor P_{2}^{0}\lor P_{3}^{0})\land (P_{2}^{0})\\(P_{1}^{2}\lor P_{3}^{0})\land (P_{2}^{0})\land (P_{1}^{2}\lor P_{3}^{0})\land (P_{1}^{2}\lor P_{2}^{0}\lor P_{3}^{0})\land (P_{2}^{0})\\(P_{1}^{2}\lor P_{3}^{0})\land (P_{2}^{0})\land (P_{1}^{2}\lor P_{3}^{0})\land (P_{1}^{2}\lor P_{2}^{0}\lor P_{3}^{0})\land (P_{2}^{0})\end{cases}}

Each statement here is essentially a highly specific (probably too specific) rule governing the membership in class $P_{4}=1$ of the corresponding object. For example, the last statement, corresponding to object $O_{10}$, states that all the following must be satisfied:

Either $P_{1}$ must have value 2, or $P_{3}$ must have value 0, or both.
$P_{2}$ must have value 0.
Either $P_{1}$ must have value 2, or $P_{3}$ must have value 0, or both.
Either $P_{1}$ must have value 2, or $P_{2}$ must have value 0, or $P_{3}$ must have value 0, or any combination thereof.
$P_{2}$ must have value 0.

It is clear that there is a large amount of redundancy here, and the next step is to simplify using traditional Boolean algebra. The statement $(P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2})\land (P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2}\lor P_{3}^{0})\land (P_{1}^{1}\lor P_{2}^{2})$ corresponding to objects $\{O_{1},O_{2}\}$ simplifies to $P_{1}^{1}\lor P_{2}^{2}$, which yields the implication

(P_{1}=1)\lor (P_{2}=2)\to (P_{4}=1)

Likewise, the statement $(P_{1}^{2}\lor P_{3}^{0})\land (P_{2}^{0})\land (P_{1}^{2}\lor P_{3}^{0})\land (P_{1}^{2}\lor P_{2}^{0}\lor P_{3}^{0})\land (P_{2}^{0})$ corresponding to objects $\{O_{3},O_{7},O_{10}\}$ simplifies to $P_{1}^{2}P_{2}^{0}\lor P_{3}^{0}P_{2}^{0}$. This gives us the implication

(P_{1}=2\land P_{2}=0)\lor (P_{3}=0\land P_{2}=0)\to (P_{4}=1)

The above implications can also be written as the following rule set:

{\begin{cases}(P_{1}=1)\to (P_{4}=1)\\(P_{2}=2)\to (P_{4}=1)\\(P_{1}=2)\land (P_{2}=0)\to (P_{4}=1)\\(P_{3}=0)\land (P_{2}=0)\to (P_{4}=1)\end{cases}}

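It can be noted that each of the first two rules has a support of 2 (i.e., the antecedent matches the two objects $O_{1}$ and $O_{2}$), while each of the last two rules has a support of 3 (the antecedent matches $O_{3}$, $O_{7}$ and $O_{10}$). To finish writing the rule set for this knowledge system, the same procedure as above (starting with writing a new decision matrix) should be followed for the case of $P_{4}=2$, thus yielding a new set of implications for that decision value (i.e., a set of implications with $P_{4}=2$ as the consequent). In general, the procedure will be repeated for each possible value of the decision variable.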
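The decision-matrix procedure for $P_{4}=1$ can be sketched in code. The version below is an illustration rather than the method of Ziarko & Shan (1995) as published: instead of symbolic Boolean simplification it searches, for each row, for the smallest attribute sets that intersect every cell, which yields the same minimal antecedents for a conjunction of disjunctions. Support is counted as the number of objects in $\mathbb {U}$ whose values match the antecedent. All function names are illustrative.

```python
from itertools import combinations

# Sample information system: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}
COND = ("P1", "P2", "P3")  # condition attributes
DEC = "P4"                 # decision attribute


def value(obj, attr):
    return TABLE[obj][ATTRS.index(attr)]


def decision_matrix(d):
    """Cell [pos][neg]: condition attributes on which the two objects differ."""
    pos = [o for o in TABLE if value(o, DEC) == d]
    neg = [o for o in TABLE if value(o, DEC) != d]
    return {p: {n: {a for a in COND if value(p, a) != value(n, a)} for n in neg}
            for p in pos}


def minimal_rules(d):
    """Minimal antecedents: smallest attribute sets meeting every cell of a row."""
    rules = set()
    for p, row in decision_matrix(d).items():
        kept = []
        for size in range(1, len(COND) + 1):
            for attrs in combinations(COND, size):
                candidate = set(attrs)
                if any(k <= candidate for k in kept):
                    continue  # a smaller antecedent already suffices
                if all(candidate & cell for cell in row.values()):
                    kept.append(candidate)
        for k in kept:
            rules.add(tuple(sorted((a, value(p, a)) for a in k)))
    return rules


def support(rule):
    """Number of objects whose attribute values match the antecedent."""
    return sum(all(value(o, a) == v for a, v in rule) for o in TABLE)


rules = minimal_rules(1)
for rule in sorted(rules):
    print(rule, "->", (DEC, 1), "support", support(rule))
```

Rows that belong to the same equivalence class produce identical antecedents, so collecting the results in a set removes the duplicates that the five Boolean expressions contain.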

LERS rule induction system

The data system LERS (Learning from Examples based on Rough Sets; Grzymala-Busse 1997) may induce rules from inconsistent data, i.e., data with conflicting objects. Two objects are conflicting when they are characterized by the same values of all attributes, but they belong to different concepts (classes). LERS uses rough set theory to compute lower and upper approximations for concepts involved in conflicts with other concepts.

Rules induced from the lower approximation of the concept certainly describe the concept, hence such rules are called certain. On the other hand, rules induced from the upper approximation of the concept describe the concept possibly, so these rules are called possible. For rule induction LERS uses three algorithms: LEM1, LEM2, and IRIM.

The LEM2 algorithm of LERS is frequently used for rule induction and is used not only in LERS but also in other systems, e.g., in RSES (Bazan et al. 2004). LEM2 explores the search space of attribute-value pairs. Its input data set is a lower or upper approximation of a concept, so its input data set is always consistent. In general, LEM2 computes a local covering and then converts it into a rule set. We will quote a few definitions to describe the LEM2 algorithm.

The LEM2 algorithm is based on the idea of an attribute-value pair block. The block $[t]$ of an attribute-value pair $t=(a,v)$ is the set of all objects whose value of attribute $a$ is $v$. Let $X$ be a nonempty lower or upper approximation of a concept represented by a decision-value pair $(d,w)$. Set $X$ depends on a set $T$ of attribute-value pairs $t=(a,v)$ if and only if

\emptyset \neq [T]=\bigcap _{t\in T}[t]\subseteq X.

Set $T$ is a minimal complex of $X$ if and only if $X$ depends on $T$ and no proper subset $S$ of $T$ exists such that $X$ depends on $S$. Let $\mathbb {T}$ be a nonempty collection of nonempty sets of attribute-value pairs. Then $\mathbb {T}$ is a local covering of $X$ if and only if the following three conditions are satisfied:

each member $T$ of $\mathbb {T}$ is a minimal complex of $X$,
$\bigcup _{T\in \mathbb {T} }[T]=X$, and
$\mathbb {T}$ is minimal, i.e., $\mathbb {T}$ has the smallest possible number of members.

For our sample information system, LEM2 will induce the following rules:

{\begin{cases}(P_{1},1)\to (P_{4},1)\\(P_{5},0)\to (P_{4},1)\\(P_{1},0)\to (P_{4},2)\\(P_{2},1)\to (P_{4},2)\end{cases}}
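The definitions of block and dependence can be checked against the four rules listed above. The sketch below does not implement LEM2; it only computes attribute-value blocks for the sample information system and verifies, for each concept of the decision attribute $P_{4}$, that the concept depends on every rule antecedent and that the antecedent blocks together cover the concept. The minimality condition on the covering is not checked, and the helper names are illustrative.

```python
# Sample information system: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}


def block(pair):
    """[t] for t = (a, v): all objects whose value of attribute a is v."""
    attr, val = pair
    col = ATTRS.index(attr)
    return {o for o, row in TABLE.items() if row[col] == val}


def block_of_set(T):
    """[T]: intersection of the blocks of all pairs in a non-empty set T."""
    return set.intersection(*(block(t) for t in T))


def depends_on(X, T):
    """True when [T] is non-empty and contained in X."""
    b = block_of_set(T)
    return bool(b) and b <= X


# Concepts for decision attribute P4 and the antecedents of the rules above.
concepts = {1: block(("P4", 1)), 2: block(("P4", 2))}
covering = {
    1: [{("P1", 1)}, {("P5", 0)}],
    2: [{("P1", 0)}, {("P2", 1)}],
}

results = {}
for d, family in covering.items():
    X = concepts[d]
    all_depend = all(depends_on(X, T) for T in family)
    covers = set().union(*(block_of_set(T) for T in family)) == X
    results[d] = (all_depend, covers)

print(results)  # {1: (True, True), 2: (True, True)}
```

Each antecedent here is a single attribute-value pair, so it has no non-empty proper subset and is a minimal complex as soon as the dependence check succeeds.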

Other rule-learning methods can be found, e.g., in Pawlak (1991), Stefanowski (1998), and Bazan et al. (2004).

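Incomplete data

Rough set theory is useful for rule induction from incomplete data sets. Using this approach we can distinguish between three types of missing attribute values: lost values (values that were recorded but are currently unavailable), attribute-concept values (missing attribute values that may be replaced by any attribute value limited to the same concept), and "do not care" conditions (the original values were irrelevant). A concept (class) is the set of all objects classified (or diagnosed) the same way.

Two special data sets with missing attribute values have been extensively studied: in the first case, all missing attribute values were lost (Stefanowski and Tsoukias, 2001); in the second case, all missing attribute values were "do not care" conditions (Kryszkiewicz, 1999).

In the attribute-concept value interpretation of a missing attribute value, the missing attribute value may be replaced by any value of the attribute domain restricted to the concept to which the object with the missing attribute value belongs (Grzymala-Busse and Grzymala-Busse, 2007). For example, if the value of the attribute Temperature is missing for a patient, this patient is sick with flu, and all remaining patients sick with flu have the values high or very-high for Temperature, then under the attribute-concept value interpretation we replace the missing attribute value with high and very-high. Additionally, the characteristic relation (see, e.g., Grzymala-Busse and Grzymala-Busse, 2007) makes it possible to process data sets with all three kinds of missing attribute values at the same time: lost values, "do not care" conditions, and attribute-concept values.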
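The Temperature example can be made concrete with a toy data set. Everything in the sketch below is hypothetical: the patient records, the use of `None` to mark a missing value, and the function name `attribute_concept_values` are assumptions made for illustration and are not taken from the cited papers.

```python
# Hypothetical toy data; None marks a missing Temperature value.
PATIENTS = [
    {"id": 1, "Temperature": "high", "Flu": "yes"},
    {"id": 2, "Temperature": "very-high", "Flu": "yes"},
    {"id": 3, "Temperature": None, "Flu": "yes"},
    {"id": 4, "Temperature": "normal", "Flu": "no"},
]


def attribute_concept_values(rows, attr, decision, case):
    """Values of `attr` observed among the other cases in the concept of `case`."""
    return {
        r[attr]
        for r in rows
        if r is not case
        and r[decision] == case[decision]  # same concept (same diagnosis)
        and r[attr] is not None            # skip other missing values
    }


missing = PATIENTS[2]
candidates = attribute_concept_values(PATIENTS, "Temperature", "Flu", missing)
print(sorted(candidates))  # ['high', 'very-high']
```

The value normal is not a candidate because it occurs only outside the concept of patients with flu.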

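Applications

Rough set methods can be applied as a component of hybrid solutions in machine learning and data mining. They have been found to be particularly useful for rule induction and feature selection (semantics-preserving dimensionality reduction). Rough set-based data analysis methods have been successfully applied in bioinformatics, economics and finance, medicine, multimedia, web and text mining, signal and image processing, software engineering, robotics, and engineering (e.g., power systems and control engineering). More recently, the three regions of rough sets have been interpreted as regions of acceptance, rejection and deferment. This leads to a three-way decision-making approach with the model, which can potentially lead to interesting future applications.

History

The idea of the rough set was proposed by Pawlak (1981) as a new mathematical tool for dealing with vague concepts. Comer, Grzymala-Busse, Iwinski, Nieminen, Novotny, Pawlak, Obtulowicz, and Pomykala have studied the algebraic properties of rough sets. Different algebraic semantics have been developed by P. Pagliani, I. Duntsch, M. K. Chakraborty, M. Banerjee and A. Mani; these have been extended to more generalized rough sets by D. Cattaneo and A. Mani, in particular. Rough sets can be used to represent ambiguity, vagueness and general uncertainty.

Extensions and generalizations

Since the development of rough sets, extensions and generalizations have continued to evolve. Initial developments focused on the relationship, both similarities and differences, with fuzzy sets. While some literature contends that these concepts are different, other literature considers rough sets to be a generalization of fuzzy sets, as represented through either fuzzy rough sets or rough fuzzy sets. Pawlak (1995) considered that fuzzy and rough sets should be treated as complementary to each other, addressing different aspects of uncertainty and vagueness.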

Three notable extensions of classical rough sets are:

Dominance-based rough set approach (DRSA) is an extension of rough set theory for multi-criteria decision analysis (MCDA), introduced by Greco, Matarazzo and Słowiński (2001). The main change in this extension of classical rough sets is the substitution of the indiscernibility relation by a dominance relation, which permits the formalism to deal with inconsistencies typical in consideration of criteria and preference-ordered decision classes.
Decision-theoretic rough sets (DTRS) is a probabilistic extension of rough set theory introduced by Yao, Wong, and Lingras (1990). It utilizes a Bayesian decision procedure for minimum risk decision making. Elements are included into the lower and upper approximations based on whether their conditional probability is above thresholds $\textstyle \alpha$ and $\textstyle \beta$. These upper and lower thresholds determine region inclusion for elements. A distinguishing feature of this model is that the thresholds themselves are calculated from a set of six loss functions representing classification risks.
Game-theoretic rough sets (GTRS) is a game theory-based extension of rough set that was introduced by Herbert and Yao (2011). It utilizes a game-theoretic environment to optimize certain criteria of rough sets based classification or decision making in order to obtain effective region sizes.
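The threshold-based regions used by decision-theoretic rough sets can be sketched as follows. This is an assumption-laden illustration: the thresholds are passed in directly rather than derived from loss functions, and the inequality conventions (positive region when the conditional probability is at least $\alpha$, negative region when it is at most $\beta$) are one common choice, not something stated in the text above. The probabilities are those of $X=\{O_{1},O_{2},O_{3},O_{4}\}$ in the sample information system when only attribute $P_{1}$ is used.

```python
def three_regions(cond_prob, alpha, beta):
    """Split objects into positive, boundary and negative regions.

    cond_prob maps each object x to Pr(X | [x]); requires 0 <= beta < alpha <= 1.
    """
    if not 0 <= beta < alpha <= 1:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    positive = {x for x, p in cond_prob.items() if p >= alpha}
    negative = {x for x, p in cond_prob.items() if p <= beta}
    boundary = set(cond_prob) - positive - negative
    return positive, boundary, negative


# Pr(X | [x]) for X = {O1, O2, O3, O4}, with classes induced by P1 only:
# {O1, O2} -> 2/2, {O3, O5, O7, O9, O10} -> 1/5, {O4, O6, O8} -> 1/3.
prob = {
    "O1": 1.0, "O2": 1.0,
    "O3": 0.2, "O5": 0.2, "O7": 0.2, "O9": 0.2, "O10": 0.2,
    "O4": 1 / 3, "O6": 1 / 3, "O8": 1 / 3,
}

pos, bnd, neg = three_regions(prob, alpha=0.75, beta=0.25)
lower, upper = pos, pos | bnd  # probabilistic lower and upper approximations

print(sorted(lower))
print(sorted(upper))
print(sorted(neg))
```

With these conventions, choosing `alpha=1` and `beta=0` gives back the classical approximations, in which only classes fully inside $X$ are positive and only classes disjoint from $X$ are negative.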

Rough membership

Rough sets can be also defined, as a generalisation, by employing a rough membership function instead of objective approximation. The rough membership function expresses a conditional probability that $x$ belongs to $X$ given $\textstyle \mathbb {R}$ . This can be interpreted as a degree that $x$ belongs to $X$ in terms of information about $x$ expressed by $\textstyle \mathbb {R}$ .

Rough membership primarily differs from the fuzzy membership in that the membership of union and intersection of sets cannot, in general, be computed from their constituent membership as is the case of fuzzy sets. In this, rough membership is a generalization of fuzzy membership. Furthermore, the rough membership function is grounded more in probability than the conventionally held concepts of the fuzzy membership function.
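A common frequency-based form of the rough membership function is the share of an object's equivalence class that lies in the target set, $|[x]\cap X|/|[x]|$. The sketch below assumes that form (the text above does not give a formula) and uses attribute $P_{1}$ of the sample information system as the relation. It also shows why the membership of a union cannot be computed from the memberships of its parts. The helper names are illustrative.

```python
from fractions import Fraction

# Sample information system: object -> (P1, ..., P5).
ATTRS = ("P1", "P2", "P3", "P4", "P5")
TABLE = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1), "O3": (2, 0, 0, 1, 0),
    "O4": (0, 0, 1, 2, 1), "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1), "O9": (2, 1, 0, 2, 2),
    "O10": (2, 0, 0, 1, 0),
}


def eq_class(obj, subset):
    """Equivalence class of `obj` under indiscernibility on `subset`."""
    cols = [ATTRS.index(a) for a in subset]
    key = [TABLE[obj][c] for c in cols]
    return {o for o, row in TABLE.items() if [row[c] for c in cols] == key}


def rough_membership(obj, X, subset):
    """Share of the equivalence class of `obj` that lies inside X."""
    cls = eq_class(obj, subset)
    return Fraction(len(cls & X), len(cls))


R = ["P1"]
X = {"O1", "O2", "O3", "O4"}
mu = {o: rough_membership(o, X, R) for o in TABLE}
print({o: str(m) for o, m in mu.items()})

# Two pairs of sets with identical memberships for O1 but different unions.
A, B, B2 = {"O1"}, {"O2"}, {"O1"}
same_parts = rough_membership("O1", B, R) == rough_membership("O1", B2, R)
unions = (rough_membership("O1", A | B, R), rough_membership("O1", A | B2, R))
print(same_parts, unions[0], unions[1])  # True 1 1/2
```

In the last step the membership of $O_{1}$ is 1/2 in each constituent set, yet the union has membership 1 in one case and 1/2 in the other, so no function of the constituent memberships alone can give the membership of the union.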

Other generalizations

Several generalizations of rough sets have been introduced, studied and applied to solving problems. Here are some of these generalizations:

rough multisets (Grzymala-Busse, 1987)
fuzzy rough sets, which extend the rough set concept through the use of fuzzy equivalence classes (Nakamura, 1988)
Alpha rough set theory (α-RST), a generalization of rough set theory that allows approximation using fuzzy concepts (Quafafou, 2000)
intuitionistic fuzzy rough sets (Cornelis, De Cock and Kerre, 2003)
generalized rough fuzzy sets (Feng, 2010)
rough intuitionistic fuzzy sets (Thomas and Nair, 2011)
soft rough fuzzy sets and soft fuzzy rough sets (Meng, Zhang and Qin, 2011)
composite rough sets (Zhang, Li and Chen, 2014)

References

Pawlak, Zdzisław (1982). "Rough sets". International Journal of Parallel Programming. 11 (5): 341–356. doi:10.1007/BF01001956. S2CID 9240608.
Bazan, Jan; Szczuka, Marcin; Wojna, Arkadiusz; Wojnarski, Marcin (2004). On the evolution of rough set exploration system. Proceedings of the RSCTC 2004. Lecture Notes in Computer Science. Vol. 3066. pp. 592–601. CiteSeerX 10.1.1.60.3957. doi:10.1007/978-3-540-25929-9_73. ISBN 978-3-540-22117-3.
Dubois, D.; Prade, H. (1990). "Rough fuzzy sets and fuzzy rough sets". International Journal of General Systems. 17 (2–3): 191–209. doi:10.1080/03081079008935107.
Herbert, J. P.; Yao, J. T. (2011). "Game-theoretic Rough Sets". Fundamenta Informaticae. 108 (3–4): 267–286. doi:10.3233/FI-2011-423.
Greco, Salvatore; Matarazzo, Benedetto; Słowiński, Roman (2001). "Rough sets theory for multicriteria decision analysis". European Journal of Operational Research. 129 (1): 1–47. doi:10.1016/S0377-2217(00)00167-3.
Grzymala-Busse, Jerzy (1997). "A new version of the rule induction system LERS". Fundamenta Informaticae. 31: 27–39. doi:10.3233/FI-1997-3113.
Grzymala-Busse, Jerzy; Grzymala-Busse, Witold (2007). An experimental comparison of three rough set approaches to missing attribute values. Transactions on Rough Sets. Lecture Notes in Computer Science. Vol. 6. pp. 31–50. doi:10.1007/978-3-540-71200-8_3. ISBN 978-3-540-71198-8.
Kryszkiewicz, Marzena (1999). "Rules in incomplete systems". Information Sciences. 113 (3–4): 271–292. doi:10.1016/S0020-0255(98)10065-8.
Pawlak, Zdzisław (1981). Rough Sets. Research Report PAS 431. Institute of Computer Science, Polish Academy of Sciences.
Pawlak, Zdzisław; Wong, S. K. M.; Ziarko, Wojciech (1988). "Rough sets: Probabilistic versus deterministic approach". International Journal of Man-Machine Studies. 29: 81–95. doi:10.1016/S0020-7373(88)80032-4.
Pawlak, Zdzisław (1991). Rough Sets: Theoretical Aspects of Reasoning About Data. Dordrecht: Kluwer Academic Publishing. ISBN 978-0-7923-1472-1.
Slezak, Dominik; Wroblewski, Jakub; Eastwood, Victoria; Synak, Piotr (2008). "Brighthouse: an analytic data warehouse for ad-hoc queries" (PDF). Proceedings of the VLDB Endowment. 1 (2): 1337–1345. doi:10.14778/1454159.1454174.
Stefanowski, Jerzy (1998). "On rough set based approaches to induction of decision rules". In Polkowski, Lech; Skowron, Andrzej (eds.). Rough Sets in Knowledge Discovery 1: Methodology and Applications. Heidelberg: Physica-Verlag. pp. 500–529.
Stefanowski, Jerzy; Tsoukias, Alexis (2001). Incomplete information tables and rough classification. Computational Intelligence. Vol. 17. pp. 545–566. doi:10.1111/0824-7935.00162.
Wong, S. K. M.; Ziarko, Wojciech; Ye, R. Li (1986). "Comparison of rough-set and statistical methods in inductive learning". International Journal of Man-Machine Studies. 24: 53–72. doi:10.1016/S0020-7373(86)80033-5.
Yao, J. T.; Yao, Y. Y. (2002). "Induction of classification rules by granular computing". Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing (RSCTC'02). London, UK: Springer-Verlag. pp. 331–338.
Ziarko, Wojciech (1998). "Rough sets as a methodology for data mining". Rough Sets in Knowledge Discovery 1: Methodology and Applications. Heidelberg: Physica-Verlag. pp. 554–576.
Ziarko, Wojciech; Shan, Ning (1995). "Discovering attribute relationships, dependencies and rules by using rough sets". Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS'95). Hawaii. pp. 293–299.
Pawlak, Zdzisław (1999). "Decision rules, Bayes' rule and rough sets". New Direction in Rough Sets, Data Mining, and Granular-soft Computing: 1–9.
Pawlak, Zdzisław. "Rough relations, reports". 435. Institute of Computer Science.
Orlowska, E. (1987). "Reasoning about vague concepts". Bulletin of the Polish Academy of Sciences. 35: 643–652.
Polkowski, L. (2002). "Rough sets: Mathematical foundations". Advances in Soft Computing.
Skowron, A. (1996). "Rough sets and vague concepts". Fundamenta Informaticae: 417–431.
Burgin, M. (1990). Theory of Named Sets as a Foundational Basis for Mathematics. In Structures in mathematical theories: Reports of the San Sebastian international symposium, September 25–29, 1990 (http://www.blogg.org/blog-30140-date-2005-10-26.html)
Burgin, M. (2004). Unified Foundations of Mathematics, Preprint Mathematics LO/0403186, p. 39. (electronic edition: https://arxiv.org/ftp/math/papers/0403/0403186.pdf)
Burgin, M. (2011). Theory of Named Sets, Mathematics Research Developments, Nova Science Pub Inc, ISBN 978-1-61122-788-8.
Cornelis, C., De Cock, M. and Kerre, E. (2003). Intuitionistic fuzzy rough sets: at the crossroads of imperfect knowledge, Expert Systems, 20:5, pp. 260–270.
Düntsch, I. and Gediga, G. (1995). Rough Set Dependency Analysis in Evaluation Studies – An Application in the Study of Repeated Heart Attacks. University of Ulster, Informatics Research Reports No. 10.
Feng, F. (2010). Generalized Rough Fuzzy Sets Based on Soft Sets, Soft Computing, 14:9, pp. 899–911.
Grzymala-Busse, J. (1987). Learning from examples based on rough multisets, in Proceedings of the 2nd International Symposium on Methodologies for Intelligent Systems, pp. 325–332. Charlotte, NC, USA.
Meng, D., Zhang, X. and Qin, K. (2011). Soft rough fuzzy sets and soft fuzzy rough sets, Computers & Mathematics with Applications, 62:12, pp. 4635–4645.
Quafafou, M. (2000). α-RST: a generalization of rough set theory, Information Sciences, 124:1–4, pp. 301–316.
Quafafou, M. and Boussouf, M. (2000). Generalized rough sets based feature selection, Intelligent Data Analysis, 4:1, pp. 3–17.
Nakamura, A. (1988). Fuzzy rough sets, Notes on Multiple-valued Logic in Japan, 9:1, pp. 1–8.
Pawlak, Z., Grzymala-Busse, J., Slowinski, R. and Ziarko, W. (1995). Rough Sets, Communications of the ACM, 38:11, pp. 88–95.
Thomas, K. and Nair, L. (2011). Rough intuitionistic fuzzy sets in a lattice, International Mathematical Forum, 6:27, pp. 1327–1335.
Zhang, J., Li, T. and Chen, H. (2014). Composite rough sets for dynamic data mining, Information Sciences, 257, pp. 81–100.
Zhang, J., Wong, J.-S., Pan, Y. and Li, T. (2015). A parallel matrix-based method for computing approximations in incomplete information systems, IEEE Transactions on Knowledge and Data Engineering, 27:2, pp. 326–339.
Chen, H., Li, T., Luo, C., Horng, S.-J. and Wang, G. (2015). A decision-theoretic rough set approach for dynamic data mining, IEEE Transactions on Fuzzy Systems, 23:6, pp. 1958–1970.
Chen, H., Li, T., Luo, C., Horng, S.-J. and Wang, G. (2014). A rough set-based method for updating decision rules on attribute values' coarsening and refining, IEEE Transactions on Knowledge and Data Engineering, 26:12, pp. 2886–2899.
Chen, H., Li, T., Ruan, D., Lin, J. and Hu, C. (2013). A rough-set based incremental approach for updating approximations under dynamic maintenance environments, IEEE Transactions on Knowledge and Data Engineering, 25:2, pp. 274–284.
