서포트 벡터 머신

머신 러닝에서 서포트 벡터 머신(SVM, 서포트 벡터 네트워크^[1])은 분류 및 회귀 분석을 위해 데이터를 분석하는 관련 학습 알고리즘과 함께 감독되는 학습 모델입니다.Vladimir Vapnik이 동료와 함께 AT&T Bell 연구소에서 개발한 SVM은 통계 학습 프레임워크 또는 Vapnik이 제안한 VC 이론에 기초하는 가장 강력한 예측 방법 중 하나이다(Boser 등, 1992년, Guyon 등, 1993년, Cortes 및 ^[1]Vapnik, 1995년, 1997년^{[citation needed]}).각각 두 범주 중 하나에 속하는 것으로 표시된 일련의 훈련 예가 주어지면, SVM 훈련 알고리즘은 한 범주 또는 다른 범주에 새로운 예를 할당하는 모델을 구축하여 비확률적 이진 선형 분류기로 만든다(확률적 분류 설정에서 SVM을 사용하는 플랫 스케일링과 같은 방법이 존재함).SVM은 두 범주 사이의 격차 폭을 최대화하기 위해 훈련 예를 공간의 지점에 매핑합니다.그런 다음 동일한 공간에 새로운 예제를 매핑하고 간격의 어느 쪽에 속하는지 기준으로 범주에 속할 것으로 예측합니다.

SVM은 선형 분류를 수행할 뿐만 아니라 커널 트릭이라고 불리는 것을 사용하여 비선형 분류를 효율적으로 수행할 수 있으며, 입력을 고차원 피쳐 공간에 암묵적으로 매핑할 수 있습니다.

데이터에 라벨이 부착되지 않은 경우, 지도 학습이 불가능하며, 데이터에 대한 자연스러운 군집을 찾은 다음 새로운 데이터를 이러한 형성된 그룹에 매핑하는 비지도 학습 접근법이 필요하다.Hava Siegelmann과 Vapnik에 의해 작성된 서포트 벡터 클러스터링^[2] 알고리즘은 서포트 벡터 머신 알고리즘에서 개발된 서포트 벡터 통계를 적용하여 라벨이 부착되지 않은 ^{[citation needed]}데이터를 분류합니다.

동기

H는₁ 클래스를 분리하지 않습니다.H는₂ 있지만, 약간의 마진만 있다.H는₃ 그것들을 최대 여백으로 구분한다.

데이터 분류는 기계 학습에서 일반적인 작업입니다.일부 데이터 점이 각각 두 클래스 중 하나에 속하고 새 데이터 점이 어떤 클래스에 속하는지 결정하는 것이 목표라고 가정합니다.서포트 벡터 머신의 경우 데이터 포인트는p차원 $p$ ( $p\displaystyle$ p $)$ 로 $p$ 간주되며 ( $(p-1)$ - $(p-1)$ 1 $)\displaystyle$ ( $p-1)}$ 차원 $(p-1)$ 하이퍼플레인으로 분리할 수 있는지 알고 싶습니다.이를 선형 분류기라고 합니다.데이터를 분류할 수 있는 하이퍼플레인이 많이 있습니다.최적의 하이퍼플레인으로서 타당한 선택 중 하나는 두 클래스 간에 가장 큰 간격(마진)을 나타내는 것입니다.따라서 하이퍼플레인에서 각 면의 가장 가까운 데이터 포인트까지의 거리가 최대가 되도록 선택합니다.그러한 하이퍼플레인이 존재하는 경우, 그것은 최대 마진 하이퍼플레인으로 알려져 있으며, 그것이 정의하는 선형 분류기는 최대 마진 분류기 또는 그에 상응하는 최적의 ^{[citation needed]}안정성의 퍼셉트론이라고 알려져 있다.

좀 더 형식적으로 말하면, 서포트 벡터 기계는 고차원 또는 무한차원 공간에 하이퍼플레인 또는 하이퍼플레인 세트를 구성하며, 분류, 회귀 또는 특이치 ^[3]검출과 같은 다른 작업에 사용할 수 있습니다.직관적으로, 일반적으로 여백이 클수록 ^[4]분류기의 일반화 오차가 낮기 때문에, 모든 등급에서 가장 가까운 훈련 데이터 포인트(이른바 기능적 여백)까지 거리가 가장 큰 하이퍼플레인에 의해 양호한 분리가 달성된다.

커널 머신

원래의 문제가 유한 차원 공간에서 언급될 수 있는 반면, 판별할 집합이 그 공간에서 선형적으로 분리될 수 없는 경우가 종종 발생한다.이러한 이유로, 원래의 유한 차원 공간을 훨씬 더 높은 차원 공간에 매핑하는 것이 제안되었고^[5], 아마도 그 공간에서 분리가 더 쉬워졌을 것이다.계산 부하를 합리적으로 유지하기 위해 SVM 스킴에 의해 사용되는 매핑은 입력 데이터 벡터 쌍의 닷 곱이 ^[6]문제에 적합하도록 선택된 $k(x,y)$ 함수 $k(x,y)$ k $k(x,y)$ y $)\display$ k $(x,$ y)\display k(x, y)\display k $(x$ , y)로 $k(x,y)$ 정의함으로써 원래 공간의 변수 측면에서 쉽게 계산될 수 있도록 설계되어 있습니다.고차원 공간의 하이퍼플레인은 해당 공간의 벡터를 갖는 점곱이 일정한 점 집합으로 정의되며, 여기서 이러한 벡터 집합은 하이퍼플레인을 정의하는 직교(따라서 최소) 벡터 집합입니다.하이퍼플레인을 정의하는 벡터는 데이터베이스에서 발생하는 특징 $x_{i}$ x $x_{i}$ 의 영상의 $\alpha _{i}$ α i \ $\alpha _{i}$ $displaystyle$ \ $alpha$ _ { $i$ $}$ 과 $\alpha _{i}$ $x_{i}$ (와) 선형 조합으로 선택할 수 있습니다.이 하이퍼플레인을 선택하면 하이퍼플레인에 매핑된 피쳐 공간의 x $(\displaystyle$ $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ x $)$ 는 $x$ $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ k ( $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ , $\textstyle \sum _{i}\alpha _{i}k(x_{i},x)={\text{constant}}.$ x ) $=$ constant $.\$ \ $sum _{i}k$ ( $x_{i$ },x)= $text{i}$ k}의 관계에 의해 정의됩니다. $}$ Note that if $k(x,y)$ becomes small as $y$ grows further away from $x$ , each term in the sum measures the degree of closeness of the test point $x$ to the corresponding data base point $x_{i}$ . In this way, the 위의 커널 합계를 사용하여 구별할 세트 중 하나 또는 다른 세트에서 발생하는 데이터 포인트에 대한 각 테스트 포인트의 상대적 근접성을 측정할 수 있다.그 결과 하이퍼플레인에 매핑된 $포인트$ x $(\displaystyle$ x $)$ 세트가 $x$ 상당히 복잡해질 수 있으므로 원래 공간에서 볼록하지 않은 세트를 훨씬 더 복잡하게 구분할 수 있습니다.

적용들

SVM은 다음과 같은 다양한 실제 문제를 해결하기 위해 사용할 수 있습니다.

SVM은 표준 유도 설정과 ^[7]전송 설정 모두에서 라벨이 부착된 교육 인스턴스의 필요성을 크게 줄일 수 있기 때문에 텍스트 및 하이퍼텍스트 분류에 도움이 됩니다.얕은 의미 해석을 위한 일부 방법은 지원 벡터 ^[8]기계를 기반으로 합니다.
이미지 분류는 SVM을 사용하여 실행할 수도 있습니다.실험 결과에 따르면 SVM은 단지 3~4라운드의 관련성 피드백을 받은 후에 기존 쿼리 미세화 방식보다 검색 정확도가 상당히 높은 것으로 나타났습니다.이는 Vapnik이 ^[9]^[10]제안한 대로 특권 접근 방식을 사용하는 수정된 버전의 SVM을 사용하는 시스템을 포함한 이미지 분할 시스템에도 해당됩니다.
SVM을 ^[11]사용한 SAR 데이터 등의 위성 데이터 분류.
손으로 쓴 문자는 ^[12]^[13]SVM을 사용하여 인식할 수 있습니다.
SVM 알고리즘은 생물학 및 기타 과학 분야에서 광범위하게 적용되어 왔습니다.그것들은 정확히 분류된 화합물의 최대 90%를 가진 단백질을 분류하는 데 사용되어 왔다.SVM 가중치에 기초한 치환 테스트는 SVM ^[14]^[15]모델의 해석 메커니즘으로 제안되었다.서포트 벡터 머신의 가중치는 과거에도 ^[16]SVM 모델을 해석하는 데 사용되었습니다.모델이 예측을 위해 사용하는 특징을 식별하기 위해 지지-벡터 기계 모델을 사후 해석하는 것은 생물 과학에서 특별한 의미를 갖는 비교적 새로운 연구 영역이다.

역사

최초의 SVM 알고리즘은 Vapnik과 Alexey Ya에 의해 발명되었습니다. 1963년 ^{[citation needed]}체르보넨키스.1992년 베른하르트 보저, 이자벨 가이온 및 블라디미르 Vapnik은 커널 트릭을 최대 마진 하이퍼플레인에 ^[5]적용하여 비선형 분류기를 만드는 방법을 제안했다.소프트웨어 패키지에 일반적으로 사용되는 "소프트 마진"의 화신은 1993년 Corinna Cortes와 Vapnik에 의해 제안되어 ^[1]1995년에 발표되었다.

선형 SVM

2개의 클래스의 샘플로 트레이닝된 SVM의 최대 마진 하이퍼플레인 및 마진.여백에 있는 표본을 지지 벡터라고 합니다.

폼의 n개 $(style$ n개 $n$ ) $n$ 의 트레이닝 데이터 세트가 제공됩니다.

{\displaystyle(\mathbf {x} _{1} ,\ldots ,(\mathbf {x} _{n} , y_{n} ),

y_{i}

서 y

i(\

displaystyle

y_{i

})는

y_{i}

1 또는 -1 중 하나입니다.각각 x

(\

가

\mathbf {x} _{i}

속한 클래스를 나타냅니다.각

(

표시

스타일 \mathbf {x} _{i})

는

\mathbf {x} _{i}

p

(표시 스타일

p

)

-차원

p

실수 벡터입니다.

y_{i}=-1

y_{i}=1

1 {

style

y_{i

}

인

y_{i}=1

\mathbf {x} _{i}

\

y_{i}=1

\mathbf {x} _{i

} _{i

}

_{i

y_{i}=-1

의

\mathbf {x} _{i}

y_{i}=1

그룹을 하이퍼플레인과 가장

y_{i}=-1

poi 사이의 거리로 정의되는 y i

=

-

1

의

y_{i}=-1

그룹에서 나누는 "최대 지연 하이퍼플레인"을 찾고 싶다.

\mathbf {x} _{i}

\mathbf {x} _{i}

i

\mathbf {x} _{i}

\

displaystyle \mathbf {x} _{i}

가

\mathbf {x} _{i}

최대화됩니다.

모든 하이퍼플레인은 x $(\displaystyle$ \ $mathbf$ {x $})$ 를 $\mathbf {x}$ 만족하는 $\mathbf {x}$ 세트로 쓸 수 있습니다 $.$

\mathbf {w}^{\mathsf {T}}\mathbf {x} -b=0,

\mathbf {w}

서 w

\

는

\mathbf {w}

하이퍼플레인에 대한 정규 벡터(반드시 정규화되지는 않음)입니다.w

\

가

\mathbf {w}

반드시 단위 벡터는 아니라는

\mathbf {w}

을 제외하면 이는 헤세 정규형과 매우 유사합니다.

{\tfrac {b}{\|\mathbf {w} \|}}

b

{\tfrac {b}{\|\mathbf {w} \|}}

w

{\

display{

display

\

tfrac

{

b

} { \ \

mathbf

{ w

}

} parameter

{\tfrac {b}{\|\mathbf {w} \|}}

parameter parameter {\ {\ {\ {\ {\ {\ w

\

display

style \

mathbf { w

\mathbf {w}

를 따라 원점으로부터의 하이퍼플레인

\mathbf {w}

을 결정합니다.

하드 마진

훈련 데이터가 선형으로 분리될 수 있는 경우 두 클래스의 데이터를 분리하는 두 개의 평행 하이퍼플레인을 선택하여 이들 사이의 거리를 최대한 크게 할 수 있습니다.이 두 개의 하이퍼플레인에 의해 둘러싸인 영역을 "마진"이라고 하며, 최대 마진 하이퍼플레인은 이들 사이의 중간에 있는 하이퍼플레인이다.정규화 또는 표준화된 데이터 세트를 사용하면 이러한 하이퍼플레인은 다음 방정식으로 설명할 수 있습니다.

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=1

T

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=1

-

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=1

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=1

{\style

\

mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=1

}

(

이 경계 위에 있는 모든 것은 라벨 1이 있는 한 클래스)

그리고.

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=-1

T

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=-1

x - b

=

-

\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b=-1

{\

displaystyle \mathbf {w} ^{\mathsf {T}}\mathbf {x} -b

=-

1}(

이 경계 위 또는 아래에 있는 모든 것은 라벨 -1이 있는 다른 클래스의 것입니다).

기하학적으로 이들 2개의 하이퍼플레인 사이의 ${\tfrac {2}{\|\mathbf {w} \|}}$ 는 ${\tfrac {2}{\|\mathbf {w} \|}}$ w ${\$ \ $displaystyle$ { $tfrac {$ 2 $}$ \ $mathbf$ { $w} \$ ^[17]이므로 $\|\mathbf {w} \|$ 하고 싶은 평면 사이의 거리를 최대화하기 위해 w ${\$ \ $displaystyle$ \ $mathbf {w}$ \ $\|\mathbf {w} \|$ the the the 까지의 거리를 사용하여 계산됩니다.또한 데이터 포인트가 마진으로 떨어지는 것을 방지해야 합니다.각 $\displaystyle$ i에 $i$ 대해 다음과 같은 제약을 추가합니다.

\displaystyle \mathbf {w}^{\mathsf {T}}\mathbf {x}_{i}-b\geq 1,{\text{}y_{i}=1,}

또는

\mathbf {w}^{\mathsf {T}}\mathbf {x} _{i}-b\leq -1,{\text{ if }}y_{i}=-1

이러한 제약 조건은 각 데이터 점이 여백의 올바른 쪽에 있어야 함을 나타냅니다.

이것은 다음과 같이 고쳐 쓸 수 있습니다.

\displaystyle y_{i}(\mathbf {w}^{\mathsf {T}}}\mathbf {x} _{i}-b)\geq 1,\quad {text{ for all }1\leq i\leq n}}

(1)

이를 종합하여 최적화 문제를 해결할 수 있습니다.

"y

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

(

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

T

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

-

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

)에

\|\mathbf {w} \|

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

\|\mathbf {w} \|

w ( \

displaystyle

\ mathbf

{ w }

^

{ \

mathbf {

T }

= 1

、

displaystyle

\ mathbf { t }

_

{

i

} \ mathbf {

x

} \

\|\mathbf {w} \|

i=1,\ldots ,n

1

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

i=1,\ldots ,n

y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1

이 문제를 해결하는 w $\$ 및 $\mathbf {w}$ $b$ { $displaystyle$ b $}$ 에 $b$ $\mathbf {x} \mapsto \operatorname {sgn}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b)$ $\mathbf {w}$ 분류자 $\mathbf {x} \mapsto \operatorname {sgn}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b)$ sgn $\mathbf {x} \mapsto \operatorname {sgn}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b)$ （ $\mathbf {x} \mapsto \operatorname {sgn}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b)$ $\mathbf {x} \mapsto \operatorname {sgn}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b)$ x - $\mathbf {x} \mapsto \operatorname {sgn}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} -b)$ ）{ $displaystyle \mathbf {x}$ \ $mapsto \operatorname {sgn}(\mathbf {w} ^{\mathbf {$ t} } } })가 $결정됩니다.$ 기능합니다.

이 기하학적 설명의 중요한 결과는 max-margin 하이퍼플레인은 그 하이퍼플레인에 가장 가까운 $xi\displaystyle\$ mathbf { $x}_{i}$ 에 $\mathbf {x} _{i}$ 의해 완전히 결정된다는 것이다.이러한 $(\displaystyle$ \ $mathbf {x} _{i})$ 는 $\mathbf {x} _{i}$ 서포트 벡터라고 불립니다.

소프트 마진

데이터를 선형으로 분리할 수 없는 경우까지 SVM을 확장하려면 힌지 손실 함수가 도움이 됩니다.

(\displaystyle \max \left(0,1-y_{i}(\mathbf {w}^{\mathsf {T}}\mathbf {x}_{i}-b)\오른쪽).}

$y_{i}$ ${\$ 는 $y_{i}$ i번째 타깃(이 경우 1 또는 -1)이며, w $\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b$ $\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b$ - $\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b$ { $display \mathbf {w}^{\mathsf {T}}\mathbf {x}$ _ ${i}-b}$ 는 $\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b$ i번째 출력입니다.

이 함수는 (1)의 제약 조건이 충족되는 경우, 즉 x i \ $displaystyle \mathbf {x} _{i}$ 가 $\mathbf {x} _{i}$ 여백의 올바른 쪽에 있는 $\mathbf {x} _{i}$ 0이 됩니다.여백의 잘못된 쪽에 있는 데이터의 경우 함수 값은 여백으로부터의 거리에 비례합니다.

그 때 최적화의 목표는 다음과 같은 요소를 최소화하는 것입니다.

\displaystyle \lvert \mathbf {w} \rVert ^{2}+\left[{\frac {1}{n}\sum _{i=1}^{n}\max \left (0,1-y_{i}(\mathbf {T}^{\mathsf {t}\b}_i}_right)

여기서 $\lambda >0$ $\lambda >0$ {\ $displaystyle \displayda$ > $0}$ 은 마진 크기를 늘리는 것과 $(\$ _ ${i})$ 가 $\mathbf {x} _{i}$ 마진의 올바른 쪽에 놓이는 것 사이의 균형을 결정합니다 $\lambda >0$ . $\lambda$ 값이 충분히 $작은$ 경우 입력 데이터가 선형으로 분류 가능한 경우 하드 마진 SVM과 동일하게 동작하지만 분류 규칙이 실행 가능한지 여부를 학습합니다(이 $\lambda$ ${\$ { displaystyle $\lambda$ $C$ $}$ 는 $\lambda$ LIBS에서는 C{ $style$ C}, eG라고도 $합니다$ ).ut는 $\lambda$ $"\display\displayda"$ 의 역수를 나타냅니다.) $\lambda$

비선형 커널

커널 머신

1963년 Vapnik에 의해 제안된 최초의 최대 마진 초평면 알고리즘은 선형 분류기를 구성했다.그러나 1992년 베른하르트 보저, 이자벨 가이온 및 블라디미르 Vapnik은 최대 마진 하이퍼플레인에 ^[5]커널 트릭(원래 아이저만 ^[18]등이 제안한)을 적용하여 비선형 분류기를 만드는 방법을 제안했다.결과 알고리즘은 모든 닷 곱이 비선형 커널 함수로 대체된다는 점을 제외하고는 형식적으로 유사합니다.이를 통해 알고리즘은 변환된 피쳐 공간에 최대 마진 하이퍼플레인을 맞출 수 있습니다.변환은 비선형이고 변환된 공간은 고차원일 수 있습니다. 분류기는 변환된 피쳐 공간의 하이퍼플레인이지만 원래 입력 공간에서는 비선형일 수 있습니다.

주목할 점은 높은 차원의 특징 공간에서 작업하면 충분한 샘플이 주어지면 알고리즘이 여전히 ^[19]잘 수행되지만 서포트 벡터 머신의 일반화 오류가 증가한다는 것이다.

일반적인 커널에는 다음과 같은 것이 있습니다.

$k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ (동종): $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ k ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ i , $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ ) $d=1$ ( x $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ j ) $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ \ $style$ k $(\mathbf {x} _{j$ })=(\ $mathbf {x} _{i$ }\ $cdot \mathbf {x}$ _{ $j})$ $=$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j})^{d}$ $d1$ .
$k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ (이종): $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ i , $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ j ) $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ x $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ + $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ ) $k(\mathbf {x} _{i},\mathbf {x} _{j})=(\mathbf {x} _{i}\cdot \mathbf {x} _{j}+r)^{d}$ \ $displaystyle$ k $(\mathbf {x} _{i}$ ,\ $mathbf {x}$ _ ${$ j} ) = (\ $mathbf {x}$ \ $cdot \mathbf {x}$ _{ $j}$ + $r$ }^{ $d$ )
가우스 레이디얼 기저 함수:k(x,)j))expγ 을에⁡(− γ ‖)나는)− j‖ 2){\displaystyle k(\mathbf{x}_{나는},\mathbf{x}_{j})=\exp \left(-\gamma\left\ \mathbf{x}_{나는}-\mathbf{x}_{j}\right\ ^{2}\right)};0{\displaystyle \gamma>0}. 때때로 γ을 사용하여 parametrized=1/. (2σ $)(\displaystyle$ \ $displaystyle = 1$ /( $2\display ^{2$
Sigmoid 함수(하이퍼볼릭 탄젠트): $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ ( $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ , $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ j ) $= tanh$ $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ ( $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ j + $k(\mathbf {x_{i}} ,\mathbf {x_{j}} )=\tanh(\kappa \mathbf {x} _{i}\cdot \mathbf {x} _{j}+c)$ $)표시$ { $style$ k ( \ $mathbf$ { $x$ _ { $i$ } , \ $mathbf$ { $x } )=$ \ $tanh$ ( \ $kappa \ mathbf$ { x } _ $cdot j } _ mathbf$ { x f } )

$\varphi (\mathbf {x} _{i})$ 은 $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ $\varphi (\mathbf {x} _{i})$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ $\varphi (\mathbf {x} _{i})$ i $\varphi (\mathbf {x} _{i})$ ) \ $displaystyle$ \ $varphi$ ( \ $mathbf { x$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ } $_$ { $i$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ } ) $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ i ) $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ j ) { $displaystyle$ k ( \ $mathbf { x } _$ j } { $j$ } \ $mathbfi = varphi$ ( $\varphi (\mathbf {x} _{i})$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x_{j}} )$ )에 의해 관련됩니다. $값$ w는 변환된 공간에도 존재하며, ${\textstyle \mathbf {w} =\sum _{i}\alpha _{i}y_{i}\varphi (\mathbf {x} _{i})}$ $i$ ${\textstyle \mathbf {w} =\sum _{i}\alpha _{i}y_{i}\varphi (\mathbf {x} _{i})}$ i ${\textstyle \mathbf {w} =\sum _{i}\alpha _{i}y_{i}\varphi (\mathbf {x} _{i})}$ ( x ${\textstyle \mathbf {w} =\sum _{i}\alpha _{i}y_{i}\varphi (\mathbf {x} _{i})}$ ) $=$ \ $sum$ _ { $i}$ \ ${\textstyle \mathbf {w} \cdot \varphi (\mathbf {x} )=\sum _{i}\alpha _{i}y_{i}k(\mathbf {x} _{i},\mathbf {x} )}$ ${\textstyle \mathbf {w} =\sum _{i}\alpha _{i}y_{i}\varphi (\mathbf {x} _{i})}$ _ { $i} y_{i$ } \varphi ( \ $mathbf {x}$ _{ $i}$ ${\textstyle \mathbf {w} =\sum _{i}\alpha _{i}y_{i}\varphi (\mathbf {x} _{i})}$ 。분류용 w를 $갖는$ 도트 곱은 커널 w로 다시 계산할 수 있다 ${\textstyle \mathbf {w} \cdot \varphi (\mathbf {x} )=\sum _{i}\alpha _{i}y_{i}k(\mathbf {x} _{i},\mathbf {x} )}$ $yle \mathbf {w}$ \ $cdot \varphi (\mathbf {x} )=\sum$ _{ $i}\alpha$ _{ $i}y_{i}k(\mathbf {x} _{i},\mathbf {x}$ ${\textstyle \mathbf {w} \cdot \varphi (\mathbf {x} )=\sum _{i}\alpha _{i}y_{i}k(\mathbf {x} _{i},\mathbf {x} )}$

SVM 분류자 계산

(soft-margin) SVM 분류자를 계산하는 것은 형식의 표현식을 최소화하는 것과 같습니다.

\displaystyle \left [{\frac {1}{n}\sum _{i=1}^{n}\max \left (0,1-y_{i})^{\mathbf {T}}\mathbf {x}_{i}\right)+\mathbf {w}^2}.}

(2)

위에서 설명한 바와 같이 $§$ 에 $\lambda$ 대해 충분히 작은 값을 선택하면 선형 분류 가능한 입력 데이터에 대한 하드 마진 분류기가 생성되므로 소프트 마진 분류기에 초점을 맞춥니다.(2)를 2차 프로그래밍 문제로 줄이는 것을 수반하는 고전적인 접근방식은 아래에 자세히 설명되어 있습니다.그런 다음, 하위 경사 강하 및 좌표 강하와 같은 보다 최근의 접근법이 논의될 것이다.

원초

최소화(2)는 다음과 같은 방법으로 미분 가능한 목적 함수를 가진 제약된 최적화 문제로 고쳐 쓸 수 있다.

$i\in \{1,\,\ldots ,\,n\}$ $i\in \{1,\,\ldots ,\,n\}$ { $i\in \{1,\,\ldots ,\,n\}$ , $i\in \{1,\,\ldots ,\,n\}$ , n $}$ { $displaystyle$ i \ $in$ \ 1, $\ldots$ , , $n$ \ } { $displaystyle$ \ $zeta$ _ { $i }$ \ $left$ ( 0 , 1 - $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ - b $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ { $mathb$ { $mathf }$ 에 $i\in \{1,\,\ldots ,\,n\}$ 대해 변수 $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ i $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ max ( 0 $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ , $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ - $\zeta _{i}=\max \left(0,1-y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\right)$ i )를 도입합니다. $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ ( $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ T $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ i - $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ ) $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ 1 $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ - i 1 - $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ i $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ . $y_{i}(\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}-b)\geq 1-\zeta _{i}.$ \ $display y$ _ { $i$ } ^ { \ $mathbf { T$ } ^ { \ $mathbf$ { t $}$ _ { $i$ } - b ) \ $geq 1-$ \ $zeta$ _ { $i$ } . $}$

따라서 최적화 문제를 다음과 같이 다시 작성할 수 있습니다.

{text{aligned}&{\text{n}}\sum _{i=1}^{n}\zeta _{i}+\displayda \mathbf {w}\[0.5ex]&{\text{i}\left {mathbf}{mathbf}{s}\s}\end { aligned}

이것은 태초의 문제라고 불립니다.

듀얼

위의 문제의 라그랑지안 쌍수를 풀면 단순화된 문제를 얻을 수 있다.

{aligned}&{\text{n},f(c_{1}\ldots c_{n}=\sum _{i=1}{2}-{i=1}^{n}\sum _{j=1}^{n}y_i}_{mathb}_{x}_{f}\end { aligned}

이것은 이중 문제라고 불립니다.이중 최대화 문제는 선형 구속조건이 적용되는 $c_{i}$ i $c_{i}$ {\ $displaystyle c_{i}$ 의 $c_i$ 2차 함수이므로 2차 프로그래밍 알고리즘으로 효율적으로 해결할 수 있다.

$c_{i}$ 서 변수 $(\$ })는 $c_{i}$ 다음과 같이 정의됩니다.

\mathbf {w} =\sum _{i=1}^{n}c_{i}y_{i}\mathbf {x}_{i}.

$c_{i}=0$ $\mathbf {x} _{i}$ 가 $\mathbf {x} _{i}$ 여백의 올바른 쪽에 있는 $\mathbf {x} _{i}$ $0<c_{i}<(2n\lambda )^{-1}$ $=$ $0<c_{i}<(2n\lambda )^{-1}$ $displaystyle$ \ $mathbf$ ${x}$ $_$ $\mathbf {x} _{i}$ ${$ $0<c_{i}<(2n\lambda )^{-1}$ _{i $}$ $0<c_{i}<(2n\lambda )^{-1}$ {i $}$ $_{$ $i$ })_ ${$ $0<c_{i}<(2n\lambda )^{-1}$ $}$ { $displaystyle$ 0 $<($ 2n $\$ $displayda$ $}^-$ 1})이 $0<c_{i}<(2n\lambda )^{-1}$ $표시됩니다.$ $\mathbf {w}$ w $\$ 는 $\mathbf {w}$ 서포트 벡터의 선형 조합으로 쓸 수 있습니다.

오프셋 $(\displaystyle$ b $b$ 는 여백 경계에서 $(\$ })를 찾아 해결하면 복구할 수 있습니다.

y_{i}(\mathbf {W}) ^{\mathbf {T} ^{\mathbf {x} _{i}-b})=1\iff b=\mathbf {w} ^{i}-y_i} } 。

( $y_{i}=\pm 1$ i $y_{i}^{-1}=y_{i}$ $y_{i}=\pm 1$ ± $y_{i}=\pm 1$ ({ $displaystyle$ $y_{i$ $}$ = y_{i $}$ $=$ $y_{i$ }) $y_{i}=\pm 1$ 이므로 $y_{i}^{-1}=y_{i}$ 유의하십시오).

커널 트릭

β(a, b) = (a, b, a² + b²)에 의해 주어진 커널을 가진 SVM의 훈련 예

변환된 데이터 포인트 $\varphi (\mathbf {x} _{i}).$ class ( x $\varphi (\mathbf {x} _{i}).$ )의 선형 분류 규칙에 대응하는 비선형 분류 규칙을 알아보겠습니다 $.\displaystyle \varphi (\mathbf {x} _{i})$ } 또한 $\varphi (\mathbf {x} _{i}).$ $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ i , $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ j $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ ) $=$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ x i ) $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ ( $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ i ){ $displaystyle$ k ( $x j$ $)_{i},\mathbf {x}_$ {j}=\ $varphi$ (\ $mathbf {x}_{i$ }_cdf phi)\ $varphi$ (xphi $)$ 를 $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ 만족시키는 $k(\mathbf {x} _{i},\mathbf {x} _{j})=\varphi (\mathbf {x} _{i})\cdot \varphi (\mathbf {x} _{j})$ 함수 $k$ k {\ $displaysty}$ 가 주어집니다 $.$

변환된 공간의 분류 $\mathbf {w}$ w $\$ 가 다음을 $\mathbf {w}$ 만족한다는 것을 알고 있습니다.

\mathbf {w} =\sum _{i=1}^{n}c_{i}y_{i}\varphi(\mathbf {x}_{i}),

$c_{i}$ 서 c $c_{i}$ 는 최적화 문제를 해결함으로써 얻을 수 $있습니다$

j}\\{\text{n}\sum _{i=1}^{n}c_{i}y_{i}&=0,{\text{and}}\leq c_{i}\leq {frac {1}{2n\leda}}}}\;{\text{i}}}i.\end { aligned}}

2차 프로그래밍을 사용할 경우 이전과 같이 $c_{i}$ $(\$ })를 $c_{i}$ 해결할 수 있습니다.다시 0 $0<c_{i}<(2n\lambda )^{-1}$ < $0<c_{i}<(2n\lambda )^{-1}$ i $0<c_{i}<(2n\lambda )^{-1}$ < ( $0<c_{i}<(2n\lambda )^{-1}$ $0<c_{i}<(2n\lambda )^{-1}$ $0<c_{i}<(2n\lambda )^{-1}$ ) - $0<c_{i}<(2n\lambda )^{-1}$ { $display style$ 0 < $c$ _ { $i$ } < ( $2$ n \ $lambda )^{$ - $1$ 이 $0<c_{i}<(2n\lambda )^{-1}$ 인덱스 $i$ 를 찾을 수 있습니다. $\varphi (\mathbf {x} _{i})$ ( $\varphi (\mathbf {x} _{i})$ i ) \ $display$ $style$ \ $varphi$ ( \ $mathbf$ { $x }$ _ { x } _ { i $\varphi (\mathbf {x} _{i})$ } )가 $\varphi (\mathbf {x} _{i})$ 변환된 공간의 경계에 놓여집니다.

{displaystyle {mathbf {w}^{\mathbf {T}}\varphi(\mathbf {i}_{i})-y_{j}c_{j}\varphi(\mathbf {x}_t}_varphi)\end { aligned}

마침내.

\displaystyle \mathbf {z} \mapsto \operatorname {sgn}(\mathbf {w}^{\mathbf {T})\varphi(\mathbf {z}-b)=\operatorname {sgn} \left(\sum _ i=1}^{n}c_i_y(xf)}

현대적 방법

SVM 분류기를 찾기 위한 최근 알고리즘에는 하위 경사 강하 및 좌표 강하가 포함된다.두 기법 모두 크고 희박한 데이터셋을 처리할 때 기존 접근 방식보다 상당한 이점을 제공하는 것으로 입증되었습니다. 하위 단계적 방법은 교육 예가 많을 때 특히 효율적이며 기능 공간의 치수가 높을 때 하강 방식을 조정합니다.

준경사 강하

SVM의 하위 경사 강하 알고리즘은 식과 직접 연동됩니다.

f(\mathbf {w},b)=\left[{i=1}^{n}\max \left (0,1-y_{i}^{\mathbf {T}}\mathbf {x}_{i}\right}\mathbfda\mathbf}\light]

${displaystyle$ f $}$ 는 $f$ w ${$ {w $}$ $및$ b{ $displaystyle$ b $b$ 의 볼록함수이므로 기존의 구배 강하(또는 SGD) 방식을 적용할 수 있으며, 함수의 구배 방향으로 스텝을 밟지 않고 func에서 선택한 벡터의 방향으로 스텝을 밟는다.부제목입니다.이 접근방식은 특정 구현의 경우 반복 횟수가 데이터 포인트 ^[20]수인 n $\displaystyle$ n으로 $n$ 되지 않는다는 장점이 있습니다.

좌표 하강

이중 문제에서 SVM을 위한 하강 알고리즘 조정

{argined}&{\text{n},f(c_{1}\ldots c_{n}=\sum _{i=1}{2}-{sum _{j=1}^{n}\sum _{j1}^{n}y_cd{i}(x})\end { aligned}

각 $i\in \{1,\,\ldots ,\,n\}$ $i\in \{1,\,\ldots ,\,n\}$ { $i\in \{1,\,\ldots ,\,n\}$ , $i\in \{1,\,\ldots ,\,n\}$ , $i\in \{1,\,\ldots ,\,n\}$ $}$ { $displaystyle$ i \ $in$ \ { $1$ , \ $ldots$ , , $n$ \ $\partial f/\partial c_{i}$ $i\in \{1,\,\ldots ,\,n\}$ } each 、 $c_{i}$ $c_{i}$ i { \ $displaystyle$ $\partial f/\partial c_{i}$ c $_$ { $i$ $\partial f/\partial c_{i}$ } is $c_{i}$ $\partial f/\partial c_{i}$ of of $\partial f/\partial c_{i}$ of of of of of of of of for for for for for c / $\partial f/\partial c_{i}$ of $\partial f/\partial c_{i}$ i \ $displaystyle$ f / \ $partial$ c $_$ { i $\partial f/\partial c_{i}$ }방향으로 조정됩니다. $(c_{1}',\,\ldots ,\,c_{n}')$ 다음, $(c_{1}',\,\ldots ,\,c_{n}')$ 계수의 결과 $(c_{1}',\,\ldots ,\,c_{n}')$ ( $(c_{1}',\,\ldots ,\,c_{n}')$ 1 $(c_{1}',\,\ldots ,\,c_{n}')$ , $(c_{1}',\,\ldots ,\,c_{n}')$ , $(c_{1}',\,\ldots ,\,c_{n}')$ n ${\$ ) { $displaystyle$ ( $c$ _ {1} , \ $ldots$ , , $c$ _ { n $(c_{1}',\,\ldots ,\,c_{n}')$ } )는 $(c_{1}',\,\ldots ,\,c_{n}')$ 주어진 제약을 충족하는 계수의 가장 가까운 벡터에 투영됩니다.(일반적으로 유클리드 거리가 사용됩니다.)그런 다음 최적에 가까운 계수 벡터를 얻을 때까지 이 과정을 반복합니다.성능 보장은 거의 ^[21]검증되지 않았지만 결과 알고리즘은 실제로 매우 빠릅니다.

경험적 리스크 최소화

위에서 설명한 소프트 마진 지원 벡터 기계는 힌지 손실에 대한 경험적 위험 최소화(ERM) 알고리즘의 예입니다.이와 같이 서포트 벡터 머신은 통계적 추론을 위한 알고리즘의 자연스러운 클래스에 속하며, 그 고유한 특징의 대부분은 힌지 손실의 동작에 기인한다.이러한 관점을 통해 SVM의 작동 방식과 이유에 대한 자세한 정보를 얻을 수 있으며, SVM의 통계 속성을 더 잘 분석할 수 있습니다.

리스크 최소화

지도 학습에서는 라벨 $y_{1}\ldots y_{n}$ 이 붙은 일련의 훈련 $X_{1}\ldots X_{n}$ $($ X_ ${1}\ldots$ X_ ${n})$ 이 $X_{1}\ldots X_{n}$ 제공되며, $y_{n+1}$ $($ $}\$ $ldots y_{n$ 이 $y_{n+1}$ 제공되며, $1($ $y_{$ $n$ 을 $y_{n+1}$ 하고자 합니다.이를 위해 f $f(X_{n+1})$ n + $f(X_{n+1})$ ) $f(X_{n+1})$ { $style$ f $($ $X_{n+1})$ 가 $f(X_{n+1})$ $y_{n+1}$ + $y_{n+1}$ 의 "좋은" 근사치인 $f(X_{n+1})$ ( $xisplaystyle$ $y_{n+1$ $f$ 을 형성합니다. 일반적으로 "좋은" 근사치는 $\ell (y,z)$ 함수 $\ell (y,z)$ ( $\ell (y,z)$ , $zisplay style$ ) $\ell (y,z)$ \ $\ell (y,z)$ ellzy $\ell (y,z)$ $f$ ( \ displaysty )의 도움을 받아 정의됩니다. $d$ z(\ $displaystyle$ z $)$ 는 $z$ y $(\displaystyle$ y $y$ 의 예측입니다. 그런 다음 예상되는 위험을 최소화하는 가설을 선택합니다.

(\displaystyle \varepsilon (f)=\mathbb {E} \left[\ell (y_{n+1},f(X_{n+1})]\right.}

대부분의 경우 X $X_{n+1},\,y_{n+1}$ + $X_{n+1},\,y_{n+1}$ , y $X_{n+1},\,y_{n+1}$ + $X_{n+1},\,y_{n+1}$ ({ $display style$ X_ ${n+1}, y_{n+1})$ 의 $X_{n+1},\,y_{n+1}$ $X_{n+1},\,y_{n+1}$ 분포를 알 수 없습니다.이러한 경우 일반적인 전략은 경험적 위험을 최소화하는 가설을 선택하는 것입니다.

(\displaystyle {\hat {varepsilon }}(f)=sum _{k=1}^{n}\ell(y_{k},f(X_{k})).}

랜덤 $X_{k},\,y_{k}$ $X_{k},\,y_{k}$ k $X_{k},\,y_{k}$ y k $($ 유한 마르코프 프로세스에 의해 생성된다는)의 시퀀스에 대한 특정 가정 하에서 고려 중인 가설 집합이 충분히 작다면 경험적 위험의 최소화는 다음과 같이 예상되는 위험의 최소화에 근접할 것이다. $\displaystyle$ n이 $n$ 커집니다.이 접근방식을 경험적 리스크 최소화(ERM)라고 부릅니다.

정규화와 안정성

최소화 문제가 명확한 해법을 가지기 위해서는 고려 중인 가설 ${\mathcal {H}}$ H(\style $\display\mathcal {H})$ 에 ${\mathcal {H}}$ 제약을 가해야 합니다.H $(\$ 가 ${\mathcal {H}}$ 표준공간(SVM의 경우와 마찬가지로)인 ${\mathcal {H}}$ 특히 효과적인 방법은 f $\lVert f\rVert _{\mathcal {H}}<k$ f $)$ $f$ 만 $f$ 고려하는 것입니다 $\lVert f\rVert _{\mathcal {H}}<k$ $\lVert f\rVert _{\mathcal {H}}<k$ (\ $displaystyle f\rVert_{\mathcal {H}<k}).$ This is equivalent to imposing a regularization penalty ${\mathcal {R}}(f)=\lambda _{k}\lVert f\rVert _{\mathcal {H}}$ , and solving the new optimization problem

{hat {f}=\mathrm {pin} {f} {in\mathcal {H}} {\hat {varepsilon}}(f) + {\mathcal {R}}(f)

이 방법을 티코노프 정규화라고 합니다.

보다 일반적으로 R $)$ { $style$ { $R$ $f$ $f$ $)$ 은 ${\mathcal {R}}(f)$ $f$ 의 복잡성을 측정하는 수단이 될 수 있으므로 단순한 가설이 선호된다.

SVM 및 힌지 손실

(soft-margin) SVM ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ w ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ , ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ : x ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ sgn ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ ( ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ ^ ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ - ${\hat {\mathbf {w} }},b:\mathbf {x} \mapsto \operatorname {sgn}({\hat {\mathbf {w} }}^{\mathsf {T}}\mathbf {x} -b)$ b ) \ $displaystyle { mathbf { x }$ , b : \ $mathbf$ { x } \ $mapsto$ \ $operatorname$ { $sgn$ } ( { \ $hat$ { $mathbf$ { $w }$ } } } } ） } { $mathbf$ { sgn} { $mathbf$ } { $sgn$ } } } } } 。

\displaystyle \left [{\frac {1}{n}\sum _{i=1}^{n}\max \left (0,1-y_{i}(\mathbf {t})^{\mathbf {x} -b)\right}+\cladda \mathbf {w} \mathbf {w}\{n2}.}

위의 논의에 비추어 볼 때, 우리는 SVM 기법이 Tikhonov 정규화에 의한 경험적 위험 최소화와 동등하다는 것을 알 수 있다. 이 경우 손실 함수는 힌지 손실이다.

\displaystyle \ell(y,z)=\max \left(0,1-yz\오른쪽).}

이러한 관점에서 SVM은 정규화된 최소 제곱 및 로지스틱 회귀와 같은 다른 기본 분류 알고리즘과 밀접하게 관련되어 있습니다.세 가지 차이는 손실 함수의 선택에 있다. 정규화된 최소 손실량은 제곱 손실로 경험적 위험 최소화에 $\ell _{sq}(y,z)=(y-z)^{2}$ 한다. δ $\ell _{sq}(y,z)=(y-z)^{2}$ q ( $\ell _{sq}(y,z)=(y-z)^{2}$ , $\ell _{sq}(y,z)=(y-z)^{2}$ ) $\ell _{sq}(y,z)=(y-z)^{2}$ ( $\ell _{sq}(y,z)=(y-z)^{2}$ - $\ell _{sq}(y,z)=(y-z)^{2}$ ) $\ell _{sq}(y,z)=(y-z)^{2}$ \ $display$ style \ $ell$ _{ ell $（ y$ , z ) = ( y - z ) = ( y - z ) 2 ; $logistory$ regressional $、$ loss $\ell _{sq}(y,z)=(y-z)^{2}$

\displaystyle _{\log}(y,z)=\ln(1+e^{-yz}).}

대상 기능

힌지 손실과 이러한 다른 손실 함수의 차이는 목표 함수, 즉 주어진 $X,\,y$ 변수 X $X,\,y$ $,\displaystyle$ X $,y$ 쌍에 대한 예상 위험을 최소화하는 함수에서 가장 잘 설명됩니다.

특히, y $y_{x}$ { $displaystyle y_{x}$ 는 $y_{x}$ X $X=x$ { $displaystyle$ X $=x$ 의 $X=x$ 에 따라 y{ $displaystyle$ y $}$ 를 $y$ $나타냅니다$ .분류 설정에는 다음이 있습니다.

y_{x}=sys}1&{\text{확률}}p_{x}\-1&{\text{확률}}1-p_{x}\end{cases}}}

따라서 최적의 분류자는 다음과 같습니다.

\displaystyle f^{*}(x)=p_{x}\geq 1/2\{\text{cases}\end{cases}}

제곱 손실의 경우 대상 함수는 조건부 기대 $f_{sq}(x)=\mathbb {E} \left[y_{x}\right]$ f $s$ q ( $x$ ) $f_{sq}(x)=\mathbb {E} \left[y_{x}\right]$ [ $y$ x $]$ = \ $mathbb$ { $E }$ \ $left$ [ y $_$ { $x$ } \ $right$ ]}이고 $f_{sq}(x)=\mathbb {E} \left[y_{x}\right]$ 로지스틱 손실의 $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ ( $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ ) $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ x $f_{\log }(x)=\ln \left(p_{x}/({1-p_{x}})\right)$ 입니다. $x})\right$ 이 두 대상 함수는 $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ 올바른 분류자를 제공하지만, sgn $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ ( $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ q ) $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ （ $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ log $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ ） $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ $\operatorname {sgn}(f_{sq})=\operatorname {sgn}(f_{\log })=f^{*}$ { $displaystyle \operatorname {sgn}(f_{n$ })=\operatorname {sgn $}(f_$ {\ $log$ }) = $f$ 로서 우리가 필요로 하는 것보다 더 많은 정보를 제공합니다.실제로 y $(\$ 의 $y_{x}$ 를 완전히 설명하기에 충분한 정보를 얻을 수 있습니다.

한편, 힌지 손실의 목표 함수가 $f^{*}$ f ${\$ {\ $displaystyle$ f $f^{*}$ 인 것을 확인할 수 있습니다.따라서 충분히 풍부한 가설 공간(또는 적절히 선택된 커널의 경우)에서 SVM 분류기는 다음과 같은 가장 단순한 함수( $\$ 로 수렴합니다.rectly는 데이터를 분류합니다.이것은 SVM의 기하학적 해석을 확장한다. 선형 분류의 경우, 경험적 위험은 지지 벡터 사이에 여백이 있는 함수에 의해 최소화되며, 그 중 가장 단순한 것이 최대 마진 ^[22]분류기이다.

특성.

SVM은 일반화된 선형 분류기 패밀리에 속하며 퍼셉트론의 확장으로 해석될 수 있습니다.그들은 또한 티코노프 정규화의 특별한 경우로 간주될 수 있다.특별한 특성은 경험적 분류 오류를 최소화하는 동시에 기하학적 여유를 최대화하는 것입니다. 따라서 최대 여백 분류기로도 알려져 있습니다.

Meyer, Leisch 및 Hornik은 ^[23]SVM을 다른 분류기와 비교했습니다.

파라미터 선택

SVM의 유효성은 커널의 선택, 커널의 파라미터 및 소프트 마진 $\lambda$ {\{ display style \ $lambda$ 에 따라 달라집니다.일반적으로 가우스 커널은 단일 $\gamma$ ${\$ { $display$ $\gamma$ style \ $gamma$ 를 $\lambda$ 가지고 $\lambda$ .「 \ display style \ $lambda$ }와 $（$ \ $display$ style \ $gamma ）$ 의 $\gamma$ 최적의 조합은 다음과 같습니다. ${\$ $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ often $\lambda$ often （ \ $displaystyle$ \ $displayda$ $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ $）$ $\gamma \in \{2^{-15},2^{-13},\dots ,2^{1},2^{3}\}$ $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ （ \ displaystyle \ $displaystyle$ \ displaystyle \ $5$ ） $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ 2 - $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ 、 $2$ - $）$ $\gamma \in \{2^{-15},2^{-13},\dots ,2^{1},2^{3}\}$ $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ 2 $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ , 2 $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ （ $displaystyle$ \ $displaystyle$ \ 2 ^ { $5$ } ）、 2 、 $3 } }$ $\lambda \in \{2^{-5},2^{-3},\dots ,2^{13},2^{15}\}$ $\displaystyle \gamma \in$ \{ $2^{-13,\dots,2^{1},2^{3$ 일반적으로 상호 검증을 사용하여 파라미터 선택의 각 조합을 확인하고 상호 검증 정확도가 가장 높은 파라미터를 선택합니다.또는 최근 베이지안 최적화 작업을 $\lambda$ 하여 ${\$ {\ $displaystyle \lambda$ } $\gamma$ ${\$ {\ $displaystyle \gamma$ } $\gamma$ 를 선택할 수 있으며 그리드 검색보다 훨씬 적은 파라미터 조합의 평가가 필요한 경우가 많다.테스트 및 새 데이터 분류에 사용되는 최종 모델은 선택한 ^[24]매개 변수를 사용하여 전체 교육 세트에 대해 교육됩니다.

문제들

SVM의 잠재적인 결점은 다음과 같습니다.

입력 데이터의 완전한 라벨 부착 필요
보정되지 않은 클래스 멤버십 확률:SVM은 유한 데이터에 대한 확률 추정을 회피하는 Vapnik의 이론에서 비롯됩니다.
SVM은 2개의 클래스 태스크에만 직접 적용할 수 있습니다.따라서 멀티클래스 태스크를 몇 가지 바이너리 문제로 줄이는 알고리즘을 적용해야 합니다.멀티클래스 SVM 섹션을 참조하십시오.
해결된 모형의 모수는 해석하기 어렵습니다.

내선번호

서포트 벡터 클러스터링(SVC)

SVC는 커널 기능을 기반으로 하는 유사한 방법이지만 비지도 ^{[citation needed]}학습에 적합합니다.

멀티클래스 SVM

멀티클래스 SVM은 서포트 벡터 머신을 사용하여 인스턴스에 라벨을 할당하는 것을 목적으로 합니다.여기서 라벨은 여러 요소의 유한 집합에서 가져옵니다.

이를 위한 주요 접근법은 단일 멀티클래스 문제를 여러 이진 ^[25]분류 문제로 줄이는 것입니다.이러한 감소를 위한 일반적인 방법은 다음과 같습니다.^[25]^[26]

라벨 중 하나와 나머지(1 대 모든 것) 또는 모든 클래스 쌍(1 대 1)을 구별하는 바이너리 분류자 구축.one-versus-all 케이스의 새로운 인스턴스 분류는 winner-takes-all 전략에 의해 이루어집니다.이 전략에서는 가장 높은 출력 함수를 가진 분류자가 클래스를 할당합니다(동일한 점수를 생성하도록 출력 함수를 교정하는 것이 중요합니다).1 대 1 접근방식의 경우 분류는 max-wins 투표전략에 의해 이루어집니다.이 전략에서는 모든 분류자가 인스턴스를 2개의 클래스 중 하나에 할당하고 할당된 클래스에 대한 투표가 1표 증가하며 최종적으로 가장 많은 표를 가진 클래스가 인스턴스 분류를 결정합니다.
방향 비순환 그래프 SVM(DAGSVM)^[27]
오류 수정 출력 코드^[28]

Crammer와 Singer는 다중 클래스 분류 문제를 다중 이진 분류 ^[29]문제로 분해하는 대신 단일 최적화 문제로 주조하는 다중 클래스 SVM 방식을 제안했다.Lee, Lin과^[30]^[31] Wahba, Van den Burg와 ^[32]Groenen도 참조하십시오.

트랜스덕티브 서포트 벡터 머신

전이성 지원-벡터 기계는 부분적으로 레이블이 지정된 데이터를 변환 원리를 준수하여 반감독 학습에서도 처리할 수 있다는 점에서 SVM을 확장한다.여기서는 교육 ${\mathcal {D}}$ D $(\displaystyle\mathcal$ { $D$ 외에 학습자에게도 세트가 제공됩니다.

{D}^{\star}=\{\mathbf {x}_{i}^{\star}\mid \mathbf {x}_{i}^{\star}\in \mathbbb {R}^{p}\}_{i=1}^{k}}}\mathbf {x

분류할 테스트 예제의 경우.형식적으로 트랜지티브서포트 벡터머신은 다음과 같은 초기 최적화 ^[33]문제에 의해 정의됩니다.

최소화( $\mathbf {w} ,b,\mathbf {y} ^{\star }$ $\mathbf {w} ,b,\mathbf {y} ^{\star }$ $\mathbf {w} ,b,\mathbf {y} ^{\star }$ ${\$ { $displaystyle \mathbf$ { $w$ } , $b$ , \ $mathbf$ { $y$ } ^ { \ $star }$ )

{\displaystyle\frac{1}{2}}\mathbf {w}\^{2}}

( $i=1,\dots ,n$ $i=1,\dots ,n$ , $i=1,\dots ,n$ { $displaystyle$ i $=1,\display,n}$ 및 $i=1,\dots ,n$ j $j=1,\dots ,k$ , $j=1,\dots ,k$ k { $displaystyle$ j $=1,\display,k$ 의 경우)에 따릅니다.

({displaystyle {displaystyle {mathbf {w} \cdot \mathbf {x} _{i} - b) \geq 1,\&y_{j}^{\star } _{j} \cdot \mathbf {x} _{\star } - b ） geq 、 {nend } )

그리고.

y_{j}^{\star}\in\{-1,1\}.

1998년 블라디미르 N. Vapnik에 의해 전도성 서포트 벡터 기계가 도입되었습니다.

구조화된 SVM

SVM은 구조화된 SVM으로 일반화되어 라벨 공간이 구조화되고 크기가 무한할 수 있습니다.

회귀

임계값이 다른 서포트 벡터 회귀(예측).increases가 증가할수록 예측은 오류에 덜 민감해진다.

회귀를 위한 SVM 버전은 1996년 블라디미르 N. Vapnik, Harris Drucker, Christopher J. C. Burges, Linda Kaufman 및 Alexander J.^[34] Smola에 의해 제안되었다.이 방법을 Support-Vector Regression(SVR; 지원 벡터 회귀)이라고 합니다.모델을 구축하기 위한 비용 함수는 한계를 벗어난 교육 지점에 대해 신경 쓰지 않기 때문에 지지-벡터 분류에 의해 생성된 모형은 교육 데이터의 부분 집합에만 의존합니다.이와 유사하게 모델 구축 비용 함수는 모델 예측에 가까운 모든 훈련 데이터를 무시하기 때문에 SVR에 의해 생산된 모델은 훈련 데이터의 하위 집합에만 의존합니다.Suykens와 Vandewalle은 ^[35]최소 제곱 지원 벡터 머신(LS-SVM)으로 알려진 또 다른 SVM 버전을 제안했습니다.

원래의 SVR 트레이닝은 문제를 해결하는 것을 의미합니다^[36].

{\tfrac {1}{2}}\|w\|^{2}

{\tfrac {1}{2}}\|w\|^{2}

~ 2

{\tfrac {1}{2}}\|w\|^{2}

2

{\tfrac {1}{2}}\|w\|^{2}

(\

style

{

tfrac {1}

{

2}}

\ w \

^

{

2}

)

|y_{i}-\langle w,x_{i}\rangle -b|\leq \varepsilon

i

|y_{i}-\langle w,x_{i}\rangle -b|\leq \varepsilon

-

|y_{i}-\langle w,x_{i}\rangle -b|\leq \varepsilon

w ,

|y_{i}-\langle w,x_{i}\rangle -b|\leq \varepsilon

i

|y_{i}-\langle w,x_{i}\rangle -b|\leq \varepsilon

- b

{\display

{ displaystyle

y _

{

i

} - \

rangle

-

b

\

leq

\

varepsilon }

에

|y_{i}-\langle w,x_{i}\rangle -b|\leq \varepsilon

합니다.

$x_{i}$ 서 x i $(\$ })는 $x_{i}$ $y_{i}$ $(\$ 의 훈련 샘플입니다.내부 곱 + 절편 $\langle w,x_{i}\rangle +b$ w , $\langle w,x_{i}\rangle +b$ $\langle w,x_{i}\rangle +b$ $\langle w,x_{i}\rangle +b$ + $\langle w,x_{i}\rangle +b$ ( \ $displaystyle$ \ $langle$ w , $x _$ { i } \ $rangle + b$ )는 $\langle w,x_{i}\rangle +b$ 해당 샘플의 예측입니다. $또,$ 「\ displaystyle \ $varepsilon$ $\varepsilon$ $」$ 는 $\varepsilon$ , 임계치로서 기능하는 프리 파라미터입니다.모든 예측은, true 예측의 $범위내$ 에 $\varepsilon$ 있을 필요가 있습니다.보통 slack 변수는 에러를 허용하고 위의 문제를 실행할 수 없는 경우 근사치를 허용하기 위해 위에 추가됩니다.

베이지안 SVM

2011년에 Polson과 Scott는 SVM이 데이터 ^[37]증강 기술을 통해 베이지안 해석을 허용한다는 것을 보여주었다.이 접근법에서는 SVM을 그래픽 모델(모수들이 확률 분포를 통해 연결됨)로 간주합니다.이 확장 뷰를 통해 유연한 피쳐 모델링, 자동 하이퍼 파라미터 조정 및 예측 불확도 정량화 등의 베이지안 기술을 SVM에 적용할 수 있습니다.최근에는 확장 가능한 버전의 베이지안 SVM이 Florian Wenzel에 의해 개발되어 빅데이터에 ^[38]베이지안 SVM을 적용할 수 있게 되었습니다.Florian Wenzel은 베이지안 커널 지원 벡터 머신(SVM)을 위한 변동 추론(VI) 체계와 선형 베이지안 SVM을 ^[39]위한 확률 버전(SVI)의 두 가지 다른 버전을 개발했습니다.

실행

최대 마진 하이퍼플레인 파라미터는 최적화를 해결함으로써 도출됩니다.SVM에서 발생하는 2차 프로그래밍(QP) 문제를 신속하게 해결하기 위한 몇 가지 전문 알고리즘이 있으며, 대부분 문제를 보다 작고 관리하기 쉬운 청크로 분석하기 위한 휴리스틱에 의존합니다.

또 다른 접근법은 뉴턴과 같은 반복을 사용하여 원시 및 이중 문제의 [40]카루시-쿤-터커 조건의 해답을 찾는 것이다.이 접근방식은 일련의 고장난 문제를 해결하는 것이 아니라 문제를 직접 해결합니다.큰 커널 행렬을 포함하는 선형 시스템을 해결하는 것을 피하기 위해, 행렬에 대한 낮은 순위 근사치가 커널 트릭에서 종종 사용됩니다.

또 다른 일반적인 방법은 Platt의 순차적 최소 최적화(SMO) 알고리즘으로, 문제를 분석적으로 해결되는 2차원 하위 문제로 분해하여 수치 최적화 알고리즘과 매트릭스 스토리지가 필요하지 않습니다.이 알고리즘은 개념적으로 단순하고 구현이 용이하며 일반적으로 속도가 빠르며 어려운 SVM ^[41]문제에 대한 확장성이 우수합니다.

선형 지지-벡터 기계의 특수한 경우는 가까운 친척인 로지스틱 회귀를 최적화하는 데 사용되는 것과 동일한 종류의 알고리즘으로 보다 효율적으로 해결할 수 있다. 이 알고리즘의 등급에는 준경사 강하(예: PEGASOS^[42])와 좌표 강하(예^[43]: LIBLINARE)가 포함된다.LIBLINARE에는 몇 가지 매력적인 훈련 시간 속성이 있습니다.각 컨버전스 반복은 트레인 데이터를 읽는 데 걸리는 시간에 선형으로 시간이 걸리며, 반복은 Q-선형 컨버전스 특성도 가지고 있기 때문에 알고리즘이 매우 빠릅니다.

일반 커널 SVM은 특히 병렬화가 허용되는 경우 하위 경사 강하(예: P-packSVM^[44])를 사용하여 보다 효율적으로 해결할 수 있습니다.

커널 SVM은 LIBSVM, MATLAB, SAS, SVMlight, kernlab, skikit-learn, 쇼군, Weka, Shark, JKernel Machines, OpenCV 등을 포함한 많은 머신러닝 툴킷에서 사용할 수 있습니다.

분류의 ^[45]정확성을 높이기 위해 데이터의 전처리(표준화)를 강력히 권장한다.표준화 방법에는 min-max, 10진수에 의한 정규화, Z-score ^[46]등이 있습니다.평균과 각 피쳐의 분산에 의한 나눗셈은 보통 ^[47]SVM에 사용됩니다.

「」를 참조해 주세요.

현장 적응형 표
커널 머신
피셔 커널
플랫 스케일링
다항식 커널
예측 분석
서포트 벡터 머신의 정규화 관점
SVM과 기능 형태가 동일한 확률론적 스파스 커널 모델인 관련성 벡터 머신
순차적 최소 최적화
공간 매핑
Winnow (알고리즘)

레퍼런스

^ ^a ^b ^c Cortes, Corinna; Vapnik, Vladimir (1995). "Support-vector networks" (PDF). Machine Learning. 20 (3): 273–297. CiteSeerX 10.1.1.15.9362. doi:10.1007/BF00994018. S2CID 206787478.
^ Ben-Hur, Asa; Horn, David; Siegelmann, Hava; Vapnik, Vladimir N. ""Support vector clustering" (2001);". Journal of Machine Learning Research. 2: 125–137.
^ "1.4. Support Vector Machines — scikit-learn 0.20.2 documentation". Archived from the original on 2017-11-08. Retrieved 2017-11-08.
^ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning : Data Mining, Inference, and Prediction (PDF) (Second ed.). New York: Springer. p. 134.
^ ^a ^b ^c Boser, Bernhard E.; Guyon, Isabelle M.; Vapnik, Vladimir N. (1992). "A training algorithm for optimal margin classifiers". Proceedings of the fifth annual workshop on Computational learning theory – COLT '92. p. 144. CiteSeerX 10.1.1.21.3818. doi:10.1145/130385.130401. ISBN 978-0897914970. S2CID 207165665.
^ Press, William H.; Teukolsky, Saul A.; Vetterling, William T.; Flannery, Brian P. (2007). "Section 16.5. Support Vector Machines". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8. Archived from the original on 2011-08-11.
^ Joachims, Thorsten (1998). "Text categorization with Support Vector Machines: Learning with many relevant features". Machine Learning: ECML-98. Lecture Notes in Computer Science. Springer. 1398: 137–142. doi:10.1007/BFb0026683. ISBN 978-3-540-64417-0.
^ Pradhan, Sameer S. 등"서포트 벡터 머신을 사용하여 의미 해석을 허용합니다."컴퓨터 언어학 협회 북미 지부 HLT-NAACL 2004의 인간 언어 기술 회의의 진행.2004.
^ Vapnik, Vladimir N:초대 연사.IPMU 정보처리 및 관리 2014).
^ 토했어, 로렌"이미지 분할을 위한 반복적인 퍼지-결정-결정에 사용되는 공간-분류 정보 과립"세분화된 컴퓨팅과 의사결정.Springer International Publishing, 2015. 285~318.
^ A. Maity (2016). "Supervised Classification of RADARSAT-2 Polarimetric Data for Different Land Features". arXiv:1608.00501 [cs.CV].
^ DeCoste, Dennis (2002). "Training Invariant Support Vector Machines" (PDF). Machine Learning. 46: 161–190. doi:10.1023/A:1012454411458. S2CID 85843.
^ Maitra, D. S.; Bhattacharya, U.; Parui, S. K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR): 1021–1025. doi:10.1109/ICDAR.2015.7333916. ISBN 978-1-4799-1805-8. S2CID 25739012.
^ Gaonkar, Bilwaj; Davatzikos, Christos; "지원 벡터 머신 기반의 다변수 이미지 분석 및 분류를 위한 통계적 유의성 맵의 분석적 추정"
^ 쿠잉넷, 레미, 로소, 샬롯, 추팽, 마리, 스테판, 도르몽, 디디에, 베날리, 하비브, 샘슨, 이브, 올리비에, "의료용 뇌졸중과 연관된 확산 결과 검출을 위한 SVM의 공간적 정규화"
^ Statnikov, Alexander; Hardin, Douglas; & Aliferis, Constantin; (2006); "SVM 체중 기반 방법을 사용하여 인과적 관련성 및 비원인적 관련 변수 식별", 서명, 1, 4.
^ "Why is the SVM margin equal to ${\frac {2}{\ \mathbf {w} \ }}$ ". Mathematics Stack Exchange. 30 May 2015.
^ Aizerman, Mark A.; Braverman, Emmanuel M. & Rozonoer, Lev I. (1964). "Theoretical foundations of the potential function method in pattern recognition learning". Automation and Remote Control. 25: 821–837.
^ Jin, Chi; Wang, Liwei (2012). Dimensionality dependent PAC-Bayes margin bound. Advances in Neural Information Processing Systems. CiteSeerX 10.1.1.420.3487. Archived from the original on 2015-04-02.
^ Shalev-Shwartz, Shai; Singer, Yoram; Srebro, Nathan; Cotter, Andrew (2010-10-16). "Pegasos: primal estimated sub-gradient solver for SVM". Mathematical Programming. 127 (1): 3–30. CiteSeerX 10.1.1.161.9629. doi:10.1007/s10107-010-0420-4. ISSN 0025-5610. S2CID 53306004.
^ Hsieh, Cho-Jui; Chang, Kai-Wei; Lin, Chih-Jen; Keerthi, S. Sathiya; Sundararajan, S. (2008-01-01). A Dual Coordinate Descent Method for Large-scale Linear SVM. Proceedings of the 25th International Conference on Machine Learning. ICML '08. New York, NY, USA: ACM. pp. 408–415. CiteSeerX 10.1.1.149.5594. doi:10.1145/1390156.1390208. ISBN 978-1-60558-205-4. S2CID 7880266.
^ Rosasco, Lorenzo; De Vito, Ernesto; Caponnetto, Andrea; Piana, Michele; Verri, Alessandro (2004-05-01). "Are Loss Functions All the Same?". Neural Computation. 16 (5): 1063–1076. CiteSeerX 10.1.1.109.6786. doi:10.1162/089976604773135104. ISSN 0899-7667. PMID 15070510. S2CID 11845688.
^ Meyer, David; Leisch, Friedrich; Hornik, Kurt (September 2003). "The support vector machine under test". Neurocomputing. 55 (1–2): 169–186. doi:10.1016/S0925-2312(03)00431-4.
^ Hsu, Chih-Wei; Chang, Chih-Chung & Lin, Chih-Jen (2003). A Practical Guide to Support Vector Classification (PDF) (Technical report). Department of Computer Science and Information Engineering, National Taiwan University. Archived (PDF) from the original on 2013-06-25.
^ ^a ^b Duan, Kai-Bo; Keerthi, S. Sathiya (2005). "Which Is the Best Multiclass SVM Method? An Empirical Study" (PDF). Multiple Classifier Systems. LNCS. Vol. 3541. pp. 278–285. CiteSeerX 10.1.1.110.6789. doi:10.1007/11494683_28. ISBN 978-3-540-26306-7.
^ Hsu, Chih-Wei & Lin, Chih-Jen (2002). "A Comparison of Methods for Multiclass Support Vector Machines" (PDF). IEEE Transactions on Neural Networks. 13 (2): 415–25. doi:10.1109/72.991427. PMID 18244442.
^ Platt, John; Cristianini, Nello; Shawe-Taylor, John (2000). "Large margin DAGs for multiclass classification" (PDF). In Solla, Sara A.; Leen, Todd K.; Müller, Klaus-Robert (eds.). Advances in Neural Information Processing Systems. MIT Press. pp. 547–553. Archived (PDF) from the original on 2012-06-16.
^ Dietterich, Thomas G.; Bakiri, Ghulum (1995). "Solving Multiclass Learning Problems via Error-Correcting Output Codes" (PDF). Journal of Artificial Intelligence Research. 2: 263–286. arXiv:cs/9501101. Bibcode:1995cs........1101D. doi:10.1613/jair.105. S2CID 47109072. Archived (PDF) from the original on 2013-05-09.
^ Crammer, Koby & Singer, Yoram (2001). "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines" (PDF). Journal of Machine Learning Research. 2: 265–292. Archived (PDF) from the original on 2015-08-29.
^ Lee, Yoonkyung; Lin, Yi & Wahba, Grace (2001). "Multicategory Support Vector Machines" (PDF). Computing Science and Statistics. 33. Archived (PDF) from the original on 2013-06-17.
^ Lee, Yoonkyung; Lin, Yi; Wahba, Grace (2004). "Multicategory Support Vector Machines". Journal of the American Statistical Association. 99 (465): 67–81. CiteSeerX 10.1.1.22.1879. doi:10.1198/016214504000000098. S2CID 7066611.
^ Van den Burg, Gerrit J. J. & Groenen, Patrick J. F. (2016). "GenSVM: A Generalized Multiclass Support Vector Machine" (PDF). Journal of Machine Learning Research. 17 (224): 1–42.
^ Joachims, Thorsten; "지원 벡터 기계를 사용한 텍스트 분류를 위한 트랜잭션 추론", 1999년 기계 학습 국제회의 진행(ICML 1999), 페이지 200–209.
^ 드러커, 해리스 버지스, 예수님C.; Kaufman, Linda; Smola, Alexander J.; 및 Vapnik, Vapnik, Vapnik, Vladimir N.(1997), "지원 벡터 회귀 기계", NIPS 1996, 155-161, MIT Press.
^ Suykens, Johan A. K., Vandewalle, Joos P. L.; "벡터 기계 분류기를 지원하는 최소 제곱", Neural Processing Letters, vol. 9, no. 3, 1999, 페이지 293-300.
^ Smola, Alex J.; Schölkopf, Bernhard (2004). "A tutorial on support vector regression" (PDF). Statistics and Computing. 14 (3): 199–222. CiteSeerX 10.1.1.41.1452. doi:10.1023/B:STCO.0000035301.49549.88. S2CID 15475. Archived (PDF) from the original on 2012-01-31.
^ Polson, Nicholas G.; Scott, Steven L. (2011). "Data Augmentation for Support Vector Machines". Bayesian Analysis. 6 (1): 1–23. doi:10.1214/11-BA601.
^ Wenzel, Florian; Galy-Fajou, Theo; Deutsch, Matthäus; Kloft, Marius (2017). "Bayesian Nonlinear Support Vector Machines for Big Data". Machine Learning and Knowledge Discovery in Databases (ECML PKDD). Lecture Notes in Computer Science. 10534: 307–322. arXiv:1707.05532. Bibcode:2017arXiv170705532W. doi:10.1007/978-3-319-71249-9_19. ISBN 978-3-319-71248-2. S2CID 4018290.
^ 플로리안 웬젤; 마테우스 도이치;Théo Galy-Fajou; Marius Kloft; "베이지안 비선형 지지 벡터 기계에 대한 확장 가능한 근사 추론"
^ Ferris, Michael C.; Munson, Todd S. (2002). "Interior-Point Methods for Massive Support Vector Machines" (PDF). SIAM Journal on Optimization. 13 (3): 783–804. CiteSeerX 10.1.1.216.6893. doi:10.1137/S1052623400374379. Archived (PDF) from the original on 2008-12-04.
^ Platt, John C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines (PDF). NIPS. Archived (PDF) from the original on 2015-07-02.
^ Shalev-Shwartz, Shai; Singer, Yoram; Srebro, Nathan (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM (PDF). ICML. Archived (PDF) from the original on 2013-12-15.
^ Fan, Rong-En; Chang, Kai-Wei; Hsieh, Cho-Jui; Wang, Xiang-Rui; Lin, Chih-Jen (2008). "LIBLINEAR: A library for large linear classification" (PDF). Journal of Machine Learning Research. 9: 1871–1874.
^ Allen Zhu, Zeyuan; Chen, Weizhu; Wang, Gang; Zhu, Chenguang; Chen, Zheng (2009). P-packSVM: Parallel Primal grAdient desCent Kernel SVM (PDF). ICDM. Archived (PDF) from the original on 2014-04-07.
^ Fan, Rong-En; Chang, Kai-Wei; Hsieh, Cho-Jui; Wang, Xiang-Rui; Lin, Chih-Jen (2008). "LIBLINEAR: A library for large linear classification". Journal of Machine Learning Research. 9 (Aug): 1871–1874.
^ Mohamad, Ismail; Usman, Dauda (2013-09-01). "Standardization and Its Effects on K-Means Clustering Algorithm". Research Journal of Applied Sciences, Engineering and Technology. 6 (17): 3299–3303. doi:10.19026/rjaset.6.3638.
^ Fennell, Peter; Zuo, Zhiya; Lerman, Kristina (2019-12-01). "Predicting and explaining behavioral data with structured feature space decomposition". EPJ Data Science. 8. doi:10.1140/epjds/s13688-019-0201-0.

추가 정보

Bennett, Kristin P.; Campbell, Colin (2000). "Support Vector Machines: Hype or Hallelujah?" (PDF). SIGKDD Explorations. 2 (2): 1–13. doi:10.1145/380995.380999. S2CID 207753020.
Cristianini, Nello; Shawe-Taylor, John (2000). An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press. ISBN 0-521-78019-5.
Fradkin, Dmitriy; Muchnik, Ilya (2006). "Support Vector Machines for Classification" (PDF). In Abello, J.; Carmode, G. (eds.). Discrete Methods in Epidemiology. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Vol. 70. pp. 13–20.^{[메모리를 찾을 수 없음]}
Joachims, Thorsten (1998). "Text categorization with Support Vector Machines: Learning with many relevant features". In Nédellec, Claire; Rouveirol, Céline (eds.). "Machine Learning: ECML-98. Berlin, Heidelberg: Springer. p. 137-142. doi:10.1007/BFb0026683.
Ivanciuc, Ovidiu (2007). "Applications of Support Vector Machines in Chemistry" (PDF). Reviews in Computational Chemistry. 23: 291–400. doi:10.1002/9780470116449.ch6. ISBN 9780470116449.
James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013). "Support Vector Machines" (PDF). An Introduction to Statistical Learning : with Applications in R. New York: Springer. pp. 337–372. ISBN 978-1-4614-7137-0.
Schölkopf, Bernhard; Smola, Alexander J. (2002). Learning with Kernels. Cambridge, MA: MIT Press. ISBN 0-262-19475-9.
Steinwart, Ingo; Christmann, Andreas (2008). Support Vector Machines. New York: Springer. ISBN 978-0-387-77241-7.
Theodoridis, Sergios; Koutroumbas, Konstantinos (2009). Pattern Recognition (4th ed.). Academic Press. ISBN 978-1-59749-272-0.

외부 링크

libsvm, LIBSVM은 SVM 학습자에게 인기 있는 라이브러리입니다.
liblinear는 일부 SVM을 포함한 대규모 선형 분류용 라이브러리입니다.
SVM light는 SVM을 사용하여 학습 및 분류하기 위한 소프트웨어 도구 모음입니다.
SVMJS 라이브 데모는 SVM의 JavaScript 구현을 위한 GUI 데모입니다.

[CorinnaCortes-1] Cortes, Corinna; Vapnik, Vladimir (1995). "Support-vector networks" (PDF). Machine Learning. 20 (3): 273–297. CiteSeerX 10.1.1.15.9362. doi:10.1007/BF00994018. S2CID 206787478.

[HavaSiegelmann-2] Ben-Hur, Asa; Horn, David; Siegelmann, Hava; Vapnik, Vladimir N. ""Support vector clustering" (2001);". Journal of Machine Learning Research. 2: 125–137.

[3] "1.4. Support Vector Machines — scikit-learn 0.20.2 documentation". Archived from the original on 2017-11-08. Retrieved 2017-11-08.

[4] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning : Data Mining, Inference, and Prediction (PDF) (Second ed.). New York: Springer. p. 134.

[ReferenceA-5] Boser, Bernhard E.; Guyon, Isabelle M.; Vapnik, Vladimir N. (1992). "A training algorithm for optimal margin classifiers". Proceedings of the fifth annual workshop on Computational learning theory – COLT '92. p. 144. CiteSeerX 10.1.1.21.3818. doi:10.1145/130385.130401. ISBN 978-0897914970. S2CID 207165665.

[6] Press, William H.; Teukolsky, Saul A.; Vetterling, William T.; Flannery, Brian P. (2007). "Section 16.5. Support Vector Machines". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8. Archived from the original on 2011-08-11.

[7] Joachims, Thorsten (1998). "Text categorization with Support Vector Machines: Learning with many relevant features". Machine Learning: ECML-98. Lecture Notes in Computer Science. Springer. 1398: 137–142. doi:10.1007/BFb0026683. ISBN 978-3-540-64417-0.

[8] Pradhan, Sameer S. 등"서포트 벡터 머신을 사용하여 의미 해석을 허용합니다."컴퓨터 언어학 협회 북미 지부 HLT-NAACL 2004의 인간 언어 기술 회의의 진행.2004.

[9] Vapnik, Vladimir N:초대 연사.IPMU 정보처리 및 관리 2014).

[10] 토했어, 로렌"이미지 분할을 위한 반복적인 퍼지-결정-결정에 사용되는 공간-분류 정보 과립"세분화된 컴퓨팅과 의사결정.Springer International Publishing, 2015. 285~318.

[11] A. Maity (2016). "Supervised Classification of RADARSAT-2 Polarimetric Data for Different Land Features". arXiv:1608.00501 [cs.CV].

[12] DeCoste, Dennis (2002). "Training Invariant Support Vector Machines" (PDF). Machine Learning. 46: 161–190. doi:10.1023/A:1012454411458. S2CID 85843.

[13] Maitra, D. S.; Bhattacharya, U.; Parui, S. K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR): 1021–1025. doi:10.1109/ICDAR.2015.7333916. ISBN 978-1-4799-1805-8. S2CID 25739012.

[14] Gaonkar, Bilwaj; Davatzikos, Christos; "지원 벡터 머신 기반의 다변수 이미지 분석 및 분류를 위한 통계적 유의성 맵의 분석적 추정"

[15] 쿠잉넷, 레미, 로소, 샬롯, 추팽, 마리, 스테판, 도르몽, 디디에, 베날리, 하비브, 샘슨, 이브, 올리비에, "의료용 뇌졸중과 연관된 확산 결과 검출을 위한 SVM의 공간적 정규화"

[16] Statnikov, Alexander; Hardin, Douglas; & Aliferis, Constantin; (2006); "SVM 체중 기반 방법을 사용하여 인과적 관련성 및 비원인적 관련 변수 식별", 서명, 1, 4.

[17] "Why is the SVM margin equal to ${\frac {2}{\ \mathbf {w} \ }}$ ". Mathematics Stack Exchange. 30 May 2015.

[18] Aizerman, Mark A.; Braverman, Emmanuel M. & Rozonoer, Lev I. (1964). "Theoretical foundations of the potential function method in pattern recognition learning". Automation and Remote Control. 25: 821–837.

[19] Jin, Chi; Wang, Liwei (2012). Dimensionality dependent PAC-Bayes margin bound. Advances in Neural Information Processing Systems. CiteSeerX 10.1.1.420.3487. Archived from the original on 2015-04-02.

[20] Shalev-Shwartz, Shai; Singer, Yoram; Srebro, Nathan; Cotter, Andrew (2010-10-16). "Pegasos: primal estimated sub-gradient solver for SVM". Mathematical Programming. 127 (1): 3–30. CiteSeerX 10.1.1.161.9629. doi:10.1007/s10107-010-0420-4. ISSN 0025-5610. S2CID 53306004.

[21] Hsieh, Cho-Jui; Chang, Kai-Wei; Lin, Chih-Jen; Keerthi, S. Sathiya; Sundararajan, S. (2008-01-01). A Dual Coordinate Descent Method for Large-scale Linear SVM. Proceedings of the 25th International Conference on Machine Learning. ICML '08. New York, NY, USA: ACM. pp. 408–415. CiteSeerX 10.1.1.149.5594. doi:10.1145/1390156.1390208. ISBN 978-1-60558-205-4. S2CID 7880266.

[22] Rosasco, Lorenzo; De Vito, Ernesto; Caponnetto, Andrea; Piana, Michele; Verri, Alessandro (2004-05-01). "Are Loss Functions All the Same?". Neural Computation. 16 (5): 1063–1076. CiteSeerX 10.1.1.109.6786. doi:10.1162/089976604773135104. ISSN 0899-7667. PMID 15070510. S2CID 11845688.

[23] Meyer, David; Leisch, Friedrich; Hornik, Kurt (September 2003). "The support vector machine under test". Neurocomputing. 55 (1–2): 169–186. doi:10.1016/S0925-2312(03)00431-4.

[24] Hsu, Chih-Wei; Chang, Chih-Chung & Lin, Chih-Jen (2003). A Practical Guide to Support Vector Classification (PDF) (Technical report). Department of Computer Science and Information Engineering, National Taiwan University. Archived (PDF) from the original on 2013-06-25.

[duan2005-25] Duan, Kai-Bo; Keerthi, S. Sathiya (2005). "Which Is the Best Multiclass SVM Method? An Empirical Study" (PDF). Multiple Classifier Systems. LNCS. Vol. 3541. pp. 278–285. CiteSeerX 10.1.1.110.6789. doi:10.1007/11494683_28. ISBN 978-3-540-26306-7.

[hsu2002-26] Hsu, Chih-Wei & Lin, Chih-Jen (2002). "A Comparison of Methods for Multiclass Support Vector Machines" (PDF). IEEE Transactions on Neural Networks. 13 (2): 415–25. doi:10.1109/72.991427. PMID 18244442.

[27] Platt, John; Cristianini, Nello; Shawe-Taylor, John (2000). "Large margin DAGs for multiclass classification" (PDF). In Solla, Sara A.; Leen, Todd K.; Müller, Klaus-Robert (eds.). Advances in Neural Information Processing Systems. MIT Press. pp. 547–553. Archived (PDF) from the original on 2012-06-16.

[28] Dietterich, Thomas G.; Bakiri, Ghulum (1995). "Solving Multiclass Learning Problems via Error-Correcting Output Codes" (PDF). Journal of Artificial Intelligence Research. 2: 263–286. arXiv:cs/9501101. Bibcode:1995cs........1101D. doi:10.1613/jair.105. S2CID 47109072. Archived (PDF) from the original on 2013-05-09.

[29] Crammer, Koby & Singer, Yoram (2001). "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines" (PDF). Journal of Machine Learning Research. 2: 265–292. Archived (PDF) from the original on 2015-08-29.

[30] Lee, Yoonkyung; Lin, Yi & Wahba, Grace (2001). "Multicategory Support Vector Machines" (PDF). Computing Science and Statistics. 33. Archived (PDF) from the original on 2013-06-17.

[31] Lee, Yoonkyung; Lin, Yi; Wahba, Grace (2004). "Multicategory Support Vector Machines". Journal of the American Statistical Association. 99 (465): 67–81. CiteSeerX 10.1.1.22.1879. doi:10.1198/016214504000000098. S2CID 7066611.

[32] Van den Burg, Gerrit J. J. & Groenen, Patrick J. F. (2016). "GenSVM: A Generalized Multiclass Support Vector Machine" (PDF). Journal of Machine Learning Research. 17 (224): 1–42.

[33] Joachims, Thorsten; "지원 벡터 기계를 사용한 텍스트 분류를 위한 트랜잭션 추론", 1999년 기계 학습 국제회의 진행(ICML 1999), 페이지 200–209.

[34] 드러커, 해리스 버지스, 예수님C.; Kaufman, Linda; Smola, Alexander J.; 및 Vapnik, Vapnik, Vapnik, Vladimir N.(1997), "지원 벡터 회귀 기계", NIPS 1996, 155-161, MIT Press.

[35] Suykens, Johan A. K., Vandewalle, Joos P. L.; "벡터 기계 분류기를 지원하는 최소 제곱", Neural Processing Letters, vol. 9, no. 3, 1999, 페이지 293-300.

[36] Smola, Alex J.; Schölkopf, Bernhard (2004). "A tutorial on support vector regression" (PDF). Statistics and Computing. 14 (3): 199–222. CiteSeerX 10.1.1.41.1452. doi:10.1023/B:STCO.0000035301.49549.88. S2CID 15475. Archived (PDF) from the original on 2012-01-31.

[37] Polson, Nicholas G.; Scott, Steven L. (2011). "Data Augmentation for Support Vector Machines". Bayesian Analysis. 6 (1): 1–23. doi:10.1214/11-BA601.

[38] Wenzel, Florian; Galy-Fajou, Theo; Deutsch, Matthäus; Kloft, Marius (2017). "Bayesian Nonlinear Support Vector Machines for Big Data". Machine Learning and Knowledge Discovery in Databases (ECML PKDD). Lecture Notes in Computer Science. 10534: 307–322. arXiv:1707.05532. Bibcode:2017arXiv170705532W. doi:10.1007/978-3-319-71249-9_19. ISBN 978-3-319-71248-2. S2CID 4018290.

[39] 플로리안 웬젤; 마테우스 도이치;Théo Galy-Fajou; Marius Kloft; "베이지안 비선형 지지 벡터 기계에 대한 확장 가능한 근사 추론"

[40] Ferris, Michael C.; Munson, Todd S. (2002). "Interior-Point Methods for Massive Support Vector Machines" (PDF). SIAM Journal on Optimization. 13 (3): 783–804. CiteSeerX 10.1.1.216.6893. doi:10.1137/S1052623400374379. Archived (PDF) from the original on 2008-12-04.

[41] Platt, John C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines (PDF). NIPS. Archived (PDF) from the original on 2015-07-02.

[42] Shalev-Shwartz, Shai; Singer, Yoram; Srebro, Nathan (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM (PDF). ICML. Archived (PDF) from the original on 2013-12-15.

[43] Fan, Rong-En; Chang, Kai-Wei; Hsieh, Cho-Jui; Wang, Xiang-Rui; Lin, Chih-Jen (2008). "LIBLINEAR: A library for large linear classification" (PDF). Journal of Machine Learning Research. 9: 1871–1874.

[44] Allen Zhu, Zeyuan; Chen, Weizhu; Wang, Gang; Zhu, Chenguang; Chen, Zheng (2009). P-packSVM: Parallel Primal grAdient desCent Kernel SVM (PDF). ICDM. Archived (PDF) from the original on 2014-04-07.

[45] Fan, Rong-En; Chang, Kai-Wei; Hsieh, Cho-Jui; Wang, Xiang-Rui; Lin, Chih-Jen (2008). "LIBLINEAR: A library for large linear classification". Journal of Machine Learning Research. 9 (Aug): 1871–1874.

[46] Mohamad, Ismail; Usman, Dauda (2013-09-01). "Standardization and Its Effects on K-Means Clustering Algorithm". Research Journal of Applied Sciences, Engineering and Technology. 6 (17): 3299–3303. doi:10.19026/rjaset.6.3638.

[47] Fennell, Peter; Zuo, Zhiya; Lerman, Kristina (2019-12-01). "Predicting and explaining behavioral data with structured feature space decomposition". EPJ Data Science. 8. doi:10.1140/epjds/s13688-019-0201-0.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]

[38]

[39]

[41]

[42]

[43]

[44]

[45]

[46]

[47]

Search

서포트 벡터 머신

네임스페이스

더

목차

동기

적용들

역사

선형 SVM

하드 마진

소프트 마진

비선형 커널

SVM 분류자 계산

원초

듀얼

커널 트릭

현대적 방법

준경사 강하

좌표 하강

경험적 리스크 최소화

리스크 최소화

정규화와 안정성

SVM 및 힌지 손실

대상 기능

특성.

파라미터 선택

문제들

내선번호

서포트 벡터 클러스터링(SVC)

멀티클래스 SVM

트랜스덕티브 서포트 벡터 머신

구조화된 SVM

회귀

베이지안 SVM

실행

「」를 참조해 주세요.

레퍼런스

추가 정보

외부 링크

Search

서포트 벡터 머신

동기

적용들

역사

선형 SVM

하드 마진

소프트 마진

비선형 커널

SVM 분류자 계산

원초

듀얼

커널 트릭

현대적 방법

준경사 강하

좌표 하강

경험적 리스크 최소화

리스크 최소화

정규화와 안정성

SVM 및 힌지 손실

대상 기능

특성.

파라미터 선택

문제들

내선번호

서포트 벡터 클러스터링(SVC)

멀티클래스 SVM

트랜스덕티브 서포트 벡터 머신

구조화된 SVM

회귀

베이지안 SVM

실행

「 」를 참조해 주세요.

레퍼런스

추가 정보

외부 링크

「」를 참조해 주세요.