Hamilton–Jacobi–Bellman equation

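In optimal control theory, the Hamilton–Jacobi–Bellman (HJB) equation gives a necessary and sufficient condition for optimality of a control with respect to a loss function.^[1] It is, in general, a nonlinear partial differential equation in the value function, which means that its solution is the value function itself. Once this solution is known, it can be used to obtain the optimal control by taking the maximizer (or minimizer) of the Hamiltonian involved in the HJB equation.^[2]^[3]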

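The equation is a result of the theory of dynamic programming, which was pioneered in the 1950s by Richard Bellman and coworkers.^[4]^[5]^[6] The connection to the Hamilton–Jacobi equation from classical physics was first drawn by Rudolf Kálmán.^[7] In discrete-time problems, the corresponding difference equation is usually referred to as the Bellman equation.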

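While classical variational problems, such as the brachistochrone problem, can be solved using the Hamilton–Jacobi–Bellman equation, the method can be applied to a broader spectrum of problems.^[8] It can also be generalized to stochastic systems, in which case the HJB equation is a second-order elliptic partial differential equation.^[9] A major drawback, however, is that the HJB equation admits classical solutions only for a sufficiently smooth value function, which is not guaranteed in most situations. Instead, the notion of a viscosity solution is required, in which conventional derivatives are replaced by (set-valued) subderivatives.^[10]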

Optimal control problem

Consider the following problem in deterministic optimal control over the time period $[0,T]$:

V_{T}(x(0),0)=\min _{u}\left\{\int _{0}^{T}C[x(t),u(t)]\,dt+D[x(T)]\right\}

where $C[\cdot ]$ is the scalar cost rate function and $D[\cdot ]$ is a function that gives the bequest value at the final state, $x(t)$ is the system state vector, $x(0)$ is assumed given, and $u(t)$ for $0\leq t\leq T$ is the control vector that we are trying to find.

The system must also be subject to

{\dot {x}}(t)=F[x(t),u(t)]

where $F[\cdot ]$ gives the vector determining the physical evolution of the state vector over time.
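To make the objective concrete, the following minimal sketch evaluates the cost functional for one fixed control law by stepping the dynamics forward with the Euler method. It is an illustration under stated assumptions, not part of the original text: the helper name `total_cost`, the scalar state, and the example problem ($\dot{x}=u$, $C=(x^{2}+u^{2})/2$, $D=0$) are all hypothetical choices.

```python
import math

def total_cost(F, C, D, u, x0, T, steps=1000):
    """Approximate int_0^T C[x(t),u(t)] dt + D[x(T)] for a scalar state
    using forward Euler steps along the controlled trajectory."""
    dt = T / steps
    x, cost = x0, 0.0
    for k in range(steps):
        uk = u(x, k * dt)          # control chosen at the current state and time
        cost += C(x, uk) * dt      # accumulate the running cost
        x += F(x, uk) * dt         # advance the state: dx/dt = F[x, u]
    return cost + D(x)             # add the bequest value at the final state

# Hypothetical example problem: dx/dt = u, C = (x^2 + u^2)/2, D = 0.
F = lambda x, u: u
C = lambda x, u: 0.5 * (x * x + u * u)
D = lambda x: 0.0
T = 1.0

# Two candidate controls starting from x(0) = 1: no control, and a linear feedback.
do_nothing = total_cost(F, C, D, lambda x, t: 0.0, 1.0, T)
feedback = total_cost(F, C, D, lambda x, t: -math.tanh(T - t) * x, 1.0, T)
print(do_nothing, feedback)
```

The optimal control problem asks for the control with the smallest such cost. For this particular linear-quadratic example the feedback $u=-\tanh(T-t)\,x$ is the minimizer, so its cost should come out lower than that of applying no control.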

The partial differential equation

For this simple system (letting $V=V_{T}$), the Hamilton–Jacobi–Bellman partial differential equation is

{\frac {\partial V(x,t)}{\partial t}}+\min _{u}\left\{{\frac {\partial V(x,t)}{\partial x}}\cdot F(x,u)+C(x,u)\right\}=0

subject to the terminal condition

V(x,T)=D(x),

The unknown scalar $V(x,t)$ in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state $x$ at time $t$ and controlling the system optimally from then until time $T$.
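As a hedged worked example (the specific problem is an assumption chosen for illustration, not taken from the text), consider scalar dynamics $\dot{x}=u$ with running cost $C(x,u)=(x^{2}+u^{2})/2$ and bequest value $D=0$. The inner minimization can be carried out in closed form, and a quadratic trial solution reduces the partial differential equation to an ordinary differential equation:

```latex
\begin{aligned}
&\frac{\partial V}{\partial t}+\min_{u}\left\{\frac{\partial V}{\partial x}\,u+\tfrac{1}{2}x^{2}+\tfrac{1}{2}u^{2}\right\}=0,\qquad V(x,T)=0,\\
&u^{*}=-\frac{\partial V}{\partial x}
\quad\Longrightarrow\quad
\frac{\partial V}{\partial t}-\tfrac{1}{2}\left(\frac{\partial V}{\partial x}\right)^{2}+\tfrac{1}{2}x^{2}=0,\\
&V(x,t)=\tfrac{1}{2}\,p(t)\,x^{2}
\quad\Longrightarrow\quad
\dot{p}=p^{2}-1,\quad p(T)=0
\quad\Longrightarrow\quad
p(t)=\tanh(T-t).
\end{aligned}
```

In this example the value function is $V(x,t)=\tfrac{1}{2}\tanh(T-t)\,x^{2}$ and the minimizing control is the linear feedback $u^{*}=-\tanh(T-t)\,x$.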

Deriving the equation

Intuitively, the HJB equation can be derived as follows. If $V(x(t),t)$ is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time $t$ to $t+dt$, we have

V(x(t),t)=\min _{u}\left\{V(x(t+dt),t+dt)+\int _{t}^{t+dt}C(x(s),u(s))\,ds\right\}.

Note that the Taylor expansion of the first term on the right-hand side is

V(x(t+dt),t+dt)=V(x(t),t)+{\frac {\partial V(x,t)}{\partial t}}\,dt+{\frac {\partial V(x,t)}{\partial x}}\cdot {\dot {x}}(t)\,dt+{\mathcal {o}}(dt),

where ${\mathcal {o}}(dt)$ denotes, in little-o notation, the terms of the Taylor expansion of higher than first order. Then, subtracting $V(x(t),t)$ from both sides, dividing by $dt$, and taking the limit as $dt$ approaches zero yields the HJB equation defined above.
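The intermediate steps can be sketched as follows, using the dynamics $\dot{x}=F(x,u)$ and the approximation $\int_{t}^{t+dt}C\,ds=C(x,u)\,dt+{\mathcal {o}}(dt)$ for the running cost over a short interval. The terms that do not depend on $u$ are moved outside the minimization:

```latex
\begin{aligned}
V(x,t) &= \min_{u}\left\{V(x,t)+\frac{\partial V}{\partial t}\,dt+\frac{\partial V}{\partial x}\cdot F(x,u)\,dt+C(x,u)\,dt\right\}+\mathcal{o}(dt),\\
0 &= \frac{\partial V}{\partial t}\,dt+\min_{u}\left\{\frac{\partial V}{\partial x}\cdot F(x,u)+C(x,u)\right\}dt+\mathcal{o}(dt),\\
0 &= \frac{\partial V}{\partial t}+\min_{u}\left\{\frac{\partial V}{\partial x}\cdot F(x,u)+C(x,u)\right\}
\qquad (dt\to 0).
\end{aligned}
```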

Solving the equation

The HJB equation is usually solved backwards in time, starting from $t=T$ and ending at $t=0$.^{[citation needed]}

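When solved over the whole of state space and $V(x)$ is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained.^[11] If we can solve for $V$, then we can find from it a control $u$ that achieves the minimum cost.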
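The backward-in-time procedure can be sketched numerically. The example below is a minimal illustration, not a method prescribed by the text: it swaps the partial differential equation for a discretized dynamic programming recursion on a grid (a semi-Lagrangian scheme), starting from the terminal condition and stepping down to $t=0$. The function name `solve_hjb_backward`, the grids, and the scalar test problem ($\dot{x}=u$, $C=(x^{2}+u^{2})/2$, $D=0$) are assumptions.

```python
import math

def solve_hjb_backward(F, C, D, xs, us, T, nt):
    """Return samples V[n][i] approximating V(xs[i], n*dt) for a scalar state,
    computed backwards in time from the terminal condition."""
    dt = T / nt
    dx = xs[1] - xs[0]

    def interp(v, x):
        # Linear interpolation on the uniform grid, clamped at the edges.
        s = min(max((x - xs[0]) / dx, 0.0), len(xs) - 1.0)
        i = min(int(s), len(xs) - 2)
        w = s - i
        return (1.0 - w) * v[i] + w * v[i + 1]

    V = [[D(x) for x in xs]]              # terminal condition V(x, T) = D(x)
    for _ in range(nt):                   # march from t = T down to t = 0
        nxt = V[0]
        # Bellman step: running cost over dt plus cost-to-go at the next state,
        # minimized over the discretized set of controls.
        V.insert(0, [min(C(x, u) * dt + interp(nxt, x + F(x, u) * dt) for u in us)
                     for x in xs])
    return V

# Hypothetical test problem with known value function V(x,0) = tanh(T) x^2 / 2.
xs = [-2.0 + 0.05 * i for i in range(81)]   # state grid on [-2, 2]
us = [-3.0 + 0.1 * j for j in range(61)]    # control grid on [-3, 3]
V = solve_hjb_backward(lambda x, u: u,
                       lambda x, u: 0.5 * (x * x + u * u),
                       lambda x: 0.0, xs, us, T=1.0, nt=50)
print(V[0][60], 0.5 * math.tanh(1.0))       # grid value near x = 1 vs. closed form
```

The grid-based recursion needs storage proportional to the number of grid points, which grows exponentially with the state dimension. A scheme of this kind is therefore practical only for low-dimensional problems.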

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution (Pierre-Louis Lions and Michael Crandall),^[12] the minimax solution (Andrei Izmailovich Subbotin), and others.

Approximate dynamic programming was introduced by D. P. Bertsekas and J. N. Tsitsiklis, using artificial neural networks (multilayer perceptrons) to approximate the Bellman function in general.^[13] This is an effective strategy for mitigating the impact of dimensionality: instead of memorizing the complete function mapping over the whole space domain, only the neural network parameters are memorized. In particular, for continuous-time systems, an approximate dynamic programming approach that combines policy iteration with neural networks was introduced.^[14] In discrete time, an approach that solves the HJB equation by combining value iteration and neural networks was introduced.^[15]

Alternatively, it has been shown that sum-of-squares optimization can yield a polynomial that approximates the solution of the Hamilton–Jacobi–Bellman equation arbitrarily well with respect to the $L^{1}$ norm.^[16]

Extension to stochastic problems

The idea of solving a control problem by applying Bellman's principle of optimality and then working out an optimizing strategy backwards in time can be generalized to stochastic control problems. Consider, similarly to the above, the problem

\min _{u}\mathbb {E} \left\{\int _{0}^{T}C(t,X_{t},u_{t})\,dt+D(X_{T})\right\}

now with $(X_{t})_{t\in [0,T]}$ the stochastic process to optimize and $(u_{t})_{t\in [0,T]}$ the steering. By first using Bellman's principle and then expanding $V(X_{t},t)$ with Itô's rule, one finds the stochastic HJB equation

\min _{u}\left\{{\mathcal {A}}V(x,t)+C(t,x,u)\right\}=0,

where ${\mathcal {A}}$ represents the stochastic differentiation operator, and subject to the terminal condition

V(x,T)=D(x).

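Note that the randomness has disappeared. In this case a solution $V$ of the latter does not necessarily solve the primal problem; it is only a candidate, and a further verifying argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see, for example, Merton's portfolio problem).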
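To see what the operator ${\mathcal {A}}$ looks like in a concrete case, assume (this model is an illustrative assumption, not stated in the text) that the state is a scalar Itô diffusion with drift $\mu$ and volatility $\sigma$ driven by a Wiener process $W_{t}$. Itô's rule then gives:

```latex
\begin{aligned}
dX_{t} &= \mu(X_{t},u_{t})\,dt+\sigma(X_{t},u_{t})\,dW_{t},\\
dV(X_{t},t) &= \left(\frac{\partial V}{\partial t}+\mu\,\frac{\partial V}{\partial x}+\frac{\sigma^{2}}{2}\,\frac{\partial^{2}V}{\partial x^{2}}\right)dt+\sigma\,\frac{\partial V}{\partial x}\,dW_{t},\\
\mathcal{A}V(x,t) &= \frac{\partial V}{\partial t}+\mu(x,u)\,\frac{\partial V}{\partial x}+\frac{\sigma(x,u)^{2}}{2}\,\frac{\partial^{2}V}{\partial x^{2}}.
\end{aligned}
```

Under suitable integrability conditions the $dW_{t}$ term has zero expectation, which is why the resulting equation for $V$ is deterministic. The second-derivative term is the only trace of the noise, and it makes the equation second order.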

Application to LQG control

As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by

dx_{t}=(ax_{t}+bu_{t})\,dt+\sigma \,dw_{t},

and the cost accumulates at rate $C(x_{t},u_{t})=r(t)u_{t}^{2}/2+q(t)x_{t}^{2}/2$, the HJB equation is given by

-{\frac {\partial V(x,t)}{\partial t}}={\frac {1}{2}}q(t)x^{2}+{\frac {\partial V(x,t)}{\partial x}}ax-{\frac {b^{2}}{2r(t)}}\left({\frac {\partial V(x,t)}{\partial x}}\right)^{2}+{\frac {\sigma ^{2}}{2}}{\frac {\partial ^{2}V(x,t)}{\partial x^{2}}}.

with optimal action given by

u_{t}=-{\frac {b}{r(t)}}{\frac {\partial V(x,t)}{\partial x}}

Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as is usual for linear-quadratic-Gaussian control.
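A minimal numerical sketch of this step follows, under assumptions that go beyond the text: the value function is taken as $V(x,t)=P(t)x^{2}/2+c(t)$ with terminal value $V(x,T)=P_{T}x^{2}/2$. Substituting this form into the HJB equation above and matching powers of $x$ gives $-{\dot {P}}=q(t)+2aP-b^{2}P^{2}/r(t)$ and $-{\dot {c}}=\sigma ^{2}P/2$. The function name `riccati_backward` and the choice of a fixed-step Runge-Kutta integrator are hypothetical.

```python
import math

def riccati_backward(a, b, q, r, sigma, P_T, T, steps=2000):
    """Integrate -dP/dt = q(t) + 2 a P - b^2 P^2 / r(t) and
    -dc/dt = sigma^2 P / 2 from t = T back to t = 0 with classical RK4.
    Returns (P(0), c(0)); q and r are functions of time."""
    def rhs(t, P):
        # Time derivatives (dP/dt, dc/dt) of the assumed quadratic value function.
        return -(q(t) + 2.0 * a * P - b * b * P * P / r(t)), -0.5 * sigma * sigma * P

    h = -T / steps                        # negative step: integrate backwards
    t, P, c = T, P_T, 0.0                 # terminal values P(T) = P_T, c(T) = 0
    for _ in range(steps):
        k1 = rhs(t, P)
        k2 = rhs(t + h / 2, P + h / 2 * k1[0])
        k3 = rhs(t + h / 2, P + h / 2 * k2[0])
        k4 = rhs(t + h, P + h * k3[0])
        P += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        c += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return P, c

# Hypothetical parameters: a = 0, b = 1, q = r = 1, sigma = 1, P(T) = 0, T = 1.
P0, c0 = riccati_backward(0.0, 1.0, lambda t: 1.0, lambda t: 1.0, 1.0, 0.0, 1.0)
print(P0, c0)
```

With $P(t)$ in hand, the optimal action from the formula above becomes the linear feedback $u_{t}=-(b/r(t))\,P(t)\,x_{t}$. In this sketch the noise level $\sigma$ affects only the offset $c(t)$ and not the feedback gain.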

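See also

Bellman equation, the discrete-time counterpart of the Hamilton–Jacobi–Bellman equation.
Pontryagin's maximum principle, a necessary but not sufficient condition for an optimum, obtained by maximizing a Hamiltonian; it has the advantage over HJB of only needing to be satisfied along the single trajectory being considered.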

References

[1] Kirk, Donald E. (1970). Optimal Control Theory: An Introduction. Englewood Cliffs, NJ: Prentice-Hall. pp. 86–90. ISBN 0-13-638098-0.
[2] Yong, Jiongmin; Zhou, Xun Yu (1999). "Dynamic Programming and HJB Equations". Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer. pp. 157–215 [p. 163]. ISBN 0-387-98723-1.
[3] Naidu, Desineni S. (2003). "The Hamilton–Jacobi–Bellman Equation". Optimal Control Systems. Boca Raton: CRC Press. pp. 277–283 [p. 280]. ISBN 0-8493-0892-5.
[4] Bellman, R. E. (1954). "Dynamic Programming and a new formalism in the calculus of variations". Proc. Natl. Acad. Sci. 40 (4): 231–235. Bibcode:1954PNAS...40..231B. doi:10.1073/pnas.40.4.231. PMC 527981. PMID 16589462.
[5] Bellman, R. E. (1957). Dynamic Programming. Princeton, NJ.
[6] Bellman, R.; Dreyfus, S. (1959). "An Application of Dynamic Programming to the Determination of Optimal Satellite Trajectories". J. Br. Interplanet. Soc. 17: 78–83.
[7] Kálmán, Rudolf E. (1963). "The Theory of Optimal Control and the Calculus of Variations". In Bellman, Richard (ed.). Mathematical Optimization Techniques. Berkeley: University of California Press. pp. 309–331. OCLC 1033974.
[8] Kemajou-Brown, Isabelle (2016). "Brief History of Optimal Control Theory and Some Recent Developments". In Budzban, Gregory; Hughes, Harry Randolph; Schurz, Henri (eds.). Probability on Algebraic and Geometric Structures. Contemporary Mathematics. 668. pp. 119–130. doi:10.1090/conm/668/13400. ISBN 9781470419455.
[9] Chang, Fwu-Ranq (2004). Stochastic Optimization in Continuous Time. Cambridge, UK: Cambridge University Press. pp. 113–168. ISBN 0-521-83406-6.
[10] Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
[11] Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
[12] Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
[13] Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). Neuro-dynamic Programming. Athena Scientific. ISBN 978-1-886529-10-6.
[14] Abu-Khalaf, Murad; Lewis, Frank L. (2005). "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach". Automatica. 41 (5): 779–791. doi:10.1016/j.automatica.2004.11.034.
[15] Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad (2008). "Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof". IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 38 (4): 943–949. doi:10.1109/TSMCB.2008.926614. PMID 18632382. S2CID 14202785.
[16] Jones, Morgan; Peet, Matthew (2020). "Polynomial Approximation of Value Functions and Nonlinear Controller Design with Performance Bounds". arXiv:2010.06828.

Further reading

Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
Pham, Huyên (2009). "The Classical PDE Approach to Dynamic Programming". Continuous-time Stochastic Control and Optimization with Financial Applications. Springer. pp. 37–60. ISBN 978-3-540-89499-5.
Stengel, Robert F. (1994). "Conditions for Optimality". Optimal Control and Estimation. New York: Dover. pp. 201–222. ISBN 0-486-68200-5.

