Cybernetics And Systems Analysis
International Theoretical Science Journal
UDC 517.2+519.977.58+519.8
V.I. Norkin1


1 V.M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine, and National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute,” Kyiv, Ukraine

vladimir.norkin@gmail.com

GENERALIZED GRADIENTS IN DYNAMIC OPTIMIZATION, OPTIMAL CONTROL,
AND MACHINE LEARNING PROBLEMS

Abstract. Problems of nonsmooth nonconvex dynamic optimization, optimal control in discrete time (including feedback control), and machine learning are considered from a common point of view. An analogy is traced between the control of discrete dynamical systems and the training of multilayer neural networks with nonsmooth functionals and connections. Methods for computing subgradients of such systems, based on Hamilton–Pontryagin functions, are developed. Gradient (stochastic) algorithms for optimal control and learning are extended to nonconvex nonsmooth systems.
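
For orientation, the discrete-time scheme referred to in the abstract can be sketched as follows (our notation, written for the smooth case; under the generalized differentiability conditions studied in the paper, the gradients below become generalized gradients):

\[
\begin{aligned}
& x_{t+1} = f_t(x_t, u_t), \quad t = 0, \dots, N-1, \qquad
  J(u) = \varphi(x_N) + \sum_{t=0}^{N-1} c_t(x_t, u_t) \to \min_u, \\
& H_t(x_t, u_t, \psi_{t+1}) = \psi_{t+1}^{\top} f_t(x_t, u_t) + c_t(x_t, u_t)
  \quad \text{(the Hamilton–Pontryagin function)}, \\
& \psi_N = \nabla \varphi(x_N), \qquad
  \psi_t = \nabla_x H_t(x_t, u_t, \psi_{t+1}), \qquad
  \nabla_{u_t} J(u) = \nabla_u H_t(x_t, u_t, \psi_{t+1}).
\end{aligned}
\]

The backward recursion for the adjoint variables \(\psi_t\) is the control-theoretic counterpart of error backpropagation through the layers of a neural network.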

Keywords: dynamic optimization, optimal control, machine learning, multilayer neural networks, deep learning, nonsmooth nonconvex optimization, stochastic optimization, stochastic generalized gradient, stochastic smoothing.
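
To make the analogy concrete, below is a minimal Python sketch of the forward pass, the adjoint (backpropagation) recursion, and a stochastic generalized-gradient update. The ReLU dynamics, the quadratic terminal cost, and every name in it are hypothetical illustration choices, not code or notation from the paper; since ReLU is nonsmooth, the backward pass propagates one admissible selection from its subdifferential.

import numpy as np

# Hypothetical problem: minimize J(u) = 0.5 * ||x_N - target||^2 subject to
# the nonsmooth discrete dynamics x_{t+1} = relu(A x_t + u_t), t = 0..N-1.
# The backward pass below is the adjoint recursion psi_t = grad_x H_t built on
# the Hamilton–Pontryagin function H_t(x, u, psi) = psi^T relu(A x + u).

def relu(z):
    return np.maximum(z, 0.0)

def relu_subgrad(z):
    # One admissible selection from the ReLU subdifferential (takes 0 at the kink).
    return (z > 0.0).astype(float)

def forward(A, x0, u):
    # Roll the dynamics forward, keeping pre-activations for the backward pass.
    xs, zs = [x0], []
    for t in range(len(u)):
        z = A @ xs[-1] + u[t]
        zs.append(z)
        xs.append(relu(z))
    return xs, zs

def subgradients(A, xs, zs, target):
    # Adjoint recursion: psi_N = grad phi(x_N); then, for t = N-1, ..., 0,
    # grad_{u_t} J = dH_t/du_t and psi_t = dH_t/dx_t (backpropagation).
    N = len(zs)
    psi = xs[-1] - target
    grad_u = [None] * N
    for t in reversed(range(N)):
        delta = relu_subgrad(zs[t]) * psi   # chain through the nonsmooth layer
        grad_u[t] = delta                   # dH_t/du_t: dynamics are additive in u_t
        psi = A.T @ delta                   # dH_t/dx_t, passed one step back
    return grad_u

# Stochastic generalized-gradient descent on the controls (illustrative noise).
rng = np.random.default_rng(0)
N, n = 5, 3
A = 0.5 * rng.standard_normal((n, n))
x0, target = rng.standard_normal(n), np.zeros(n)
u = [np.zeros(n) for _ in range(N)]
for k in range(200):
    xs, zs = forward(A, x0, u)
    g = subgradients(A, xs, zs, target)
    step = 0.5 / (k + 1)                    # classical diminishing step size
    u = [u[t] - step * (g[t] + 0.01 * rng.standard_normal(n)) for t in range(N)]
print("final cost:", 0.5 * np.linalg.norm(forward(A, x0, u)[0][-1] - target) ** 2)

With the diminishing step size, this is the kind of stochastic generalized-gradient scheme whose convergence on nonsmooth nonconvex problems is studied in refs. 33–35 below.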




REFERENCES

  1. Bryson A.E., Ho Yu-Chi. Applied optimal control. Optimization, estimation, and control [Russian translation]. Moscow: Mir, 1972. 544 p.

  2. Ermoliev Yu.M. Methods of stochastic programming [in Russian]. Moscow: Nauka, 1976. 240 p.

  3. Ermoliev Yu.M., Gulenko V.P., Tsarenko T.I. Finite-difference method in optimal control problems [in Russian]. Kiev: Nauk. dumka, 1978. 164 p.

  4. Vasiliev F.P. Methods for solving extremal problems [in Russian]. Moscow: Nauka, 1981. 400 p.

  5. Evtushenko Yu.G. Optimization and fast automatic differentiation [in Russian]. Moscow: A.A. Dorodnitsyn Computing Center of RAS, 2013. 144 p.

  6. Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015. Vol. 61. P. 85–117. http://dx.doi.org/10.1016/j.neunet.2014.09.003.

  7. Nurminsky E.A. Numerical methods for solving stochastic minimax problems [in Russian]. Kiev: Nauk. dumka, 1979. 176 p.

  8. Pontryagin L.S., Boltyansky V.G., Gamkrelidze R.V., Mishchenko E.F. The mathematical theory of optimal processes [in Russian]. Moscow: Fizmatgiz, 1961. 391 p.

  9. Boltyansky V.G. Optimal control of discrete systems [in Russian]. Moscow: Nauka, 1973. 448 p.

  10. Rumelhart D.E., Hinton G.E., and Williams R.J. Learning representations by back-propagating errors. Nature. 1986. Vol. 323. P. 533–536. https://doi.org/10.1038/323533a0.

  11. LeCun Y., Bengio Y., and Hinton G. Deep learning. Nature. 2015. Vol. 521. P. 436–444. https://doi.org/10.1038/nature14539.

  12. Goodfellow I., Bengio Y., Courville A. Deep Learning [Russian translation]. Moscow: DMK Press, 2018. 652 p.

  13. Norkin V.I. Generalized differentiable functions. Kibernetika. 1980. N 1. P. 9–11.

  14. Demyanov V.F. Extremum conditions and calculus of variations [in Russian]. Moscow: Vyssh. shk., 2005. 335 p.

  15. Daduna H., Knopov P.S., Tur L.P. Optimal strategies for an inventory system with general cost functions. Kibernetika i sistemnyj analiz. 1999. N 4. P. 106–123.

  16. Ermoliev Yu.M. and Norkin V.I. Stochastic generalized gradient method with application to insurance risk management. Interim Report IR-97-021. Int. Inst. for Appl. Syst. Anal. Laxenburg, Austria, 1997. 19 p. URI: http://pure.iiasa.ac.at/5270.

  17. Pshenichny B.N. Necessary conditions for an extremum [in Russian]. 2nd ed., revised and enlarged. Moscow: Nauka. Gl. red. fiz.-mat. lit., 1982. 144 p. (1st ed. 1969.)

  18. Nonsmooth problems of the theory of optimization and control [in Russian]. Demyanov V.F. (Ed.). Leningrad: Leningrad University Press, 1982. 322 p.

  19. Clarke F. Optimization and nonsmooth analysis [Russian translation]. Moscow: Nauka, 1988. 280 p.

  20. Mordukhovich B. Sh. Approximation methods in optimization and control problems [in Russian]. Moscow: Nauka. Gl. red. fiz.-mat. lit., 1988. 360 p.

  21. Rockafellar R.T., Wets R.J-B. Variational analysis. Berlin; Heidelberg: Springer, 1998. 734 p.

  22. Ahn H.-S., Moore K.L., Chen Y.Q. Iterative learning control. Robustness and monotonic convergence for interval systems. London: Springer-Verlag, 2007. 230 p.

  23. Aubin J-P. Neural Networks and Qualitative Physics. Cambridge: Cambridge University Press, 1996. 283 p.

  24. Nikolenko S., Kadurin A., Arkhangelskaya E. Deep learning [in Russian]. St. Petersburg: Piter, 2018. 480 p.

  25. Griewank A., Walther A. Evaluating derivatives. Principles and techniques of algorithmic differentiation. 2nd ed. Philadelphia: Soc. for Industr. and Appl. Mathematics, 2008. 438 p.

  26. Hardt M., Recht B., Singer Y. Train faster, generalize better: Stability of stochastic gradient descent. Proceedings of Machine Learning Research. 2016. Vol. 48. P. 1225–1234.

  27. Zhang C., Liao Q., Rakhlin A., Miranda B., Golowich N., Poggio T. Theory of deep learning IIb: Optimization Properties of SGD. CBMM Memo No. 072. McGovern Institute for Brain Research. Cambridge: Massachusetts Institute of Technology, 2018. 9 p. arXiv:1801.02254v1 [cs.LG] 7 Jan. 2018.

  28. Soudry D., Hoffer E., Nacson M.S., Gunasekar S., Srebro N. The implicit bias of gradient descent on separable data. arXiv:1710.10345v3 [stat.ML] 21 Mar. 2018.

  29. Bottou L., Curtis F.E., Nocedal J. Optimization methods for large-scale machine learning. SIAM Review. 2018. Vol. 60, N 2. P. 223–311. https://doi.org/10.1137/16M1080173.

  30. Robbins H. and Monro S. A stochastic approximation method. The Annals of Mathematical Statistics. 1951. Vol. 22, N 3. P. 400–407.

  31. Nemirovski A., Juditsky A., Lan G., and Shapiro A. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization. 2009. Vol. 19, N 4. P. 1574–1609. https://doi.org/10.1137/070704277.

  32. Shapiro A., Dentcheva D., and Ruszczyński A. Lectures on stochastic programming: modeling and theory. Philadelphia: SIAM, 2009. 442 p.

  33. Ermoliev Yu.M., Norkin V.I. A stochastic generalized gradient method for solving non-convex nonsmooth problems of stochastic optimization. Kibernetika i sistemnyj analiz. 1998. N 2. P. 50–71.

  34. Davis D., Drusvyatskiy D., Kakade S., and Lee J.D. Stochastic subgradient method converges on tame functions. Found. Comput. Math. 2019. P. 1–36. https://doi.org/10.1007/s10208-018-09409-5.

  35. Ruszczyński A. Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. arXiv Preprint, 2019. arXiv:1912.07580v1 [math.OC] 16 Dec 2019. 10 p. https://arxiv.org/abs/1912.07580.

  36. Mifflin R. An algorithm for constrained optimization with semi-smooth functions. Math. Oper. Res. 1977. Vol. 2, N 2. P. 191–207. URL: www.jstor.org/stable/3689654.

  37. Mifflin R. Semismooth and semiconvex functions in constrained optimization. SIAM J. Contr. Opt. 1977. Vol. 15, N 6. P. 959–972. https://doi.org/10.1137/0315061.

  38. Gupal A.M. Stochastic methods for solving nonsmooth extremal problems [in Russian]. Kiev: Nauk. dumka, 1979. 150 p.

  39. Mifflin R. A modification and an extension of Lemarechal’s algorithm for nonsmooth minimization. In: Nondifferential and Variational Techniques in Optimization (Sorensen D.C., Wets R.J.-B., eds.). Math. Prog. Study. 1982. Vol. 17. P. 77–90.

  40. Shor N.Z. Minimization methods for nondifferentiable functions. Berlin; Heidelberg: Springer, 1985. 164 p.

  41. Dorofeev P.A. On some properties of the generalized gradient method. Zhurn. vychisl. matematiki i mat. fiziki. 1985. Vol. 25, N 2. P. 181–189.

  42. Mikhalevich V.S., Gupal A.M., Norkin V.I. Nonconvex optimization methods [in Russian]. Moscow: Nauka, 1987. 280 p.

  43. Zavriev S.K., Perevozchikov A.G. A stochastic method of generalized gradient descent for solving minimax problems with coupled variables. Zhurn. vychisl. matematiki i mat. fiziki. 1990. Vol. 30, N 4. P. 491–500.

  44. Uryasyev S.P. Adaptive algorithms for stochastic optimization and game theory [in Russian]. Moscow: Nauka, 1990. 184 p.

  45. Hiriart-Urruty J.-B., Lemarechal C. Convex analysis and minimization algorithms. Berlin; Heidelberg: Springer-Verlag, 1993. Vol. II. 346 p.

  46. Fukushima M., Qi L. (Eds.) Reformulation: Nonsmooth, piecewise smooth, semismooth and smoothing methods. Dordrecht; Boston; London: Kluwer Acad. Publ., 1999. 442 p.

  47. Stetsyuk P.I. Theory and software implementations of Shor’s r-algorithms. Kibernetika i sistemnyj analiz. 2017. Vol. 53, N 5. P. 43–57.

  48. Norkin V.I. Stochastic methods for solving problems of convex stochastic programming and their application: Author's abstract of the Dr. Sci. (Phys.-Math.) dissertation. V.M. Glushkov Institute of Cybernetics, NAS of Ukraine. Kiev, 1998. 32 p. URL: http://library.nuft.edu.ua/ebook/file/01.05.01%20Norkin%20VI.pdf.

  49. Ermoliev Yu.M., Norkin V.I. Methods for solving non-convex nonsmooth problems of stochastic optimization. Kibernetika i sistemnyj analiz. 2003. N 5. P. 89–106.

  50. Norkin V.I. Nonlocal algorithms for minimizing non-differentiable functions. Kibernetika. 1978. N 5. P. 44–48.

  51. Bolte J., Daniilidis A., Lewis A. Tame functions are semismooth. Math. Program., Ser. B. 2009. Vol. 117. P. 5–19. https://doi.org/10.1007/s10107-007-0166-9.

  52. Norkin V.I. Random generalized differentiable functions in a non-convex nonsmooth stochastic optimization problem. Kibernetika. 1986. N 6. P. 98–102.

  53. Mirica S. A note on the generalized differentiability of mappings. Nonl. Anal. Theory, Methods, Applications. 1980. Vol. 4, N 3. P. 567–575. https://doi.org/10.1016/0362-546x(80)90092-9.

  54. Lyashko I.I., Emelianov V.F., Boyarchuk O.K. Mathematical analysis. Kyiv: Vyshcha shkola, 1992. Part I. 495 p.

  55. Qi L., Sun J. A nonsmooth version of Newton’s method. Mathematical Programming. 1993. Vol. 58. P. 353–368. https://doi.org/10.1007/bf01581275.

  56. Qi L. Convergence analysis of some algorithms for solving nonsmooth equations. Mathematics of Operations Research. 1993. Vol. 18. P. 227–244. https://doi.org/10.1287/moor.18.1.227.

  57. Polyak B.T. Introduction to optimization [in Russian]. Moscow: Nauka. Gl. red. fiz.-mat. lit., 1983. 384 p.

  58. Sternberg S. Lectures on differential geometry [Russian translation]. Moscow: Mir, 1970. 412 p.

  59. Burke J., Lewis A., Overton M. A robust gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM J. on Opt. 2005. Vol. 15, N 3. P. 751–779. https://doi.org/10.1137/030601296.