UDC 004.81
1 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
mzz@kpi.ua
2 Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
kasyanov@i.ua
3 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
lusi.levenchuk@gmail.com
FORMALIZATION OF METHODS FOR THE DEVELOPMENT OF AUTONOMOUS
ARTIFICIAL INTELLIGENCE SYSTEMS
Abstract. This paper explores the problem of formalizing the development of autonomous artificial intelligence systems (AAIS) whose mathematical models may be complicated or non-identifiable. Using the value iteration method for reward Q-functions, a methodology for constructing ε-optimal strategies with a given accuracy is developed. The results make it possible to outline classes of AAIS (including dual-use systems) for which the construction of optimal and ε-optimal strategies can be rigorously justified, even in cases where the models are identifiable but the computational complexity of standard dynamic programming algorithms may not be strictly polynomial.
Keywords: autonomous artificial intelligence systems, Markov decision processes, ε-optimal strategies.
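The value iteration scheme for Q-functions mentioned in the abstract can be illustrated on a finite MDP: the Bellman operator Q ← R + γ P maxₐ′ Q is applied until successive iterates differ by at most ε(1 − γ)/(2γ) in the sup-norm, after which the greedy policy is guaranteed to be ε-optimal. The sketch below is illustrative only; the toy transition matrices `P`, rewards `R`, and discount `gamma` are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical toy MDP (2 states, 2 actions): P[a][s, s'] are transition
# probabilities under action a, R[s, a] are one-step rewards.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.1, 0.9]],   # action 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])      # R[s, a]
gamma = 0.9

def q_value_iteration(P, R, gamma, eps):
    """Iterate Q <- R + gamma * P max_a' Q until the standard sup-norm
    stopping rule guarantees the greedy policy is eps-optimal."""
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        V = Q.max(axis=1)                        # V(s) = max_a Q(s, a)
        Q_new = R + gamma * np.stack(
            [P[a] @ V for a in range(n_actions)], axis=1)
        if np.abs(Q_new - Q).max() <= threshold:
            return Q_new, Q_new.argmax(axis=1)   # eps-optimal greedy policy
        Q = Q_new

Q, policy = q_value_iteration(P, R, gamma, eps=1e-3)
```

The stopping threshold ε(1 − γ)/(2γ) is the classical bound under which the greedy policy with respect to the final Q-iterate is ε-optimal; for general (Borel) state and action spaces, as considered in the paper, the iteration is the same but convergence requires the continuity and compactness conditions discussed there.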
REFERENCES
- Feinberg E.A., Bender M.A., Curry M.T., Huang D., Koutsoudis T., Bernstein J.L. Sensor resource management for an airborne early warning radar. Proceedings of SPIE, Signal and Data Processing of Small Targets. August 7, Orlando, Florida. 2002. Vol. 4728. P. 145–156.
- Feinberg E.A., Kasyanov P.O., Zgurovsky M.Z. Continuity of equilibria for two-person zero-sum games with noncompact action sets and unbounded payoffs. Annals of Operations Research. 2022. Vol. 317. P. 537–568.
- Wallis W.A. The statistical research group, 1942–1945. Journal of the American Statistical Association. 1980. Vol. 75 (370). P. 320–330.
- Yordanova V., Griffiths H., Hailes S. Rendezvous planning for multiple autonomous underwater vehicles using a Markov decision process. IET Radar, Sonar & Navigation. 2017. Vol. 11, N 12. P. 1762–1769.
- Silver D., Singh S., Precup D., Sutton R.S. Reward is enough. Artificial Intelligence. 2021. Vol. 299. 103535.
- Kara A.D., Saldi N., Yüksel S. Q-learning for MDPs with general spaces: Convergence and near optimality via quantization under weak continuity. 2021. 25 p. arXiv preprint arXiv:2111.06781.
- Kara A.D., Yüksel S. Convergence of finite memory Q-learning for POMDPs and near optimality of learned policies under filter stability. Mathematics of Operations Research. 2022. https://doi.org/10.1287/moor.2022.1331.
- Parthasarathy K.R. Probability measures on metric spaces. New York: Academic Press, 1967. 288 p.
- Bertsekas D.P., Shreve S.E. Stochastic optimal control: The discrete-time case. Belmont, MA: Athena Scientific, 1996. 330 p.
- Hernández-Lerma O., Lasserre J.B. Discrete-time Markov control processes: Basic optimality criteria. New York: Springer, 1996. 216 p.
- Feinberg E.A., Kasyanov P.O., Zadoianchuk N.V. Berge’s theorem for noncompact image sets. Journal of Mathematical Analysis and Applications. 2013. Vol. 397, Iss. 1. P. 255–259.
- Feinberg E.A., Kasyanov P.O., Zadoianchuk N.V. Average-cost Markov decision processes with weakly continuous transition probabilities. Math. Oper. Res. 2012. Vol. 37, N 4. P. 591–607.
- Rhenius D. Incomplete information in Markovian decision models. Ann. Statist. 1974. Vol. 2, N 6. P. 1327–1334.
- Yushkevich A.A. Reduction of a controlled Markov model with incomplete data to a problem with complete information in the case of Borel state and control spaces. Theory Probab. Appl. 1976. Vol. 21, N 1. P. 153–158.
- Dynkin E.B., Yushkevich A.A. Controlled Markov processes. New York: Springer-Verlag, 1979. 292 p.
- Hernández-Lerma O. Adaptive Markov control processes. New York: Springer-Verlag, 1989. 148 p.
- Feinberg E.A., Kasyanov P.O., Zgurovsky M.Z. Markov decision processes with incomplete information and semiuniform Feller transition probabilities. SIAM Journal on Control and Optimization. 2022. Vol. 60, N 4. P. 2488–2513.
- Sondik E.J. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Oper. Res. 1978. Vol. 26, N 2. P. 282–304.
- Feinberg E.A., Kasyanov P.O., Zgurovsky M.Z. Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). December 2014. P. 1–8.
- Szepesvári C. Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2010. Vol. 4 (1). 104 p.
- Rempel M., Cai J. A review of approximate dynamic programming applications within military operations research. Operations Research Perspectives. 2021. Vol. 8. 100204.
- Department of the Navy. Science & Technology Strategy for Intelligent Autonomous Systems. July 2. 2021. 24 p.