DOI: 10.34229/KCA2522-9664.24.2.4
UDC 004.22+004.89
CORESET DISCOVERY FOR MACHINE LEARNING PROBLEMS
Abstract. We review the coreset discovery problem and three main methods for solving it: geometric coreset estimation, coreset discovery using a genetic algorithm, and coreset discovery using neural networks. We analyze each method and identify the cases in which it performs best. The paper focuses on neural network-based approaches and their ability to solve the coreset discovery problem: we compare several such approaches, describe their strengths and weaknesses, and outline the next steps in solving the problem.
Keywords: coreset, dataset distillation, dataset condensation, geometry coreset, genetic algorithm.