Abstract. This review covers methods and algorithms for fast estimation of distance and similarity measures between initial data items, using vector representations with binary or integer components derived from those data. The initial data are mainly high-dimensional vectors compared by distance measures (angular, Euclidean, etc.) or similarity measures (cosine, inner product, etc.). The methods discussed require no training and rely mostly on random projection followed by quantization, as well as on sampling. The resulting vectors can be used in similarity search, machine learning, and other algorithms.
Keywords: distance, similarity, embeddings, sketches, random projection, sampling, binarization, quantization, Johnson–Lindenstrauss lemma, kernel similarity, similarity search, locality-sensitive hashing.
International Scientific-Educational Center of Information Technologies and Systems, NAS and MES of Ukraine, Kyiv, Ukraine
e-mail: dar@infrm.kiev.ua.
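
As an illustration of the random projection followed by quantization mentioned in the abstract, the following minimal sketch binarizes two vectors by the signs of their Gaussian random projections and estimates the angle between them from the Hamming distance of the resulting bit vectors. It relies on the standard fact that a random hyperplane separates two vectors with probability equal to their angle divided by pi. The function names, the dimensions, and the noise level are illustrative assumptions and are not taken from the review.

```python
import math
import random


def sign_random_projection(x, projections):
    """Binarize x: one bit per random direction (sign of the inner product)."""
    return [1 if sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0 else 0
            for r in projections]


def estimate_angle(bits_a, bits_b):
    """Estimate the angle in radians from the fraction of differing bits."""
    hamming = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.pi * hamming / len(bits_a)


def true_angle(x, y):
    """Exact angle in radians, computed from the original vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (norm_x * norm_y))))


# Illustrative sizes (assumptions): 50-dimensional inputs, 2000-bit codes.
rng = random.Random(0)
dim, n_bits = 50, 2000
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(n_bits)]

x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
y = [x_i + 0.5 * rng.gauss(0.0, 1.0) for x_i in x]  # noisy copy of x

bits_x = sign_random_projection(x, projections)
bits_y = sign_random_projection(y, projections)

print("true angle:     ", true_angle(x, y))
print("estimated angle:", estimate_angle(bits_x, bits_y))
```

The estimate becomes more accurate as the number of bits grows, while each comparison needs only a Hamming distance between binary codes rather than an inner product of the original high-dimensional vectors.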