Abstract. This survey focuses on methods and algorithms for fast estimation of distance and similarity measures between data objects. The estimates are computed from low-dimensional real-valued vector representations. The methods discussed do not involve learning and rely mainly on random projection and sampling. The input data are mostly high-dimensional vectors equipped with various distance measures (Euclidean, Manhattan, statistical, etc.) and similarity measures (dot product, etc.). Vector representations of non-vector data are discussed as well. The resulting vectors can also be used in similarity search, machine learning, and other applications.
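As a minimal illustration of learning-free distance estimation by random projection, the sketch below maps vectors with a dense Gaussian matrix whose entries are drawn from N(0, 1/k), the classic Johnson–Lindenstrauss construction. It is not the survey's specific method. The function names and the parameters d, k, and seed are illustrative assumptions.

```python
import math
import random

def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # scaling makes E[||Rx||^2] = ||x||^2
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional vector x to the k-dimensional vector R x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]

def euclidean(u, v):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

if __name__ == "__main__":
    d, k = 1000, 200
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(d)]
    y = [rng.gauss(0, 1) for _ in range(d)]
    R = random_projection_matrix(d, k)
    exact = euclidean(x, y)
    # The distance between the projections estimates the original distance.
    estimate = euclidean(project(R, x), project(R, y))
    print(f"exact={exact:.2f} estimate={estimate:.2f} ratio={estimate / exact:.3f}")
```

Because the projection is linear, the projected difference equals the projection of the difference, so the estimate depends only on x − y. With k = 200 the relative standard deviation of the ratio is roughly 1/sqrt(2k) = 0.05, so it typically lands within about 10% of 1. Sparse or structured matrices are common cheaper alternatives to the dense Gaussian one.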
Keywords: distance, similarity, embeddings, sketches, dimensionality reduction, random projection, sampling, Johnson–Lindenstrauss lemma, kernel similarity, similarity search.
Rachkovskij, Dmitri Andreevich,
Doctor of Technical Sciences, Leading Researcher, International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, Kyiv,
e-mail: dar@infrm.kiev.ua