Cybernetics And Systems Analysis
International Theoretical Science Journal
UDC 004.22 + 004.93'11
D.A. Rachkovskij

REAL-VALUED EMBEDDINGS AND SKETCHES FOR FAST DISTANCE AND SIMILARITY ESTIMATION

Abstract. This survey focuses on methods and algorithms for fast estimation of distance and similarity measures between data. Estimation is performed using low-dimensional real-valued vector representations. The methods discussed do not involve learning and are based mainly on random projection and sampling. The input data are mainly high-dimensional vectors with various distance measures (Euclidean, Manhattan, statistical, etc.) and similarity measures (dot product, etc.). Vector representations of non-vector data are considered as well. The resulting vectors can also be used in similarity search algorithms, machine learning, and other applications.
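To illustrate how random projection yields low-dimensional real-valued sketches that estimate distances, the block below is a minimal textbook sketch. It is not necessarily one of the specific algorithms surveyed in the paper. It assumes two classic constructions. The first is a k x d matrix with i.i.d. Gaussian entries, which estimates Euclidean distance in the spirit of the Johnson–Lindenstrauss lemma. The second is a matrix with i.i.d. Cauchy (1-stable) entries, which estimates Manhattan distance through a median estimator. The helper names (`gaussian_projection_matrix`, `cauchy_projection_matrix`, `project`, `estimate_l2`, `estimate_l1`) are hypothetical, as is the choice of the median estimator.

```python
# Hypothetical helpers: a textbook random-projection sketch, not the paper's
# specific algorithms. Pure standard library for portability.
import math
import random
import statistics


def gaussian_projection_matrix(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1) entries (Gaussian random projection)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def cauchy_projection_matrix(k, d, rng):
    """k x d matrix with i.i.d. standard Cauchy entries (1-stable distribution)."""
    # Inverse-CDF sampling: tan(pi * (u - 1/2)) is standard Cauchy for u ~ U(0, 1).
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(d)]
            for _ in range(k)]


def project(matrix, x):
    """Sketch of x: the matrix-vector product, a vector of length k."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]


def estimate_l2(sx, sy):
    """Euclidean distance estimate from Gaussian sketches.

    Each coordinate of sx - sy is N(0, ||x - y||_2^2), so the mean of the
    squared differences is an unbiased estimate of the squared distance.
    """
    k = len(sx)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sx, sy)) / k)


def estimate_l1(sx, sy):
    """Manhattan distance estimate from Cauchy sketches.

    Each coordinate of sx - sy is Cauchy with scale ||x - y||_1, and the
    median of |Cauchy(0, s)| equals s, so the sample median estimates it.
    """
    return statistics.median(abs(a - b) for a, b in zip(sx, sy))


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 500, 400  # original dimension d, sketch dimension k
    x = [rng.random() for _ in range(d)]
    y = [rng.random() for _ in range(d)]
    G = gaussian_projection_matrix(k, d, rng)
    C = cauchy_projection_matrix(k, d, rng)
    print("L2 exact:", math.dist(x, y),
          "estimate:", estimate_l2(project(G, x), project(G, y)))
    print("L1 exact:", sum(abs(a - b) for a, b in zip(x, y)),
          "estimate:", estimate_l1(project(C, x), project(C, y)))
```

The same matrix must be applied to every vector being compared. Once computed, the k-dimensional sketches replace the original d-dimensional vectors in distance computations. The relative error shrinks roughly as 1/sqrt(k), independently of d.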

Keywords: distance, similarity, embeddings, sketches, dimensionality reduction, random projection, sampling, Johnson–Lindenstrauss lemma, kernel similarity, similarity search.




Rachkovskij, Dmitri Andreevich,
Doctor of Technical Sciences, Leading Researcher at the International Research and Training Center for Information Technologies and Systems of the National Academy of Sciences and the Ministry of Education and Science of Ukraine, Kiev,
e-mail: dar@infrm.kiev.ua

© 2016 Kibernetika.org. All rights reserved.