УДК 004.22+004.93'11
БЫСТРЫЙ ПОИСК СХОДНЫХ ГРАФОВ
ПО РАССТОЯНИЮ РЕДАКТИРОВАНИЯ
Аннотация. Дан обзор индексных структур для быстрого поиска по сходству объектов, представленных деревьями и графами. В качестве меры сходства использовано расстояние редактирования. Рассмотрено выполнение запросов точного поиска по сходству. В основном представлены алгоритмы на основе стратегии фильтрации и уточнения, использующие обратное индексирование. Кроме того, рассмотрены алгоритмы точного вычисления расстояния редактирования графов и его нижних и верхних границ.
Ключевые слова: поиск по сходству, графы, расстояние редактирования, ближайший сосед, индексные структуры, обратное индексирование.
ПОЛНЫЙ ТЕКСТ
Рачковский Дмитрий Андреевич,
доктор техн. наук, ведущий научный сотрудник Международного научно-учебного центра информационных технологий и систем НАН Украины и МОН Украины, Киев,
dar@infrm.kiev.ua
СПИСОК ЛИТЕРАТУРЫ
- Rachkovskij D.A. Index structures for fast similarity search for symbolic strings. Cybernetics and Systems Analysis. 2019. Vol. 55, N 5. P.
- Rachkovskij D.A. Real-valued vectors for fast distance and similarity estimation. Cybernetics and Systems Analysis. 2016. Vol. 52, N 6. P. 967–988.
- Rachkovskij D.A. Binary vectors for fast distance and similarity estimation. Cybernetics and Systems Analysis. 2017. Vol. 53, N 1. P. 138–156.
- Rachkovskij D.A. Distance-based index structures for fast similarity search. Cybernetics and Systems Analysis. 2017. Vol. 53, N 4. P. 636–658.
- Rachkovskij D.A. Index structures for fast similarity search for binary vectors. Cybernetics and Systems Analysis. 2017. Vol. 53, N 5. P. 799–820.
- Rachkovskij D.A. Index structures for fast similarity search for real-valued vectors. I. Cybernetics and Systems Analysis. 2018. Vol. 54, N 1. P. 152–164.
- Rachkovskij D.A. Index structures for fast similarity search for real-valued vectors. II. Cybernetics and Systems Analysis. 2018. Vol. 54, N 2. P. 320–335.
- Rachkovskij D.A., Slipchenko S.V. Similarity-based retrieval with structure-sensitive sparse binary distributed representations. Comp. Intelligence. 2012. Vol. 28, N 1. P. 106–129.
- Bille P. A survey on tree edit distance and related problems. Theoretical Computer Science. 2005. Vol. 337, N 1–3. P. 217–239.
- Tai K.-C. The tree-to-tree correction problem. Journal of the Association for Computing Machinery (JACM). 1979. Vol. 26. P. 422–433.
- Levenshtein V.I. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics — Doklady. 1966. Vol. 10, N 8. P. 707–710.
- Gao X., Xiao B., Tao D., Li X. A survey of graph edit distance. Pattern Analysis and Applications. 2010. Vol. 13, N 1. P. 113–129.
- Sanfeliu A., Fu K.S. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man. Cybern. 1983. Vol. 13, N 3. P. 353–362.
- Zhang K., Jiang T. Some MAX SNP-hard results concerning unordered labeled trees. Information Processing Letters. 1994. Vol. 49. P. 249–254.
- Pawlik M., Augsten N. Rted: a robust algorithm for the tree edit distance. Proceedings of the VLDB Endowment. 2011. Vol. 5, N 4. P. 334–345.
- Pawlik M., Augsten N. Tree edit distance: Robust and memory-efficient. Information Systems. 2016. Vol. 56. P. 157–173.
- Kailing K., Kriegel H.-P., Schonauer S., Seidl T. Efficient similarity search for hierarchical data in large databases. Proc. EDBT’04. 2004. P. 676–693.
- Berchtold S., Keim D., Kriegel H.P. The X-tree: An index structure for high-dimensional data. Proc. VLDB’96. 1996. P. 28–39.
- Yang R., Kalnis P., Tung A.K.H. Similarity evaluation on tree-structured data. Proc. SIGMOD’05. 2005. P. 754–765.
- Guha S., Jagadish H.V., Koudas N., Srivastava D., Yu T. Integrating XML data sources using approximate joins. ACM Trans. Database Syst. 2006. Vol. 31, N 1. P. 161–207.
- Akutsu T., Fukagawa D., Takasu A. Approximating tree edit distance through string edit distance. Algorithmica. 2010. Vol. 57, N 2. P. 325–348.
- Tang Y., Cai Y., Mamoulis N. Scaling similarity joins over tree-structured data. Proc. VLDB Endowment. 2015. Vol. 8, N 11. P. 1130–1141.
- Zeng Z., Tung A.K.H., Wang J., Feng J., Zhou L. Comparing stars: On approximating graph edit distance. Proc. VLDB Endowment. 2009. Vol. 2, N 1. P. 25–36.
- Zhao X., Xiao C., Lin X., Wang W. Efficient graph similarity joins with edit distance constraints. Proc. ICDE’12. 2012. P. 834–845.
- Gouda K., Arafa M. An improved global lower bound for graph edit similarity search. Pattern Recogn. Lett. 2015. Vol. 58. P. 8–14.
- Wang G., Wang B., Yang X., Yu G. Efficiently indexing large sparse graphs for similarity search. IEEE Trans. Knowledge and Data Engineering. 2012. Vol. 24, N 3. P. 440–451.
- Bougleux S., Gauzere B., Blumenthal D.B., Brun L. Fast linear sum assignment with error-correction and no cost constraints. Pattern Recogn. Lett. https://doi.org/10.1016/j.patrec.2018.03.032.
- Kuhn H.W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly. 1955. Vol. 2, N 1–2. P. 83–97.
- Munkres J. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics. 1957. Vol. 5, N 1. P. 32–38.
- Wang X., Ding X., Tung A.K.H., Ying S., Jin H. An efficient graph indexing method. Proc. ICDE’12. 2012. P. 210–221.
- Zhao X., Xiao C., Lin X., Wang W., Ishikawa Y. Efficient processing of graph similarity queries with edit distance constraints. VLDB J. 2013. Vol. 22. P. 727–752.
- Riesen K., Fankhauser S., Bunke H. Speeding up graph edit distance computation with a bipartite heuristic. Proc. MLG’07. 2007. P. 21–24.
- Zheng W., Zou L., Lian X., Wang D., Zhao D. Efficient graph similarity search over large graph databases. IEEE TKDE. 2015. Vol. 27, N 4. P. 964–978.
- Li Z., Jian X., Lian X., Chen L. An efficient probabilistic approach for graph similarity search. Proc. ICDE’18. 2018. P. 533–544.
- Qin J., Xiao C. Pigeonring: a principle for faster thresholded similarity search. Proc. VLDB Endow. 2018. Vol. 12, N 1. P. 28–42.
- Zhao X., Xiao C., Lin X., Zhang W., Wang Y. Efficient structure similarity searches: A partition-based approach. The VLDB Journal. 2018. Vol. 27, N 1. P. 53–78.
- Liang Y., Zhao P. Similarity search in graph databases: A multilayered indexing approach. Proc. ICDE’17. 2017. P. 783–794.
- Abu-Aisheh Z., Raveaux R., Ramel J.-Y. Efficient k-nearest neighbors search in graph space. Pattern Recognition Letters. https://doi.org/10.1016/j.patrec.2018.05.001.
- Blumenthal D.B., Gamper J. Improved lower bounds for graph edit distance. IEEE TKDE. 2018. Vol. 30, N 3. P. 503–516.
- Blumenthal D.B., Gamper J. On the exact computation of the graph edit distance. Pattern Recognition Letters. 2018. https://doi.org/10.1016/j.patrec.2018.05.002/.
- Abu-Aisheh Z., Gauzere B., Bougleux S., Ramel J.-Y., Brun L., Raveaux R., Heroux P., Adam S. Graph edit distance contest: Results and future challenges. Pattern Recognition Letters. 2017. Vol. 100. P. 96–103.
- Riesen K., Neuhaus M., Bunke H. Bipartite graph matching for computing the edit distance of graphs. Proc. GbRPR’07. 2007. P. 1–12.
- Riesen K., Bunke H. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing. 2009. Vol. 27, N 7. P. 950–959.
- Abu-Aisheh Z., Raveaux R., Ramel J.Y., Martineau P. An exact graph edit distance algorithm for solving pattern recognition problems. Proc. ICPRAM’15. 2015. P. 271–278.
- Abu-Aisheh Z., Raveaux R., Ramel J.-Y., Martineau P. A parallel graph edit distance algorithm. Expert Systems with Applications. 2018. Vol. 94. P. 41–57.
- Gouda K., Hassaan M. А novel edge-centric approach for graph edit similarity computation. Information Systems. 2019. Vol. 80. P. 91–106.
- Chen X., Huo H., Huan J., Vitter J.S. An efficient algorithm for graph edit distance computation. Knowledge-Based Systems. 2019. Vol. 163. P. 762–775.
- Zhou R., Hansen E.A. Beam-stack search: Integrating backtracking with beam search. Proc. ICAPS’05. 2005. P. 90–98.
- Chang L., Feng X., Lin X., Qin L., Zhang W. Efficient graph edit distance computation and verification via anchor-aware lower bound estimation. arXiv:1709.06810. 1 Oct 2017.
- Justice D., Hero A. A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 2006. Vol. 28, N 8. P. 1200–1214.
- Lerouge J., Abu-Aisheh Z., Raveaux R., Heroux P., Adam S. New binary linear programming formulation to compute the graph edit distance. Pattern Recognition. 2017. Vol. 72. P. 254–265.
- Darwiche М., Raveaux R, Conte D., T’Kindt V. Graph Edit Distance in the exact context. Proc. S+SSPR’18. 2018. P. 304–314.
- Carletti V., Gauzere B., Brun L., Vento M. Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. Proc. GbRPR’15. 2015. P. 188–197.
- Blumenthal D., Bougleux S., Gamper J., Brun L. Ring based approximation of graph edit distance. Proc. S+SSPR’18. 2018. P. 293–303.
- Bougleux S., Brun L., Carletti V., Foggia P., Gauzere B., Vento M. Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters. 2017. Vol. 87. P. 38–46.
- Daller E., Bougleux S., Gauzere B., Brun L. Approximate graph edit distance by several local searches in parallel. Proc. ICPRAM’18. 2018. P. 149–158.
- Darwiche M., Conte D., Raveaux R., T’Kindt V. A local branching heuristic for solving a graph edit distance problem. Comp. & Oper. Res. https://doi.org/10.1016/j.cor.2018.02.002.
- Gouda K., Arafa M., Calders T. A novel hierarchical-based framework for upper bound computation of graph edit distance. Pattern Recognition. 2018. Vol. 80. P. 210–224.
- Slipchenko S.V., Rachkovskij D.A. Analogical mapping using similarity of binary distributed representations. Information Theories & Applications. 2009. Vol. 16, N 3. P. 269–290.
- Rachkovskij D.A. Some approaches to analogical mapping with structure-sensitive distributed representations. Journal of Experimental & Theoretical Artificial Intelligence. 2004. Vol. 16, N 3. P. 125–144.
- Rachkovskij D.A. Formation of similarity-reflecting binary vectors with random binary projections. Cybernetics and Systems Analysis. 2015. Vol. 51, N 2. P. 313–323.
- Рачковський Д.А., Гриценко В.І. Розподілене подання векторних даних на основі випадкових проекцій. Київ: Інтерсервіс, 2018. 216 c.
- Rachkovskij D.A., Revunova E.G. A randomized method for solving discrete ill-posed problems. Cybernetics and Systems Analysis. 2012. Vol. 48, N 4. P. 621–635.
- Revunova E.G. Model selection criteria for a linear model to solve discrete ill-posed problems on the basis of singular decomposition and random projection. Cybernetics and Systems Analysis. 2016. Vol. 52, N 4. P. 647–664.
- Revunova E.G. Averaging over matrices in solving discrete ill-posed problems on the basis of random projection. Proc. CSIT’17. 2017. P. 473–478.
- Riba P., Llados J., Fornes A., Dutta A. Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases. Pattern Recognition Letters. 2017. Vol. 87. P. 203–211.
- Narayanan A., Chandramohan M., Venkatesan R., Chen L., Liu Y., Jaiswal S. Graph2vec: Learning distributed representations of graphs. Proc. MLG’17. 2017. P. 21:1–21:8.
- Goyal P., Ferrara E. Graph embedding techniques, applications, and performance: A survey. Knowledge Based Systems. 2018. Vol. 151. P. 78–94.
- Wu Z., Pan S., Chen F., Long G., Zhang C., Yu P.S. A comprehensive survey on graph neural networks. arXiv:1901.00596. 10 Mar. 2019.
- Bai Y., Ding H., Bian S., Chen T., Sun Y., Wang W. SimGNN: A neural network approach to fast graph similarity computation. Proc. WSDM’19. 2019. P. 384–392.