Cybernetics And Systems Analysis logo
Editorial Board Announcements Abstracts Authors Archive
Cybernetics And Systems Analysis
International Theoretical Science Journal
UDC 004.22+004.93'11
D.A. Rachkovskij1


1 International Research and Training Center for Information Technologies and Systems, NAS of Ukraine and MES of Ukraine, Kyiv, Ukraine

dar@infrm.kiev.ua

FAST SEARCH FOR SIMILAR GRAPHS BY EDIT DISTANCE

Abstract. This survey article considers index structures for fast similarity search for objects represented by trees and graphs. The edit distance is used as a measure of similarity. The execution of exact similarity search queries is considered. Algorithms based on filter-and-refine strategy using inverted indexing are mainly presented. Algorithms for accurate calculation of the graph edit distance and its lower and upper bounds are also considered.

Keywords: similarity search, graphs, edit distance, nearest neighbor, index structures, inverted indexing.



FULL TEXT

REFERENCES

  1. Rachkovskij D.A. Index structures for fast similarity search for symbolic strings. Cybernetics and Systems Analysis. 2019. Vol. 55, N 5. P.

  2. Rachkovskij D.A. Real-valued vectors for fast distance and similarity estimation. Cybernetics and Systems Analysis. 2016. Vol. 52, N 6. P. 967–988.

  3. Rachkovskij D.A. Binary vectors for fast distance and similarity estimation. Cybernetics and Systems Analysis. 2017. Vol. 53, N 1. P. 138–156.

  4. Rachkovskij D.A. Distance-based index structures for fast similarity search. Cybernetics and Systems Analysis. 2017. Vol. 53, N 4. P. 636–658.

  5. Rachkovskij D.A. Index structures for fast similarity search for binary vectors. Cybernetics and Systems Analysis. 2017. Vol. 53, N 5. P. 799–820.

  6. Rachkovskij D.A. Index structures for fast similarity search for real-valued vectors. I. Cybernetics and Systems Analysis. 2018. Vol. 54, N 1. P. 152–164.

  7. Rachkovskij D.A. Index structures for fast similarity search for real-valued vectors. II. Cybernetics and Systems Analysis. 2018. Vol. 54, N 2. P. 320–335.

  8. Rachkovskij D.A., Slipchenko S.V. Similarity-based retrieval with structure-sensitive sparse binary distributed representations. Comp. Intelligence. 2012. Vol. 28, N 1. P. 106–129.

  9. Bille P. A survey on tree edit distance and related problems. Theoretical Computer Science. 2005. Vol. 337, N 1–3. P. 217–239.

  10. Tai K.-C. The tree-to-tree correction problem. Journal of the Association for Computing Machinery (JACM). 1979. Vol. 26. P. 422–433.

  11. Levenshtein V.I. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics — Doklady. 1966. Vol. 10, N 8. P. 707–710.

  12. Gao X., Xiao B., Tao D., Li X. A survey of graph edit distance. Pattern Analysis and Applications. 2010. Vol. 13, N 1. P. 113–129.

  13. Sanfeliu A., Fu K.S. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man. Cybern. 1983. Vol. 13, N 3. P. 353–362.

  14. Zhang K., Jiang T. Some MAX SNP-hard results concerning unordered labeled trees. Information Processing Letters. 1994. Vol. 49. P. 249–254.

  15. Pawlik M., Augsten N. Rted: a robust algorithm for the tree edit distance. Proceedings of the VLDB Endowment. 2011. Vol. 5, N 4. P. 334–345.

  16. Pawlik M., Augsten N. Tree edit distance: Robust and memory-efficient. Information Systems. 2016. Vol. 56. P. 157–173.

  17. Kailing K., Kriegel H.-P., Schonauer S., Seidl T. Efficient similarity search for hierarchical data in large databases. Proc. EDBT’04. 2004. P. 676–693.

  18. Berchtold S., Keim D., Kriegel H.P. The X-tree: An index structure for high-dimensional data. Proc. VLDB’96. 1996. P. 28–39.

  19. Yang R., Kalnis P., Tung A.K.H. Similarity evaluation on tree-structured data. Proc. SIGMOD’05. 2005. P. 754–765.

  20. Guha S., Jagadish H.V., Koudas N., Srivastava D., Yu T. Integrating XML data sources using approximate joins. ACM Trans. Database Syst. 2006. Vol. 31, N 1. P. 161–207.

  21. Akutsu T., Fukagawa D., Takasu A. Approximating tree edit distance through string edit distance. Algorithmica. 2010. Vol. 57, N 2. P. 325–348.

  22. Tang Y., Cai Y., Mamoulis N. Scaling similarity joins over tree-structured data. Proc. VLDB Endowment. 2015. Vol. 8, N 11. P. 1130–1141.

  23. Zeng Z., Tung A.K.H., Wang J., Feng J., Zhou L. Comparing stars: On approximating graph edit distance. Proc. VLDB Endowment. 2009. Vol. 2, N 1. P. 25–36.

  24. Zhao X., Xiao C., Lin X., Wang W. Efficient graph similarity joins with edit distance constraints. Proc. ICDE’12. 2012. P. 834–845.

  25. Gouda K., Arafa M. An improved global lower bound for graph edit similarity search. Pattern Recogn. Lett. 2015. Vol. 58. P. 8–14.

  26. Wang G., Wang B., Yang X., Yu G. Efficiently indexing large sparse graphs for similarity search. IEEE Trans. Knowledge and Data Engineering. 2012. Vol. 24, N 3. P. 440–451.

  27. Bougleux S., Gauzere B., Blumenthal D.B., Brun L. Fast linear sum assignment with error-correction and no cost constraints. Pattern Recogn. Lett. https://doi.org/10.1016/j.patrec.2018.03.032.

  28. Kuhn H.W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly. 1955. Vol. 2, N 1–2. P. 83–97.

  29. Munkres J. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics. 1957. Vol. 5, N 1. P. 32–38.

  30. Wang X., Ding X., Tung A.K.H., Ying S., Jin H. An efficient graph indexing method. Proc. ICDE’12. 2012. P. 210–221.

  31. Zhao X., Xiao C., Lin X., Wang W., Ishikawa Y. Efficient processing of graph similarity queries with edit distance constraints. VLDB J. 2013. Vol. 22. P. 727–752.

  32. Riesen K., Fankhauser S., Bunke H. Speeding up graph edit distance computation with a bipartite heuristic. Proc. MLG’07. 2007. P. 21–24.

  33. Zheng W., Zou L., Lian X., Wang D., Zhao D. Efficient graph similarity search over large graph databases. IEEE TKDE. 2015. Vol. 27, N 4. P. 964–978.

  34. Li Z., Jian X., Lian X., Chen L. An efficient probabilistic approach for graph similarity search. Proc. ICDE’18. 2018. P. 533–544.

  35. Qin J., Xiao C. Pigeonring: a principle for faster thresholded similarity search. Proc. VLDB Endow. 2018. Vol. 12, N 1. P. 28–42.

  36. Zhao X., Xiao C., Lin X., Zhang W., Wang Y. Efficient structure similarity searches: A partition-based approach. The VLDB Journal. 2018. Vol. 27, N 1. P. 53–78.

  37. Liang Y., Zhao P. Similarity search in graph databases: A multilayered indexing approach. Proc. ICDE’17. 2017. P. 783–794.

  38. Abu-Aisheh Z., Raveaux R., Ramel J.-Y. Efficient k-nearest neighbors search in graph space. Pattern Recognition Letters. https://doi.org/10.1016/j.patrec.2018.05.001.

  39. Blumenthal D.B., Gamper J. Improved lower bounds for graph edit distance. IEEE TKDE. 2018. Vol. 30, N 3. P. 503–516.

  40. Blumenthal D.B., Gamper J. On the exact computation of the graph edit distance. Pattern Recognition Letters. 2018. https://doi.org/10.1016/j.patrec.2018.05.002/.

  41. Abu-Aisheh Z., Gauzere B., Bougleux S., Ramel J.-Y., Brun L., Raveaux R., Heroux P., Adam S. Graph edit distance contest: Results and future challenges. Pattern Recognition Letters. 2017. Vol. 100. P. 96–103.

  42. Riesen K., Neuhaus M., Bunke H. Bipartite graph matching for computing the edit distance of graphs. Proc. GbRPR’07. 2007. P. 1–12.

  43. Riesen K., Bunke H. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing. 2009. Vol. 27, N 7. P. 950–959.

  44. Abu-Aisheh Z., Raveaux R., Ramel J.Y., Martineau P. An exact graph edit distance algorithm for solving pattern recognition problems. Proc. ICPRAM’15. 2015. P. 271–278.

  45. Abu-Aisheh Z., Raveaux R., Ramel J.-Y., Martineau P. A parallel graph edit distance algorithm. Expert Systems with Applications. 2018. Vol. 94. P. 41–57.

  46. Gouda K., Hassaan M. А novel edge-centric approach for graph edit similarity computation. Information Systems. 2019. Vol. 80. P. 91–106.

  47. Chen X., Huo H., Huan J., Vitter J.S. An efficient algorithm for graph edit distance computation. Knowledge-Based Systems. 2019. Vol. 163. P. 762–775.

  48. Zhou R., Hansen E.A. Beam-stack search: Integrating backtracking with beam search. Proc. ICAPS’05. 2005. P. 90–98.

  49. Chang L., Feng X., Lin X., Qin L., Zhang W. Efficient graph edit distance computation and verification via anchor-aware lower bound estimation. arXiv:1709.06810. 1 Oct 2017.

  50. Justice D., Hero A. A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 2006. Vol. 28, N 8. P. 1200–1214.

  51. Lerouge J., Abu-Aisheh Z., Raveaux R., Heroux P., Adam S. New binary linear programming formulation to compute the graph edit distance. Pattern Recognition. 2017. Vol. 72. P. 254–265.

  52. Darwiche М., Raveaux R, Conte D., T’Kindt V. Graph Edit Distance in the exact context. Proc. S+SSPR’18. 2018. P. 304–314.

  53. Carletti V., Gauzere B., Brun L., Vento M. Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. Proc. GbRPR’15. 2015. P. 188–197.

  54. Blumenthal D., Bougleux S., Gamper J., Brun L. Ring based approximation of graph edit distance. Proc. S+SSPR’18. 2018. P. 293–303.

  55. Bougleux S., Brun L., Carletti V., Foggia P., Gauzere B., Vento M. Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters. 2017. Vol. 87. P. 38–46.

  56. Daller E., Bougleux S., Gauzere B., Brun L. Approximate graph edit distance by several local searches in parallel. Proc. ICPRAM’18. 2018. P. 149–158.

  57. Darwiche M., Conte D., Raveaux R., T’Kindt V. A local branching heuristic for solving a graph edit distance problem. Comp. & Oper. Res. https://doi.org/10.1016/j.cor.2018.02.002.

  58. Gouda K., Arafa M., Calders T. A novel hierarchical-based framework for upper bound computation of graph edit distance. Pattern Recognition. 2018. Vol. 80. P. 210–224.

  59. Slipchenko S.V., Rachkovskij D.A. Analogical mapping using similarity of binary distributed representations. Information Theories & Applications. 2009. Vol. 16, N 3. P. 269–290.

  60. Rachkovskij D.A. Some approaches to analogical mapping with structure-sensitive distributed representations. Journal of Experimental & Theoretical Artificial Intelligence. 2004. Vol. 16, N 3. P. 125–144.

  61. Rachkovskij D.A. Formation of similarity-reflecting binary vectors with random binary projections. Cybernetics and Systems Analysis. 2015. Vol. 51, N 2. P. 313–323.

  62. Rachkovskij D.A., Gritsenko V.I. Distributed representation of vector data based on random projections. Kyiv: Interservice, 2018. 216 p.

  63. Rachkovskij D.A., Revunova E.G. A randomized method for solving discrete ill-posed problems. Cybernetics and Systems Analysis. 2012. Vol. 48, N 4. P. 621–635.

  64. Revunova E.G. Model selection criteria for a linear model to solve discrete ill-posed problems on the basis of singular decomposition and random projection. Cybernetics and Systems Analysis. 2016. Vol. 52, N 4. P. 647–664.

  65. Revunova E.G. Averaging over matrices in solving discrete ill-posed problems on the basis of random projection. Proc. CSIT’17. 2017. P. 473–478.

  66. Riba P., Llados J., Fornes A., Dutta A. Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases. Pattern Recognition Letters. 2017. Vol. 87. P. 203–211.

  67. Narayanan A., Chandramohan M., Venkatesan R., Chen L., Liu Y., Jaiswal S. Graph2vec: Learning distributed representations of graphs. Proc. MLG’17. 2017. P. 21:1–21:8.

  68. Goyal P., Ferrara E. Graph embedding techniques, applications, and performance: A survey. Knowledge Based Systems. 2018. Vol. 151. P. 78–94.

  69. Wu Z., Pan S., Chen F., Long G., Zhang C., Yu P.S. A comprehensive survey on graph neural networks. arXiv:1901.00596. 10 Mar. 2019.

  70. Bai Y., Ding H., Bian S., Chen T., Sun Y., Wang W. SimGNN: A neural network approach to fast graph similarity computation. Proc. WSDM’19. 2019. P. 384–392.
© 2019 Kibernetika.org. All rights reserved.