Cybernetics And Systems Analysis logo
Editorial Board Announcements Abstracts Authors Archive
Cybernetics And Systems Analysis
International Theoretical Science Journal
-->

UDC 004.42
A. Hlybovets1, V. Didenko2


1 National University of Kyiv-Mohyla Academy, Kyiv, Ukraine

a.glybovets@ukma.edu.ua

2 National University of Kyiv-Mohyla Academy, Kyiv, Ukraine

verochka1998@gmail.com

CONSTRUCTION OF GENERALIZED SUFFIX TREES
ON DISTRIBUTED PARALLEL PLATFORMS

Abstract. This paper proposes an algorithm for constructing generalized suffix trees using distributed parallel platforms, which is optimal from the point of both time complexity and memory consumption. The distributed construction approach allows operating with large alphabets and very long strings. The algorithm is efficient for scalability on distributed parallel platforms. It supports indexing suffixes for various long strings, ranging from a single long string to multiple long strings of varying lengths.

Keywords: suffix tree, generalized suffix tree, ERa, algorithm, parallel construction of a suffix tree.


full text

REFERENCES

  1. Weiner P. Linear pattern matching algorithms. Proc. 14th Annual IEEE Symposium on Switching and Automata Theory (SWAT 1973) (15–17 October 1973, Iowa City, Iowa, USA). Iowa City, 1973. P. 1–11.

  2. Mansour E., Allam A., Skiadopoulos S., Kalnis P. ERA: Efficient serial and parallel suffix tree construction for very long strings. Proc. VLDB Endowment (PVLDB). 2011. Vol. 5, N 1. P. 49–60.

  3. Kleppmann M. Designing data-intensive applications. O’Reilly Media, Inc., 2017. 614 p.

  4. Comin M., Farreras M. Efficient parallel construction of suffix trees for genomes larger than main memory. Proc. 20th European MPI Users’ Group Meeting (EuroMPI’13) (15–18 September 2013, Madrid, Spain). Madrid, 2013. P. 211–216.

  5. Gobonamang T.K., Mpoeleng D. Counter based suffix tree for DNA pattern repeats. Theoretical Computer Science. 2020. Vol. 814, Iss. C. P. 1–12.

  6. Ghoting A., Makarychev K. Suffix trees. In: Encyclopedia of Parallel Computing. Padua D. (Ed.), Boston, MA: Springer, 2011. P. 1945–1955. https://doi.org/10.1007/978-0-387-09766-4_464.

  7. Ibarra O.H., Zhang L (Eds.). Computing and combinatorics. Proc. 8th Annual International Conference, COCOON 2002 (15–17 August 2002, Singapore). Singapore, 2002. 614 p.

  8. Ukkonen E. On-line construction of suffix trees. Algorithmica. 1995. Vol. 14. P. 249–260.

  9. HDFS architecture guide. URL: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.

  10. Apache spark. URL: https://spark.apache.org/.

  11. Eberle A. Parallel multiway LCP-mergesort. Bachelor Thesis. 2014. 61 p.

  12. Zhu G., Guo C., Lu L., Huang Z., Yuan C., Gu R., Huang Y. DGST: Efficient and scalable suffix tree construction on distributed data-parallel platforms. Parallel Computing. 2019. Vol. 87. P. 87–102.

  13. The bin packing problem. URL: https://developers.google.com/optimization/bin/bin_packing.

  14. Schreiber E.L., Korf R.E., Moffitt M.D. Optimal multi-way number partitioning. Journal of the ACM. 2018. Vol. 65, Iss. 4. P. 1–61.

  15. Flick P., Aluru S. Parallel construction of suffix trees and the all-nearest-smaller-values problem. Proc. 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS) (29 May – 2 June 2017, Orlando, Florida, USA). Orlando, 2017. P. 12–21.

  16. Ng W., Kakehi K. Merging string sequences by longest common prefixes. IPSJ Digital Courier. 2008. Vol. 4. P. 69–78.

  17. Datasets. URL: https://www.kaggle.com/datasets.




© 2023 Kibernetika.org. All rights reserved.