Cybernetics And Systems Analysis
International Theoretical Science Journal
UDC 004.83
S.D. Pogorilyy1, A.A. Kramov2


1 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

sdp77@i.ua

2 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

artemkramovphd@knu.ua

ASSESSMENT OF TEXT COHERENCE BY CONSTRUCTING THE GRAPH OF SEMANTIC,
LEXICAL AND GRAMMATICAL CONSISTENCY OF PHRASES OF SENTENCES

Abstract. A graph-based method for evaluating text coherence is proposed; it analyzes the semantic, grammatical, and lexical consistency of the phrases of sentences. The effectiveness of the method has been verified experimentally on an English-language corpus. The resulting metrics suggest that the proposed method outperforms other state-of-the-art approaches. The method can be applied to other languages by replacing the linguistic models with ones suited to the features of the target language.

Keywords: natural language processing, evaluation of text coherence, bipartite graph of phrases, graph-based method of coherence assessment of texts, lexical and grammatical consistency of sentences.
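
To make the idea of a bipartite graph of phrases more concrete, the following is a minimal illustrative sketch and not the method of the paper, whose details are not given in the abstract. It rests on several assumptions: phrases are approximated by splitting sentences on punctuation, phrase similarity is a plain bag-of-words cosine standing in for the semantic, lexical, and grammatical models, and the coherence of a text is taken as the mean edge weight between adjacent sentences. All function names (`split_phrases`, `pair_consistency`, `coherence`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_phrases(sentence):
    """Assumption: approximate phrases by splitting on commas, semicolons, colons."""
    return [p for p in re.split(r"[,;:]", sentence) if p.strip()]


def bag_of_words(phrase):
    """Lowercased token counts; a stand-in for real linguistic models."""
    return Counter(re.findall(r"[a-z']+", phrase.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pair_consistency(first, second):
    """Mean weight of the bipartite edges between the phrases of two sentences."""
    edges = [cosine(bag_of_words(p), bag_of_words(q))
             for p in split_phrases(first)
             for q in split_phrases(second)]
    return sum(edges) / len(edges) if edges else 0.0


def coherence(sentences):
    """Assumption: text coherence is the mean consistency of adjacent sentences."""
    if len(sentences) < 2:
        return 0.0
    scores = [pair_consistency(sentences[i], sentences[i + 1])
              for i in range(len(sentences) - 1)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    related = ["The cat sat on the mat.", "The cat liked the mat."]
    unrelated = ["The cat sat on the mat.", "Stock prices fell sharply today."]
    print(coherence(related), coherence(unrelated))
```

Under these assumptions, two adjacent sentences that share vocabulary receive a higher score than two that share none. A closer approximation of the described approach would replace `cosine` over word counts with language-specific semantic, lexical, and grammatical models, which is also the part that would change when moving to another language.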



