KIBERNETYKA TA SYSTEMNYI ANALIZ
International Theoretical Science Journal

DOI 10.34229/KCA2522-9664.24.2.1
UDC 004.89
O.H. Skurzhanskyi1, O.O. Marchenko2, A.V. Anisimov3


1 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

oleksandr.skurzhanskyi@gmail.com

2 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

rozenkrans17@gmail.com

3 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine

avatatan@gmail.com

SPECIALIZED PRE-TRAINING OF NEURAL NETWORKS ON SYNTHETIC DATA
FOR IMPROVING PARAPHRASE GENERATION

Abstract. Generating paraphrases is a fundamental problem in natural language processing. In light of the significant success of transfer learning, the "pre-training, then fine-tuning" approach has become the standard. However, popular general-purpose pre-training methods typically require large datasets and substantial computational resources, and publicly available pre-trained models are constrained to a fixed architecture and size. We propose a simple and effective pre-training approach tailored specifically to paraphrase generation; it significantly improves model quality and matches the quality of general-purpose pre-trained models. Both existing public data and new data generated by large language models were used for pre-training. The impact of this procedure on neural networks of different architectures was investigated, and the approach was shown to be effective for all of them.

Keywords: artificial intelligence, machine learning, neural networks, paraphrase generation, pre-training, fine-tuning.
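
To make the recipe concrete, below is a minimal sketch of the two-stage procedure described in the abstract: a randomly initialised encoder-decoder is first pre-trained on synthetic paraphrase pairs (public corpora plus pairs produced by a large language model) and then fine-tuned on the target paraphrase dataset. The sketch assumes Hugging Face Transformers and Datasets; the BART-style architecture, file names, and hyperparameters are illustrative assumptions, not the authors' exact setup.

# A minimal sketch of "specialized pre-training, then fine-tuning" for
# paraphrase generation. Assumes Hugging Face Transformers/Datasets; the
# architecture, file names, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (
    BartConfig,
    BartForConditionalGeneration,
    BartTokenizerFast,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")

# Randomly initialised encoder-decoder: specialized pre-training does not tie
# the model to the architecture or size of an existing general-purpose checkpoint.
config = BartConfig(
    vocab_size=tokenizer.vocab_size,
    d_model=512,
    encoder_layers=6,
    decoder_layers=6,
)
model = BartForConditionalGeneration(config)

def preprocess(batch):
    # Each record is a (source sentence, paraphrase) pair.
    enc = tokenizer(batch["source"], truncation=True, max_length=64)
    enc["labels"] = tokenizer(batch["paraphrase"], truncation=True, max_length=64)["input_ids"]
    return enc

collator = DataCollatorForSeq2Seq(tokenizer, model=model)

def train_stage(data_file, output_dir, epochs):
    # One pass of supervised sequence-to-sequence training on paraphrase pairs.
    data = load_dataset("json", data_files=data_file)["train"].map(preprocess, batched=True)
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=32,
    )
    Seq2SeqTrainer(model=model, args=args,
                   train_dataset=data, data_collator=collator).train()

# Stage 1: specialized pre-training on synthetic pairs (public paraphrase
# corpora plus pairs generated by a large language model); file is hypothetical.
train_stage("synthetic_paraphrases.jsonl", "out/pretrain", epochs=3)

# Stage 2: fine-tuning the same model on the target paraphrase dataset.
train_stage("target_paraphrases.jsonl", "out/finetune", epochs=5)

Because both stages use the same supervised paraphrase objective, the only difference between pre-training and fine-tuning in this sketch is the data source; the same loop also applies to other encoder-decoder architectures (e.g., LSTM- or convolution-based ones) by swapping the model class.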




© 2024 Kibernetika.org. All rights reserved.