КіСА, Vol. 60, No. 2, March-April 2024

UDC 004.89

O.H. SKURZHANSKYI
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine,
oleksandr.skurzhanskyi@gmail.com

O.O. MARCHENKO
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine,
rozenkrans17@gmail.com

A.V. ANISIMOV
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine,
avatatan@gmail.com

SPECIALIZED PRE-TRAINING OF NEURAL NETWORK
MODELS ON SYNTHETIC DATA FOR IMPROVING
PARAPHRASE GENERATION

Abstract. Paraphrase generation is a fundamental problem in natural language processing. Owing to the considerable success of transfer learning, the "pre-training → fine-tuning" approach has become standard. However, popular general-purpose pre-training methods typically require huge datasets and substantial computing power, and the available pre-trained models are limited to a fixed architecture and size. A simple and effective pre-training approach designed specifically for paraphrase generation is proposed; it noticeably improves the quality of generated paraphrases and substantially improves general-purpose models. Both existing public data and new data generated by large language models are used. The effect of this pre-training procedure on neural networks of different architectures is investigated, and the procedure is shown to work effectively for all of them.

Keywords: artificial intelligence, machine learning, neural networks, paraphrase generation, pre-training, fine-tuning.


