A REVIEW OF MODERN APPROACHES AND RESEARCH ON TAGGING PHRASEOLOGICAL UNITS FOR A CORPUS
Keywords

National Corpus of the Uzbek Language, phraseological units, natural language processing, linguistic annotation, automatic tagging, multi-component expressions, semantic integrity.

Abstract

This article is devoted to developing the theoretical and methodological foundations for tagging phraseological units in the National Corpus of the Uzbek Language and for annotating them automatically. Because phraseological units are semantically integral yet variable in composition, NLP systems find it somewhat difficult to distinguish them from ordinary word combinations. Using a methodology that compares the capabilities of systematic linguistic analysis, corpus methods, and modern neural network models, the study reviews international experience in building electronic databases of phraseological units and the prospects for applying that experience to the Uzbek language corpus.
