Morphological, syntactic, and semantic analysis methods in natural language processing for Uzbek

Botir Elov; Oqila; Mastura

Authors

Botir Elov
Oqila, Doctor of Philosophy (PhD) in Philology, doctoral candidate at the Tashkent State University of Uzbek Language and Literature named after Alisher Navoi.
Mastura, lecturer at the Department of Computational Linguistics and Digital Technologies, Tashkent State University of Uzbek Language and Literature named after Alisher Navoi.

Keywords:

POS tagging, parser, syntactic parsing, semantic parsing, sentiment analysis, natural language processing

Abstract

This article discusses the methods of morphological, syntactic,
and semantic analysis used in natural language processing (NLP) for
Uzbek. The features of Uzbek, namely its complex agglutinative
morphology and free word order, together with the limited language
resources available for it, call for specialized approaches and further
research when these methods are applied [Senuma, Aizawa 2017, 100-109].
Drawing on the scientific literature, this study examines morphological
analysis methods first, followed by syntactic and semantic analysis
methods. Each section presents the advantages and disadvantages of the
existing methods, experience in applying them to Uzbek, and comparisons
with other languages. For morphological analysis of Uzbek, rule-based
methods, statistical models (HMM, CRF, etc.), and neural network-based
approaches (BiLSTM-CRF, seq2seq) are discussed, with results illustrated
by examples and reported as percentages. Syntactic parsing is shown to
rely on two families of methods, dependency parsing and constituency
parsing. The construction of a Universal Dependencies (UD) treebank for
Uzbek, which follows subject-object-verb (SOV) word order, is examined,
and the article highlights how complex morphological structure and free
word order affect the construction of parsers.
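
The abstract lists rule-based methods among the approaches to Uzbek morphological analysis. As a minimal sketch, and not the method of any of the works reviewed, a rule-based analyzer for agglutinative noun forms can strip suffixes from the right in a fixed slot order (stem, plural, possessive, case) and keep only the splits whose remainder is a known stem. The lexicon STEMS, the suffix lists in SLOTS, and the function analyses are hypothetical and cover only a small subset of Uzbek noun morphology.

```python
# Minimal rule-based analyzer for Uzbek nouns (illustrative sketch only).
# Assumptions: STEMS is a toy lexicon and SLOTS holds a small, hand-picked
# subset of nominal suffixes; allomorphs such as dative -ka/-qa are ignored.

STEMS = {"kitob", "uy", "maktab", "bola", "dada"}

# Suffix slots listed from the word-final (outermost) slot inward:
# STEM + PL + POSS + CASE, e.g. kitob+lar+imiz+dan "from our books".
SLOTS = [
    ("CASE", ["ning", "dan", "ni", "ga", "da"]),
    ("POSS", ["ingiz", "imiz", "ngiz", "miz", "ing", "im", "si", "ng", "i", "m"]),
    ("PL", ["lar"]),
]


def analyses(word, slot=0, tags=()):
    """Return every (stem, tags) split allowed by the lexicon and slot order.

    Suffixes are stripped from the right; each slot is used at most once,
    and after stripping from slot i only slots further inward may follow.
    """
    results = []
    if word in STEMS:
        results.append((word, tags))
    for i in range(slot, len(SLOTS)):
        label, suffixes = SLOTS[i]
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                rest = word[: -len(suffix)]
                # The newly stripped suffix precedes earlier ones in surface order.
                results.extend(analyses(rest, i + 1, ((label, suffix),) + tags))
    return results


if __name__ == "__main__":
    for w in ["kitoblarimizdan", "maktabimizda", "uyga", "dada"]:
        print(w, analyses(w))
```

Checking the remainder against the lexicon is what rejects a spurious split such as "dada" (father) into "da" plus the locative suffix -da. Practical rule-based analyzers rely on far larger lexicons and also encode allomorphy and morphophonological alternations.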
Based on the approaches reviewed, the article raises the question of
constructing hybrid parsers, integrating them with morphological
analysis, and feeding the grammatical categories of words into the
parser. The development of neural constituency parsers and the
effectiveness of their results are also analyzed. For the subsequent
stages of NLP analysis, particularly semantic and sentiment analysis,
the article discusses models ranging from Word2Vec and FastText, which
represent word meanings as vectors, to contextual transformer models
such as BERT, mBERT, and UzBERT. It also considers higher-level semantic
role labeling and the creation of semantic networks such as WordNet and
FrameNet. To evaluate NLP approaches for Uzbek, the article compares
Uzbek with English, Turkish, and Russian, examines how rule-based,
statistical, and neural methods are applied in these languages, and
presents the results of the analysis.
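
FastText is among the models mentioned that represent word meanings as vectors. What distinguishes it from Word2Vec, and makes it attractive for an agglutinative language such as Uzbek, is that a word is represented through its character n-grams, so an inflected form shares units, and therefore parts of its vector, with its stem. The sketch below only extracts fastText-style n-grams (boundary markers < and >, n from 3 to 6, fastText's defaults for unsupervised training); it does not train any vectors, and the function names char_ngrams and shared_ngrams are hypothetical.

```python
# fastText-style character n-grams (illustrative sketch, no training).
# Assumption: minn=3 and maxn=6; words are wrapped in "<" and ">".


def char_ngrams(word, minn=3, maxn=6):
    """Return the set of character n-grams of <word> plus the bracketed word."""
    token = f"<{word}>"
    grams = {token}  # the whole word is kept as its own unit
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i : i + n])
    return grams


def shared_ngrams(a, b):
    """Return the n-grams two words have in common."""
    return char_ngrams(a) & char_ngrams(b)


if __name__ == "__main__":
    # An inflected form shares many units with its stem ...
    print(sorted(shared_ngrams("kitob", "kitoblarimizdan")))
    # ... while an unrelated noun shares none of them.
    print(sorted(shared_ngrams("kitob", "maktab")))
```

In fastText itself, each n-gram is hashed into a bucket that has a learned vector, and a word's vector is built from the vectors of its n-grams. This is why even word forms unseen in training, which are frequent in a morphologically rich language, still receive a representation.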