Paper Title

A machine transliteration tool between Uzbek alphabets

Authors

Ulugbek Salaev, Elmurod Kuriyozov, Carlos Gómez-Rodríguez

Abstract

Machine transliteration, as defined in this paper, is the process of automatically transforming the written script of words from a source alphabet into words of another target alphabet within the same language, while preserving their meaning as well as their pronunciation. The main goal of this paper is to present a machine transliteration tool between three common scripts used in the low-resource Uzbek language: the old Cyrillic, the currently official Latin, and the newly announced New Latin alphabets. The tool has been created using a combination of rule-based and fine-tuning approaches. The created tool is available as an open-source Python package, as well as a web-based application including a public API. To our knowledge, this is the first machine transliteration tool that supports the newly announced Latin alphabet of the Uzbek language.
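
To make the rule-based part of the abstract concrete, the following is a minimal sketch of character-level transliteration from Uzbek Cyrillic to the official Latin script. It is not the authors' implementation or their package's API: the function name `transliterate`, the table `CYR_TO_LAT`, and the small subset of letter mappings are assumptions chosen for illustration. A real rule set would cover the full alphabet and add context-dependent rules, which this sketch omits.

```python
# Hypothetical, partial mapping from Uzbek Cyrillic to the official Latin
# script. This subset is an assumption for illustration, not the paper's rules.
CYR_TO_LAT = {
    "а": "a", "б": "b", "е": "e", "з": "z", "и": "i", "к": "k",
    "н": "n", "о": "o", "р": "r", "с": "s", "т": "t",
    "ш": "sh", "ч": "ch",
    "ў": "o\u02bb",  # o followed by the modifier letter turned comma (U+02BB)
    "ғ": "g\u02bb",
    "қ": "q", "ҳ": "h",
}


def transliterate(text: str) -> str:
    """Map each Cyrillic letter through the table, preserving capitalization."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            # Characters outside the table (spaces, digits, unmapped
            # letters) are passed through unchanged.
            out.append(ch)
        elif ch.isupper():
            # Capitalize only the first Latin letter, e.g. "Ш" becomes "Sh".
            out.append(lat[0].upper() + lat[1:])
        else:
            out.append(lat)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("Ўзбекистон"))
    print(transliterate("шаҳар китоб"))
```

A plain lookup table handles one-to-one and one-to-many letter mappings, but it cannot express rules that depend on a letter's position or neighbours, which is one reason a purely table-driven approach is usually combined with additional rules or corrections.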
