Research & Development

Uzbek Language
Speech AI Technologies

Advancing the state of Uzbek language AI through open-source research, specialized datasets, and state-of-the-art model architectures.

Islomov Sardor

AI Researcher & Engineer

Mission Statement

I'm an AI researcher bridging the gap between software engineering and machine learning. My focus is on democratizing speech technologies for the Uzbek language ecosystem.

Currently, I am developing open-source STT (Speech-to-Text), TTS (Text-to-Speech), and NLP models, including PII detection for privacy compliance and text correction for product-quality Uzbek text. By publishing models, datasets, and training methodologies, I aim to make high-quality AI accessible to everyone.

Model Registry

rubaiSTT-2v Medium

v2.0 Stable

A fine-tuned Whisper Medium model optimized for Uzbek language nuances. Trained on a diverse 500+ hour dataset including podcasts, news, and dialect-rich audio.

⚡ Architecture: Transformer (Whisper)

📊 Dataset: Mixed Source (Human + Pseudo)

🎯 Focus: Generalization & Dialects

Rubai PII Detection v1.3

v1.3 Updated

BERT-based NER model for detecting Personally Identifiable Information (PII) in Uzbek text. Now with bank card detection! 96.1% F1 score on 475K+ samples.

⚡ Architecture: BERT Token Classification

📊 Dataset: 475K+ Samples (100 Domains)

🎯 Focus: Names, Phones, Dates, Addresses, IDs, Cards
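
A token-classification model labels each token, and those labels then have to be merged into entity spans. The sketch below illustrates that merging step in plain Python. It assumes BIO-style tags, which this page does not confirm, and the label names (NAME, PHONE), the function name, and the sample sentence are hypothetical.

```python
from typing import List, Tuple


def merge_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans.

    Assumption: tags look like "B-NAME", "I-NAME", or "O". The real
    model's label set may differ.
    """
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the span collected so far, if there is one.
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag always starts a new span.
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An "I-" tag with the same label continues the open span.
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not match, closes the open span.
            flush()
            current_label = None
            current_tokens = []
    flush()
    return entities


# Hypothetical example sentence and tags.
tokens = ["Mening", "ismim", "Ali", "Valiyev", ",", "telefon",
          "+998", "90", "123", "45", "67"]
tags = ["O", "O", "B-NAME", "I-NAME", "O", "O",
        "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]

print(merge_bio_tags(tokens, tags))
# [('NAME', 'Ali Valiyev'), ('PHONE', '+998 90 123 45 67')]
```

Once spans are grouped this way, a downstream step can mask or redact each span by its label.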

Rubai Corrector

3 Models Live

A ByT5-based Uzbek text correction family for transcript display normalization, OCR repair for old books, and community fine-tuning from a shared base checkpoint.

⚡ Architecture: ByT5 Seq2Seq

📊 Scope: Base + Transcript + OCR Books

🎯 Focus: Apostrophes, formatting, mixed scripts, OCR damage
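
ByT5 reads raw UTF-8 bytes rather than subword tokens, which is one reason it can suit apostrophe and mixed-script repair: look-alike characters reach the model as distinct byte sequences. The sketch below mimics the ByT5 id scheme (byte value plus 3, with ids 0 to 2 reserved for special tokens) in plain Python, without loading any model. The helper name and sample words are hypothetical, and the end-of-sequence token the real tokenizer appends is left out.

```python
from typing import List


def byt5_style_ids(text: str) -> List[int]:
    """Map text to ByT5-style input ids: one id per UTF-8 byte, offset by 3.

    Ids 0, 1, and 2 are reserved for pad, end-of-sequence, and unknown,
    so byte value b becomes id b + 3. No end-of-sequence id is appended here.
    """
    return [b + 3 for b in text.encode("utf-8")]


# Three spellings of the same word that look alike on screen:
variants = {
    "ASCII apostrophe (U+0027)": "o'zbek",
    "modifier letter turned comma (U+02BB)": "o\u02bbzbek",
    "right single quotation mark (U+2019)": "o\u2019zbek",
}

for name, word in variants.items():
    ids = byt5_style_ids(word)
    print(f"{name}: {len(ids)} ids -> {ids}")
```

Each variant produces a different id sequence of a different length (6, 7, and 8 ids), so a byte-level corrector can learn to map all of them to one target form.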

GapTTS-1v

In Development

Upcoming high-fidelity Text-to-Speech synthesis engine. Designed to generate natural-sounding Uzbek speech with proper intonation and prosody.

🔄 Status: Data Collection Complete

📅 Release: Q4 2025

🔓 License: Open Source

Support Open Science