Research & Development

Uzbek Language
Speech AI Technologies

Advancing Uzbek language AI through open-source research, specialized datasets, and state-of-the-art model architectures.

Islomov Sardor
AI Researcher & Engineer

Mission Statement

I'm an AI researcher bridging the gap between software engineering and machine learning. My focus is on democratizing speech technologies for the Uzbek language ecosystem.

Currently, I am developing open-source STT (Speech-to-Text), TTS (Text-to-Speech), and NLP models, including PII detection for privacy compliance. By publishing models, datasets, and training methodologies, I aim to make high-quality AI accessible to everyone.

Model Registry

rubaiSTT-2v Medium

v2.0 Stable

A fine-tuned Whisper Medium model optimized for Uzbek language nuances. Trained on a diverse 500+ hour dataset including podcasts, news, and dialect-rich audio.

Architecture: Transformer (Whisper)
📊 Dataset: Mixed Source (Human-Labeled + Pseudo-Labeled)
🎯 Focus: Generalization & Dialects
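
Whisper models take 16 kHz audio and process it in 30-second windows, so long recordings such as podcasts are usually split before transcription. The sketch below shows one way to compute those window boundaries. It is an illustration only, not the pipeline used for rubaiSTT; the function name `window_bounds` and the `overlap_seconds` parameter are assumptions introduced here.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz audio
WINDOW_SECONDS = 30    # Whisper processes 30-second windows


def window_bounds(num_samples: int, overlap_seconds: float = 0.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices covering the whole recording.

    `overlap_seconds` is a hypothetical knob: consecutive windows share
    that much audio, which can help avoid cutting words at a boundary.
    """
    window = SAMPLE_RATE * WINDOW_SECONDS
    step = window - int(overlap_seconds * SAMPLE_RATE)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


if __name__ == "__main__":
    # A 70-second recording split into non-overlapping windows.
    for start, end in window_bounds(70 * SAMPLE_RATE):
        print(start / SAMPLE_RATE, end / SAMPLE_RATE)
```

Each pair can then be used to slice the waveform array before passing the chunk to the model.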

Rubai PII Detection v1.3

v1.3 Updated

BERT-based NER model for detecting Personally Identifiable Information (PII) in Uzbek text. Now with bank card detection! 96.1% F1 score on 475K+ samples.

Architecture: BERT Token Classification
📊 Dataset: 475K+ Samples (100 Domains)
🎯 Focus: Names, Phones, Dates, Addresses, IDs, Cards
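
A token classification model emits one label per token, so its output has to be grouped into entity spans before text can be redacted. The sketch below assumes a BIO tagging scheme with labels such as `B-NAME` and `I-PHONE`. The label names, the helper functions `bio_to_spans` and `mask_pii`, and the sample sentence are all hypothetical and may not match the released model's label set.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            current_type, current_tokens = None, []
        elif tag.startswith("B-") or tag[2:] != current_type:
            # A B- tag, or an I- tag of a different type, starts a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:
            current_tokens.append(token)  # I- tag continuing the same entity
    flush()
    return spans


def mask_pii(tokens: List[str], tags: List[str]) -> str:
    """Replace each entity with a single [TYPE] placeholder."""
    out: List[str] = []
    prev_type = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            out.append(token)
            prev_type = None
            continue
        entity_type = tag[2:]
        if tag.startswith("B-") or entity_type != prev_type:
            out.append(f"[{entity_type}]")
        prev_type = entity_type
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical model output for an illustrative sentence.
    tokens = ["Sardor", "Islomov", "telefon", "raqami", "+998", "90", "123", "45", "67"]
    tags = ["B-NAME", "I-NAME", "O", "O",
            "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]
    print(bio_to_spans(tokens, tags))
    print(mask_pii(tokens, tags))
```

In practice the tokens and tags would come from the model's tokenizer and predictions, and subword pieces would need to be merged back into words first.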

GapTTS-1v

In Development

Upcoming high-fidelity Text-to-Speech synthesis engine. Designed to generate natural-sounding Uzbek speech with proper intonation and prosody.

🔄 Status: Data Collection Complete
📅 Release: Q4 2025
🔓 License: Open Source