Uzbek language AI speech technologies

Islomov Sardor
AI Enthusiast & Software Engineer | Click JSC

AI Enthusiast

Hey there! 👋 I'm an AI researcher with a background spanning software engineering, information security, and machine learning. What started as a hobby has grown into a passion project now that I've found time to return to the machine learning industry.

I'm currently working on Uzbek language speech technologies, specifically speech-to-text (STT) and text-to-speech (TTS) models. For now, I've decided to take the open-source route, publishing my work here and on my Hugging Face account.

My goal? To contribute to Uzbekistan's emerging AI landscape. Because sometimes the most meaningful innovations start with a passion project!

Models

I'm developing a suite of speech AI models tailored specifically for the Uzbek language. Here are the current and upcoming models:

NavaiSTT-1v Medium

Available Now

A medium-sized speech recognition model based on Whisper Medium, fine-tuned specifically for the Uzbek language. It was trained on a diverse dataset that includes publicly available podcasts, audiobooks, and the Uzbek portion of the Common Voice dataset. The model performs best on the Tashkent dialect, particularly on podcast recordings in that dialect. As part of my commitment to open science, this model is fully open source.


NavaiSTT-1v Large

Coming Soon

A large speech recognition model based on Whisper Large-v3, fine-tuned specifically for the Uzbek language. It was trained on an extensive dataset comprising publicly available news broadcasts, podcasts, audiobooks, and the Uzbek portion of the Common Voice dataset. The model excels at everyday speech, is robust to noisy audio, and is particularly effective on the Tashkent dialect, especially in podcast recordings. It will be released as fully open source.

GapTTS-1v

Later This Year

GapTTS-1v is my upcoming text-to-speech project for the Uzbek language. I have a clear vision and have already gathered the training data, but development will begin only after I complete NavaiSTT-1v Large. I plan to make GapTTS-1v open source upon completion, bringing natural Uzbek speech synthesis to the AI community.