Data Science by ODS.ai 🦜

Forwarded from Spark in me (Alexander)

Russian Text Normalization for Speech Recognition

Usually no one talks about this, but STT / TTS technologies contain many "small" tasks that have to be solved, to make your STT / TTS pipeline work in real life.

For example:

- Speech recognition / dataset itself;
- Post-processing - beam-search / decoding;
- Domain customizations;
- Normalization (5 => пять);
- De-Normalization (пять => 5);

We want the Imagenet moment to arrive sooner in Speech in general.
So we released the Open STT dataset.
This time we have decided to share our text normalization to support STT research in Russian.

Please like / share / repost:

- Original publication
- Habr.com article
- GitHub repository
- Medium (coming soon!)
- Support dataset on Open Collective

#stt
#deep_learning
#nlp

GitHub

GitHub - snakers4/open_stt: Open STT

Open STT. Contribute to snakers4/open_stt development by creating an account on GitHub.

730 views13:19

About

Blog

Apps

Platform