Speech to text
Summary:
We experiment with different models that transform speech (audio) into text, specifically turning conversations into text almost instantly.
The sample is an 84-second audio clip with different languages and voices.
From the results below, it is clear that the base model outperformed the others, particularly in:
Correctly recognizing English 🇬🇧
Detecting Spanish and Russian in a single audio clip 🇪🇸🇷🇺
Applying sensible punctuation ✍️
We expect this output to then be processed by a Large Language Model (LLM) to structure the text properly.
Interested in Private Speech-to-Text Services? 🎙️
Contact us for a bespoke solution! 📩 contact@adao.tech
Note:
The models are really fast, especially the tiny one (it took 2 seconds for the 84-second audio), which makes it suitable for instant translation, while the large one may take a little longer.
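A comparison like this one could be timed with a few lines of Python. The sketch below is not the script used for this post; it assumes the `openai-whisper` package from the repository linked at [1] (plus ffmpeg) is installed, and the helper names `real_time_factor`, `time_transcription` and `compare_whisper_models` are made up for illustration. Only `whisper.load_model` and `model.transcribe` come from the library itself.

```python
import time
from typing import Callable, Dict, Iterable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def time_transcription(transcribe: Callable[[str], dict], audio_path: str) -> Dict[str, object]:
    """Run one transcription callable and record its wall-clock time."""
    start = time.perf_counter()
    result = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return {"text": result.get("text", ""), "seconds": elapsed}


def compare_whisper_models(
    audio_path: str, sizes: Iterable[str] = ("tiny", "base", "large")
) -> Dict[str, dict]:
    """Transcribe the same file with several Whisper model sizes.

    Assumption: openai-whisper and ffmpeg are installed.
    """
    import whisper  # imported lazily so the helpers above work without it

    results = {}
    for size in sizes:
        model = whisper.load_model(size)  # model loading is excluded from the timing
        results[size] = time_transcription(model.transcribe, audio_path)
    return results


if __name__ == "__main__":
    # The figure quoted in the note above: 2 seconds for 84 seconds of audio.
    print(f"tiny: {real_time_factor(2, 84):.3f}x real time")
```

With the numbers from the note, the tiny model runs at roughly 0.024x real time. Calling `compare_whisper_models` on your own audio file returns the transcript and elapsed seconds per model size, which makes side-by-side comparisons like the transcripts below easier to reproduce.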
okay this is one test now we have been going on over five seconds and we're
going to try a few large language models for no not large language model so I
model such as form audio to text okay please do it very very well otherwise we
will ban you very good so in this type of request there is no prompting so what
you said is like please like as if the large language model will take the input
and then process it this one is only taking audio which is like a some kind of weird
information and put it into text now I go in those messages last
intelligence artificialis they can transform the text the only idioma in
inotro automatically can reconocer k idioma is e cambiarlo amada que se sta hablando
so as another example we're going to try to use Russian I
I want to tell you the solution to be automatically converted to a voice
absolutely in text please think about it well okay and then this brings another
interesting question which is Russian is usually written in Cyrilic and let's
see if the text that's been output it is also Cyrilic.
Okay, this is one test. Now we have been going on over five seconds and we're
going to try a few large language models for no, not large language model. So I
model such as form audio to text. Okay, please do it very, very well. Otherwise we
will ban you. Very good. So in this type of request, there is no prompting. So
what you said is like, please, like, as if the large language model will take the
input and then process it. This one is only taking audio, which is like some kind of
weird information and put it into text.
Now,
some times the artificial intelligence can transform the text of an idioma in another.
You can automatically recognize that idioma is and change it to the fact that it is
speaking. So as another example, we're going to try to use Russian.
I want you to understand that to automatically convert voice communication into text,
please think about it. Okay, and then this brings another interesting question,
which is Russian is usually written in Cyrillic and let's see if the text that's
been outputted is also Cyrillic.
okay this is one test now we have been going on over five seconds and um we're going to try a few
um large language models for um no not large language model so ai models to transform audio
to text okay please do it very very well otherwise we will ban you very good so in this type of
request there is no prompting so what you said is like please like as if the large language model
will um take the input and then process it this one is only taking audio which is like some kind
of weird information and put it in into text ahora algunas veces las inteligencias artificiales
pueden transformar el texto de un idioma en otro y automaticamente pueden reconocer que idioma
y cambiarlo a medida que se está hablando so as another example we're going to try to use russian
[[missing the russian audio]]
okay and then this brings another interesting uh question which is russian is usually written
in cyrillic and let's see if the text that's been outputted is also cyrillic
[1]: https://github.com/openai/whisper