Advanced Diagnostics Analytics and Optimization

Speech to text

Summary:

We experiment with different Whisper [1] models (tiny, base and large) that transcribe speech (audio) into text, turning conversations into text almost instantly.

Original Audio File (sample)

20240427_audio_to_txt.mp3

An 84-second sample mixing English, Spanish and Russian speech from different voices.

Speech to text

From the results below, the base model clearly outperformed the others, particularly in:

Correctly recognizing English 🇬🇧
Detecting Spanish and Russian in a single audio clip 🇪🇸🇷🇺
Applying sensible punctuation ✍️
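The differences on the Spanish and Russian passages may relate to how Whisper picks a language. As far as we understand the current openai-whisper implementation [1], `transcribe()` without a `language` argument detects the language once, from the first 30 seconds, and decodes the whole file with it. The minimal sketch below, assuming the `openai-whisper` package and ffmpeg are installed, instead runs Whisper's language detection on each 30-second window. `window_bounds` and `detect_language_per_window` are our own hypothetical helpers, and fixed windows can still straddle a language switch.

```python
"""Per-window language detection with openai-whisper (a sketch, not Whisper's own API)."""

SAMPLE_RATE = 16_000  # whisper.load_audio resamples to 16 kHz mono
WINDOW_S = 30         # Whisper processes audio in 30-second windows


def window_bounds(num_samples: int, sample_rate: int = SAMPLE_RATE,
                  window_s: int = WINDOW_S) -> list[tuple[int, int]]:
    """Split [0, num_samples) into consecutive windows of window_s seconds."""
    step = sample_rate * window_s
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


def detect_language_per_window(path: str, model_name: str = "base") -> list[str]:
    """Return the most likely language code for each 30-second window of the file."""
    import whisper  # imported lazily so window_bounds works without the package

    model = whisper.load_model(model_name)
    audio = whisper.load_audio(path)
    languages = []
    for start, end in window_bounds(len(audio)):
        # Pad the last (shorter) window to 30 s, as Whisper's encoder expects.
        chunk = whisper.pad_or_trim(audio[start:end])
        mel = whisper.log_mel_spectrogram(chunk, n_mels=model.dims.n_mels).to(model.device)
        _, probs = model.detect_language(mel)
        languages.append(max(probs, key=probs.get))
    return languages
```

For the 84-second sample, `detect_language_per_window("20240427_audio_to_txt.mp3")` would examine three windows (30 s, 30 s and 24 s) and return three language codes; any of them can then be passed as `language=` to `model.transcribe()` to force that language when decoding.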

We expect the transcribed text to then be passed through a Large Language Model (LLM) to structure it properly.

Interested in Private Speech-to-Text Services? 🎙️

Note:

The models are very fast, especially the tiny model, which took 2 seconds to process the 84-second audio and is therefore suitable for instant translation; the large model takes somewhat longer.
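To compare speeds across models, one can compute a real-time factor (RTF): processing time divided by audio duration. The tiny model's 2 s for 84 s of audio gives 2 / 84 ≈ 0.024, i.e. roughly 42 times faster than real time. The sketch below assumes openai-whisper [1] is installed; `real_time_factor` and `benchmark` are hypothetical helper names, and model loading is deliberately kept out of the timed section.

```python
import time


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def benchmark(path: str, model_names=("tiny", "base", "large")) -> dict[str, float]:
    """Time model.transcribe() for each model size and return its real-time factor."""
    import whisper  # lazy import: assumes `pip install openai-whisper` and ffmpeg on PATH

    audio_s = len(whisper.load_audio(path)) / whisper.audio.SAMPLE_RATE
    results = {}
    for name in model_names:
        model = whisper.load_model(name)  # loading is excluded from the timing
        start = time.perf_counter()
        model.transcribe(path)
        results[name] = real_time_factor(time.perf_counter() - start, audio_s)
    return results
```

Calling `benchmark("20240427_audio_to_txt.mp3")` returns a dictionary mapping each model name to its RTF. Absolute timings depend heavily on the hardware (CPU versus GPU), which the measurements above do not specify.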

Tiny model

okay this is one test now we have been going on over five seconds and we're

going to try a few large language models for no not large language model so I

model such as form audio to text okay please do it very very well otherwise we

will ban you very good so in this type of request there is no prompting so what

you said is like please like as if the large language model will take the input

and then process it this one is only taking audio which is like a some kind of weird

information and put it into text now I go in those messages last

intelligence artificialis they can transform the text the only idioma in

inotro automatically can reconocer k idioma is e cambiarlo amada que se sta hablando

so as another example we're going to try to use Russian I

I want to tell you the solution to be automatically converted to a voice

absolutely in text please think about it well okay and then this brings another

interesting question which is Russian is usually written in Cyrilic and let's

see if the text that's been output it is also Cyrilic.

Base model

Okay, this is one test. Now we have been going on over five seconds and we're

going to try a few large language models for no, not large language model. So I

model such as form audio to text. Okay, please do it very, very well. Otherwise we

will ban you. Very good. So in this type of request, there is no prompting. So

what you said is like, please, like, as if the large language model will take the

input and then process it. This one is only taking audio, which is like some kind of

weird information and put it into text.

Now,

some times the artificial intelligence can transform the text of an idioma in another.

You can automatically recognize that idioma is and change it to the fact that it is

speaking. So as another example, we're going to try to use Russian.

I want you to understand that to automatically convert voice communication into text,

please think about it. Okay, and then this brings another interesting question,

which is Russian is usually written in Cyrillic and let's see if the text that's

been outputted is also Cyrillic.

Large model

okay this is one test now we have been going on over five seconds and um we're going to try a few

um large language models for um no not large language model so ai models to transform audio

to text okay please do it very very well otherwise we will ban you very good so in this type of

request there is no prompting so what you said is like please like as if the large language model

will um take the input and then process it this one is only taking audio which is like some kind

of weird information and put it in into text ahora algunas veces las inteligencias artificiales

pueden transformar el texto de un idioma en otro y automaticamente pueden reconocer que idioma

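y cambiarlo a medida que se está hablando [[English: now sometimes artificial intelligences can transform text from one language into another and can automatically recognize which language it is and switch as it is being spoken]] so as another example we're going to try to use russian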

[[missing the russian audio]]

okay and then this brings another interesting uh question which is russian is usually written

in cyrillic and let's see if the text that's been outputted is also cyrillic
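Each transcript ends by asking whether the Russian passage comes out in Cyrillic. A quick heuristic of our own (not part of Whisper) is the share of letters whose Unicode name starts with "CYRILLIC". Applied to the outputs above, it would return 0.0 for all three: the tiny and base models render the Russian passage in English, and it is missing from the large-model output.

```python
import unicodedata


def cyrillic_ratio(text: str) -> float:
    """Fraction of alphabetic characters in `text` that are Cyrillic letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # unicodedata.name() gives e.g. "CYRILLIC SMALL LETTER A" or "LATIN SMALL LETTER A".
    cyrillic = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("CYRILLIC"))
    return cyrillic / len(letters)
```

For example, `cyrillic_ratio("Привет, мир")` is 1.0, while any of the three transcripts above gives 0.0.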

References:

[1]: https://github.com/openai/whisper
