Build simple apps that can convert text-to-speech and speech-to-text, in C#

Back in 2017, as a developer, I always thought it would be nice to write machine learning code in C# on the .NET Framework, to show my manager that I knew enough to become a team lead. But the past is the past, and I left that productive company. Most managers in the world are the same: full of dull insights, trying to bring people down and demotivate them from their goals, since they never reached their own. Anyway, the other day I was searching for memes on the internet, and all of a sudden one website served me two HD videos of 45 minutes each. That made my day, because now, as a C# developer, I can show machine learning concepts written in C#.

So, without further ado, let's start:

All thanks to the machine learning APIs created by Google, I can now use a highly accurate machine learning model directly in my simple .NET app, just for fun.

Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants, by applying powerful neural network models in an easy-to-use API.

I am using Cloud Shell, which is nothing but a Linux VM with pre-installed software. If you would rather install software on your own computer and turn it into a garbage bin, download the GCP SDK; instructions are given on its page.

Now let's open the shell/console: choose an account, select the project, and start Cloud Shell.

First, to use the API you have to activate it, so type the following command in the console to enable Speech-to-Text:

gcloud services enable speech.googleapis.com

Authenticate API requests

In order to make requests to the Speech-to-Text API, you need to use a Service Account. A Service Account belongs to your project, and it is used by the Google Client C# library to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account, and then create the credentials you will need to authenticate as the service account.

First, set an environment variable with your PROJECT_ID which you will use throughout this lab:

export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value core/project)

Next, create a new service account to access the Speech-to-Text API:

gcloud iam service-accounts create my-speech-to-text-sa \
  --display-name "my speech-to-text lab service account"

Then create credentials that your C# code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-speech-to-text-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable. This is used by the Speech-to-Text API C# library, covered in the next step, to find your credentials. The environment variable should be set to the full path of the credentials JSON file you created:

export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"

You can read more about authenticating the Speech-to-Text API.

Install the Google Cloud Speech-to-Text API client library for C#

First, create a simple C# console application that you will use to run Speech-to-Text API samples.

dotnet new console -n SpeechToTextApiDemo

You should see the application created and dependencies resolved:

The template "Console Application" was created successfully.
Processing post-creation actions...
...
Restore succeeded.

Next, navigate to the SpeechToTextApiDemo folder:

cd SpeechToTextApiDemo/

And add the Google.Cloud.Speech.V1 NuGet package to the project:

dotnet add package Google.Cloud.Speech.V1

(Output)

info : Adding PackageReference for package 'Google.Cloud.Speech.V1' into project '/home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj'.
log  : Restoring packages for /home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj...
...
info : PackageReference for package 'Google.Cloud.Speech.V1' version '1.0.1' added to file '/home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj'.

Now you’re ready to use the Speech-to-Text API!

Transcribe audio files

In this section you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.

Note: We are using a pre-recorded file that’s available on Google Cloud Storage: gs://cloud-samples-tests/speech/brooklyn.flac. You can listen to this file before sending it to the Speech-to-Text API here.

Open the code editor from the top right side of the Cloud Shell:


If you cannot see the icon, close the Navigation menu in the upper left corner.

Navigate to the Program.cs file inside the SpeechToTextApiDemo folder and replace the code with the following:

using Google.Cloud.Speech.V1;
using System;

namespace SpeechToTextApiDemo
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
                SampleRateHertz = 16000,
                LanguageCode = LanguageCodes.English.UnitedStates
            };
            var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-tests/speech/brooklyn.flac");

            var response = speech.Recognize(config, audio);

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }
}

Take a minute or two to study the code and see how it is used to transcribe an audio file.

The Encoding parameter tells the API which type of audio encoding you’re using for the audio file. Flac is the encoding type for .flac files (see the doc for encoding types for more details).

In the RecognitionAudio object, you can pass the API either the URI of your audio file in Cloud Storage or the local file path of the audio file. Here, we’re using a Cloud Storage URI.
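If you had the audio on disk instead, the client library can also build the request from a local file. A minimal sketch (the local path brooklyn.flac is a hypothetical download of the same sample):

```csharp
// Hypothetical local-file variant: RecognitionAudio.FromFile reads the
// file from disk and embeds its bytes in the request, instead of
// pointing the API at a Cloud Storage URI.
var localAudio = RecognitionAudio.FromFile("brooklyn.flac");

// The recognition call itself is unchanged:
// var response = speech.Recognize(config, localAudio);
```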

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

how old is the Brooklyn Bridge

Note: If this C# code does not work for you, verify the steps you performed in the Authenticate API requests section.

Use the following command to verify the value of the GOOGLE_APPLICATION_CREDENTIALS environment variable:

echo $GOOGLE_APPLICATION_CREDENTIALS

It should output the full path of your key file, e.g. /home/YOUR_USER/key.json.

If it does, next check that the service account key file was created at that path:

cat ~/key.json

You should see something similar to:

{
  "type": "service_account",
  "project_id": "PROJECT_ID",
  "private_key_id": "ff31939192529e07f42e4535fb20bb029def1276",
  "private_key": ...

If you don’t, revisit the Authenticate API requests step.

In this step, you were able to transcribe an audio file in English and print out the result. Read more about Transcribing.

Transcribe with word timestamps

Speech-to-Text can detect time offset (timestamp) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.

To transcribe an audio file with time offsets, navigate to the Program.cs file inside the SpeechToTextApiDemo folder and update the code with the following:

using Google.Cloud.Speech.V1;
using System;

namespace SpeechToTextApiDemo
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
                SampleRateHertz = 16000,
                LanguageCode = LanguageCodes.English.UnitedStates,
                EnableWordTimeOffsets = true
            };
            var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-tests/speech/brooklyn.flac");

            var response = speech.Recognize(config, audio);

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine($"Transcript: {alternative.Transcript}");
                    Console.WriteLine("Word details:");
                    Console.WriteLine($" Word count:{alternative.Words.Count}");
                    foreach (var item in alternative.Words)
                    {
                        Console.WriteLine($"  {item.Word}");
                        Console.WriteLine($"    WordStartTime: {item.StartTime}");
                        Console.WriteLine($"    WordEndTime: {item.EndTime}");
                    }
                }
            }
        }
    }
}

Take a minute or two to study the code and see how it is used to transcribe an audio file with word timestamps. The EnableWordTimeOffsets parameter tells the API to enable time offsets (see the doc for more details).
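The StartTime and EndTime values are protobuf Duration objects. As a small formatting convenience (an optional sketch, not part of the lab), you can convert them to .NET TimeSpan values inside the inner loop:

```csharp
// Duration -> TimeSpan conversion via Google.Protobuf's ToTimeSpan(),
// which makes the offsets easier to format as plain seconds.
foreach (var item in alternative.Words)
{
    var start = item.StartTime.ToTimeSpan();
    var end = item.EndTime.ToTimeSpan();
    Console.WriteLine($"  {item.Word}: {start.TotalSeconds:0.0}s - {end.TotalSeconds:0.0}s");
}
```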

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

Transcript: how old is the Brooklyn Bridge
Word details:
 Word count:6
  how
    WordStartTime: "0s"
    WordEndTime: "0.300s"
  old
    WordStartTime: "0.300s"
    WordEndTime: "0.600s"
  is
    WordStartTime: "0.600s"
    WordEndTime: "0.800s"
  the
    WordStartTime: "0.800s"
    WordEndTime: "0.900s"
  Brooklyn
    WordStartTime: "0.900s"
    WordEndTime: "1.100s"
  Bridge
    WordStartTime: "1.100s"
    WordEndTime: "1.500s"

In this step, you were able to transcribe an audio file in English with word timestamps and print out the result. Read more about Transcribing with word offsets.

Transcribe different languages

The Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here.
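Since RecognitionConfig.LanguageCode is a plain string, you are not limited to the LanguageCodes helper constants; any supported BCP-47 tag works. A quick sketch (Turkish is just an illustrative choice here):

```csharp
// LanguageCode accepts any supported BCP-47 language tag as a string.
var config = new RecognitionConfig
{
    Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
    LanguageCode = "tr-TR"  // Turkish (Turkey)
};
```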

In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.

Note: We are using a pre-recorded file that’s available on Google Cloud Storage: gs://speech-language-samples/fr-sample.flac. You can listen to this file before sending it to the Speech-to-Text API here.

To transcribe the French audio file, navigate to the Program.cs file inside the SpeechToTextApiDemo folder and replace the code with the following:

using Google.Cloud.Speech.V1;
using System;

namespace SpeechToTextApiDemo
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
                LanguageCode = LanguageCodes.French.France
            };
            var audio = RecognitionAudio.FromStorageUri("gs://speech-language-samples/fr-sample.flac");

            var response = speech.Recognize(config, audio);

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }
}

Take a minute or two to study the code and see how it is used to transcribe an audio file. The LanguageCode parameter tells the API what language the audio recording is in.

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

maître corbeau sur un arbre perché tenait en son bec un fromage

This is the opening line of La Fontaine’s fable Le Corbeau et le Renard (The Crow and the Fox), a popular French children’s tale.
