What is the Vision API and what can it do?
The Vision API uses machine learning and other Google services to extract information from images.
The kinds of predictions it can currently make include (but are not limited to) the following:
- Label Detection, which is used to detect the presence of certain broad classes of objects within images
- Text Detection, which can be used to extract text from images, a process also referred to as optical character recognition (OCR)
- Safe Search Detection, which can be used to check if an image is safe to serve
- Landmark Detection, which can be used to detect common geographic landmarks in an image
- Logo Detection, which detects the presence and location of common corporate logos
- Web Detection, which detects topical entities such as news, events, or celebrities within the image, and finds similar images on the web using the power of Google Image Search
- Object Localization, which detects the presence of multiple objects and their locations within the image
Making predictions with the Vision API
It’s easy for developers to use the Vision API to make predictions within their applications. This lab does not cover how to make predictions from code, but there are many self-paced labs available on Qwiklabs.com that do cover this material.
In this lab, we will make some one-off predictions without writing any code to see what the API is capable of.
- In a separate window, navigate to https://cloud.google.com/vision/ and scroll down to the box labeled Try the API.
- To use this feature, you’ll need to upload an image from your computer. First, find a picture of some cirrus clouds and download it to your local machine. Once you’ve downloaded it, click inside the box labeled “Drag image file here or Browse from your computer”. In the modal window that pops up, select the image file you’ve just downloaded and click Open.
- The web page will display the results of a number of different API calls, including label detection, web detection and safe search detection. The first tab in the table holds the results of label detection. Note the top classes. Is “cirrus cloud” among them? It shouldn’t be. Depending on the image you chose, “cloud” will be, along with a number of other high-level nouns. This is because even though the Vision API knows about a large number of classes, it doesn’t distinguish between every sort of object in the world.
In real-world implementations, calls to the API would be made programmatically. But you should now understand what each one of those calls can accomplish and that the Vision API is powerful but limited to the set of high-level classes that it was trained with.
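For illustration, a programmatic call boils down to POSTing a JSON body to the Vision API's `images:annotate` endpoint. The sketch below is a hedged example (not part of the lab steps) that builds such a request body using only the Python standard library; the image bytes are placeholders:

```python
import base64

def build_annotate_request(image_bytes, features=("LABEL_DETECTION",), max_results=10):
    """Build the JSON body for a POST to
    https://vision.googleapis.com/v1/images:annotate
    (an API key or OAuth token is required to actually send it)."""
    return {
        "requests": [{
            # Images are sent inline as base64; a gs:// URI can be used instead.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": f, "maxResults": max_results} for f in features],
        }]
    }

# Placeholder bytes stand in for a real image read with open(path, "rb").read().
body = build_annotate_request(b"<image bytes>",
                              features=("LABEL_DETECTION", "SAFE_SEARCH_DETECTION"))
print(body["requests"][0]["features"])
```

Each feature type in the request corresponds to one of the detection methods listed above, so a single call can ask for several kinds of annotation at once.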
Using AutoML Vision
Setup AutoML Vision
AutoML Vision enables you to train custom machine learning models capable of making predictions to classify your images according to your own defined labels. In this section, you upload images of clouds to Cloud Storage and use them to train a custom model to recognize different types of clouds (cumulus, cumulonimbus, etc.).
- In your GCP Console, open the Navigation menu and click on Artificial Intelligence > Vision.
- Navigate to Dashboard, and click on Get started for AutoML Vision > Image Classification.
- Select the GCP account created by qwiklabs and Allow AutoML access:
- Choose the correct GCP project created by qwiklabs, if required, and click Continue.
- Now set up the necessary APIs and service accounts by clicking on SET UP NOW.
- Navigate to the AutoML Vision UI and click + New Dataset.
- Type clouds for the Dataset name and leave Single-Label Classification option checked. Click Create Dataset.
In your own projects, you may want to check the box for Multi-Label Classification if you want to assign multiple labels per image.
- In the IMPORT column, choose Select a CSV file on Cloud Storage and browse to the data.csv file in your bucket. It should then read something like:
- Click Continue.
It will take around 2 minutes for your images to finish importing. Once the import has completed, you’ll be brought to a page with all the images in your dataset.
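For reference, each row of an import CSV like data.csv pairs a Cloud Storage image URI with its label; the bucket and file names below are placeholders, not the lab's actual paths:

```
gs://my-automl-bucket/clouds/cirrus/img_001.jpg,cirrus
gs://my-automl-bucket/clouds/cumulus/img_014.jpg,cumulus
gs://my-automl-bucket/clouds/cumulonimbus/img_022.jpg,cumulonimbus
```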
- After the import completes, navigate to the Images tab.
- Try filtering by different labels (e.g. click cumulus) to review the training images:
- Note: If you were building a production model, you’d want at least 100 images per label to ensure high accuracy. Because this is just a demo, only 20 images per label are used so that the model trains quickly.
- If any image is labeled incorrectly, you can click on it to switch the label or delete the image from your training set:
- To see a summary of how many images you have for each label, click on Label stats. You should see the following show up on the left side of your browser.
Note: If you are working with a dataset that isn’t already labeled, AutoML Vision provides an in-house human labeling service.
Train your model
You’re ready to start training your model! AutoML Vision handles this for you automatically, without requiring you to write any of the model code.
- To train your clouds model, go to the Train tab and click Start Training.
- Enter a name for your model, or use the default auto-generated name, and accept the default value for Define your model. Click Start Training.
Since this is a small dataset, it will only take around 5 minutes to complete.
Evaluate your model
- In the Evaluate tab, you’ll see information about precision and recall, two metrics that were introduced in the videos. The model ultimately outputs a score for each label, which can be interpreted as its confidence that the input belongs to that label, and you can control the threshold that turns those scores into predictions. By changing the Score threshold, you change the minimum level of confidence the model needs in order to decide upon a class. Moving the Score threshold down increases recall but decreases precision; moving it up increases precision at the expense of recall. By default, the Score threshold is set to 0.5.
- You can also play around with Score threshold:
- Finally, scroll down to take a look at the Confusion matrix.
All of this provides some common machine learning metrics to evaluate your model’s accuracy and see where you can improve your training data. Since the focus of this lab is not on accuracy, skip ahead to the prediction section, but feel free to browse the accuracy metrics on your own.
- The confusion matrix provides a great summary of the overall pattern of predictions, but sometimes, to understand why the model made certain mistakes, it’s useful to look at individual mistaken predictions. To do so, click on a class, for example the cirrus class.
- Each class will have its own unique pattern of successes and failures. These events will be categorized in one of four ways with respect to the currently selected class:
- True positives are correctly classified instances of the selected class
- True negatives are correctly classified instances that are not the selected class
- False positives are images incorrectly classified as the selected class
- False negatives are images that belong to the selected class but were incorrectly classified as another class
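The four counts above determine precision and recall at a given score threshold. This toy Python sketch (made-up scores, not data from the lab) shows how raising the threshold trades recall for precision for a single class:

```python
def precision_recall(scored, threshold):
    """scored: list of (score_for_cirrus, true_label) pairs.
    Images with score >= threshold are predicted as "cirrus"."""
    tp = sum(1 for s, y in scored if s >= threshold and y == "cirrus")  # true positives
    fp = sum(1 for s, y in scored if s >= threshold and y != "cirrus")  # false positives
    fn = sum(1 for s, y in scored if s < threshold and y == "cirrus")   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Fabricated scores for six images, labeled with their true class.
scored = [(0.9, "cirrus"), (0.8, "cirrus"), (0.7, "cumulus"),
          (0.6, "cirrus"), (0.4, "cirrus"), (0.3, "cumulus")]

for t in (0.5, 0.75):
    p, r = precision_recall(scored, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

With these numbers, raising the threshold from 0.5 to 0.75 lifts precision from 0.75 to 1.0 while recall falls from 0.75 to 0.5, matching the trade-off described above.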
Now it’s time for the most important part: generating predictions on your trained model using data it hasn’t seen before.
- Navigate to the Predict tab in the AutoML UI.
There are a few ways to generate predictions. In this lab, you’ll use the UI to upload images. You’ll see how your model does at classifying these two images (the first is a cirrus cloud, the second is a cumulonimbus).
- Download these images by right-clicking on each of them:
- Return to the UI, select Upload Images, and upload them to the online prediction UI. When the prediction request completes, you should see something like the following:
Pretty cool – the model classified each type of cloud correctly! Does your trained model do better than the 57% CIRRUS prediction above?
Note: In addition to generating predictions in the AutoML UI, you can also use the REST API or the Python client to make prediction requests against your trained model.
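As a hedged sketch of what a programmatic prediction might look like, the snippet below separates the pure result-handling logic (which runs anywhere) from the cloud call, which is shown only in comments because it needs credentials and a deployed model. The project and model IDs are placeholders, not values from this lab:

```python
from typing import List, Tuple

def top_prediction(results: List[Tuple[str, float]]) -> str:
    """Pick the label with the highest confidence score from (label, score) pairs."""
    label, score = max(results, key=lambda r: r[1])
    return f"{label} ({score:.0%})"

# With real credentials and a deployed model, the results would come from
# something like the google-cloud-automl Python client:
#   from google.cloud import automl_v1
#   client = automl_v1.PredictionServiceClient()
#   name = client.model_path("my-project", "us-central1", "ICN1234567890")
#   payload = {"image": {"image_bytes": open("cirrus.jpg", "rb").read()}}
#   response = client.predict(name=name, payload=payload)
#   results = [(p.display_name, p.classification.score) for p in response.payload]

# Fabricated scores standing in for a real response:
results = [("cirrus", 0.57), ("cumulus", 0.31), ("cumulonimbus", 0.12)]
print(top_prediction(results))
```

The same request can be made via the REST API; the UI you used above is a convenience wrapper around the same prediction service.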