Building an IoT Analytics Pipeline on Google Cloud Platform, Step by Step

Let's start with a definition of IoT. The term Internet of Things (IoT) refers to the interconnection of physical devices with the global Internet. These devices are equipped with sensors and networking hardware, and each is globally identifiable via a unique identifier (UID), which lets it transfer data over a network without requiring human-to-human or human-to-computer interaction. Taken together, these capabilities afford rich data about items in the physical world.

Why does IoT matter? IoT gives businesses and people better insight into, and control over, the vast majority of objects and environments that remain beyond the reach of the internet. By doing so, IoT allows businesses and people to be more connected to the world around them and to do more meaningful, higher-level work.

Cloud IoT Core

Cloud IoT Core is a fully managed service to easily and securely connect, manage, and ingest data from globally dispersed devices. It lets you:

  • Secure device connection and management.
  • Make informed decisions at global scale.
  • Securely connect your existing device network.
  • Establish two-way communication with your devices.
  • Zero-touch device provisioning to Cloud IoT Core.

In short, Cloud IoT Core is a fully managed service that allows you to easily and securely connect, manage, and ingest data from millions of globally dispersed devices. The service connects IoT devices that use the standard Message Queuing Telemetry Transport (MQTT) protocol to other Google Cloud Platform data services.
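To make the MQTT bridge concrete, here is a small sketch of how a device identifies itself when connecting: Cloud IoT Core expects a long-form client ID on CONNECT, and devices publish telemetry to a per-device events topic. The project, registry, and device names below are placeholders for illustration.

```python
# Sketch of the identifiers a device presents to the Cloud IoT Core MQTT bridge.
# The names passed in below ("my-project", etc.) are placeholder assumptions.

def mqtt_client_id(project_id: str, region: str,
                   registry_id: str, device_id: str) -> str:
    """Long-form MQTT client ID that Cloud IoT Core expects on CONNECT."""
    return (f"projects/{project_id}/locations/{region}"
            f"/registries/{registry_id}/devices/{device_id}")

def telemetry_topic(device_id: str) -> str:
    """MQTT topic a device publishes telemetry events to."""
    return f"/devices/{device_id}/events"

client_id = mqtt_client_id("my-project", "us-central1",
                           "iotlab-registry", "temp-sensor-istanbul")
print(client_id)
print(telemetry_topic("temp-sensor-istanbul"))
```

The protocol bridge maps messages published on that topic to the Cloud Pub/Sub topic configured for the registry.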

Cloud IoT Core has two main components:

  • A device manager for registering devices with the service, so you can then monitor and configure them.
  • A protocol bridge that supports MQTT, which devices can use to connect to Google Cloud Platform.

What We Will Do:

We will dive into GCP’s IoT services and complete the following tasks:

  • Connect and manage MQTT-based devices using Cloud IoT Core (we will use simulated devices).
  • Ingest a stream of information from Cloud IoT Core using Cloud Pub/Sub.
  • Process the IoT data using Cloud Dataflow.
  • Analyze the IoT data using BigQuery.

All you need to follow along is a browser.

Log in to the Google Cloud Console and choose an account.

  1. In the GCP Console, click Navigation menu > APIs & services.
  2. Scroll down in the list of enabled APIs, and confirm that these APIs are enabled:

  • Cloud IoT API
  • Cloud Pub/Sub API
  • Dataflow API

If one or more of these APIs is not enabled, click the ENABLE APIS AND SERVICES button at the top. Search for each API by name and enable it for your current project.

Create a Cloud Pub/Sub topic

Wait, what are "pub" and "sub"? Let's unpack the terms. Data ingestion is the foundation for analytics and machine learning, whether you are building stream, batch, or unified pipelines. Cloud Pub/Sub provides a simple and reliable staging location for your event data on its journey towards processing, storage, and analysis.

Cloud Pub/Sub is an asynchronous global messaging service. By decoupling senders and receivers, it allows for secure and highly available communication between independently written applications. Cloud Pub/Sub delivers low-latency, durable messaging.

In Cloud Pub/Sub, publisher applications and subscriber applications connect with one another through the use of a shared string called a topic. A publisher application creates and sends messages to a topic. Subscriber applications create a subscription to a topic to receive messages from it.

In an IoT solution built with Cloud IoT Core, device telemetry data is forwarded to a Cloud Pub/Sub topic.
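To make the decoupling concrete, here is a minimal in-memory toy of the publish/subscribe model (not the Cloud Pub/Sub client library): publishers and subscribers never reference each other, only a shared topic name, and every subscription gets its own copy of each message.

```python
# Toy sketch of the Pub/Sub model: a broker fans each published message
# out to every subscription on the topic. The sample message is invented.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscriptions = defaultdict(list)   # topic name -> list of queues

    def subscribe(self, topic):
        """Create a subscription; the caller drains the returned queue."""
        queue = []
        self._subscriptions[topic].append(queue)
        return queue

    def publish(self, topic, message):
        """Deliver the message to every subscription on the topic."""
        for queue in self._subscriptions[topic]:
            queue.append(message)

broker = Broker()
sub_a = broker.subscribe("iotlab")
sub_b = broker.subscribe("iotlab")
broker.publish("iotlab", {"device": "temp-sensor-istanbul", "temperature": 23.4})
print(sub_a[0]["temperature"])  # both subscriptions received the same message
```

In the real service the topic is a globally addressed resource (projects/your_project_id/topics/iotlab) and delivery is durable, but the decoupling works the same way.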

To define a new Cloud Pub/Sub topic:

  1. In the GCP Console, go to Navigation menu > Pub/Sub > Topics.
  2. Click Create a topic. The Create a topic dialog shows you a partial URL path.
  3. Add iotlab as your topic name.
  4. Click Create.

In the list of topics, you will see a new topic whose partial URL ends in iotlab. Click the three-dot icon at the right edge of its row to open the context menu. Choose Permissions.

In the Permissions dialog, click Add members and add the Cloud IoT Core service account as a new member:

From the Role menu, give the new member the Pub/Sub > Pub/Sub Publisher role.

Click Save.

Create a BigQuery dataset

I have discussed BigQuery on my blog many times.

  1. In the GCP Console, go to Navigation menu > BigQuery.
  2. Click Done.
  3. Click on your GCP Project ID in the left-hand menu.
  4. On the right-hand side of the console, underneath the query editor, click CREATE DATASET.
  5. Name the dataset iotlabdataset, leave all the other fields the way they are, and click Create dataset.
  6. You should see your newly created dataset under your project.

Then click on the dataset, click Create table, and create a table named sensordata.


Create a cloud storage bucket

Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.

For this lab Cloud Storage will provide working space for your Cloud Dataflow pipeline.

  1. In the GCP Console, go to Navigation menu > Storage.
  2. Click Create bucket.
  3. For Name, use your GCP project ID and then add "-bucket".
  4. For Default storage class, click Multi-regional if it is not already selected.
  5. For Location, choose the selection closest to you.
  6. Click Create.

Set up a Cloud Dataflow Pipeline

Cloud Dataflow is a serverless way to carry out data analysis. In this lab, you will set up a streaming data pipeline to read sensor data from Pub/Sub, compute the maximum temperature within a time window, and write this out to BigQuery.
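Conceptually, the windowed-max computation the pipeline performs can be sketched like this (the 120-second window length and the sample readings are made up for illustration; the real job is a streaming Dataflow template, not this batch loop):

```python
# Sketch: group timestamped sensor readings into fixed (tumbling) time
# windows and keep the maximum temperature per window and device.
from collections import defaultdict

def max_per_window(readings, window_seconds=120):
    """readings: iterable of (unix_timestamp, device, temperature)."""
    maxima = defaultdict(lambda: float("-inf"))
    for ts, device, temp in readings:
        window_start = ts - (ts % window_seconds)   # align to window boundary
        key = (window_start, device)
        maxima[key] = max(maxima[key], temp)
    return dict(maxima)

# Invented sample data: three readings from one device.
readings = [
    (100, "buenos-aires", 21.0),
    (130, "buenos-aires", 23.5),
    (260, "buenos-aires", 22.1),
]
print(max_per_window(readings))
```

A streaming pipeline does the same grouping continuously as messages arrive, emitting one row per window into BigQuery.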

  1. In the GCP Console, go to Navigation menu > Dataflow.
  2. In the top menu bar, click CREATE JOB FROM TEMPLATE.
  3. In the job-creation dialog, for Job name, enter “iotlabflow”.
  4. For Cloud Dataflow template, choose Cloud PubSub Topic to BigQuery. When you choose this template, the form updates to reveal new fields below.
  5. For Cloud Dataflow Regional Endpoint, choose us-central1.
  6. For Cloud Pub/Sub input topic, enter projects/ followed by your GCP project ID, then add /topics/iotlab. The resulting string will look like this: projects/your_project_id/topics/iotlab
  7. For BigQuery output table, enter your GCP project ID followed by :iotlabdataset.sensordata. The resulting string will look like this: your_project_id:iotlabdataset.sensordata
  8. For Temporary location, enter gs:// followed by your GCS bucket name (your GCP project ID plus "-bucket" if you followed the instructions), then /tmp/. The resulting string will look like this: gs://your_project_id-bucket/tmp/
  9. Click Optional parameters.
  10. For Max workers, enter 2.
  11. For Machine type, enter n1-standard-1.
  12. Click Run job.

A new streaming job is started. You can now see a visual representation of the data pipeline.

Prepare your Compute Engine VM

In the GCP Console, go to Navigation menu > Compute Engine > VM instances and locate the iot-device-simulator VM instance.

Click the SSH drop-down arrow and select Open in browser window.

In your SSH session on the iot-device-simulator VM instance, enter this command to remove the default Google Cloud Platform SDK installation. (In subsequent steps, you will install the latest version, including the beta component.)

sudo apt-get remove google-cloud-sdk -y

Now install the latest version of the Google Cloud Platform SDK and accept all defaults:

curl | bash

End your SSH session on the iot-device-simulator VM instance:


Start another SSH session on the iot-device-simulator VM instance.

Initialize the gcloud SDK

gcloud init

Choose option 2 to proceed. Get the verification code from the browser, paste it in, then choose your project and continue.

Enter this command to make sure that the components of the SDK are up to date:

gcloud components update

Install the beta components of the SDK:

gcloud components install beta

Update the system package list and install the packages the device simulator needs:

sudo apt-get update

sudo apt-get install python-pip openssl git -y

sudo pip install pyjwt paho-mqtt cryptography

Now clone the training-data-analyst repository (the steps below assume it sits in $HOME/training-data-analyst):

git clone https://github.com/GoogleCloudPlatform/training-data-analyst

Create a registry for IoT devices

In your SSH session on the iot-device-simulator VM instance, set an environment variable containing your project ID:

export PROJECT_ID=your-project-id

You must also choose a region for your IoT registry. At this time, these regions are supported:

  • us-central1
  • europe-west1
  • asia-east1

Choose the region that is closest to you. To set an environment variable containing your preferred region, enter this command followed by the region name:

export MY_REGION=

Your completed command will look like this: export MY_REGION=us-central1.

Now create the registry:

gcloud beta iot registries create iotlab-registry \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --event-notification-config=topic=projects/$PROJECT_ID/topics/iotlab

Create a Cryptographic Key pair

In your SSH session on the iot-device-simulator VM instance, enter these commands to create the keypair in the appropriate directory:

cd $HOME/training-data-analyst/quests/iotlab/
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem \
  -nodes -out rsa_cert.pem -subj "/CN=unused"

This openssl command creates an RSA keypair, writing the private key to rsa_private.pem and a self-signed certificate (containing the public key) to rsa_cert.pem.
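The device keeps the private key and uses it to sign a JSON Web Token (JWT) when connecting to Cloud IoT Core, which verifies the signature against the registered public key. As a rough stdlib-only sketch of the token's structure (the real simulator signs with RS256 via the pyjwt package installed earlier; "my-project" is a placeholder):

```python
# Sketch of the JWT a device presents to Cloud IoT Core: a claim set whose
# audience is the project ID, plus issued-at and expiry timestamps,
# base64url-encoded as in the JWT wire format. Signing is omitted here.
import base64
import json
import time

def jwt_claims(project_id, lifetime_seconds=3600):
    """Claims Cloud IoT Core expects: aud = project ID, iat, exp."""
    now = int(time.time())
    return {"aud": project_id, "iat": now, "exp": now + lifetime_seconds}

def b64url(obj):
    """Base64url-encode a JSON object without padding, JWT-style."""
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

header = {"alg": "RS256", "typ": "JWT"}
claims = jwt_claims("my-project")          # placeholder project ID
unsigned = f"{b64url(header)}.{b64url(claims)}"
# The device appends an RS256 signature over `unsigned`, computed with
# rsa_private.pem, to form the complete three-part token.
print(unsigned[:30])
```

Because the token expires, long-running devices periodically mint and sign a fresh one.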

Add simulated devices to the registry

For a device to be able to connect to Cloud IoT Core, it must first be added to the registry.

  1. In your SSH session on the iot-device-simulator VM instance, enter this command to create a device called temp-sensor-buenos-aires:
gcloud beta iot devices create temp-sensor-buenos-aires \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --registry=iotlab-registry \
  --public-key path=rsa_cert.pem,type=rs256
  2. Enter this command to create a device called temp-sensor-istanbul:
gcloud beta iot devices create temp-sensor-istanbul \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --registry=iotlab-registry \
  --public-key path=rsa_cert.pem,type=rs256

Run simulated devices

In your SSH session on the iot-device-simulator VM instance, enter these commands to download the CA root certificates to the appropriate directory:

cd $HOME/training-data-analyst/quests/iotlab/
  1. Enter this command to run the first simulated device:
python \
   --project_id=$PROJECT_ID \
   --cloud_region=$MY_REGION \
   --registry_id=iotlab-registry \
   --device_id=temp-sensor-buenos-aires \
   --private_key_file=rsa_private.pem \
   --message_type=event \
   --algorithm=RS256 > buenos-aires-log.txt 2>&1 &

It will continue to run in the background.

  2. Enter this command to run the second simulated device:
python \
   --project_id=$PROJECT_ID \
   --cloud_region=$MY_REGION \
   --registry_id=iotlab-registry \
   --device_id=temp-sensor-istanbul \
   --private_key_file=rsa_private.pem \
   --message_type=event \
   --algorithm=RS256 > istanbul-log.txt 2>&1 &

Telemetry data will flow from the simulated devices through Cloud IoT Core to your Cloud Pub/Sub topic. In turn, your Dataflow job will read messages from your Pub/Sub topic and write their contents to your BigQuery table.
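To see what that last hop looks like, here is a hedged sketch of the per-message transformation: each Pub/Sub message body carries a JSON payload whose fields map onto the columns of the sensordata table (timestamp, device, temperature). The sample message below is invented.

```python
# Sketch: turn one Pub/Sub message body (JSON bytes) into a BigQuery row
# matching the sensordata schema. Field names mirror the table columns;
# the sample payload is an invented example.
import json

def message_to_row(message_bytes):
    payload = json.loads(message_bytes.decode("utf-8"))
    return {
        "timestamp": payload["timestamp"],
        "device": payload["device"],
        "temperature": float(payload["temperature"]),
    }

msg = json.dumps({"timestamp": "2020-01-01 12:00:00",
                  "device": "temp-sensor-istanbul",
                  "temperature": 23.4}).encode()
print(message_to_row(msg))
```

The Dataflow template applies essentially this mapping to every message it pulls from the topic, then streams the rows into BigQuery.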

Analyze the Sensor Data Using BigQuery

To analyze the data as it is streaming:

  1. In the GCP Console, open the Navigation menu and select BigQuery.
  2. Enter the following query in the Query editor and click RUN:
SELECT timestamp, device, temperature FROM iotlabdataset.sensordata
ORDER BY timestamp DESC

  3. You should see output similar to this:

Did you learn something new? Give me a hand!
