Sticky post

Spark Cluster Overview

Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. Security in Spark is OFF by default, which could leave you vulnerable to attack. Spark uses Hadoop’s client libraries for HDFS and YARN. Users can also download a “Hadoop free” binary and run Spark with any Hadoop version by augmenting Spark’s classpath. Scala and Java users can …

Bayes Classification with Cloud Datalab, Spark, and Pig on Google Cloud

Note: If you are actually following along with this post, the job can take up to 1.5 hours to finish, and getting stuck on a typo will test your patience. In this post you will learn how to deploy a …

Building an IoT Analytics Pipeline on Google Cloud Platform step by step

Let’s start with the definition of IoT: the term Internet of Things (IoT) refers to the interconnection of physical devices with the global Internet. These devices are equipped with sensors and networking hardware, and each is globally identifiable. Taken together, these capabilities …