Spark Cluster Overview

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. Security in Spark is OFF by default. This could mean you are vulnerable to attack by default. Spark uses Hadoop’s client libraries for HDFS and YARN. Users can also download a “Hadoop free” binary and run Spark with any Hadoop version by augmenting Spark’s classpath. Scala and Java users can … Continue reading Spark Cluster Overview
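
As a quick taste of those high-level APIs, here is a minimal PySpark sketch that reads structured data and queries it with Spark SQL. The input file name people.json is a hypothetical placeholder, not something from the post.

```python
# Minimal sketch of Spark's high-level DataFrame and SQL APIs (PySpark).
# The path "people.json" is a hypothetical example input.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-overview-example").getOrCreate()

# Structured data processing with Spark SQL
df = spark.read.json("people.json")           # load JSON into a DataFrame
df.createOrReplaceTempView("people")          # register it as a temp SQL view
adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()

spark.stop()
```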

Running Spark on Azure Databricks

Who Should Read This Blog Post? This post is intended for people who want to run their analytics and machine learning workloads on Databricks and have chosen Azure as their cloud provider. Skills Required To Follow: SQL, Machine Learning. Why do we need Databricks and Azure services? Well, we can train a model on Databricks and deploy it on Azure services. Before we can run a Spark job we need to create a compute cluster, and before we create a compute cluster we need to create a Databricks workspace. All the work done in Databricks need not be done within … Continue reading Running Spark on Azure Databricks
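
To make the workflow concrete, below is a hedged sketch of a first Spark job you might run once the workspace and compute cluster exist. It assumes a Databricks notebook attached to a running cluster, where a SparkSession is already available as spark and display() is provided by the notebook environment; the DataFrame is purely synthetic.

```python
# Assumes a Databricks notebook attached to a running cluster, where `spark`
# (a SparkSession) and `display()` are already provided by the environment.
df = spark.range(0, 1000).toDF("id")                  # small synthetic DataFrame
df = df.withColumn("id_squared", df["id"] * df["id"]) # add a derived column
display(df.limit(10))                                 # Databricks notebook helper
print(df.count())                                     # prints 1000
```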

Machine Learning Applications for Big Data (Regression)

Machine Learning: Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it … Continue reading Machine Learning Applications for Big Data (Regression)
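
As a small, self-contained example of regression on Spark, the sketch below fits MLlib's LinearRegression on a tiny invented dataset; the numbers are made up purely for illustration.

```python
# A minimal regression sketch with Spark MLlib; the toy data is invented.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("regression-example").getOrCreate()

# Tiny synthetic dataset roughly following y = 2x + 1
data = spark.createDataFrame(
    [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 9.0), (5.0, 11.1)],
    ["x", "label"],
)

# MLlib expects the features packed into a single vector column
assembler = VectorAssembler(inputCols=["x"], outputCol="features")
train = assembler.transform(data)

model = LinearRegression(featuresCol="features", labelCol="label").fit(train)
print("coefficients:", model.coefficients, "intercept:", model.intercept)

spark.stop()
```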

You Can Blend Apache Spark And Tensorflow To Build Potential Deep Learning Solutions

Before we start our journey, let's explore what Spark is, what TensorFlow is, and why we want to combine them. Apache Spark™ is a unified analytics engine for large-scale data processing. Features: Speed: Run workloads 100x faster. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine (DAG means Directed Acyclic Graph). [Chart: Logistic regression in Hadoop and Spark] Ease of Use: Write applications quickly in Java, Scala, Python, R, and SQL. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use … Continue reading You Can Blend Apache Spark And Tensorflow To Build Potential Deep Learning Solutions
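
One common way to blend the two engines, sketched below under stated assumptions, is to keep the data in a Spark DataFrame and apply a TensorFlow/Keras model to each batch of rows through a pandas UDF. The model path /models/my_model.keras and the feature column are hypothetical placeholders, not details from the post.

```python
# Hedged sketch: distributed inference with a TensorFlow/Keras model applied
# to a Spark DataFrame via a pandas UDF. The model path is hypothetical.
import pandas as pd
import tensorflow as tf
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("spark-tf-inference").getOrCreate()

# Synthetic scoring data; in practice this would be a large distributed table.
df = spark.createDataFrame([(float(i),) for i in range(100)], ["feature"])

@pandas_udf("double")
def tf_predict(features: pd.Series) -> pd.Series:
    # Each worker loads the saved Keras model and scores its batch of rows.
    # (Loading per batch is simple but not efficient; caching would help.)
    model = tf.keras.models.load_model("/models/my_model.keras")  # hypothetical path
    preds = model.predict(features.to_numpy().reshape(-1, 1), verbose=0)
    return pd.Series(preds.reshape(-1))

scored = df.withColumn("prediction", tf_predict("feature"))
scored.show(5)

spark.stop()
```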