• Big Data and Hadoop • Basic Python data structures • Basic knowledge of Pandas dataframes and SQL • Entry-level Data Science

Learn about Apache Spark and the Spark 2.0 and PySpark architecture • Build and interact with PySpark DataFrames • Read, transform, and understand data and use it to train machine learning models • Build machine learning models with MLlib and ML • Learn how to submit your applications programmatically using spark-submit • ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering • Features : feature extraction, transformation, dimensionality reduction, and selection • Pipelines: tools for constructing, evaluating, and tuning ML Pipelines • Persistence: saving and load algorithms, models, and Pipelines • Utilities: linear algebra, statistics, data handling, etc.

Topics Name
Online Session
Online Session

Introduction to Python


Intermediate Python




PySpark Introduction RDD


PySpark DataFrame


PySpark Machine Learning

Hardware : Intel Core 5 processor with 16GB Recommended RAM. OS : Ubuntu Server ( Latest Version ) or Cent OS or Mac OS or Windows 64 bit 7/8/10 ( Latest preferable version ) High Speed Internet Connection ( Open Port for Installations ) Software Prerequisites Java ( Latest Version ) , Scala ( Latest Version) Apache Spark [ Latest Version ] (Downloadable from http://spark.apache.org/downloads.html) A Python distribution containing IPython, Pandas and Scikit-learn Anaconda with Python3.6, PySpark Local Environment www.anaconda.com [ Local Machine ] Hadoop, PySpark PySpark on Hadoop Cloud Environment OR Cloudera Hadoop or Online Databriks Cloud

Latest Course

R Programming
Trending Courses
Trending Courses
Ruby on Rails

Hello! How can I help you?