

Description

In this course you'll learn how to use Spark from Python! Spark is a tool for doing parallel computation with large datasets, and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a whole machine learning pipeline to predict whether or not flights will be late. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning!

Objectives

• Learn about Apache Spark and the Spark 2.0 architecture
• Build and interact with Spark DataFrames using Spark SQL
• Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames respectively
• Read, transform, and understand data and use it to train machine learning models
• Build machine learning models with MLlib and ML
• Learn how to submit your applications programmatically using spark-submit
• ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
• Featurization: feature extraction, transformation, dimensionality reduction, and selection
• Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
• Persistence: saving and loading algorithms, models, and Pipelines
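
To make the DataFrame, Spark SQL, and Pipeline objectives more concrete, here is a minimal sketch of how a late-flight classifier might be assembled with PySpark ML. It is an illustration only, not the course's own code. The column names (arr_delay, carrier, air_time, month, is_late), the zero-minute lateness threshold, and both helper function names are assumptions made for this example.

```python
# Hypothetical column names; the course's flight dataset may use different ones.
FEATURE_COLUMNS = ["carrier_index", "air_time", "month"]


def late_label_expr(delay_column="arr_delay", threshold_minutes=0):
    """Return a Spark SQL expression that yields 1.0 for late flights, else 0.0."""
    return f"CAST({delay_column} > {threshold_minutes} AS DOUBLE) AS is_late"


def build_late_flight_pipeline():
    """Assemble (without fitting) an indexer -> assembler -> classifier pipeline.

    Call this only after a SparkSession exists, because PySpark ML stages
    are backed by JVM objects.
    """
    # Imported here so the helper above stays usable without PySpark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Turn the carrier code (a string) into a numeric category index.
    carrier_indexer = StringIndexer(inputCol="carrier", outputCol="carrier_index")
    # Combine the numeric columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(inputCols=FEATURE_COLUMNS, outputCol="features")
    # Binary classifier that predicts the is_late label.
    classifier = LogisticRegression(featuresCol="features", labelCol="is_late")
    return Pipeline(stages=[carrier_indexer, assembler, classifier])
```

With a SparkSession and a flights DataFrame already loaded, one could add the label with flights.selectExpr("*", late_label_expr()), drop rows with missing values, and then call fit on the pipeline returned by build_late_flight_pipeline(). Logistic regression is just one of several classifiers that would fit here.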

Course Content
Topic Name                          Online Session   Contents
Pandas_PySpark                      NA               Pandas, Case Study, PySpark
DataScience                         NA               Database, NumPy, pandas
Python_Function                     NA               User-defined functions, built-in functions
Python Introduction                 NA               Introduction to Python, Data Types, Data Structures, Control Flow
Object Oriented Programming         NA               Modules, Object Oriented Programming
File_Handling_Regular_Expression    NA               File Handling, Exception Handling, Regular Expressions, Debugging
PySpark_DataFrame                   NA               PySpark DataFrame
PySpark_DataFrame_Case_Study        NA               DataFrame, Case Study
PySpark_Machine_Learning            NA               Machine Learning, PySpark ML

Prerequisite

Knowledge Prerequisites
• Big Data and Hadoop
• Basic Python data structures
• Basic knowledge of Pandas dataframes and SQL
• Entry-level Data Science
• Anyone interested in Machine Learning
• Intermediate-level learners who know the basics of machine learning, including classical algorithms like linear regression or logistic regression, and who want to explore the different fields of Machine Learning
• People who are not that comfortable with coding but who are interested in Machine Learning and want to apply it easily to datasets
• Data analysts who want to level up in Machine Learning
• People who are not satisfied with their job and who want to become a Data Scientist
• People who want to add value to their business by using powerful Machine Learning tools

Requirements

Software Prerequisites
• Apache Spark (downloadable from http://spark.apache.org/downloads.html)
• A Python distribution containing IPython, Pandas and Scikit-learn
• PySpark
• Anaconda with Python 3.6 (www.anaconda.com)
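
Before starting, it can help to confirm that the Python side of this setup is in place. The sketch below uses only the standard library. It assumes the usual import names for the listed packages (pyspark, IPython, pandas, and sklearn for Scikit-learn), and the helper names are made up for this example. It does not check the Apache Spark installation itself.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ["pyspark", "IPython", "pandas", "sklearn"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 6)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info[:2] >= minimum


print("Python version OK:", python_version_ok())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list of missing modules means every package above can be located by the current interpreter.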

