Description

In this course you'll learn how to use Spark from Python! Spark is a tool for doing parallel computation with large datasets, and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a whole machine learning pipeline to predict whether or not flights will be late. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning!

Objectives

• Learn about Apache Spark and the Spark 2.0 architecture
• Build and interact with Spark DataFrames using Spark SQL
• Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames, respectively
• Read, transform, and understand data and use it to train machine learning models
• Build machine learning models with MLlib and ML
• Learn how to submit your applications programmatically using spark-submit
• ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
• Featurization: feature extraction, transformation, dimensionality reduction, and selection
• Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
• Persistence: saving and loading algorithms, models, and Pipelines

Course Content
Pandas, Case Study, PySpark
Database, NumPy, pandas
User-defined functions, built-in functions
Introduction to Python, Data Types, Data Structures, Control Flow
Modules, Object-Oriented Programming
File Handling, Exception Handling, Regular Expressions, Debugging
PySpark DataFrame
DataFrame, Case Study
Machine Learning, PySpark ML

Prerequisite

Knowledge Prerequisites
• Big Data and Hadoop
• Basic Python data structures
• Basic knowledge of Pandas dataframes and SQL
• Entry-level Data Science
• Anyone interested in Machine Learning
• Intermediate-level learners who know the basics of machine learning, including classical algorithms like linear regression or logistic regression, but who want to learn more and explore the different fields of Machine Learning
• People who are not that comfortable with coding but who are interested in Machine Learning and want to apply it easily to datasets
• Data analysts who want to level up in Machine Learning
• People who are not satisfied with their job and who want to become a Data Scientist
• People who want to add value to their business by using powerful Machine Learning tools

Requirements

Software Prerequisites
• Apache Spark (downloadable from http://spark.apache.org/downloads.html)
• A Python distribution containing IPython, Pandas and Scikit-learn
• PySpark
• Anaconda with Python 3.6 (www.anaconda.com)

