Today, Hadoop has become a cornerstone of business technology. To stay ahead in the game, Hadoop is a must-know technology for the following professionals:
1. Analytics professionals
2. BI/ETL/DW professionals
3. Project managers
4. Testing professionals
5. Mainframe professionals
6. Software developers and architects
7. Graduates aiming to build a successful career around Big Data
Course Objectives
By the end of the course, you will:
1. Master the concepts of the HDFS and MapReduce frameworks
2. Understand Hadoop 2.x architecture
3. Set up a Hadoop cluster and write complex MapReduce programs
4. Learn data loading techniques using Sqoop and Flume
5. Perform data analytics using Pig, Hive and YARN
6. Implement HBase and MapReduce integration
7. Implement advanced usage and indexing
8. Schedule jobs using Oozie
9. Implement best practices for Hadoop development
10. Work on a real-life project on Big Data Analytics
11. Understand Spark and its ecosystem
12. Learn how to work with RDDs in Spark
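Objective 1 above centers on the MapReduce programming model. As a quick illustration of what that model involves, here is a minimal word-count sketch in plain Python that mimics the map, shuffle, and reduce phases. This is a conceptual sketch only, not actual Hadoop code; in the course you will write real MapReduce programs that run on a cluster.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for each word
    return (key, sum(values))

lines = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts["hdfs"])  # -> 2
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle happens over the network, but the data flow is the same.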
You can master Hadoop irrespective of your IT background. While basic knowledge of Core Java and SQL may help, it is not a prerequisite for learning Hadoop.
Lab Setup (for all machines)
Hardware: Intel Core i5 processor with 16 GB RAM (recommended)
OS: Ubuntu Server (latest version), CentOS, macOS, or 64-bit Windows 7/8/10 (latest preferable version)
High-speed internet connection (open ports for installations)
Software Prerequisites
Java (latest version), Scala (latest version)
Apache Spark (latest version), downloadable from http://spark.apache.org/downloads.html
A Python distribution containing IPython, Pandas and Scikit-learn
Local environment: Anaconda with Python 3.6 and PySpark (www.anaconda.com)
Cloud environment: PySpark on Hadoop, Cloudera Hadoop, or the online Databricks Cloud
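Once the software above is installed, the environment typically needs a few variables pointing at the install locations so that Hadoop and Spark commands work from the shell. The paths below are illustrative examples for a Linux machine, not the course's prescribed layout; adjust them to wherever Java, Hadoop, and Spark were actually unpacked:

```shell
# Example ~/.bashrc entries (paths are illustrative -- adjust to your installs)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/opt/hadoop
export SPARK_HOME=/opt/spark
export PYSPARK_PYTHON=python3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
```

After reloading the shell, `hadoop version` and `spark-submit --version` are a quick way to confirm the installs are on the PATH.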