Welcome to the HadoopExam CDH (Cloudera) Admin Beginner Course-1 Training.
Syllabus of CDH Admin Beginner Course-1 Training : In total, approximately 25-30 sessions will be created. Some of the sessions are in progress and will be available soon.
Module 1 : Introduction to BigData, Hadoop (HDFS and MapReduce) : Available (Length 35 Minutes)
1. BigData Introduction
2. Hadoop Introduction
3. HDFS Introduction
4. MapReduce Introduction
Module 2 : Deep Dive in HDFS : Available (Length 48 Minutes)
1. HDFS Design
2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)
3. Rack Awareness
4. Read/Write from HDFS
5. HDFS Federation and High Availability (Hadoop 2.x.x)
6. Parallel Copying using DistCp
7. HDFS Command Line Interface
Module 4 : Cloudera QuickStart VM Step By Step Installation (Length 19 Mins) Available + Steps in Doc + Hands On Lab
1. It Includes Hadoop 2.0
2. YARN
3. Hive
4. Pig
5. Hue
6. Apache Spark
7. Workflow
Module 5 : Load data in HDFS using the HDFS commands (Length 35 Mins) Available + Steps in Doc + Hands On Lab
Module 7 : Installing VMWare Workstation (Length 9 Mins) Available + Steps in Doc + Hands On Lab
Module 8 : Preparing Linux Instance for Hadoop Node(DataNode or NameNode) (Length 19 Mins) Available + Steps in Doc + Hands On Lab
Module 9 : Network Configuration on Linux Image (Length 19 Mins) Available + Steps in Doc + Hands On Lab
1. Assign IP address to Node
2. Assign HostName
Module 10 : Setup Cygwin on Windows (Length 27 Mins) Available + Steps in Doc + Hands On Lab
Module 11 : Create and Setup Linux Instance for MultiNode cluster (Length 16 Mins) Available + Steps in Doc + Hands On Lab
1. Disable SELinux
2. Disable Firewalls
Module 12 : Create 4 Nodes for the Cluster (Length 24 Mins) Available + Steps in Doc + Hands On Lab
Module 13 : Setup Local Repository for Cloudera Manager (Length 17 Mins) Available + Steps in Doc + Hands On Lab
1. Create Apache Web Server Instance.
2. Create repo web directory on given node
3. Download Cloudera Manager-5 software
4. Setting up the .repo file
Module 14 : Install Cloudera Manager from Local Repository (Length 12 Mins) Available + Steps in Doc + Hands On Lab
1. Download Latest Cloudera Manager installer
2. Install Cloudera Manager-5 from local repository
3. Verify the installation of CM-5 and provide web UI URL
Module 15 : Setup Cloudera Parcels in Local Repository (Length 8 Mins) Available + Steps in Doc + Hands On Lab
1. Setup Parcels for CDH5 in local repository
2. Setup Impala Parcels in local repository
3. Setup Solr Search Parcels in local repository
4. Similarly set up Kudu, Accumulo and Kafka parcels
Module 16 : Install Cloudera Manager Agent on each node (Length 29 Mins) Available + Steps in Doc + Hands On Lab
1. Install Java/JDK on each node
2. Install Cloudera Agent on each node.
Module 17 : Add Cloudera Management Service (Length 11 Mins) Available + Steps in Doc + Hands On Lab
Module 18 : Setup 4 Node CDH cluster (Length 21 Mins) Available + Steps in Doc + Hands On Lab
1. The cluster should be created with only HDFS installed. No other service should be installed.
2. The cluster should be set up using the parcels that we have set up in the local repository.
3. Don’t activate HttpFS roles
4. Role assignment should be as below
Module 19 : Cloudera Manager Introduction (Length 21 Mins) Available + Steps in Doc + Hands On Lab
1. Features of Cloudera Manager
2. Terminology in Cloudera Manager
Module 20 : Introduction to Apache Zookeeper (Length 21 Mins) Available + Steps in Doc + Hands On Lab
Module 21A : Install Zookeeper Service on Cluster (Length 8 Mins) Available + Steps in Doc + Hands On Lab
Module 21B : Enable NameNode High Availability (Length 13 Mins) Available + Steps in Doc + Hands On Lab
Module 21C : Test NameNode High Availability (Length 4 Mins) Available + Steps in Doc + Hands On Lab
Module 22 : Setup User Space for root user (Length 16 Mins) Available + Steps in Doc + Hands On Lab
1. Create new Linux user named "Admin"
2. Setup HDFS user space for "Admin" user
Module 23A : Understand HDFS Snapshot Concepts (Length 15 Mins) Available + Steps in Doc + Hands On Lab
Module 23B : Recover the deleted Directory using HDFS snapshot (Length 15.2 Mins) Available + Steps in Doc + Hands On Lab
Module 24A : Understand HDFS ACLs concept (Length 21 Mins) Available + Steps in Doc + Hands On Lab
Module 24B : Assigning HDFS ACLs concept (Length 5 Mins) Available + Steps in Doc + Hands On Lab
Owner should have read and write permissions
All members of admingroup are allowed to modify data
he_dev1 (only this one member of devgroup is allowed to read data)
he_test1 and he_exec1 (meaning execgroup and testgroup are not allowed to read data)
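The permission rules above could be expressed as a single `hdfs dfs -setfacl -m` call. The following is a minimal sketch that only builds and prints the command, it does not run it. The file path `TARGET` is hypothetical, and the exact ACL layout (including denying all access to other users) is an assumption about how the exercise is meant to be solved, not the course's own solution.

```python
# Sketch: build (but do not run) an `hdfs dfs -setfacl` command for the rules above.
# TARGET is a hypothetical path; the ACL layout is an assumed reading of the exercise.

TARGET = "/data/he_sales.txt"  # hypothetical file, not named in the syllabus

# Each entry is (scope, name, permissions). An empty name refers to the
# file owner (for "user") or to everyone else (for "other").
ENTRIES = [
    ("user", "", "rw-"),             # owner: read and write
    ("group", "admingroup", "rw-"),  # admingroup members may modify data
    ("user", "he_dev1", "r--"),      # the single devgroup member allowed to read
    ("group", "testgroup", "---"),   # testgroup (e.g. he_test1): no access
    ("group", "execgroup", "---"),   # execgroup (e.g. he_exec1): no access
    ("other", "", "---"),            # assumption: everyone else gets no access
]


def acl_spec(entries):
    """Join (scope, name, perms) tuples into a comma-separated ACL spec."""
    return ",".join(f"{scope}:{name}:{perms}" for scope, name, perms in entries)


def setfacl_command(path, entries):
    """Return the argument list for modifying the ACL of `path`."""
    return ["hdfs", "dfs", "-setfacl", "-m", acl_spec(entries), path]


command = setfacl_command(TARGET, ENTRIES)
print(" ".join(command))
```

On a real cluster the printed command would be run by the file owner or an HDFS superuser, and `hdfs dfs -getfacl` on the same path can be used to check the result. ACL support must be enabled on the NameNode for `-setfacl` to succeed.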
Module 24C : Directory level HDFS ACLs concept (Length 8 Mins) Available + Steps in Doc + Hands On Lab
Create a directory he_monthly_data, using he_admin1
Now create two directories, JAN16 and FEB16; these do not have any permissions for executives.
All members of the executive group should automatically get access to new subdirectories as they are created each month, e.g. MAR16, MAR17 etc.
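One way to get this behaviour is a default ACL on the parent directory, which HDFS copies to children created afterwards. The sketch below only plans the command sequence and prints it. The parent location `/he_monthly_data`, the group name `execgroup` as the executive group, and the `r-x` permission are assumptions made for illustration.

```python
# Sketch: plan the command order for the directory-level ACL exercise.
# PARENT's location, EXEC_GROUP, and the "r-x" permission are assumptions.

PARENT = "/he_monthly_data"  # assumed to sit at the HDFS root
EXEC_GROUP = "execgroup"     # assumed name of the executive group


def plan_commands(parent, group, existing, perms="r-x"):
    """Return the commands in order: create the parent, create the existing
    month directories, then add a default ACL so that only directories
    created later inherit access for `group`."""
    commands = [["hdfs", "dfs", "-mkdir", parent]]
    for name in existing:
        commands.append(["hdfs", "dfs", "-mkdir", f"{parent}/{name}"])
    commands.append(
        ["hdfs", "dfs", "-setfacl", "-m", f"default:group:{group}:{perms}", parent]
    )
    return commands


commands = plan_commands(PARENT, EXEC_GROUP, ["JAN16", "FEB16"])
for cmd in commands:
    print(" ".join(cmd))
```

The ordering matters in this sketch: JAN16 and FEB16 are created before the default ACL is set, so they do not pick up the executive group entry, while a directory such as MAR16 created afterwards would inherit it. The commands would be run as he_admin1, and executives may additionally need an access ACL on the parent itself in order to traverse it.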
Module 25 : Basics of Cloudera Manager and its Confusing terminology (Length 35 Mins) Available
Module 26 : NameNode Memory Consideration (Length x Mins) InProgress + Steps in Doc + Hands On Lab
Module 27 : WebUI for HDFS (Length x Mins) InProgress + Steps in Doc + Hands On Lab
Module 28 : Using the Hadoop FileShell (Length x Mins) InProgress + Steps in Doc + Hands On Lab
Module 29 : The Role of Computational Framework (Length x Mins) InProgress + Steps in Doc + Hands On Lab
Module 30 : Install YARN on 4 Node Cluster (Length 8 Mins) Available + Steps in Doc + Hands On Lab
Module 31 : Running YARN application (Length x Mins) InProgress + Steps in Doc + Hands On Lab
Module 32 : Apache Spark : Introduction to Apache Spark (Length 48 Mins) Available : Up to 100 Times Faster Data Processing
1. Introduction to Apache Spark
2. Features of Apache Spark
3. Apache Spark Stack
4. Introduction to RDDs
5. RDD Transformations
6. What is Good and Bad in MapReduce
7. Why Use Apache Spark