Administering Apache Hadoop
COURSE OUTLINE:
- Utilize best practices for deploying Hadoop clusters
- Determine hardware needs
- Monitor Hadoop clusters
- Recover from NameNode failure
- Handle DataNode failures
- Manage hardware upgrade processes including node removal, configuration changes, node installation and rebalancing clusters
- Manage log files
- Install, configure, deploy verify and maintain Hadoop clusters including: o MapReduce o HDFS o Pig o Hive (and MySQL) o HBase (and ZooKeeper) o HCatalog o Oozie o Mahout
- Overview of Hadoop
- Cluster Hardware and Installation of HDFS and MapReduce
- Rack Topology
- Setting up a Multi-user Environment
- Using Schedulers
- Hadoop Security with Kerberos
- Logs and Log Rotation
- Monitor, Maintain and Troubleshoot HDFS and MapReduce
- NameNode Failure and Recovery
- JobTracker Restarting
- Upgrade of Hardware Process
- Rebalancing
- Data Management
- Install Configure, Deploy and Verify Pig
- Install Configure, Deploy and Verify Hive
- Install Configure, Deploy and Verify MySQL
- Install Configure, Deploy and Verify HBase and ZooKeeper
- Install Configure, Deploy and Verify Other Hadoop Ecosystem (HCatalog, Oozie, Mahout)
- Install Configure, Deploy and Verify Nagios and Ganglia