Certified Big Data Science Analyst - CBDSA

COURSE OUTLINE:

Description

Big Data Analytics delivers competitive advantage in two ways compared to the traditional analytical model. First, Big Data Analytics describes the efficient use of a simple model applied to volumes of data that would be too large for the traditional analytical environment. Second, Big Data Analytics refers to the sophistication of the model itself. Increasingly, analysis algorithms are provided directly by database management system (DBMS) vendors. To pull away from the pack, companies must go well beyond what is provided and innovate by using newer, more sophisticated statistical analysis.

This specialized course covers business analytics and big data technologies and their strategic importance to any organization. Participants are introduced to business analytics with the big data technologies Hadoop, Hive and HBase. The course deals with the basic principles, concepts, techniques and tools used for big data and business analytics, including data mining, Hadoop (HDFS and MapReduce), Apache HBase and Apache Hive. It also covers different types of business analytics with real-life use cases, including association rule mining and regression models. Participants will get a good picture of these concepts and how they are interconnected in an organizational context.

Audience

  • Data Analyst - Statistics and Mining
  • Big Data Analyst
  • Operations Research Analyst
  • Senior Data Analyst - Statistics and Mining
  • Data Scientist

Prerequisites

Participants should preferably have at least 2 years of software development experience in a Java/Unix/Linux environment and a good understanding of data and business analytics.

Learning Objectives

  • Understand business analytics and big data technologies and their impact on enterprises
  • Learn data mining concepts and techniques through an open source data mining tool
  • Understand the role of big data technologies (Hadoop, HBase, Hive) in business analytics
  • Acquire the knowledge and learn to use Hadoop (HDFS and MapReduce), HBase and Hive

Unit 1: Introduction to Business Analytics

  • The concept of Business Analytics
  • Data, Information, Knowledge and Wisdom
  • Data as Unique Enterprise Asset
  • Data, Information and Analytics Lifecycle
  • Business Analytics - Current Context
  • Types of Analytics
    • Descriptive Analytics
    • Predictive Analytics
    • Prescriptive Analytics

Unit 2: Data/Information Architecture for Business Analytics

  • Data/Information Architecture
  • Concept of Data Warehouse/Enterprise Data Warehouse (EDW)
  • ETL - Key Process
  • Concept of Data Mart
  • Business Intelligence
  • Data Mining

Unit 3: Data Mining Tool

  • Understand the open source data mining tool RapidMiner
  • Explore the various features of RapidMiner
  • Walk through a RapidMiner demo with different scenarios

Unit 4: Data Mining Techniques

  • Understand the various data mining techniques
  • Understand how a correlation matrix works
  • Understand how association rule mining works
  • Understand the predictive analytics technique
  • Understand the forecasting technique
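
As a rough illustration of the association rule mining topic in this unit, the following Python sketch computes the usual rule metrics (support, confidence and lift) over a handful of transactions. The transactions, the function names and the thresholds are all invented for this example; it is a brute-force sketch over item pairs, not how RapidMiner or any production miner implements the technique.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for single-item rules lhs -> rhs above both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(transactions, 0.5, 0.7):
        print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

A lift above 1 suggests the two items occur together more often than their individual frequencies would predict; a lift below 1 suggests the opposite.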

Unit 5: Introduction to Big Data

  • What is Big Data? Why Big Data?
  • 3V's of Big Data
  • The Rapid Growth of Unstructured Data
  • Big Data Market Forecast
  • Big Data Analytics
  • Big Data in Business
  • Big Data Types & Architecture

Unit 6: Introduction to Hadoop

  • Big Data - Current Industry Trends
  • Why Process Big Data?
  • Challenges in Data Processing
  • Why Hadoop?
  • What does Hadoop offer?
  • Hadoop Network Structure
  • Hadoop Eco-System
  • Hadoop Core Components
  • Hadoop - Features
  • Hadoop - Relevance
  • Hadoop in Action

Unit 7: Hadoop HDFS & MapReduce

  • Hadoop HDFS
    • What does HDFS Facilitate?
    • HDFS Architecture
    • Hadoop Network and Server Infrastructure
    • NameNode, Secondary NameNode and DataNode
    • Ensuring Data Correctness
    • Data Pipelining while Loading Data
    • fs Operations
  • Hadoop MapReduce
    • MapReduce Conceptualization
    • MapReduce - Overview
    • MapReduce - Programming Model
    • MapReduce - Execution Overview
    • Hadoop - Application Examples
    • Word Count - Example
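
The Word Count example listed above is the classic introduction to the MapReduce programming model. The sketch below imitates its three stages (map, shuffle, reduce) in plain Python inside a single process. It is only an illustration: it does not use Hadoop's Java API, and the function names are made up for this outline. On a real cluster the framework runs map and reduce tasks in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: collapse the list of counts for one word into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["big data big", "data Hadoop"]  # invented sample input
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

Because each map call sees only one line and each reduce call sees only one key, both stages can be distributed without shared state, which is the property the programming model relies on.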

Unit 8: Apache HBase

  • What is HBase?
  • HBase Architecture
  • ZooKeeper
  • HBase Data Model
  • HBase Deployment
  • HBase Cluster Architecture
  • Indexes in HBase
  • Scaling HBase
  • Data Locality, Coherence and Concurrency, Fault Tolerance
  • Hadoop Integration
  • High-Level Architecture
  • Replication of Data Across Data Centres
  • HBase Applications
  • Advantages and Disadvantages
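
To make the HBase data model topic more concrete, here is a small in-memory Python sketch of its main ideas: rows addressed by a row key, columns grouped into column families declared up front, cells versioned by timestamp, and scans over a sorted row-key range. The class `ToyTable` and its methods are hypothetical names for this illustration and are not the HBase client API; real HBase stores keys and values as byte arrays and distributes regions across servers.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one HBase-style table (not the HBase client API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> (family, qualifier) -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value, timestamp):
        """Store one versioned cell; the family must have been declared."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][(family, qualifier)][timestamp] = value

    def get(self, row_key, family, qualifier, timestamp=None):
        """Return the newest value at or before timestamp (latest if None)."""
        versions = self._rows.get(row_key, {}).get((family, qualifier), {})
        candidates = [ts for ts in versions
                      if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key


if __name__ == "__main__":
    users = ToyTable(["info"])  # invented table with one column family
    users.put("user#001", "info", "email", "old@example.com", timestamp=1)
    users.put("user#001", "info", "email", "new@example.com", timestamp=2)
    users.put("user#002", "info", "email", "other@example.com", timestamp=1)
    print(users.get("user#001", "info", "email"))
    print(list(users.scan("user#001", "user#002")))
```

Since rows are kept in row-key order, the choice of row key largely determines which range scans are cheap, which is why key design features prominently in HBase schema discussions.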

Unit 9: Apache Hive

  • What is Hive?
  • Why Hive?
  • Where to use Hive?
  • Hive Architecture
  • Hive: Benefits
  • Hive: Tradeoffs
  • Hive: Real-World Examples