Survival Data Mining: A Programming Approach

COURSE OUTLINE:

Description

This advanced course covers predictive hazard modeling for customer history data. Designed for data analysts, the course uses SAS/STAT software to illustrate various survival data mining methods and their practical implementation.

Note: Formerly titled 'Survival Data Mining: Predictive Hazard Modeling for Customer History Data,' this course now includes hands-on exercises so that you can practice the techniques that you learn. Other additions include a chapter on recurrent events, coverage of new features in SAS/STAT software, and an expanded section that compares the discrete-time approach with continuous-time models, both semi-parametric (such as the Cox proportional hazards model) and fully parametric (such as the Weibull model).

Audience

  • Predictive modelers
  • Data analysts
  • Statisticians
  • Econometricians
  • Model validators
  • Data scientists

Prerequisites

  • A basic understanding of survival analysis
  • Experience with predictive modeling (particularly with logistic regression)
  • Familiarity with statistical concepts such as random variables, probability distributions, and parameter estimation
  • Familiarity with SQL (including topics such as subqueries and left joins)
  • SAS programming proficiency

Learning Objectives

  • Build models for time-dependent outcomes derived from customer event histories
  • Account for competing risks, time-dependent covariates, right censoring, and left truncation
  • Handle large data sets
  • Compute the expected value of the remaining time until an event
  • Evaluate the predictive performance of the model
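
To illustrate the objective of computing the expected remaining time until an event, here is a minimal sketch in Python (the course itself uses SAS/STAT software). It assumes that a discrete-time model has already produced one conditional event probability (hazard) per future period. The function name and the input values are hypothetical. The result is restricted to the horizon covered by the supplied hazards, so it understates the full expectation whenever survival at the end of the horizon is above zero.

```python
def expected_remaining_time(hazards):
    """Restricted expected number of periods until the event.

    hazards[k] is the assumed probability that the event occurs in
    period k + 1, given that it has not occurred in periods 1..k.
    The sum runs only over the horizon len(hazards).
    """
    survival = 1.0  # P(T > 0): no event has occurred yet
    total = 0.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        # E[min(T, horizon)] equals the sum of P(T > k) for k = 0..horizon-1
        total += survival
        # Probability of surviving one more period
        survival *= 1.0 - h
    return total


# Hypothetical customer with a constant monthly hazard of 0.5 over 3 months
print(expected_remaining_time([0.5, 0.5, 0.5]))  # 1.75
```

The sum uses the identity that the mean of a non-negative integer-valued time equals the sum of its survival probabilities, which avoids building the full event-time distribution.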

1. Survival Data Mining

  • Introduction to survival data mining
  • Elements of survival analysis
  • Time-dependent covariates

2. Survival Models

  • Semi-parametric survival models
  • Parametric survival models
  • Discrete-time survival models

3. Flexible Hazard Modeling

  • Building discrete-time hazard models
  • Grouped expanded data
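
Discrete-time hazard models are commonly fit to data in expanded (person-period) form, with one row for each period in which a customer was at risk. The following is a minimal sketch in Python of that expansion, not the course's SAS implementation. The function name, the tuple layout, and the sample customers are hypothetical.

```python
def expand_person_period(customers):
    """Expand one row per customer into one row per customer-period.

    Each input tuple is (customer_id, duration, event):
      duration is the number of whole periods observed (>= 1)
      event is True if the event occurred in the last observed period,
      False if the customer was right-censored.
    Each output tuple is (customer_id, period, y), where y is 1 only in
    the period in which the event occurred.
    """
    rows = []
    for cust_id, duration, event in customers:
        for period in range(1, duration + 1):
            y = 1 if (event and period == duration) else 0
            rows.append((cust_id, period, y))
    return rows


# Hypothetical data: A has the event in month 3, B is censored after month 2
for row in expand_person_period([("A", 3, True), ("B", 2, False)]):
    print(row)
```

A binary regression of y on period indicators and covariates over these rows then estimates the discrete-time hazard. The expanded table can be much larger than the original, which is one reason that techniques for large data sets matter here.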

4. Hazard Modeling with Big Data

  • Outcome-dependent sampling
  • Data truncation
  • Piecewise constant hazards

5. Predictive Performance

  • Predictive scoring
  • Empirical validation

6. Recurrent Events

  • Introduction to recurrent events