HANDS-ON-LAB

Predicting Customer Churn in Telecom Industry Project

Problem Statement

With the rapid development of the telecommunication industry, service providers are increasingly focused on expanding their subscriber base. Retaining existing customers has become a major challenge for providers trying to survive in a competitive environment. Acquiring a new customer is reported to cost far more than retaining an existing one. It is therefore imperative for telecom companies to use advanced analytics to understand consumer behavior and predict whether or not a customer will leave the company.

Dataset

The data contains 11 variables, of which Churn is the target variable.

The complete data dictionary can be found here.

Kindly download the data from here.

 

Tasks

  1. Hypothesis-based EDA:

    • Plot the distribution chart for the target variable. What is the class imbalance ratio?

    • Does having more customer service calls increase the churn probability?

    • If a customer uses more data, does that impact churn?

  2. Preprocess and create new features:

    • Create a function that clips outliers to the range between Q1 and Q3 for all columns

    • Create bucketed features, based on the variable distribution, for the CustServCalls and AccountWeeks variables

  3. Create a function named “treat_null_values” that takes in the dataframe and does the following:

    • Drops features that have 60% or more null values

    • Imputes the remaining columns using the median for numerical columns and the mode for categorical columns

  4. Build Logistic Regression and XGBoost models using the prepared data.

  5. Research the hyperparameters that handle class imbalance in Logistic Regression and XGBoost, and build models using them (Hint: class_weight & scale_pos_weight).

  6. Resample the data with SMOTE oversampling and with undersampling, then build the Logistic Regression and XGBoost models again. Compare the results with those from steps 4 and 5.
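The two preprocessing functions described in the tasks above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the name clip_to_iqr, the exclude parameter (used here to leave the target column untouched), and the drop_threshold parameter are assumptions introduced for the example. The clipping follows the task wording literally and limits each numeric column to its own Q1 to Q3 range.

```python
import numpy as np
import pandas as pd


def clip_to_iqr(df: pd.DataFrame, exclude=("Churn",)) -> pd.DataFrame:
    """Clip each numeric column to its own [Q1, Q3] range.

    `exclude` is an assumption of this sketch: it keeps columns such as
    the target out of the clipping step.
    """
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if col in exclude:
            continue
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        out[col] = out[col].clip(lower=q1, upper=q3)
    return out


def treat_null_values(df: pd.DataFrame, drop_threshold: float = 0.60) -> pd.DataFrame:
    """Drop columns with too many nulls, then impute the rest."""
    out = df.copy()

    # Share of null values per column; drop columns at or above the threshold.
    null_share = out.isna().mean()
    out = out.drop(columns=null_share[null_share >= drop_threshold].index)

    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical column: impute with the median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: impute with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out
```

A typical call order would be treat_null_values(df) first and clip_to_iqr(...) on its result, so that the quartiles are computed on imputed data; the reverse order is equally defensible and is left as a design choice.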

 

Discover the impact of customer service calls and data usage on churn probability in the telecom industry.

 

FAQs

Q1. What is the class imbalance ratio in the telecom customer churn data?

The class imbalance ratio is the count of the majority class divided by the count of the minority class. It can be determined from the distribution chart of the target variable, Churn.
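As a rough sketch, the ratio can be computed directly from the class counts, and the same counts give starting values for the two hyperparameters named in the hint. The function name imbalance_summary is hypothetical, and the code assumes Churn is coded as 0 (stayed) and 1 (churned), which the source does not state. The "balanced" weights use the n_samples / (n_classes * class_count) formula that scikit-learn documents for class_weight="balanced", and the scale_pos_weight value is the negative-to-positive count ratio suggested in the XGBoost documentation.

```python
import pandas as pd


def imbalance_summary(y: pd.Series) -> dict:
    """Summarize class imbalance for a binary 0/1 target.

    Assumes 1 marks a churned customer and 0 a retained one.
    """
    counts = y.value_counts()
    n_neg = int(counts.get(0, 0))
    n_pos = int(counts.get(1, 0))
    n_total = n_neg + n_pos

    return {
        # Retained customers per churned customer.
        "imbalance_ratio": n_neg / n_pos,
        # Candidate value for XGBoost's scale_pos_weight.
        "scale_pos_weight": n_neg / n_pos,
        # Weights equivalent to class_weight="balanced" in scikit-learn.
        "balanced_class_weight": {
            0: n_total / (2 * n_neg),
            1: n_total / (2 * n_pos),
        },
    }
```

On the lab data this would be called as imbalance_summary(df["Churn"]). The same value counts can be drawn as a bar chart with df["Churn"].value_counts().plot(kind="bar"), which requires matplotlib.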

 

Q2. Does having more customer service calls increase the churn probability?

By comparing the churn rate across different numbers of customer service calls (the CustServCalls variable), we can determine whether there is a correlation between customer service calls and churn probability.
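One way to run this check is to bucket CustServCalls and compare the churn rate per bucket, which also yields the bucketed feature asked for in the tasks. The sketch below is illustrative only: the function name churn_rate_by_bucket and the bin edges (0-1, 2-3, 4+ calls) are assumptions and should be replaced with edges chosen from the actual distribution. Churn is assumed to be coded 0/1.

```python
import pandas as pd


def churn_rate_by_bucket(
    df: pd.DataFrame,
    column: str = "CustServCalls",
    bins=(-1, 1, 3, float("inf")),   # assumed edges: 0-1, 2-3, 4+ calls
    labels=("0-1", "2-3", "4+"),
    target: str = "Churn",
) -> pd.DataFrame:
    """Return churn rate and customer count for each bucket of `column`."""
    # Right-closed intervals: (-1, 1], (1, 3], (3, inf)
    buckets = pd.cut(df[column], bins=list(bins), labels=list(labels))
    return df.groupby(buckets, observed=False)[target].agg(
        churn_rate="mean",   # mean of a 0/1 target is the churn rate
        customers="size",
    )
```

A churn rate that rises from one bucket to the next would support the hypothesis; the customers column shows whether each bucket is large enough for its rate to be meaningful.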

 

Q3. Does customer data usage impact churn in the telecom industry?

By comparing data usage between customers who churned and customers who stayed, we can evaluate whether there is a relationship between data usage and churn probability.
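A minimal sketch of that comparison is shown below. The column name DataUsage is an assumption (the source names only Churn, CustServCalls and AccountWeeks), as is the function name usage_vs_churn. The Pearson correlation between a numeric column and a 0/1 target is the point-biserial correlation, so its sign indicates the direction of the relationship.

```python
import pandas as pd


def usage_vs_churn(df: pd.DataFrame, usage_col: str = "DataUsage", target: str = "Churn"):
    """Compare data usage between churned (1) and retained (0) customers."""
    # Summary statistics of usage within each class of the target.
    by_class = df.groupby(target)[usage_col].agg(["mean", "median", "count"])
    # Pearson correlation with a binary target (point-biserial correlation).
    corr = df[usage_col].corr(df[target])
    return by_class, corr
```

A negative correlation would suggest that heavier data users churn less, a positive one the opposite; either way this describes association only and does not show that usage causes the difference.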