IBM Employee Churn
Business Objective
IBM is a multinational technology company that offers a wide range of products and services, including IT services, cloud and cognitive software, and hardware. Because the company has a large and diverse workforce, employee retention is a key concern. High attrition rates can lead to increased costs due to the need for recruiting, hiring, and training new employees. It can also lead to loss of knowledge and skills, decreased productivity, and potential impacts on team morale.
The dataset provided contains various details about IBM employees, including their age, department, distance from home, education level, job satisfaction, monthly income, years at the company, and more. Most importantly, it includes whether or not they have left the company (attrition).
The main business problem is to understand and predict employee attrition. The goal is to identify the key factors that influence an employee’s decision to leave the company. This could include factors related to their job role, compensation, work-life balance, or personal circumstances. By understanding these factors, IBM can take proactive measures to improve employee retention. This could involve changes to HR policies, management practices, compensation packages, or work environment.
Data-Driven Solutions:
The data-driven approach to this problem involves analyzing the dataset to extract insights and build predictive models. This could involve the following steps:
- Exploratory Data Analysis (EDA): This involves understanding the structure of the dataset, checking for missing values, and analyzing the distribution of variables. This step can provide initial insights into the factors that may be associated with attrition.
- Data Visualization: This involves creating visual representations of the data to better understand the relationships between variables. For example, visualizing the distribution of monthly income by attrition status could reveal if lower-paid employees are more likely to leave.
- Feature Engineering: This involves creating new variables that may be more predictive of attrition. For example, a new feature could be created to represent an employee’s total years of experience.
- Hypothesis Testing: This involves statistically testing the relationships between variables. For example, a t-test could be used to determine if there is a significant difference in monthly income between employees who left and those who stayed.
- Predictive Modeling: This involves building a machine learning model to predict attrition based on the available features. The model could be used to identify employees who are at risk of leaving, allowing for proactive measures to be taken.
Deliverables
You can analyze the data in any tool you like (Tableau, Power BI, Python, R, Excel, etc.). Your manager would like a dashboard, which upper management will use to monitor performance.
She would also like you to generate a slide deck presenting your analysis and recommendations to the company’s VP of Human Resources. She would like to know which factors drive attrition and which areas of the company are most affected.
The slide deck can be done in Google Slides, PowerPoint, or any other tool. Just save it as a PDF.
Additional Instructions
Feel free to explore the data however you see fit. We have provided some guided questions to help direct your analysis and spark your own ideas.
Get the data here.
Guiding Questions
Exploratory Data Analysis (EDA)
- What is the correlation between ‘MonthlyIncome’ and ‘YearsAtCompany’?
- This can help us understand if employees who have been at the company longer tend to have higher incomes.
- What is the distribution of ‘NumCompaniesWorked’ for employees who left the company vs those who stayed?
- This can help us understand if employees who have worked at more companies are more likely to leave.
- What is the average ‘DistanceFromHome’ for each ‘Department’?
- This can help us understand if employees in certain departments tend to live farther from work.
- What is the attrition rate for each ‘EducationField’?
- This can help us understand if employees in certain fields are more likely to leave the company.
- What is the average ‘MonthlyIncome’ for each ‘JobSatisfaction’ level?
- This can help us understand if higher job satisfaction is associated with higher income.
- What is the percentage of employees who left the company in each ‘MaritalStatus’ category?
- This can help us understand if marital status is associated with attrition.
- What is the average ‘EnvironmentSatisfaction’ for employees who left the company vs those who stayed?
- This can help us understand if environment satisfaction is associated with attrition.
- What is the distribution of ‘WorkLifeBalance’ for each ‘Department’?
- This can help us understand if work-life balance ratings differ by department.
- What is the average ‘Age’ for each ‘Attrition’ category?
- This can help us understand if age is associated with attrition.
- What is the percentage of employees in each ‘Education’ level for each ‘Department’?
- This can help us understand the education level distribution in each department.
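Several of these questions reduce to a groupby or crosstab in pandas. The sketch below assumes the data is loaded into a DataFrame (for example with `pd.read_csv`) whose columns use the names quoted above, and that ‘Attrition’ is coded as the strings "Yes"/"No"; the helper `eda_summary` and the six-row `toy` frame are hypothetical illustrations, not the real dataset.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> dict:
    """Answer several EDA guiding questions (assumes Attrition is coded 'Yes'/'No')."""
    left = df["Attrition"].eq("Yes")  # True for employees who left
    return {
        # Pearson correlation between income and tenure
        "income_tenure_corr": df["MonthlyIncome"].corr(df["YearsAtCompany"]),
        # Mean commute distance per department
        "distance_by_dept": df.groupby("Department")["DistanceFromHome"].mean(),
        # Attrition rate (share who left) per education field
        "attrition_by_field": left.groupby(df["EducationField"]).mean(),
        # Percentage who left within each marital-status category
        "attrition_pct_by_marital": left.groupby(df["MaritalStatus"]).mean() * 100,
        # Mean age of leavers vs. stayers
        "age_by_attrition": df.groupby("Attrition")["Age"].mean(),
        # Percentage of each Education level within each Department (rows sum to 100)
        "education_by_dept": pd.crosstab(df["Department"], df["Education"], normalize="index") * 100,
    }


# Hypothetical toy rows standing in for the real file
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "MonthlyIncome": [2000, 5000, 6000, 2500, 8000, 7000],
    "YearsAtCompany": [1, 5, 7, 2, 10, 8],
    "Department": ["Sales", "Sales", "R&D", "R&D", "HR", "HR"],
    "DistanceFromHome": [10, 2, 5, 20, 3, 1],
    "EducationField": ["Marketing", "Marketing", "Medical", "Medical", "Other", "Other"],
    "MaritalStatus": ["Single", "Married", "Married", "Single", "Divorced", "Married"],
    "Age": [25, 40, 45, 28, 50, 38],
    "Education": [2, 3, 4, 3, 2, 3],
})
summary = eda_summary(toy)
print(summary["attrition_by_field"])
```

Computing rates with a boolean mean, rather than raw counts, keeps groups of different sizes comparable.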
Feature Engineering
- Create a new feature ‘AgeGroup’ categorizing employees into ‘Young’, ‘Middle-aged’, and ‘Senior’ based on their age.
- This can help understand if age groups have different attrition rates or other significant differences.
- Create a new feature ‘TotalSatisfaction’ as the sum of ‘EnvironmentSatisfaction’ and ‘JobSatisfaction’.
- This can help us understand overall satisfaction and its impact on attrition.
- Create a binary feature ‘HighIncome’, labeling employees ‘High’ if their ‘MonthlyIncome’ is above the mean monthly income and ‘Low’ otherwise.
- This can help us understand if income level impacts attrition.
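A minimal pandas sketch of the three features follows. The age cut points (under 35, 35 to 49, 50 and over) are hypothetical because the brief does not define the bands, and the function name `add_features` and the toy rows are illustrative only.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with AgeGroup, TotalSatisfaction and HighIncome added."""
    out = df.copy()
    # Hypothetical bands; right=False gives [0, 35), [35, 50), [50, inf)
    out["AgeGroup"] = pd.cut(
        out["Age"],
        bins=[0, 35, 50, float("inf")],
        labels=["Young", "Middle-aged", "Senior"],
        right=False,
    )
    # Sum of the two satisfaction scores
    out["TotalSatisfaction"] = out["EnvironmentSatisfaction"] + out["JobSatisfaction"]
    # 'High' only when strictly above the mean monthly income
    above_mean = out["MonthlyIncome"] > out["MonthlyIncome"].mean()
    out["HighIncome"] = above_mean.map({True: "High", False: "Low"})
    return out


toy = pd.DataFrame({
    "Age": [22, 35, 49, 50, 61],
    "EnvironmentSatisfaction": [1, 4, 3, 2, 4],
    "JobSatisfaction": [2, 4, 1, 3, 3],
    "MonthlyIncome": [2000, 3000, 4000, 5000, 11000],
})
featured = add_features(toy)
print(featured)
```

When these features feed a predictive model, the mean income threshold should be computed on the training split only, so the test data does not leak into the feature.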
Data Visualization
- Create a bar plot showing the count of ‘Attrition’ for each ‘Department’.
- This can help us compare attrition across departments; because departments differ in size, normalize the counts by department headcount to compare attrition rates.
- Create a histogram of ‘MonthlyIncome’.
- This can help us understand the distribution of income among employees.
- Create a boxplot of ‘YearsAtCompany’ for each ‘EducationField’.
- This can help us understand if employees in certain fields tend to stay longer at the company.
- Create a scatter plot of ‘Age’ vs ‘MonthlyIncome’, colored by ‘Attrition’.
- This can help us understand if there’s a relationship between age, income, and attrition.
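The four plots can be drawn into a single 2 by 2 matplotlib figure, as sketched below. It assumes the same column names and "Yes"/"No" coding for ‘Attrition’; the function name `plot_overview` and the toy rows are hypothetical, and the non-interactive Agg backend is chosen only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame):
    """Draw the four guiding-question plots and return the Figure."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 9))

    # 1. Attrition counts per department (one bar per Attrition value)
    counts = pd.crosstab(df["Department"], df["Attrition"])
    counts.plot.bar(ax=axes[0, 0], rot=0, title="Attrition count by Department")

    # 2. Histogram of monthly income
    axes[0, 1].hist(df["MonthlyIncome"], bins=20)
    axes[0, 1].set(title="MonthlyIncome distribution", xlabel="MonthlyIncome", ylabel="Employees")

    # 3. Boxplot of tenure for each education field
    fields = sorted(df["EducationField"].unique())
    data = [df.loc[df["EducationField"] == f, "YearsAtCompany"].to_numpy() for f in fields]
    axes[1, 0].boxplot(data)
    axes[1, 0].set_xticklabels(fields, rotation=30)
    axes[1, 0].set(title="YearsAtCompany by EducationField", ylabel="YearsAtCompany")

    # 4. Age vs income, one colour per Attrition value
    for label, group in df.groupby("Attrition"):
        axes[1, 1].scatter(group["Age"], group["MonthlyIncome"], label=label, alpha=0.6)
    axes[1, 1].set(title="Age vs MonthlyIncome", xlabel="Age", ylabel="MonthlyIncome")
    axes[1, 1].legend(title="Attrition")

    fig.tight_layout()
    return fig


toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "R&D", "R&D", "R&D", "HR", "HR"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes"],
    "MonthlyIncome": [2500, 5200, 6100, 7000, 3100, 8800, 4300, 2900],
    "YearsAtCompany": [1, 6, 8, 10, 2, 12, 5, 1],
    "EducationField": ["Marketing", "Marketing", "Other", "Medical",
                       "Medical", "Life Sciences", "Other", "Life Sciences"],
    "Age": [24, 38, 45, 50, 27, 52, 41, 29],
})
fig = plot_overview(toy)
```

`fig.savefig(...)` exports the figure as an image for the slide deck; an interactive tool such as Tableau or Power BI remains the better fit for the dashboard itself.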
Hypothesis Testing
- Is there a significant difference in ‘MonthlyIncome’ between employees who left the company and those who didn’t?
- This can help us understand if income is a significant factor in attrition.
- Is there a significant difference in ‘TotalSatisfaction’ between departments?
- This can help us understand if overall satisfaction (environment plus job satisfaction) varies significantly by department.
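Both questions map onto standard SciPy tests. The sketch below uses Welch's t-test for income (it does not assume equal variances) and a one-way ANOVA across departments; it also runs a Kruskal-Wallis test, a nonparametric alternative added here because ‘TotalSatisfaction’ is a sum of ordinal scores. The function names and toy values are hypothetical, and ‘Attrition’ is assumed to be coded "Yes"/"No".

```python
import pandas as pd
from scipy import stats


def income_vs_attrition(df: pd.DataFrame):
    """Welch's t-test on MonthlyIncome for leavers vs. stayers."""
    left = df.loc[df["Attrition"] == "Yes", "MonthlyIncome"]
    stayed = df.loc[df["Attrition"] == "No", "MonthlyIncome"]
    # equal_var=False selects Welch's version of the test
    return stats.ttest_ind(left, stayed, equal_var=False)


def satisfaction_by_department(df: pd.DataFrame):
    """One-way ANOVA and Kruskal-Wallis on TotalSatisfaction across departments."""
    groups = [g.to_numpy() for _, g in df.groupby("Department")["TotalSatisfaction"]]
    return stats.f_oneway(*groups), stats.kruskal(*groups)


# Hypothetical toy rows; TotalSatisfaction would come from the feature-engineering step
income_toy = pd.DataFrame({
    "Attrition": ["Yes"] * 5 + ["No"] * 5,
    "MonthlyIncome": [2000, 2200, 2500, 2100, 2300, 6000, 6500, 7000, 5800, 6200],
})
sat_toy = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["R&D"] * 4 + ["HR"] * 4,
    "TotalSatisfaction": [2, 3, 3, 2, 7, 8, 7, 6, 4, 5, 4, 5],
})

t_res = income_vs_attrition(income_toy)
anova_res, kruskal_res = satisfaction_by_department(sat_toy)
print(f"t = {t_res.statistic:.2f}, p = {t_res.pvalue:.4g}")
print(f"ANOVA p = {anova_res.pvalue:.4g}, Kruskal-Wallis p = {kruskal_res.pvalue:.4g}")
```

A small p-value says nothing about the size of the effect, so report the difference in group means alongside it.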
Machine Learning
- Build a logistic regression model to predict ‘Attrition’ based on other features in the dataset.
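- This can help us understand which features are most predictive of attrition (coefficients are only comparable when the features share a common scale, e.g. after standardization).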
- Evaluate the performance of your model.
- This shows how well the model performs and whether it is useful in a practical context.
- Try at least two other machine learning models and compare their performance.
- Different models have different strengths and weaknesses, and it’s important to try multiple approaches.
- Use feature importance from a tree-based model to identify the top features impacting ‘Attrition’.
- This can help us understand which features are most important in predicting attrition.
- Tune the hyperparameters of your best performing model to improve its performance.
- Hyperparameter tuning is an important step in optimizing a machine learning model.