- 26th Jul 2024
- 18:14 pm
- Aditya Lakhera
In this assignment, you are a marketing analyst for a popular e-commerce site. The company is putting together a marketing presentation that highlights its last decade in business. You have specifically been tasked with pulling figures for the year 2011. You have been provided with a CSV file of sales data that you will first need to load and clean. In order to thoroughly clean the data, you must:
- Convert columns that contain dates to a DateTime type
- Convert any columns that contain negative values to positive ones
- Remove rows containing "none" values
- Remove rows where the unit price of an item is equal to 0
- Select only 2011 sales data
Once your dataset is clean, you will analyze it to collect the following figures for 2011:
- Total items sold
- Total revenue
- Total number of unique items sold
- Average number of orders per customer
- Average value of each invoice
- Instructions to begin:
Download the provided files, launch Jupyter Notebook, and open the Notebook file.
Analyzing 2011 E-Commerce Sales Data - Get Assignment Solution
This sample Python assignment solution has been successfully completed by our team of Python programmers. The solutions provided are designed exclusively for research and reference purposes. If you find value in reviewing the reports and code, our Python tutors would be delighted.
-
For a comprehensive solution package including code, reports, and screenshots, please visit our Python Assignment Sample Solution page.
-
Contact our Python experts for personalized online tutoring sessions focused on clarifying any doubts related to this assignment.
-
Explore the partial solution for this assignment available in the blog above for further insights.
Free Assignment Solution - Analyzing 2011 E-Commerce Sales Data
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Keeping Customers: Fighting Churn with Pandas\n",
"\n",
"In this assignment, you'll play the role of lead analyst for a credit card provider.
You’ve been provided with a CSV file that provides information on both customers who have churned and customers who still use the credit card.
You have been tasked to determine whether churned vs. existing customers show significant differences when it comes to the following factors:\n",
"\n",
"- Age\n",
"- Number of dependents \n",
"- Total Revolving balance; i.e. the balance that still needs to be paid \n",
"- Income \n",
"\n",
"The first three metrics are numeric so you will apply the same strategies to solve those problems.
Customer income is categorical so you will be applying a slightly different strategy to solve that problem.\n",
"\n",
"By the end of this notebook, you will have determined which of the above factors may help identify a customer who is likely to churn.
This will help your organization better identify and retain flight risks in the future."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"### Getting Started\n",
"To get started, download the following files:\n",
"- `Unit 20 - Business - Unsolved.ipynb` (_this notebook_)\n",
"- `CreditCardChurn.csv`\n",
"\n",
"Place these together in to a dedicated directory on your hard drive.
We recommend creating a folder in your `Documents` directory for this week of class, as follows:\n",
"\n",
"```\n",
"Documents/\n",
" Python/\n",
" Week20/\n",
" Unit 20 - Business - Unsolved.ipynb\n",
" CreditCardChurn.csv\n",
"```\n",
"\n",
"Then, start Jupyter Notebook and open `Unit 20 - Business - Unsolved.ipynb` in your browser.
Make sure the `CreditCardChurn.csv` file lives in the same directory.\n",
"\n",
"---\n",
"\n",
"### Problem Structure\n",
"Each problem will be accompanied by:\n",
"- **Instructions**\n",
" - Each problem features a markdown cell explaining the problem.\n",
"- **Unfinished Code Cells**\n",
" - Each problem has unfinished code cells, where you will write code to solve the problem.\n",
" - Cells will contain either starter code for you to finish, or a comment explaining what your code should do.\n",
"- **Expected Output**. \n",
" - Many unfinished code cells will have output below them. You will be expected to write code that produces the same output.\n",
" - Some unfinished code cells do _not_ have output below them. This is simply because not all code will generate output.
Your solutions for these cells should _not_ print anything.\n",
" \n",
"---\n",
" \n",
"### Deliverables\n",
"To receive credit for this assignment, you must submit the following files:\n",
"- Your completed Jupyter Notebook\n",
"\n",
"Your completed Jupyter Notebook will be this file, but with all of the problems solved. This is the only file you will need to submit.
When you're done with the assignment, run all cells to verify that your code executes as expected. Then, save and submit this notebook.\n",
"\n",
"Good luck!\n",
"\n",
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1: Loading & Cleaning Data\n",
"In Part 1, you will perform the following steps on the data in `CreditCardChurn.csv`:\n",
"- Load the CSV into a dataframe and print the first five rows\n",
"- Add a new column called `Churned`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 1: Loading Data\n",
"You will load the data in `CreditCardChurn.csv`, and inspect its columns using `head`. \n",
"\n",
"You have been provided a `filename` variable, which contains the path to `CreditCardChurn.csv`.
Use it to complete the steps below:\n",
"- Load `filename` into a DataFrame called `churn`\n",
"- Print the first 5 rows of `churn`\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"ClientID\tAttritionFlag\tCustomerAge\tGender\tDependentCount\tEducationLevel\tIncomeCategory\tTotalRevolvingBal\n",
"0\t768805383\tExisting Customer\t45\tM\t3\tHigh School\t 60????− 80K\t777\n",
"1\t818770008\tExisting Customer\t49\tF\t5\tGraduate\tLess than $40K\t864\n",
"2\t713982108\tExisting Customer\t51\tM\t3\tGraduate\t 80????− 120K\t0\n",
"3\t769911858\tExisting Customer\t40\tF\t4\tHigh School\tLess than $40K\t2517\n",
"4\t709106358\tExisting Customer\t40\tM\t3\tUneducated\t 60????− 80K\t0\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Provided Code -- Do NOT Edit!\n",
"import pandas as pd\n",
"filename = 'CreditCardChurn.csv'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.285841Z",
"start_time": "2021-04-07T02:06:26.262287Z"
}
},
"outputs": [],
"source": [
"# TODO: Load `filename` into a DataFrame called `churn`\n",
"df = pd.read_csv('CreditCardChurn.csv', sep = ',')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.348790Z",
"start_time": "2021-04-07T02:06:26.290336Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ClientID</th>\n",
" <th>AttritionFlag</th>\n",
" <th>CustomerAge</th>\n",
" <th>Gender</th>\n",
" <th>DependentCount</th>\n",
" <th>EducationLevel</th>\n",
" <th>IncomeCategory</th>\n",
" <th>TotalRevolvingBal</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>768805383</td>\n",
" <td>Existing Customer</td>\n",
" <td>45</td>\n",
" <td>M</td>\n",
" <td>3</td>\n",
" <td>High School</td>\n",
" <td>$60K - $80K</td>\n",
" <td>777</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>818770008</td>\n",
" <td>Existing Customer</td>\n",
" <td>49</td>\n",
" <td>F</td>\n",
" <td>5</td>\n",
" <td>Graduate</td>\n",
" <td>Less than $40K</td>\n",
" <td>864</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>713982108</td>\n",
" <td>Existing Customer</td>\n",
" <td>51</td>\n",
" <td>M</td>\n",
" <td>3</td>\n",
" <td>Graduate</td>\n",
" <td>$80K - $120K</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>769911858</td>\n",
" <td>Existing Customer</td>\n",
" <td>40</td>\n",
" <td>F</td>\n",
" <td>4</td>\n",
" <td>High School</td>\n",
" <td>Less than $40K</td>\n",
" <td>2517</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>709106358</td>\n",
" <td>Existing Customer</td>\n",
" <td>40</td>\n",
" <td>M</td>\n",
" <td>3</td>\n",
" <td>Uneducated</td>\n",
" <td>$60K - $80K</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ClientID AttritionFlag CustomerAge Gender DependentCount \\\n",
"0 768805383 Existing Customer 45 M 3 \n",
"1 818770008 Existing Customer 49 F 5 \n",
"2 713982108 Existing Customer 51 M 3 \n",
"3 769911858 Existing Customer 40 F 4 \n",
"4 709106358 Existing Customer 40 M 3 \n",
"\n",
" EducationLevel IncomeCategory TotalRevolvingBal \n",
"0 High School $60K - $80K 777 \n",
"1 Graduate Less than $40K 864 \n",
"2 Graduate $80K - $120K 0 \n",
"3 High School Less than $40K 2517 \n",
"4 Uneducated $60K - $80K 0 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TODO: Print first 5 rows of `churn`\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2: Adding a `Churned` Column\n",
"Note that the `AttritionFlag` Series contains one of two values: Either `Existing Customer`,
indicating that the customer is still a subscriber; or `Attrited Customer`, indicating that they've canceled. \n",
"\n",
"In this problem, you'll add a new column, called `Churned`, which will be `
True` if `AttritionFlag` is `Attrited Customer`, and `False` otherwise. Follow the steps below:\n",
"- Create a new column, called `Churned`, which is `True` for rows where
`AttritionFlag` is equal to `Attrited Customer`, and `False` otherwise\n",
"- Count the values of the `Churned` column\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"False 8500\n",
"True 1627\n",
"Name: Churned, dtype: int64\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.486096Z",
"start_time": "2021-04-07T02:06:26.473418Z"
}
},
"outputs": [],
"source": [
"# TODO: Create column called `Churned` \n",
"df[\"Churned\"]= True\n",
"df.loc[df['AttritionFlag'] != \"Existing Customer\" , \"Churned\"] = False\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.513464Z",
"start_time": "2021-04-07T02:06:26.498650Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True 8500\n",
"False 1627\n",
"Name: Churned, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TODO: Count values in `Churned` column\n",
"df['Churned'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2: Differences in Numeric Variables\n",
"In Part 2, you will see if there are significant differences in the following columns between the two groups:\n",
"- Customer Age\n",
"- Number of Dependents\n",
"- Total Revolving Balance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 1: Calculating the Average Age Difference Between `churned` vs `not_churned`\n",
"You will write code that computes the average age of churned customers,
and the average age of unchurned customers, and then prints the _difference_ in these two averages.\n",
"\n",
"Follow the steps below to solve this problem:\n",
"- Create a variable, called `churned_customers`, that contains only the
rows in your DataFrame corresponding to churned customers \n",
"- Create a variable, called `unchurned_customers`, that contains only the
rows in your DataFrame corresponding to _unchurned_ customers\n",
"- Compute the average `CustomerAge` for each group\n",
"- Print the difference between these averages\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"AVERAGE AGE DIFFERENCE: 0.3973783578582015\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Create `churned_customers` and `unchurned_customers` variables\n",
"churned_customers = df[df['Churned'] == True]\n",
"unchurned_customers = df[df['Churned'] == False]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Compute average age of churned and unchurned customers\n",
"average_churned_customers = churned_customers['CustomerAge'].mean()\n",
"average_unchurned_customers = unchurned_customers['CustomerAge'].mean()\n",
"\n",
"# TODO: Compute `difference_in_average_age`\n",
"difference_in_average_age = (average_churned_customers - average_unchurned_customers )"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.700155Z",
"start_time": "2021-04-07T02:06:26.666471Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.3973783578582015\n"
]
}
],
"source": [
"# TODO: Print `difference_in_average_age`\n",
"print(difference_in_average_age)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After finding the difference in average age, answer the following questions:\n",
"- What are the minimum and maximum values in `churn['CustomerAge']`?
Store the minimum value in a variable called `min_age` and maximum value in a variable called `max_age`.\n",
"- What is the difference between these values? Store the result in a variable `difference_in_average_dependent_count`.\n",
"- Is `CustomerAge` predictive of churn?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"26\n"
]
}
],
"source": [
"# Compute and print `min_age`\n",
"min_age = churned_customers['CustomerAge'].min()\n",
"print(min_age)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"73\n"
]
}
],
"source": [
"# Compute and print `max_age`\n",
"max_age = churned_customers['CustomerAge'].max()\n",
"print(max_age)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"47\n"
]
}
],
"source": [
"# Compute and print difference between maximum and minimum ages\n",
"print(max_age - min_age)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2: Calculating Difference in Average Number of Dependents\n",
"Next, you will write code that computes the average number of dependents of churned customers,
and the average number of dependents of unchurned customers, and then prints the _difference_ in these two averages.\n",
"\n",
"Follow the steps below to solve this problem:\n",
"- Compute the average `DependentCount` for each group\n",
"- Print the difference between these averages\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"AVERAGE DIFFERENCE IN DEPENDENT COUNT: 0.06716967352398884\n",
"```\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Create `churned_customers` and `unchurned_customers` variables\n",
"churned_customers = df[df['Churned'] == True]\n",
"unchurned_customers = df[df['Churned'] == False] "
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Compute average `DependentCount` for each group\n",
"average_churned_customers = churned_customers['DependentCount'].mean()\n",
"average_unchurned_customers = unchurned_customers['DependentCount'].mean()\n",
"\n",
"# TODO: Compute `difference_in_average_dependent_count`\n",
"difference_in_average_dependent_count = (average_churned_customers - average_unchurned_customers)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.759904Z",
"start_time": "2021-04-07T02:06:26.740591Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AVERAGE DIFFERENCE IN DEPENDENT COUNT: 0.067170\n"
]
}
],
"source": [
"# TODO: Print `difference_in_average_dependent_count`\n",
"print(\"AVERAGE DIFFERENCE IN DEPENDENT COUNT: %f\"%(difference_in_average_dependent_count))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After finding the difference in average number of dependents, answer the following questions:\n",
"- What are the minimum and maximum values in `churn['DependentCount']`?
Store the minimum value in a variable called `min_dependent_count` and maximum value in a variable called `max_dependent_count`.\n",
"- What is the difference between these values? Store the result in a variable called `dependent_count_range`.\n",
"- Is `DependentCount` predictive of churn?"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n"
]
}
],
"source": [
"# Compute and print `min_dependent_count`\n",
"min_dependent_count = churned_customers['DependentCount'].min()\n",
"print(min_dependent_count)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5\n"
]
}
],
"source": [
"# Compute and print `max_dependent_count`\n",
"max_dependent_count = churned_customers['DependentCount'].max()\n",
"print(max_dependent_count)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5\n"
]
}
],
"source": [
"# Compute and print difference between max and min dependent count\n",
"print(max_dependent_count -min_dependent_count )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 3: Calculating Average Difference in Total Revolving Balance\n",
"Next, you will write code that computes the average number of dependents of churned customers,
and the average number of dependents of unchurned customers, and then prints the _difference_ in these two averages.\n",
"\n",
"Follow the steps below to solve this problem:\n",
"- Create a variable, called `churned_customers`, that contains only the rows in your
DataFrame corresponding to churned customers \n",
"- Create a variable, called `unchurned_customers`, that contains only the rows in your
DataFrame corresponding to _unchurned_ customers\n",
"- Compute the average `DependentCount` for each group\n",
"- Print the difference between these averages\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"AVERAGE DIFFERENCE IN TOTAL REVOLVING BALANCE: -583.7811305542499\n",
"```\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Create `churned_customers` and `unchurned_customers` variables\n",
"churned_customers = df[df['Churned'] == True]\n",
"unchurned_customers = df[df['Churned'] == False] "
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Compute average `TotalRevolvingBal` for each group\n",
"average_churned_customers = churned_customers['TotalRevolvingBal'].mean()\n",
"average_unchurned_customers = unchurned_customers['TotalRevolvingBal'].mean()\n",
"\n",
"# TODO: Compute `difference_in_average_revolving_balance`\n",
"difference_in_average_revolving_balance = average_unchurned_customers - average_churned_customers"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"ExecuteTime": {
"end_time": "2021-04-07T02:06:26.809032Z",
"start_time": "2021-04-07T02:06:26.793357Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AVERAGE DIFFERENCE IN TOTAL REVOLVING BALANCE: -583.781131\n"
]
}
],
"source": [
"# TODO: Print `difference_in_numeric` of `TotalRevolvingBal`\n",
"print(\"AVERAGE DIFFERENCE IN TOTAL REVOLVING BALANCE: %f\"%(difference_in_average_revolving_balance))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After finding the difference in total revolving balance, answer the following questions:\n",
"- What are the maximum and minimum values in `churn['TotalRevolvingBal']`?
Store the minimum value in a variable called `min_balance` and maximum value in a variable called `max_balance`.\n",
"- What is the difference between these values? Store the result in a variable called `balance_range`\n",
"- Is `TotalRevolvingBal` predictive of churn?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n"
]
}
],
"source": [
"# TODO: Compute and print minimum revolving balance\n",
"min_revolving_balance = churned_customers['TotalRevolvingBal'].min()\n",
"print(min_revolving_balance)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2517\n"
]
}
],
"source": [
"# TODO: Compute and print maximum revolving balance\n",
"max_revolving_balance = churned_customers['TotalRevolvingBal'].max()\n",
"print(max_revolving_balance)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2517\n"
]
}
],
"source": [
"# TODO: Compute difference between max and min revolving balances\n",
"print(max_revolving_balance - min_revolving_balance)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 3: Studying Income Categories\n",
"In Part #, you will continue looking for differences between churned and unchurned customers.
This time, you will see if there low-income or high-income customers are more likely to churn, by studying the `IncomeCategory` column."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 1: Comparing \"Low-Income\" Churned vs Unchurned Customers\n",
"Next, you will determine whether there is a difference in the number of
churned vs unchurned customers who qualify as \"low-income\" -- i.e., those who make less than $40K per year.\n",
"\n",
"You have been provided with a variable, called `low_income`, containing the value `'Less than $40K'`.
Use it to complete the steps below:\n",
"- Create a variable, called `churned_customers`, that contains only the rows in your
DataFrame corresponding to churned customers\n",
"- Create a variable, called `unchurned_customers`, that contains only the rows in your DataFrame corresponding to unchurned customers\n",
"- Create the following two DataFrames:\n",
" - `low_income_churned`: Filter `churned_customers` for only rows with an `IncomeCategory` value of `low_income'\n",
" - `low_income_unchurned`: Filter `unchurned_customers` for only rows with an `IncomeCategory` value of `low_income`\n",
"- Compute the mean of each of these new DataFrames, then print the difference between these means.\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"0.029211251310604147\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"# Provided Code -- Do NOT Edit!\n",
"low_income = 'Less than $40K'"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Create `churned_customers` and `unchurned_customers` DataFrames\n",
"churned_customers = df[df['Churned'] == True]\n",
"unchurned_customers = df[df['Churned'] == False] "
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ClientID 7.032297e+06\n",
"CustomerAge -1.571653e-01\n",
"DependentCount -6.633854e-02\n",
"TotalRevolvingBal 5.542824e+02\n",
"Churned 1.000000e+00\n",
"dtype: float64\n"
]
}
],
"source": [
"# TODO: Filter for low-income, churned customers\n",
"low_income_churned = churned_customers[churned_customers['IncomeCategory'] == low_income]\n",
"# TODO: Filter for low-income, unchurned customers\n",
"low_income_unchurned = unchurned_customers[unchurned_customers['IncomeCategory'] == low_income]\n",
"# TODO: Compute difference in means between `low_income_churned` and `low_income_unchurned`\n",
"average_income_churned = low_income_churned.mean()\n",
"average_income_unchurned = low_income_unchurned.mean()\n",
"difference_in_average = average_income_churned - average_income_unchurned\n",
"print(difference_in_average)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2: Comparing \"High-Income\" Churned vs Unchurned Customers\n",
"- Create the following two DataFrames:\n",
" - `high_income_churned`: Filter `churned_customers` for only rows with an `IncomeCategory` value of `high_income'\n",
" - `high_income_unchurned`: Filter `unchurned_customers` for only rows with an `IncomeCategory` value of `high_income`\n",
"- Compute the mean of each of these new DataFrames, then print the difference between these means.\n"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"# Provided Code -- Do NOT Edit!\n",
"high_income = '$120K +'"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Create `churned_customers` and `unchurned_customers` DataFrames\n",
"churned_customers = df[df['Churned'] == True]\n",
"unchurned_customers = df[df['Churned'] == False] "
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ClientID 7.032297e+06\n",
"CustomerAge -1.571653e-01\n",
"DependentCount -6.633854e-02\n",
"TotalRevolvingBal 5.542824e+02\n",
"Churned 1.000000e+00\n",
"dtype: float64\n"
]
}
],
"source": [
"# TODO: Filter for high-income, churned customers\n",
"high_income_churned = churned_customers[churned_customers['IncomeCategory'] == high_income]\n",
"\n",
"# TODO: Filter for high-income, unchurned customers\n",
"high_income_unchurned = unchurned_customers[unchurned_customers['IncomeCategory'] == high_income]\n",
"\n",
"# TODO: Compute difference in means between `high_income_churned` and `high_income_unchurned` \n",
"average_income_churned = low_income_churned.mean()\n",
"average_income_unchurned = low_income_unchurned.mean()\n",
"difference_in_average = average_income_churned - average_income_unchurned\n",
"print(difference_in_average)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wrapping Up\n",
"\n",
"Congratulations -- you've finished digging into the customer churn data set, and have generated significant
insights for your organization! Write a paragraph that summarizes your findings and explains which factor(s)
you analyzed are predictive of customer churn. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Get the best Analyzing 2011 E-Commerce Sales Data assignment help and tutoring services from our experts now!
About The Author - Jordan Lee
Jordan Lee is a data analyst with expertise in sales data processing. For this assignment, Jordan focused on cleaning 2011 e-commerce data, including date conversions and removing invalid entries. Analyzed metrics like total items sold, revenue, and average order value using Jupyter Notebook to support a decade-long marketing review.