You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
675 lines
21 KiB
675 lines
21 KiB
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div>\n",
|
|
" <span>\n",
|
|
" <p align=\"left\">\n",
|
|
" <img align=\"left\" style=\"padding-right: 5px\" valign=\"center\" src=\"https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png\" width=\"28px\">\n",
|
|
" </p>\n",
|
|
" </span>\n",
|
|
" <span>\n",
|
|
" <h1>Consumer Sales Lab</h1>\n",
|
|
" </span>\n",
|
|
"</div>\n",
|
|
"\n",
|
|
"<font color='red'><strong>Important!!</font></strong>: This lab is fairly challenging and may take longer than 60m to complete. Because of this, we've included a <a href='#shortcut'>shortcut cell</a> that _skips the EDA section_ of this lab and lets you get right to the analysis section. Work with your instructor and use your best judgement to use your time wisely to focus on the areas you'd like to practice.\n",
|
|
"\n",
|
|
"This lab can be conducted in one of two ways:\n",
|
|
"\n",
|
|
"- <a href=\"#eda\">EDA</a> _and_ <a href=\"#analysis\">Analysis</a>\n",
|
|
"- <a href=\"#analysis\">Analysis</a> only\n",
|
|
"\n",
|
|
"The <a href=\"#eda\">EDA</a> section covers the following topics:\n",
|
|
"\n",
|
|
"- <a href='#import'>Importing</a> data from csvs\n",
|
|
"- <a href=\"#nulls\">Handling nulls</a>\n",
|
|
"- <a href='#dtypes'>Casting different Dtypes</a>\n",
|
|
"- <a href=\"#join\">Complex joining</a> of star-schema tables\n",
|
|
"\n",
|
|
"The <a href=\"#analysis\">Analysis</a> section covers the following topics:\n",
|
|
"\n",
|
|
"- <a href='#fe'>Feature engineering</a>\n",
|
|
"- <a href=\"#visualization\">Visualization and Reporting</a>\n",
|
|
"\n",
|
|
"<details>\n",
|
|
" <summary>Table of Contents</summary>\n",
|
|
" <ul>\n",
|
|
" <li><a href=\"#eda\">EDA</a></li>\n",
|
|
" <ul>\n",
|
|
" <li><a href='#import'>Import</a></li>\n",
|
|
" <li><a href=\"#nulls\">Nulls</a></li>\n",
|
|
" <li><a href='#dtypes'>Dtypes</a></li>\n",
|
|
" <li><a href=\"#join\">Join</a></li>\n",
|
|
" </ul>\n",
|
|
" <li><a href=\"#analysis\">Analysis</a></li>\n",
|
|
" <ul>\n",
|
|
" <li><a href='#fe'>Feature Engineering</a></li>\n",
|
|
" <li><a href=\"#visualization\">Visualization and Reporting</a></li>\n",
|
|
" <ul>\n",
|
|
" <li><a href='#1a'>1.A</a></li>\n",
|
|
" <li><a href=\"#1b\">1.B</a></li>\n",
|
|
" <li><a href='#1c'>1.C</a></li>\n",
|
|
" <li><a href=\"#2a\">2.A</a></li>\n",
|
|
" <li><a href=\"#3a\">3.A</a></li>\n",
|
|
" </ul>\n",
|
|
" </ul>\n",
|
|
" </ul>\n",
|
|
"</details>\n",
|
|
"<details>\n",
|
|
" <summary>Background</summary>\n",
|
|
" <ul>\n",
|
|
" <li>Originally adapted from <a href=\"https://sense-demo.qlik.com/sense/app/372cbc85-f7fb-4db6-a620-9a5367845dce\">qlik</a>, we'll be performing EDA on a consumer data set.</li>\n",
|
|
" </ul>\n",
|
|
"</details>\n",
|
|
"<details id='prompts'>\n",
|
|
" <summary>Prompts</summary>\n",
|
|
" <br>\n",
|
|
" Your boss, Joanna, has requested a report on the following:\n",
|
|
" <ol>\n",
|
|
" <li>Product Sales</li>\n",
|
|
" <ol>\n",
|
|
" <li>Gross margin analysis by product group.</li>\n",
|
|
" <li>Sales by product group, top 10 product groups only.</li>\n",
|
|
" <li>Sales, by year/month, year over year</li>\n",
|
|
" </ol>\n",
|
|
" <li>Sales Reps</li>\n",
|
|
" <ol>\n",
|
|
" <li>Sum of Sales and sales quantity, by rep, by customer</li>\n",
|
|
" </ol>\n",
|
|
" <li>Supply Chain</li>\n",
|
|
" <ol>\n",
|
|
" <li>Inventory vs Lead Time for all products</li>\n",
|
|
" </ol>\n",
|
|
" </ol>\n",
|
|
"</details>\n",
|
|
"<details id='dictionary'>\n",
|
|
" <summary>Data Dictionary</summary>\n",
|
|
" <br>\n",
|
|
" <!-- table created with https://www.tablesgenerator.com/html_tables please see ../assets/dictionary.tgn file -->\n",
|
|
" <table>\n",
|
|
" <tr>\n",
|
|
" <th>Table</th>\n",
|
|
" <th>Field</th>\n",
|
|
" <th>Description</th>\n",
|
|
" <th>PK</th>\n",
|
|
" <th>FK</th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Item master.xls</td>\n",
|
|
" <td>Item Number</td>\n",
|
|
" <td>Foreign key to Sales.Item Number field. Unique identifier for item</td>\n",
|
|
" <td>Y</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Item master.xls</td>\n",
|
|
" <td>Product Group</td>\n",
|
|
" <td>Group for the product, i.e. Frozen Foods, Deli, etc</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Item master.xls</td>\n",
|
|
" <td>Product Line</td>\n",
|
|
" <td>Product line, i.e. Food, Drink, etc</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Item master.xls</td>\n",
|
|
" <td>Product Sub Group</td>\n",
|
|
" <td>Detail field for the Product Group field, i.e. Produce -> Fresh Vegetables</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Item master.xls</td>\n",
|
|
" <td>Product Type</td>\n",
|
|
" <td>Type of product and additional detail at the sub group level, i.e. 'Breakfast Foods'</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Manager</td>\n",
|
|
" <td>Name of manager</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr> \n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Manager Number</td>\n",
|
|
" <td>ID of manager</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Path</td>\n",
|
|
" <td>Order through which sales passes through reps, separated by hyphens. Correlates with Sales Rep ID key.</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Sales Rep Name</td>\n",
|
|
" <td>Primary sales rep name</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Sales Rep Name 1</td>\n",
|
|
" <td>Secondary sales rep name (nullable)</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Sales Rep Name 2</td>\n",
|
|
" <td>Tertiary sales rep name (nullable)</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Sales Rep Name 3</td>\n",
|
|
" <td>Quaterinary sales rep name (nullable)</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales rep.csv</td>\n",
|
|
" <td>Sales Rep ID</td>\n",
|
|
" <td>Foreign key to Sales. UID for path.</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Customers.xlsx</td>\n",
|
|
" <td>Customer</td>\n",
|
|
" <td>Name of customer</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Customers.xlsx</td>\n",
|
|
" <td>Customer Number</td>\n",
|
|
" <td>Unique identifier for customer name, keys to Sales.Customer Number</td>\n",
|
|
" <td>Y</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Customers.xlsx</td>\n",
|
|
" <td>City Code</td>\n",
|
|
" <td>City ID, foreign key for City.City Code</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Cities.xlsx</td>\n",
|
|
" <td>City</td>\n",
|
|
" <td>Name of city</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Cities.xlsx</td>\n",
|
|
" <td>City Code</td>\n",
|
|
" <td>ID of city name</td>\n",
|
|
" <td>Y</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Cities.xlsx</td>\n",
|
|
" <td>Region</td>\n",
|
|
" <td>Sales region (i.e. USA, Nordic, etc)</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Cities.xlsx</td>\n",
|
|
" <td>Latitude</td>\n",
|
|
" <td>Latitude of city</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Cities.xlsx</td>\n",
|
|
" <td>Longitude</td>\n",
|
|
" <td>Longitude of city</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Cities.xlsx</td>\n",
|
|
" <td>Desc</td>\n",
|
|
" <td>String description of city, including city, state (if applicable), and country</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>%KEY</td>\n",
|
|
" <td>Primary key of table</td>\n",
|
|
" <td>Y</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Cost</td>\n",
|
|
" <td>Total cost of sale for transaction [USD]</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Customer Number</td>\n",
|
|
" <td>Customer number, keys to Customer.Customer Number</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Date</td>\n",
|
|
" <td>Date of sale</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>GrossSales</td>\n",
|
|
" <td>Gross sale for invoice [USD]</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Invoice Date</td>\n",
|
|
" <td>Date of invoice</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Item Desc</td>\n",
|
|
" <td>Description of invoiced item</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Item Number</td>\n",
|
|
" <td>ID of invoiced item (product) - not a primary key. Keys to Item.Item Number</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Margin</td>\n",
|
|
" <td>Percent gross margin of line item sale</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Order Number</td>\n",
|
|
" <td>ID of the order placed</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Promised Delivery Date</td>\n",
|
|
" <td>Agreed date of delivery</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Sales</td>\n",
|
|
" <td>Gross sale for invoice [USD], less cost of sale</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Sales Qty</td>\n",
|
|
" <td>Qty of invoiced item sold (see Item Number, Item Desc)</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Sales.xlsx</td>\n",
|
|
" <td>Sales Rep Number</td>\n",
|
|
" <td>Sales rep ID credited with sale</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>Y</td>\n",
|
|
" </tr>\n",
|
|
"</table>\n",
|
|
"</details>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='eda'></div>\n",
|
|
"<h2>EDA</h2>\n",
|
|
"\n",
|
|
"Before we create our charts/reports for Joanna, we'll need to sanity check our input data. We'll get to the analysis (feature engineering) and visualization and reporting in just a bit."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='import'></div>\n",
|
|
"<h3>Import Data</h3>\n",
|
|
"Read in the data. Check the raw file to make sure you understand quote characters, delimiters, and encoding. You will need to use the encoding flag here since we are dealing with international character sets."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Conduct any library imports here"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Read in your sales, cities, customers, item_master, and sales_rep csvs here."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='nulls'></div>\n",
|
|
"<h3>Nulls</h3>\n",
|
|
"Check for nulls and missing values in all imported tables. If you are filling missing values, state your reasoning for dropping and/or imputing data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='dtypes'></div>\n",
|
|
"<h3>Dtypes</h3>\n",
|
|
"Review all imported tables and convert the data types if necessary, according to the rules in the following table:\n",
|
|
"<br><br>\n",
|
|
"<table>\n",
|
|
" <tr>\n",
|
|
" <th>Name</th>\n",
|
|
" <th>Dtype</th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Primary or Foreign Keys</td>\n",
|
|
" <td>int64 or int32</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Currency</td>\n",
|
|
" <td>float64</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Text fields</td>\n",
|
|
" <td>object (string)</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Discrete, non-negative values</td>\n",
|
|
" <td>int64 or int32</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <td>Dates</td>\n",
|
|
" <td>datetime64[ns] (Timestamp object)</td>\n",
|
|
" </tr>\n",
|
|
"</table>\n",
|
|
" "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='join'></div>\n",
|
|
"<h3>Join</h3>\n",
|
|
"Join all your tables together and store the joined result into a dataframe named <code>cs</code>. You'll need this for the <a href=\"#visualization\">reporting and visualization</a> section below. \n",
|
|
"\n",
|
|
"Use the <a href=\"#dictionary\">data dictionary</a> for guidance."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='Analysis'></div>\n",
|
|
"<h2>Analysis</h2>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='fe'></div>\n",
|
|
"<h3>Feature Engineering</h3>\n",
|
|
"<ul>\n",
|
|
" <li>Create a new column, <code>GrossMargin</code>, which is the <code>GrossSales</code> minus the <code>Cost</code>, all divided by <code>GrossSales</code>. Store this value as a float (percentage).</li>\n",
|
|
" <li>Create a new column, <code>ShipDiff</code>, which is the difference between the <code>Promised Delivery Date</code> and the <code>Invoice Date</code>, in <code>seconds</code>.</li>\n",
|
|
" <li>Drop the <code>%KEY</code>, <code>Sales Rep Number</code>, <code>Manager Number</code>, <code>Path</code>, <code>Sales Rep ID</code>, and <code>Desc</code>.</li>\n",
|
|
"</ul>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='visualization'></div>\n",
|
|
"<h2>Reporting and Visualization</h2>\n",
|
|
"Create charts or reports according to the <a href=\"#prompts\">prompts</a>.\n",
|
|
"\n",
|
|
"<b>Use your best judgement to create visualizations or reports to best answer the questions. As yourself questions such as:</b>\n",
|
|
"<ul>\n",
|
|
" <li>Is the data I'm using categorical or continuous?</li>\n",
|
|
" <li>Am I looking at timeseries data?</li>\n",
|
|
" <li>Am I representing a part-of-a-whole relationship?</li>\n",
|
|
" <li>Do I have many data points? If so, could I report or chart a subset of that data?</li>\n",
|
|
"</ul>\n",
|
|
"\n",
|
|
"<b>There's no right or wrong answer to these questions. As you solve them, focus on this progression:</b>\n",
|
|
"<ul>\n",
|
|
" <li>First, create a pandas report with a dataframe or list of values that attempts to answer the prompt.</li>\n",
|
|
" <li>If you get that done, try charting it out using a pandas charting method, like <code>.plot()</code></li>\n",
|
|
" <li>If you get that done, look at how you might make your chart <i>more information dense</i></li>\n",
|
|
" <ul>\n",
|
|
" <li>Increase chart ink area</li>\n",
|
|
" <li>Reduce visual clutter</li>\n",
|
|
" <li>Use color to convey meaning</li>\n",
|
|
" <li>Increase information density with shape, size, color</li>\n",
|
|
" <li>Use callouts to highlight anomalies in your data or points of interest</li>\n",
|
|
" </ul>\n",
|
|
" <li>If you get that done, look at graduating from the <a href=\"https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html\">pandas plotting methods</a> to <a href=\"https://matplotlib.org/\">matplotlib</a>, which is the backend for pandas plotting. This will allow you more control over your plots but the learning curve is fairly steep. Hang in there!</li>\n",
|
|
"</ul>\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<img src=\"https://cdn.wikimg.net/en/strategywiki/images/7/70/SMB3-warpzone.png\" height=\"150\" width=\"150\">\n",
|
|
"\n",
|
|
"<div id='shortcut'></div>\n",
|
|
"<font color='red'><strong>Shortcut cell</font></strong>: if you'd like to bypass the EDA section, please run the cell below to import a pre-cleaned dataset into variable `cs` for charting purposes:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Shortcut cell\n",
|
|
"cs = pd.read_csv('../data/pre_cleaned_data.csv', \n",
|
|
" infer_datetime_format=True, \n",
|
|
" parse_dates=['Date', 'Invoice Date', 'Promised Delivery Date']\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='1a'></div>\n",
|
|
"<h3>1.A</h3>\n",
|
|
"Gross Margin by product group."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='1b'></div>\n",
|
|
"<h3>1.B</h3>\n",
|
|
"Sales by product group, top 10 product groups only."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='1c'></div>\n",
|
|
"<h3>1.C</h3>\n",
|
|
"Sales, by year/month, year over year"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='2a'></div>\n",
|
|
"<h3>2.A</h3>\n",
|
|
"Sum of Sales and sales quantity, by rep, by customer. Top 10 customer gross sales only. Formatted as a data frame, not a chart."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div id='3a'></div>\n",
|
|
"<h3>3.A</h3>\n",
|
|
"Scatter plot of mean Gross Margin vs Gross Sales, by Product Sub Group"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# A:"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.2"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|