{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "

\n", " \n", "

\n", "
\n", " \n", "

Consumer Sales Lab

\n", "
\n", "
\n", "\n", "Important!!: This lab is fairly challenging and may take longer than 60m to complete. Because of this, we've included a shortcut cell that _skips the EDA section_ of this lab and lets you get right to the analysis section. Work with your instructor and use your best judgement to use your time wisely to focus on the areas you'd like to practice.\n", "\n", "This lab can be conducted in one of two ways:\n", "\n", "- EDA _and_ Analysis\n", "- Analysis only\n", "\n", "The EDA section covers the following topics:\n", "\n", "- Importing data from csvs\n", "- Handling nulls\n", "- Casting different Dtypes\n", "- Complex joining of star-schema tables\n", "\n", "The Analysis section covers the following topics:\n", "\n", "- Feature engineering\n", "- Visualization and Reporting\n", "\n", "
\n", " Table of Contents\n", " \n", "
\n", "
\n", " Background\n", " \n", "
\n", "
\n", " Prompts\n", "
\n", " Your boss, Joanna, has requested a report on the following:\n", "
    \n", "
  1. Product Sales
  2. \n", "
      \n", "
    1. Gross margin analysis by product group.
    2. \n", "
    3. Sales by product group, top 10 product groups only.
    4. \n", "
    5. Sales, by year/month, year over year
    6. \n", "
    \n", "
  3. Sales Reps
  4. \n", "
      \n", "
    1. Sum of Sales and sales quantity, by rep, by customer
    2. \n", "
    \n", "
  5. Supply Chain
  6. \n", "
      \n", "
    1. Inventory vs Lead Time for all products
    2. \n", "
    \n", "
\n", "
\n", "
\n", " Data Dictionary\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TableFieldDescriptionPKFK
Item master.xlsItem NumberForeign key to Sales.Item Number field. Unique identifier for itemYN
Item master.xlsProduct GroupGroup for the product, i.e. Frozen Foods, Deli, etcNN
Item master.xlsProduct LineProduct line, i.e. Food, Drink, etcNN
Item master.xlsProduct Sub GroupDetail field for the Product Group field, i.e. Produce -> Fresh VegetablesNN
Item master.xlsProduct TypeType of product and additional detail at the sub group level, i.e. 'Breakfast Foods'NN
Sales  rep.csvManagerName of managerNN
Sales  rep.csvManager NumberID of managerNN
Sales  rep.csvPathOrder through which sales passes through reps, separated by hyphens. Correlates with Sales Rep ID key.NN
Sales  rep.csvSales Rep NamePrimary sales rep nameNN
Sales  rep.csvSales Rep Name 1Secondary sales rep name (nullable)NN
Sales  rep.csvSales Rep Name 2Tertiary sales rep name (nullable)NN
Sales  rep.csvSales Rep Name 3Quaterinary sales rep name (nullable)NN
Sales  rep.csvSales Rep IDForeign key to Sales. UID for path.NY
Customers.xlsxCustomerName of customerNN
Customers.xlsxCustomer NumberUnique identifier for customer name, keys to Sales.Customer NumberYY
Customers.xlsxCity CodeCity ID, foreign key for City.City CodeNY
Cities.xlsxCityName of cityNN
Cities.xlsxCity CodeID of city nameYY
Cities.xlsxRegionSales region (i.e. USA, Nordic, etc)NN
Cities.xlsxLatitudeLatitude of cityNN
Cities.xlsxLongitudeLongitude of cityNN
Cities.xlsxDescString description of city, including city, state (if applicable), and countryNN
Sales.xlsx%KEYPrimary key of tableYN
Sales.xlsxCostTotal cost of sale for transaction [USD]NN
Sales.xlsxCustomer NumberCustomer number, keys to Customer.Customer NumberNY
Sales.xlsxDateDate of saleNN
Sales.xlsxGrossSalesGross sale for invoice [USD]NN
Sales.xlsxInvoice DateDate of invoiceNN
Sales.xlsxItem DescDescription of invoiced itemNN
Sales.xlsxItem NumberID of invoiced item (product) - not a primary key. Keys to Item.Item NumberNY
Sales.xlsxMarginPercent gross margin of line item saleNN
Sales.xlsxOrder NumberID of the order placedNN
Sales.xlsxPromised Delivery DateAgreed date of deliveryNN
Sales.xlsxSalesGross sale for invoice [USD], less cost of saleNN
Sales.xlsxSales QtyQty of invoiced item sold (see Item Number, Item Desc)NN
Sales.xlsxSales Rep NumberSales rep ID credited with saleNY
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

EDA

\n", "\n", "Before we create our charts/reports for Joanna, we'll need to sanity check our input data. We'll get to the analysis (feature engineering) and visualization and reporting in just a bit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Import Data

\n", "Read in the data. Check the raw file to make sure you understand quote characters, delimiters, and encoding. You will need to use the encoding flag here since we are dealing with international character sets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Conduct any library imports here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read in your sales, cities, customers, item_master, and sales_rep csvs here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Nulls

\n", "Check for nulls and missing values in all imported tables. If you are filling missing values, state your reasoning for dropping and/or imputing data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Dtypes

\n", "Review all imported tables and convert the data types if necessary, according to the rules in the following table:\n", "

\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameDtype
Primary or Foreign Keysint64 or int32
Currencyfloat64
Text fieldsobject (string)
Discrete, non-negative valuesint64 or int32
Datesdatetime64[ns] (Timestamp object)
\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Join

\n", "Join all your tables together and store the joined result into a dataframe named cs. You'll need this for the reporting and visualization section below. \n", "\n", "Use the data dictionary for guidance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Analysis

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Feature Engineering

\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Reporting and Visualization

\n", "Create charts or reports according to the prompts.\n", "\n", "Use your best judgement to create visualizations or reports to best answer the questions. As yourself questions such as:\n", "\n", "\n", "There's no right or wrong answer to these questions. As you solve them, focus on this progression:\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", "Shortcut cell: if you'd like to bypass the EDA section, please run the cell below to import a pre-cleaned dataset into variable `cs` for charting purposes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Shortcut cell\n", "cs = pd.read_csv('../data/pre_cleaned_data.csv', \n", " infer_datetime_format=True, \n", " parse_dates=['Date', 'Invoice Date', 'Promised Delivery Date']\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

1.A

\n", "Gross Margin by product group." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

1.B

\n", "Sales by product group, top 10 product groups only." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

1.C

\n", "Sales, by year/month, year over year" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

2.A

\n", "Sum of Sales and sales quantity, by rep, by customer. Top 10 customer gross sales only. Formatted as a data frame, not a chart." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

3.A

\n", "Scatter plot of mean Gross Margin vs Gross Sales, by Product Sub Group" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A:" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }