PYTHR/unit-6-pandas/08-pandas-datetime/code/pandas-datetime.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div>\n",
    "    <span>\n",
    "    <p align=\"left\">\n",
    "    <img align=\"left\" valign=\"center\" src=\"https://ga-core.s3.amazonaws.com/production/uploads/program/default_image/5734/GA_Stack_Large_RedBlack_RGB.png\" width=\"80px\">\n",
    "    </p>\n",
    "    </span>\n",
    "    <span>\n",
    "        <h1>Pandas for Dates and Times</h1>\n",
    "    </span>\n",
    "</div>\n",
    "\n",
    "\n",
    "Pandas can also be used for times and dates! This is especially helpful for financial analysis, as most measures are with respect to time.\n",
    "\n",
    "<!-- Overview -->\n",
    "<details>\n",
    "    <summary>Overview</summary>\n",
    "    <ul>\n",
    "        <li>In this lesson, we'll continue exploring Pandas with dates and times. Specifically:</li>\n",
    "        <ul>\n",
    "            <li>Converting dates and times into a <code>Timestamp</code> object using <code>to_datetime</code>.</li>\n",
    "            <li>Specifying input and output format arguments</li>\n",
    "            <li>Extracting components, such as year and day, from a <code>Timestamp</code> object</li>\n",
    "            <li>Creating <code>DatetimeIndex</code> objects, and their advantages</li>\n",
    "        </ul>\n",
    "    </ul>\n",
    "</details>\n",
    "\n",
    "<!-- TOC -->\n",
    "<details>\n",
    "    <summary>Table of Contents</summary>\n",
    "    <ul>\n",
    "        <li><a href=\"#import\">Import</a></li>\n",
    "        <li><a href=\"#objects\">Datetime Objects</a></li>\n",
    "        <ul>\n",
    "            <li><a href=\"#timestamp\">Creating Timestamp Objects</a></li>\n",
    "            <li><a href=\"#timestampidx\">Creating an Index of Timestamps</a></li>\n",
    "            <li><a href=\"#period\">Creating Period Objects</a></li>\n",
    "            <li><a href=\"#periodidx\">Creating an Index of Periods</a></li>\n",
    "        </ul>\n",
    "        <li><a href=\"#conversion\">Converting Datetime Objects</a></li>\n",
    "        <ul>\n",
    "            <li><a href=\"#todatetime\">Using .to_datetime()</a></li>\n",
    "            <ul>\n",
    "                <li><a href=\"#nulls\">Working with Nulls</a></li>\n",
    "            </ul>\n",
    "            <li><a href=\"#extracting\">Extracting Components from Datetime Objects</a></li>\n",
    "            <ul>\n",
    "                <li><a href=\"#year\">Year</a></li>\n",
    "                <li><a href=\"#day\">Day</a></li>\n",
    "            </ul>\n",
    "        </ul>\n",
    "        <li><a href=\"#exercise\">Exercise - AdventureWorks</a></li>\n",
    "        <ul>\n",
    "            <li><a href=\"#p_exercise\">Production.Product Exercise</a></li>\n",
    "            <ul>\n",
    "                <li><a href=\"#p_read\">Read in the Dataset</a></li>\n",
    "                <li><a href=\"#p_resetidx\">Reset Index</a></li>\n",
    "                <li><a href=\"#p_types\">Identify Types</a></li>\n",
    "                <li><a href=\"#p_nullcheck\">Check Nulls</a></li>\n",
    "                <li><a href=\"#p_convert\">Convert Dates</a></li>\n",
    "                <li><a href=\"#p_newcols\">Create New Columns</a></li>\n",
    "            </ul>\n",
    "            <li><a href=\"#p_exercise\">Sales.SalesOrderHeader Exercise</a></li>\n",
    "            <ul>\n",
    "                <li><a href=\"#s_read\">Read in the Dataset</a></li>\n",
    "                <li><a href=\"#s_resetidx\">Reset Index</a></li>\n",
    "                <li><a href=\"#s_slice\">Slice Dates</a></li>\n",
    "            </ul>\n",
    "        </ul>\n",
    "    </ul>\n",
    "</details>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"import\"></div>\n",
    "<h2>Import Pandas</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "print(f'Pandas v{pd.__version__}\\nNumpy v{np.__version__}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"objects\"></div>\n",
    "<h2>Datetime Objects</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table>\n",
    "  <tr>\n",
    "    <th>Use</th>\n",
    "    <th>Class</th>\n",
    "    <th>Remarks</th>\n",
    "    <th>How to create</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"2\">Time points</td>\n",
    "    <td>Timestamp</td>\n",
    "    <td>Represents a single timestamp</td>\n",
    "    <td>to_datetime, Timestamp</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>DatetimeIndex</td>\n",
    "    <td>Index of Timestamp</td>\n",
    "    <td>to_datetime, date_range, bdate_range, DatetimeIndex</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"2\">Time spans</td>\n",
    "    <td>Period</td>\n",
    "    <td>Represents a single time span</td>\n",
    "    <td>Period</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>PeriodIndex</td>\n",
    "    <td>Index of Period</td>\n",
    "    <td>period_range, PeriodIndex</td>\n",
    "  </tr>\n",
    "</table>\n",
    "    \n",
    "Above is a list of the possible types of dates and times within pandas. Note that, under the hood, `Timestamp` and `DatetimeIndex` objects are backed by numpy `datetime64` values, while `Period` and `PeriodIndex` objects store integer ordinals together with a frequency.\n",
    "\n",
    "<ul>\n",
    "    <li>Time points</li>\n",
    "    <ul>\n",
    "        <li>The first two objects in the above table, <code>Timestamp</code> and <code>DatetimeIndex</code>, deal with <a href=\"https://pandas.pydata.org/pandas-docs/stable/timeseries.html#converting-to-timestamps\"><b>discrete points in time</b></a>. This will be the focus for this and future labs.</li>\n",
    "    </ul>\n",
    "    <li>Time spans</li>\n",
    "    <ul>\n",
    "        <li>The latter two objects, <code>Period</code> and <code>PeriodIndex</code>, deal with <a href=\"https://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-span-representation\"><b>spans of time</b></a>. We will only touch on these briefly, as most data ingestion tasks (versus creation) are handled by coercion into <code>Timestamp</code> objects. We assume, as is usually the case, that pandas is working on previously existing data (usually stored in a SQL-based RDBMS).</li>\n",
    "    </ul>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"timestamp\"></div>\n",
    "<h3>Creating Timestamp Objects</h3>\n",
    "\n",
    "Let's start by making a `Timestamp` object.\n",
    "\n",
    "```python\n",
    "Init signature: pd.Timestamp(ts_input=<object object at 0x7fc5d75bfe60>, freq=None, tz=None, unit=None, year=None, month=None, day=None, hour=None, minute=None, second=None, microsecond=None, nanosecond=None, tzinfo=None)\n",
    "Docstring:     \n",
    "Pandas replacement for datetime.datetime\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create a `Timestamp` object using the kwargs explicitly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also implicitly cast a string as follows. Note the `T` separator between YYYYMMDD and HH:MM:SS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that, under the hood, a `Timestamp` object is a `datetime64[ns]` numpy object, which has nanosecond resolution and is stored as a signed 64-bit integer. As such, it can cover a span of about 584 years (roughly 1677 through 2262). That's a lot of nanoseconds: 2^64, to be exact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
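  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the three ideas above (keyword construction, string parsing, and 64-bit nanosecond storage), assuming a made-up date and hypothetical variable names:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Build a Timestamp from explicit keyword arguments (illustrative values)\n",
    "ts_kwargs = pd.Timestamp(year=2018, month=1, day=1, hour=13, minute=30)\n",
    "\n",
    "# Build the same instant from an ISO 8601 string; note the T separator\n",
    "ts_string = pd.Timestamp('2018-01-01T13:30:00')\n",
    "print(ts_kwargs == ts_string)\n",
    "\n",
    "# .value is the integer count of nanoseconds since the Unix epoch\n",
    "span_ns = pd.Timestamp.max.value - pd.Timestamp.min.value\n",
    "span_years = span_ns / 1e9 / 86400 / 365.25\n",
    "print(pd.Timestamp.min, pd.Timestamp.max, round(span_years, 1))\n",
    "```\n",
    "\n",
    "The span is computed from the raw integers because subtracting the two `Timestamp` objects directly would overflow the `Timedelta` range."
   ]
  },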
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"timestampidx\"></div>\n",
    "<h3>Creating an Index of Timestamps</h3>\n",
    "\n",
    "We can assign these `Timestamp` objects to the index of our `DataFrame` to create a table that is indexed chronologically. A timeseries database, if you will. Pandas has a helper function for this, `pd.date_range`. We will use three of its arguments:\n",
    "\n",
    "<ol>\n",
    "    <li><code>start</code>: The beginning of the index</li>\n",
    "    <li><code>end</code>: The end of the index</li>\n",
    "    <li><code>freq</code>: The interval for each <code>Timestamp</code></li>\n",
    "</ol>\n",
    "\n",
    "Note that <code>freq</code> takes what is referred to as an <a href=\"https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases\">offset alias</a>, which is one of a pre-loaded set of common frequencies, or <code>Timestamp</code> spans.\n",
    "\n",
    "```python\n",
    "Signature: pd.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)\n",
    "Docstring:\n",
    "Return a fixed frequency DatetimeIndex.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will create an index starting at Jan 1, 2018, and ending at Jan 1, 2019. It does so with `BM`, or <i>business month end frequency</i>. This is the last work day of each month."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then create a series out of this index by specifying it in the <code>index=</code> kwarg of <code>pd.Series</code>:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that our index type is a <code>DatetimeIndex</code>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This gives us the distinct advantage of slicing and indexing by position, just as we would with an automatically generated integer index. Let's take the first 3 rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But there's more! We can even select specific dates in the index, as a string, which returns to us the corresponding value in that 'cell':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"stringslicing\"></div>\n",
    "Finally, we can select <b>ranges</b> of dates, <i>as strings</i>:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
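  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The indexing steps above can be sketched end to end. This is only an illustration: the values are made up, the names `idx` and `s_demo` are hypothetical, and the offset object `pd.offsets.BMonthEnd()` is passed instead of the alias string because the spelling of that alias has changed across pandas versions.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Business month ends between Jan 1, 2018 and Jan 1, 2019\n",
    "idx = pd.date_range(start='2018-01-01', end='2019-01-01', freq=pd.offsets.BMonthEnd())\n",
    "\n",
    "# A Series of placeholder values indexed by those dates\n",
    "s_demo = pd.Series(np.arange(len(idx)), index=idx)\n",
    "print(type(s_demo.index))\n",
    "\n",
    "# Positional slicing: the first 3 rows\n",
    "first_three = s_demo.iloc[:3]\n",
    "\n",
    "# Label lookup with a date string\n",
    "jan_value = s_demo.loc['2018-01-31']\n",
    "\n",
    "# Range selection with partial date strings (both ends inclusive)\n",
    "q1 = s_demo.loc['2018-01':'2018-03']\n",
    "print(q1)\n",
    "```\n",
    "\n",
    "String slices on a `DatetimeIndex` include the end label, unlike positional slices."
   ]
  },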
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"conversion\"></div>\n",
    "<h2>Converting Datetime Objects</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"todatetime\"></div>\n",
    "<h3>Using .to_datetime()</h3>\n",
    "\n",
    "Previously, we manually created `Timestamp` and `Period` objects. This assumes we know the year, day, etc of our input data - nice and clean, ready to convert. \n",
    "\n",
    "Of course, data is never clean, and we'd need to parse the input string to feed the individual keyword arguments to a function like `Timestamp` for it to know how to convert it. What a pain! Surely, there must be a better way to parse these pesky strings!\n",
    "\n",
    "<b>Enter `to_datetime()`.</b>\n",
    "\n",
    "This function is extremely powerful and automatically detects and parses input dates (as strings) and returns the result as a `Timestamp` object. Nice!\n",
    "\n",
    "```python\n",
    "Signature: pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=False)\n",
    "Docstring:\n",
    "Convert argument to datetime.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see if it can convert this wild string correctly, 1:55pm and 24 seconds, January 22nd, 1985."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Amazing! Let's try it out on a `DataFrame` object to see how it fares. First, let's make a `DataFrame` containing a few dates we make up. Note that the 3rd entry is `np.nan`, which represents a null value in our dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's try to convert this column to a `Timestamp` object, using `to_datetime()`. Do we think it will work? Note: we are storing the result `Series` object of the conversion in a variable, `s`, for later use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i>It did!</i><br><br><u>Note:</u>\n",
    "<ul>\n",
    "    <li>We're storing the date as the numpy object, <code>datetime64[ns]</code>, which is the storage object for a <code>Timestamp</code>.</li>\n",
    "    <li>We have assumed a time of midnight, <code>00:00:00</code> for days where no time information is passed to <code>to_datetime()</code>.</li>\n",
    "    <li>We've imputed our null as <code>NaT</code>, which is short for 'Not a Time'</li>\n",
    "</ul>"
   ]
  },
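  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of this conversion, using made-up date strings and hypothetical names (`ts`, `df_demo`, `s_demo`), might look like the following. The strings here are illustrative and are not the ones used in the lesson.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# A single, human-readable string (made up for illustration)\n",
    "ts = pd.to_datetime('January 22, 1985 1:55:24 PM')\n",
    "print(ts)\n",
    "\n",
    "# A made-up column of date strings; the third entry is a null\n",
    "df_demo = pd.DataFrame({'date': ['2018-01-05', '2018-02-17', np.nan, '2018-04-01']})\n",
    "\n",
    "# Convert the whole column at once; the null becomes NaT\n",
    "s_demo = pd.to_datetime(df_demo['date'])\n",
    "print(s_demo)\n",
    "print(s_demo.isna().sum())\n",
    "```\n",
    "\n",
    "The date strings in the column share one format on purpose, since mixed formats within a single column may need extra arguments depending on the pandas version."
   ]
  },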
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"nulls\"></div>\n",
    "<h4>Working with Nulls</h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This brings us to our final point - what happens if our superhero `to_datetime()` function <i>can't parse the input string</i> and arrive at a usable date?\n",
    "\n",
    "This is controlled by the `errors` argument. The default is `errors='raise'`, which raises a `ValueError` and exits the function (stops parsing immediately). What if we just want to stick a null in there and move on?\n",
    "\n",
    "That's what `errors='coerce'` is useful for. If `to_datetime()` can't parse the string, it'll just stick a `NaT` in there instead. In many cases, this is preferable. Make sure to keep an eye on the number of nulls you generate when using this, as it won't warn you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the default for errors kwarg is 'raise'\n"
   ]
  },
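  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between the two settings can be sketched on a small, made-up `Series` (the names `raw`, `coerced`, and `n_new_nulls` are hypothetical):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up input with one value that is not a date\n",
    "raw = pd.Series(['2018-01-05', 'not a date', '2018-03-09'])\n",
    "\n",
    "# errors='raise' (the default) stops at the first unparseable value\n",
    "try:\n",
    "    pd.to_datetime(raw)\n",
    "    raised = False\n",
    "except ValueError:\n",
    "    raised = True\n",
    "print(raised)\n",
    "\n",
    "# errors='coerce' replaces unparseable values with NaT and keeps going\n",
    "coerced = pd.to_datetime(raw, errors='coerce')\n",
    "\n",
    "# Count how many nulls the coercion introduced\n",
    "n_new_nulls = int(coerced.isna().sum() - raw.isna().sum())\n",
    "print(n_new_nulls)\n",
    "```\n",
    "\n",
    "Comparing null counts before and after is one simple way to keep an eye on silent coercions."
   ]
  },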
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"extracting\"></div>\n",
    "<h3>Extracting Components from Datetime Objects</h3>\n",
    "\n",
    "So, we've gotten our messy string values all tidied up using `to_datetime()`. What happens when I want to retrieve the year or day of the data I've stored?\n",
    "\n",
    "Enter <a href=\"https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties\">series datetimelike properties</a>.\n",
    "\n",
    "These are a collection of `Series` object properties that are accessible when the datatype is `Timestamp` or `Period`. Basically, <i>it allows us to extract date information from our date column, when it's stored as a date</i>.\n",
    "\n",
    "We access these properties using the following dot notation:\n",
    "\n",
    "```python\n",
    "pd.Series.dt.<part of the date you want>\n",
    "```\n",
    "\n",
    "If our `Series` object were named `s`, it'd be written as:\n",
    "\n",
    "```python\n",
    "s.dt.<part of the date you want>\n",
    "```"
   ]
  },
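  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small sketch of the accessor, assuming a made-up `Series` named `dates` that contains one null:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up dates; the middle entry is a null and becomes NaT\n",
    "dates = pd.to_datetime(pd.Series(['2018-01-05', None, '2019-12-31']))\n",
    "\n",
    "# Each .dt property returns a new Series aligned with the original\n",
    "years = dates.dt.year\n",
    "months = dates.dt.month\n",
    "print(years)\n",
    "\n",
    "# With a NaT present, the extracted values are floats, because NaN cannot be stored in an integer column\n",
    "print(years.dtype)\n",
    "```\n",
    "\n",
    "The extracted components are plain numbers, so they no longer carry any datetime behavior of their own."
   ]
  },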
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"year\"></div>\n",
    "Let's extract just the `year` of the above `Series` object. What do we expect to see?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"day\"></div>\n",
    "Now you try - retrieve the `day` of the time series. What is the result?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the data type of the resultant conversion? Why does this matter?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b>Best Practices</b>:\n",
    "<ul>\n",
    "    <li>Whenever possible, store dates as <code>Timestamp</code> or <code>Period</code> objects</li>\n",
    "    <li>These datatypes use methods and properties that are memory (space) and compute (time) optimized</li>\n",
    "    <li><i>Only during reporting or extraction</i> should the above properties be used</li>\n",
    "    <li>Components derived from the parent datetime object (year, month, day, and so on) should not be stored alongside it, which reduces redundancy and database size</li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"exercise\"></div>\n",
    "<h2>Exercise - AdventureWorks</h2>\n",
    "<p align=\"right\">\n",
    "<img src=\"http://lh6.ggpht.com/_XjcDyZkJqHg/TPaaRcaysbI/AAAAAAAAAFo/b1U3q-qbTjY/AdventureWorks%20Logo%5B5%5D.png?imgmax=800\">\n",
    "</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_exercise\"></div>\n",
    "<h3>Production.Product</h3>\n",
    "\n",
    "Here's the <i>Production.Product</i> table [data dictionary](https://www.sqldatadictionary.com/AdventureWorks2014/Production.Product.html), which is a description of the fields (columns) in the table (the .csv file we will import below):<br>\n",
    "\n",
    "<details>\n",
    "    <summary>Data Dictionary</summary>\n",
    "    <table>\n",
    "        <tr>\n",
    "            <th>Name</th>\n",
    "            <th>Description</th>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ProductID</td>\n",
    "            <td>Primary key for Product records</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>Name</td>\n",
    "            <td>Name of the product</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ProductNumber</td>\n",
    "            <td>Unique product identification number</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>MakeFlag</td>\n",
    "            <td>0 = Product is purchased, 1 = Product is manufactured in-house.</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>FinishedGoodsFlag</td>\n",
    "            <td>0 = Product is not a salable item. 1 = Product is salable.</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>Color</td>\n",
    "            <td>Product color</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>SafetyStockLevel</td>\n",
    "            <td>Minimum inventory quantity</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ReorderPoint</td>\n",
    "            <td>Inventory level that triggers a purchase order or work order</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>StandardCost</td>\n",
    "            <td>Standard cost of the product [USD]</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ListPrice</td>\n",
    "            <td>Selling price [USD]</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>Size</td>\n",
    "            <td>Product size [units vary, see SizeUnitMeasureCode]</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>SizeUnitMeasureCode</td>\n",
    "            <td>Unit of measure for the Size column</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>WeightUnitMeasureCode</td>\n",
    "            <td>Unit of measure for the Weight column</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>DaysToManufacture</td>\n",
    "            <td>Number of days required to manufacture the product</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ProductLine</td>\n",
    "            <td>R = Road, M = Mountain, T = Touring, S = Standard</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>Class</td>\n",
    "            <td>H = High, M = Medium, L = Low</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>Style</td>\n",
    "            <td>W = Womens, M = Mens, U = Universal</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ProductSubcategoryID</td>\n",
    "            <td>Product is a member of this product subcategory. Foreign key to ProductSubCategory.ProductSubCategoryID</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ProductModelID</td>\n",
    "            <td>Product is a member of this product model. Foreign key to ProductModel.ProductModelID</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>SellStartDate</td>\n",
    "            <td>Date the product was available for sale</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>SellEndDate</td>\n",
    "            <td>Date the product was no longer available for sale</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>DiscontinuedDate</td>\n",
    "            <td>Date the product was discontinued</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>rowguid</td>\n",
    "            <td>ROWGUIDCOL number uniquely identifying the record. Used to support a merge replication sample</td>\n",
    "        </tr>\n",
    "        <tr>\n",
    "            <td>ModifiedDate</td>\n",
    "            <td>Date and time the record was last updated</td>\n",
    "        </tr>\n",
    "    </table>\n",
    "</details>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_read\"></div>\n",
    "<h4>Read in the Dataset</h4>\n",
    "\n",
    "We are using the `read_csv()` method (and the `\\t` separator to specify tab-delimited columns)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# let's check out the first 3 rows again, for old time's sake\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# and the number of rows x cols\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_resetidx\"></div>\n",
    "<h4>Reset Index</h4>\n",
    "\n",
    "Let's bring our `ProductID` column into the index since it's the PK (primary key) of our table and that's where PKs belong as a best practice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
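  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The read-then-index pattern above can be sketched without the real file. Here an in-memory string stands in for the tab-delimited data; the rows are made up, and `demo` is a hypothetical name rather than the exercise's dataframe.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up, tab-delimited text standing in for the real file\n",
    "rows = ['ProductID\\tName\\tSellStartDate', '1\\tWidget\\t2008-04-30', '2\\tGadget\\t2011-05-31']\n",
    "\n",
    "# sep tells read_csv that columns are separated by tabs\n",
    "demo = pd.read_csv(io.StringIO('\\n'.join(rows)), sep='\\t')\n",
    "print(demo.head(3))\n",
    "print(demo.shape)\n",
    "\n",
    "# Move the primary key column into the index\n",
    "demo = demo.set_index('ProductID')\n",
    "print(demo.index)\n",
    "```\n",
    "\n",
    "With a real file, the first argument to `read_csv()` would be its path instead of the `StringIO` object."
   ]
  },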
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_types\"></div>\n",
    "<h4>Identify Types</h4>\n",
    "\n",
    "<ul>\n",
    "    <li>Print out the column data types</li>\n",
    "    <li>Which columns in the <code>prod</code> dataframe are candidates for datetime conversion? Store the column names in a list named <code>datecols</code></li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_nullcheck\"></div>\n",
    "<h4>Check Nulls</h4>\n",
    "\n",
    "<ul>\n",
    "    <li>Report the number of nulls for each column contained in <code>datecols</code></li>\n",
    "    <li>Display this result as a <code>pd.Series</code> object</li>\n",
    "    <li>Make note of anything that might warrant further investigation</li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_convert\"></div>\n",
    "<h4>Convert the Date Columns to <code>Timestamp</code> Objects</h4>\n",
    "\n",
    "Convert each column in <code>datecols</code> to a <code>Timestamp</code> object using <a href=\"#todatetime\"><code>to_datetime()</code></a>.\n",
    "\n",
    "<ul>\n",
    "    <li>Write a <code>for</code> loop to iterate over the columns in <code>datecols</code></li>\n",
    "    <li>Using <a href=\"#todatetime\"><code>to_datetime()</code></a>, convert each column to a <code>Timestamp</code> object</li>\n",
    "    <li>Overwrite each source column with its converted result</li>\n",
    "    <li>Print the data types of the <code>datecols</code> columns to verify the conversion</li>\n",
    "    <li>Print the first 3 rows of the <code>datecols</code> columns to verify the conversion</li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
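  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible shape for the null check and conversion loop, sketched on a tiny made-up dataframe (`demo` and `datecols_demo` are hypothetical stand-ins, not the exercise's objects):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up data with two date-like string columns and one numeric column\n",
    "demo = pd.DataFrame({\n",
    "    'SellStartDate': ['2008-04-30', '2011-05-31'],\n",
    "    'SellEndDate': ['2012-05-29', None],\n",
    "    'ListPrice': [10.0, 20.0],\n",
    "})\n",
    "datecols_demo = ['SellStartDate', 'SellEndDate']\n",
    "\n",
    "# Null counts per date column, returned as a Series\n",
    "print(demo[datecols_demo].isnull().sum())\n",
    "\n",
    "# Convert each column and overwrite the original strings\n",
    "for col in datecols_demo:\n",
    "    demo[col] = pd.to_datetime(demo[col])\n",
    "\n",
    "# Verify the conversion\n",
    "print(demo[datecols_demo].dtypes)\n",
    "print(demo[datecols_demo].head(3))\n",
    "```\n",
    "\n",
    "Checking null counts before converting makes it easier to tell original nulls apart from any introduced later by coercion."
   ]
  },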
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"p_newcols\"></div>\n",
    "<h4>Create New Columns</h4>\n",
    "\n",
    "Using <a href=\"https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties\">series datetimelike properties</a>, create three new columns:\n",
    "\n",
    "<ul>\n",
    "    <li><code>SellStartDate_Year</code>, a column containing the <code>year</code> of the <code>SellStartDate</code> column.</li>\n",
    "    <li><code>SellStartDate_Month</code>, a column containing the <code>month</code> of the <code>SellStartDate</code> column.</li>\n",
    "    <li><code>SellStartDate_Day</code>, a column containing the <code>day</code> of the <code>SellStartDate</code> column.</li>\n",
    "    <li>Print the data types of <code>SellStartDate</code>, <code>SellStartDate_Year</code>, <code>SellStartDate_Month</code>, and <code>SellStartDate_Day</code>.</li>\n",
    "    <li>Print the first 3 rows of <code>SellStartDate</code>, <code>SellStartDate_Year</code>, <code>SellStartDate_Month</code>, and <code>SellStartDate_Day</code>.</li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"exercise_product\"></div>\n",
    "<h3>Sales.SalesOrderDetail</h3>\n",
    "\n",
    "Here's the <i>Sales.SalesOrderDetail</i> table [data dictionary](https://www.sqldatadictionary.com/AdventureWorks2014/Sales.SalesOrderDetail.html), which is a description of the fields (columns) in the table (the .csv file we will import below).<br>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"s_read\"></div>\n",
    "<h3>Read in the Dataset</h3>\n",
    "\n",
    "We are using the `read_csv()` method (and the `\\t` separator to specify tab-delimited columns)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"s_resetidx\"></div>\n",
    "<h3>Reset Index</h3>\n",
    "\n",
    "Using <code>.set_index()</code>, set the <code>sod</code> dataframe to a <a href=\"#timestampidx\"><code>Timestamp</code> index</a>.\n",
    "\n",
    "<ul>\n",
    "    <li>Display the index of the <code>sod</code> dataframe as-is.</li>\n",
    "    <li>Convert <code>ModifiedDate</code> to a <code>Timestamp</code> object.</li>\n",
    "    <li>Use <code>.set_index()</code> to make this new column the dataframe index.</li>\n",
    "    <li>Display the index of the <code>sod</code> dataframe after conversion. Has the type changed?</li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div id=\"s_slice\"></div>\n",
    "<h3>Slicing Dates</h3>\n",
    "\n",
    "Using <a href=\"#stringslicing\">string slicing</a> of the index:\n",
    "<ul>\n",
    "    <li>Find how many <code>SalesOrderID</code>s were processed between March 15th, 2013 and March 20th, 2013</li>\n",
    "    <li>Return your result as an <code>int</code>.</li>\n",
    "</ul>"
   ]
  },
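  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The general pattern of string slicing on a `Timestamp` index and then counting can be sketched on made-up rows. The dataframe `demo` and its values are hypothetical, and counting unique IDs is only one reasonable reading of the question.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up orders; two of the four fall inside the target window\n",
    "demo = pd.DataFrame({\n",
    "    'SalesOrderID': [1, 2, 3, 4],\n",
    "    'ModifiedDate': ['2013-03-14', '2013-03-15', '2013-03-20', '2013-03-21'],\n",
    "})\n",
    "\n",
    "# Convert the date column, then make it a sorted index\n",
    "demo['ModifiedDate'] = pd.to_datetime(demo['ModifiedDate'])\n",
    "demo = demo.set_index('ModifiedDate').sort_index()\n",
    "\n",
    "# Slice by date strings (both ends inclusive) and count unique order IDs\n",
    "n_orders = int(demo.loc['2013-03-15':'2013-03-20', 'SalesOrderID'].nunique())\n",
    "print(n_orders)\n",
    "```\n",
    "\n",
    "Sorting the index first keeps label-based slicing well defined when the source rows are not already in date order."
   ]
  },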
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}