<!--
---
title: Pandas Datetime
type: lesson
duration: "0:45"
---
-->

##  {.separator}

<h1>Pandas Datetime</h1>

<!--

## Overview
This is an intro to the notebook - use it as an overview to cover the concepts. Budget time accordingly.

## Important Notes or Prerequisites

- There are **Class Questions** throughout the notebook. Spend as much or as little time on these as suits your class's pace.
- This lesson includes high-level slides and a Notebook. To present this content, we recommend beginning directly with the Jupyter Notebook. The student slides contain the wrap-up of the big ideas covered in the notebook.


---

## Learning Objectives
*After this lesson, you will be able to:*

- Handle timeseries data in pandas
- Convert dates and times into a Timestamp object using to_datetime
- Specify input and output format arguments
- Extract components, such as year and day, from a Timestamp object
- Create DatetimeIndex objects, and understand their advantages
- Implement groupby statements for specific segmented analysis
- Use apply functions to clean data with Pandas

## Duration
45 minutes.

---

## Suggested Agenda

| Time | Activity | Purpose |
|-------------|----------|---------|
| 0:00 - 0:03 | Welcome | |
| 0:03 - 0:08 | Datetime Objects | |
| 0:08 - 0:11 | Timestamp and Period Objects | |
| 0:11 - 0:25 | Converting Datetime Objects | |
| 0:25 - 0:30 | Handling Nulls | |
| 0:30 - 0:40 | Exercise | |
| 0:40 - 0:45 | Summary | |

## Materials and Preparation
- Send out the link to the presentation slides, and help students download the Notebook.

## Differentiation and Extensions

- If students are excelling in the first half, consider deeper discussions.
- If students are struggling, focus heavily on the conceptual elements of each portion - the **why**. Note that these lessons are ordered by importance, so even if the latter half is rushed, students will still cover the major points.

-->

---

## Learning Objectives
*After this lesson, you will be able to:*

- Handle timeseries data in Pandas
- Convert dates and times into a Timestamp object using `to_datetime`
- Specify input and output format arguments
- Extract components, such as year and day, from a `Timestamp` object
- Create `DatetimeIndex` objects, and understand their advantages
- Implement `groupby` statements for specific segmented analysis
- Use apply functions to clean data with Pandas

---

## To the Notebook!

We'll start this lesson directly in the Jupyter Notebook, `pandas-datetime.ipynb`, to walk through the what, why, and how all at once.

These slides review the key concepts.

<aside class="notes">

**Teaching Tip**:

- This is an intro to the notebook - use it as an overview to cover the concepts. Budget time accordingly.

</aside>

---

## How Do We Handle Timeseries Data (Dates and Times)?

To handle timeseries data, we must:

- Import the data (usually as a string)
- Convert the data into a `Timestamp` object
- Handle missing values (sometimes)
- Understand how to slice and handle this `Timestamp` object
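
The four steps above can be sketched end to end as follows. This is a minimal illustration rather than the notebook's own code: the CSV contents and the `date` and `price` column names are made up, and dropping unparseable rows is only one of several ways to handle missing values.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the column names and values are made up.
raw_csv = io.StringIO(
    "date,price\n"
    "2018-01-02,10.0\n"
    "2018-01-03,10.5\n"
    "not a date,11.0\n"
    "2018-02-01,12.0\n"
)

# 1. Import the data: the date column arrives as plain strings.
df = pd.read_csv(raw_csv)

# 2. Convert the strings into Timestamp values.
#    errors="coerce" turns anything unparseable into NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# 3. Handle missing values (here we simply drop rows that failed to parse).
df = df.dropna(subset=["date"])

# 4. Slice: with a DatetimeIndex, a partial date string selects a whole month.
df = df.set_index("date").sort_index()
january = df.loc["2018-01"]
print(january)
```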

**Pro tip:** Timeseries information is very common in the financial industry (fintech, trading, etc.).

<aside class="notes">

**Teaching Tip**:

- Do a quick recap and check for understanding.

</aside>

---

## A Note on Delivery

- This unit's lessons will take place in [Jupyter Notebooks](http://jupyter.org/)
- The slides will be an introduction to the lesson (no code, just overview)
- Then, we'll open a notebook and start coding!

<aside class="notes">

**Teaching Tip**:

- We could have made this into a speaker note, but it's helpful to get it out there so everybody's on the same page
- No repl.it for this unit as we'll be in notebooks

</aside>

---

## Why Use Timeseries Data?

**Timestamp** objects in pandas allow us to conduct analysis on _chronological data_.

![](https://i.stack.imgur.com/6w47p.png)

- What's the x-axis unit of this chart?
- What's the ordering of the data?
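
To see why ordering needs real datetime values, here is a small sketch that sorts the same dates as text and as timestamps. The three dates are invented for illustration, and we assume they are in US month/day/year format.

```python
import pandas as pd

# Made-up US-style (month/day/year) dates, stored as text.
dates_as_text = pd.Series(["1/15/2018", "10/3/2017", "2/1/2018"])

# Sorting text compares character by character, so the order is not chronological.
sorted_as_text = dates_as_text.sort_values().tolist()

# After conversion, sorting follows the calendar.
dates = pd.to_datetime(dates_as_text, format="%m/%d/%Y")
sorted_as_dates = dates.sort_values().dt.strftime("%Y-%m-%d").tolist()

# Calendar rules come for free: 2020 is a leap year, so Feb 28 + 1 day is Feb 29.
day_after = pd.Timestamp("2020-02-28") + pd.Timedelta(days=1)

print(sorted_as_text)
print(sorted_as_dates)
print(day_after.date())
```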

<aside class="notes">

**Teaching Tips**:

- Reinforce that chronological data has an inherent order
- This is intuitive to a human, but a computer needs more help
- Timestamp objects handle things like leap years, working days, holidays, etc

</aside>

---

## Key Pandas Function for Converting to Timestamp Objects

```python
# Help output from an older pandas release; the exact arguments vary by
# version (for example, newer releases no longer accept `box`).
Signature: pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
                          utc=None, box=True, format=None, exact=True, unit=None,
                          infer_datetime_format=False, origin='unix', cache=False)
Docstring:
Convert argument to datetime.
```

1. Read in the dataset using `pd.read_csv()`
2. Use `pd.to_datetime(df['myColumn'])`
3. The returned `pd.Series` has a `datetime64` dtype, and each of its elements is a `Timestamp` object!
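
As a rough sketch of these three steps, the snippet below reads a tiny in-memory CSV and converts its one column. The data is invented, `myColumn` is the placeholder name from the slide, and we assume the dates are day-first, which is why an explicit input `format` is passed. The last line shows one way to choose an output format with `strftime`.

```python
import io

import pandas as pd

# Hypothetical day/month/year data standing in for a real CSV file.
raw_csv = io.StringIO("myColumn\n01/02/2018\n15/03/2018\n")

# 1. Read in the dataset; the column is loaded as strings.
df = pd.read_csv(raw_csv)

# 2. Convert, stating the input format explicitly so that 01/02/2018
#    is read as 1 February rather than 2 January.
converted = pd.to_datetime(df["myColumn"], format="%d/%m/%Y")

# 3. Inspect the result: a datetime64 Series whose elements are Timestamps.
print(converted.dtype)
print(type(converted.iloc[0]))

# Output format: render the values back to text in ISO style.
formatted = converted.dt.strftime("%Y-%m-%d")
print(formatted.tolist())
```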

<aside class="notes">

**Teaching Tip**:

- Do a quick recap and check for understanding.

</aside>

---

## Things to Look Out For

**`to_datetime`** allows us to convert from string values to datetime values.

- Most of the time, it works very nicely
- At the end of the day, it's just a string parser
- Keep this in mind - always check the output column for `NaT` values
  - These mark values that `pd.to_datetime` wasn't able to convert when called with `errors='coerce'` (the default, `errors='raise'`, raises an exception instead)
- Make sure you have elegant, automated ways of handling/flagging these scenarios
  - One example may be a separate column flag, and a backfill/forwardfill strategy
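
One possible way to flag and fill unparseable values is sketched below. The sample strings and the `parse_failed` and `filled` column names are hypothetical, and forward-filling is only appropriate when a neighbouring date is a sensible stand-in for the missing one.

```python
import pandas as pd

# Made-up input with one value that cannot be parsed as a date.
raw = pd.Series(["2018-01-01", "2018-01-02", "garbage", "2018-01-04"])

# errors="coerce" returns NaT for the bad value instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

df = pd.DataFrame({"raw": raw, "parsed": parsed})

# Flag the failures in a separate column so they stay visible after filling.
df["parse_failed"] = df["parsed"].isna()
n_failed = int(df["parse_failed"].sum())

# Forward-fill: replace each NaT with the last successfully parsed date.
df["filled"] = df["parsed"].ffill()

print(n_failed)
print(df)
```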

<aside class="notes">

**Teaching Tip**:

- Do a quick recap and check for understanding.

</aside>

---

## Why Does This Matter?

![](https://www.w3resource.com/w3r_images/3NF-1.png)

- Storing datetime information in a database (dataframe) as a string:
  - is very space inefficient
  - doesn't allow us to easily _extract_ information from it (see 3NF image above)
  - doesn't allow us to use linear algebra library (numpy!) advantages
- _Note_: pandas datetime columns are backed by NumPy `datetime64` arrays!
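
The points above can be checked with a short sketch that compares the same values stored as strings and as datetimes. The sample timestamps are invented, and the exact byte counts depend on the pandas version, so only the relative sizes matter here.

```python
import pandas as pd

# The same made-up timestamps, stored first as text and then as datetimes.
as_strings = pd.Series(["2018-01-01 09:30:00", "2018-06-15 16:00:00"] * 1000)
as_datetimes = pd.to_datetime(as_strings, format="%Y-%m-%d %H:%M:%S")

# Space: each datetime is a fixed-width 8-byte value, far smaller than its text form.
string_bytes = as_strings.memory_usage(deep=True)
datetime_bytes = as_datetimes.memory_usage(deep=True)
print(string_bytes, datetime_bytes)

# Extraction: components are derived from the single datetime column via `.dt`.
years = as_datetimes.dt.year
weekdays = as_datetimes.dt.day_name()

# Vectorized arithmetic: subtracting datetimes yields Timedelta values.
elapsed = as_datetimes - as_datetimes.min()
print(years.iloc[0], weekdays.iloc[0], elapsed.iloc[1])
```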

<aside class="notes">

**Teaching Tips**:

- Don't get too hung up on this diagram
- Focus on reduction in redundancy in databases, and how a single Timestamp object allows _derivation_ of all other types (workdays, years, weeks, etc) - from a single object

</aside>

---

## Additional Resources

- Pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/)
- DataSchool [30-video series](http://www.dataschool.io/easier-data-analysis-with-pandas/) (by a former GA instructor!)