You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
195 lines
6.5 KiB
195 lines
6.5 KiB
<!--
|
|
---
|
|
title: Plotting with Pandas
|
|
type: lesson
|
|
duration: "1:00"
|
|
---
|
|
-->
|
|
|
|
##  {.separator}
|
|
|
|
<h1>Plotting with Pandas</h1>
|
|
|
|
|
|
<!--
|
|
|
|
## Overview
|
|
This lesson covers plotting with pandas, which serves as a wrapper for matplotlib.
|
|
|
|
## Learning Objectives
|
|
In this lesson, students will:
|
|
- Use pandas to plot from three different datasets
|
|
|
|
## Duration
|
|
60 minutes
|
|
|
|
## Suggested Agenda
|
|
|
|
| Time | Activity |
|
|
| --- | --- |
|
|
|
|
## Suggested Agenda
|
|
|
|
| Time | Activity | Purpose |
|
|
|-------------|----------|---------|
|
|
| 0:00 - 0:03 | Welcome |
|
|
| 0:03 - 0:15 | Slides |
|
|
| 0:15 - 0:17 | NOTE: Switch to Notebook |
|
|
| 0:17 - 0:25 | Line Plots |
|
|
| 0:25 - 0:35 | Bar Plots and Histograms |
|
|
| 0:35 - 0:44 | Scatter Plots |
|
|
| 0:44 - 0:58 | Independent Exercises |
|
|
| 0:58 - 1:00 | Summary |
|
|
|
|
## Materials and Preparation
|
|
- Send out the presentation link.
|
|
- Students will need the data sets and notebook. Consider having a zip file of all notebooks and data sets for the rest of the unit that you hand out at the beginning of this lesson. Alternatively, link them directly in GitHub - remember that they haven't learned GitHub, so you'll need to help them download the files.
|
|
|
|
## Differentiation and Extensions
|
|
|
|
- If students are excelling in the first half, consider deeper discussions surrounding five number summaries, data integrity, off-the-cuff filters and sorts
|
|
- If students are struggling, work on the code more heavily than the **Class Questions** portions. Make the Independent Exercises be Collective Exercises (as a class)
|
|
|
|
## In Class: Materials
|
|
- Projector
|
|
- Internet connection
|
|
- Jupyter Notebooks
|
|
- Python3
|
|
-->
|
|
|
|
|
|
---
|
|
|
|
|
|
<aside class="notes">
|
|
**Talking Points**:
|
|
This lesson introduces the Pandas library and the beginnings of Exploratory Data Analysis. The majority of the lesson should be spent going through code -- whether that is via Jupyter Slides or a Jupyter Notebook demonstration.
|
|
|
|
To present this content, begin with `04-plotting-with-pandas.ipynb` to introduce Pandas as a library and data integrity. Transition to the Jupyter Notebook to introduce reading in data, column manipulation, filtering and sorting; conclude with exercises.
|
|
|
|
**Teaching Tips**:
|
|
- There are **Class Questions** littered throughout the notebook. Use as much/little time on these as you see fit relative to how your class is pacing
|
|
- There is no **Independent Exercise** at the end of this lesson. It is aspirational to have time to let students work entirely independently on this time-wise, so consider doing a guided code-along or paired programming. Use this time to have students set their own challenges.
|
|
- Pause after learning objectives and level-set for what students will get out of the lesson
|
|
</aside>
|
|
|
|
---
|
|
|
|
## A Note on Delivery
|
|
|
|
- This unit's lessons will occur in [jupyter notebooks](http://jupyter.org/)
|
|
- Slides will be an introduction to the lesson (no code, just overview)
|
|
- Then, we'll open a notebook and start coding!
|
|
|
|
<aside class="notes">
|
|
**Teaching Tips**:
|
|
- We could have made this into a speaker note, but it's helpful to get it out there so everybody's on the same page
|
|
- No repl.it for this unit as we'll be in notebooks
|
|
|
|
</aside>
|
|
|
|
---
|
|
|
|
## Plotting with Pandas
|
|
|
|
- Pandas `.plot()` functionality is effectively a wrapper for [matplotlib](https://matplotlib.org/)
|
|
- Matplotlib is a charting library for python and scientific computing
|
|
- It's considered the de-facto standard for charting locally
|
|
- It's best for scientific papers, EDA, and general introspection of data
|
|
- It's not so great for production level charts that are embedded in applications (check out [d3.js](https://d3js.org/)
|
|
|
|
<aside class="notes">
|
|
**Talking Points**:
|
|
|
|
- Talk briefly about where charts are interpreted, and why different tools may be advantageous
|
|
</aside>
|
|
|
|
---
|
|
|
|
## So, Pandas and Matplotlib
|
|
|
|
|
|
Whats a wrapper?
|
|
|
|
- A program that _abstracts_ another program to modify its interface
|
|
|
|
???
|
|
|
|
- Pandas `.plot()` functionality references matplotlib behind the scenes
|
|
- Matplotlib has a reputation for being fairly complex
|
|
- Even for fairly simple charts, you will frequently write loops
|
|
- A fairly plain chart can be 20-30 lines of code
|
|
- Pandas helps us here and most charts can be produced with 1-2 lines of code
|
|
- Some functionality is reduced, but _effort is minimized in most cases_
|
|
|
|
<aside class="notes">
|
|
**Talking Points**:
|
|
|
|
- Encourage students to learn matplotlib on their own time if they wish
|
|
- Many data science shops use matplotlib as a standard
|
|
- It's a bit older and a little hokey, but it's well supported, open source, and generally gets the job done
|
|
- Take some time to talk about the balance between package complexity and utility overall - sometimes a good answer delivered on time beats a perfect answer delivered late
|
|
|
|
</aside>
|
|
|
|
---
|
|
|
|
## Talk Data to Me
|
|
|
|
|
|
We'll be using three data sets for this lesson:
|
|
|
|
- Football Records: International football results from 1872 to 2018
|
|
- Avocado Prices: Historical data on avocado prices and sales volume in multiple US markets
|
|
- Chocolate Bar Ratings: Expert ratings of over 1,700 chocolate bars
|
|
|
|
All datasets have been graciously downloaded from Kaggle.com, and we'll discover that the right visualization can often replace a bit of fancy machine learning, if done properly.
|
|
|
|
<aside class="notes">
|
|
**Talking Points**:
|
|
|
|
- We'll be walking through the data sets during the lesson, feel free to refer to the links there if you wish.
|
|
|
|
</aside>
|
|
|
|
---
|
|
|
|
## Chart Types
|
|
|
|
We'll be covering the following chart types during this lesson:
|
|
|
|
- Time series line charts
|
|
- Categorical bar charts
|
|
- Histograms of single columns
|
|
- Histograms of entire data frames
|
|
- Scatter plots (continuous vs continuous)
|
|
- Scatter matricies (multiple scatter plots in a grid)
|
|
- Scatter plots with class colors for data points
|
|
|
|
<aside class="notes">
|
|
**Teaching Tips**:
|
|
|
|
- This is the tip of the iceberg for plots, that's okay
|
|
- Assure students that the above charts have been selected specifically to cover the majority of cases you'll encounter
|
|
- Take a minute to talk about the common use cases for each of these, as well as the data types they all prefer
|
|
|
|
</aside>
|
|
|
|
---
|
|
|
|
## Let's Go!
|
|
|
|
|
|
- Open up your dataset!
|
|
|
|
<aside class="notes">
|
|
**Teaching Tips**:
|
|
|
|
- Make sure everyone gets to the notebook successfully.
|
|
- Have students assist one another and walk around the room to ensure everyone gets to the notebook successfully
|
|
- Make sure all students can open and run their Notebooks. It's only the second time they've done so!
|
|
- The presentation is also at the top of the Notebook, so students can later reference in one place. Jump down to `Importing Pandas`.
|
|
</aside>
|
|
|
|
---
|