|
|
<!doctype html>
|
|
|
<html lang="en">
|
|
|
<head>
|
|
|
<meta charset="utf-8">
|
|
|
|
|
|
<title></title>
|
|
|
|
|
|
<meta name="description" content="">
|
|
|
|
|
|
|
|
|
<meta name="apple-mobile-web-app-capable" content="yes" />
|
|
|
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
|
|
|
|
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
|
|
|
|
|
|
<!-- For syntax highlighting -->
|
|
|
<link rel="stylesheet" href="../../../../lib/css/zenburn.css">
|
|
|
<link rel="stylesheet" href="../../../../lib/css/prism.css">
|
|
|
|
|
|
<link rel="stylesheet" href="../../../../css/reveal.css">
|
|
|
<link rel="stylesheet" href="../../../../css/theme/ga-title.css" id="theme">
|
|
|
|
|
|
<!--[if lt IE 9]>
|
|
|
<script src="lib/js/html5shiv.js"></script>
|
|
|
<![endif]-->
|
|
|
|
|
|
<link rel="stylesheet" type="text/css" href="https://s3.amazonaws.com/python-ga/proxima-nova/fonts.css" />
|
|
|
|
|
|
</head>
|
|
|
|
|
|
<body class="language-javascript">
|
|
|
|
|
|
<div class="reveal">
|
|
|
|
|
|
<!-- Any section element inside of this container is displayed as a slide -->
|
|
|
<div class="slides">
|
|
|
|
|
|
|
|
|
<!--
|
|
|
---
|
|
|
title: Pandas I
|
|
|
type: lesson
|
|
|
duration: "1:00"
|
|
|
creator: [Joseph Nelson](https://twitter.com/josephofiowa)
|
|
|
---
|
|
|
-->
|
|
|
<section id="section" class="level2 separator">
|
|
|
<h2><img src="https://s3.amazonaws.com/python-ga/images/GA_Cog_Medium_White_RGB.png" /></h2>
|
|
|
<h1>
|
|
|
Pandas I
|
|
|
</h1>
|
|
|
<!--
|
|
|
|
|
|
## Overview
|
|
|
This lesson introduces the Pandas library and the beginnings of Exploratory Data Analysis. The majority of the lesson should be spent going through code -- whether that is via Jupyter Slides or a Jupyter Notebook demonstration.
|
|
|
|
|
|
To present this content, begin with `02-pandas-i.md` to introduce Pandas as a library and data integrity. Transition to the Jupyter Notebook to introduce reading in data, column manipulation, filtering and sorting; conclude with exercises.
|
|
|
|
|
|
|
|
|
## Important Notes or Prerequisites
|
|
|
|
|
|
- Review the Iowa liquor dataset [here](https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy)
|
|
|
- Be aware that the dataset you are examining is a **subset** of that dataset: it is only May 2017 and May 2018. **New columns** have been created to delineate: `is_may_2017` and `is_may_2018`. These are demonstrated for the purposes of showing filters.
|
|
|
- There are **Class Questions** littered throughout the notebook. Use as much/little time on these as you see fit relative to how your class is pacing
|
|
|
- There is an **Independent Exercise** at the end of this lesson. It is aspirational to have time to let students work entirely independently on this time-wise, so consider doing a guided code-along or paired programming. Answers are included.
|
|
|
|
|
|
|
|
|
## Learning Objectives
|
|
|
In this lesson, students will:
|
|
|
- Use Pandas to read in a dataset.
|
|
|
- Investigate a dataset's integrity.
|
|
|
- Filter, sort, and manipulate DataFrame series.
|
|
|
|
|
|
|
|
|
## Duration
|
|
|
60 minutes
|
|
|
|
|
|
## Suggested Agenda
|
|
|
|
|
|
| Time | Activity |
|
|
|
| --- | --- |
|
|
|
|
|
|
## Suggested Agenda
|
|
|
|
|
|
| Time | Activity | Purpose |
|
|
|
|-------------|----------|---------|
|
|
|
| 0:00 - 0:03 | Welcome |
|
|
|
| 0:03 - 0:07 | Pandas and EDA |
|
|
|
| 0:07 - 0:15 | Data Sets and Integrity |
|
|
|
| 0:15 - 0:17 | NOTE: Switch to Notebook |
|
|
|
| 0:17 - 0:25 | Basic Pandas |
|
|
|
| 0:25 - 0:35 | Columns |
|
|
|
| 0:35 - 0:44 | Filtering and Sorting |
|
|
|
| 0:44 - 0:58 | Independent Exercise |
|
|
|
| 0:58 - 1:00 | Summary |
|
|
|
|
|
|
## Materials and Preparation
|
|
|
- Send out the presentation link.
|
|
|
- Students will need the data sets and notebook. Consider having a zip file of all notebooks and data sets for the rest of the unit that you hand out at the beginning of this lesson. Alternatively, link them directly in GitHub - remember that they haven't learned GitHub, so you'll need to help them download the files.
|
|
|
- The presentation is also at the top of the Notebook, so students can later reference in one place. Jump down to `Importing Pandas`.
|
|
|
|
|
|
## Differentiation and Extensions
|
|
|
|
|
|
- If students are excelling in the first half, consider deeper discussions surrounding five number summaries, data integrity, off-the-cuff filters and sorts
|
|
|
- If students are struggling, work on the code more heavily than the **Class Questions** portions. Make the Independent Exercises be Collective Exercises (as a class)
|
|
|
|
|
|
## In Class: Materials
|
|
|
- Projector
|
|
|
- Internet connection
|
|
|
- Jupyter Notebooks
|
|
|
- Python3
|
|
|
-->
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="learning-objectives" class="level2">
|
|
|
<h2>Learning Objectives</h2>
|
|
|
<p><em>After this lesson, you will be able to:</em></p>
|
|
|
<ul>
|
|
|
<li>Use Pandas to read in a dataset.</li>
|
|
|
<li>Investigate a dataset’s integrity.</li>
|
|
|
<li>Filter, sort, and manipulate DataFrame series.</li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Talking Points</strong>: This lesson introduces the Pandas library and the beginnings of Exploratory Data Analysis. The majority of the lesson should be spent going through code – whether that is via Jupyter Slides or a Jupyter Notebook demonstration.</p>
|
|
|
<p>To present this content, begin with <code>02-pandas-i.md</code> to introduce Pandas as a library and data integrity. Transition to the Jupyter Notebook to introduce reading in data, column manipulation, filtering and sorting; conclude with exercises.</p>
|
|
|
<strong>Teaching Tips</strong>: - Review the Iowa liquor dataset <a href="https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy">here</a> - Be aware that the dataset you are examining is a <strong>subset</strong> of that dataset: it is only May 2017 and May 2018. <strong>New columns</strong> have been created to delineate: <code>is_may_2017</code> and <code>is_may_2018</code>. These are demonstrated for the purposes of showing filters. - There are <strong>Class Questions</strong> littered throughout the notebook. Use as much/little time on these as you see fit relative to how your class is pacing - There is an <strong>Independent Exercise</strong> at the end of this lesson. It is aspirational to have time to let students work entirely independently on this time-wise, so consider doing a guided code-along or paired programming. Answers are included. - Pause after learning objectives and level-set for what students will get out of the lesson
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="what-is-pandas" class="level2">
|
|
|
<h2>What is Pandas?</h2>
|
|
|
<ul>
|
|
|
<li>A group of adorable bears 🐼🐼🐼</li>
|
|
|
<li>A Python library for data manipulation.</li>
|
|
|
</ul>
|
|
|
<iframe src="https://giphy.com/embed/EatwJZRUIv41G" width="480" height="270" frameborder="0" class="giphy-embed" allowfullscreen>
|
|
|
</iframe>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Get them excited to learn this.</li>
|
|
|
<li>The iframe is just this gif:<img src="https://media.giphy.com/media/EatwJZRUIv41G/giphy.gif" /></li>
|
|
|
<li>Show your favorite Pandas gifs <a href="https://media.giphy.com/media/z6xE1olZ5YP4I/giphy.gif">Seriously</a></li>
|
|
|
<li>Describe exploratory data analysis as an <strong>ongoing</strong> process, and cite an example from your experience.</li>
|
|
|
</ul>
|
|
|
<p><strong>Talking Points</strong>:</p>
|
|
|
<ul>
|
|
|
<li>As we learned, Python libraries are collections of functions and methods that allow us to perform lots of actions without writing as much of our own code.</li>
|
|
|
<li>The pandas library is written specifically for data manipulation and analysis in Python</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="so-pandas-the-library" class="level2">
|
|
|
<h2>So, Pandas the Library</h2>
|
|
|
<p>The Swiss Army Knife of data manipulation!</p>
|
|
|
<p>Pandas:</p>
|
|
|
<ul>
|
|
|
<li>Is <em>the</em> library for exploratory data analysis (EDA).</li>
|
|
|
<li>Formats, wrangles, cleans, and prepares our data.</li>
|
|
|
</ul>
|
|
|
<p>Quick Backstory from 2009:</p>
|
|
|
<ul>
|
|
|
<li>A humble open source project for Panel Data (hence “Pandas”) from Wes McKinney.</li>
|
|
|
<li>Now the most used Python-related tag on Stack Overflow.</li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Explain what you mean by Swiss Army knife as not all students may understand that metaphor</li>
|
|
|
<li>Remind students of the meaning of exploratory data analysis (EDA)</li>
|
|
|
</ul>
|
|
|
<p><strong>Talking Points</strong>:</p>
|
|
|
“Pandas is the most prominent Python library for exploratory data analysis (EDA). The functions Pandas supports are integral to understanding, formatting, and preparing our data. Formally, we use Pandas to investigate, wrangle, munge, and clean our data. Pandas is the Swiss Army Knife of data manipulation!” “Pandas began as a humble open source project for Panel Data (hence”Pandas“) in 2009 by Wes McKinney. It has grown to be the most use Python-related tag on Stack Overflow.” - Pandas is one of the most useful data manipulation libraries. Its utilities, on the outset, replace many things we know how to do in Excel. However, we also produce a script for creating reproducible steps <strong>and</strong> Excel is limited to 1.3M rows. Pandas is not.
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="exploratory-data-analysis-eda" class="level2">
|
|
|
<h2>Exploratory Data Analysis (EDA)</h2>
|
|
|
<p>The process of understanding our dataset and producing our first level of insights.</p>
|
|
|
<p>This includes:</p>
|
|
|
<ul>
|
|
|
<li>Reading in data: “Import dog population.”</li>
|
|
|
<li>Checking data types. “Is the population count in integers?”</li>
|
|
|
<li>Renaming columns: “<code>dog_breed</code> is more helpful than <code>Biological Family</code>”</li>
|
|
|
<li>Joining together data: “Join the dog population data with the cat population data.”</li>
|
|
|
<li>Looking for missing data: “It doesn’t mention corgis.”</li>
|
|
|
<li>And more!</li>
|
|
|
</ul>
|
|
|
<p>Today, we will focus on the most ‘mission critical’ elements of EDA.</p>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Point out from the bulleted list what is mission critical</li>
|
|
|
<li>Time permitting, ask students to share a similar example of a dataset</li>
|
|
|
</ul>
|
|
|
<p><strong>Talking Points</strong>:</p>
|
|
|
<ul>
|
|
|
<li>“Exploratory Data Analysis (EDA) is the process of understanding our dataset, and producing our first level of insights. This includes reading in the data, understanding our data dictionary, checking data types, assessing descriptive statistics, renaming columns, joining together data, looking for missing data, and so much more. That sounds like a lot, but today, we will just focus on the most ‘mission critical’ elements of EDA.”</li>
|
|
|
<li>It’s common to get later in the data science workflow, only to realize unclean data or a feature could be engineered earlier in the process.</li>
|
|
|
<li>Hypothesis-driven EDA is essential to productive EDA – otherwise we will ceaselessly torture our data for answers.</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="quick-review" class="level2">
|
|
|
<h2>Quick Review</h2>
|
|
|
<ul>
|
|
|
<li>Exploratory Data Analysis (EDA) is the process of understanding our dataset, and producing our first level of insights.What does this include?</li>
|
|
|
<li>Pandas is a prominent Python library used for exploratory data analysis</li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Pause to gather students’ answers about what EDA includes.</li>
|
|
|
<li>Answer any clarification questions!</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="what-dataset-are-we-exploring" class="level2">
|
|
|
<h2>What dataset are we exploring?</h2>
|
|
|
<ul>
|
|
|
<li><p>Iowa liquor sales!</p></li>
|
|
|
<li><p>Stores report daily transactions of all alcohol they sell.</p></li>
|
|
|
<li>Iowa makes this data available.
|
|
|
<ul>
|
|
|
<li>It is an excellent, structured dataset for analysis!</li>
|
|
|
</ul></li>
|
|
|
</ul>
|
|
|
<p>Take a look at the data source <a href="https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy">page</a>.</p>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Open this page in a new window.</li>
|
|
|
</ul>
|
|
|
<p><strong>Talking Points</strong>:</p>
|
|
|
<ul>
|
|
|
<li>For today’s Pandas exercises, we will be using a real dataset from the state of Iowa government on liquor sales. Due to state licensing laws, stores must report daily transactions of all alcohol they sell to the Iowa Department of Commerce’s Alcoholic Beverages Division. The state of Iowa makes this data available for analysis – and it is an excellent, structured dataset for our use!</li>
|
|
|
</ul>
|
|
|
<p>[Go down the list for students.]</p>
|
|
|
<p>Let’s take a closer look at the data dictionary, or what is included:</p>
|
|
|
<ul>
|
|
|
<li><strong>Invoice/Item Number</strong> - Concatenated invoice and line number associated with the liquor order. This provides a unique identifier for the individual liquor products included in the store order</li>
|
|
|
<li><strong>Date</strong> - Date of order</li>
|
|
|
<li><strong>Store Number</strong> - Unique number assigned to the store who ordered the liquor.</li>
|
|
|
<li><strong>Store Name</strong> - Name of store who ordered the liquor.</li>
|
|
|
<li><strong>Address</strong> - Address of the store that ordered the liquor</li>
|
|
|
<li><strong>City</strong> - City where the store who ordered the liquor is located</li>
|
|
|
<li><strong>Zip Code</strong> - Zip Code of where the store that ordered is located</li>
|
|
|
<li><strong>Store Location</strong> - Location of store who ordered the liquor. The Address, City, State and Zip Code are geocoded to provide geographic coordinates. Accuracy of geocoding is dependent on how well the address is interpreted and the completeness of the reference data used.</li>
|
|
|
<li><strong>County Number</strong> - Iowa county number for the county where store who ordered the liquor is located</li>
|
|
|
<li><strong>County</strong> - County where the store who ordered the liquor is located</li>
|
|
|
<li><strong>Category</strong> - Category code associated with the liquor ordered</li>
|
|
|
<li><strong>Category Names</strong> - Category of the liquor ordered.</li>
|
|
|
<li><strong>Vendor Number</strong> - The vendor number of the company for the brand of liquor ordered</li>
|
|
|
<li><strong>Vendor Name</strong> - The vendor name of the company for the brand of liquor ordered</li>
|
|
|
<li><strong>Item Name</strong> - Item number for the individual liquor product ordered.</li>
|
|
|
<li><strong>Item Description</strong> - Description of the individual liquor product ordered.</li>
|
|
|
<li><strong>Pack</strong> - The number of bottles in a case for the liquor ordered</li>
|
|
|
<li><strong>Bottle Volume (mL)</strong> - Volume of each liquor bottle ordered in milliliters.</li>
|
|
|
<li><strong>State Bottle Cost</strong> - The amount that Alcoholic Beverages Division paid for each bottle of liquor ordered</li>
|
|
|
<li><strong>State Bottle Retail</strong> - The amount the store paid for each bottle of liquor ordered</li>
|
|
|
<li><strong>Bottles Solde</strong> - The number of bottles of liquor ordered by the store</li>
|
|
|
<li><strong>Sale (Dollars)</strong> - Total cost of liquor order (number of bottles multiplied by the state bottle retail)</li>
|
|
|
<li><strong>Volume Sold (Liters)</strong> - Total volume of liquor ordered in liters. (i.e. (Bottle Volume (ml) x Bottles Sold)/1,000)</li>
|
|
|
<li><strong>Volume Sold (Gallons)</strong> - Total volume of liquor ordered in gallons. (i.e. (Bottle Volume (ml) x Bottles Sold)/3785.411784)</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="discussion-what-could-we-examine" class="level2">
|
|
|
<h2>Discussion: What Could We Examine?</h2>
|
|
|
<ul>
|
|
|
<li><p>What are some potential insights you’d like to uncover given Iowa liquor data?</p></li>
|
|
|
<li><p>What if you are examining it from the standpoint of the Iowa government?</p></li>
|
|
|
<li><p>What if you are a potential liquor store business owner?</p></li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Walk through these questions one by one. Encourage discussion - there’s no wrong answer!</li>
|
|
|
<li>For the standpoint of the Iowa government and business owner, ask why?</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="our-modified-iowa-liquor-dataset" class="level2">
|
|
|
<h2>Our Modified Iowa Liquor Dataset</h2>
|
|
|
<p>The full dataset is all liquor sales from 2012 to present.</p>
|
|
|
<p>There are more than 13 million rows (13,948,103+ at the time of writing)!</p>
|
|
|
<p>We will work with a modified dataset.</p>
|
|
|
<p>Key changes:</p>
|
|
|
<ul>
|
|
|
<li>Only sales from May 2017 and May 2018</li>
|
|
|
<li>Intentionally deleted:
|
|
|
<ul>
|
|
|
<li>A number of values, to practice missing data.</li>
|
|
|
<li>An arbitrary subset of entire observations, to shrink it.</li>
|
|
|
<li>A few columns, to simplify.</li>
|
|
|
</ul></li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>: - The first portion explains why we are using a modified version of this dataset - Describe how messy, long, and sometimes complicated actual datasets can be - Encourage students to conceptualize how they could use the dataset from the State of Iowa’s perspective as well as a prospective liquor store owner. This invests students in potential conclusions from EDA. - Walkthrough the dataset directly on the Iowa gov’t <a href="https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy">page</a>.</p>
|
|
|
<strong>Talking Points</strong>: - The raw dataset itself is nearly 4GB, so we have trimmed the dataset to be <em>just</em> May 2017 and May 2018. Your RAM is welcome! - The State of Iowa may be most interested in incoming tax revenue, and predicting how much to expect - A prospective liquor store owner may be interested in seeing top selling areas in the state – but using a separate (not included) dataset on population to identify regions where per capita consumption is high, yet liquor supply is comparatively low. Moreover, liquor store owners may care most about bottles sales compared to bottle retail prices (all in the dataset!) - Like EDA, data integrity checks are an ongoing process. - Key changes: - Only sales from May 2017 and May 2018 - A number of values have been deliberately deleted. - Practice working with missing data!) - An arbitrary subset of entire observations have been deleted. - Reduce the file size from 119MB to <100MB. - A few columns were removed (invoice, address, vendor number, category as a number, county number)
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="the-first-few-rows" class="level2">
|
|
|
<h2>The First Few Rows</h2>
|
|
|
<p><img src="https://s3.amazonaws.com/ga-instruction/assets/python-fundamentals/dataset-screenshot-1.png" /></p>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Walk through this to familiarize them with how a dataset looks.</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="the-first-few-rows-ctd" class="level2">
|
|
|
<h2>The First Few Rows (Ctd)</h2>
|
|
|
<p><img src="https://s3.amazonaws.com/ga-instruction/assets/python-fundamentals/dataset-screenshot-2.png" /></p>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Walk through this to familiarize them with how a dataset looks.</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="data-integrity" class="level2">
|
|
|
<h2>Data Integrity</h2>
|
|
|
<p>The first thing we check! Assuring our data can be trusted to produce meaningful insights.</p>
|
|
|
<p>Correctly formatted datatypes.</p>
|
|
|
<ul>
|
|
|
<li>“Decimals are floats, not strings.”</li>
|
|
|
</ul>
|
|
|
<p>Representative sample for the underlying population of interest.</p>
|
|
|
<ul>
|
|
|
<li>“Did we sample sales in cities or across the whole state?”</li>
|
|
|
</ul>
|
|
|
<p>Missing Data</p>
|
|
|
<ul>
|
|
|
<li>“Why do we only have even days of the month?”</li>
|
|
|
</ul>
|
|
|
<p>No sampling or human bias.</p>
|
|
|
<ul>
|
|
|
<li>“Did we only consider liquor sales of specific varieties?”</li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Talking Points</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Ask how you might keep data integrity top of mind to maintain a clean data set</li>
|
|
|
<li>Give examples here.</li>
|
|
|
</ul>
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Pause here to ask if students have any questions</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="clean-truth-about-dirty-data" class="level2">
|
|
|
<h2>Clean Truth about Dirty Data</h2>
|
|
|
<ul>
|
|
|
<li><p>Assessing data integrity isn’t a one-stop step.</p></li>
|
|
|
<li><p>Much like EDA itself, it’s an ongoing process!</p></li>
|
|
|
<li><p>We uncover additional potential problems and anomalies to remedy along the way.</p></li>
|
|
|
</ul>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Talking Points</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Ask how you might keep data integrity top of mind to maintain a clean data set</li>
|
|
|
<li>Give examples here.</li>
|
|
|
</ul>
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Give examples here.</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="launch-our-notebook" class="level2">
|
|
|
<h2>Launch our notebook</h2>
|
|
|
<p>We’ll work in the Notebook - We’re fledgling data scientists!</p>
|
|
|
<p>The <code>.ipynb</code> file you will open is called " <code>pandas-i.ipynb</code> ".</p>
|
|
|
<p>Open it up!</p>
|
|
|
<p>Jump down to <code>Importing Pandas</code>.</p>
|
|
|
<aside class="notes">
|
|
|
<p><strong>Teaching Tips</strong>:</p>
|
|
|
<ul>
|
|
|
<li>Make sure everyone gets to the notebook successfully.</li>
|
|
|
<li>Have students assist one another and walk around the room to ensure everyone gets to the notebook successfully</li>
|
|
|
<li>Make sure all students can open and run their Notebooks. It’s only the second time they’ve done so!</li>
|
|
|
<li>The presentation is also at the top of the Notebook, so students can later reference in one place. Jump down to <code>Importing Pandas</code>.</li>
|
|
|
</ul>
|
|
|
</aside>
|
|
|
<hr />
|
|
|
</section>
|
|
|
<section id="additional-resources" class="level2">
|
|
|
<h2>Additional Resources</h2>
|
|
|
<ul>
|
|
|
<li>Pandas <a href="https://pandas.pydata.org/pandas-docs/stable/">documentation</a></li>
|
|
|
<li>DataSchool <a href="http://www.dataschool.io/easier-data-analysis-with-pandas/">30-video series</a> (by a former GA instructor!)</li>
|
|
|
</ul>
|
|
|
</section>
|
|
|
|
|
|
</div>
|
|
|
<footer><span class='slide-number'></span></footer>
|
|
|
</div>
|
|
|
<script src="../../../../lib/js/head.min.js"></script>
|
|
|
<script src="../../../../js/reveal.js"></script>
|
|
|
|
|
|
<script>
|
|
|
|
|
|
var dependencies = [
|
|
|
{ src: '../../../../lib/js/classList.js', condition: function() { return !document.body.classList; } },
|
|
|
{ src: '../../../../plugin/markdown/showdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
|
|
|
{ src: '../../../../plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
|
|
|
{ src: '../../../../plugin/prism/prism.js', async: true, callback: function() { /*hljs.initHighlightingOnLoad();*/ } },
|
|
|
{ src: '../../../../plugin/zoom-js/zoom.js', async: true, condition: function() { return !!document.body.classList; } }
|
|
|
];
|
|
|
|
|
|
if (Reveal.getQueryHash().instructor === 1) {
|
|
|
dependencies.push({ src: '../../../../plugin/notes/notes.js', async: true, condition: function() { return !!document.body.classList; } });
|
|
|
}
|
|
|
// Full list of configuration options available here:
|
|
|
// https://github.com/hakimel/reveal.js#configuration
|
|
|
Reveal.initialize({
|
|
|
controls: true,
|
|
|
progress: true,
|
|
|
history: true,
|
|
|
center: false,
|
|
|
slideNumber: true,
|
|
|
|
|
|
// available themes are in /css/theme
|
|
|
theme: Reveal.getQueryHash().theme || 'default',
|
|
|
|
|
|
// default/cube/page/concave/zoom/linear/fade/none
|
|
|
transition: Reveal.getQueryHash().transition || 'slide',
|
|
|
|
|
|
// Optional libraries used to extend on reveal.js
|
|
|
dependencies: dependencies
|
|
|
});
|
|
|
|
|
|
if (Reveal.getQueryHash().instructor === 1) {
|
|
|
Reveal.configure(dependencies.push({ src: '../../../../plugin/notes/notes.js', async: true, condition: function() { return !!document.body.classList; } }));
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
Reveal.addEventListener('ready', function() {
|
|
|
if (Reveal.getCurrentSlide().classList.contains('separator-subhead')) {
|
|
|
document.getElementById('theme').setAttribute('href', '../../../../css/theme/ga-subhead.css');
|
|
|
} else if (Reveal.getCurrentSlide().classList.contains('separator')) {
|
|
|
document.getElementById('theme').setAttribute('href', '../../../../css/theme/ga-title.css')
|
|
|
} else {
|
|
|
document.getElementById('theme').setAttribute('href', '../../../../css/theme/ga.css');
|
|
|
}
|
|
|
});
|
|
|
|
|
|
Reveal.addEventListener('slidechanged', function(e) {
|
|
|
if (Reveal.getCurrentSlide().classList.contains('separator-subhead')) {
|
|
|
document.getElementById('theme').setAttribute('href', '../../../../css/theme/ga-subhead.css');
|
|
|
} else if (Reveal.getCurrentSlide().classList.contains('separator')) {
|
|
|
document.getElementById('theme').setAttribute('href', '../../../../css/theme/ga-title.css')
|
|
|
} else {
|
|
|
document.getElementById('theme').setAttribute('href', '../../../../css/theme/ga.css');
|
|
|
}
|
|
|
});
|
|
|
</script>
|
|
|
|
|
|
</body>
|
|
|
</html>
|