{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Feature engineering in Pandas"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Loading/Exploring the data\n",
|
|
"\n",
|
|
"Load the iris.csv file from this repo into a pandas dataframe. Take a minute to familiarize yourself with the data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Import Pandas\n",
|
|
"\n",
|
|
"Import the `pandas` library as `pd`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import pandas as pd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Read the `../data/iris.csv` dataset into an object named `iris`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"iris = pd.read_csv('../data/iris.csv')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"How many different species are in this dataset?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"3"
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"iris['species'].nunique()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"What are their names?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"array(['setosa', 'versicolor', 'virginica'], dtype=object)"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"iris['species'].unique()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"How many samples are there per species?\n",
|
|
"\n",
|
|
"<details><summary>Hint</summary>Use the <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html\"><code>.value_counts()</code></a> method</details>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
"versicolor    50\n",
"setosa        50\n",
"virginica     50\n",
"Name: species, dtype: int64"
|
|
]
|
|
},
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"iris['species'].value_counts()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Feature Engineering\n",
|
|
"\n",
|
|
"Create a new column called `'sepal_ratio'` which is equal to sepal width / sepal length"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"iris['sepal_ratio'] = iris['sepal width (cm)'] / iris['sepal length (cm)']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Create a similar column called `'petal_ratio'`: petal width / petal length"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"iris['petal_ratio'] = iris['petal width (cm)'] / iris['petal length (cm)']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Create four new columns that hold `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)`, and `petal width (cm)` converted to inches (1 cm is approximately 0.393701 inches)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>sepal length (cm)</th>\n",
|
|
" <th>sepal width (cm)</th>\n",
|
|
" <th>petal length (cm)</th>\n",
|
|
" <th>petal width (cm)</th>\n",
|
|
" <th>species</th>\n",
|
|
" <th>sepal_ratio</th>\n",
|
|
" <th>petal_ratio</th>\n",
|
|
" <th>sepal length (inches)</th>\n",
|
|
" <th>petal length (inches)</th>\n",
|
|
" <th>sepal width (inches)</th>\n",
|
|
" <th>petal width (inches)</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>5.1</td>\n",
|
|
" <td>3.5</td>\n",
|
|
" <td>1.4</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.686275</td>\n",
|
|
" <td>0.142857</td>\n",
|
|
" <td>2.007875</td>\n",
|
|
" <td>0.551181</td>\n",
|
|
" <td>1.377954</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>4.9</td>\n",
|
|
" <td>3.0</td>\n",
|
|
" <td>1.4</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.612245</td>\n",
|
|
" <td>0.142857</td>\n",
|
|
" <td>1.929135</td>\n",
|
|
" <td>0.551181</td>\n",
|
|
" <td>1.181103</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>4.7</td>\n",
|
|
" <td>3.2</td>\n",
|
|
" <td>1.3</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.680851</td>\n",
|
|
" <td>0.153846</td>\n",
|
|
" <td>1.850395</td>\n",
|
|
" <td>0.511811</td>\n",
|
|
" <td>1.259843</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>4.6</td>\n",
|
|
" <td>3.1</td>\n",
|
|
" <td>1.5</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.673913</td>\n",
|
|
" <td>0.133333</td>\n",
|
|
" <td>1.811025</td>\n",
|
|
" <td>0.590552</td>\n",
|
|
" <td>1.220473</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5.0</td>\n",
|
|
" <td>3.6</td>\n",
|
|
" <td>1.4</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.720000</td>\n",
|
|
" <td>0.142857</td>\n",
|
|
" <td>1.968505</td>\n",
|
|
" <td>0.551181</td>\n",
|
|
" <td>1.417324</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
|
|
"0 5.1 3.5 1.4 0.2 \n",
|
|
"1 4.9 3.0 1.4 0.2 \n",
|
|
"2 4.7 3.2 1.3 0.2 \n",
|
|
"3 4.6 3.1 1.5 0.2 \n",
|
|
"4 5.0 3.6 1.4 0.2 \n",
|
|
"\n",
|
|
" species sepal_ratio petal_ratio sepal length (inches) \\\n",
|
|
"0 setosa 0.686275 0.142857 2.007875 \n",
|
|
"1 setosa 0.612245 0.142857 1.929135 \n",
|
|
"2 setosa 0.680851 0.153846 1.850395 \n",
|
|
"3 setosa 0.673913 0.133333 1.811025 \n",
|
|
"4 setosa 0.720000 0.142857 1.968505 \n",
|
|
"\n",
|
|
" petal length (inches) sepal width (inches) petal width (inches) \n",
|
|
"0 0.551181 1.377954 0.07874 \n",
|
|
"1 0.551181 1.181103 0.07874 \n",
|
|
"2 0.511811 1.259843 0.07874 \n",
|
|
"3 0.590552 1.220473 0.07874 \n",
|
|
"4 0.551181 1.417324 0.07874 "
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"iris['sepal length (inches)'] = iris['sepal length (cm)'] * 0.393701\n",
|
|
"iris['petal length (inches)'] = iris['petal length (cm)'] * 0.393701\n",
|
|
"iris['sepal width (inches)'] = iris['sepal width (cm)'] * 0.393701\n",
|
|
"iris['petal width (inches)'] = iris['petal width (cm)'] * 0.393701\n",
|
|
"iris.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Apply\n",
|
|
"\n",
|
|
"Create a column called `'encoded_species'`:\n",
|
|
"- 0 for setosa\n",
|
|
"- 1 for versicolor\n",
|
|
"- 2 for virginica\n",
|
|
"\n",
|
|
"\n",
|
|
"<details><summary>Hint 1</summary>\n",
|
|
"Create a dictionary using the species as keys and the numbers 0-2 for values\n",
|
|
"</details>\n",
|
|
"\n",
|
|
"<details><summary>Hint 2</summary>\n",
" Use the dictionary in hint 1 with the <code>.map()</code> method (or <code>.apply()</code> with a function that looks each species up in the dictionary) to create the new column\n",
|
|
"</details>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>sepal length (cm)</th>\n",
|
|
" <th>sepal width (cm)</th>\n",
|
|
" <th>petal length (cm)</th>\n",
|
|
" <th>petal width (cm)</th>\n",
|
|
" <th>species</th>\n",
|
|
" <th>sepal_ratio</th>\n",
|
|
" <th>petal_ratio</th>\n",
|
|
" <th>sepal length (inches)</th>\n",
|
|
" <th>petal length (inches)</th>\n",
|
|
" <th>sepal width (inches)</th>\n",
|
|
" <th>petal width (inches)</th>\n",
|
|
" <th>encoded_species</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>5.1</td>\n",
|
|
" <td>3.5</td>\n",
|
|
" <td>1.4</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.686275</td>\n",
|
|
" <td>0.142857</td>\n",
|
|
" <td>2.007875</td>\n",
|
|
" <td>0.551181</td>\n",
|
|
" <td>1.377954</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>4.9</td>\n",
|
|
" <td>3.0</td>\n",
|
|
" <td>1.4</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.612245</td>\n",
|
|
" <td>0.142857</td>\n",
|
|
" <td>1.929135</td>\n",
|
|
" <td>0.551181</td>\n",
|
|
" <td>1.181103</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>4.7</td>\n",
|
|
" <td>3.2</td>\n",
|
|
" <td>1.3</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.680851</td>\n",
|
|
" <td>0.153846</td>\n",
|
|
" <td>1.850395</td>\n",
|
|
" <td>0.511811</td>\n",
|
|
" <td>1.259843</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>4.6</td>\n",
|
|
" <td>3.1</td>\n",
|
|
" <td>1.5</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.673913</td>\n",
|
|
" <td>0.133333</td>\n",
|
|
" <td>1.811025</td>\n",
|
|
" <td>0.590552</td>\n",
|
|
" <td>1.220473</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5.0</td>\n",
|
|
" <td>3.6</td>\n",
|
|
" <td>1.4</td>\n",
|
|
" <td>0.2</td>\n",
|
|
" <td>setosa</td>\n",
|
|
" <td>0.720000</td>\n",
|
|
" <td>0.142857</td>\n",
|
|
" <td>1.968505</td>\n",
|
|
" <td>0.551181</td>\n",
|
|
" <td>1.417324</td>\n",
|
|
" <td>0.07874</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
|
|
"0 5.1 3.5 1.4 0.2 \n",
|
|
"1 4.9 3.0 1.4 0.2 \n",
|
|
"2 4.7 3.2 1.3 0.2 \n",
|
|
"3 4.6 3.1 1.5 0.2 \n",
|
|
"4 5.0 3.6 1.4 0.2 \n",
|
|
"\n",
|
|
" species sepal_ratio petal_ratio sepal length (inches) \\\n",
|
|
"0 setosa 0.686275 0.142857 2.007875 \n",
|
|
"1 setosa 0.612245 0.142857 1.929135 \n",
|
|
"2 setosa 0.680851 0.153846 1.850395 \n",
|
|
"3 setosa 0.673913 0.133333 1.811025 \n",
|
|
"4 setosa 0.720000 0.142857 1.968505 \n",
|
|
"\n",
|
|
" petal length (inches) sepal width (inches) petal width (inches) \\\n",
|
|
"0 0.551181 1.377954 0.07874 \n",
|
|
"1 0.551181 1.181103 0.07874 \n",
|
|
"2 0.511811 1.259843 0.07874 \n",
|
|
"3 0.590552 1.220473 0.07874 \n",
|
|
"4 0.551181 1.417324 0.07874 \n",
|
|
"\n",
|
|
" encoded_species \n",
|
|
"0 0 \n",
|
|
"1 0 \n",
|
|
"2 0 \n",
|
|
"3 0 \n",
|
|
"4 0 "
|
|
]
|
|
},
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"species_dict = {\n",
|
|
" 'setosa': 0,\n",
|
|
" 'versicolor': 1,\n",
|
|
" 'virginica': 2\n",
|
|
"}\n",
|
|
"iris['encoded_species'] = iris['species'].map(species_dict)\n",
|
|
"iris.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## March Madness\n",
|
|
"\n",
"Let's switch to a dataset about something other than flowers: March Madness!\n",
|
|
"\n",
|
|
"Read in the dataset `../data/ncaa-seeds.csv` to an object named `seeds`.\n",
|
|
"\n",
|
|
"This dataframe simulates the games that will occur in the first round of the [NCAA basketball tournament](http://www.sportingnews.com/au/ncaa-basketball/news/ncaa-tournament-2017-march-madness-bracket-schedule-matchups-print-a-bracket/1r6cau9sb1xj4131zzhay2dj5g). In the first row, you should see the following:\n",
|
|
"\n",
|
|
"| team_seed | opponent_seed |\n",
|
|
"|-----------|---------------|\n",
|
|
"| 01N | 16N |"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>team_seed</th>\n",
|
|
" <th>opponent_seed</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>01N</td>\n",
|
|
" <td>16N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>02N</td>\n",
|
|
" <td>15N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>03N</td>\n",
|
|
" <td>14N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>04N</td>\n",
|
|
" <td>13N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>05N</td>\n",
|
|
" <td>12N</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" team_seed opponent_seed\n",
|
|
"0 01N 16N\n",
|
|
"1 02N 15N\n",
|
|
"2 03N 14N\n",
|
|
"3 04N 13N\n",
|
|
"4 05N 12N"
|
|
]
|
|
},
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"seeds = pd.read_csv('../data/ncaa-seeds.csv')\n",
|
|
"seeds.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"In `team_seed`, `01` is the team's seed and `N` is its division (North). This row says that the 1st seed in the North division will play the 16th seed in the same division.\n",
|
|
"\n",
|
|
"Using the `.apply()` method, create the following new columns:\n",
|
|
"- `team_division`\n",
|
|
"- `opponent_division`\n",
|
|
"\n",
|
|
"The first row of your result should look as follows:\n",
|
|
"\n",
|
|
"| team_seed | opponent_seed | team_division | opponent_division |\n",
|
|
"|-----------|---------------|---------------|-------------------|\n",
|
|
"| 01N | 16N | N | N |\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>team_seed</th>\n",
|
|
" <th>opponent_seed</th>\n",
|
|
" <th>team_division</th>\n",
|
|
" <th>opponent_division</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>01N</td>\n",
|
|
" <td>16N</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>02N</td>\n",
|
|
" <td>15N</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>03N</td>\n",
|
|
" <td>14N</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>04N</td>\n",
|
|
" <td>13N</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>05N</td>\n",
|
|
" <td>12N</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" team_seed opponent_seed team_division opponent_division\n",
|
|
"0 01N 16N N N\n",
|
|
"1 02N 15N N N\n",
|
|
"2 03N 14N N N\n",
|
|
"3 04N 13N N N\n",
|
|
"4 05N 12N N N"
|
|
]
|
|
},
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"seeds['team_division'] = seeds['team_seed'].apply(lambda div: div[-1])\n",
|
|
"seeds['opponent_division'] = seeds['opponent_seed'].apply(lambda div: div[-1])\n",
|
|
"seeds.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Now that you have the divisions, change the `team_seed` and `opponent_seed` columns so that they contain only the seed numbers, as integers.\n",
|
|
"\n",
|
|
"The first row of your result should look as follows:\n",
|
|
"\n",
|
|
"| team_seed | opponent_seed | team_division | opponent_division |\n",
|
|
"|-----------|---------------|---------------|-------------------|\n",
|
|
"| 1 | 16 | N | N |"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>team_seed</th>\n",
|
|
" <th>opponent_seed</th>\n",
|
|
" <th>team_division</th>\n",
|
|
" <th>opponent_division</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>16</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>15</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>13</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5</td>\n",
|
|
" <td>12</td>\n",
|
|
" <td>N</td>\n",
|
|
" <td>N</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" team_seed opponent_seed team_division opponent_division\n",
|
|
"0 1 16 N N\n",
|
|
"1 2 15 N N\n",
|
|
"2 3 14 N N\n",
|
|
"3 4 13 N N\n",
|
|
"4 5 12 N N"
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"seeds['team_seed'] = seeds['team_seed'].apply(lambda seed: int(seed[:-1]))\n",
|
|
"seeds['opponent_seed'] = seeds['opponent_seed'].apply(lambda seed: int(seed[:-1]))\n",
|
|
"seeds.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Create a new column called `seed_delta`, which is the team's seed minus their opponent's seed.\n",
|
|
"\n",
|
|
"The first row of your result should look as follows:\n",
|
|
"\n",
|
|
"| team_seed | opponent_seed | team_division | opponent_division | seed_delta |\n",
|
|
"|-----------|---------------|---------------|-------------------|------------|\n",
|
|
"| 1 | 16 | N | N | -15 |\n",
|
|
"\n",
|
|
"<br>\n",
|
|
"<details><summary>Did you get an error?</summary>\n",
"<code>team_seed</code> and <code>opponent_seed</code> must be numeric columns before you can perform arithmetic on them.\n",
|
|
"</details>"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"anaconda-cloud": {},
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 1
|
|
}
|
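The final `seed_delta` exercise in the notebook has no solution cell. Here is a minimal sketch of one way to do it, assuming the seed columns were already converted to integers in the previous step. The small inline dataframe is a hypothetical stand-in for `../data/ncaa-seeds.csv`, which is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of ../data/ncaa-seeds.csv,
# with the seed columns already converted to integers.
seeds = pd.DataFrame({
    'team_seed': [1, 2, 3],
    'opponent_seed': [16, 15, 14],
    'team_division': ['N', 'N', 'N'],
    'opponent_division': ['N', 'N', 'N'],
})

# Subtracting one numeric column from another works element-wise,
# so no .apply() call is needed for this step.
seeds['seed_delta'] = seeds['team_seed'] - seeds['opponent_seed']
seeds.head()
```

If the seed columns still held strings such as `01N`, the subtraction would raise a `TypeError`, which is the error the exercise's hint refers to.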