PYTHR/unit-6-pandas/hw-5day-4pandas/code/.ipynb_checkpoints/lab-checkpoint.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feature engineering in Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading/Exploring the data\n",
    "\n",
    "Load the iris.csv file from this repo into a pandas dataframe. Take a minute to familiarize yourself with the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Pandas\n",
    "\n",
    "Import the `pandas` library as `pd`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read the `../data/iris.csv` dataset into an object named `iris`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many different species are in this dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are their names?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many samples are there per species?\n",
    "\n",
    "<details><summary>Hint</summary>Use the <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html\"><code>.value_counts()</code></a> method</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Engineering\n",
    "\n",
    "Create a new column called `'sepal_ratio'` which is equal to sepal width / sepal length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a similar column called `'petal_ratio'`: petal width / petal length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create 4 columns that correspond to `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)`, and `petal width (cm)`, only in inches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Apply\n",
    "\n",
    "Create a column called `'encoded_species'`:\n",
    "- 0 for setosa\n",
    "- 1 for versicolor\n",
    "- 2 for virginica\n",
    "\n",
    "\n",
    "<details><summary>Hint 1</summary>\n",
    "Create a dictionary using the species as keys and the numbers 0-2 for values\n",
    "</details>\n",
    "\n",
    "<details><summary>Hint 2</summary>\n",
    "    Use the dictionary in hint 1 with the <code>.apply()</code> method to create the new column\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## March Madness\n",
    "\n",
    "Let's change up the dataset to something different than flowers: March Madness!\n",
    "\n",
    "Read in the dataset `../data/ncaa-seeds.csv` to an object named `seeds`.\n",
    "\n",
    "This dataframe simulates the games that will occur in the first round of the [NCAA basketball tournament](http://www.sportingnews.com/au/ncaa-basketball/news/ncaa-tournament-2017-march-madness-bracket-schedule-matchups-print-a-bracket/1r6cau9sb1xj4131zzhay2dj5g). In the first row, you should see the following:\n",
    "\n",
    "| team_seed | opponent_seed |\n",
    "|-----------|---------------|\n",
    "| 01N       | 16N           |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For team_seed, the 01 is their seed, and N is their division (North). This row is saying the 1st seed in the north division will play the 16th seed (same division).\n",
    "\n",
    "Using the `.apply()` method, create the following new columns:\n",
    "- `team_division`\n",
    "- `opponent_division`\n",
    "\n",
    "The first row of your result should look as follows:\n",
    "\n",
    "| team_seed | opponent_seed | team_division | opponent_division |\n",
    "|-----------|---------------|---------------|-------------------|\n",
    "| 01N       | 16N           | N             | N                 |\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you have the divisions, change the `team_seed` and `opponent_seed` columns to just be the numbers.\n",
    "\n",
    "The first row of your result should look as follows:\n",
    "\n",
    "| team_seed | opponent_seed | team_division | opponent_division |\n",
    "|-----------|---------------|---------------|-------------------|\n",
    "| 1         | 16            | N             | N                 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a new column called seed_delta, which is the difference between the team's seed and their opponent's. \n",
    "\n",
    "The first row of your result should look as follows:\n",
    "\n",
    "| team_seed | opponent_seed | team_division | opponent_division | seed_delta |\n",
    "|-----------|---------------|---------------|-------------------|------------|\n",
    "| 1         | 16            | N             | N                 | -15        |\n",
    "\n",
    "<br>\n",
    "<details><summary>Did you get an error?</summary>\n",
    "team_seed and opponent_seed need to be numerical columns in order for you to perform mathematical operations on them.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}