{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"## Pandas for EDA\n",
|
|
"by [@josephofiowa](https://twitter.com/josephofiowa)\n",
|
|
" \n",
|
|
"<!---\n",
|
|
"This assignment was developed by Joseph Nelson\n",
|
|
"\n",
|
|
"Questions? Comments?\n",
|
|
"1. Log an issue to this repo to alert me of a problem.\n",
|
|
"2. Suggest an edit yourself by forking this repo, making edits, and submitting a pull request with your changes back to our master branch.\n",
|
|
"3. Hit me up on Slack @sonylnagale\n",
|
|
"--->"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Pandas Unit Lab\n",
|
|
"\n",
|
|
"**Woo!** We've made it to the end of our Pandas Unit. Let's put our skills to the test.\n",
|
|
"\n",
"We're going to explore data from some of the top movies according to IMDB. This is a guided question-and-response lab: some sections ask specific questions, while others are open-ended for you to explore.\n",
|
|
"\n",
"#### Important!!!\n",
|
|
"- <font color=\"red\">**There are two ways to do this lab!**</font>\n",
|
|
" - The first way is to read in a dataset that _has already been pulled from the API and cleaned for you_ (`movies_rated.csv`). This is the recommended 'first-pass' way to do this lab.\n",
|
|
" - _After you have completed the lab using the supplied_ `movies_rated.csv`, you can call the API yourself!\n",
" - Calling the API yourself takes time! Be prepared to parse lots of JSON, read docs, etc. Consider this an optional take-home exercise.\n",
|
|
"\n",
|
|
"In this lab, we will:\n",
"- Use `movie_app.py` to obtain relevant movie rating data\n",
|
|
"- Leverage Pandas to conduct exploratory data analysis, including:\n",
|
|
" - Assess data integrity\n",
|
|
" - Create exploratory visualizations\n",
|
|
" - Produce insights on top actors/actresses across films\n",
|
|
" \n",
"Let's get going!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## The Dataset\n",
|
|
"\n",
|
|
"We'll work with a dataset on the top [IMDB movies](https://www.imdb.com/search/title?count=100&groups=top_1000&sort=user_rating), as rated by IMDB.\n",
|
|
"\n",
|
|
"\n",
|
|
"Specifically, we have a CSV that contains:\n",
|
|
"- IMDB star rating\n",
|
|
"- Movie title\n",
|
|
"- Year\n",
|
|
"- Content rating\n",
|
|
"- Genre\n",
|
|
"- Duration\n",
|
|
"- Gross\n",
|
|
"\n",
|
|
"_[Details available at the above link]_\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Import our necessary libraries"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import pandas as pd\n",
|
|
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
|
|
"import re\n",
|
|
"%matplotlib inline"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Read in the dataset\n",
|
|
"\n",
"First, read the dataset `movies.csv` into a DataFrame called `movies`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"movies = pd.read_csv('../data/movies.csv')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Check the dataset basics\n",
|
|
"\n",
|
|
"Let's first explore our dataset to verify we have what we expect."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Print the first five rows."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>The Shawshank Redemption</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>142</td>\n",
|
|
" <td>1963330</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>The Godfather</td>\n",
|
|
" <td>1972</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>175</td>\n",
|
|
" <td>28341469</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>The Dark Knight</td>\n",
|
|
" <td>2008</td>\n",
|
|
" <td>PG-13</td>\n",
|
|
" <td>Action</td>\n",
|
|
" <td>152</td>\n",
|
|
" <td>1344258</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>The Godfather: Part II</td>\n",
|
|
" <td>1974</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>202</td>\n",
|
|
" <td>134966411</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>Pulp Fiction</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>154</td>\n",
|
|
" <td>1935047</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre duration \\\n",
|
|
"0 The Shawshank Redemption 1994 R Drama 142 \n",
|
|
"1 The Godfather 1972 R Crime 175 \n",
|
|
"2 The Dark Knight 2008 PG-13 Action 152 \n",
|
|
"3 The Godfather: Part II 1974 R Crime 202 \n",
|
|
"4 Pulp Fiction 1994 R Crime 154 \n",
|
|
"\n",
|
|
" gross \n",
|
|
"0 1963330 \n",
|
|
"1 28341469 \n",
|
|
"2 1344258 \n",
|
|
"3 134966411 \n",
|
|
"4 1935047 "
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"How many rows and columns are in the dataset?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"(79, 6)"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies.shape"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"What are the column names?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Index(['title', 'year', 'content_rating', 'genre', 'duration', 'gross'], dtype='object')\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(movies.columns)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"How many unique genres are there?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"12"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies['genre'].nunique()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"How many movies are there per genre?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Crime 16\n",
|
|
"Drama 14\n",
|
|
"Action 11\n",
|
|
"Adventure 9\n",
|
|
"Drama 7\n",
|
|
"Biography 5\n",
|
|
"Animation 5\n",
|
|
"Comedy 4\n",
|
|
"Western 3\n",
|
|
"Mystery 2\n",
|
|
"Horror 2\n",
|
|
"Comedy 1\n",
|
|
"Name: genre, dtype: int64"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies['genre'].value_counts()"
|
|
]
|
|
},
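{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `Drama` and `Comedy` each appear twice in the counts above, which suggests that some labels differ only by stray whitespace (an assumption; the raw values are not shown here). A minimal sketch of how such labels could be normalized, using a small made-up `genres` Series rather than the lab data:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample with a trailing space on some labels\n",
"genres = pd.Series(['Drama', 'Drama ', 'Comedy', 'Comedy ', 'Crime'])\n",
"\n",
"# Before cleaning, the padded labels count as separate categories\n",
"raw_unique = genres.nunique()\n",
"\n",
"# Strip surrounding whitespace, then count again\n",
"clean = genres.str.strip()\n",
"clean_unique = clean.nunique()\n",
"clean_counts = clean.value_counts()\n",
"```\n",
"\n",
"If whitespace is indeed the cause, applying `.str.strip()` to `movies['genre']` would merge the duplicated rows."
]
},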
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Only run the below cells if you've obtained an [API key!](http://www.omdbapi.com/apikey.aspx)<br>Otherwise, proceed to the `importing movies_rated.csv` section below."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Obtain more data (with an API call)!\n",
|
|
"\n",
"- Let's take advantage of our `OmdbAPI` module (stored in `./OmdbAPI.py`, if you'd like to look under the hood) to obtain movie ratings from the OMDB API. This will enable us to answer the question: **How do other publications' scores compare to IMDB ratings?** Specifically, where do Rotten Tomatoes critics most disagree with IMDB reviews?\n",
|
|
"- Using the OmdbAPI module, we will obtain the `Internet Movie Database`, the `Rotten Tomatoes`, and the `Metacritic` reviews on the top rated IMDB movies. We will store these ratings in new columns in a new `movies_rated` DataFrame. We have also stored the file locally at `./data/movies_rated.csv`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import OmdbAPI"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# replace e54ad9e7 with your API key\n",
|
|
"# this may take a minute\n",
|
|
"movies_rated = OmdbAPI.Omdb(movies, 'e54ad9e7').get_ratings()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>The Shawshank Redemption</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>142</td>\n",
|
|
" <td>1963330</td>\n",
|
|
" <td>9.3/10</td>\n",
|
|
" <td>91%</td>\n",
|
|
" <td>80/100</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>The Godfather</td>\n",
|
|
" <td>1972</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>175</td>\n",
|
|
" <td>28341469</td>\n",
|
|
" <td>9.2/10</td>\n",
|
|
" <td>98%</td>\n",
|
|
" <td>100/100</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>The Dark Knight</td>\n",
|
|
" <td>2008</td>\n",
|
|
" <td>PG-13</td>\n",
|
|
" <td>Action</td>\n",
|
|
" <td>152</td>\n",
|
|
" <td>1344258</td>\n",
|
|
" <td>9.0/10</td>\n",
|
|
" <td>94%</td>\n",
|
|
" <td>82/100</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre duration \\\n",
|
|
"0 The Shawshank Redemption 1994 R Drama 142 \n",
|
|
"1 The Godfather 1972 R Crime 175 \n",
|
|
"2 The Dark Knight 2008 PG-13 Action 152 \n",
|
|
"\n",
|
|
" gross Internet Movie Database Rotten Tomatoes Metacritic \n",
|
|
"0 1963330 9.3/10 91% 80/100 \n",
|
|
"1 28341469 9.2/10 98% 100/100 \n",
|
|
"2 1344258 9.0/10 94% 82/100 "
|
|
]
|
|
},
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated.head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Just in case there were movies that the API was unable to get, let's drop nulls."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"movies_rated.dropna(inplace=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Let's convert the ratings to the same float format (a 0-10 scale) using `apply` with some regular expressions. Note the use of `.copy()` when reading from and writing to the same DataFrame, as a precaution."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>The Shawshank Redemption</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>142</td>\n",
|
|
" <td>1963330</td>\n",
|
|
" <td>9.3</td>\n",
|
|
" <td>9.1</td>\n",
|
|
" <td>8.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>The Godfather</td>\n",
|
|
" <td>1972</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>175</td>\n",
|
|
" <td>28341469</td>\n",
|
|
" <td>9.2</td>\n",
|
|
" <td>9.8</td>\n",
|
|
" <td>10.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>The Dark Knight</td>\n",
|
|
" <td>2008</td>\n",
|
|
" <td>PG-13</td>\n",
|
|
" <td>Action</td>\n",
|
|
" <td>152</td>\n",
|
|
" <td>1344258</td>\n",
|
|
" <td>9.0</td>\n",
|
|
" <td>9.4</td>\n",
|
|
" <td>8.2</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre duration \\\n",
|
|
"0 The Shawshank Redemption 1994 R Drama 142 \n",
|
|
"1 The Godfather 1972 R Crime 175 \n",
|
|
"2 The Dark Knight 2008 PG-13 Action 152 \n",
|
|
"\n",
|
|
" gross Internet Movie Database Rotten Tomatoes Metacritic \n",
|
|
"0 1963330 9.3 9.1 8.0 \n",
|
|
"1 28341469 9.2 9.8 10.0 \n",
|
|
"2 1344258 9.0 9.4 8.2 "
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
"# Use raw strings so regex escapes such as \\d are not treated as string escapes\n",
"movies_rated['Rotten Tomatoes'] = movies_rated['Rotten Tomatoes'].copy().apply(lambda x: float(re.match(r'\\d+', x)[0]) / 10)\n",
"movies_rated['Internet Movie Database'] = movies_rated['Internet Movie Database'].copy().apply(lambda x: float(re.match(r'(\\S+)/', x)[1]))\n",
"movies_rated['Metacritic'] = movies_rated['Metacritic'].copy().apply(lambda x: float(re.match(r'(\\S+)/', x)[1]) / 10)\n",
|
|
"movies_rated.head(3)"
|
|
]
|
|
},
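{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same normalization can also be expressed with pandas' vectorized string methods instead of `apply` and `re`. This is only a sketch on a small hand-made frame: the `sample` name is illustrative, and its three rows are copied from the preview above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Illustrative rows in the raw API format\n",
"sample = pd.DataFrame({\n",
"    'Internet Movie Database': ['9.3/10', '9.2/10', '9.0/10'],\n",
"    'Rotten Tomatoes': ['91%', '98%', '94%'],\n",
"    'Metacritic': ['80/100', '100/100', '82/100'],\n",
"})\n",
"\n",
"# Keep the part before the slash; IMDB is already on a 0-10 scale\n",
"sample['Internet Movie Database'] = (\n",
"    sample['Internet Movie Database'].str.split('/').str[0].astype(float)\n",
")\n",
"# Drop the percent sign and rescale 0-100 to 0-10\n",
"sample['Rotten Tomatoes'] = sample['Rotten Tomatoes'].str.rstrip('%').astype(float) / 10\n",
"# Keep the part before the slash and rescale 0-100 to 0-10\n",
"sample['Metacritic'] = sample['Metacritic'].str.split('/').str[0].astype(float) / 10\n",
"```\n",
"\n",
"Vectorized string methods avoid a Python-level lambda per row, which tends to read more clearly on larger frames."
]
},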
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Finally, let's write the cleaned result to a local file so we don't have to call the API again and risk exceeding our daily limit."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"movies_rated.to_csv('./movies_rated.csv', index=False)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Importing `movies_rated.csv`\n",
|
|
"\n",
|
|
"If you just called the API in the previous section, you can skip this and proceed to the `exploratory data analysis` section."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's read in the cleaned, rated `movies_rated.csv` file, which was included with this repo just in case you couldn't call the API."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>The Shawshank Redemption</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>142</td>\n",
|
|
" <td>1963330</td>\n",
|
|
" <td>9.3</td>\n",
|
|
" <td>9.1</td>\n",
|
|
" <td>8.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>The Godfather</td>\n",
|
|
" <td>1972</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>175</td>\n",
|
|
" <td>28341469</td>\n",
|
|
" <td>9.2</td>\n",
|
|
" <td>9.8</td>\n",
|
|
" <td>10.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>The Dark Knight</td>\n",
|
|
" <td>2008</td>\n",
|
|
" <td>PG-13</td>\n",
|
|
" <td>Action</td>\n",
|
|
" <td>152</td>\n",
|
|
" <td>1344258</td>\n",
|
|
" <td>9.0</td>\n",
|
|
" <td>9.4</td>\n",
|
|
" <td>8.2</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre duration \\\n",
|
|
"0 The Shawshank Redemption 1994 R Drama 142 \n",
|
|
"1 The Godfather 1972 R Crime 175 \n",
|
|
"2 The Dark Knight 2008 PG-13 Action 152 \n",
|
|
"\n",
|
|
" gross Internet Movie Database Rotten Tomatoes Metacritic \n",
|
|
"0 1963330 9.3 9.1 8.0 \n",
|
|
"1 28341469 9.2 9.8 10.0 \n",
|
|
"2 1344258 9.0 9.4 8.2 "
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated = pd.read_csv('../data/movies_rated.csv')\n",
|
|
"movies_rated.head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Check our datatypes. Notice anything potentially problematic?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"title object\n",
|
|
"year int64\n",
|
|
"content_rating object\n",
|
|
"genre object\n",
|
|
"duration int64\n",
|
|
"gross int64\n",
|
|
"Internet Movie Database float64\n",
|
|
"Rotten Tomatoes float64\n",
|
|
"Metacritic float64\n",
|
|
"dtype: object"
|
|
]
|
|
},
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated.dtypes"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Exploratory data analysis\n",
|
|
"\n",
|
|
"Let's transition to asking and answering some questions with our data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"What are the top five R-Rated movies?\n",
|
|
"\n",
|
|
"*hint: Boolean filters needed! Then sorting!*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>The Shawshank Redemption</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>142</td>\n",
|
|
" <td>1963330</td>\n",
|
|
" <td>9.3</td>\n",
|
|
" <td>9.1</td>\n",
|
|
" <td>8.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>The Godfather</td>\n",
|
|
" <td>1972</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>175</td>\n",
|
|
" <td>28341469</td>\n",
|
|
" <td>9.2</td>\n",
|
|
" <td>9.8</td>\n",
|
|
" <td>10.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>The Godfather: Part II</td>\n",
|
|
" <td>1974</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>202</td>\n",
|
|
" <td>134966411</td>\n",
|
|
" <td>9.0</td>\n",
|
|
" <td>9.7</td>\n",
|
|
" <td>9.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>Schindler's List</td>\n",
|
|
" <td>1993</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Biography</td>\n",
|
|
" <td>195</td>\n",
|
|
" <td>534858444</td>\n",
|
|
" <td>8.9</td>\n",
|
|
" <td>9.7</td>\n",
|
|
" <td>9.3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>The Good, the Bad and the Ugly</td>\n",
|
|
" <td>1966</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Western</td>\n",
|
|
" <td>178</td>\n",
|
|
" <td>57300000</td>\n",
|
|
" <td>8.9</td>\n",
|
|
" <td>9.7</td>\n",
|
|
" <td>9.0</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre \\\n",
|
|
"0 The Shawshank Redemption 1994 R Drama \n",
|
|
"1 The Godfather 1972 R Crime \n",
|
|
"3 The Godfather: Part II 1974 R Crime \n",
|
|
"5 Schindler's List 1993 R Biography \n",
|
|
"7 The Good, the Bad and the Ugly 1966 R Western \n",
|
|
"\n",
|
|
" duration gross Internet Movie Database Rotten Tomatoes Metacritic \n",
|
|
"0 142 1963330 9.3 9.1 8.0 \n",
|
|
"1 175 28341469 9.2 9.8 10.0 \n",
|
|
"3 202 134966411 9.0 9.7 9.0 \n",
|
|
"5 195 534858444 8.9 9.7 9.3 \n",
|
|
"7 178 57300000 8.9 9.7 9.0 "
|
|
]
|
|
},
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated[movies_rated.content_rating == 'R'].sort_values(by='Internet Movie Database', ascending=False).head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"What is the average Rotten Tomatoes score for the top IMDB films?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"9.087341772151897"
|
|
]
|
|
},
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated['Rotten Tomatoes'].mean()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"What does the five-number summary look like for the top-rated films according to IMDB? Is the distribution skewed?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"count 79.000000\n",
|
|
"mean 8.537975\n",
|
|
"std 0.222056\n",
|
|
"min 8.300000\n",
|
|
"25% 8.400000\n",
|
|
"50% 8.500000\n",
|
|
"75% 8.600000\n",
|
|
"max 9.300000\n",
|
|
"Name: Internet Movie Database, dtype: float64"
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated['Internet Movie Database'].describe()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The average is *slightly* higher than the median, so there's a small positive skew."
|
|
]
|
|
},
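{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing the mean with the median is a quick heuristic. pandas can also compute the sample skewness directly with `Series.skew()`. A small sketch on made-up ratings (the `scores` values are illustrative, not the lab data):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical ratings with one unusually high value\n",
"scores = pd.Series([8.3, 8.4, 8.5, 8.6, 9.3])\n",
"\n",
"# Heuristic check: is the mean pulled above the median?\n",
"mean_above_median = scores.mean() > scores.median()\n",
"\n",
"# Direct measure: positive values indicate a longer right tail\n",
"skewness = scores.skew()\n",
"```\n",
"\n",
"On the lab data, `movies_rated['Internet Movie Database'].skew()` would give the corresponding figure."
]
},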
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Create your own question...then answer it!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>year</th>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>0.145930</td>\n",
|
|
" <td>-0.107644</td>\n",
|
|
" <td>-0.044124</td>\n",
|
|
" <td>-0.479430</td>\n",
|
|
" <td>-0.487070</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>duration</th>\n",
|
|
" <td>0.145930</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>0.098006</td>\n",
|
|
" <td>0.416829</td>\n",
|
|
" <td>-0.088653</td>\n",
|
|
" <td>-0.020531</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>gross</th>\n",
|
|
" <td>-0.107644</td>\n",
|
|
" <td>0.098006</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>0.146099</td>\n",
|
|
" <td>-0.019891</td>\n",
|
|
" <td>-0.038350</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <td>-0.044124</td>\n",
|
|
" <td>0.416829</td>\n",
|
|
" <td>0.146099</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>0.062015</td>\n",
|
|
" <td>0.261009</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <td>-0.479430</td>\n",
|
|
" <td>-0.088653</td>\n",
|
|
" <td>-0.019891</td>\n",
|
|
" <td>0.062015</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>0.765957</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" <td>-0.487070</td>\n",
|
|
" <td>-0.020531</td>\n",
|
|
" <td>-0.038350</td>\n",
|
|
" <td>0.261009</td>\n",
|
|
" <td>0.765957</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" year duration gross \\\n",
|
|
"year 1.000000 0.145930 -0.107644 \n",
|
|
"duration 0.145930 1.000000 0.098006 \n",
|
|
"gross -0.107644 0.098006 1.000000 \n",
|
|
"Internet Movie Database -0.044124 0.416829 0.146099 \n",
|
|
"Rotten Tomatoes -0.479430 -0.088653 -0.019891 \n",
|
|
"Metacritic -0.487070 -0.020531 -0.038350 \n",
|
|
"\n",
|
|
" Internet Movie Database Rotten Tomatoes Metacritic \n",
|
|
"year -0.044124 -0.479430 -0.487070 \n",
|
|
"duration 0.416829 -0.088653 -0.020531 \n",
|
|
"gross 0.146099 -0.019891 -0.038350 \n",
|
|
"Internet Movie Database 1.000000 0.062015 0.261009 \n",
|
|
"Rotten Tomatoes 0.062015 1.000000 0.765957 \n",
|
|
"Metacritic 0.261009 0.765957 1.000000 "
|
|
]
|
|
},
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
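"# How strongly do the IMDB rating and the Rotten Tomatoes rating correlate?\n",
"# Restrict to numeric columns so text columns (title, genre, ...) do not raise an error in newer pandas\n",
"movies_rated.select_dtypes('number').corr()"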
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"**Challenge:** Create a DataFrame that holds the ratio of IMDB rating to Rotten Tomatoes rating. Which film has the highest IMDB : Rotten Tomatoes ratio? The lowest?\n",
|
|
"\n",
|
|
"*[skip this if you are low on time]*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Ratings Ratio</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>1.021978</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>0.938776</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>0.957447</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Ratings Ratio\n",
|
|
"0 1.021978\n",
|
|
"1 0.938776\n",
|
|
"2 0.957447"
|
|
]
|
|
},
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"rr = pd.DataFrame(movies_rated['Internet Movie Database'] / movies_rated['Rotten Tomatoes'], columns=['Ratings Ratio'])\n",
|
|
"rr.head(3)"
|
|
]
|
|
},
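{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative to building a separate DataFrame and merging it back is to add the ratio as a column and look up the extremes with `idxmax` and `idxmin`. This is a sketch under the assumption that a three-row `sample` (values copied from the tables in this notebook) stands in for `movies_rated`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Illustrative stand-in for movies_rated\n",
"sample = pd.DataFrame({\n",
"    'title': ['The Shawshank Redemption', 'The Godfather', 'Forrest Gump'],\n",
"    'Internet Movie Database': [9.3, 9.2, 8.8],\n",
"    'Rotten Tomatoes': [9.1, 9.8, 7.2],\n",
"})\n",
"\n",
"# IMDB rating divided by Rotten Tomatoes rating\n",
"sample['Ratings Ratio'] = sample['Internet Movie Database'] / sample['Rotten Tomatoes']\n",
"\n",
"# Row labels of the largest and smallest ratio, then the matching titles\n",
"highest = sample.loc[sample['Ratings Ratio'].idxmax(), 'title']\n",
"lowest = sample.loc[sample['Ratings Ratio'].idxmin(), 'title']\n",
"```\n",
"\n",
"Keeping the ratio in the same frame means no merge on the index is needed before sorting or filtering."
]
},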
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Top 3 ratings ratio movies (rated higher on IMDB compared to Rotten Tomatoes)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" <th>Ratings Ratio</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>12</th>\n",
|
|
" <td>Forrest Gump</td>\n",
|
|
" <td>1994</td>\n",
|
|
" <td>PG-13</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>142</td>\n",
|
|
" <td>1401164</td>\n",
|
|
" <td>8.8</td>\n",
|
|
" <td>7.2</td>\n",
|
|
" <td>8.2</td>\n",
|
|
" <td>1.222222</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>19</th>\n",
|
|
" <td>Interstellar</td>\n",
|
|
" <td>2014</td>\n",
|
|
" <td>PG-13</td>\n",
|
|
" <td>Adventure</td>\n",
|
|
" <td>169</td>\n",
|
|
" <td>315544750</td>\n",
|
|
" <td>8.6</td>\n",
|
|
" <td>7.1</td>\n",
|
|
" <td>7.4</td>\n",
|
|
" <td>1.211268</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>42</th>\n",
|
|
" <td>The Intouchables</td>\n",
|
|
" <td>2011</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Biography</td>\n",
|
|
" <td>112</td>\n",
|
|
" <td>1059654</td>\n",
|
|
" <td>8.5</td>\n",
|
|
" <td>7.4</td>\n",
|
|
" <td>5.7</td>\n",
|
|
" <td>1.148649</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre duration gross \\\n",
|
|
"12 Forrest Gump 1994 PG-13 Drama 142 1401164 \n",
|
|
"19 Interstellar 2014 PG-13 Adventure 169 315544750 \n",
|
|
"42 The Intouchables 2011 R Biography 112 1059654 \n",
|
|
"\n",
|
|
" Internet Movie Database Rotten Tomatoes Metacritic Ratings Ratio \n",
|
|
"12 8.8 7.2 8.2 1.222222 \n",
|
|
"19 8.6 7.1 7.4 1.211268 \n",
|
|
"42 8.5 7.4 5.7 1.148649 "
|
|
]
|
|
},
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated.merge(rr, left_index=True, right_index=True).sort_values('Ratings Ratio', ascending=False).head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Bottom 3 ratings ratio movies (rated lower on IMBD compared to Rotten Tomatoes)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" <th>Ratings Ratio</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>66</th>\n",
|
|
" <td>Toy Story 3</td>\n",
|
|
" <td>2010</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Animation</td>\n",
|
|
" <td>103</td>\n",
|
|
" <td>499468</td>\n",
|
|
" <td>8.3</td>\n",
|
|
" <td>9.9</td>\n",
|
|
" <td>9.2</td>\n",
|
|
" <td>0.838384</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>74</th>\n",
|
|
" <td>L.A. Confidential</td>\n",
|
|
" <td>1997</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Crime</td>\n",
|
|
" <td>138</td>\n",
|
|
" <td>13182281</td>\n",
|
|
" <td>8.3</td>\n",
|
|
" <td>9.9</td>\n",
|
|
" <td>9.0</td>\n",
|
|
" <td>0.838384</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>63</th>\n",
|
|
" <td>Toy Story</td>\n",
|
|
" <td>1995</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Animation</td>\n",
|
|
" <td>81</td>\n",
|
|
" <td>83471511</td>\n",
|
|
" <td>8.3</td>\n",
|
|
" <td>10.0</td>\n",
|
|
" <td>9.5</td>\n",
|
|
" <td>0.830000</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre duration gross \\\n",
|
|
"66 Toy Story 3 2010 R Animation 103 499468 \n",
|
|
"74 L.A. Confidential 1997 R Crime 138 13182281 \n",
|
|
"63 Toy Story 1995 R Animation 81 83471511 \n",
|
|
"\n",
|
|
" Internet Movie Database Rotten Tomatoes Metacritic Ratings Ratio \n",
|
|
"66 8.3 9.9 9.2 0.838384 \n",
|
|
"74 8.3 9.9 9.0 0.838384 \n",
|
|
"63 8.3 10.0 9.5 0.830000 "
|
|
]
|
|
},
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated.merge(rr, left_index=True, right_index=True).sort_values('Ratings Ratio', ascending=False).tail(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Exploratory data analysis with visualizations\n",
|
|
"\n",
|
|
"For each of these prompts, create a plot to visualize the answer. Consider what plot is *most appropriate* to explore the given prompt.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"What is the relationship between IMDB ratings and Rotten Tomato ratings?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"<matplotlib.axes._subplots.AxesSubplot at 0x7f53393ce198>"
|
|
]
|
|
},
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated.plot(kind='scatter', x='Internet Movie Database', y='Rotten Tomatoes')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"What is the relationship between IMDB rating and movie duration?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"<matplotlib.axes._subplots.AxesSubplot at 0x7f5339084da0>"
|
|
]
|
|
},
|
|
"execution_count": 23,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated.plot(kind='scatter', x='duration', y='Internet Movie Database')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"How many movies are there in each genre category? (Remember to create a plot here)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 24,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"<matplotlib.axes._subplots.AxesSubplot at 0x7f5339006f98>"
|
|
]
|
|
},
|
|
"execution_count": 24,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated['genre'].value_counts().plot(kind='bar', color='dodgerblue')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"What does the distribution of Rotten Tomatoes ratings look like?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"<matplotlib.axes._subplots.AxesSubplot at 0x7f5338f93780>"
|
|
]
|
|
},
|
|
"execution_count": 25,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD8CAYAAAB6paOMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAEKhJREFUeJzt3XuQJWV9xvHvIxhhjQrI4AVYF1IUaigpcbRQEzQihoiCGpNAaQLeNpYaL0lVXJOUmlSlgolRY0xFV0URFRW8oYC64oWkSsAFURcWgwoigrJKSrxFRH/54/TqOMwyPZdzembe76dq6nT36en393Kmefbt7tOdqkKS1K47DV2AJGlYBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcbsPXUAf++67b23YsGHoMiRpVbn00ku/W1VT8623KoJgw4YNbN26degyJGlVSfKNPut5aEiSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhq3Kr5ZLEnjsGHTucu6vWtPPW5ZtzcpjggkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGjS0IkpyW5KYk22Ys+5ckVyX5UpIPJtlrXO1LkvoZ54jg7cCxs5ZtAQ6rqgcB/wO8bIztS5J6GFsQVNWFwM2zln2iqm7rZi8CDhhX+5KkfoY8R/BM4PwB25ckMVAQJPlb4DbgXXewzsYkW5Ns3bFjx+SKk6TGTDwIkpwMPAF4WlXVrtarqs1VNV1V01NTU5MrUJIaM9EnlCU5Fngp8Kiq+vEk25YkzW2cl4+eCXwOODTJ9UmeBbwBuBuwJcnlSd44rvYlSf2MbURQVSfNsfit42pPkrQ4frNYkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklq3ETvPiqpHRs2nbvs27z21OOWfZtyRCBJzTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDVubEGQ5LQkNyXZNmPZPkm2JLm6e917XO1LkvoZ54jg7cCxs5ZtAi6oqkOAC7p5SdKAxhYEVXUhcPOsxScAp3fTpwNPGlf7kqR+Jn2O4F5VdSNA97rfhNuXJM2yYp9HkGQjsBFg/fr1A1cjrX3jeH5Aa1brMxgmPSL4TpL7AHSvN+1qxaraXFXTVTU9NTU1sQIlqTWTDoJzgJO76ZOBD0+4fUnSLOO8fPRM4HPAoUmuT/Is4FTgmCRXA8d085KkAY3tHEFVnbSLt44eV5uSpIXzm8WS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjesVBEkOG3chkqRh9B0RvDHJJUmel2SvsVYkSZqoXkFQVb8DPA04ENia5N1Jjllso0lekuSKJNuSnJlkj8VuS5K0NL3PEVTV1cDfAS8FHgW8PslVSZ6ykAaT7A+8EJiuqsOA3YATF7INSdLy6XuO4EFJXgtsBx4DPLGqHtBNv3YR7e4O7Jlkd2AdcMMitiFJWgZ9RwRvAC4DDq+q51fVZQBVdQOjUUJvVfUt4NXAdcCNwPer6hOz10uyMcnWJFt37NixkCYkSQvQNwgeD7y7qn4CkOROSdYBVNUZC2kwyd7ACcBBwH2BuyZ5+uz1qmpzVU1X1fTU1NRCmpAkLUDfIPgksOeM+XXdssV4LHBNVe2oqp8BHwAeschtSZKWqG8Q7FFVP9w5002vW2Sb1wFHJlmXJMDRjM49SJIG0DcIfpTkiJ0zSR4C/GQxDVbVxcDZjM45fLmrYfNitiVJWrrde673YuCsJDuv7rkP8CeLbbSqXg
G8YrG/L0laPr2CoKo+n+T+wKFAgKu64/uSpFWu74gA4KHAhu53HpyEqnrHWKqSJE1MryBIcgbwW8DlwM+7xQUYBJK0yvUdEUwDD6yqGmcxkqTJ63vV0Dbg3uMsRJI0jL4jgn2BK5NcAvx058KqOn4sVUmSJqZvELxynEVIkobT9/LRzya5H3BIVX2yu8/QbuMtTZI0CX1vQ/0cRt8GflO3aH/gQ+MqSpI0OX1PFj8feCRwC/zyITX7jasoSdLk9D1H8NOqunV0jzjoHijjpaSSJmrDpnOHLmFN6jsi+GySv2H0VLFjgLOAj4yvLEnSpPQNgk3ADkZ3C/1z4DwW+GQySdLK1PeqoV8Ab+5+JElrSN97DV3DHOcEqurgZa9IkjRRC7nX0E57AH8E7LP85UiSJq3XOYKq+t6Mn29V1euAx4y5NknSBPQ9NHTEjNk7MRoh3G0sFUmSJqrvoaF/nTF9G3At8MfLXo0kaeL6XjX0e+MuRJI0jL6Hhv7yjt6vqtcsTzmSpElbyFVDDwXO6eafCFwIfHMcRUmSJmchD6Y5oqp+AJDklcBZVfXscRUmSZqMvreYWA/cOmP+VmDDYhtNsleSs5NclWR7kocvdluSpKXpOyI4A7gkyQcZfcP4ycA7ltDuvwEfq6qnJvkNYN0StiVJWoK+Vw39Y5Lzgd/tFj2jqr6wmAaT3B04Cjil2/at/PpoQ5I0QX1HBDD6V/stVfW2JFNJDqqqaxbR5sGM7mT6tiSHA5cCL6qqH81cKclGYCPA+vXrF9GMtHKM4z7615563LJvU23q+6jKVwAvBV7WLboz8M5Ftrk7cATwn1X1YOBHjG5z/WuqanNVTVfV9NTU1CKbkiTNp+/J4icDxzP6nzZVdQOLv8XE9cD1VXVxN382o2CQJA2gbxDcWlVFdyvqJHddbINV9W3gm0kO7RYdDVy52O1Jkpam7zmC9yV5E7BXkucAz2RpD6n5C+Bd3RVDXweesYRtSZKWoO9VQ6/unlV8C3Ao8PKq2rLYRqvqcn79GQeSpIHMGwRJdgM+XlWPBRb9P39J0so07zmCqvo58OMk95hAPZKkCet7juD/gC8n2UJ35RBAVb1wLFVJkiambxCc2/1IktaYOwyCJOur6rqqOn1SBUmSJmu+cwQf2jmR5P1jrkWSNID5giAzpg8eZyGSpGHMFwS1i2lJ0hox38niw5PcwmhksGc3TTdfVXX3sVYnSRq7OwyCqtptUoVIkobR96ZzkqQ1yiCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaN1gQJNktyReSfHSoGiRJw44IXgRsH7B9SRIDBUGSA4DjgLcM0b4k6VeGGhG8Dvhr4BcDtS9J6sz3hLJll+QJwE1VdWmSR9/BehuBjQDr16+fUHWay4ZN5y77Nq899bhl32ZrxvG5qE1DjAgeCRyf5FrgPcBjkrxz9kpVtbmqpqtqempqatI1SlIzJh4EVfWyqjqgqjYAJwKfqqqnT7oOSdKI3yOQpMZN/BzBTFX1GeAzQ9YgSa1zRCBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDVu0HsNSctlue/N7/MS1BJHBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMmHgRJDkzy6STbk1yR5EWTrkGS9CtD3HTuNuCvquqyJHcDLk2ypaquHKAWSWrexEcEVXVjVV3WTf8A2A7sP+k6JEkjg54jSLIBeDBw8ZB1SFLLBnseQZLfBN4PvLiqbpnj/Y3ARoD169cvup3lvk+92uDfjVoyyIggyZ0ZhcC7quoDc61TVZurarqqpqempiZboCQ1ZIirhgK8FdheVa+ZdPuSpF83xIjgkcCfAo9Jcnn38/gB6pAkMcA5gqr6byCTbleSNDe/WSxJjTMIJKlxBo
EkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDVusOcRqG3e719aORwRSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjRskCJIcm+QrSb6aZNMQNUiSRiYeBEl2A/4D+APggcBJSR446TokSSNDjAgeBny1qr5eVbcC7wFOGKAOSRLDBMH+wDdnzF/fLZMkDWCI5xFkjmV1u5WSjcDGbvaHSb4ya5V9ge8uc21DWmv9gbXXp7XWH1h7fVpr/SGvWlKf7tdnpSGC4HrgwBnzBwA3zF6pqjYDm3e1kSRbq2p6+csbxlrrD6y9Pq21/sDa69Na6w9Mpk9DHBr6PHBIkoOS/AZwInDOAHVIkhhgRFBVtyV5AfBxYDfgtKq6YtJ1SJJGBnlmcVWdB5y3xM3s8rDRKrXW+gNrr09rrT+w9vq01voDE+hTqm53nlaS1BBvMSFJjVvRQZDk0CSXz/i5JcmLZ62TJK/vblfxpSRHDFXvfHr259FJvj9jnZcPVW8fSV6S5Iok25KcmWSPWe/fJcl7u8/n4iQbhqm0vx59OiXJjhmf0bOHqrWPJC/q+nLF7L+37v1Vsw/t1KNPK34/SnJakpuSbJuxbJ8kW5Jc3b3uvYvfPblb5+okJy+5mKpaFT+MTix/G7jfrOWPB85n9P2EI4GLh651if15NPDRoevr2Yf9gWuAPbv59wGnzFrnecAbu+kTgfcOXfcy9OkU4A1D19qzP4cB24B1jM4JfhI4ZNY6q2of6tmnFb8fAUcBRwDbZiz7Z2BTN70JeNUcv7cP8PXude9ueu+l1LKiRwSzHA18raq+MWv5CcA7auQiYK8k95l8eQu2q/6sNrsDeybZndGOOfs7IScAp3fTZwNHJ5nrS4UryXx9Wk0eAFxUVT+uqtuAzwJPnrXOatuH+vRpxauqC4GbZy2eub+cDjxpjl/9fWBLVd1cVf8LbAGOXUotqykITgTOnGP5ar1lxa76A/DwJF9Mcn6S355kUQtRVd8CXg1cB9wIfL+qPjFrtV9+Pt1O+33gnpOscyF69gngD7vDKGcnOXCO91eKbcBRSe6ZZB2jf/3Prne17UN9+gSrZD+a5V5VdSNA97rfHOss++e1KoKg++LZ8cBZc709x7IVfSnUPP25jNHhosOBfwc+NMnaFqI7fnkCcBBwX+CuSZ4+e7U5fnXFfj49+/QRYENVPYjRYYnTWaGqajvwKkb/avwY8EXgtlmrrarPqGefVs1+tAjL/nmtiiBgdMvqy6rqO3O81+uWFSvMLvtTVbdU1Q+76fOAOyfZd9IF9vRY4Jqq2lFVPwM+ADxi1jq//Hy6Qy334PbD4ZVk3j5V1feq6qfd7JuBh0y4xgWpqrdW1RFVdRSj//ZXz1pl1e1D8/Vple1HM31n52G57vWmOdZZ9s9rtQTBSez6MMo5wJ91Vz4cyWgof+PkSluUXfYnyb13HkNP8jBGn9H3JljbQlwHHJlkXVfz0cD2WeucA+y8quGpwKeqO+O1Qs3bp1nHz4+f/f5Kk2S/7nU98BRu/7e36vah+fq0yvajmWbuLycDH55jnY8Dj0uydzeCfVy3bPGGPnPe48z6OkYf4D1mLHsu8NxuOowedPM14MvA9NA1L7E/LwCuYDTcvQh4xNA1z9OfvweuYnTc9gzgLsA/AMd37+/B6BDYV4FLgIOHrnkZ+vRPMz6jTwP3H7rmefrzX8CVXb1Hz/E3t6r2oZ59WvH7EaPwuhH4GaN/5T+L0fmzCxiNcC4A9unWnQbeMuN3n9ntU18FnrHUWvxmsSQ1brUcGpIkjYlBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4/4fbTG3bcqrlZgAAAAASUVORK5CYII=\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated['Rotten Tomatoes'].plot(kind='hist', bins=15)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Bonus\n",
|
|
"\n",
|
|
"There are many things left unexplored! Consider investigating something about gross revenue and genres."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"<matplotlib.axes._subplots.AxesSubplot at 0x7f5338f69f60>"
|
|
]
|
|
},
|
|
"execution_count": 26,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEJCAYAAACZjSCSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAERVJREFUeJzt3XuMZnV9x/H3h4tyUURkoBsuXTUEIaYKXdGW1lYQgzfARqzEGmKo2NRaCE0sElNskyY2qaJNWxXBuuIFBUSoteqK4CWxwi5QQcGCCLguZdcLAdRKwW//eM6aKe7unFn2PGee+b1fyeQ558x5nvPJZHc+c26/k6pCktSuncYOIEkal0UgSY2zCCSpcRaBJDXOIpCkxlkEktQ4i0CSGmcRSFLjLAJJatwuYwfoY999962VK1eOHUOSZsq6det+UFVzC603E0WwcuVK1q5dO3YMSZopSe7qs56HhiSpcRaBJDXOIpCkxlkEktQ4i0CSGmcRSFLjLAJJapxFIEmNswgkqXEzcWfxY/Hr796xn3fXGTv28yRpbO4RSFLjLAJJapxFIEmNswgkqXEWgSQ1ziKQpMZZBJLUOItAkhpnEUhS4ywCSWqcRSBJjbMIJKlxFoEkNc4ikKTGDToMdZI7gQeAR4CHq2pVkn2AjwMrgTuBV1XVj4fMIUnaumnsEbygqp5dVau6+bOBq6rqEOCqbl6SNJIxDg2dCKzuplcDJ42QQZLUGboICvh8knVJTu+W7V9V9wB0r/sNnEGStA1DP6ry6KrakGQ/YE2SW/u+sSuO0wEOPvjgofJJUvMG3SOoqg3d60bgcuAo4N4kKwC6141bee/5VbWqqlbNzc0NGVOSmjZYESTZM8kTN08DLwJuBq4ETu1WOxW4YqgMkqSFDXloaH/g8iSbt/PRqvpskuuATyQ5DbgbOHnADJKkBQxWBFV1B/CsLSz/IXDsUNuVJC2OdxZLUuMsAklqnEUgSY2zCCSpcRaBJDXOIpCkxlkEktQ4i0CSGmcRSFLjLAJJapxFIEmNswgkqXEWgSQ1ziKQpMZZBJLUOItAkhpnEUhS4ywCSWqcRSBJjbMIJKlxFoEkNc4ikKTGWQSS1DiLQJIaZxFIUuMsAklqnEUgSY2zCCSpcRaBJDVu8CJIsnOSG5J8upt/apKvJ7ktyceTPG7oDJKkrZvGHsEZwC3z5v8OOK+qDgF+DJw2hQySpK0YtAiSHAi8FLigmw9wDHBpt8pq4KQhM0iStm3oPYJ3AW8GftHNPwW4r6oe7ubXAwds6Y1JTk+yNsnaTZs2DRxTkto1WBEkeRmwsarWzV+8hVVrS++vqvOralVVrZqbmxskoyQJdhnws48GTkjyEmA3YC8mewh7J9ml2ys4ENgwYAZJ0gIG2yOoqrdU1YFVtRJ4NfDFqnoNcDXwym61U4ErhsogSVrYGPcR/CVwVpLbmZwzuHCEDJKkzpCHhn6pqq4Brumm7wCOmsZ2JUkL885iSWqcRSBJjbMIJKlxFoEkNc4ikKTGWQSS1DiLQJIaZxFIUuMsAklqnEUgSY2zCCSpcRaBJDWuVxEkeebQQSRJ4+i7R/DeJNcm+dMkew+aSJI0Vb2KoKp+B3gNcBCwNslHkxw3aDJJ0lT0PkdQVbcBb2XyYJnfA/4hya1J/mCocJKk4fU9R/AbSc4DbgGOAV5eVYd10+cNmE+SNLC+Tyj7R+D9wDlV9bPNC6tqQ5K3DpJMkjQVfYvgJcDPquoRgCQ7AbtV1U+r6qLB0kmSBtf3HMEXgN3nze/RLZMkzbi+RbBbVT24eaab3mOYSJKkaepbBD9JcuTmmSS/CfxsG+tLkmZE33MEZwKXJNnQza8A/nCYSJKkaepVBFV1XZJnAIcCAW6tqv8dNJkkaSr67hEAPAdY2b3niCRU1YcGSSVJmppeRZDkIuDpwI3AI93iAiwCSZpxffcIVgGHV1UNGUaSNH
19rxq6Gfi1IYNIksbRd49gX+BbSa4Ffr55YVWdsLU3JNkN+DLw+G47l1bVuUmeClwM7ANcD7y2qh7azvySpMeobxG8bTs+++fAMVX1YJJdga8m+XfgLOC8qro4yXuB04D3bMfnS5J2gL7PI/gScCewazd9HZO/5rf1npp3N/Ku3VcxGbH00m75auCkxceWJO0ofYehfj2TX97v6xYdAHyqx/t2TnIjsBFYA3wHuK+qHu5WWd99liRpJH1PFr8ROBq4H375kJr9FnpTVT1SVc8GDgSOAg7b0mpbem+S05OsTbJ206ZNPWNKkharbxH8fP4J3SS7sJVf4FtSVfcB1wDPA/bu3g+TgtiwlfecX1WrqmrV3Nxc301JkhapbxF8Kck5wO7ds4ovAf51W29IMrf5QfdJdgdeyOQJZ1cDr+xWOxW4YnuCS5J2jL5FcDawCbgJeAPwGSbPL96WFcDVSb7B5OTymqr6NJNnHp+V5HbgKcCF2xNckrRj9B107hdMHlX5/r4fXFXfAI7YwvI7mJwvkCQtAX3HGvouWzgnUFVP2+GJJElTtZixhjbbDTiZyZ3BkqQZ1/eGsh/O+/p+Vb2LyY1hkqQZ1/fQ0JHzZndisofwxEESSZKmqu+hoXfMm36YyXATr9rhaSRJU9f3qqEXDB1EkjSOvoeGztrW96vqnTsmjiRp2hZz1dBzgCu7+ZczedbA94YIJUmansU8mObIqnoAIMnbgEuq6o+HCiZJmo6+Q0wcDMx/ithDwModnkaSNHV99wguAq5NcjmTO4xfAXxosFSSpKnpe9XQ33aPmfzdbtHrquqG4WJJkqal76EhgD2A+6vq3cD67iH0kqQZ1/dRlecyGT76Ld2iXYEPDxVKkjQ9ffcIXgGcAPwEoKo24BATkrQs9C2Ch6qq6IaiTrLncJEkSdPUtwg+keR9TJ43/HrgCyziITWSpKWr71VDf989q/h+4FDgr6pqzaDJJElTsWARJNkZ+FxVvRDwl78kLTMLHhqqqkeAnyZ50hTySJKmrO+dxf8D3JRkDd2VQwBV9eeDpJIkTU3fIvi37kuStMxsswiSHFxVd1fV6mkFkiRN10LnCD61eSLJZQNnkSSNYKEiyLzppw0ZRJI0joWKoLYyLUlaJhY6WfysJPcz2TPYvZumm6+q2mvQdJKkwW2zCKpq52kFkSSNYzHPI5AkLUODFUGSg5JcneSWJN9Mcka3fJ8ka5Lc1r0+eagMkqSFDblH8DDwF1V1GPA84I1JDgfOBq6qqkOAq7p5SdJIBiuCqrqnqq7vph8AbgEOAE4ENt+gtho4aagMkqSFTeUcQZKVwBHA14H9q+oemJQFsN9W3nN6krVJ1m7atGkaMSWpSYMXQZInAJcBZ1bV/Qutv1lVnV9Vq6pq1dzc3HABJalxgxZBkl2ZlMBHquqT3eJ7k6zovr8C2DhkBknStg151VCAC4Fbquqd8751JXBqN30qcMVQGSRJC+s7DPX2OBp4LZPnGNzYLTsHeDuTZyCfBtwNnDxgBknSAgYrgqr6Kv9/0Lr5jh1qu5KkxfHOYklqnEUgSY2zCCSpcRaBJDXOIpCkxlkEktQ4i0CSGmcRSFLjLAJJapxFIEmNswgkqXEWgSQ1ziKQpMZZBJLUOItAkhpnEUhS4ywCSWqcRSBJjbMIJKlxFoEkNc4ikKTGWQSS1DiLQJIaZxFIUuMsAklqnEUgSY2zCCSpcRaBJDVusCJI8oEkG5PcPG/ZPknWJLmte33yUNuXJPUz5B7BB4HjH7XsbOCqqjoEuKqblySNaLAiqKovAz961OITgdXd9GrgpKG2L0nqZ9rnCPavqnsAutf9prx9SdKjLNmTxUlOT7I2ydpNmzaNHUeSlq1pF8G9SVYAdK8bt7ZiVZ1fVauqatXc3NzUAkpSa6ZdBFcCp3bTpwJXTHn7kqRHGfLy0Y8BXwMOTbI+yWnA24HjktwGHNfNS5JGtMtQH1xVp2zlW8cOtU1J0uIt2Z
PFkqTpsAgkqXEWgSQ1ziKQpMZZBJLUOItAkhpnEUhS4ywCSWqcRSBJjbMIJKlxgw0xof5+/d079vPuOmPHfp6k5c09AklqnEUgSY2zCCSpcZ4j0II8hyEtb+4RSFLjLAJJapxFIEmNswgkqXEWgSQ1ziKQpMZZBJLUOO8jWKQdfU39EGYh41LnvRNqiXsEktQ4i0CSGmcRSFLjLAJJapxFIEmNswgkqXGjXD6a5Hjg3cDOwAVV9fYxcmj5WOqXzA6Rr8VLUpf6Zb1LPd/WTH2PIMnOwD8BLwYOB05Jcvi0c0iSJsY4NHQUcHtV3VFVDwEXAyeOkEOSxDhFcADwvXnz67tlkqQRjHGOIFtYVr+yUnI6cHo3+2CSb2/n9vYFfrCd7x3TLObulTlnTiHJ4sziz5qcOZu5WUI/70X8Wxwl8w74v3Jon5XGKIL1wEHz5g8ENjx6pao6Hzj/sW4sydqqWvVYP2faZjH3LGYGc0/bLOaexcwwyd1nvTEODV0HHJLkqUkeB7wauHKEHJIkRtgjqKqHk/wZ8Dkml49+oKq+Oe0ckqSJUe4jqKrPAJ+Z0uYe8+Glkcxi7lnMDOaetlnMPYuZoWfuVP3KeVpJUkMcYkKSGresiyDJ8Um+neT2JGePnaePJB9IsjHJzWNn6SvJQUmuTnJLkm8mmYnBD5LsluTaJP/Z5f7rsTP1lWTnJDck+fTYWfpKcmeSm5Lc2PdqlqUgyd5JLk1ya/dv/LfGzrSQJId2P+fNX/cnW78YddkeGuqGsvgv4Dgml6xeB5xSVd8aNdgCkjwfeBD4UFU9c+w8fSRZAayoquuTPBFYB5w0Az/rAHtW1YNJdgW+CpxRVf8xcrQFJTkLWAXsVVUvGztPH0nuBFZV1ZK4h6CvJKuBr1TVBd2VjntU1X1j5+qr+134feC5VXXXltZZznsEMzmURVV9GfjR2DkWo6ruqarru+kHgFuYgbvFa+LBbnbX7mvJ/2WU5EDgpcAFY2dZ7pLsBTwfuBCgqh6apRLoHAt8Z2slAMu7CBzKYgRJVgJHAF8fN0k/3SGWG4GNwJqqmoXc7wLeDPxi7CCLVMDnk6zrRg6YBU8DNgH/0h2KuyDJnmOHWqRXAx/b1grLuQh6DWWhHSfJE4DLgDOr6v6x8/RRVY9U1bOZ3OF+VJIlfTguycuAjVW1buws2+HoqjqSycjDb+wOgy51uwBHAu+pqiOAnwAzcb4RoDuUdQJwybbWW85F0GsoC+0Y3TH2y4CPVNUnx86zWN3u/jXA8SNHWcjRwAnd8faLgWOSfHjcSP1U1YbudSNwOZPDt0vdemD9vD3FS5kUw6x4MXB9Vd27rZWWcxE4lMWUdCddLwRuqap3jp2nryRzSfbupncHXgjcOm6qbauqt1TVgVW1ksm/6S9W1R+NHGtBSfbsLiSgO7TyImDJXxlXVf8NfC/J5sHbjgWW9EUQj3IKCxwWgpHuLJ6GWR3KIsnHgN8H9k2yHji3qi4cN9WCjgZeC9zUHW8HOKe7g3wpWwGs7q6q2An4RFXNzOWYM2Z/4PLJ3wzsAny0qj47bqTe3gR8pPuD8g7gdSPn6SXJHkyumnzDgusu18tHJUn9LOdDQ5KkHiwCSWqcRSBJjbMIJKlxFoEkLTGLGXwyycHdoI83JPlGkpcsdnsWgSQtPR+k/82Nb2Vy6fMRTO4t+efFbswikKQlZkuDTyZ5epLPdmM1fSXJMzavDuzVTT+J7RhBYdneUCZJy8z5wJ9U1W1JnsvkL/9jgLcxGczvTcCeTO6QXxSLQJKWuG5Ax98GLunuzgZ4fPd6CvDBqnpH99Cci5I8s6p6j05rEUjS0rcTcF83Uu6jnUZ3PqGqvpZkN2BfJkOr9/5wSdIS1g3r/t0kJ8NkoMckz+q+fTeTwfBIchiwG5NnKPTmWEOStMTMH3wSuBc4F/gi8B4mgyXuClxcVX+T5HDg/cATmJw4fnNVfX5R27MIJKltHhqSpMZZBJLUOItAkh
pnEUhS4ywCSWqcRSBJjbMIJKlxFoEkNe7/ALs5arW5iNykAAAAAElFTkSuQmCC\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"movies_rated['gross'].plot(kind='hist', bins=15, color='dodgerblue')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>title</th>\n",
|
|
" <th>year</th>\n",
|
|
" <th>content_rating</th>\n",
|
|
" <th>genre</th>\n",
|
|
" <th>duration</th>\n",
|
|
" <th>gross</th>\n",
|
|
" <th>Internet Movie Database</th>\n",
|
|
" <th>Rotten Tomatoes</th>\n",
|
|
" <th>Metacritic</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>17</th>\n",
|
|
" <td>One Flew Over the Cuckoo's Nest</td>\n",
|
|
" <td>1975</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>133</td>\n",
|
|
" <td>665845272</td>\n",
|
|
" <td>8.7</td>\n",
|
|
" <td>9.4</td>\n",
|
|
" <td>8.0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>Schindler's List</td>\n",
|
|
" <td>1993</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Biography</td>\n",
|
|
" <td>195</td>\n",
|
|
" <td>534858444</td>\n",
|
|
" <td>8.9</td>\n",
|
|
" <td>9.7</td>\n",
|
|
" <td>9.3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>13</th>\n",
|
|
" <td>Fight Club</td>\n",
|
|
" <td>1999</td>\n",
|
|
" <td>R</td>\n",
|
|
" <td>Drama</td>\n",
|
|
" <td>139</td>\n",
|
|
" <td>377845905</td>\n",
|
|
" <td>8.8</td>\n",
|
|
" <td>7.9</td>\n",
|
|
" <td>6.6</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" title year content_rating genre \\\n",
|
|
"17 One Flew Over the Cuckoo's Nest 1975 R Drama \n",
|
|
"5 Schindler's List 1993 R Biography \n",
|
|
"13 Fight Club 1999 R Drama \n",
|
|
"\n",
|
|
" duration gross Internet Movie Database Rotten Tomatoes Metacritic \n",
|
|
"17 133 665845272 8.7 9.4 8.0 \n",
|
|
"5 195 534858444 8.9 9.7 9.3 \n",
|
|
"13 139 377845905 8.8 7.9 6.6 "
|
|
]
|
|
},
|
|
"execution_count": 27,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# top 10 grossing films\n",
|
|
"movies_rated.sort_values(by='gross', ascending=False).head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|