{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas for EDA\n",
"by [@josephofiowa](https://twitter.com/josephofiowa)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas Unit Lab\n",
"\n",
"**Woo!** We've made it to the end of our Pandas Unit. Let's put our skills to the test.\n",
"\n",
"We're going to explore data from some of the top movies according to IMDB. This is a guided question-and-response lab where some areas are specific asks and others are open ended for you to explore.\n",
"\n",
"#### Important!!!\n",
"- **There are two ways to do this lab!**\n",
" - The first way is to read in a dataset that _has already been pulled from the API and cleaned for you_ (`movies_rated.csv`). This is the recommended 'first-pass' way to do this lab.\n",
" - _After you have completed the lab using the supplied_ `movies_rated.csv`, you can call the API yourself!\n",
" - Calling the API yourself takes time! Be prepared to parse lots of JSON, read docs, etc. Consider this a take-home exercise if the students desire.\n",
"\n",
"In this lab, we will:\n",
"- Use `movie_app.py` to obtain relevant movie rating data\n",
"- Leverage Pandas to conduct exploratory data analysis, including:\n",
" - Assess data integrity\n",
" - Create exploratory visualizations\n",
" - Produce insights on top actors/actresses across films\n",
" \n",
"Let's get going!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Dataset\n",
"\n",
"We'll work with a dataset on the top [IMDB movies](https://www.imdb.com/search/title?count=100&groups=top_1000&sort=user_rating), as rated by IMDB.\n",
"\n",
"\n",
"Specifically, we have a CSV that contains:\n",
"- IMDB star rating\n",
"- Movie title\n",
"- Year\n",
"- Content rating\n",
"- Genre\n",
"- Duration\n",
"- Gross\n",
"\n",
"_[Details available at the above link]_\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import our necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import re\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in the dataset\n",
"\n",
"First, read in the dataset, called `movies.csv`, into a DataFrame called `movies`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"movies = pd.read_csv('../data/movies.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check the dataset basics\n",
"\n",
"Let's first explore our dataset to verify we have what we expect."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print the first five rows."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" title year content_rating genre duration \\\n",
"0 The Shawshank Redemption 1994 R Drama 142 \n",
"1 The Godfather 1972 R Crime 175 \n",
"2 The Dark Knight 2008 PG-13 Action 152 \n",
"3 The Godfather: Part II 1974 R Crime 202 \n",
"4 Pulp Fiction 1994 R Crime 154 \n",
"\n",
" gross \n",
"0 1963330 \n",
"1 28341469 \n",
"2 1344258 \n",
"3 134966411 \n",
"4 1935047 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many rows and columns are in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(79, 6)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What are the column names?"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['title', 'year', 'content_rating', 'genre', 'duration', 'gross'], dtype='object')\n"
]
}
],
"source": [
"print(movies.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many unique genres are there?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies['genre'].nunique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many movies are there per genre?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Crime 16\n",
"Drama 14\n",
"Action 11\n",
"Adventure 9\n",
"Drama 7\n",
"Biography 5\n",
"Animation 5\n",
"Comedy 4\n",
"Western 3\n",
"Mystery 2\n",
"Horror 2\n",
"Comedy 1\n",
"Name: genre, dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies['genre'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Only run the below cells if you've obtained an [API key!](http://www.omdbapi.com/apikey.aspx) Otherwise, proceed to the `importing movies_rated.csv` section below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtain more data (with an API call)!\n",
"\n",
"- Let's take advantage of our `OmdbAPI` module (stored in `./OmdbAPI.py`, if you'd like to look under the hood) to obtain movie ratings from the OMDB API. This will enable us to answer the question: **How do other publications' scores compare to IMDB ratings?** Specifically, where do Rotten Tomatoes critics most disagree with IMDB reviews?\n",
"- Using the `OmdbAPI` module, we will obtain the `Internet Movie Database`, `Rotten Tomatoes`, and `Metacritic` ratings for the top-rated IMDB movies. We will store these ratings in new columns in a new `movies_rated` DataFrame. We have also stored the file locally at `./data/movies_rated.csv`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import OmdbAPI"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# replace e54ad9e7 with your API key\n",
"# this may take a minute\n",
"movies_rated = OmdbAPI.Omdb(movies, 'e54ad9e7').get_ratings()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" title year content_rating genre duration \\\n",
"0 The Shawshank Redemption 1994 R Drama 142 \n",
"1 The Godfather 1972 R Crime 175 \n",
"2 The Dark Knight 2008 PG-13 Action 152 \n",
"\n",
" gross Internet Movie Database Rotten Tomatoes Metacritic \n",
"0 1963330 9.3/10 91% 80/100 \n",
"1 28341469 9.2/10 98% 100/100 \n",
"2 1344258 9.0/10 94% 82/100 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_rated.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just in case there were movies that the API was unable to get, let's drop nulls."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"movies_rated.dropna(inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's convert the three ratings to floats on a common 0-10 scale using `apply` with regular expressions. Note the use of `.copy()` when reading from and writing to the same DataFrame as a defensive practice."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" title year content_rating genre duration \\\n",
"0 The Shawshank Redemption 1994 R Drama 142 \n",
"1 The Godfather 1972 R Crime 175 \n",
"2 The Dark Knight 2008 PG-13 Action 152 \n",
"\n",
" gross Internet Movie Database Rotten Tomatoes Metacritic \n",
"0 1963330 9.3 9.1 8.0 \n",
"1 28341469 9.2 9.8 10.0 \n",
"2 1344258 9.0 9.4 8.2 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract the numeric part of each rating and put all three on a 0-10 scale\n",
"movies_rated['Rotten Tomatoes'] = movies_rated['Rotten Tomatoes'].copy().apply(lambda x: float(re.match(r'\\d+', x)[0]) / 10)\n",
"movies_rated['Internet Movie Database'] = movies_rated['Internet Movie Database'].copy().apply(lambda x: float(re.match(r'(\\S+)/', x)[1]))\n",
"movies_rated['Metacritic'] = movies_rated['Metacritic'].copy().apply(lambda x: float(re.match(r'(\\S+)/', x)[1]) / 10)\n",
"movies_rated.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, let's write the cleaned result to a local file so we don't have to call the API again and risk exceeding our daily limit."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"movies_rated.to_csv('./movies_rated.csv', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing `movies_rated.csv`\n",
"\n",
"If you just called the API in the previous section, you can skip this and proceed to the `exploratory data analysis` section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's read in the cleaned, rated `movies_rated.csv` file, which was included with this repo just in case you couldn't call the API."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" title year content_rating genre duration \\\n",
"0 The Shawshank Redemption 1994 R Drama 142 \n",
"1 The Godfather 1972 R Crime 175 \n",
"2 The Dark Knight 2008 PG-13 Action 152 \n",
"\n",
" gross Internet Movie Database Rotten Tomatoes Metacritic \n",
"0 1963330 9.3 9.1 8.0 \n",
"1 28341469 9.2 9.8 10.0 \n",
"2 1344258 9.0 9.4 8.2 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_rated = pd.read_csv('../data/movies_rated.csv')\n",
"movies_rated.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check our datatypes. Notice anything potentially problematic?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"title object\n",
"year int64\n",
"content_rating object\n",
"genre object\n",
"duration int64\n",
"gross int64\n",
"Internet Movie Database float64\n",
"Rotten Tomatoes float64\n",
"Metacritic float64\n",
"dtype: object"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_rated.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory data analysis\n",
"\n",
"Let's transition to asking and answering some questions with our data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What are the top five R-rated movies?\n",
"\n",
"*hint: Boolean filters needed! Then sorting!*"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" title year content_rating genre \\\n",
"0 The Shawshank Redemption 1994 R Drama \n",
"1 The Godfather 1972 R Crime \n",
"3 The Godfather: Part II 1974 R Crime \n",
"5 Schindler's List 1993 R Biography \n",
"7 The Good, the Bad and the Ugly 1966 R Western \n",
"\n",
" duration gross Internet Movie Database Rotten Tomatoes Metacritic \n",
"0 142 1963330 9.3 9.1 8.0 \n",
"1 175 28341469 9.2 9.8 10.0 \n",
"3 202 134966411 9.0 9.7 9.0 \n",
"5 195 534858444 8.9 9.7 9.3 \n",
"7 178 57300000 8.9 9.7 9.0 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_rated[movies_rated.content_rating == 'R'].sort_values(by='Internet Movie Database', ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the average Rotten Tomatoes score for the top IMDB films?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"9.087341772151897"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_rated['Rotten Tomatoes'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What does the five-number summary look like for the top-rated films according to IMDB? Is the distribution skewed?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 79.000000\n",
"mean 8.537975\n",
"std 0.222056\n",
"min 8.300000\n",
"25% 8.400000\n",
"50% 8.500000\n",
"75% 8.600000\n",
"max 9.300000\n",
"Name: Internet Movie Database, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_rated['Internet Movie Database'].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean (8.54) is *slightly* higher than the median (8.50), which suggests a small positive (right) skew."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create your own question...then answer it!"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" year duration gross \\\n",
"year 1.000000 0.145930 -0.107644 \n",
"duration 0.145930 1.000000 0.098006 \n",
"gross -0.107644 0.098006 1.000000 \n",
"Internet Movie Database -0.044124 0.416829 0.146099 \n",
"Rotten Tomatoes -0.479430 -0.088653 -0.019891 \n",
"Metacritic -0.487070 -0.020531 -0.038350 \n",
"\n",
" Internet Movie Database Rotten Tomatoes Metacritic \n",
"year -0.044124 -0.479430 -0.487070 \n",
"duration 0.416829 -0.088653 -0.020531 \n",
"gross 0.146099 -0.019891 -0.038350 \n",
"Internet Movie Database 1.000000 0.062015 0.261009 \n",
"Rotten Tomatoes 0.062015 1.000000 0.765957 \n",
"Metacritic 0.261009 0.765957 1.000000 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
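"# How does the IMDB star rating correlate with the Rotten Tomatoes rating?\n",
"# Restrict to numeric columns so corr() also works on recent pandas versions\n",
"movies_rated.select_dtypes(include='number').corr()"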
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Challenge:** Create a DataFrame containing the ratio of each film's IMDB rating to its Rotten Tomatoes rating. Which film has the highest IMDB : Rotten Tomatoes ratio? The lowest?\n",
"\n",
"*[skip this if you are low on time]*"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"