{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Pandas for EDA\n", "by [@josephofiowa](https://twitter.com/josephofiowa)\n", " \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Unit Lab\n", "\n", "**Woo!** We've made it to the end of our Pandas Unit. Let's put our skills to the test.\n", "\n", "We're going to explore data on some of the top movies according to IMDB. This is a guided question-and-response lab: some prompts are specific questions, and others are open-ended for you to explore.\n", "\n", "In this lab, we will:\n", "- Leverage Pandas to conduct exploratory data analysis, including:\n", "  - Assess data integrity\n", "  - Create exploratory visualizations\n", "  - Produce insights on top actors/actresses across films\n", " \n", "Let's get going!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Dataset\n", "\n", "We'll work with a dataset of the top [IMDB movies](https://www.imdb.com/search/title?count=100&groups=top_1000&sort=user_rating), ranked by IMDB user rating.\n", "\n", "Specifically, we have a CSV that contains:\n", "- IMDB star rating\n", "- Rotten Tomatoes rating\n", "- Movie title\n", "- Year\n", "- Content rating\n", "- Genre\n", "- Duration\n", "- Gross\n", "\n", "_[Details are available at the link above.]_\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import the necessary libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read in the dataset\n", "\n", "First, read the dataset, `movies_rated.csv`, into a DataFrame called `movies`. It's in the `../data` folder."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check the dataset basics\n", "\n", "Let's first explore our dataset to verify that it contains what we expect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the first five rows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many rows and columns are in the dataset?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are the column names?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many unique genres are there?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many movies are there per genre?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the data types. Do they make sense?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory data analysis\n", "\n", "Let's transition to asking and answering some questions with our data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are the top five R-rated movies by IMDB star rating?\n", "\n", "*Hint: you'll need a Boolean filter, then sorting!*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the average Rotten Tomatoes score for these films?"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the five-number summary of these films' IMDB ratings? Is the distribution skewed?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create your own question, then answer it!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Challenge:** Create a DataFrame (or a new column) containing the ratio of each film's IMDB rating to its Rotten Tomatoes rating. Which film has the highest IMDB : Rotten Tomatoes ratio? The lowest?\n", "\n", "*[Skip this if you are low on time.]*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory data analysis with visualizations\n", "\n", "For each of these prompts, create a plot to visualize the answer. Consider which type of plot is *most appropriate* for exploring the given prompt.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the relationship between IMDB ratings and Rotten Tomatoes ratings?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the relationship between IMDB rating and movie duration?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many movies are there in each genre category? (Remember to create a plot here.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does the distribution of Rotten Tomatoes ratings look like?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bonus\n", "\n", "There are many things left unexplored! For example, consider investigating how gross revenue varies across genres." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }