
After this lesson, you will be able to:
We will begin this lesson directly in the Jupyter Notebook, pandas-ii.ipynb, to walk through the what, why, and how all at once.
Nonetheless, below, we have included slides reviewing the key concepts.
To handle missing data, we must:
Pro tip: The faster you understand why some observations are missing, the faster and more accurately you can handle them.
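As a rough illustration of identifying, dropping, and filling missing values, here is a minimal sketch on a toy DataFrame. The column names and values are hypothetical and are not taken from the lesson notebook. It works on copies so the original data stays intact while you decide which strategy fits.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two missing prices and one missing city.
df = pd.DataFrame({
    "city": ["Austin", "Boston", None, "Denver"],
    "price": [10.0, np.nan, 30.0, np.nan],
})

# Identify: count the missing values in each column.
missing_counts = df.isnull().sum()

# Drop: keep only rows with no missing values (returns a new DataFrame).
dropped = df.dropna()

# Fill: replace missing prices with the mean of the observed prices.
filled = df.copy()
filled["price"] = filled["price"].fillna(filled["price"].mean())
```

Comparing `len(df)` with `len(dropped)` before committing to a drop is a quick way to see how much data that choice would cost.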
# identify
df.isnull().sum()
# drop (if necessary)
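df.dropna(inplace=True)  # careful: this permanently drops every row that has a missing value
# fill in (if necessary) - replace value with the desired means of filling (here, each numeric column's mean)
df.fillna(value=df.mean(numeric_only=True), inplace=True)

Groupby allows us to conduct analysis on a specific subset.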
Groupby follows a “split, apply, combine” methodology:

Determine which attribute to group by to form each cohort, and how to aggregate the values within that cohort.
e.g. If we have 300 lemonade stands, do we want to know the average amount of lemonade sold across all stands, or identify which lemonade stand sold the most?
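The two lemonade-stand questions can be sketched as follows. The `sales` DataFrame, its column names, and its values are hypothetical stand-ins for real data. The first question needs no grouping; the second splits the rows by stand, applies a sum to each group, and combines the results.

```python
import pandas as pd

# Hypothetical sales log: one row per stand per day.
sales = pd.DataFrame({
    "stand": ["A", "A", "B", "B", "C", "C"],
    "cups_sold": [10, 20, 40, 50, 10, 20],
})

# Question 1: average cups sold across all rows (no grouping needed).
overall_mean = sales["cups_sold"].mean()

# Question 2: split by stand, sum each group, combine into one Series.
total_by_stand = sales.groupby("stand")["cups_sold"].sum()

# The index label of the largest total is the best-selling stand.
top_stand = total_by_stand.idxmax()
```

Selecting the column of interest before aggregating keeps the result to a single Series indexed by the grouping attribute.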
# replace column with the column of interest!
df.groupby('column').agg(['count', 'mean', 'max', 'min'])

import numpy as np

def dollars_to_float(value):
    # try to convert the inputted value to a float
    try:
        return float(value.strip('$'))
    # if the value is null or is not a valid number, we simply return a null
    except (AttributeError, ValueError):
        return np.nan

df['sale_clean'] = df['sale'].apply(dollars_to_float)
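To see how `dollars_to_float` behaves on messy input, here is a small self-contained sketch. It repeats the function so it runs on its own, and the sample `sale` values are hypothetical. Catching only `AttributeError` and `ValueError` is an assumption made for this sketch: it covers nulls (which have no `strip` method) and strings that are not numbers.

```python
import numpy as np
import pandas as pd

def dollars_to_float(value):
    # Strip the dollar sign and convert; fall back to NaN when that fails.
    try:
        return float(value.strip("$"))
    except (AttributeError, ValueError):
        return np.nan

# Hypothetical raw column: two valid prices, one null, one non-numeric string.
df = pd.DataFrame({"sale": ["$12.50", "$3", None, "n/a"]})

# Apply the cleaner to every value in the column.
df["sale_clean"] = df["sale"].apply(dollars_to_float)
```

Because the failures become `NaN` rather than raising, the cleaned column can then go through the same missing-data steps (identify, drop, or fill) as any other column.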