Bar Plot with Subplots

Create two bar plots of the species counts: (1) using the original data (from a), and (2) after dropping the rows with a NaN in the numeric features (from b).

Dataset: https://github.com/allisonhorst/palmerpenguins?tab=readme-ov-file

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data/penguins.csv')
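# Drop rows with a NaN in any numeric feature (column names assumed to follow the palmerpenguins CSV)
numeric_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
new_df = df.dropna(subset=numeric_cols)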

plt.subplot(1, 2, 1) # place first subplot on left
df['species'].value_counts().plot(kind='bar', color='green') # create bar plot for penguin species count
plt.xlabel('Species')
plt.xticks(rotation=0) # show x labels horizontally
plt.ylabel('Count')
plt.title('Original data')

plt.subplot(1, 2, 2) # place second subplot on right
new_df['species'].value_counts().plot(kind='bar', color='blue')
plt.xlabel('Species')
plt.xticks(rotation=0)
plt.ylabel('Count')
plt.title('Cleaned data')

plt.subplots_adjust(wspace=0.5)
plt.show()

Penguin_Count_Comparison.png


Box Plots

For each numeric feature (i.e., bill length, bill depth, flipper length, body mass), plot a boxplot of the feature's distribution as a function of species. In other words, for bill length, there should be one plot with 3 boxes (i.e., Adelie, Chinstrap, and Gentoo).
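One possible way to approach the box plots is sketched below. It is not a definitive solution: the helper name plot_species_boxplots and the constant NUMERIC_FEATURES are hypothetical, and the column names are assumed to match the palmerpenguins CSV (bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), so adjust them if your file differs.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names (palmerpenguins CSV convention); change these if your file differs.
NUMERIC_FEATURES = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']


def plot_species_boxplots(df, features=NUMERIC_FEATURES, group_col='species'):
    """Draw one subplot per feature, each containing one box per species."""
    fig, axes = plt.subplots(1, len(features),
                             figsize=(4 * len(features), 4), squeeze=False)
    species = sorted(df[group_col].dropna().unique())

    for ax, feature in zip(axes[0], features):
        # Drop missing values per group so each box is built from valid data only
        data = [df.loc[df[group_col] == s, feature].dropna().to_numpy()
                for s in species]
        ax.boxplot(data)  # boxes are drawn at x positions 1..len(species)
        ax.set_xticks(range(1, len(species) + 1))
        ax.set_xticklabels(species)
        ax.set_xlabel('Species')
        ax.set_ylabel(feature)
        ax.set_title(feature)

    fig.tight_layout()
    return fig, axes[0]
```

With the DataFrame loaded earlier, calling plot_species_boxplots(df) followed by plt.show() should display four subplots, one per numeric feature, each with a box for every species present in the data.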