Create 2 histograms (i.e., bar plots) of the species counts: (1) using the original data (from a), and (2) after dropping the rows with a NaN in any of the numeric features (from b).
Dataset: https://github.com/allisonhorst/palmerpenguins?tab=readme-ov-file
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/penguins.csv')
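# Drop rows with a NaN in any numeric feature (column names assumed from the palmerpenguins CSV)
numeric_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
new_df = df.dropna(subset=numeric_cols)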
plt.subplot(1, 2, 1) # place first subplot on left
df['species'].value_counts().plot(kind='bar', color='green') # create bar plot for penguin species count
plt.xlabel('Species')
plt.xticks(rotation=0) # show x labels horizontally
plt.ylabel('Count')
plt.title('Original data')
plt.subplot(1, 2, 2) # place second subplot on right
new_df['species'].value_counts().plot(kind='bar', color='blue') # same plot after dropping rows with NaN
plt.xlabel('Species')
plt.xticks(rotation=0)
plt.ylabel('Count')
plt.title('Cleaned data')
plt.subplots_adjust(wspace=0.5)
plt.show()
For each numeric feature (i.e., bill length, bill depth, flipper length, body mass),
plot a boxplot of the feature's distribution grouped by species.
In other words, for bill length, there should be one plot with 3 boxes (i.e., Adelie, Chinstrap, and Gentoo).
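One possible way to approach this task is sketched below. It assumes the CSV uses the palmerpenguins column names (bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, species); the helper name plot_feature_boxplots and the constant NUMERIC_FEATURES are hypothetical and introduced only for illustration. Rows with a missing value are dropped per feature before plotting, which is a choice made here rather than something the task prescribes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names (palmerpenguins CSV); adjust if your file differs.
NUMERIC_FEATURES = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']


def plot_feature_boxplots(df, features=NUMERIC_FEATURES):
    """Draw one panel per feature, each with one box per species."""
    fig, axes = plt.subplots(1, len(features), figsize=(4 * len(features), 4), squeeze=False)
    for ax, feature in zip(axes[0], features):
        # Keep only rows where both the species and this feature are present
        data = df.dropna(subset=['species', feature])
        # One box per distinct value of 'species'
        data.boxplot(column=feature, by='species', ax=ax, grid=False)
        ax.set_xlabel('Species')
        ax.set_ylabel(feature)
        ax.set_title(feature)
    fig.suptitle('')  # clear the automatic "Boxplot grouped by species" title
    fig.tight_layout()
    return fig, axes[0]
```

To use it with the data loaded earlier, call plot_feature_boxplots(df) and then plt.show(). Each panel should then contain one box per species present in the data (Adelie, Chinstrap, and Gentoo for the full dataset).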