Valentine’s Titanic EDA

As today is Valentine’s Day and couples around the world will be watching the Titanic movie thousands of times, I decided to do some Exploratory Data Analysis (EDA) for you, so that you can throw around fun facts while watching it. Have a great day, and happy Valentine’s!

Setting up the project

First of all, we need to do some imports:

import matplotlib.pyplot as plt
import seaborn as sns  # really great module for graphs
import pandas as pd

Then we get the dataset and set up seaborn. As you can see, I already selected the index_col, changed Sex and Pclass to categorical for faster loading, and read Survived as a boolean.

# Shortened link, the data is taken from github datasciencedojo/datasets
csv_url = 'https://bit.ly/2U487zA'
df = pd.read_csv(csv_url,
  index_col='PassengerId',
  dtype={'Sex':'category', 'Pclass':'category', 'Survived':'bool'},
)

sns.set(color_codes=True)
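The effect of the categorical dtype is easiest to see in memory use rather than in load time, which I did not measure. Here is a small sketch of that comparison. The column below is made up for illustration (1,000 repeated labels), it is not the real passenger data.

```python
import pandas as pd

# Made-up column for illustration only: 1,000 repeated labels, not the real data.
sex_as_text = pd.Series(["male", "female"] * 500, dtype="object")
sex_as_category = sex_as_text.astype("category")

# deep=True also counts the Python string objects, not just the pointers to them.
text_bytes = sex_as_text.memory_usage(deep=True)
category_bytes = sex_as_category.memory_usage(deep=True)

print(f"object dtype:   {text_bytes} bytes")
print(f"category dtype: {category_bytes} bytes")
```

A categorical column stores each distinct label once plus a small integer code per row, so columns with few distinct values, like Sex or Pclass, should shrink the most.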

Ok, now we are ready to explore the Titanic data!

Exploring the data

1. What percentage of passengers survived the catastrophe, depending on sex and cabin class?

Percent of people that survived depending on sex and cabin class
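If you prefer numbers to bars, the percentages behind a chart like this can be computed with a groupby. This is only a sketch: the helper name `survival_percent` is mine, and the six sample rows are invented so the snippet runs on its own.

```python
import pandas as pd

def survival_percent(df: pd.DataFrame) -> pd.Series:
    """Percentage of survivors per (Sex, Pclass) group."""
    # The mean of a boolean column is the share of True values.
    # observed=True skips empty combinations when the keys are categorical.
    share = df.groupby(["Sex", "Pclass"], observed=True)["Survived"].mean()
    return share.mul(100)

# Tiny made-up sample, for illustration only (not the real passenger list).
sample = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "male"],
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Survived": [True, True, False, True, False, False],
})
result = survival_percent(sample)
print(result.round(1))
```

Calling `survival_percent(df)` on the frame loaded earlier should give the real figures.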

From the above picture we can see that:

2. Distribution of passenger status depending on their age and cabin class.

Distribution of passengers depending on age and class

From the graph we can see a few things. First of all, the third class had the most passengers. Moreover, most of the passengers there were so-called “young adults”, which didn’t end well for them, since this class had the highest mortality rate. Most children were in this class as well, in contrast to the first class, where only a handful of children were present.
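To put counts on those observations, one option is to bucket ages and cross-tabulate them against class. Treat this as a sketch: the age bands are my own guess (nothing above defines what a “young adult” is), `age_band_counts` is a hypothetical helper, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical age bands; pick whatever cut-offs make sense to you.
AGE_BINS = [0, 12, 18, 35, 60, 120]
AGE_LABELS = ["child", "teen", "young adult", "adult", "senior"]

def age_band_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Number of passengers per cabin class (rows) and age band (columns)."""
    # right=False makes each band include its lower edge: [0, 12), [12, 18), ...
    bands = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    # Rows with a missing age fall into no band and are left out of the table.
    return pd.crosstab(df["Pclass"], bands)

# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "Pclass": [3, 3, 3, 3, 1, 1],
    "Age": [4.0, 22.0, 25.0, 30.0, 40.0, 2.0],
})
counts = age_band_counts(sample)
print(counts)
```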

3. Survival distribution with age.

Survived distribution by age

The survival rate wasn’t really dependent on age, as the average is the same in both cases. The whiskers aren’t that much different either… So no matter how old you were, you could end up surviving… or not.
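The same comparison can be checked without a plot. One caveat: if the figure is a box plot, as the whiskers suggest, its centre line is the median rather than the mean, so the sketch below prints both. The helper name `age_summary` and the sample rows are made up.

```python
import pandas as pd

def age_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median and number of known ages for each survival outcome."""
    # Missing ages are skipped by all three aggregations.
    return df.groupby("Survived")["Age"].agg(["mean", "median", "count"])

# Tiny made-up sample, for illustration only; one age is deliberately missing.
sample = pd.DataFrame({
    "Survived": [False, True, True, True, False, False],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
})
summary = age_summary(sample)
print(summary)
```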

4. Did the price of the ticket matter?

Price of the ticket survival

Generally, the higher the ticket price, the higher your chances of survival, although it wasn’t a rule. It is worth noting that the passengers who paid the two highest prices both survived, as did the oldest person on the ship. The chances were also higher if you were a child.
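One way to test the “pricier ticket, better odds” impression is to split fares into equally sized bands and compare the survival rate in each. This is a sketch under my own assumptions: the helper `survival_by_fare_band` is hypothetical, the number of bands is arbitrary, and the sample rows are invented.

```python
import pandas as pd

def survival_by_fare_band(df: pd.DataFrame, bands: int = 4) -> pd.Series:
    """Share of survivors within each fare quantile band."""
    # qcut puts roughly the same number of passengers into every band;
    # duplicates="drop" merges bands whose edges coincide.
    fare_band = pd.qcut(df["Fare"], q=bands, duplicates="drop")
    return df.groupby(fare_band, observed=True)["Survived"].mean()

# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "Fare": [7.0, 8.0, 10.0, 13.0, 26.0, 30.0, 80.0, 500.0],
    "Survived": [False, False, False, True, False, True, True, True],
})
rates = survival_by_fare_band(sample, bands=2)
print(rates)
```

For the individual outliers, `df.nlargest(2, "Fare")` and `df.nlargest(1, "Age")` on the real frame list the rows in question.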

5. Correlations!

Correlations
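A correlation matrix like this can be computed from the numeric columns. Take the snippet as a sketch: I cast Survived to 0/1 so it takes part, the categorical columns (Sex, Pclass) are left out unless you encode them first, and both the helper name `numeric_correlations` and the sample rows are made up.

```python
import pandas as pd

def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations between numeric columns, with Survived as 0/1."""
    # Boolean columns are not selected as "number", so convert Survived first.
    as_numbers = df.assign(Survived=df["Survived"].astype(int))
    return as_numbers.select_dtypes(include="number").corr()

# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "Age": [10.0, 20.0, 30.0, 40.0],
    "Fare": [40.0, 30.0, 20.0, 10.0],
    "Survived": [True, True, False, False],
    "Sex": ["male", "female", "male", "female"],
})
corr = numeric_correlations(sample)
print(corr.round(2))
```

Passing the result to `sns.heatmap(corr, annot=True)` is one way to turn it into a picture.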

6. Kernel Density Estimation (KDE)

KDE for survivability of passengers: KDE

KDE for cabin class of passengers: KDE

KDE for sex of passengers: KDE

7. What was the average ticket price for each class?

Average ticket price
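The numbers behind a chart like this are a one-line groupby. As before, this is a sketch: `mean_fare_by_class` is a name I picked and the sample rows are invented.

```python
import pandas as pd

def mean_fare_by_class(df: pd.DataFrame) -> pd.Series:
    """Average ticket price for each cabin class."""
    # observed=True avoids empty rows when Pclass is categorical.
    return df.groupby("Pclass", observed=True)["Fare"].mean()

# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "Pclass": ["1", "1", "3", "3"],
    "Fare": [80.0, 60.0, 8.0, 10.0],
})
averages = mean_fare_by_class(sample)
print(averages)
```

A few very expensive tickets can pull the mean up, so swapping `.mean()` for `.median()` is a quick sanity check.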

Happy Valentine’s!

I hope the EDA I have shown you will be useful, or at least that you will go “ooh” when thinking about the Titanic catastrophe. Have fun tonight, all of you!

Titanic sailing off