Model Comparison & Conclusion

End-to-End Machine Learning: Titanic Survival Prediction

Published Nov 18 2025

Tags: Keras, Machine Learning, Matplotlib, NumPy, Pandas, Python, scikit-learn, SciPy, Seaborn, TensorFlow

Final Model Metrics

We print metrics side-by-side:

from sklearn.metrics import accuracy_score, roc_auc_score

# Accuracy is computed from the class predictions (pred_*), ROC-AUC from the predicted probabilities (prob_*).
print("=== Model Comparison ===")
print("Logistic Regression - Acc: %.3f, AUC: %.3f" % (accuracy_score(y_test, pred_lr), roc_auc_score(y_test, prob_lr)))
print("Random Forest - Acc: %.3f, AUC: %.3f" % (accuracy_score(y_test, pred_rf), roc_auc_score(y_test, prob_rf)))
print("Keras Neural Network - Acc: %.3f, AUC: %.3f" % (accuracy_score(y_test, pred_keras), roc_auc_score(y_test, prob_keras)))

Output:

=== Model Comparison ===

Logistic Regression - Acc: 0.832, AUC: 0.870

Random Forest - Acc: 0.821, AUC: 0.837

Keras Neural Network - Acc: 0.810, AUC: 0.859
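The three print calls repeat the same pattern, so the comparison can also be built as a small table. The following is a minimal sketch, not the code used for the numbers above: `compare_models`, the 0.5 threshold, and the placeholder arrays are assumptions made so the snippet runs on its own. In the article's setting you would pass `y_test` and the `prob_*` arrays instead.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score

# Placeholder data (synthetic, NOT Titanic results) so the snippet is runnable.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
probs_demo = {
    "Model A": np.clip(0.6 * y_demo + rng.normal(0.2, 0.25, size=200), 0, 1),
    "Model B": np.clip(0.4 * y_demo + rng.normal(0.3, 0.30, size=200), 0, 1),
}

def compare_models(y_true, probs_by_model, threshold=0.5):
    """Return one row per model with accuracy (at `threshold`) and ROC-AUC."""
    rows = []
    for name, prob in probs_by_model.items():
        # Assumption: class predictions come from thresholding the probabilities.
        pred = (np.asarray(prob) >= threshold).astype(int)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_true, pred),
            "auc": roc_auc_score(y_true, prob),
        })
    # Best AUC first, so the ranking is visible at a glance.
    return pd.DataFrame(rows).set_index("model").sort_values("auc", ascending=False)

table = compare_models(y_demo, probs_demo)
print(table.round(3))
```

Keeping the metrics in a DataFrame makes it easy to sort by either column or to add another model later without another print line.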

Combined ROC Curves

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

plt.figure(figsize=(7, 5))

# Logistic Regression
fpr_lr, tpr_lr, _ = roc_curve(y_test, prob_lr)
plt.plot(fpr_lr, tpr_lr, label="Logistic Regression")

# Random Forest
fpr_rf, tpr_rf, _ = roc_curve(y_test, prob_rf)
plt.plot(fpr_rf, tpr_rf, label="Random Forest")

# Keras
fpr_k, tpr_k, _ = roc_curve(y_test, prob_keras)
plt.plot(fpr_k, tpr_k, label="Keras Neural Network")

# Diagonal reference line: the ROC curve of a random classifier
plt.plot([0, 1], [0, 1], "k--", label="Chance")

plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve Comparison")
plt.legend()
plt.show()
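When several curves share one plot, it can help to put each model's AUC in its legend label so the figure can be read without the printed metrics. A possible sketch follows. The helper name `plot_roc_curves` and the synthetic placeholder data are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc_curves(y_true, probs_by_model, ax=None):
    """Draw one ROC curve per model, with its AUC in the legend label."""
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 5))
    for name, prob in probs_by_model.items():
        fpr, tpr, _ = roc_curve(y_true, prob)
        auc = roc_auc_score(y_true, prob)
        ax.plot(fpr, tpr, label=f"{name} (AUC = {auc:.3f})")
    # Diagonal reference line for a random classifier.
    ax.plot([0, 1], [0, 1], "k--", label="Chance")
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("ROC Curve Comparison")
    ax.legend()
    return ax

# Placeholder data (synthetic, NOT Titanic results) so the snippet is runnable.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
probs_demo = {
    "Model A": np.clip(0.6 * y_demo + rng.normal(0.2, 0.25, size=200), 0, 1),
    "Model B": np.clip(0.4 * y_demo + rng.normal(0.3, 0.30, size=200), 0, 1),
}
ax = plot_roc_curves(y_demo, probs_demo)
```

Call `plt.show()` afterwards to display the figure. With the article's variables, the dictionary would map the three model names to `prob_lr`, `prob_rf` and `prob_keras`.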

What This Means (Interpretation)

I pasted the results into ChatGPT; this is its response:

Logistic Regression performed the best overall.

This is very common on small, structured, tabular datasets like Titanic, because:

Relationships are mostly linear/logistic
Feature interactions are simple
Categorical variables carry strong signal (sex/class)
Data is low dimensional
The dataset is small (≈700 usable rows after cleaning)

Logistic regression shines in exactly these conditions.

Random Forest performed slightly worse than logistic regression.

This also makes sense:

Titanic has limited signal; throwing large trees at it doesn’t uncover new structure.
RF still performs well, showing it captures some nonlinearities, but not enough to beat logistic regression.

Your results show:

RF accuracy is slightly lower (0.821 vs 0.832)
RF ROC-AUC is appreciably lower (0.837 vs 0.870), meaning it is worse at ranking survivors above non-survivors.

The Keras neural network beat RF on AUC (0.859 vs 0.837) but trailed logistic regression on both metrics and had the lowest accuracy of the three (0.810).

Again: completely normal.

Reasons neural networks do not dominate here:

Neural nets need lots of data (thousands+, ideally tens of thousands).
Titanic has under 1k usable rows after cleaning.
Most signal is “easy” (sex & class), so a small NN doesn’t extract more than logistic regression already does.
Neural nets struggle when:
- input dimensionality is low
- dataset is small
- features are tabular (not images or text)

Still, your NN’s ROC-AUC (0.859) is very respectable, meaning it ranks individuals reasonably well despite its lower accuracy.
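The interpretation above treats ROC-AUC as a measure of ranking quality, which is why a model can have a good AUC and a lower accuracy at the same time. As a rough illustration, AUC can be computed as the fraction of (survivor, non-survivor) pairs in which the survivor gets the higher score. The function `pairwise_auc` and the six toy values are made up for this sketch. On real data the result should agree with `roc_auc_score`.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy example (made-up values, not Titanic data).
y_toy = [1, 1, 1, 0, 0, 0]
p_toy = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc_toy = pairwise_auc(y_toy, p_toy)
# Accuracy at a 0.5 threshold, for contrast with the ranking-based AUC.
acc_toy = np.mean((np.asarray(p_toy) >= 0.5).astype(int) == np.asarray(y_toy))
print(f"AUC: {auc_toy:.3f}, accuracy at 0.5: {acc_toy:.3f}")
```

In the toy data 8 of the 9 pairs are ordered correctly, while only 4 of the 6 thresholded predictions are right: the ranking is mostly sound, but the fixed 0.5 cut-off misclassifies two cases.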

Final conclusions

Hypotheses confirmed statistically: women, children, and first-class passengers had significantly higher survival rates.
EDA visualisations clearly support these patterns.
Statistical tests (chi-square, t-test) confirm survival was not random and was influenced by demographic variables.
The scikit-learn models perform strongly out of the box, with Logistic Regression the best on both accuracy (0.832) and AUC (0.870). Random Forest is close behind on accuracy (0.821), although its AUC (0.837) is the lowest of the three.
The Keras model performs well (AUC 0.859) but beats Random Forest only on AUC and trails it on accuracy, which is typical for small tabular datasets.
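The gaps between the three models are only a few hundredths, so it is worth checking how much of that could be test-set noise. One common approach is a bootstrap over the test rows. The sketch below is illustrative only: `bootstrap_auc_diff`, the number of resamples, and the synthetic arrays are assumptions, and no such check was run for the numbers reported in this article.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(y_true, prob_a, prob_b, n_boot=1000, seed=0):
    """Percentile 95% interval for AUC(a) - AUC(b) over resampled test sets."""
    y_true = np.asarray(y_true)
    prob_a = np.asarray(prob_a)
    prob_b = np.asarray(prob_b)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUC is undefined when a resample contains one class only
        diffs.append(roc_auc_score(y_true[idx], prob_a[idx])
                     - roc_auc_score(y_true[idx], prob_b[idx]))
    low, high = np.percentile(diffs, [2.5, 97.5])
    return low, high

# Placeholder data (synthetic, NOT Titanic results) so the snippet is runnable.
rng = np.random.default_rng(1)
y_demo = rng.integers(0, 2, size=180)
prob_a_demo = np.clip(0.5 * y_demo + rng.normal(0.25, 0.25, size=180), 0, 1)
prob_b_demo = np.clip(0.5 * y_demo + rng.normal(0.25, 0.25, size=180), 0, 1)

low, high = bootstrap_auc_diff(y_demo, prob_a_demo, prob_b_demo)
print(f"95% interval for the AUC difference: [{low:.3f}, {high:.3f}]")
```

If the interval for, say, `prob_lr` against `prob_rf` includes zero, the ordering of those two models should be read as tentative rather than settled.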