Predict a Country's Happiness Score from GDP per Capita with Scikit-Learn in Python

Luka Onikadze
3 min read · Sep 25, 2023
Photo by Pixabay: https://www.pexels.com/photo/gray-scale-photo-of-gears-159298/

To predict the happiness score of any given country, we first need data. We can get it from kaggle.com, a free platform that hosts datasets of many kinds. In this case we need the World Happiness Report.

Let’s load the data into a pandas DataFrame and check what it looks like:

import pandas as pd

df = pd.read_csv("gdp-rank-2019.csv")  # I'm using the 2019 version of the CSV

df.info()

Our target variable (Y) is “Score”: the higher a country's score, the happier it is. We can also remove the “Country or region” and “Overall rank” fields. “Country or region” is only an identifier and plays no part in how “Score” is calculated, and “Overall rank” is just a label derived from “Score”.

df_cleaned = df.drop(columns=["Overall rank","Country or region"])

df_cleaned.head()
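
To see why “Overall rank” carries no extra information, here is a minimal sketch that rebuilds the rank from “Score” alone. The small `toy` DataFrame and its values are made up for illustration and only mimic the column names of the real file; on the actual data you would run the same comparison against `df`.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are invented for illustration.
toy = pd.DataFrame({
    "Overall rank": [1, 2, 3, 4],
    "Country or region": ["A", "B", "C", "D"],
    "Score": [7.8, 7.6, 7.5, 7.4],
})

# Rank countries by descending score: the highest score gets rank 1.
rank_from_score = toy["Score"].rank(ascending=False, method="first").astype(int)

# If the rank is fully determined by the score, the two columns match.
rank_is_redundant = (rank_from_score == toy["Overall rank"]).all()
print(rank_is_redundant)
```

If the comparison holds on the real data as well, keeping “Overall rank” as a feature would leak the target into the model.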

Now let’s check which columns we can use as feature variables (X). The simplest way to do that is the DataFrame.corr method, which returns a matrix showing how strongly each pair of columns is correlated.

df_cleaned.corr()
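
The full matrix can be hard to scan, so one option is to keep only the “Score” column of it and sort by strength. The sketch below assumes column names like “GDP per capita” and “Generosity” and uses a made-up `toy` DataFrame in place of `df_cleaned`; the numbers are illustrative, not taken from the report.

```python
import pandas as pd

# Toy stand-in for df_cleaned; the values are invented for illustration.
toy = pd.DataFrame({
    "Score": [7.5, 6.9, 6.1, 5.2, 4.4, 3.6],
    "GDP per capita": [1.35, 1.30, 1.10, 0.95, 0.60, 0.35],
    "Generosity": [0.15, 0.30, 0.10, 0.25, 0.20, 0.18],
})

# Take the correlations with the target, drop the target's correlation
# with itself, and sort by absolute value so the strongest come first.
score_corr = (
    toy.corr()["Score"]
    .drop("Score")
    .sort_values(key=abs, ascending=False)
)
print(score_corr)
```

Sorting by absolute value matters because a strong negative correlation is as useful for prediction as a strong positive one.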

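
As a rough sketch of the prediction step the title refers to, a single-feature linear regression with scikit-learn could look like the following. This is not necessarily the author's exact approach: the “GDP per capita” column name, the train/test split settings, and the `toy` values are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy stand-in for df_cleaned; the values are invented for illustration.
toy = pd.DataFrame({
    "GDP per capita": [0.35, 0.60, 0.80, 0.95, 1.10, 1.20, 1.30, 1.35],
    "Score": [3.6, 4.4, 4.9, 5.2, 6.1, 6.3, 6.9, 7.5],
})

X = toy[["GDP per capita"]]  # 2-D feature matrix with a single column
y = toy["Score"]             # 1-D target

# Hold out a quarter of the rows; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict the score for a hypothetical country with GDP per capita of 1.0.
new_country = pd.DataFrame({"GDP per capita": [1.0]})
predicted = model.predict(new_country)[0]
print(f"slope={model.coef_[0]:.2f}, predicted score={predicted:.2f}")
```

With the real data, `toy` would be replaced by `df_cleaned`, and the held-out rows in `X_test` and `y_test` could be used to check how well the fitted line generalizes.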