We found datasets on deaths from air pollution per country, population per country, and death rates per country.

Datasets

Deaths From Air Pollution Per Country Per Year (Data from Kaggle by PavanKalyan)

United States Population 1950-2021 (Data from Macrotrends)

China Population 1950-2021 (Data from Macrotrends)

India Population 1950-2021 (Data from Macrotrends)

Sweden Population 1950-2021 (Data from Macrotrends)

Liberia Population 1950-2021 (Data from Macrotrends)

United States Death Rate 1950-2021 (Data from Macrotrends)

China Death Rate 1950-2021 (Data from Macrotrends)

India Death Rate 1950-2021 (Data from Macrotrends)

Sweden Death Rate 1950-2021 (Data from Macrotrends)

Liberia Death Rate 1950-2021 (Data from Macrotrends)

Methodologies

We cleaned the data and merged the yearly air-pollution deaths, population, and death rate for 2000 to 2017 into one table per country. For each country and year, we divided the number of deaths due to air pollution by the total population to calculate the pollution death rate, scaled to deaths per 1,000 people so that it uses the same unit as the total death rate. We then divided the pollution death rate by the total death rate to calculate the ratio of pollution deaths to total deaths for each country over time. Finally, we plotted this ratio against year as a line graph to show how the share of deaths attributed to air pollution changed over time. Our ratio was as follows:

$$ \frac{\text{Pollution Death Rate}}{\text{Total Country Death Rate}} $$
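
As a minimal sketch of this calculation, the snippet below applies the two divisions to a tiny table. The figures are made up purely for illustration, and the column names (`pollution_deaths`, `population`, `deaths_per_1000`) are hypothetical; none of them come from our datasets.

```python
import pandas as pd

# Hypothetical figures for illustration only; not values from the datasets above.
toy = pd.DataFrame({
    "Year": [2000, 2001],
    "pollution_deaths": [50_000, 48_000],
    "population": [100_000_000, 102_000_000],
    "deaths_per_1000": [8.0, 8.0],
})

# Pollution death rate, expressed per 1,000 people to match the total death rate.
toy["pollution_rate_per_1000"] = toy["pollution_deaths"] * 1000 / toy["population"]

# Both rates share the same unit, so the ratio is the share of all deaths
# attributed to air pollution.
toy["pollution_share"] = toy["pollution_rate_per_1000"] / toy["deaths_per_1000"]

print(toy[["Year", "pollution_share"]])
```

With these made-up numbers, the first row works out to 0.5 pollution deaths per 1,000 people against 8 total deaths per 1,000 people, which gives a ratio of 0.0625.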

Code

We primarily used the pandas package on Deepnote for exploratory data analysis (EDA). The following code plots the ratio of pollution deaths to total deaths versus year for the United States. We repeated the same process for China, India, Sweden, and Liberia.

import pandas as pd

# Merge air-pollution deaths with population on their shared columns.
us_poll_and_pop = pd.merge(pollution_deaths_us, us_population)
# Add the death-rate table, joining on Year.
us_poll_and_death = pd.merge(us_poll_and_pop, us_death_rate, on='Year')
# Drop columns that the ratio does not need.
us_poll_and_death = us_poll_and_death.drop(columns=['Code', ' Annual % Change_x', ' Annual % Change_y'])
# Pollution death rate in deaths per 1,000 people.
us_poll_and_death['Pollution Death Rate'] = us_poll_and_death['Deaths from Air Pollution'] * 1000 / us_poll_and_death[' Population']
# Both rates are per 1,000 people, so dividing them directly gives the share of all deaths attributed to air pollution.
us_poll_and_death['Pollution Death VS Total Death'] = us_poll_and_death['Pollution Death Rate'] / us_poll_and_death[' Deaths per 1000 People']
us_poll_and_death
us_poll_and_death.plot.line(x='Year', y='Pollution Death VS Total Death')
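
Because the same steps are repeated for every country, they could be wrapped in one helper. The following is only a sketch of how that might look, not the code we ran: the function name `pollution_share_by_year` and the `demo_*` tables are hypothetical, and it assumes each country's tables use the same column names as the United States tables. It keeps both rates in deaths per 1,000 people so that the ratio is a unitless share.

```python
import pandas as pd

def pollution_share_by_year(pollution_deaths, population, death_rate):
    """Return one row per year with the pollution-to-total death ratio.

    Assumes every input has a 'Year' column and the column names used in the
    United States example above.
    """
    merged = (
        pollution_deaths[["Year", "Deaths from Air Pollution"]]
        .merge(population[["Year", " Population"]], on="Year")
        .merge(death_rate[["Year", " Deaths per 1000 People"]], on="Year")
    )
    # Pollution deaths per 1,000 people, the same unit as the total death rate.
    merged["Pollution Death Rate"] = (
        merged["Deaths from Air Pollution"] * 1000 / merged[" Population"]
    )
    merged["Pollution Death VS Total Death"] = (
        merged["Pollution Death Rate"] / merged[" Deaths per 1000 People"]
    )
    return merged


# Hypothetical two-year inputs, for illustration only.
demo_pollution = pd.DataFrame({"Year": [2000, 2001], "Deaths from Air Pollution": [1000, 900]})
demo_population = pd.DataFrame({"Year": [2000, 2001], " Population": [1_000_000, 1_000_000]})
demo_death_rate = pd.DataFrame({"Year": [2000, 2001], " Deaths per 1000 People": [10.0, 9.0]})

demo = pollution_share_by_year(demo_pollution, demo_population, demo_death_rate)
print(demo[["Year", "Pollution Death VS Total Death"]])
# To chart a country: demo.plot.line(x="Year", y="Pollution Death VS Total Death")
```

Selecting only the needed columns before merging avoids the duplicated `Year` and `Annual % Change` columns, so no rename or drop step is required afterwards.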