| Student's name | SCIPER |
|---|---|
| Lin Xiaoya | 423134 |
| Wu Yiqian | 423147 |
| Liu Tingsen | 422014 |
Milestone 1 • Milestone 2 • Milestone 3
10% of the final grade
This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas. Please fill in the following sections about your project.
(max. 2000 characters per section)
For our project, we combined three publicly available international datasets from the World Health Organization (WHO) and the World Bank.
These sources are reliable and authoritative. However, because the datasets come from different organizations, they need cleaning and integration before analysis, for example joining them by country and year into a single combined dataset.
Datasets:
- Life Expectancy at Birth (WHO): https://www.who.int/data/gho/data/indicators/indicator-details/GHO/life-expectancy-at-birth-(years)
- GDP per Capita (World Bank): https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
- NCD Mortality Rate (World Bank): https://data.worldbank.org/indicator/SH.DYN.NCOM.ZS
Life expectancy is one of the most widely used indicators of a country’s overall well-being. It reflects not only healthcare quality but also economic conditions, education, public policy, and social inequality. At the same time, economic development, often measured through GDP per capita, is commonly assumed to improve living standards and health outcomes. However, the strength and nature of this relationship are not always straightforward.
Our project aims to explore the following central questions:
- How strongly is GDP per capita associated with life expectancy across countries?
- What are the differences in life expectancy between men and women globally?
- How is GDP related to mortality from non-communicable diseases (NCDs), and does higher income necessarily imply lower NCD mortality?
By visualising these relationships, we aim to better understand the interplay between economic development and public health.
This project is relevant to students of economics, public health, and global development, as well as anyone interested in understanding global inequality. By presenting interactive visualisations and statistical summaries, we provide a clear and accessible overview of how wealth, gender, and disease burden relate to longevity.
All three datasets were loaded into pandas DataFrames. Since they originate from different sources, preprocessing was necessary before merging:
- Year variables were converted to consistent integer formats.
- Country names were standardised to ensure correct joins.
- Rows missing essential values (GDP per capita, life expectancy, or NCD mortality rate) were removed.
- GDP per capita was log-transformed to better capture non-linear relationships and reduce skewness.
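A minimal pandas sketch of these cleaning steps for one source table, assuming each table has already been reshaped to hypothetical `country` and `year` columns plus its indicator column(s) (for example `gdp_per_capita`). The alias entries and the toy values are illustrative only, not the notebook's actual mapping or data:

```python
import numpy as np
import pandas as pd

# Illustrative aliases only: map variant spellings to one canonical country name.
COUNTRY_ALIASES = {
    "Bahamas, The": "Bahamas",
    "Egypt, Arab Rep.": "Egypt",
}


def preprocess(df: pd.DataFrame, value_cols: list[str]) -> pd.DataFrame:
    """Clean one source table: integer years, canonical names, no missing values, log GDP."""
    out = df.copy()
    # Coerce years such as "2019" or 2019.0 to integers; unparseable years are dropped.
    out["year"] = pd.to_numeric(out["year"], errors="coerce")
    out = out.dropna(subset=["year"]).astype({"year": int})
    # Strip stray whitespace, then replace known aliases with the canonical name.
    out["country"] = out["country"].str.strip().replace(COUNTRY_ALIASES)
    # Remove rows missing any essential indicator.
    out = out.dropna(subset=value_cols)
    # Natural-log transform of GDP per capita (only defined for positive values).
    if "gdp_per_capita" in out.columns:
        out = out[out["gdp_per_capita"] > 0].assign(
            log_gdp=lambda d: np.log(d["gdp_per_capita"])
        )
    return out.reset_index(drop=True)


# Toy input with made-up values, including an alias, string years and a missing GDP.
gdp_raw = pd.DataFrame({
    "country": [" Bahamas, The", "Chad", "Chad"],
    "year": ["2019", "2019", "2020"],
    "gdp_per_capita": [34000.0, None, 650.0],
})
gdp = preprocess(gdp_raw, ["gdp_per_capita"])
print(gdp)
```

Mapping names to a canonical spelling before joining matters because an inner join silently drops any country whose name differs between sources; an ISO country-code column, where available, is a sturdier join key.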
The datasets were then merged using inner joins on country and year, ensuring that only observations present in all three datasets were retained. The resulting dataset spans from 2000 to 2021, with 12,060 total records.
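The join step can be sketched as a chain of inner merges, assuming each cleaned table has hypothetical `country` and `year` key columns; the tables below are made-up toy data:

```python
from functools import reduce

import pandas as pd


def merge_indicators(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Inner-join cleaned tables on (country, year); keys missing from any table are dropped."""
    return reduce(
        lambda left, right: left.merge(right, on=["country", "year"], how="inner"),
        frames,
    )


# Toy tables with made-up values: only "A" and "B" appear in all three.
life = pd.DataFrame({"country": ["A", "B", "C"], "year": [2019] * 3, "life_expectancy": [70.0, 55.0, 84.0]})
gdp = pd.DataFrame({"country": ["A", "B"], "year": [2019] * 2, "gdp_per_capita": [30000.0, 700.0]})
ncd = pd.DataFrame({"country": ["B", "C", "A"], "year": [2019] * 3, "ncd_mortality": [25.0, 9.0, 20.0]})

merged = merge_indicators([life, gdp, ncd])
print(len(life), "->", len(merged), "rows after inner joins")
```

Because `how="inner"` discards any key absent from one of the tables, comparing row counts before and after each merge is a quick way to see how many observations the join removes.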
All of this work can be found in the Jupyter notebook `EDA.ipynb`.
Key findings:
- On average across the dataset, women live 4.84 years longer than men.
- Life expectancy is strongly correlated with the logarithm of GDP per capita.
- Higher GDP tends to be associated with lower NCD mortality. However, substantial variance remains even among high-income countries.
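Statistics of this kind can be computed roughly as follows. This is a sketch assuming a wide table with hypothetical columns `le_female`, `le_male`, `le_both`, `gdp_per_capita` and `ncd_mortality`; the four rows are made-up toy values, not results from our dataset. Pearson correlation on log GDP is used here as one simple way to quantify a "logarithmic" relationship:

```python
import numpy as np
import pandas as pd


def summarise(df: pd.DataFrame) -> dict[str, float]:
    """Headline statistics over all country-year rows of a merged table."""
    log_gdp = np.log(df["gdp_per_capita"])
    return {
        # Average female-minus-male life expectancy gap, in years.
        "mean_gender_gap": float((df["le_female"] - df["le_male"]).mean()),
        # Pearson r between log GDP and life expectancy; a log-shaped curve becomes roughly linear here.
        "r_loggdp_life": float(log_gdp.corr(df["le_both"])),
        # A negative r means higher income goes with lower NCD mortality on average.
        "r_loggdp_ncd": float(log_gdp.corr(df["ncd_mortality"])),
    }


# Made-up toy rows; they only illustrate the calculation.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "gdp_per_capita": [30000.0, 700.0, 40000.0, 12000.0],
    "le_female": [72.0, 58.0, 87.0, 76.0],
    "le_male": [68.0, 54.0, 81.0, 70.0],
    "le_both": [70.0, 56.0, 84.0, 73.0],
    "ncd_mortality": [20.0, 25.0, 9.0, 18.0],
})
print(summarise(toy))
```

A single correlation coefficient hides the spread noted for high-income countries, so a scatter plot of the residuals is a useful companion to these numbers.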
While giants like Gapminder and the IHME’s GBD Compare offer comprehensive data on health and wealth, they function more like digital encyclopedias than narrative tools. The connection between wealth and preventable death is a story hidden in plain sight. But for most people, uncovering that story requires a tedious trek across platforms that treat human lives like static rows of data. The data is "there," but it isn't always "alive."
Our project builds on the high-quality data provided by the WHO and World Bank Open Data. We’ve stripped away the academic density of the World Bank's archives to investigate a singular mystery: The Wealth Paradox. Why do some nations with high GDPs see their citizens die years earlier than those in countries with far fewer resources? By focusing on the exceptions, such as The Bahamas, we look past the spreadsheets to uncover the cultural habits, dietary shifts, and hidden inequalities that determine who actually gets to grow old.
Visually, we were inspired by the clean, interactive aesthetics of The Pudding. By bringing GDP, NCD mortality, and gendered longevity into one animated interface, we transform complex public health statistics into an interactive journey.
(Note: The datasets utilized in this project have not been explored by our team in any previous ML, ADA, or semester projects).
10% of the final grade
Our comprehensive Milestone 2 report contains our detailed project goals, visualization sketches, technical tool mapping to the COM-480 syllabus, and our implementation roadmap.
The initial website skeleton and functional prototype are now live. This version demonstrates our paginated narrative structure and the layout for our upcoming D3.js visualizations.
For this milestone, we have focused on building a robust foundation for our data story:
- Web Skeleton: We developed a navigation system using HTML, CSS, and JavaScript. The site supports vertical transitions between major topics and horizontal navigation for detailed rankings.
- Narrative Flow: The investigative journey is fully drafted, moving from global demographic trends (The Gender Divide) to specific case studies (The Wealth Paradox).
- Visualization Containers: We have implemented responsive SVG containers for our D3.js widgets.
- D3.js Preparation: Our unified dataset from the WHO and World Bank has been pre-processed and is ready for the implementation of the Butterfly Chart, Racing Bar Chart, and the normalized Radar Chart.
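For D3, one convenient option is to export small, chart-specific JSON files. The sketch below assumes hypothetical wide columns (`country`, `year`, `le_female`, `le_male`, `le_both`) and uses toy data; the function names are illustrative, not part of our codebase. The racing bar table stores a per-year rank so the front end only has to filter by year:

```python
import json

import pandas as pd


def butterfly_table(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """One row per country with female and male life expectancy side by side."""
    return df.loc[df["year"] == year, ["country", "le_female", "le_male"]].reset_index(drop=True)


def racing_bar_table(df: pd.DataFrame, value_col: str, top_n: int = 10) -> pd.DataFrame:
    """Top `top_n` countries per year by `value_col`, with rank 1 = highest value."""
    ranks = df.groupby("year")[value_col].rank(method="first", ascending=False).astype(int)
    ranked = df.assign(rank=ranks)
    return ranked[ranked["rank"] <= top_n].sort_values(["year", "rank"]).reset_index(drop=True)


# Made-up toy data for two years and three countries.
df = pd.DataFrame({
    "country": ["A", "B", "C"] * 2,
    "year": [2000] * 3 + [2001] * 3,
    "le_female": [70, 60, 80, 71, 61, 79],
    "le_male": [66, 57, 75, 67, 58, 74],
    "le_both": [68.0, 58.5, 77.5, 69.0, 59.5, 76.5],
})

# DataFrame.to_json handles NumPy dtypes, so these strings can be written to files for d3.json().
butterfly_json = butterfly_table(df, 2001).to_json(orient="records")
racing_json = racing_bar_table(df, "le_both", top_n=2).to_json(orient="records")
print(butterfly_json)
print(json.loads(racing_json)[0])
```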
- Deliver a fully navigable website with structured data storytelling.
- Functional interactive World Map with a manual year timeline slider.
- Normalized Radar Chart for individual country health profiles.
- Audio Sonification: Heartbeat sound effects that scale with data trends.
- Personalized Marker: User-driven data input for statistical comparison.
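Two of these goals rest on simple statistics. A possible approach, sketched in pandas with assumed column names and toy data: min-max scaling puts every radar axis on [0, 1], with NCD mortality inverted (an assumption on our side) so that a larger value always means healthier, and a hypothetical `percentile_of` helper places a user-supplied value among all countries for the personalized marker:

```python
import pandas as pd


def minmax_normalise(df: pd.DataFrame, cols: list[str], invert: tuple[str, ...] = ()) -> pd.DataFrame:
    """Scale each column in `cols` to [0, 1] across all rows so radar axes share one scale.

    Columns named in `invert` become 1 - x, so that a larger value means "healthier" on every axis.
    """
    out = df.copy()
    for col in cols:
        lo, hi = df[col].min(), df[col].max()
        if hi > lo:
            scaled = (df[col] - lo) / (hi - lo)
        else:
            # A constant column carries no contrast; place every country at 0.
            scaled = pd.Series(0.0, index=df.index)
        out[col] = 1.0 - scaled if col in invert else scaled
    return out


def percentile_of(value: float, values: pd.Series) -> float:
    """Share of countries (in %) with a strictly lower value than a user-entered `value`."""
    return float((values < value).mean() * 100)


# Made-up toy data.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "le_both": [60.0, 70.0, 80.0],
    "log_gdp": [7.0, 9.0, 11.0],
    "ncd_mortality": [30.0, 20.0, 10.0],
})
radar = minmax_normalise(toy, ["le_both", "log_gdp", "ncd_mortality"], invert=("ncd_mortality",))
print(radar)
print(percentile_of(75.0, toy["le_both"]))
```

Min-max scaling is sensitive to outliers; if a few extreme countries compress everyone else near 0, rank-based (percentile) scaling of each axis is a common alternative.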
80% of the final grade
- < 24h: 80% of the grade for the milestone
- < 48h: 70% of the grade for the milestone