This project applies Formal Concept Analysis (FCA) to analyze family dynamics in France. It identifies patterns in family structures, single-parent households, childbearing decisions, and naturalization trends across five regional zones using census data.
- Project Overview
- Methodology
- Technologies Used
- Data Sources
- Installation and Execution
- Results Summary
- Limitations
- References
The study examines:
- Family structure and cohabitation patterns.
- Characteristics of single-parent households.
- Factors influencing childbearing decisions.
- Trends in naturalization through marriage.
By employing FCA, the project extracts meaningful patterns and generates visual representations (concept lattices) to uncover relationships within census data.
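To make the idea of a formal concept concrete, here is a minimal, pure-Python sketch on a toy context. The individuals (`p1` to `p4`) and their attributes are hypothetical and not taken from the census data, and the brute-force enumeration is only an illustration: it is not how FCApy, caspailleur, or paspailleur compute concepts, and it would not scale to the real dataset.

```python
from itertools import combinations

# Toy formal context (hypothetical): object -> set of attributes it has.
context = {
    "p1": {"married", "employed", "has_children"},
    "p2": {"married", "employed"},
    "p3": {"single", "employed"},
    "p4": {"single", "has_children"},
}
attributes = sorted(set().union(*context.values()))


def extent(intent):
    """Objects that have every attribute in `intent`."""
    return frozenset(o for o, attrs in context.items() if intent <= attrs)


def common_attributes(objects):
    """Attributes shared by every object in `objects`."""
    if not objects:
        # By convention, the empty object set shares all attributes.
        return frozenset(attributes)
    return frozenset.intersection(*(frozenset(context[o]) for o in objects))


def formal_concepts():
    """Brute force: close every attribute subset (toy sizes only)."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(attributes, r):
            ext = extent(frozenset(subset))
            concepts.add((ext, common_attributes(ext)))
    return concepts


concepts = formal_concepts()
# Print concepts from the most general (largest extent) to the most specific.
for ext, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(intent))
```

Each printed pair is one node of the concept lattice: a maximal group of objects together with the full set of attributes they share.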
- Problem Definition: Analyze family-related variables to identify significant patterns.
- Data Preparation:
- Extracted a sample of 3 million rows per region from the national census dataset.
- Features include marital status, nationality, household size, and professional status.
- Technical Implementation:
- Utilized Python libraries: FCApy, caspailleur, and paspailleur.
- Data mining algorithms were used to identify stable and interesting patterns.
- Results Analysis:
- Generated concept lattices to visualize relationships among variables.
- Measured pattern support and stability using Delta measures (Δ-stability).
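The steps above can be sketched on a toy example. The snippet below turns categorical records into binary attributes (nominal scaling) and then computes support and a Delta measure for a pattern. The records, column names, and helper functions are all hypothetical, and the Delta computation follows the usual definition of Δ-stability (the smallest drop in support when the closed pattern is made more specific); whether the project computes it in exactly this way is an assumption.

```python
# Hypothetical records; the real data comes from the INSEE census files.
records = [
    {"marital_status": "married", "professional_status": "employed", "children": "yes"},
    {"marital_status": "married", "professional_status": "employed", "children": "yes"},
    {"marital_status": "married", "professional_status": "employed", "children": "no"},
    {"marital_status": "married", "professional_status": "employed", "children": "no"},
    {"marital_status": "single", "professional_status": "employed", "children": "no"},
    {"marital_status": "single", "professional_status": "unemployed", "children": "yes"},
]


def nominal_scale(record):
    """Turn each column/value pair into one binary attribute, e.g. 'children=yes'."""
    return {f"{column}={value}" for column, value in record.items()}


rows = [nominal_scale(record) for record in records]
all_attributes = set().union(*rows)


def support(pattern):
    """Number of rows containing every attribute of `pattern`."""
    return sum(1 for row in rows if pattern <= row)


def closure(pattern):
    """Largest attribute set shared by all rows that contain `pattern`."""
    matching = [row for row in rows if pattern <= row]
    return set.intersection(*matching) if matching else set(all_attributes)


def delta_stability(pattern):
    """Smallest support drop when one attribute is added to the closed pattern."""
    closed = closure(pattern)
    outside = all_attributes - closed
    if not outside:
        return support(closed)
    return min(support(closed) - support(closed | {a}) for a in outside)


pattern = {"marital_status=married"}
print(sorted(closure(pattern)), support(pattern), delta_stability(pattern))
```

A pattern with a high Delta value keeps most of its support until it is specialised further, which is one common way to rank patterns as stable.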
- Programming Language: Python
- Libraries:
- FCApy (Formal Concept Analysis)
- caspailleur (Pattern mining for binary data)
- paspailleur (Pattern mining for complex data)
- Hardware: NVIDIA A100 GPU (PNY Server)
- Data provided by The National Institute of Statistics and Economic Studies (INSEE).
- Dataset: "Housing, individuals, activity, educational and occupational mobility, residential migration in 2016."
- Regional zones analyzed:
- Zone A: Île-de-France
- Zone B: Centre-Val de Loire, Bourgogne-Franche-Comté, Normandie, Hauts-de-France
- Zone C: Grand Est, Pays de la Loire, Bretagne
- Zone D: Nouvelle-Aquitaine, Occitanie
- Zone E: Auvergne-Rhône-Alpes, Provence-Alpes-Côte d’Azur, Corse, Overseas regions
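For per-zone processing, the grouping above can be written as a simple lookup table. This is a sketch only: the names `ZONES`, `REGION_TO_ZONE`, and `zone_of` are hypothetical, and how regions are actually encoded in the INSEE files (for example as numeric codes) is not specified here, so "Overseas regions" is kept as a single label.

```python
# Zone -> regions, copied from the list above.
ZONES = {
    "A": ["Île-de-France"],
    "B": ["Centre-Val de Loire", "Bourgogne-Franche-Comté", "Normandie", "Hauts-de-France"],
    "C": ["Grand Est", "Pays de la Loire", "Bretagne"],
    "D": ["Nouvelle-Aquitaine", "Occitanie"],
    "E": ["Auvergne-Rhône-Alpes", "Provence-Alpes-Côte d’Azur", "Corse", "Overseas regions"],
}

# Inverted index: region name -> zone letter.
REGION_TO_ZONE = {region: zone for zone, regions in ZONES.items() for region in regions}


def zone_of(region):
    """Return the zone letter for a region name; raises KeyError if unknown."""
    return REGION_TO_ZONE[region]


print(zone_of("Bretagne"))
```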
- Python 3.8+
- Required libraries installed:
pip install FCApy
pip install caspailleur
pip install paspailleur
- Clone the Repository:
git clone <repo-link>
cd <repo-directory>
- Prepare the Dataset:
- Place the dataset in the data/ directory.
- Run the Analysis:
python main.py
- View Results:
- Outputs include pattern analysis, concept lattices, and stability measures.
- Family Structure: Predominantly nuclear families with 2-5 members; emerging trends in shared accommodations (Zone C).
- Single-Parent Households: Common across all zones, often with extended family support (Zones C, D, E).
- Childbearing Decisions: Influenced by marital status and employment stability; higher education may delay family formation.
- Naturalization Trends: Significant among Portuguese, Algerian, Moroccan, and Italian populations, driven by historical and geographical ties.
- Concept lattices for each zone (Appendix).
- Scatter plots showing pattern support and Delta measures.
- Resource Constraints: High memory requirements (100 GB of RAM) limited the dataset size.
- Algorithmic Limitations: Extracted patterns were less precise due to implementation constraints.
- Dataset Scope: Excluded marital subcategories (e.g., divorced, widowed) that could provide deeper insights.
- INSEE: French National Census Data.
- Buzmakov et al. (2023), Data Complexity: An FCA-Based Approach.
- Uta Priss (2023), FCA Software Tools.
- Kuznetsov & Obiedkov (2002), Formal Concept Analysis Algorithms.
- GitHub Libraries: FCApy, caspailleur, paspailleur.