Eesha Fatima (31647) · Fatima Kaleem (31620) · Uroos Fatima (31094) Institute of Business Administration, Karachi Spring 2026 — Introduction to Artificial Intelligence — Dr. Syed Ali Raza
This project builds a multi-method AI pipeline to classify transit signals in the NASA Kepler Cumulative KOI (Kepler Objects of Interest) dataset, using tabular features derived from Kepler light curves. The goal is to identify which unconfirmed candidate signals are most likely to be genuine exoplanet transits, using:
- Decision Trees
- Naive Bayes
- K-Means Clustering
- Bayesian Probabilistic Reasoning
All four methods are implemented from scratch using NumPy; a TensorFlow/Keras CNN (Phase 8) serves as an industry-standard baseline for comparison.
Exoplanet-Transit-Detection-Using-NASA-Kepler-Data/
│
├── cumulative_2026.04.12_06.34.10.csv ← Raw NASA dataset
├── koi_clean.csv ← Cleaned dataset (Phase 2 output)
│
├── cleaningdata.py ← Phase 2: data cleaning
├── preprocessing.py ← Phase 3: preprocessing + SMOTE
├── decision_tree.py ← Phase 4: Decision Tree
├── naive_bayes.py ← Phase 5: Gaussian Naive Bayes
├── kmeans.py ← Phase 6: K-Means Clustering
├── bayesian_reasoning.py ← Phase 7: Bayesian probabilistic reasoning
├── cnn_baseline.py ← Phase 8: CNN baseline
├── candidate_ranking.py ← Phase 9: Final candidate ranking
├── gui.py ← Phase 10: Interactive GUI
│
├── requirements.txt
├── .gitignore
├── LICENSE
│
├── X_train.npy ← Balanced training features (post-SMOTE)
├── y_train.npy ← Balanced training labels
├── X_val.npy ← Validation features
├── y_val.npy ← Validation labels
├── X_test.npy ← Test features
├── y_test.npy ← Test labels
├── X_candidates.npy ← 1,979 candidate features (inference)
│
├── bayesian_candidate_scores.npy ← Phase 7 output
├── bayesian_candidate_labels.npy ← Phase 7 output
├── cnn_candidate_scores.npy ← Phase 8 output
├── cnn_candidate_labels.npy ← Phase 8 output
├── final_candidate_scores.npy ← Phase 9 output
├── final_candidate_labels.npy ← Phase 9 output
├── final_candidate_tiers.npy ← Phase 9 output
├── final_candidate_ranking.npy ← Phase 9 output
├── final_agreement_counts.npy ← Phase 9 output
├── final_candidate_summary.txt ← Phase 9 output
│
└── README.md
| Metric | Score |
|---|---|
| Validation Accuracy | 95.61% |
| Test Accuracy | 94.55% |
| AUC-ROC | 0.9437 |
| Precision | 91% |
| Recall | 94% |
Key Finding: The Decision Tree correctly identifies 94% of confirmed planets (Recall) while maintaining 91% precision.
Dataset: NASA Kepler Cumulative KOI Table (cumulative_2026.04.12_06.34.10.csv)
Source: NASA Exoplanet Archive
The raw dataset contains 9,564 KOI entries, each representing a stellar signal flagged by the Kepler pipeline. Each entry carries over 40 engineered photometric and orbital features, including transit depth, orbital period, transit duration, stellar radius, SNR, and centroid offset metrics.
Class Distribution:
| Label | Count | Role |
|---|---|---|
| FALSE POSITIVE | 4,839 | Training data (negative class) |
| CONFIRMED | 2,746 | Training data (positive class) |
| CANDIDATE | 1,979 | Inference targets (unknowns) |
Key Insight: The 1,979 CANDIDATE entries have never been confirmed or ruled out — primarily because the Kepler mission ended in 2018, ground-based telescope time is limited, and many candidates have weak signals or orbit faint stars. These are the primary scientific target of this pipeline.
- Loaded the raw CSV using `pd.read_csv(..., comment='#')` to skip header comment rows
- Inspected missing values and identified columns with >50% null rates
- Dropped high-missingness columns (>50% null threshold)
- Dropped non-feature columns (identifiers, provenance fields, administrative metadata):
rowid, kepid, kepoi_name, koi_vet_stat, koi_vet_date,
koi_pdisposition, koi_disp_prov, koi_comment, koi_fittype,
koi_limbdark_mod, koi_parm_prov, koi_tce_delivname,
koi_quarters, koi_trans_mod, koi_datalink_dvr,
koi_datalink_dvs, koi_sparprov, koi_eccen
- Saved the cleaned dataset as `koi_clean.csv`
Output shape: 9,564 rows × 103 columns (102 features + 1 target)
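A minimal pandas sketch of the two dropping steps above. The helper name `clean_koi`, the short `NON_FEATURE_COLS` subset, and the three inline rows are illustrative assumptions (made-up values, not real KOI entries); the full drop list is the one given above.

```python
import io

import pandas as pd

# Hypothetical subset of the non-feature columns listed above
NON_FEATURE_COLS = ["rowid", "kepid", "kepoi_name", "koi_comment"]


def clean_koi(df: pd.DataFrame, null_threshold: float = 0.5) -> pd.DataFrame:
    """Drop high-missingness columns, then identifier/metadata columns."""
    null_rate = df.isna().mean()                  # fraction of nulls per column
    sparse_cols = null_rate[null_rate > null_threshold].index
    df = df.drop(columns=sparse_cols)
    # errors="ignore" skips names already removed by the null filter
    return df.drop(columns=NON_FEATURE_COLS, errors="ignore")


# Tiny stand-in for the raw CSV; the '#' line mimics the archive's header comments
raw = io.StringIO(
    "# COLUMN koi_period: Orbital Period [days]\n"
    "rowid,kepid,kepoi_name,koi_comment,koi_period,koi_depth,koi_disposition\n"
    "1,100,K1,,9.49,616.0,CONFIRMED\n"
    "2,101,K2,,54.4,875.0,FALSE POSITIVE\n"
    "3,102,K3,x,19.9,10800.0,CANDIDATE\n"
)
df = pd.read_csv(raw, comment="#")
clean = clean_koi(df)
print(clean.columns.tolist())   # ['koi_period', 'koi_depth', 'koi_disposition']
```

Here `koi_comment` is removed by the null filter (two of three values missing) before the identifier drop runs, which is why `errors="ignore"` is needed.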
False positive flags retained as features:
| Column | Meaning |
|---|---|
| `koi_fpflag_nt` | Not transit-like shape |
| `koi_fpflag_ss` | Secondary eclipse (eclipsing binary) |
| `koi_fpflag_co` | Centroid offset (background contamination) |
| `koi_fpflag_ec` | Ephemeris match to known false positive |
CANDIDATE rows were separated before any fitting to prevent data leakage.
- Training pool → 7,585 rows (CONFIRMED + FALSE POSITIVE)
- Candidates → 1,979 rows (held out entirely)
| Step | Method |
|---|---|
| Missing values | Median imputation (fit on train only) |
| Label encoding | CONFIRMED → 0, FALSE POSITIVE → 1 |
| Feature scaling | Z-score standardization (fit on train only) |
| Train/Val/Test split | Stratified 70% / 15% / 15% |
| Class rebalancing | SMOTE oversampling |
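A minimal NumPy sketch of the leakage-safe steps in the table, assuming the CANDIDATE rows have already been removed and that rounding decides the per-class split sizes. All function names are illustrative, not taken from `preprocessing.py`. The key point is that medians, means, and standard deviations are learned from the training split and then reused unchanged for validation, test, and candidate rows.

```python
import numpy as np


def stratified_split(y, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return train/val/test index arrays that preserve each class's share."""
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_val])
        parts[2].extend(idx[n_train + n_val:])      # remainder goes to test
    return tuple(np.array(p, dtype=int) for p in parts)


def fit_impute_scale(X_train):
    """Learn medians, means and standard deviations from the training split only."""
    medians = np.nanmedian(X_train, axis=0)
    filled = np.where(np.isnan(X_train), medians, X_train)
    mean, std = filled.mean(axis=0), filled.std(axis=0)
    std[std == 0] = 1.0                              # constant column: avoid dividing by zero
    return medians, mean, std


def apply_impute_scale(X, medians, mean, std):
    """Apply the training statistics to any split (train, val, test, candidates)."""
    filled = np.where(np.isnan(X), medians, X)
    return (filled - mean) / std


# Class sizes from the training pool: 1 = FALSE POSITIVE, 0 = CONFIRMED
y = np.array([1] * 4839 + [0] * 2746)
train, val, test = stratified_split(y)
print(len(train), len(val), len(test))   # 5309 1138 1138
```

With these class sizes the sketch's training split holds 3,387 FALSE POSITIVE rows, and its test split has the same 1,138-row size used throughout the results below.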
After SMOTE:
CONFIRMED = 3,387 | FALSE POSITIVE = 3,387
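A from-scratch NumPy sketch of SMOTE-style oversampling, assuming the minority class is CONFIRMED (label 0, per the encoding above) and k = 5 neighbours. The function names and `k` are assumptions, and this is not necessarily how `preprocessing.py` implements it. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours, and it is applied to the training split only.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating towards k-NN."""
    rng = np.random.default_rng(seed)
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2ab (avoids an n*n*d array)
    sq = (X_min ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_min @ X_min.T
    np.fill_diagonal(d2, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]    # k nearest minority neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def balance_with_smote(X, y, minority_label=0, k=5, seed=0):
    """Oversample the minority class until both classes have the same count."""
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    synthetic = smote(X_min, n_new, k=k, seed=seed)
    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_bal, y_bal
```

Because every synthetic row is a convex combination of two real minority rows, it never leaves the region spanned by the CONFIRMED training samples.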
Saved splits: `X_train.npy`, `y_train.npy`, `X_val.npy`, `y_val.npy`, `X_test.npy`, `y_test.npy`, and `X_candidates.npy`.
The Decision Tree classifier is built from scratch using NumPy.
Entropy measures the disorder (impurity) of a set of labels.
For binary labels (using log base 2), entropy = 0 when all labels are the same and entropy = 1 at a 50/50 split.
Information Gain measures the reduction in entropy after splitting on a feature threshold.
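The two quantities can be written out as follows (standard definitions; sending $x_f \le t$ to the left branch is an assumed convention), together with a worked check of the 50/50 case:

```math
\begin{aligned}
H(S) &= -\sum_{c \in \{0,1\}} p_c \log_2 p_c \\
IG(S, f, t) &= H(S) - \frac{|S_L|}{|S|}\,H(S_L) - \frac{|S_R|}{|S|}\,H(S_R),
\qquad S_L = \{x \in S : x_f \le t\},\; S_R = S \setminus S_L \\
p_0 = p_1 = \tfrac{1}{2} \;&\Rightarrow\; H(S) = -2 \cdot \tfrac{1}{2}\log_2 \tfrac{1}{2} = 1
\end{aligned}
```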
The best-split search iterates through every feature and every unique threshold value to find the split that maximizes Information Gain.
Tree building grows the tree recursively and stops when:
- A node is pure (only one class remains), or
- `max_depth = 10` is reached (to prevent overfitting)

Prediction traverses the finished tree for each new sample until it reaches a leaf node (0 = Confirmed, 1 = False Positive).
Each Node object contains:
- Feature / Threshold — the question the node asks (e.g., Is transit depth < 0.05?)
- Left / Right — pointers to child branches
- Value — only present in leaf nodes; the final classification
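A compact NumPy sketch of the pieces described above, under stated assumptions: the left branch takes `x[feature] <= threshold`, the `Node` dataclass and function names are illustrative, and an extra stopping rule (no split improves entropy) guards against endless recursion on duplicate rows. The exhaustive threshold scan is roughly quadratic in the number of rows per feature in pure Python, so it is slow on the full 6,774-row training set; treat it as an illustration rather than the project's `decision_tree.py`.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature the node tests
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[int] = None        # set only on leaf nodes


def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(y, left_mask):
    n, n_left = len(y), int(left_mask.sum())
    if n_left == 0 or n_left == n:
        return 0.0                     # degenerate split: nothing gained
    child = (n_left / n) * entropy(y[left_mask]) + ((n - n_left) / n) * entropy(y[~left_mask])
    return entropy(y) - child


def best_split(X, y):
    best_gain, best_feature, best_threshold = 0.0, None, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):   # every unique value is a candidate threshold
            gain = information_gain(y, X[:, f] <= t)
            if gain > best_gain:
                best_gain, best_feature, best_threshold = gain, f, t
    return best_feature, best_threshold


def build_tree(X, y, depth=0, max_depth=10):
    majority = int(np.bincount(y).argmax())   # y must hold non-negative ints (0/1)
    if len(np.unique(y)) == 1 or depth >= max_depth:
        return Node(value=majority)           # pure node or depth limit reached
    f, t = best_split(X, y)
    if f is None:                             # no split improves entropy
        return Node(value=majority)
    mask = X[:, f] <= t
    return Node(f, t,
                build_tree(X[mask], y[mask], depth + 1, max_depth),
                build_tree(X[~mask], y[~mask], depth + 1, max_depth))


def predict_one(node, x):
    while node.value is None:                 # walk down until a leaf is reached
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def predict(tree, X):
    return np.array([predict_one(tree, x) for x in X])


# Toy usage: one feature, two well-separated groups
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
tree = build_tree(X_toy, y_toy)
print(tree.feature, tree.threshold, predict(tree, np.array([[2.5], [11.5]])))
```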
Initially, the model achieved 99.03% accuracy — but analysis of decision paths revealed it was primarily using NASA-derived flags:
koi_fpflag_nt, koi_fpflag_ss, koi_fpflag_co, koi_fpflag_ec, koi_score
These flags are assigned after scientists already know the classification. A model using them isn't predicting planets — it's memorising NASA's notes.
Solution: These columns were removed, forcing the model to learn from raw physical observations only:
| Feature | Description |
|---|---|
| Transit Depth | How much light the planet blocks |
| Orbital Period | How long it takes to orbit the star |
| Stellar Radius | The size of the parent star |
This dropped accuracy to 94.55%, but produced a far more robust and scientifically honest model capable of generalising to new stars where these flags don't yet exist.
| Metric | Value |
|---|---|
| Accuracy | 94.55% |
| Precision | 0.91 |
| Recall | 0.94 |
| Total errors | 62 / 1,138 |
Confusion Matrix (test set, 1,138 samples):
| | Predicted Confirmed | Predicted False Positive |
|---|---|---|
| Actual Confirmed | 386 ✅ | 26 ❌ |
| Actual False Positive | 36 ❌ | 690 ✅ |
Recall is the priority metric — a missed genuine planet (false negative) is more scientifically costly than a false alarm.
| Prediction | Count |
|---|---|
| 🟢 CONFIRMED (likely planet) | 743 |
| 🔴 FALSE POSITIVE | 1,236 |
This provides a prioritised list of 743 high-probability candidates for astronomers to focus follow-up observations on.
Gaussian Naive Bayes built from scratch using NumPy. Assumes each feature follows a Gaussian (normal) distribution per class. Uses log-probabilities throughout for numerical stability. An epsilon (1e-9) is added to variance and PDF values to prevent division by zero.
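A minimal sketch of such a classifier, assuming NumPy arrays with one row per star. It evaluates the Gaussian log-density analytically, so only the variance needs the 1e-9 epsilon; adding epsilon to the PDF itself, as described above, is the alternative when the density is computed before taking the log. The class and method names are illustrative.

```python
import numpy as np


class GaussianNaiveBayes:
    """From-scratch Gaussian Naive Bayes working entirely in log space."""

    def __init__(self, eps=1e-9):
        self.eps = eps

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # eps on the variance prevents division by zero for constant features
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + self.eps
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def _joint_log_likelihood(self, X):
        # log N(x | mean, var) per feature, summed across features (the "naive" step)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None])
        return self.log_prior_[None, :] + log_pdf.sum(axis=2)

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]
```

Summing log-densities instead of multiplying densities keeps the product of roughly a hundred small per-feature likelihoods from underflowing to zero.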
| Metric | Value |
|---|---|
| Test Accuracy | 85.50% |
| Precision | 71.40% |
| Recall | 100% |
| F1-Score | 83.32% |
Confusion Matrix (test set):
| | Pred Planet | Pred FP |
|---|---|---|
| Actual Planet | 412 ✅ | 0 ❌ |
| Actual FP | 165 ❌ | 561 ✅ |
Naive Bayes achieves perfect recall on the test set (no real planet is missed) at the cost of lower precision, with 165 false positives predicted as planets. This makes it a useful high-recall evidence source for the Bayesian combiner in Phase 7.
K-Means clustering built from scratch using NumPy. Uses the Elbow Method (K = 1 to 6) to identify optimal K. Operates fully unsupervised — groups stars by physical similarity without using NASA labels. The planet-rich cluster is identified post-hoc by checking cluster composition against training labels.
Result: The planet-rich cluster contained ~54% confirmed planets, modestly above the 50% planet share of the SMOTE-balanced training set, which suggests that physical features carry some discriminative signal even without supervision. K-Means is the weakest evidence source (FPR of 0.851 on the validation set), so its contribution to the final ensemble is downweighted to 3% in Phase 9.
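A from-scratch sketch of Lloyd's K-Means with the elbow scan over K = 1 to 6 and the post-hoc labelling step. Random initialisation from the data, the convergence test, and the helper names are assumptions; `planet_label=0` follows the CONFIRMED → 0 encoding from Phase 3.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns centroids, labels and inertia (within-cluster SSE)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])   # keep an empty cluster's old centroid
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = assign(X, centroids)
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, inertia


def elbow(X, k_values=range(1, 7)):
    """Inertia for each K; the elbow is where the curve stops dropping sharply."""
    return {k: kmeans(X, k)[2] for k in k_values}


def planet_rich_cluster(labels, y, planet_label=0):
    """Post-hoc: the cluster with the largest share of CONFIRMED training labels."""
    clusters = np.unique(labels)
    shares = np.array([np.mean(y[labels == j] == planet_label) for j in clusters])
    best = int(np.argmax(shares))
    return int(clusters[best]), float(shares[best])
```

Labels are only consulted in `planet_rich_cluster`, after clustering has finished, which keeps the clustering itself unsupervised.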
A Bayesian combiner that sequentially updates a probability estimate using evidence from all three from-scratch classifiers (Naive Bayes, K-Means, Decision Tree).
How it works:
- Start with the prior: P(planet) = 0.50 (from SMOTE-balanced training set)
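- For each classifier, apply Bayes' theorem: P(planet | evidence) = P(evidence | planet) × P(planet) / P(evidence), where P(evidence | planet) is the classifier's recall if it votes planet (1 − recall if it votes FP), P(evidence | FP) is its FPR (1 − FPR if it votes FP), and P(evidence) = P(evidence | planet) × P(planet) + P(evidence | FP) × P(FP)
- The output of each update becomes the prior for the next classifier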
- Final score = P(planet | all three classifiers)
Likelihoods (recall and FPR) are measured on the validation set to produce honest, non-overfit estimates.
| Classifier | Recall | FPR | Notes |
|---|---|---|---|
| Naive Bayes | 0.993 | 0.220 | Strong recall, moderate FPR |
| K-Means | 0.990 | 0.851 | Weak separator — unsupervised limitation |
| Decision Tree | 0.985 | 0.004 | Dominant evidence source |
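The update rule can be sketched in a few lines using the rounded validation recall/FPR values from the table above. Like any sequential update with per-classifier likelihoods, it treats the three votes as conditionally independent given the true class. Names such as `combine` are illustrative, not the API of `bayesian_reasoning.py`.

```python
# Validation-set likelihoods from the table above: (recall, FPR), in update order
LIKELIHOODS = {
    "naive_bayes": (0.993, 0.220),
    "kmeans": (0.990, 0.851),
    "decision_tree": (0.985, 0.004),
}


def bayes_update(prior, votes_planet, recall, fpr):
    """One application of Bayes' theorem for a single classifier vote."""
    if votes_planet:
        p_e_planet, p_e_fp = recall, fpr           # P(vote=planet | planet), P(vote=planet | FP)
    else:
        p_e_planet, p_e_fp = 1 - recall, 1 - fpr   # P(vote=FP | planet), P(vote=FP | FP)
    evidence = p_e_planet * prior + p_e_fp * (1 - prior)   # law of total probability
    return p_e_planet * prior / evidence


def combine(votes, prior=0.5):
    """votes: dict classifier name -> True if it voted 'planet'."""
    p = prior
    for name, (recall, fpr) in LIKELIHOODS.items():
        p = bayes_update(p, votes[name], recall, fpr)  # posterior becomes the next prior
    return p


print(round(combine({"naive_bayes": True, "kmeans": True, "decision_tree": True}), 4))  # 0.9992
```

With all three votes set to planet the sketch reproduces the 0.9992 unanimous score; other vote combinations can differ from the vote table by a few thousandths because the likelihoods above are rounded.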
| Metric | Value |
|---|---|
| Test Accuracy | 99.12% |
| Precision | 99.26% |
| Recall | 98.30% |
| F1-Score | 98.78% |
Confusion Matrix (test set, 1,138 samples):
| | Pred Planet | Pred FP |
|---|---|---|
| Actual Planet | 405 ✅ | 7 ❌ |
| Actual FP | 3 ❌ | 723 ✅ |
Candidate Predictions: 1,192 likely planets, 787 false positives
Of the 1,192 planet predictions, 1,105 had all three classifiers in unanimous agreement (score = 0.9992).
All 8 vote combinations verified (sanity check passed):
| NB | KM | DT | Candidates | Score | Label |
|---|---|---|---|---|---|
| planet | planet | planet | 1,105 | 0.9992 | CONFIRMED |
| planet | FP | planet | 54 | 0.9859 | CONFIRMED |
| FP | planet | planet | 33 | 0.7215 | CONFIRMED |
| planet | planet | FP | 284 | 0.0712 | FALSE POSITIVE |
| FP | FP | planet | 42 | 0.1269 | FALSE POSITIVE |
| planet | FP | FP | 38 | 0.0043 | FALSE POSITIVE |
| FP | planet | FP | 64 | 0.0002 | FALSE POSITIVE |
| FP | FP | FP | 359 | 0.0000 | FALSE POSITIVE |
A 1D Convolutional Neural Network built with TensorFlow/Keras as an industry-standard baseline. Treats the 102 tabular features as a 1D sequence and applies two convolutional layers to extract local feature patterns, followed by dense layers for classification.
Architecture:
- Conv1D(32 filters, kernel=3, ReLU) → MaxPooling1D(2)
- Conv1D(64 filters, kernel=3, ReLU) → MaxPooling1D(2)
- Flatten → Dense(64, ReLU) → Dropout(0.3) → Dense(1, sigmoid)
Trained for 20 epochs, batch size 32, Adam optimizer, binary cross-entropy loss.
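The layer shapes can be checked by hand. This plain-Python sketch (no TensorFlow needed) traces the output length and channel count of each layer and counts trainable parameters, assuming Keras defaults (`padding='valid'`, stride 1 for `Conv1D`, pooling stride equal to the pool size) and an input of shape (102, 1). The helper names are illustrative.

```python
def conv1d_valid(length, kernel):
    return length - kernel + 1               # 'valid' padding, stride 1


def maxpool1d(length, pool):
    return (length - pool) // pool + 1        # stride defaults to the pool size


def trace_cnn(n_features=102):
    length, channels, params, rows = n_features, 1, 0, []
    for filters in (32, 64):
        length = conv1d_valid(length, 3)
        params += 3 * channels * filters + filters   # kernel weights + biases
        channels = filters
        rows.append((f"Conv1D({filters})", length, channels))
        length = maxpool1d(length, 2)
        rows.append(("MaxPooling1D(2)", length, channels))
    flat = length * channels
    rows.append(("Flatten", flat, 1))
    params += flat * 64 + 64                  # Dense(64)
    params += 64 * 1 + 1                      # Dense(1); Dropout has no weights
    return rows, params


rows, params = trace_cnn()
for name, length, channels in rows:
    print(f"{name:<16} length={length:<5} channels={channels}")
print("trainable parameters:", params)       # trainable parameters: 104769
```

Under these assumptions the flattened vector has 24 × 64 = 1,536 values, and the Dense(64) layer holds about 94% of the parameters.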
| Metric | Value |
|---|---|
| Test Accuracy | 99.30% |
| Precision | 99.27% |
| Recall | 98.79% |
| F1-Score | 99.03% |
Confusion Matrix (test set, 1,138 samples):
| | Pred Planet | Pred FP |
|---|---|---|
| Actual Planet | 407 ✅ | 5 ❌ |
| Actual FP | 3 ❌ | 723 ✅ |
Candidate Predictions: 1,164 likely planets, 815 false positives
Key finding: The CNN (99.30%) outperforms the from-scratch Bayesian ensemble (99.12%) by only 0.18 percentage points, a difference of two test samples (8 vs. 10 errors out of 1,138). The from-scratch NumPy pipeline is therefore highly competitive with a standard deep-learning baseline.
Combines the outputs of all five methods into a single ensemble score for each of the 1,979 unresolved candidates, using a weighted average whose weights reflect each method's test-set performance.
Ensemble weights:
| Classifier | Weight | Rationale |
|---|---|---|
| CNN | 40% | Highest test accuracy (99.30%) |
| Bayesian Reasoning | 40% | Near-equal accuracy (99.12%), fully interpretable |
| Decision Tree | 12% | Solid performance (94.55%), from scratch |
| Naive Bayes | 5% | Lower precision but strong recall signal |
| K-Means | 3% | Weakest separator (unsupervised, high FPR) |
Confidence tiers:
| Tier | Threshold | Count |
|---|---|---|
| HIGH | score >= 0.80 | 1,112 |
| MEDIUM | 0.50 <= score < 0.80 | 90 |
| LOW / False Positive | score < 0.50 | 777 |
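A minimal sketch of the weighted average, tiers, vote counts, and ranking. It assumes every method supplies a per-candidate score in [0, 1] (hard-label methods contributing 0/1 votes) and that a score of at least 0.50 counts as a planet vote; the dictionary keys and function name are hypothetical.

```python
import numpy as np

# Weights from the ensemble table above (they sum to 1.0)
WEIGHTS = {"cnn": 0.40, "bayesian": 0.40, "decision_tree": 0.12,
           "naive_bayes": 0.05, "kmeans": 0.03}


def ensemble(scores):
    """scores: dict name -> array of per-candidate scores in [0, 1]."""
    final = sum(w * np.asarray(scores[name], dtype=float) for name, w in WEIGHTS.items())
    tiers = np.where(final >= 0.80, "HIGH", np.where(final >= 0.50, "MEDIUM", "LOW"))
    labels = (final >= 0.50).astype(int)                 # 1 = planet, 0 = false positive
    votes = sum((np.asarray(scores[n]) >= 0.50).astype(int) for n in WEIGHTS)
    ranking = np.argsort(-final, kind="stable")          # best candidate first
    return final, labels, tiers, votes, ranking
```

`kind="stable"` keeps tied candidates in index order; any tie-breaking rule is equally valid for candidates with identical scores.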
Classifier agreement across all 1,979 candidates (number of the five classifiers voting planet):
| Classifiers voting planet | Candidates |
|---|---|
| 5 / 5 (unanimous planet) | 1,066 |
| 4 / 5 | 88 |
| 3 / 5 | 70 |
| 2 / 5 | 267 |
| 1 / 5 | 129 |
| 0 / 5 (unanimous false positive) | 359 |
Top candidates (score = 0.9997, all 5 classifiers unanimous): indices 2, 9, 40, 23, 1517, 45, 53, 1520, 1487, 1509 — and 1,056 more.
Outputs saved:
| File | Contents |
|---|---|
| `final_candidate_scores.npy` | Ensemble probability per candidate |
| `final_candidate_labels.npy` | 1 = planet, 0 = false positive |
| `final_candidate_tiers.npy` | HIGH / MEDIUM / LOW per candidate |
| `final_candidate_ranking.npy` | Candidate indices sorted best to worst |
| `final_agreement_counts.npy` | How many classifiers agreed per candidate |
| `final_candidate_summary.txt` | Human-readable ranked list of all 1,979 |
A desktop application built with Tkinter. Loads all Phase 9 outputs and provides an interactive interface for exploring the full ranked candidate list.
To run: `python gui.py`

Requires only Python 3, NumPy, and Tkinter (Tkinter ships with standard Python). All `.npy` output files from Phase 9 must be in the same directory.
Features:
- Animated starfield header with project and team information
- Stat bar showing total candidates, tier counts, Bayesian accuracy (99.12%), and CNN accuracy (99.30%)
- Full ranked table of all 1,979 candidates, colour-coded by confidence tier, with search bar, tier filter buttons, and click-to-sort column headers
- Confidence distribution bar chart (HIGH / MEDIUM / LOW)
- Candidate detail panel — click any row to see that candidate's ensemble score, tier, 5-dot classifier agreement indicator, and individual Bayesian and CNN score bars
All classifiers are evaluated on the held-out test set using:
- Accuracy — overall correctness
- Precision — of predicted planets, how many are real
- Recall — of real planets, how many did we catch (priority metric)
- F1-Score — harmonic mean of precision and recall
- AUC-ROC — discrimination ability across thresholds
- Confusion Matrix — breakdown of TP, FP, TN, FN
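These metrics follow directly from the confusion-matrix counts. A small sketch with planet as the positive class, checked against the Bayesian Reasoning matrix from Phase 7; AUC-ROC is computed here in its rank form (the probability that a random planet outscores a random false positive, with ties counting one half), which is one standard way to obtain it. Function names are illustrative.

```python
import numpy as np


def metrics_from_confusion(tp, fn, fp, tn):
    """Planet is the positive class: tp = planets caught, fn = planets missed."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """Rank-based AUC: P(random positive scores above random negative), ties count 1/2."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Bayesian Reasoning test-set confusion matrix (Phase 7)
m = metrics_from_confusion(tp=405, fn=7, fp=3, tn=723)
print({k: round(100 * v, 2) for k, v in m.items()})
# {'accuracy': 99.12, 'precision': 99.26, 'recall': 98.3, 'f1': 98.78}
```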
Of the 1,979 unresolved Kepler candidates that have never been confirmed or ruled out:
- 1,112 signals classified as HIGH confidence planet candidates (ensemble score >= 0.80)
- 1,066 of those have all five classifiers in unanimous agreement
- 90 signals in the MEDIUM confidence tier — worth follow-up investigation
- 777 signals classified as likely false positives
This pipeline provides astronomers with a prioritised, ranked list of candidates for ground-based follow-up observation — maximising the scientific return from limited telescope time.
Last updated: April 26, 2026