mohdrafik/generative_ai_learning

Generative AI Projects — Comprehensive Roadmap

Author: Moh Rafik
Duration: Month 3 (Weeks 9–12)
Learning Time: 6–7 hrs/day
Goal: Understand, implement, and experiment with modern Generative AI techniques — VAEs, GANs, Diffusion Models, and Transformers.


📘 Overview

This repository is part of the AI Learning Roadmap (3-Month Intensive) that includes:

  1. Machine Learning Basics
  2. Deep Learning Foundations
  3. Generative AI Projects ← (You are here)

This repository focuses on:

  • Building a conceptual and practical understanding of Generative AI
  • Implementing and training core models like Autoencoders, VAEs, GANs, and Diffusion Models
  • Exploring modern architectures like Transformers and LLMs
  • Building mini-projects for text and image generation

📂 Repository Structure

generative-ai-projects/
│
├── 01_autoencoders/
│   ├── basic_autoencoder.ipynb
│   ├── variational_autoencoder.ipynb
│
├── 02_gans/
│   ├── vanilla_gan.ipynb
│   ├── dcgan_faces.ipynb
│   ├── cycle_gan_intro.ipynb
│
├── 03_diffusion_models/
│   ├── diffusion_intro.ipynb
│   ├── stable_diffusion_walkthrough.ipynb
│
├── 04_transformers_llms/
│   ├── gpt2_finetuning.ipynb
│   ├── text_generation_huggingface.ipynb
│   ├── prompt_engineering_examples.ipynb
│
├── projects/
│   ├── text_to_image_pipeline.ipynb
│   ├── custom_gan_dataset.ipynb
│
├── assets/
│   ├── images/
│   └── figures/
│
└── README.md

📖 Learning Path (4-Week Plan)

| Week | Focus | Topics |
|------|-------|--------|
| 9 | Autoencoders & VAEs | Representation learning, latent space exploration |
| 10 | GANs | Adversarial training, DCGANs, Conditional GANs |
| 11 | Diffusion Models | Denoising processes, Stable Diffusion basics |
| 12 | LLMs & Prompt Engineering | GPT, fine-tuning, text-to-image pipelines |

🧮 Theoretical Summaries

1. Autoencoders (AEs)

  • Concept: Learn compressed representations (encodings) of data.
  • Structure: Encoder → Bottleneck → Decoder.
  • Loss Function: Reconstruction loss, typically MSE between the input and its reconstruction.
  • Applications: Noise removal, dimensionality reduction.
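
To make the Encoder → Bottleneck → Decoder structure concrete, here is a minimal PyTorch sketch of one training step. The layer sizes, the 784-dimensional input (a flattened 28×28 image), and the random stand-in batch are assumptions for illustration, not the contents of the notebooks in this repository.

```python
import torch
from torch import nn


class AutoEncoder(nn.Module):
    """Encoder -> bottleneck -> decoder for flattened inputs (sizes are assumed)."""

    def __init__(self, input_dim: int = 784, bottleneck_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),  # outputs in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)           # compressed representation
        return self.decoder(z), z     # reconstruction and encoding


torch.manual_seed(0)
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch; real data would come from a DataLoader (e.g. MNIST).
x = torch.rand(16, 784)

x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction loss (MSE)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The bottleneck width (32 here) controls how aggressively the input is compressed; a smaller value forces a coarser reconstruction.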

2. Variational Autoencoders (VAEs)

  • Extension of AEs: Introduces probabilistic latent variables.
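  • Loss: Reconstruction term plus KL divergence; the loss that is minimized is the negative of the evidence lower bound (ELBO).
  • Formula (ELBO, maximized during training):
    $$ \mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathrm{KL}[q(z|x) \,\|\, p(z)] $$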
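
The objective above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions: the encoder outputs a diagonal Gaussian `q(z|x)` parameterized by `mu` and `logvar`, the prior `p(z)` is a standard normal, and the decoder is Bernoulli, so the reconstruction term becomes binary cross-entropy. The function names `reparameterize` and `vae_loss` are hypothetical helpers, not taken from the repository.

```python
import torch
import torch.nn.functional as F


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients can flow through mu and logvar."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std


def vae_loss(x_hat, x, mu, logvar):
    """Negative ELBO: reconstruction term plus KL[q(z|x) || N(0, I)]."""
    # Assumed Bernoulli decoder, so -log p(x|z) is binary cross-entropy.
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL between a diagonal Gaussian and a standard normal.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl, recon, kl


torch.manual_seed(0)
mu = torch.ones(2, 3)       # toy encoder outputs
logvar = torch.zeros(2, 3)
z = reparameterize(mu, logvar)

x = torch.full((2, 4), 0.5)      # toy targets in [0, 1]
x_hat = torch.full((2, 4), 0.5)  # toy decoder outputs in (0, 1)
total, recon, kl = vae_loss(x_hat, x, mu, logvar)
```

When `mu` is zero and `logvar` is zero, the KL term vanishes, which is a quick sanity check for the closed form.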

3. Generative Adversarial Networks (GANs)

  • Components: Generator (G) and Discriminator (D) in a minimax game.
  • Objective:
    $$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] $$
  • Common Variants: DCGAN, CycleGAN, StyleGAN.
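
A single adversarial training step might look like the sketch below. The toy network sizes, the shifted-Gaussian stand-in for real data, and the Adam settings are assumptions for illustration. Note one deliberate substitution: the generator step uses the non-saturating loss (maximize log D(G(z))) that is common in practice, rather than directly minimizing log(1 - D(G(z))) from the minimax objective.

```python
import torch
from torch import nn

torch.manual_seed(0)
latent_dim, data_dim, batch = 8, 2, 32  # toy sizes, chosen for illustration

G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
# D outputs raw logits; the sigmoid is folded into BCEWithLogitsLoss.
D = nn.Sequential(nn.Linear(data_dim, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

real = torch.randn(batch, data_dim) + 3.0  # stand-in for a batch of real data
ones = torch.ones(batch, 1)
zeros = torch.zeros(batch, 1)

# Discriminator step: push D(x) toward 1 and D(G(z)) toward 0.
z = torch.randn(batch, latent_dim)
fake = G(z).detach()  # detach so this step does not update G
d_loss = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step (non-saturating variant): push D(G(z)) toward 1.
z = torch.randn(batch, latent_dim)
g_loss = bce(D(G(z)), ones)
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```

In a full training loop these two steps alternate for every batch.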

4. Diffusion Models

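  • Idea: Gradually corrupt data with Gaussian noise over many steps (forward process), then train a network to reverse that corruption step by step (reverse process).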
  • Famous Models: DDPM, Stable Diffusion.
  • Applications: Image generation, denoising, inpainting.
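
The forward (noise-adding) process has a closed form, which the NumPy sketch below illustrates: a noisy sample at step t can be drawn directly from the clean input. The linear beta schedule from 1e-4 to 0.02 over 1000 steps follows the DDPM paper (Ho et al., 2020); the function name `q_sample` and the random stand-in "images" are assumptions for illustration. The reverse process, a network trained to predict the added noise, is not shown.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal retention


def q_sample(x0, t, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    return xt, noise


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 8, 8))  # stand-in images scaled to [-1, 1]

x_early, _ = q_sample(x0, 10, rng)       # still close to x0
x_late, _ = q_sample(x0, T - 1, rng)     # almost pure Gaussian noise
```

During training, the returned `noise` would serve as the regression target for the denoising network at the sampled step `t`.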

5. Transformers & LLMs

  • Mechanism: Self-Attention, Multi-Head Attention, Positional Encoding.
  • Popular Models: GPT, BERT, T5.
  • Applications: Text generation, summarization, translation.
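
As a rough illustration of the self-attention mechanism, the NumPy sketch below computes single-head scaled dot-product attention, with an optional causal mask of the kind GPT-style models use. The function names, the random projection matrices, and the toy dimensions are assumptions; multi-head attention and positional encoding are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Block attention to future positions (strict upper triangle).
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4       # toy sizes
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
out_causal, weights_causal = self_attention(x, w_q, w_k, w_v, causal=True)
```

With `causal=True`, the first token can only attend to itself, so the first row of `weights_causal` is a one-hot vector.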

💻 Implementation Summary

Libraries

  • PyTorch, TensorFlow, Keras – model implementations
  • Hugging Face Transformers – fine-tuning and inference
  • Hugging Face Diffusers – diffusion-model toolkit, including Stable Diffusion pipelines
  • Matplotlib, NumPy, Torchvision – visualization and datasets

Key Implementations

  1. Build Autoencoder & VAE from scratch.
  2. Train DCGAN on MNIST and CelebA datasets.
  3. Experiment with Diffusion Models (using Hugging Face).
  4. Fine-tune GPT-2 on custom text data.
  5. Build a Text-to-Image Pipeline using open models.

🚀 Projects

🧱 1. Variational Autoencoder on MNIST

Goal: Compress and reconstruct handwritten digits.
Concepts: KL divergence, latent space sampling.
Deliverables: Visualizations of latent space clusters.


🧠 2. DCGAN for Face Generation

Goal: Generate realistic human faces using adversarial training.
Concepts: Generator vs. Discriminator training loop.
Dataset: CelebA.
Deliverables: Saved generated images during training.


🌫️ 3. Diffusion Model Experiment

Goal: Learn how a denoising diffusion probabilistic model (DDPM) works and train one.
Concepts: Forward and reverse diffusion processes.
Deliverables: Generate new images from Gaussian noise.


🗣️ 4. GPT-2 Fine-Tuning on Custom Dataset

Goal: Fine-tune GPT-2 for domain-specific text generation.
Concepts: Tokenization, causal language modeling.
Dataset: Custom text corpus.
Deliverables: Generate coherent domain-specific text samples.
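
The causal language modeling objective behind this project can be sketched without downloading a model: each position is trained to predict the next token, so logits and labels are shifted by one before computing cross-entropy. The helper name `causal_lm_loss`, the toy vocabulary size, and the uniform stand-in logits are assumptions for illustration; in an actual fine-tuning run the logits would come from GPT-2.

```python
import torch
import torch.nn.functional as F


def causal_lm_loss(logits, input_ids):
    """Next-token cross-entropy: position i predicts token i + 1."""
    shift_logits = logits[:, :-1, :]   # drop the last position (nothing to predict)
    shift_labels = input_ids[:, 1:]    # drop the first token (nothing predicts it)
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )


torch.manual_seed(0)
vocab_size = 50                                       # toy vocabulary size
input_ids = torch.randint(0, vocab_size, (2, 6))      # stand-in token IDs
logits = torch.zeros(2, 6, vocab_size)                # uniform predictions
loss = causal_lm_loss(logits, input_ids)
```

A model that predicts uniformly over the vocabulary scores a loss of log(vocab_size), which gives a baseline to compare against as fine-tuning progresses.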


🖼️ 5. Text-to-Image Pipeline

Goal: Combine transformer and diffusion components.
Concepts: Prompt-based generation, CLIP guidance.
Deliverables: Generate image outputs from textual prompts.


📊 Results Summary

| Model | Dataset | Output | Notes |
|-------|---------|--------|-------|
| VAE | MNIST | Reconstructed digits | Clear latent structure |
| DCGAN | CelebA | Generated faces | Improved over epochs |
| DDPM | CIFAR-10 | Generated images | Stable results |
| GPT-2 | Custom | Domain text | Fine-tuned successfully |
| Text-to-Image | Prompts | Realistic images | Used CLIP + Diffusers |

🧠 Key Takeaways

  • Understood probabilistic generative models.
  • Implemented GANs and learned adversarial optimization.
  • Trained diffusion-based models for image synthesis.
  • Fine-tuned and deployed transformer-based language models.
  • Built creative multimodal applications (Text → Image).

🧩 Next Step

🎯 Apply knowledge to real-world domains: biomedical imaging, text synthesis, or digital art generation.


🧰 Tools & Environment

  • Python 3.9+
  • Jupyter Notebook / VS Code
  • Libraries: torch, torchvision, transformers, diffusers, matplotlib

📚 References

  • MIT 6.S192 — Deep Generative Models
  • David Foster — Generative Deep Learning
  • Hugging Face Docs — https://huggingface.co/docs
  • DDPM Paper — Ho et al., 2020
  • GAN Paper — Goodfellow et al., 2014

✅ Progress Checklist

| Task | Status |
|------|--------|
| Implement Autoencoder and VAE | |
| Train DCGAN | |
| Experiment with Diffusion Models | |
| Fine-tune GPT-2 | |
| Build Text-to-Image pipeline | |
| Complete all 5 projects | |
| Final documentation & Git push | |

⭐ Pro Tip:
Document every experiment visually. Include GIFs of training progression, sample generations, and model performance graphs for an impressive GitHub portfolio.


📌 Maintained by Moh Rafik
💬 For queries or collaborations: [RAFIKIITBHU@GMAIL.COM or LinkedIn]
