Capstone Project — Single/Few-Shot 3D Reconstruction of cultural pottery artifacts using PixelNeRF, with vanilla NeRF as a baseline for comparison.
This project tackles the problem of reconstructing 3D models of Vietnamese pottery objects from a small number of 2D images. We leverage Neural Radiance Fields (NeRF) techniques to synthesize novel views and export 3D point clouds (PLY files).
| Approach | Framework | Role | Input Required |
|---|---|---|---|
| PixelNeRF | PyTorch | Primary method | 1–3 source views (generalizable) |
| NeRF (vanilla) | TensorFlow | Baseline comparison | ~100 views per object |
- End-to-end pipeline: from Blender renders → data preparation → training → 3D export
- Custom Vietnamese pottery dataset (bowls, vases, cups, dishes) built from the LR3D-CULT Blender renders
- Automated COLMAP camera pose estimation with fallback to synthetic turntable poses
- PLY point cloud export for visualization and downstream use
Project structure:

```text
Capstone/
├── README.md
│
├── scripts/                     # All utility scripts
│   ├── data_processing/         # Dataset standardization, resize, split
│   ├── camera_poses/            # Camera pose generation & fixing
│   ├── colmap/                  # COLMAP automation
│   ├── export/                  # 3D export (PLY point clouds)
│   └── evaluation/              # Metrics & visualization
│
├── NeRF_finetuning/             # NeRF baseline
│   ├── nerf/                    # Original NeRF repo (TensorFlow)
│   ├── All_data/                # Raw dataset (~180 views/object, 512×512)
│   ├── data_nerf/               # Prepared NeRF data (train/val/test splits)
│   └── outputs/                 # NeRF training outputs
│
├── PixelNerf_finetuning/        # PixelNeRF (primary)
│   ├── pixel-nerf/              # PixelNeRF repo (PyTorch)
│   │   ├── src/                 # Model, data loaders, renderer, utils
│   │   ├── train/               # Training script
│   │   ├── eval/                # Evaluation & video generation
│   │   └── conf/                # HOCON config files
│   └── LR3D-CULT/               # Source dataset (Blender renders)
│
├── docs/                        # Documentation
│   ├── papers/                  # Reference papers (NeRF, PixelNeRF, etc.)
│   ├── reports/                 # Project reports & guides
│   ├── figures/                 # Pipeline diagrams & visualizations
│   └── presentations/           # Slide decks
│
├── notebooks/                   # Jupyter notebooks
│   ├── analysis.ipynb           # Dataset analysis
│   ├── render_demo.ipynb        # NeRF rendering demo
│   ├── extract_mesh.ipynb       # Mesh extraction
│   └── tiny_nerf.ipynb          # Minimal NeRF tutorial
│
└── results/                     # Final outputs & benchmarks
```
Requirements:

- OS: Windows 10/11
- GPU: NVIDIA GPU with CUDA support (≥ 8 GB VRAM recommended)
- COLMAP: required for camera pose estimation; install it separately
PixelNeRF (primary):

```text
torch >= 1.10
torchvision
numpy
Pillow
imageio
tqdm
dotmap
pyhocon
open3d    # for PLY export
lpips     # for perceptual metrics
```

NeRF baseline:

```text
tensorflow >= 2.0
numpy
imageio
```
Installation:

```bash
# Clone the project
git clone <repo_url>
cd Capstone

# Install PixelNeRF dependencies
cd PixelNerf_finetuning/pixel-nerf
pip install -r requirements.txt
cd ../..   # back to the project root

# Install NeRF baseline dependencies (optional)
cd NeRF_finetuning/nerf
conda env create -f environment.yml
```

The dataset consists of Vietnamese pottery objects rendered in Blender with 180 views per object (2 cameras × 90 frames, 360° turntable):
| Category | Prefix | Description |
|---|---|---|
| Bowls | `bat_gom_` | Ceramic bowls |
| Vases | `binh_gom_` | Ceramic vases |
| Vases (BT) | `binh_gom_bt_` | Ceramic vases (variant) |
| Cups | `chen_gom_` | Ceramic cups |
| Dishes | `dia_gom_` | Ceramic dishes |
| Bronze cups | `ly_dong_` | Bronze cups |
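Because `binh_gom_bt_` itself begins with `binh_gom_`, a first-match check over these prefixes can file variant vases under plain vases. A minimal sketch of longest-prefix matching, assuming object folder names start with the prefixes in the table; `category_of` and the sample folder names are hypothetical, not part of the project scripts:

```python
from typing import Optional

# Prefix -> category, taken from the dataset table.
PREFIXES = {
    "bat_gom_": "Bowls",
    "binh_gom_": "Vases",
    "binh_gom_bt_": "Vases (BT)",
    "chen_gom_": "Cups",
    "dia_gom_": "Dishes",
    "ly_dong_": "Bronze cups",
}


def category_of(folder_name: str) -> Optional[str]:
    """Return the category for an object folder name, or None if unknown.

    Prefixes are tried longest-first, so 'binh_gom_bt_' wins over 'binh_gom_'.
    """
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if folder_name.startswith(prefix):
            return PREFIXES[prefix]
    return None


if __name__ == "__main__":
    # Hypothetical folder names, for illustration only.
    for name in ["binh_gom_bt_03", "binh_gom_12", "ly_dong_01", "tuong_01"]:
        print(name, "->", category_of(name))
```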
Each object folder contains:

```text
object_name/
├── images/            # 180 PNG images (512×512)
├── transforms.json    # Camera poses (NeRF format)
└── metadata.json      # Blender render parameters
```
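The preparation scripts in `scripts/data_processing/` pick N of these 180 frames uniformly and split them 70:20:10. Their exact selection logic is not reproduced here. The sketch below shows one possible approach, assuming evenly spaced frame indices and a seeded shuffle before splitting; `select_uniform` and `split_frames` are illustrative names, not functions from the repository:

```python
import random


def select_uniform(num_frames: int, n: int) -> list[int]:
    """Pick n frame indices spread evenly over range(num_frames)."""
    if not 0 < n <= num_frames:
        raise ValueError("n must be between 1 and num_frames")
    step = num_frames / n  # >= 1, so the floored indices are distinct
    return [int(i * step) for i in range(n)]


def split_frames(indices, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle with a fixed seed, then split into train/val/test lists."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = sorted(shuffled[:n_train])
    val = sorted(shuffled[n_train:n_train + n_val])
    test = sorted(shuffled[n_train + n_val:])  # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    idx = select_uniform(180, 90)        # every second frame
    train, val, test = split_frames(idx)
    print(len(train), len(val), len(test))
```

With N = 90 this keeps every second frame and yields 63/18/9 frames. A fixed seed makes the split reproducible across runs, which matters when comparing PixelNeRF and the NeRF baseline on the same test views.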
`transforms.json` format:

```json
{
  "camera_angle_x": 0.6911,
  "frames": [
    {
      "file_path": "./images/0001.png",
      "transform_matrix": [[4×4 camera-to-world matrix]],
      "w": 512, "h": 512,
      "fl_x": 349.2, "fl_y": 349.2,
      "cx": 256.0, "cy": 256.0
    }
  ]
}
```

Data preparation (run from the project root):

```bash
# Full standardization pipeline: select N frames uniformly,
# resize to 128×128, and split 70:20:10 into train/val/test
python scripts/data_processing/standard_dataset.py

# Or use the CLI version with custom arguments:
python scripts/data_processing/standad_resize_dataset.py \
  --src "path/to/All_data" \
  --dst "path/to/output_dataset" \
  --n 90 --resize 128 128 --fov 50 --seed 42
```

If COLMAP fails to register enough frames, regenerate the camera poses:
```bash
# Generate poses from Blender metadata (most accurate)
python scripts/camera_poses/generate_transforms_from_metadata.py

# Or use synthetic turntable poses as a fallback
python scripts/camera_poses/fix_all_transforms.py

# Check dataset quality
python scripts/camera_poses/check_nerf_dataset.py
```

Training PixelNeRF (from the project root):

```bash
cd PixelNerf_finetuning/pixel-nerf

# Train on the pottery dataset
python train/train.py \
  -n pottery_experiment \
  -c conf/exp/multi_obj.conf \
  -D path/to/dataset_pottery \
  --gpu_id 0 \
  --epochs 200
```

Training logs are saved to `logs/` and checkpoints to `checkpoints/`, both inside `pixel-nerf/`.
Training the NeRF baseline (from the project root):

```bash
cd NeRF_finetuning/nerf

# Train on a single object
python run_nerf.py \
  --config config_lego.txt \
  --datadir path/to/object_data \
  --basedir outputs/
```

Evaluating PixelNeRF (from the project root):

```bash
cd PixelNerf_finetuning/pixel-nerf

# Evaluate on the test set
python eval/eval.py \
  -n pottery_experiment \
  -c conf/exp/multi_obj.conf \
  -D path/to/dataset_pottery \
  --gpu_id 0

# Calculate PSNR, SSIM, LPIPS
python eval/calc_metrics.py \
  -n pottery_experiment

# Generate a 360° rotation video
python eval/gen_video.py \
  -n pottery_experiment
```

LPIPS evaluation script (from the project root):

```bash
python scripts/evaluation/predict_lpips.py
```

3D export (from the project root):

```bash
cd PixelNerf_finetuning/pixel-nerf

# Export a PLY point cloud from a trained model
# (scripts/ and results/ live at the project root, two levels up)
python ../../scripts/export/export_ply.py \
  --weights checkpoints/pottery_experiment/pixel_nerf_latest \
  --input path/to/source_images \
  --transforms path/to/transforms.json \
  --output ../../results/output.ply \
  --n_views 36
```

The exported `.ply` file can be viewed in MeshLab, CloudCompare, or Open3D.
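`open3d` is listed for PLY export. For quick checks without it, a colored point cloud can also be written as ASCII PLY using only the standard library. A minimal sketch, assuming points are (x, y, z) floats and colors are (r, g, b) integers from 0 to 255; `ply_text` and `write_ascii_ply` are hypothetical helpers, not part of `export_ply.py`:

```python
def ply_text(points, colors=None) -> str:
    """Build an ASCII PLY document for a point cloud with optional RGB colors."""
    if colors is not None and len(colors) != len(points):
        raise ValueError("points and colors must have the same length")
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
    ]
    if colors is not None:
        lines += ["property uchar red", "property uchar green", "property uchar blue"]
    lines.append("end_header")
    for i, (x, y, z) in enumerate(points):
        row = f"{x:.6f} {y:.6f} {z:.6f}"
        if colors is not None:
            r, g, b = colors[i]
            row += f" {int(r)} {int(g)} {int(b)}"
        lines.append(row)
    return "\n".join(lines) + "\n"


def write_ascii_ply(path, points, colors=None) -> None:
    """Write the point cloud to `path` as ASCII PLY."""
    with open(path, "w", encoding="ascii") as f:
        f.write(ply_text(points, colors))
```

The resulting file opens in the same viewers, or in Python with `open3d.io.read_point_cloud`. Binary PLY is more compact for large clouds, but ASCII is easier to inspect by eye.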
PixelNeRF architecture:

| Component | Details |
|---|---|
| Encoder | ResNet-34 (ImageNet pretrained), 4 feature levels |
| MLP | ResNet-style, 3 blocks, 512 hidden dims (coarse + fine) |
| Renderer | Coarse: 64 samples, Fine: 32 samples, Fine around coarse depth: 16 samples |
| Positional Encoding | 6 frequencies, freq_factor=1.5 |
| Background | White |
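The positional-encoding row maps each 3D input to sines and cosines at geometrically spaced frequencies. A minimal pure-Python sketch, assuming PixelNeRF-style frequencies `freq_factor * 2**k` for k = 0..5 and that the raw input is kept (include_input). With a 3D input this gives 3 + 3·2·6 = 39 values. `positional_encoding` is an illustrative name, and the element ordering may differ from the repository implementation:

```python
import math


def positional_encoding(x, num_freqs=6, freq_factor=1.5, include_input=True):
    """Encode a coordinate vector x with sin/cos at freq_factor * 2**k."""
    freqs = [freq_factor * 2.0 ** k for k in range(num_freqs)]
    out = list(x) if include_input else []
    for f in freqs:
        out.extend(math.sin(f * xi) for xi in x)  # sin terms for this frequency
        out.extend(math.cos(f * xi) for xi in x)  # cos terms for this frequency
    return out


if __name__ == "__main__":
    print(len(positional_encoding([0.1, -0.2, 0.3])))  # PixelNeRF settings
```

The NeRF baseline's TensorFlow code uses plain `2**k` frequencies (equivalent to `freq_factor=1.0`), so 10 frequencies on positions give 63 values and 4 on view directions give 27.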
NeRF baseline architecture:

| Component | Details |
|---|---|
| MLP | 8 layers × 256 units, skip at layer 4 |
| Positional Encoding | 10 frequencies (position), 4 frequencies (view direction) |
| Rendering | Coarse + Fine hierarchical sampling |
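Hierarchical sampling places the fine samples where the coarse pass assigned most weight, by inverting the piecewise-constant CDF of the coarse weights along each ray. A minimal pure-Python sketch for a single ray, assuming deterministic midpoint values of u (NeRF itself uses random u during training or `linspace` in deterministic mode); `inverse_cdf_samples` is an illustrative reimplementation, not the repository function:

```python
import bisect


def inverse_cdf_samples(bins, weights, n_samples):
    """Draw n_samples depths along one ray from a piecewise-constant PDF.

    bins: len(weights) + 1 sorted bin edges (e.g. midpoints of coarse samples).
    weights: non-negative coarse weights, one per bin.
    """
    if len(bins) != len(weights) + 1:
        raise ValueError("need exactly one more bin edge than weights")
    eps = 1e-5  # keeps the PDF valid when all weights are zero
    w = [wi + eps for wi in weights]
    total = sum(w)
    cdf = [0.0]
    for wi in w:
        cdf.append(cdf[-1] + wi / total)
    cdf[-1] = 1.0  # guard against floating-point drift
    us = [(i + 0.5) / n_samples for i in range(n_samples)]  # midpoints in (0, 1)
    samples = []
    for u in us:
        # Bin j whose CDF interval [cdf[j], cdf[j + 1]] contains u.
        j = min(max(bisect.bisect_right(cdf, u) - 1, 0), len(weights) - 1)
        span = cdf[j + 1] - cdf[j]
        t = (u - cdf[j]) / span if span > 0 else 0.0
        samples.append(bins[j] + t * (bins[j + 1] - bins[j]))
    return samples


if __name__ == "__main__":
    # All coarse weight in the bin [2, 3]: fine samples cluster there.
    print(inverse_cdf_samples([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 0.0], 4))
```

Because the CDF is monotone and the u values are sorted, the returned depths come out sorted, so they can be merged with the coarse samples before the fine pass.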
Coordinate conventions (`transforms.json`):

- Coordinate system: Y-up
- Camera orientation: looks along the −Z axis
- `camera_angle_x`: horizontal FOV in radians (0.6911 rad ≈ 39.6°)
- `transform_matrix`: 4×4 camera-to-world (c2w) transformation
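Two small helpers make these conventions concrete. The first computes the focal length in pixels from the horizontal FOV as focal = 0.5 · W / tan(0.5 · camera_angle_x), which is the same relation NeRF's Blender loader uses. For W = 512 and 0.6911 rad this gives about 711 px. That does not match the `fl_x` of 349.2 in the example entry, so the two fields are worth cross-checking in real files. The second helper builds camera-to-world matrices for cameras circling the Y axis, each looking at the origin along its local −Z axis. The radius, height, and function names are illustrative assumptions, not the values or code used by `fix_all_transforms.py`:

```python
import math


def focal_from_fov(width: int, camera_angle_x: float) -> float:
    """Focal length in pixels from the horizontal field of view (radians)."""
    return 0.5 * width / math.tan(0.5 * camera_angle_x)


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def look_at_c2w(eye, target=(0.0, 0.0, 0.0), up=(0.0, 1.0, 0.0)):
    """4x4 camera-to-world matrix for a camera at `eye` looking at `target`.

    Columns are the camera's right (+X), up (+Y) and back (+Z) axes in world
    space plus its position, so the camera looks along its local -Z axis.
    Degenerate if the viewing direction is parallel to `up`.
    """
    back = _normalize([e - t for e, t in zip(eye, target)])
    right = _normalize(_cross(up, back))
    cam_up = _cross(back, right)  # already unit length
    return [
        [right[0], cam_up[0], back[0], eye[0]],
        [right[1], cam_up[1], back[1], eye[1]],
        [right[2], cam_up[2], back[2], eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def turntable_poses(n_frames=90, radius=4.0, height=0.0):
    """Evenly spaced cameras on a circle around the Y (up) axis.

    radius and height are hypothetical defaults chosen for illustration.
    """
    poses = []
    for i in range(n_frames):
        theta = 2.0 * math.pi * i / n_frames
        eye = (radius * math.sin(theta), height, radius * math.cos(theta))
        poses.append(look_at_c2w(eye))
    return poses
```

Poses generated from Blender metadata remain the more accurate option. Synthetic poses like these only approximate the real camera path.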
References:

- Mildenhall, B., et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.
- Yu, A., et al. pixelNeRF: Neural Radiance Fields from One or Few Images. CVPR 2021.
- Cai, S., et al. Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation. CVPR 2022.
- Liu, R., et al. Zero-1-to-3: Zero-shot One Image to 3D Object. ICCV 2023.
- Tancik, M., et al. Nerfstudio: A Modular Framework for Neural Radiance Field Development. SIGGRAPH 2023.
- LR3D-CULT Dataset — 3D cultural heritage pottery dataset.
This project is for academic purposes (Capstone Project). The NeRF and PixelNeRF codebases retain their original licenses; see the `LICENSE` file in each repository.