I work across model efficiency, training systems, and GPU kernels to build hardware-aware LLMs under real-world compute constraints.
- Hardware-aware LLM training (modeling, numerics, systems)
- GPU kernel optimization (CUDA, CuTe, Triton)
- Distributed training with TorchTitan, Megatron-LM, Transformer Engine, and FlashAttention
- 📌 Megatron-LM PR #3345: Improved the fused linear cross-entropy path to reduce training-efficiency bottlenecks caused by materializing the full logits tensor and the memory traffic that follows (a minimal sketch of the general idea is below).
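
The sketch below is a hedged, PyTorch-level illustration of the general chunking idea, not the PR's implementation or Megatron-LM's API: compute the output projection and the cross-entropy loss one chunk of tokens at a time, so the full `[tokens, vocab]` logits tensor is never materialized at once. All function names, shapes, and sizes here are illustrative assumptions.

```python
# Illustrative sketch only: chunked linear + cross-entropy that avoids building
# the full [tokens, vocab] logits tensor in one shot. Names and sizes are made up.
import torch
import torch.nn.functional as F


def chunked_linear_cross_entropy(hidden, weight, labels, chunk_size=1024):
    """Mean cross-entropy of (hidden @ weight.T) against labels,
    materializing logits only one chunk of rows at a time."""
    total_loss = hidden.new_zeros(())
    n_tokens = hidden.shape[0]
    for start in range(0, n_tokens, chunk_size):
        end = min(start + chunk_size, n_tokens)
        # Only a [chunk, vocab] slice of logits exists at any point.
        logits = hidden[start:end] @ weight.T
        total_loss += F.cross_entropy(logits, labels[start:end], reduction="sum")
    return total_loss / n_tokens


# Usage with illustrative (hypothetical) sizes.
hidden = torch.randn(8192, 1024)            # [tokens, hidden_dim]
weight = torch.randn(50257, 1024)           # [vocab, hidden_dim]
labels = torch.randint(0, 50257, (8192,))   # [tokens]
loss = chunked_linear_cross_entropy(hidden, weight, labels)
```

Fusing the projection and the loss per chunk trades one large activation for several small, short-lived ones, which is where the memory-traffic savings come from.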


