
# Hoyoun Jung

I work across model efficiency, training systems, and GPU kernels to build hardware-aware LLMs under real-world compute constraints.

## What I Do

- Hardware-aware LLM training (modeling, numerics, systems)
- GPU kernel optimization (CUDA, CuTe, Triton)
- Distributed training with Torchtitan, Megatron-LM, Transformer Engine, and FlashAttention

## Featured

- 📌 Megatron-LM PR #3345: Improved the fused linear cross-entropy path to address training-efficiency bottlenecks from large-logit materialization and memory traffic.
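The idea behind avoiding large-logit materialization can be illustrated with a minimal sketch: instead of projecting all hidden states to the full `[tokens, vocab]` logits tensor at once, project and reduce one chunk of tokens at a time. This is only an illustration of the memory-saving principle, not the Megatron-LM implementation, which also fuses the backward pass; the function name and chunking scheme here are hypothetical.

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    """Illustrative sketch: cross-entropy over vocab logits without
    materializing the full [tokens, vocab] logits tensor.

    hidden:  [tokens, d]     final hidden states
    weight:  [vocab, d]      output-projection (unembedding) matrix
    targets: [tokens]        label token ids
    """
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start:start + chunk_size]       # [chunk, d]
        logits = h @ weight.t()                    # only [chunk, vocab] live at once
        total = total + F.cross_entropy(
            logits, targets[start:start + chunk_size], reduction="sum"
        )
    return total / hidden.size(0)                  # mean loss over all tokens
```

With a 100k-token vocabulary, the full logits tensor for a long batch can dominate activation memory; chunking caps the live logits at `chunk_size * vocab` at the cost of a loop over chunks.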

## Contact

## Popular Repositories

1. **PromptCompressor** (Python)
2. **100days-gpu-challenge** (Python)
3. **notion-blog** (TypeScript): forked from morethanmin/morethan-log. 😎 A static blog using a Notion database.
4. **lingua** (Python): forked from facebookresearch/lingua. Meta Lingua: a lean, efficient, and easy-to-hack codebase for LLM research.
5. **torchtitan** (Python): forked from pytorch/torchtitan. A PyTorch-native library for large model training.
6. **pytorch** (Python): forked from pytorch/pytorch. Tensors and dynamic neural networks in Python with strong GPU acceleration.