
Hi there 👋

I am an incoming PhD student at the Institute of Science Tokyo (formerly Tokyo Tech), starting in April 2026. My research lies at the intersection of HPC and Machine Learning, focusing on distributed training and low-precision training (FP8/NVFP4) for Large Language Models.

I am a core contributor to the Swallow Project, a Japanese LLM development initiative, where I maintain the pre-training library and lead large-scale training experiments.

🔥 News & Updates

  • [Mar 2026] I will be presenting Swallow LLM at NVIDIA GTC 2026 in San Jose! 🗣️
  • [Jan 2026] My paper "Rewriting Pre-Training Data Boosts LLM Performance in Math and Code" has been accepted to ICLR 2026! 🎉

🔎 Seeking Opportunities

I am actively looking for Research Internship opportunities in the US. If you are interested in my work on LLM systems and low-precision training, please reach out!

Popular repositories

  1. llm-recipes

    Ongoing research project for continual pre-training of LLMs (dense models)

    Python · 44 stars · 4 forks

  2. moe-recipes

    Ongoing research project for training Mixture of Experts models

    Python · 21 stars · 2 forks

  3. megatron-deepspeed-turing-techblog

    Turing Tech Blog Repository

    Python · 5 stars · 1 fork

  4. llm-jp-sakura-ansible

    Jinja · 5 stars · 2 forks

  5. turing-techblog-megatron-deepspeed

    For details on environment setup, see the link below

    Python · 2 stars · 1 fork

  6. wandb_watcher

    A tool for monitoring wandb jobs as part of the ABCI large language model development support program

    Python · 2 stars