GitHub - datavorous/inference-engineering: documenting my work in inference engineering

inference engineering

A personal research workspace for learning and documenting inference engineering work at the Language Technologies Research Center.

The field is new, and doesn't have structured and well organized information on the internet. This is my attempt to document everything: from day 0.

All my actions are logged. Problems which I was told to tackle are in here, and will be updated accordingly.

Right now, my work is on:

infra/ops: env setup, storage, singularity/docker, 10GbE cross-node, nas, kubernetes
serving: vLLM deployment, API exposure

In future (incomplete list):

optimization techniques
kernel experiments

The model quality side (kernels, quantization, distillation) comes later once the infra is stable enough to actually experiment on.

Note

It was decided that all work will be done on the Turing cluster exclusively. Turing is a heterogeneous cluster: node01-04 have RTX 6000 (48GB VRAM), node05-14 have L40S (48GB VRAM, Ada Lovelace, FP8 supported), node10 has 8× A100 40GB. I have used Claude Code and Opus 4.7 to aid the documentation process.

micro-log

creating the prompts 1 2
exploring all the abstraction layers in "inference engineering" 1
gathered the documents related to the "current state of infrastructure", and the proposed "changes" to be made. [files hidden from public]
used claude code to reference the docs + meeting notes and build a "primer" containing all sorts of definitions and mermaid diagrams to convey the entire status information in a very accessible manner.
- allowed web access to search for NVIDIA's docs, vLLM docs etc. [NEEDS MANUAL VERIFICATION]
generated the list of "problems" that we need to tackle initially
summarised everything upto this [understanding WHAT we have + minimal setup] in stage0.md. in short:
- need to setup dev environment (bare metal test + run a small model end to end)
- build install scripts to fix dependency hellhole
  - have correct paths to allow caching among users
  - additionally have some verification scripts.
- create a setup repo
- dockerize it eventually, push docker images.
- expected result: dependecy errors are avoided, libraries with pinned version(s) are used with minimal setup headache, users start using /scratch, HF models get cached.
  - FINAL: anyone can can clone, run the setup script, pull the image, and run inference.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
knowledge		knowledge
media		media
prompts		prompts
summary		summary
.gitignore		.gitignore
README.md		README.md
log.md		log.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

inference engineering

micro-log

references

About

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

inference engineering

micro-log

references

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!