K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. It provides flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.
Deploy a minimal Network Operator profile to automatically discover your cluster's network capabilities and hardware configuration. This phase can be skipped if you provide your own configuration file.
Specify the desired deployment profile via CLI flags or with a natural-language prompt for the LLM.
Based on the discovered/provided configuration, generate a complete set of YAML deployment files tailored to your selected network profile.
git clone <repository-url>
cd launch-kubernetes
make build
The binary will be available at build/l8k.
After building, install the binary, profiles, and config to /usr/local:
make install # Copies binary, profiles, config to /usr/local
make dev-install # Symlinks instead of copies (for development)
This runs scripts/install.sh, which places:
<prefix>/bin/l8k
<prefix>/share/l8k/profiles/
<prefix>/share/l8k/l8k-config.yaml
Default prefix is /usr/local. Override with PREFIX=/opt/l8k make install.
make docker-build # Build Docker image (l8k:v0.1.0 + l8k:latest)
make docker-build-local # Build inside container, extract binary to host build/l8k
docker-build-local is useful when you don't have the Go toolchain installed — it compiles inside a container and copies the resulting binary to build/l8k on your host.
# Run from the Docker image
docker run --net=host \
-v ~/.kube:/kube:ro \
-v $(pwd):/output \
l8k:latest discover --kubeconfig /kube/config \
--save-cluster-config /output/cluster-config.yaml
K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool helps provide flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.
### Discover Cluster Configuration
Deploy a minimal Network Operator profile to automatically discover your cluster's
network capabilities and hardware configuration by using --discover-cluster-config.
This phase can be skipped if you provide your own configuration file by using --user-config.
This phase requires --kubeconfig to be specified.
### Generate Deployment Files
Based on the discovered or provided configuration,
generate a complete set of YAML deployment files for the selected network profile.
Files can be saved to disk using --save-deployment-files.
The profile can be defined manually with --fabric, --deployment-type and --multirail flags,
OR generated by an LLM-assisted profile generator with --prompt (requires --llm-api-key and --llm-vendor).
### Deploy to Cluster
Apply the generated deployment files to your Kubernetes cluster by using --deploy. This phase requires --kubeconfig and can be skipped if --deploy is not specified.
Usage:
l8k [flags]
l8k [command]
Available Commands:
completion Generate the autocompletion script for the specified shell
help Help about any command
version Print the version number
Flags:
--ai Enable AI deployment
--deploy Deploy the generated files to the Kubernetes cluster
--deployment-type string Select the deployment type (sriov, rdma_shared, host_device)
--discover-cluster-config Deploy a thin Network Operator profile to discover cluster capabilities
--enabled-plugins string Comma-separated list of plugins to enable (default "network-operator")
--fabric string Select the fabric type to deploy (infiniband, ethernet)
--group string Generate templates for a specific group only (e.g., group-0)
-h, --help help for l8k
--image-pull-secrets strings Image pull secret names for NicClusterPolicy (comma-separated)
--kubeconfig string Path to kubeconfig file for cluster deployment (required when using --deploy)
--node-selector string Filter nodes for discovery by label (default "feature.node.kubernetes.io/pci-15b3.present=true")
--llm-api-key string API key for the LLM API (required when using --prompt)
--llm-api-url string API URL for the LLM API
--llm-interactive Enable interactive chat mode for LLM-assisted profile selection
--llm-model string Model name for the LLM API (e.g., claude-3-5-sonnet-20241022, gpt-4)
--llm-vendor string Vendor of the LLM API: openai, openai-azure, anthropic, gemini (default "openai-azure")
--log-file string Write logs to file instead of stderr
--log-level string Enable logging at specified level (debug, info, warn, error)
--multiplane-mode string Spectrum-X multiplane mode: swplb, hwplb, uniplane (requires --spectrum-x)
--multirail Enable multirail deployment
--network-operator-namespace string Override the network operator namespace from the config file
--number-of-planes int Number of planes for Spectrum-X (requires --spectrum-x)
--prompt string Path to file with a prompt to use for LLM-assisted profile generation
--save-cluster-config string Save discovered cluster configuration to the specified path (defaults to --user-config path if set, otherwise ./cluster-config.yaml)
--save-deployment-files string Save generated deployment files to the specified directory (default "./deployment")
--spcx-version string Spectrum-X firmware version (requires --spectrum-x)
--spectrum-x Enable Spectrum X deployment
--user-config string Use provided cluster configuration file (as base config for discovery or as full config without discovery)
Use "l8k [command] --help" for more information about a command.
Note: The help text above is auto-generated. Run make update-readme after CLI changes to refresh it.
Discover cluster hardware:
l8k discover --kubeconfig ~/.kube/config \
--save-cluster-config ./cluster-config.yaml
Generate deployment manifests:
l8k generate --user-config ./cluster-config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments
Interactive AI-assisted troubleshooting or profile selection:
l8k chat --kubeconfig ~/.kube/config \
--user-config ./cluster-config.yaml \
--llm-api-key $KEY --llm-vendor anthropic \
--llm-model claude-sonnet-4-20250514
Collect a diagnostic dump:
l8k sosreport --kubeconfig ~/.kube/config
The root command still supports all flags for backward compatibility and running the full pipeline in one shot:
l8k --discover-cluster-config --save-cluster-config ./cluster-config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments \
--deploy --kubeconfig ~/.kube/config
Using the subcommand:
l8k discover --kubeconfig ~/.kube/config \
--save-cluster-config ./my-cluster-config.yaml
Filter discovery to specific nodes using a label selector:
l8k discover --kubeconfig ~/.kube/config \
--save-cluster-config ./my-cluster-config.yaml \
--node-selector "feature.node.kubernetes.io/pci-15b3.present=true"
Or using the root command (backward compatible):
l8k --discover-cluster-config --save-cluster-config ./my-cluster-config.yaml \
--kubeconfig ~/.kube/config
Use your own config file (with custom network operator version, subnets, etc.) as the base for discovery. Without --save-cluster-config, the file is rewritten in place with discovery results:
l8k discover --user-config ./my-config.yaml \
--kubeconfig ~/.kube/config
Save discovery results to a separate file instead:
l8k discover --user-config ./my-config.yaml \
--save-cluster-config ./discovered-config.yaml \
--kubeconfig ~/.kube/config
Generate and deploy with pre-existing config:
l8k generate --user-config ./existing-config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments \
--deploy --kubeconfig ~/.kube/config
Or generate without deploying:
l8k generate --user-config ./config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments
In heterogeneous clusters, discovery produces multiple node groups. Use --group to generate manifests for a single group:
l8k generate --user-config ./config.yaml \
--fabric infiniband --deployment-type sriov --multirail \
--group group-0 \
--save-deployment-files ./deployments
Generate with an LLM-assisted profile from a natural-language prompt:
echo "I want to enable multirail networking in my AI cluster" > requirements.txt
l8k generate --user-config ./config.yaml \
--prompt requirements.txt --llm-vendor openai-azure --llm-api-key <OPENAI_AZURE_KEY> \
--save-deployment-files ./deployments
Use the chat subcommand for interactive AI-assisted troubleshooting. The AI agent can collect and analyze diagnostic data (sosreport) from the cluster:
l8k chat --kubeconfig ~/.kube/config \
--user-config ./cluster-config.yaml \
--llm-api-key $KEY --llm-vendor anthropic \
--llm-model claude-sonnet-4-20250514
In the session, ask about issues: "My OFED driver pods are crashing, can you investigate?"
The AI agent will automatically collect a sosreport from the cluster, examine the diagnostic data, and provide analysis with remediation steps.
You can also collect a sosreport separately and provide it to the chat session (no cluster access needed):
l8k sosreport --kubeconfig ~/.kube/config
l8k chat --sosreport-path ./network-operator-sosreport-20260306-120000 \
--llm-api-key $KEY --llm-vendor anthropic \
--llm-model claude-sonnet-4-20250514
l8k supports structured output for AI agents and CI/CD pipelines. Use --output json to get machine-readable output, --yes to skip interactive prompts, and --dry-run to preview changes safely.
# Get structured output for programmatic consumption
l8k generate --user-config ./config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments \
--output json --yes 2>/dev/null | jq .
Example JSON output:
{
"success": true,
"phase": "generate",
"profile": {
"fabric": "ethernet",
"deployment": "sriov",
"multirail": "true"
},
"generatedFiles": [
"./deployments/network-operator/nic-cluster-policy.yaml",
"./deployments/network-operator/sriov-network-node-policy.yaml"
],
"deployed": false,
"messages": [
{"level": "info", "message": "Generating files for profile: SR-IOV Ethernet RDMA", "timestamp": "..."}
]
}
Preview what would be deployed without making changes:
l8k generate --user-config ./config.yaml --spectrum-x --deploy \
--dry-run --output json --kubeconfig ~/.kube/config
AI agents can programmatically discover l8k's capabilities:
l8k schema
This outputs a JSON description of available phases, fabrics, deployment types, flags, exit codes, and output formats.
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Validation error (bad flags, invalid config) |
| 3 | Cluster error (API unreachable, discovery failed) |
| 4 | Deployment error (apply failed) |
| 5 | Partial success (discovery ok but deploy failed) |
In JSON mode, errors include structured fields (code, category, transient, suggestion) to help agents decide whether to retry or fix input.
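For a pipeline that wraps l8k, these exit codes can drive a retry-or-fix decision. The sketch below is a hypothetical CI wrapper, not part of l8k; the retry policy (treating cluster errors and partial success as possibly transient) is an assumption layered on the table above.

```python
import subprocess

# Hypothetical wrapper: map l8k exit codes (per the table above) to an action.
RETRYABLE = {3, 5}   # cluster error, partial success: may be transient
FIX_INPUT = {2}      # validation error: correct flags/config, don't retry

def decide(exit_code: int) -> str:
    """Translate an l8k exit code into a CI action."""
    if exit_code == 0:
        return "ok"
    if exit_code in RETRYABLE:
        return "retry"
    if exit_code in FIX_INPUT:
        return "fix-input"
    return "fail"  # general error (1) or deployment error (4)

def run_l8k(args: list[str]) -> str:
    proc = subprocess.run(["l8k", *args])
    return decide(proc.returncode)
```

In JSON mode the structured `transient` field can refine this decision further, overriding the static table.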
During the cluster discovery stage, Kubernetes Launch Kit creates a configuration file that it later uses to generate deployment manifests from the templates. You can edit this file to customize your deployment, then pass it back to the tool with the --user-config CLI flag: either as a standalone config (skipping discovery) or as a base config combined with l8k discover / --discover-cluster-config (discovery takes network operator parameters from the file and adds the discovered cluster config).
The tool resolves configuration and profile paths in order: local directory first (./l8k-config.yaml, ./profiles), then installed location (/usr/local/share/l8k/), then binary-relative.
The docaDriver section controls the OFED driver deployment in the NicClusterPolicy. Set enable: true to include the ofedDriver section in generated manifests, or enable: false to omit it. This can also be overridden via the --enable-doca-driver CLI flag.
When the DOCA/OFED driver loads on a node, it replaces the inbox MLX kernel modules (mlx5_core, mlx5_ib, ib_core, etc.) with its own versions. If other kernel modules depend on the inbox MLX modules, they will block the inbox modules from being unloaded, causing the DOCA driver to fail to load.
During cluster discovery, the tool execs into nic-configuration-daemon pods and builds a full reverse dependency graph from /sys/module/*/holders/ for all loaded modules, then BFS-traverses from each of the following MLX/OFED kernel modules to find all transitive non-MOFED dependents:
mlx5_core, mlx5_ib, ib_umad, ib_uverbs, ib_ipoib, rdma_cm, rdma_ucm, ib_core, ib_cm
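The reverse-dependency walk can be sketched as follows. This is a simplified illustration, not l8k's actual code; `holders` is toy stand-in data for what `/sys/module/*/holders/` would report on a node.

```python
from collections import deque

# Toy reverse-dependency graph: module -> modules that hold (depend on) it,
# as would be read from /sys/module/<mod>/holders/. Sample data is hypothetical.
holders = {
    "ib_core": ["rdma_cm", "nvme_rdma"],
    "rdma_cm": ["nvme_rdma", "rdma_rxe"],
    "mlx5_core": ["mlx5_ib"],
    "mlx5_ib": ["ib_core"],
}

MOFED_ROOTS = ["mlx5_core", "mlx5_ib", "ib_umad", "ib_uverbs", "ib_ipoib",
               "rdma_cm", "rdma_ucm", "ib_core", "ib_cm"]

def transitive_dependents(roots, graph):
    """BFS from each MLX/OFED root, collecting every transitive holder
    that is not itself one of the MOFED roots."""
    seen, queue = set(), deque(roots)
    while queue:
        mod = queue.popleft()
        for holder in graph.get(mod, []):
            if holder not in seen:
                seen.add(holder)
                queue.append(holder)
    return seen - set(roots)
```

With the toy data, the walk surfaces `nvme_rdma` and `rdma_rxe` as the non-MOFED modules that would block unloading.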
Discovered modules are classified into three categories:
- mlx5-prefixed modules (e.g. `mlx5_vdpa`, `mlx5_netdev`) — NVIDIA's own modules, silently filtered out.
- Known storage-over-RDMA modules (`ib_isert`, `nvme_rdma`, `nvmet_rdma`, `rpcrdma`, `xprtrdma`, `ib_srpt`) — saved per-group as `storageModules`. Discovery automatically enables `docaDriver.unloadStorageModules: true` when any are found. The generated NicClusterPolicy renders `UNLOAD_STORAGE_MODULES: "true"`.
- Third-party RDMA modules (everything else, e.g. `qedr`, `bnxt_re`, `rdma_rxe`) — saved per-group as `thirdPartyRDMAModules`. Discovery automatically enables `docaDriver.unloadThirdPartyRDMAModules: true` when any are found. The generated NicClusterPolicy renders `UNLOAD_THIRD_PARTY_RDMA_MODULES: "true"`. The driver container has 15 known third-party modules hardcoded.
Both flags are auto-enabled during discovery so the DOCA driver can unload blocking modules. A warning is emitted after discovery and generation reminding you to verify that no running workloads depend on these modules. When multiple node groups are merged, both module lists are aggregated as unions.
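The three-way classification can be sketched like this. It is an illustrative reimplementation of the rules above, not l8k's source; the storage-module list is taken verbatim from the text.

```python
# Known storage-over-RDMA modules, per the list above.
STORAGE_MODULES = {"ib_isert", "nvme_rdma", "nvmet_rdma",
                   "rpcrdma", "xprtrdma", "ib_srpt"}

def classify(dependents):
    """Split discovered non-MOFED dependents into the three categories
    and derive the two auto-enabled unload flags."""
    storage, third_party = [], []
    for mod in sorted(dependents):
        if mod.startswith("mlx5"):
            continue  # NVIDIA's own modules: silently filtered out
        elif mod in STORAGE_MODULES:
            storage.append(mod)
        else:
            third_party.append(mod)
    return {
        "storageModules": storage,
        "thirdPartyRDMAModules": third_party,
        # Flags auto-enabled when the respective list is non-empty
        "unloadStorageModules": bool(storage),
        "unloadThirdPartyRDMAModules": bool(third_party),
    }
```

Merging node groups would union the two lists, consistent with the aggregation rule above.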
After discovery, the config will contain the discovered modules and auto-enabled flags:
docaDriver:
enable: true
version: doca3.3.0-26.01-1.0.0.0-0
unloadStorageModules: true # auto-enabled by discovery
enableNFSRDMA: false
unloadThirdPartyRDMAModules: true # auto-enabled by discovery
clusterConfig:
- identifier: group-0
thirdPartyRDMAModules:
- rdma_rxe
storageModules:
- nvme_rdma
- ib_isert
The generated NicClusterPolicy ofedDriver section will include:
env:
- name: UNLOAD_STORAGE_MODULES
value: "true"
- name: UNLOAD_THIRD_PARTY_RDMA_MODULES
value: "true"
To disable automatic unloading, set the flags back to false in your config after discovery.
The nvIpam section supports two modes for subnet configuration:
Option 1: Manual subnet list — List each subnet explicitly. This takes precedence if the list is non-empty:
nvIpam:
poolName: nv-ipam-pool
subnets:
- subnet: 192.168.2.0/24
gateway: 192.168.2.1
- subnet: 192.168.3.0/24
gateway: 192.168.3.1
Option 2: Auto-generate subnets — When the subnets list is empty but startingSubnet, mask, and offset are all set, subnets are automatically generated. Each cluster config group gets its own unique, non-overlapping subnet slice. The gateway for each subnet is the first usable address (network + 1).
nvIpam:
poolName: nv-ipam-pool
startingSubnet: "192.168.2.0"
mask: 24
offset: 1
With the auto-generation example above, a cluster with 2 groups (4 east-west PFs each) would receive:
- Group 0: 192.168.2.0/24, 192.168.3.0/24, 192.168.4.0/24, 192.168.5.0/24
- Group 1: 192.168.6.0/24, 192.168.7.0/24, 192.168.8.0/24, 192.168.9.0/24
The offset parameter controls how many subnet blocks to skip between consecutive subnets (offset=1 is contiguous, offset=2 skips every other).
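The allocation arithmetic implied by the example can be reproduced in a few lines. This is a sketch of the behavior described above, not l8k's implementation; the group-major ordering and the helper name are assumptions.

```python
import ipaddress

def auto_subnets(starting_subnet, mask, offset, pfs_per_group, num_groups):
    """Carve one /mask subnet per east-west PF, per group, starting at
    starting_subnet; offset controls how many blocks to advance each step."""
    base = ipaddress.ip_network(f"{starting_subnet}/{mask}")
    step = offset * base.num_addresses  # addresses spanned per step
    addr = int(base.network_address)
    groups = []
    for _ in range(num_groups):
        group = []
        for _ in range(pfs_per_group):
            net = ipaddress.ip_network((addr, mask))
            # Gateway is the first usable address (network + 1)
            group.append({"subnet": str(net),
                          "gateway": str(net.network_address + 1)})
            addr += step
        groups.append(group)
    return groups
```

Running it with the values from the example (`192.168.2.0`, mask 24, offset 1, 2 groups of 4 PFs) reproduces the group-0 / group-1 split shown above.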
Example of the configuration file discovered from the cluster:
networkOperator:
version: v26.1.0
componentVersion: network-operator-v26.1.0
repository: nvcr.io/nvidia/mellanox
namespace: nvidia-network-operator
imagePullSecrets: []
docaDriver:
enable: true
version: doca3.2.0-25.10-1.2.8.0-2
unloadStorageModules: false
enableNFSRDMA: false
unloadThirdPartyRDMAModules: false
nvIpam:
poolName: nv-ipam-pool
subnets:
- subnet: 192.168.2.0/24
gateway: 192.168.2.1
- subnet: 192.168.3.0/24
gateway: 192.168.3.1
- subnet: 192.168.4.0/24
gateway: 192.168.4.1
- subnet: 192.168.5.0/24
gateway: 192.168.5.1
- subnet: 192.168.6.0/24
gateway: 192.168.6.1
- subnet: 192.168.7.0/24
gateway: 192.168.7.1
- subnet: 192.168.8.0/24
gateway: 192.168.8.1
- subnet: 192.168.9.0/24
gateway: 192.168.9.1
- subnet: 192.168.10.0/24
gateway: 192.168.10.1
- subnet: 192.168.11.0/24
gateway: 192.168.11.1
- subnet: 192.168.12.0/24
gateway: 192.168.12.1
- subnet: 192.168.13.0/24
gateway: 192.168.13.1
- subnet: 192.168.14.0/24
gateway: 192.168.14.1
- subnet: 192.168.15.0/24
gateway: 192.168.15.1
- subnet: 192.168.16.0/24
gateway: 192.168.16.1
- subnet: 192.168.17.0/24
gateway: 192.168.17.1
- subnet: 192.168.18.0/24
gateway: 192.168.18.1
- subnet: 192.168.19.0/24
gateway: 192.168.19.1
sriov:
ethernetMtu: 9000
infinibandMtu: 4000
numVfs: 8
priority: 90
resourceName: sriov_resource
networkName: sriov-network
hostdev:
resourceName: hostdev-resource
networkName: hostdev-network
rdmaShared:
resourceName: rdma_shared_resource
hcaMax: 63
ipoib:
networkName: ipoib-network
macvlan:
networkName: macvlan-network
nicConfigurationOperator:
deployNicInterfaceNameTemplate: true # Enable NIC rename when needed (see NIC Interface Name Templates section)
rdmaPrefix: "rdma_r%rail%" # RDMA device name template (%rail% substituted per rail)
netdevPrefix: "eth_r%rail%" # Network interface name template (%rail% substituted per rail)
spectrumX:
nicType: "1023"
overlay: none
rdmaPrefix: roce_p%plane%_r%rail% # Spectrum-X uses its own prefixes (with %plane%)
netdevPrefix: eth_p%plane%_r%rail%
clusterConfig:
- identifier: group-0
capabilities:
nodes:
sriov: true
rdma: true
ib: true
pfs:
- deviceID: a2dc
rdmaDevice: ""
pciAddress: "0000:19:00.0"
networkInterface: ""
traffic: east-west
rail: 0
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:2a:00.0
networkInterface: ""
traffic: east-west
rail: 1
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:3b:00.0
networkInterface: ""
traffic: east-west
rail: 2
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:4c:00.0
networkInterface: ""
traffic: east-west
rail: 3
- deviceID: 101f
rdmaDevice: ""
pciAddress: 0000:5a:00.0
networkInterface: ""
traffic: east-west
rail: 4
- deviceID: 101f
rdmaDevice: ""
pciAddress: 0000:5a:00.1
networkInterface: ""
traffic: east-west
rail: 5
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:9b:00.0
networkInterface: ""
traffic: east-west
rail: 6
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:ab:00.0
networkInterface: ""
traffic: east-west
rail: 7
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:c1:00.0
networkInterface: ""
traffic: east-west
rail: 8
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:cb:00.0
networkInterface: ""
traffic: east-west
rail: 9
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:d8:00.0
networkInterface: ""
traffic: east-west
rail: 10
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:d8:00.1
networkInterface: ""
traffic: east-west
rail: 11
workerNodes:
- pdx-g22r13-2894-lh2-w01
- pdx-g24r13-2894-lh2-w02
nodeSelector:
nvidia.com/gpu.machine: ThinkSystem-SR680a-V3
- identifier: group-1
capabilities:
nodes:
sriov: true
rdma: true
ib: true
pfs:
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:1a:00.0
networkInterface: ""
traffic: east-west
rail: 0
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:3c:00.0
networkInterface: ""
traffic: east-west
rail: 1
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:4d:00.0
networkInterface: ""
traffic: east-west
rail: 2
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:5e:00.0
networkInterface: ""
traffic: east-west
rail: 3
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:9c:00.0
networkInterface: ""
traffic: east-west
rail: 4
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:9d:00.0
networkInterface: ""
traffic: east-west
rail: 5
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:9d:00.1
networkInterface: ""
traffic: east-west
rail: 6
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:bc:00.0
networkInterface: ""
traffic: east-west
rail: 7
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:cc:00.0
networkInterface: ""
traffic: east-west
rail: 8
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:dc:00.0
networkInterface: ""
traffic: east-west
rail: 9
workerNodes:
- pdx-g22r23-2894-dh2-w03
- pdx-g24r23-2894-dh2-w04
nodeSelector:
nvidia.com/gpu.machine: PowerEdge-XE9680
- identifier: group-2
capabilities:
nodes:
sriov: true
rdma: true
ib: true
pfs:
- deviceID: a2dc
rdmaDevice: ""
pciAddress: "0000:09:00.0"
networkInterface: ""
traffic: east-west
rail: 0
- deviceID: a2dc
rdmaDevice: ""
pciAddress: "0000:23:00.0"
networkInterface: ""
traffic: east-west
rail: 1
- deviceID: a2dc
rdmaDevice: ""
pciAddress: "0000:35:00.0"
networkInterface: ""
traffic: east-west
rail: 2
- deviceID: a2dc
rdmaDevice: ""
pciAddress: "0000:35:00.1"
networkInterface: ""
traffic: east-west
rail: 3
- deviceID: a2dc
rdmaDevice: ""
pciAddress: "0000:53:00.0"
networkInterface: ""
traffic: east-west
rail: 4
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:69:00.0
networkInterface: ""
traffic: east-west
rail: 5
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:8f:00.0
networkInterface: ""
traffic: east-west
rail: 6
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:9c:00.0
networkInterface: ""
traffic: east-west
rail: 7
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:cd:00.0
networkInterface: ""
traffic: east-west
rail: 8
- deviceID: a2dc
rdmaDevice: ""
pciAddress: 0000:f1:00.0
networkInterface: ""
traffic: east-west
rail: 9
workerNodes:
- pdx-g22r31-2894-ch2-w05
- pdx-g24r31-2894-ch2-w06
nodeSelector:
nvidia.com/gpu.machine: UCSC-885A-M8-H22
During cluster discovery, the tool automatically identifies BlueField DPU devices (as opposed to SuperNICs or ConnectX NICs) by matching each device's partNumber against a known list of DPU product codes in pkg/networkoperatorplugin/ns-product-ids. Devices matching a DPU product code are classified as north-south traffic (management/external), while all other devices are classified as east-west traffic (GPU interconnect).
North-south PFs are included in the saved cluster configuration for visibility, but are automatically filtered out during template rendering so that only east-west PFs appear in the generated manifests. Each east-west PF is assigned a sequential rail number (rail-0, rail-1, rail-2, ...) used for naming resources like SriovNetworkNodePolicy and IPPool entries.
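The filter-and-number step can be sketched as follows. This is illustrative, not l8k's actual code; each PF is modeled as a plain dict with a `traffic` field.

```python
def east_west_rails(pfs):
    """Drop north-south PFs and assign sequential rail numbers
    to the remaining east-west PFs, in discovery order."""
    rails = []
    for pf in pfs:
        if pf["traffic"] != "east-west":
            continue  # north-south (DPU) PFs stay in the config, not manifests
        rails.append({**pf, "rail": len(rails)})
    return rails
```

Note the rail numbers stay contiguous even when a north-south PF sits between two east-west PFs, matching the sequential rail-0, rail-1, ... assignment described above.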
Example of mixed traffic types in the config:
clusterConfig:
- identifier: group-0
pfs:
- deviceID: a2dc
pciAddress: "0000:19:00.0"
traffic: east-west # SuperNIC — included in manifests
rail: 0
- deviceID: a2dc
pciAddress: "0000:2a:00.0"
traffic: east-west
rail: 1
- deviceID: a2dc
pciAddress: "0000:3b:00.0"
traffic: north-south # BlueField DPU — excluded from manifests
During discovery, each node group's machineType and productType are populated from GPU operator node labels (nvidia.com/gpu.machine and nvidia.com/gpu.product). When these labels are absent — for example, when the GPU operator is not deployed — the tool falls back to probing hardware directly from a nic-configuration-daemon pod on one of the group's nodes:
- Machine type: read from `/sys/class/dmi/id/product_name`
- GPU product type: parsed from `nvidia-smi -q` output (the first `Product Name` field)
Values are sanitized to match the GPU operator label format (spaces replaced with dashes). If either probe fails (e.g., nvidia-smi not installed, DMI not readable), the corresponding field is left empty and discovery continues without error.
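The sanitization rule is small enough to illustrate directly. The helper below is hypothetical (l8k's internal function name is not documented here); it only implements the stated rule of replacing spaces with dashes.

```python
def sanitize_label_value(raw: str) -> str:
    """Match the GPU operator label format: spaces become dashes."""
    return raw.strip().replace(" ", "-")
```

For example, a DMI product name of "ThinkSystem SR680a V3" becomes the label value "ThinkSystem-SR680a-V3".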
Example of discovered hardware types in the config:
clusterConfig:
- identifier: group-0
machineType: ThinkSystem-SR680a-V3
productType: NVIDIA-H100-NVL
workerNodes:
- node-1
- node-2The nicConfigurationOperator.deployNicInterfaceNameTemplate setting controls whether a NicInterfaceNameTemplate CR is deployed to rename NIC interfaces to predictable, rail-based names (e.g., eth_r0, eth_r1). When set to true, the tool treats it as "enable when needed" rather than "always enable". The NicInterfaceNameTemplate CR and associated nicConfigurationOperator section in NicClusterPolicy are only deployed when one of the following conditions is met:
- **Merged groups with PCI address conflicts** — When multiple node groups share the same GPU product type and are merged into a single group, but the same PCI address appears at different rail positions across groups. In this case PCI addresses alone cannot identify the correct rail, so interface name templates are used instead.
- **rdma_shared deployment with empty network interface names** — When the deployment type is `rdma_shared` (macvlan-rdma-shared or ipoib-rdma-shared profiles) and PFs have empty `networkInterface` fields. The `rdmaSharedDevicePlugin` uses `ifNames` selectors that require interface names, so NicInterfaceNameTemplate must be enabled to provide them. This typically happens when discovery finds multiple nodes per group and omits device names for safety.
When neither condition holds, name templates are disabled and the device plugin uses PCI addresses directly, avoiding the overhead of deploying the NIC configuration operator.
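The two "enable when needed" conditions can be expressed as a single predicate. This is a hypothetical sketch, not l8k's source; here each PF dict carries a `networkInterface` field, and the merge-conflict check is assumed to have been precomputed by the group-merging step.

```python
def name_templates_needed(groups, deployment_type, merged_conflict=False):
    """Return True when the NicInterfaceNameTemplate CR should be deployed."""
    # Condition 1: merged groups where the same PCI address maps to
    # different rail positions (flag assumed precomputed by the merge step).
    if merged_conflict:
        return True
    # Condition 2: rdma_shared deployment with empty interface names,
    # since ifNames selectors require real interface names.
    if deployment_type == "rdma_shared":
        for group in groups:
            if any(not pf.get("networkInterface") for pf in group["pfs"]):
                return True
    return False
```

When the predicate is false, the device plugin can select devices by PCI address directly and the NIC configuration operator is not deployed.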
By default, l8k generates example workload DaemonSets (file pattern: *-example-daemonset.yaml) for each profile. To use your own workload manifest instead, specify it in the config or via CLI flag:
workload:
manifest: /path/to/my-workload.yaml
Or via CLI:
l8k generate --user-config ./config.yaml \
--workload-manifest /path/to/my-workload.yaml \
--fabric ethernet --deployment-type sriov \
--save-deployment-files ./deployments
You can run the l8k tool as a docker container:
docker run --net=host \
  -v ~/launch-kubernetes/user-prompt:/user-prompt \
  -v ~/remote-cluster/:/remote-cluster \
  -v /tmp:/output \
  nvcr.io/nvidia/cloud-native/k8s-launch-kit:v26.1.0 \
  --discover-cluster-config --kubeconfig /remote-cluster/kubeconf.yaml \
  --save-cluster-config /output/config.yaml --log-level debug \
  --save-deployment-files /output --fabric infiniband \
  --deployment-type rdma_shared --multirail
Don't forget to enable --net=host and mount the necessary directories for input and output files with -v.
make build # Build for current platform
make build-all # Build for all platforms
make clean # Clean build artifacts
make test # Run tests
make coverage # Run tests with coverage
make lint # Run linter
make lint-check # Install and run linter
make docker-build # Build Docker image
make docker-run # Run Docker container