When we launched llm-d in May 2025, we set out to bridge the enormous capability gap between AI experimentation and large-scale, mission-critical production inference. By contributing llm-d to the CNCF, we are advancing the goal of a multi-vendor coalition—including CoreWeave, IBM, Google, and NVIDIA—to establish the open standard for distributed inference.
Inference drives the agentic era
As we move further into an agentic future, the AI inference that powers enterprise agents across industries is poised for widespread adoption. It will be critical that the cost and complexity of inference do not outweigh the business value of the agents themselves. However, inference can be extremely expensive, consuming large amounts of specialized accelerator capacity, and these costs escalate further at scale. llm-d's advanced capabilities directly address this, meeting enterprise Service Level Objectives (SLOs) while maximizing infrastructure efficiency. Furthermore, organizations need the flexibility to deploy inference wherever it makes sense—data center, cloud, or edge—on the hardware of their choice. This flexibility is only possible if the underlying ecosystem is built on open source and open standards.
Closing the gap in the cloud-native environment
Although Kubernetes is the industry standard for orchestration, it wasn't originally designed for the unique, stateful demands of large language model (LLM) inference. In a traditional microservice, a request is a request: each replica can handle it equally well. In generative AI, the cost of a request varies enormously depending on the length of the input and output tokens, the size and architecture of the model, cache locality, and whether the request is in the prefill (compute-bound) or decode (memory-bound) phase.
Standard service routing is blind to these dynamics, leading to inefficient allocation and unpredictable latency. This is where llm-d bridges the gap. It functions as a specialized data plane orchestration layer between high-level control planes like KServe and low-level engines like vLLM. By leveraging native Kubernetes building blocks such as Gateway API and LeaderWorkerSet (LWS), it transforms complex distributed inference into a manageable and observable cloud-native workload.
Strengthening the ecosystem through contribution
By making llm-d available to the CNCF, we are establishing well-defined pathways—proven and replicable designs that transform fragmented AI components into modular and interoperable microservices. This contribution is more than just a single project; it's about enriching the entire cloud-native landscape so that inference becomes an integral part of the same environment as traditional container-based applications.
A central part of this work is the endpoint picker (EPP). llm-d serves as a key implementation of the Gateway API Inference Extension (GAIE), and the EPP enables programmable, inference-aware routing. This means the system makes routing decisions based on the actual state of each engine, optimizing for KV cache hit rates and accelerator utilization. This is a critical requirement for sustaining performance under strict service-level objectives.
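To make the idea of inference-aware routing concrete, here is a minimal sketch of an endpoint-scoring function in the spirit of an endpoint picker. It is an illustration only, not llm-d's actual algorithm: the `Endpoint` fields, weights, and scoring formula are assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    queue_depth: int           # requests already waiting in the engine
    cached_prefix_tokens: int  # tokens of this prompt already in the KV cache

def pick_endpoint(endpoints, prompt_tokens, cache_weight=2.0, queue_weight=1.0):
    """Score each replica: reward KV cache reuse, penalize queueing."""
    def score(ep):
        hit_ratio = min(ep.cached_prefix_tokens, prompt_tokens) / max(prompt_tokens, 1)
        return cache_weight * hit_ratio - queue_weight * ep.queue_depth
    return max(endpoints, key=score)

replicas = [
    Endpoint("pod-a", queue_depth=4, cached_prefix_tokens=0),
    Endpoint("pod-b", queue_depth=1, cached_prefix_tokens=900),
]
best = pick_endpoint(replicas, prompt_tokens=1000)
print(best.name)  # pod-b: shorter queue and a warm prefix cache
```

A plain round-robin balancer would treat both replicas as interchangeable; a scorer like this one sends the request where most of its prompt is already cached, avoiding redundant prefill work.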
llm-d complements and extends the existing set of solutions within the CNCF:
● Kubernetes: Provides the key infrastructure platform for AI workloads.
● Gateway API: Drives upstream alignment for AI-specific routing, ensuring that traffic management remains an open core component.
● KServe: Acts as the high-level control plane that integrates with llm-d to support advanced features such as disaggregated serving and prefix caching.
● LeaderWorkerSet: Leverages native Kubernetes building blocks to orchestrate complex multi-node replication and expert parallelism, transforming engines like vLLM into manageable cloud-native workloads.
● Prometheus & Grafana: Exports specialized metrics such as time to first token (TTFT) to bring enterprise observability to generative AI.
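The metrics above can be derived from per-token timestamps. The sketch below shows how time to first token (TTFT) and mean inter-token latency might be computed for a single request; the function name and return keys are illustrative, not llm-d's actual exporter.

```python
def latency_metrics(request_start, token_times):
    """Derive generation latency metrics from per-token arrival timestamps.

    request_start: time the request was received (seconds)
    token_times:   monotonically increasing arrival time of each output token
    """
    if not token_times:
        raise ValueError("no tokens generated")
    ttft = token_times[0] - request_start  # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0  # mean inter-token latency
    return {"ttft_seconds": ttft, "itl_seconds": itl}

m = latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
print(m)  # TTFT of 0.25 s, mean inter-token latency of ~0.05 s
```

TTFT captures the user-visible prefill delay, while inter-token latency tracks decode throughput—two distinct SLO signals that generic HTTP request-duration metrics collapse into one number.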
Building the future of inference, together
Collaboration has been at the heart of llm-d since its inception. When we announced llm-d last year at the Red Hat Summit, the combined efforts of the project's founding contributors, industry leaders, and academics were a source of pride for Red Hat, not only for launching llm-d but also for establishing a collaborative and future-proof foundation. In the 10 months since then, llm-d has been adopted for both private enterprise AI MaaS (Model-as-a-Service) and large-scale AI initiatives. More importantly, the project's open roots continue to deepen with a growing ecosystem of contributors and partners. Developers and enterprises are placing their trust in llm-d, and making the project available to the CNCF will support and sustain an open future. The path to successful open-source AI innovation is long, but together we are building the infrastructure to get there.
By Brian Stevens, Senior Vice President and Chief Technology Officer (CTO) for AI, Red Hat
