While the promise of generative AI dominates much of today's technology landscape, the large language models (LLMs) that underpin these systems continue to grow in size. As a result, building cost-effective and reliable LLM services requires significant computing power, energy resources, and specialized operational skills. In practice, these challenges put the benefits of customized, ready-to-deploy, and more security-conscious AI out of reach for most organizations.

Red Hat aims to address these challenges by making generative AI accessible to more organizations through the open innovation of vLLM. Originally developed at the University of California, Berkeley, vLLM is a community-driven, open-source project for open model serving, meaning the process by which generative AI models run inference to answer requests and solve problems. It supports all key model families, incorporates advanced research in inference acceleration, and runs on diverse hardware backends, including AMD GPUs, AWS Neuron, Google TPUs, Intel Gaudi, NVIDIA GPUs, and x86 CPUs. Neural Magic's leadership of the vLLM project, combined with Red Hat's robust portfolio of hybrid cloud AI technologies, will give organizations an open path to build AI strategies that meet their unique needs, wherever their data resides.