NVIDIA and Google Cut AI Inference Costs
Google and NVIDIA presented their hardware plans at the Google Cloud Next conference. The focus is on lowering the expenses of AI inference for large operations.
A5X Instances Boost Performance and Efficiency
The companies introduced A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 rack-scale systems. Because the hardware and software are designed together, the system cuts inference cost per token by up to ten times compared to earlier generations. At the same time, it raises token throughput per megawatt by ten times.
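To make the "ten times" figures concrete, the arithmetic can be sketched as follows. The baseline cost and baseline throughput are hypothetical placeholders, not figures from Google or NVIDIA. Only the factor of ten comes from the announcement, and for cost it is a best-case ("up to") figure.

```python
# Illustrative arithmetic only. Baseline values are hypothetical;
# the factor of 10 is the best-case figure quoted in the announcement.

IMPROVEMENT_FACTOR = 10.0


def improved_cost_per_million_tokens(baseline_cost: float,
                                     factor: float = IMPROVEMENT_FACTOR) -> float:
    """Return the cost per million tokens after dividing by the factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_cost / factor


def improved_tokens_per_megawatt(baseline_tokens: float,
                                 factor: float = IMPROVEMENT_FACTOR) -> float:
    """Return the token throughput per megawatt after multiplying by the factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_tokens * factor


if __name__ == "__main__":
    baseline_cost = 2.00         # hypothetical: dollars per million tokens
    baseline_throughput = 1.0e6  # hypothetical: tokens per second per megawatt
    print("cost per million tokens:", improved_cost_per_million_tokens(baseline_cost))
    print("tokens/s per megawatt:", improved_tokens_per_megawatt(baseline_throughput))
```

Since the cost figure is an upper bound, actual savings for a given workload could be smaller.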
Linking thousands of processors calls for huge bandwidth. Without it, delays slow processing. A5X instances solve this by combining NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology.
The system scales to 80,000 NVIDIA Rubin GPUs in a single-site cluster and to 960,000 GPUs in a multisite setup. At such sizes, workload management gets complex. Data must be routed efficiently across nearly a million parallel processors so that compute resources stay busy rather than sitting idle.
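A rough sense of what these GPU counts mean in racks and sites can be worked out as below. This is a sketch under one assumption: that each NVL72 rack holds 72 GPUs, as the product name suggests. The article does not state rack or site counts, so the results are back-of-the-envelope estimates.

```python
# Back-of-the-envelope scale estimate. GPUS_PER_RACK is an assumption
# inferred from the "NVL72" name; the GPU totals come from the article.

GPUS_PER_RACK = 72
SINGLE_SITE_GPUS = 80_000
MULTISITE_GPUS = 960_000


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer division rounded up, without floating-point error."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return -(-numerator // denominator)


def racks_needed(gpus: int, gpus_per_rack: int = GPUS_PER_RACK) -> int:
    """Minimum number of racks needed to hold the given number of GPUs."""
    return ceil_div(gpus, gpus_per_rack)


def sites_needed(total_gpus: int, gpus_per_site: int = SINGLE_SITE_GPUS) -> int:
    """Minimum number of maximum-size sites needed for the given total."""
    return ceil_div(total_gpus, gpus_per_site)


if __name__ == "__main__":
    print("racks per full site:", racks_needed(SINGLE_SITE_GPUS))   # 1112
    print("racks at multisite scale:", racks_needed(MULTISITE_GPUS))  # 13334
    print("full-size sites at multisite scale:", sites_needed(MULTISITE_GPUS))  # 12
```

Under that assumption, the multisite figure corresponds to twelve maximum-size sites, which helps explain why networking between racks and sites is a central part of the design.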
Mark Lohmeyer, VP and GM of AI and Computing Infrastructure at Google Cloud, said: "At Google Cloud, we believe the next decade of AI will be shaped by customers' ability to run their most demanding workloads on a truly integrated, AI‑optimised infrastructure stack.
"By combining Google Cloud's scalable infrastructure and managed AI services with NVIDIA's industry‑leading platforms, systems and software, we're giving customers flexibility to train, tune, and serve everything from frontier and open models to agentic and physical AI workloads, while optimising for performance, cost, and sustainability."
Confidential Computing Secures Sensitive Data
Data control poses big hurdles for businesses, especially in regulated fields like finance and healthcare. Data sovereignty rules and risks to private information often halt AI projects.
Gemini models on NVIDIA Blackwell and Blackwell Ultra GPUs are now in preview on Google Distributed Cloud. Customers can keep these models in their own environments, next to their sensitive data.
NVIDIA Confidential Computing provides hardware security. Models train in a shielded space. Prompts and fine-tuning data stay encrypted. No one, not even cloud operators, can see or change the data.
For shared public clouds, Confidential G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs offer the same safeguards in preview. Regulated sectors gain fast hardware while meeting privacy requirements. This is the first confidential computing offering for NVIDIA Blackwell GPUs in the cloud.
Agentic AI Tools and Managed Training
Building agentic systems means linking large language models to APIs, keeping vector databases in sync, and curbing errors such as hallucinations.
NVIDIA Nemotron 3 Super joins the Gemini Enterprise Agent Platform. Developers customize and deploy reasoning and multimodal models for agent tasks. The NVIDIA setup on Google Cloud works with Gemini and Gemma models. It lets teams build systems that reason, plan, and execute.
Large-scale training adds burdens like cluster management and handling failures in reinforcement learning.
Google Cloud and NVIDIA launched Managed Training Clusters on the Gemini Enterprise Agent Platform. It features a managed reinforcement learning API using NVIDIA NeMo RL. Automation handles cluster size, recovery from failures, and job runs. Teams focus on model performance, not infrastructure details.
CrowdStrike uses NVIDIA NeMo libraries like NeMo Data Designer and NeMo Megatron Bridge. They create synthetic data and tune models for cybersecurity. On Managed Training Clusters with Blackwell GPUs, this speeds threat detection and response.
Manufacturing and Industry Applications
Machine learning in heavy industry needs precise simulations, vast compute, and data format standards. NVIDIA AI infrastructure and physical AI libraries now run on Google Cloud. Firms simulate and automate factory processes.
Providers like Cadence and Siemens offer solutions on Google Cloud, accelerated by NVIDIA hardware. These support engineering for machinery, aerospace, and self-driving vehicles.
Legacy product lifecycle systems complicate data handling. NVIDIA Omniverse libraries and the open-source NVIDIA Isaac Sim, both available on Google Cloud Marketplace, help address this. Developers can build accurate digital twins and robot training simulations before real-world tests.
NVIDIA NIM microservices, including the Cosmos Reason 2 model, can be deployed to Google Vertex AI and Google Kubernetes Engine. They let vision agents and robots perceive and move in physical spaces. Platforms shift from static design to active industrial digital twins.
Early Users and Growing Ecosystem
Options range from full NVL72 racks to fractional G4 VMs with one-eighth GPU shares, so customers can match capacity to workloads ranging from mixture-of-experts models to data processing.
Thinking Machines Lab uses A4X Max VMs for Tinker API training. OpenAI runs large-scale inference on NVIDIA GB300 and GB200 NVL72 systems on Google Cloud for workloads such as ChatGPT. Snap moved data pipelines to GPU-accelerated Spark to lower A/B testing costs. Schrödinger cuts drug discovery simulations from weeks to hours with NVIDIA compute on Google Cloud.
The developer group grew fast. Over 90,000 joined the NVIDIA and Google Cloud community in one year.
Startups such as CodeRabbit and Factory use Nemotron models on Google Cloud for code reviews and software agents. Aible, Mantis AI, Photoroom, and Baseten create data, video, and image solutions.
Together, NVIDIA and Google Cloud are building the compute foundation needed to move agent experiments and simulations into production, in settings that range from securing fleets to running factories.

