DeepMind’s new Genie 3 world model meets Microsoft’s edge-to-cloud AI Foundry platform—the next frontier in interactive intelligence.

In August 2025, Google DeepMind unveiled Genie 3, a general-purpose world model capable of creating dynamic, interactive environments in real time at 720p and 24 fps, based on nothing but a text prompt. These virtual worlds remain stable and navigable for minutes, enabling agents and users to explore, train, and interact in realistic simulations such as warehouses, ski slopes, or cityscapes.
With over 11 billion parameters, Genie 3 combines a spatiotemporal video tokenizer, an autoregressive dynamics model, and latent action models. It was trained entirely on unlabeled internet videos, requiring no explicit action data, which makes it the most flexible foundation world model to date.
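To make the three components concrete, here is a deliberately tiny, toy sketch of how a tokenizer, latent action model, and autoregressive dynamics model fit together. Genie 3's actual architecture, dimensions, and training procedure are not public; everything below (module names, sizes, the linear layers) is illustrative only.

```python
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 64, 8

class ToyWorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        # "Spatiotemporal tokenizer": compress a 3x64x64 RGB frame to a latent vector.
        self.tokenizer = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, LATENT_DIM))
        # "Latent action model": infer the action linking two consecutive latents
        # (this is how training can proceed without explicit action labels).
        self.action_model = nn.Linear(2 * LATENT_DIM, ACTION_DIM)
        # "Autoregressive dynamics model": predict the next latent from latent + action.
        self.dynamics = nn.Linear(LATENT_DIM + ACTION_DIM, LATENT_DIM)

    def infer_action(self, frame_t, frame_t1):
        z_t, z_t1 = self.tokenizer(frame_t), self.tokenizer(frame_t1)
        return self.action_model(torch.cat([z_t, z_t1], dim=-1))

    def step(self, frame, action):
        z = self.tokenizer(frame)
        return self.dynamics(torch.cat([z, action], dim=-1))

model = ToyWorldModel()
frame_t = torch.randn(1, 3, 64, 64)
frame_t1 = torch.randn(1, 3, 64, 64)
action = model.infer_action(frame_t, frame_t1)   # action inferred from unlabeled video
next_latent = model.step(frame_t, action)        # autoregressive next-frame prediction
print(action.shape, next_latent.shape)
```

The key design idea this toy mirrors is that actions are *inferred* from pairs of frames rather than supplied as labels, which is what lets such models learn from raw internet video.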
Experts like Prof. Subramanian Ramamoorthy (University of Edinburgh) call Genie 3 "extremely important for robot development" since it can help robots anticipate the consequences of their actions. Andrew Rogoyski (University of Surrey) highlights how world models let AI "embody itself virtually to explore consequences", a key step toward more capable intelligence.
Azure AI Foundry: Platform Meets Edge Intelligence
Azure AI Foundry is Microsoft's unified platform for building, fine-tuning, deploying, and operating intelligent agents—from cloud APIs to on-device edge inference. It comprises three layers:
- Azure AI Foundry (Cloud): Offers models such as GPT, vision, and now world models like Genie 3.
- Foundry Local: Brings open-weight models to edge devices for offline, low-latency inference.
- Windows AI Foundry: Integrates these capabilities into Windows 11 as a secure AI framework for developers.
Teams can access Genie 3's weights, fine-tune with LoRA/QLoRA, inspect attention patterns, and export to ONNX or Triton formats for deployment—making AI transparent, adaptable, and scalable.
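For readers unfamiliar with LoRA, the idea is simple: freeze the pretrained weight matrix W and learn a low-rank update BA, so the effective weight becomes W + (alpha/r)·BA with only the small factors A and B trainable. Below is a minimal, self-contained sketch in plain PyTorch; in practice you would use a library such as Hugging Face's `peft`, and the layer sizes here are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained Linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        self.scale = alpha / r
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init => no change at start

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(128, 64), r=4)
out = layer(torch.randn(2, 128))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # only A (4x128) and B (64x4) train: 768 params vs 8,256 frozen
```

Because B is zero-initialized, the wrapped layer behaves exactly like the frozen base model at the start of fine-tuning, which is what makes LoRA stable and cheap compared with full fine-tuning.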
How It Works: From Prompt to Virtual World
- Text prompt: e.g., “a bustling warehouse with forklifts and pallets”.
- Genie 3 generates video frames at 24 fps in 720p resolution, while preserving temporal and spatial consistency.
- Users or agents navigate within the simulation, where the model responds to actions (e.g., walking, opening doors).
- Deploy across:
  - Cloud: Hydra GPU clusters via Azure AI Foundry.
  - Edge: On-device inference with Foundry Local, optimized through distillation, quantization, sparsity, and trimmed context windows.
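Of the edge optimizations listed above, quantization is the easiest to demonstrate. The sketch below uses PyTorch's post-training dynamic quantization, which stores Linear weights in int8 and dequantizes on the fly at inference time; the toy model is a stand-in, and Foundry Local's actual optimization pipeline is not public.

```python
import torch
import torch.nn as nn

# A stand-in model: two Linear layers, the module type that dynamic
# quantization targets. Real world-model backbones are far larger.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))

# Post-training dynamic quantization: weights stored as int8,
# activations quantized dynamically per batch on CPU.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
out = quantized(x)
print(out.shape)  # same interface, smaller weights, faster CPU inference
```

Distillation (training a small student to mimic a large teacher) and structured sparsity would be separate passes applied before or alongside this step.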
Competition in Virtual World Models
- OpenAI: Working on video-based world models, but hasn’t publicly released a general model like Genie 3.
- Meta AI: Developing perceptual agents, but focused more on embodied real-world data.
- Anthropic, Google Research: Exploring related video or simulator environments, but nothing released yet matches Genie 3's real-time fidelity and modality range.
Why It’s Useful
- Robotics training: AI agents can be trained in realistic simulations before deployment in physical spaces.
- Gaming & VR: Automatic environment generation from text prompts opens new possibilities for fast world-building and interactive storytelling.
- Education & Simulation: Training scenarios in constrained simulations for medicine, logistics, public safety.
- Research in AGI: Enables testing agents in complex, evolving systems beyond language-only tasks.
Pricing & Availability
- Genie 3 access is available via Azure AI Foundry, where default pricing is consumption-based, mostly billed per token, frame, or compute resource usage.
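For teams budgeting a proof of concept, per-frame billing is easy to estimate from the published 24 fps rate. The helper below is a back-of-envelope sketch: the rates are made-up placeholders, since Microsoft has not published Genie 3 pricing, so substitute real numbers from the Azure pricing page.

```python
# NOTE: both rates are hypothetical placeholders for illustration only.
HYPOTHETICAL_RATE_PER_FRAME_USD = 0.0004
HYPOTHETICAL_RATE_PER_GPU_HOUR_USD = 3.50

def estimate_session_cost(minutes: float, fps: int = 24,
                          gpu_hours: float = 0.0) -> float:
    """Estimate the cost of one interactive session under per-frame billing."""
    frames = minutes * 60 * fps
    return (frames * HYPOTHETICAL_RATE_PER_FRAME_USD
            + gpu_hours * HYPOTHETICAL_RATE_PER_GPU_HOUR_USD)

# A 5-minute session at 24 fps is 7,200 frames:
print(round(estimate_session_cost(5), 2))  # 2.88 at the placeholder rate
```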
- Foundry Platform is free to explore, but actual usage incurs standard Azure compute charges.
- Enterprise and research users should contact Microsoft for custom pricing based on scale, GPU needs, and volumes.
Who Built It
- DeepMind team: Jake Bruce, Michael Dennis, Aditi Mavalankar, Jeff Clune, Warren Singh, Tim Rocktäschel, and others.
- Azure Foundry team: Integrates open-source models on cloud and edge and builds governance, deployment pipelines, benchmarks, and model catalogs.
What Experts Are Saying
- Satya Nadella (Microsoft): “Edge AI is now reality—Foundry Local brings power to the user.”
- Sam Altman (OpenAI): “The future of AI is embodied. Genie 3 brings us closer.”
- Prof. Ramamoorthy: “World models are critical for flexible robot decision-making.”
- Andrew Rogoyski: “Allowing AI to be virtual-embodied means exploring actions—a huge capability boost.”
- Technical reviewers on GitHub praise Genie 3 for combining LLM intelligence with spatial-temporal reasoning in a deployable form.
Both researchers emphasize how Genie 3's world modeling may help deliver real-world intelligent systems capable of anticipating outcomes and adapting dynamically.
Looking Ahead: The Future of World Models
- Genie 3 is just the beginning: Expect future versions with higher resolution (1080p+), longer stability horizons, and 3D-first prompt generation.
- AGI agents may be trained in simulated yet richly diverse environments, moving from language-first to multi-modal comprehension and action.
- Edge-first AI: With Foundry Local, devices from robots to smart controllers may run interactive environments offline, bringing AI simulation to homes and factories.
- Enterprise acceleration: Businesses across logistics, training, simulation, and education can build custom virtual environments using Genie 3 and deploy them rapidly via Azure Foundry.
Final Thoughts
Genie 3 and Azure AI Foundry offer a new chapter in AI’s evolution—where worlds are created from prompts, agents learn in simulation, and models are as transparent as they are powerful. Companies that embrace this shift can unlock robotic autonomy, immersive learning, and safe virtual experimentation in ways that were previously impossible.
It’s a major leap toward a future where AI doesn’t just compute—it interacts, learns, and evolves in environments of its own creation.
Suggested Internal Links
- How AI is transforming robotics and virtual agents in 2025
- DeepMind’s role in the AGI race
- Edge AI: Why Foundry Local may change how your devices think
- Azure AI Foundry: Platform deep dive & enterprise benefits
Disclaimer: All information presented in this article has been sourced from verified and reliable publications. Images used are AI-generated for illustrative purposes only.