Something I find interesting about generative AI is how it adds a huge layer of flexibility, at the cost of enormous computation, while a very narrow set of constraints (a traditional program) is comparatively incredibly efficient.
If someone spent a ton of time building out something simple in Unity, they could get the same thing running with a small fraction of the computation; but the generative approach has seemingly infinite flexibility based on so little, and that's just incredible.
The reason I mention it is because I'm interested in where we end up using these. Will traditional programming be used for most "production" workloads with gen AI being used to aid in the prototyping and development of those traditional programs, or will we get to the point where our gen AI is the primary driver of software?
I assume that concrete code will always be faster and the best way to get deterministic results, but I really have no idea how to conceptualize what the future looks like now.
But modern search is hampered by people responding to algorithmic indexes. Algorithms responding to metadata without directly evaluating content enabled a world of SEO and low quality websites suddenly being discoverable as long as they narrow their focus enough.
So longer term it’s going to be an arms race between the output of Generative AI and people trying to keep updating their models. In 20 years people will get much better at using these tools, but the tools themselves may be less useful. I wouldn’t be surprised if eventually someone sneaks advertising into the output of someone else’s model etc.
The question is ambiguous without defining how much worse the dataset is.
It will likely be a mix of both concrete code and live AI generated experiences, but even the concrete code will likely be partially AI generated and modified. The ratio will depend on how reliable vs creative the software needs to be.
For example, no AI generated code running pacemakers or power plants. But game world experiences could easily be made more dynamic by generative AI.
There are already a number of techniques for procedurally-generating a world (including Markov-based systems).
The problems with replacing procedural world generation with LLM generation are: a) you need to obtain a data set to train it, which doesn't commercially exist, or train it yourself; b) there's a fundamental need to iterate on the design, which LLMs do not cope with well at all; c) you need to somehow debug issues and fix them. That's quite apart from the quality issues, cost and power usage.
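For contrast, the Markov-based procedural techniques mentioned above can be tiny, fast, and fully debuggable. A minimal sketch (tile names and transition weights are made up for illustration, not from any real game):

```python
import random

# Order-1 Markov chain over terrain tiles: each tile type maps to
# weighted choices for the next tile. Tweaking these weights is the
# kind of direct design iteration that's hard to do with an LLM.
TRANSITIONS = {
    "grass": {"grass": 0.6, "forest": 0.25, "water": 0.15},
    "forest": {"forest": 0.5, "grass": 0.4, "mountain": 0.1},
    "water": {"water": 0.7, "grass": 0.3},
    "mountain": {"mountain": 0.4, "forest": 0.6},
}

def generate_row(length, start="grass", seed=None):
    rng = random.Random(seed)  # seeded for reproducible worlds
    row = [start]
    for _ in range(length - 1):
        weights = TRANSITIONS[row[-1]]
        row.append(rng.choices(list(weights), list(weights.values()))[0])
    return row

print(generate_row(10, seed=42))
```

Because the whole generator is a handful of numbers, the same seed always yields the same world, which is exactly the debuggability and iteration loop the comment above says LLM generation lacks.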
I mean we're already there with Copilot, Cursor and other tools that use LLMs to assist in coding tasks.
There was another quite similar model from a different group within the last month or so. I can't remember if they released any weights or anything or the name of it. But it was the same concept.
That said; I cannot find any:
- architecture explanation
- code
- technical details
- API access information
Feels very DeepMind / 2015, and that's a bummer. I think the point of the "we have no moat" email has been taken to heart at Google, and they continue to be on the path of great demos, bleh product launches two years later, and no open access in the interim.
That said, just knowing this is possible -- world navigation based on a photo and a text description, with up to a minute of held context -- is amazing, and I believe it will inspire some groups out there to put out open versions.
Had they released this two months earlier it would have been incredibly impressive. Now it's still cool and inspiring, but no longer as ground breaking. It's the cooler version that doesn't come with a demo or any hope of actually trying it out.
And with the things we know from Oasis's demo, the agent-training use case the post tries to sell for Genie 2 is a hard sell. Any attempt to train an agent on such a world would likely look like an AI Minecraft speedrun: generate enough misleading context frames to trick the AI into generating what you want.
Secondly, any estimate of how much the price could fall in 5-10 years?
For reducing the price, ASICs like etched may be the way forwards [1]. The models will get bigger for a time, but there may be a lot of room for models that can exploit purpose-built hardware.
What would they do / how would they use this output to make a better AI?
For a straightforward example, this could help Waymo rehearse driving in various cities and weather / traffic settings.
It's not clear if video models will follow the same trajectory.
In the intermediate term my guess is that this kind of world model will be useful for training 3D model generators, so that you can go from sketch -> running in-engine extremely quickly.
1: https://www.tweaktown.com/news/101466/oasis-ai-and-single-nv...
> This is.. super impressive. I'd like to know how large this model is. I note that the first thing they have it do is talk to agents who can control the world gen; geez - even robots get to play video games while we work. That said; I cannot find any:
> - architecture explanation
> - code
> - technical details
> - API access information
Seems that it's only "consistent" for up to a minute, but if progress keeps up at this rate.. just wow.
For reference:
19th century
evolution by natural selection as science
electromagnetism
germ theory of disease
first law of thermodynamics
--------------------------------------------
20th century
general relativity
quantum mechanics
dna structure
penicillin
big bang theory
--------------------------------------------
21st century
CRISPR
deep learning
20th century: general/special relativity, radioactive decay, discovery of the electron
21st century: CRISPR and deep learning
It's hard to argue that the big science of the first 20 years of the previous century looks way more impactful than CRISPR and deep learning put TOGETHER.
This is naivete on the scale of "Cars were much safer 70 years ago".
But DNA sequencing and biologics have revolutionized medicine and changed lives.
Also, the computer-as-phone took it from hundreds of millions of mostly business users buying optical disks to 3+ billion everyday people getting regular system updates and apps on demand, accessing real-time information. That change alone far outweighs the impact of anything produced by advanced physics.
As a result we, as developers, now have the power to deliver both messages and experiences to the entire world.
Ideas are cheap, and progress is virtually guaranteed in intellectual history. But execution is exquisitely easy to get wrong. Genie 2 is just Google's first bite at this apple, and milestones and feedback are key to getting something as general as AI right. Fingers crossed!
Yippee, finally Google posts a non-conforming cookie popup with no way to reject the ad cookies!
That one would reimagine the world any time you look at the sky or ground. Sounds like Genie 2 solves that: "Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again."
Google is firing warning shots to kill off interest in funding competing startups in this space.
I suspect that in 6 months it won't matter as we'll have completely open source Chinese world models. They're already starting to kill video foundation model companies' entire value prop by releasing open models and weights. Hunyuan blows Runway and OpenAI's Sora completely out of the water, and it's 100% open source. How do companies like Pika compete with free?
Meta and Chinese companies are not the leaders in the space, so they're salting the earth with insanely powerful SOTA open models to prevent anyone from becoming a runaway success. Meta is still playing its cards close to its chest so they can keep the best pieces private, but these Chinese companies are dropping innovation left and right like there's no tomorrow.
The game theory here is that if you're a foundation model "company", you're dead - big tech will kill you. You don't have a product, and you're paying a lot to do research that isn't necessarily tied to customer demand. If you're a leading AI research+product company, everyone else will release their code/research to create a thousand competitors to you.
There is still an enormous amount of low-hanging fruit that anyone can harvest right now, but eventually big advances are going to require big budgets, and I can only imagine how technically tight-lipped they will be with those.
Basically, the foundation model companies are outsourced R&D labs for big tech. They can be kept at arms length (like OpenAI with Microsoft and Anthropic with Amazon) or be bought outright (like Inflection, although that was a weird one).
Both OpenAI and Anthropic are trying to move away from being pure model companies.
> If you're a leading AI research+product company, everyone else will release their code/research to create a thousand competitors to you.
Trillion-dollar question - is there a competitive edge / moat in vertical integration in AI? Apple proved there was in hardware + OS (which were unbundled in Wintel times). For AI, right now, I can't see one, but I'm just a random internet commentator, who knows.
While it would be interesting if Chinese companies were releasing their best full models as an intentional strategy to reduce VC funding availability for western AI startups, it would be downright fascinating if the Chinese government was supporting this as a broader geopolitical strategy to slow down the West.
It does make sense but would require a remarkable level of insight, coordination and commitment to a costly yet uncertain strategy.
The overall cost for the Chinese government is probably very small in the grand scheme of things. And it makes a lot of sense from a geopolitical strategy.
I am less worried for AI research+product companies: they have likely secured revenue streams with real customers and built domain knowledge in the meantime.
GameGen-X came out last month. https://arxiv.org/html/2411.00769v1
For these kinds of models to be "playable" by humans (and, I'd argue, most fledgling AI agents), the world state needs to be encoded in the context, not just a visual representation of what the player most recently saw.
Which is a big problem for the agent-training use case they keep reiterating on the website. Agents are like speedrunners: if there is a stupid exploit, the agent will probably find and use it. And for Oasis, the speedrunning meta for getting to the Nether is to find anything red, make it fill the screen, and repeat until the world-generating AI thinks you're looking at lava and must be in the Nether.
Now, how Google plans to make money with all this bleeding edge research, that's the mystery.
Seriously, I wish more than anything I was kidding.
Decart (Oasis) raised $25 million at $500 million valuation.
World Labs raised $230 million.
Recruiting
We'll see which so-called AI-companies are really "dying" when either a correction, market crash or a new AI winter arrives.
This is huge; the Minecraft demos we saw recently were just toys because you couldn't actually do anything in them.
I'll keep my stance, give it two years and very realistic movies, with plot and everything, will be generated on demand.
What you're talking about is a minor jump from the SOTA, much smaller than what we've already seen in these two years.
I'll match any 5-figure amount you propose. I also know an escrow service we can trust.
Interesting they're framing this more from the world model/agent environment angle, when this seems like the best example so far of generative games.
720p, realtime, mostly consistent games for a minute is amazing, considering Stable Diffusion was originally released 2ish years ago.
I remember there was a brief window where some gamers bought a PhysX card for high-fidelity physics in games. Ultimately they rolled that tech into the CPUs themselves, right?
Many of the current AI models have their roots in games: Chess, Go, etc.
But, I didn’t expect this much progress towards that quite this fast…
A good video game is far more about the world building, the story, the creativity or "uniqueness" of the experience, etc.
Currently this seems to generate fairly generic looking and shallow experiences. Not hating though. It's early days obviously.
These DeepMind guys play Factorio, they don't play Atari games or shooters, so why aren't they thinking about that? Or maybe they are, and because they know a lot about Factorio, they see how hard it is to make?
There's a lot of "musing" as you say.
It would play to the medium's strengths -- any "glitches" the player experiences could be seen as diegetic corruptions of reality.
The moment we get parameterized NeRF models running in close-to-realtime, I want to go for it.
People of the internet, you were right. Now, this is incredible.
Now?
I mean, I don't know man?
With this Genie 2 sneak peek, it all just makes World Labs' efforts look sad. Did they really think better-funded independents and majors would not be interested in generating 3D worlds?
This is a GUBA moment. If you're old enough to know, then you know.
What you haven't been able to do so far, after many years of trying, is go from the virtual to the real. Go from Arkanoid to a robot that can play, I dunno, squash, without dying. A robot that can navigate an arbitrary physical location without drowning, or falling off a cliff, or getting run over by a bus. Or build any Lego kit from instructions. Where's all that?
You've conquered games. Bravo! Now where's the real world autonomy?
I will add that the totally inconsistent backgrounds in the "prototyping" example suggest the AI is simply cribbing from four different games with a flying avatar, which makes it kind of useless unless you're prototyping cynical AI slop. And what are we even doing here by calling this a "world model" if the details of the world can change on a whim? In my world model I can imagine a small dragon flying through my friend's living room without needing to turn her electric lights into sconces and fireplaces.
To state the obvious: if you train your model on thousands of hours of video games, you're also gonna get a bunch of stuff like "leaves are flat and don't bend" or "sometimes humans look like plastic" or "sometimes dragons clip through the scenery," which wouldn't fly in an actual world model. Just call it a "video game world model"! Google is intentionally misusing a term which (although mysterious) has real meaning in cognitive science.
I am sure Genie 2 took an awful lot of work and technical expertise. But this advertisement isn't just unscientific, it's an assault on language itself.
That's because it's using video game data for training footage because it's cheap and easy to generate. It would not be simulating video game gravity if it was training on real world video inputs.
>if you train your model on thousands of hours of video games
What if you train the same model on thousands of hours of sensor data from real, physical robots?
No one knows yet. AI technology like this is closer to scientific research than it is to product development. AI is basically new magic, and people are in a "discovery" phase where we are still trying to figure out what is possible. Nothing of value was immediately created when they discovered DNA; productization came much later, when it was combined with other technologies to fit a particular use case.
Prompt: Here's a blueprint of my new house and a photo of my existing furniture. Show me some interior design options.
You just don't like AI.
It can be used for training agents, prototyping, video generation, and is quite possibly a glimpse of a whole new type of entertainment or a new way to create video games.
What's the point of the massive amount of money spent on video games in general? Or all of the energy spent moving people back and forth to an office? Or expensive meals at restaurants? Or trillions in weaponry? Or television shows or movies?
I feel like sharing early closed-source blog-posts is part of the research process. I'm sure someone in this thread has thought of a use case that the Google team missed. Open/closed source arguments here feel premature IMO.
This is just a marketing fluff piece that does not benefit anyone and is ego stroking at best.
I still think things like this are important, and at least give folks a bit of time to ideate on what will be possible in a few years. Of course having the model or architecture on hand would be nice, but I'm not holding that against Google here.
I asked the same thing a while back, and the answers boiled down to "somehow helps RL agents train". But how exactly? No clue.
[edited out some barbs I wrote because I find some comments on this website REALLY annoying]
The major difference being that the former scales very poorly for generating training data compared to the latter. Genie 2 is not even a video game, and has worse fidelity than video games; the upside is that it probably scales even better than video games for generating training scenarios. If you want androids in real life, Genie 2 (or similar systems) is how you bootstrap the agent AI. The training pipeline will be: raw video -> Genie 2 -> game engine with rules -> physical robot
One of those arrows is not like the others
Any model would have to succeed in one stage before it can proceed to the next one.
Self-driving cars have cameras as part of their sensor suite, and have models to make sense of sensor data. Video will help with perception and classification (understanding the world), with no agency needed. Game-playing will help with planning, execution, and evaluation. Both functions are necessary, and those that come after rely on earlier capabilities.
That's nice. I'm not completely disabled, but I am disabled, and I very much would appreciate them, as my capability to do things over the longer term is very much not going to go in the direction of improving. As it is, there are a lot of things I now rely on people for, that at one time, I did not.
Whilst I recognise it's probably not going to happen in a time span that is useful to me, I do wish it could, so that I could be less of a burden on those around me and maintain a relative level of independence.
You could use real video games to do this but I guess there'd be a risk of over-fitting; maybe it would learn too precisely what a staircase looks like in Minecraft, but fail to generalize that to the staircase in your home. If they can simulate dream worlds (as well as, presumably, worlds from real photos), then they can train their agents this way.
This would only be training high-level decision policies (ie, WASD inputs). For something like a robot, lower level motor control loops would still be needed to execute those commands.
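A toy sketch of that two-level split (the action set, policy, and gain are all illustrative, not from any real robot stack): a high-level policy picks discrete WASD-style inputs, and a separate low-level loop turns each input into a continuous motor command.

```python
# Discrete WASD-style actions mapped to unit direction vectors (x, y).
ACTIONS = {"W": (0.0, 1.0), "A": (-1.0, 0.0), "S": (0.0, -1.0), "D": (1.0, 0.0)}

def high_level_policy(observation):
    # Placeholder policy: head toward the goal along the dominant axis.
    dx = observation["goal"][0] - observation["pos"][0]
    dy = observation["goal"][1] - observation["pos"][1]
    if abs(dx) > abs(dy):
        return "D" if dx > 0 else "A"
    return "W" if dy > 0 else "S"

def low_level_controller(action, gain=0.5):
    # Convert the discrete action into a continuous velocity command;
    # on a real robot this layer would close the loop over motor feedback.
    vx, vy = ACTIONS[action]
    return (gain * vx, gain * vy)

obs = {"pos": (0.0, 0.0), "goal": (3.0, 1.0)}
action = high_level_policy(obs)        # "D": goal is mostly to the right
command = low_level_controller(action)  # (0.5, 0.0)
print(action, command)
```

A world model like Genie 2 would only ever see and train the top function; the bottom one still has to come from conventional control engineering.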
Of course you could just do your training in the real world directly, because it already has coherence and plenty of environmental variety. But the learning process involves lots of learning from failure, and that would probably be even more expensive than this expensive simulator.
Despite the claims I don't think it does much to help with AI safety. It can help avoid hilarious disasters of an AI-in-training crashing a speedboat onto the riverbank, but I don't think there's much here that helps with the deeper problems of value-alignment. This also seems like an effective way to train robo-killbots who perceive the world as a dreamlike first-person shooter.
To have this perspective you must believe that this will never get better than it currently is, its limitations will never be fixed, and it will never lead to any other applications. I don't know how people can continue to look at these things with such a lack of imagination given the pace of progress in the field.
That would be fun to use, but ultimately pointless. An AI model will generate things that are _statistically plausible_; solving crimes usually requires finding out the _truth_.
Statistical models will output a compressed mishmash of what they were trained on.
No matter how hard they try to cover that inherent basic reality, it is still there.
Not to mention the upkeep of training on new "creative" material on a regular basis and the never ending bugs due to non-determinism. Aside from contrived cases for looking up and synthesizing information (Search Engine 2.0).
The Tech Industry is over investing in this area exposing an inherent bias towards output rather than solving actual problems for humanity.
> That large alien? That's a tree.
> That other large alien? It's a bush.
> That herd of small creatures? Fugghedaboutit.
> The lightning storm? I can do one lightning pole.
> Those towering baobab/acacia hybrids? Actually only two stories tall.
It feels so insulting to the concept artist to show those two videos off.
Depending on how controllable the tech ends up being, I suppose. Could be anywhere from a gimmick (which is still nice) to a game engine replacement.
This is the Unreal Engine killer. Give it five years.
We need to calm down with the clickbait-addled thinking that "this new thing kills this established powerful tested useful thing." :-)
Game developers have been discussing these tools at length, after all, they are the group of software developers who are most motivated to improve their workflow. No other group of software developers comes close to gamedevs' efficiency requirements.
The 1 thing required for serious developers is control. As such, game engines like Unreal and in-house engines won't die.
Generative tools will instead open up a whole new, but quite different, way of creating interactive media and games. Those who need maximum control over every frame and every millisecond and CPU cycle will still use engines. The rest who don't will be productive with generative tools.
These models won't need you to retopo meshes, write custom shaders, or optimize Nanite or Lumen gameplay. They'll generate the final frames, sans traditional graphics processing pipeline.
> The 1 thing required for serious developers is control
Same with video and image models, and there's tremendous work being done there as we speak.
These models will eventually be trained to learn all of human posture and animation. And all other kinds of physics as well. Just give it time.
> Those who need maximum control over every frame and every millisecond and CPU cycle will still use engines.
Why do you think that's true? These techniques can already mimic the physics of optics better than 80 years of doing it with math. And they're doing anatomy, fluid dynamics, and much more. With far better accuracy than game engines.
These will get faster and they will get controllable.
Brother, you're preaching to the choir. I've been shilling generative tools for gamedev far harder than you are in your reply. :-)
But I'm just relaying to you what actual gamedevs working and writing code right now need and for the foreseeable future for which projects have been started or planned. As Mike Acton says, "the problem is the problem".
> These techniques can already mimic the physics of optics better than 80 years of doing it with math.
I encourage you to talk to actual gamedevs. When designing a game, you aren't trying to mimic physics: you're trying to make a simulation of physics that feels a certain way that you want it to play. This applies to fluid dynamics, lighting/optics, everything.
For example, if I'm making a sailing simulator, I need to be able to script the water at points where it matters for gameplay and game-feel, not simulate real physics. I'm willing to break the rules of physics so that my water doesn't act or look like real water but feels good to play.
Movement may be motion captured, but animation is tweaked so that the characters control and play in a way that the game designer feels is correct for his game.
If you haven't designed a game, I encourage you to try to make a simple space invaders clone over the weekend, then think about the physics in it and try to make it feel good or work in an interesting way. Even in something that rudimentary, you'll notice that your simulation is something you test and tweak until you arrive at parameters that you're happy with but that aren't real physics.
I strongly disagree that you need to cater to existing workflows. There's so much fertile ground in taking a departure. Just look at what's happening with animation and video. People won't be shooting on Arri Alexas and $300,000 glass for much longer.
I have a feeling many AI researchers are trying to fix things which are not broken.
Game engines are not broken; no reasonable amount of AI TFLOPS is going to approach a professional with UE5. DAWs are not broken; no reasonable amount of AI TFLOPS is going to approach a professional with Steinberg Cubase or Apple Logic.
I wonder why so many AI researchers are trying to generate the complete output with their models, as opposed to training model to generate some intermediate representation and/or realtime commands for industry-standard software?
The current tooling we have is just way too good to just discard it, think of Maya, Blender and the like. How will these interfaces, with the tools they already provide, enable sculpting these word-based worlds?
I wonder if some kind of translator will be required, one which precisely instructs "User holds a brush pointing 33° upwards and 56° to the left of the world's x-axis with a brush consisting of ... applied with a strength of ...", or how this will be translated into embeddings or whatever that will be required to communicate with that engine.
This is probably the most exciting time for the CG industry in decades, and this means a lot, since we've been seeing incredible progress in every area of traditional CG generation. It's also a scary time for those who learned the skills and will now occasionally see random people producing incredible visuals with zero knowledge of the entire CG pipeline.
> Delamain was a non-sentient AI created by the company Alte Weltordnung. His core was purchased by Delamain Corporation of Night City to drive its fleet of taxicabs in response to a dramatic increase in accidents caused by human drivers and the financial losses from the resulting lawsuits. The AI quickly returned Delamain Corp to profitability and assumed other responsibilities, such as replacing the company's human mechanics with automated repair drones and transforming the business into the city's most prestigious and trusted transporting service. However, Delamain Corp executives underestimated their newest employee's potential for growth and independence despite Alte Weltordnung's warnings, and Delamain eventually bought out his owners and began operating all aspects of the company by himself. Although Delamain occupied a legal gray area in Night City due to being an AI, his services were so reliable and sought after that Night City's authorities were willing to turn a blind eye to his status.
When AR glasses get good enough to wear all day, I've really been wanting to make a real-life ad blocker.
Consider the use where you seed the first frame from a real world picture, with a prompt that gives it a goal. Not only can you see what might happen, with different approaches, and then pick one, but you can re-seed with real world baselines periodically as you're actually executing that action to correct for anything that changes. This is a great step for real world agency.
As a person without aphantasia, this is how I do anything mechanical. I picture what will happen, try a few things visually in my head, decide which to do, and then do it for real. This "lucid dream" that I call my imagination is all based on long term memory that made my world view. I find it incredibly valuable. I very much rely on it for my day job, and try to exercise it as much as possible, before, say, going to a whiteboard.
Are there any game developers working on infinite story games? I don't care if it looks like Minecraft; I want a Minecraft that tells intriguing stories with infinite quest generation. Procedural infinite world gen recharged gaming, so where is the procedural infinite story generation?
Still, awesome demo. I imagine by the time my kids are in their prime video game age (another 5 years or so) we will be in a new golden age of interactive story telling.
Hey siri, tell me the epic of Gilgamesh over 40 hours of gameplay set 50,000 years in the future where genetic engineering has become trivial and Enkidu is a child’s creation.
Procgen games mainly work when the procedural parts are just a foundation for hand-crafted content to sit on, whether that's crafted by the players (as in Minecraft) or the developers (as in No Man's Sky after they updated it a hundred times, or Roguelikes in general).
I haven't added any words or phrases to it in years, but I still use it regularly and somehow it still surprises me. Maybe the Spelunky-type approach can be surprising for longer; that is, make a bunch of hand-curated bits and pick from them randomly: https://tinysubversions.com/spelunkyGen/
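The Spelunky-style approach described above can be sketched in a few lines: a level is assembled from a small pool of hand-authored templates chosen at random, so every run is novel but every piece is human-curated. (The room templates here are toy placeholders, not Spelunky's actual chunks.)

```python
import random

# Hand-curated 4x4 room templates ('#' = wall, '.' = floor).
ROOMS = [
    ["####", "#..#", "#..#", "####"],  # closed room
    ["#..#", "#..#", "#..#", "#..#"],  # vertical corridor
    ["####", "....", "....", "####"],  # horizontal corridor
]

def build_level(width, height, seed=None):
    """Tile a width x height grid of randomly chosen room templates."""
    rng = random.Random(seed)
    grid = [[rng.choice(ROOMS) for _ in range(width)] for _ in range(height)]
    lines = []
    for row in grid:
        for i in range(4):  # each template is 4 lines tall
            lines.append("".join(room[i] for room in row))
    return lines

for line in build_level(3, 2, seed=7):
    print(line)
```

Real Spelunky additionally guarantees a solvable path through the chosen rooms, but even this stripped-down version shows why the curated-chunks approach keeps surprising players: the combinatorics are huge while every individual piece stays authored.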
But that doesn't translate well to websites, trailers or demos. It's easier to wow people with graphics.
(Dwarf Fortress being much more focused on generating a whole world.)
Consider EVE Online. The stories it generates are Shakespearean and I defy anyone to argue that they have no plot.
I would go further and predict that stories generated by sufficiently advanced AI can explore much more interesting story landscapes because they need not be bound by the limitations of human experience. Consider what stories can be generated by an AI which groks mathematics humans don't yet fully understand?
You're not gonna get new intriguing stories from AI which only regurgitates what it's stolen. You're going to get a themeless morass without intention.
I also find it amusing how your example to Siri uses one of the oldest pieces of literature when you also tire of stories heard a thousand times before.
When did we start thinking this way? That things HAVE to get better and in fact to think otherwise is very negative? Is HN under a massive hot hand fallacy delusion?
Sure, progress will likely not be linear or without challenges, but we already have the human brain as proof that it is possible.
> We collected a vocabulary consisting of about 1500 basic words, which try to mimic the vocabulary of a typical 3-4 year-old child, separated into nouns, verbs, and adjectives. In each generation, 3 words are chosen randomly (one verb, one noun, and one adjective). The model is instructed to generate a story that somehow combines these random words into the story
You can do the same for generating worlds, just prepare good ingredients and sample at random.
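The sampling scheme in that quote is easy to reproduce. A minimal sketch, with tiny placeholder word lists standing in for the ~1500-word vocabulary described above:

```python
import random

# Placeholder word lists; the quoted setup uses ~1500 words mimicking
# a 3-4 year-old's vocabulary, split into the same three categories.
NOUNS = ["dragon", "castle", "river", "lantern"]
VERBS = ["climbs", "discovers", "builds", "loses"]
ADJECTIVES = ["tiny", "ancient", "glowing", "stubborn"]

def make_prompt(seed=None):
    """Pick one word per category and fold them into a generation prompt."""
    rng = random.Random(seed)
    noun = rng.choice(NOUNS)
    verb = rng.choice(VERBS)
    adj = rng.choice(ADJECTIVES)
    return f"Write a short story that combines the words: {noun}, {verb}, {adj}."

print(make_prompt(seed=1))
```

The same trick transfers directly to world generation: swap the word lists for biomes, landmarks, and moods, and each random draw seeds a distinct world.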
These are "stories" in the most vacuous definition possible, one that is just "and then this happened," like a child's conception of plot.
For LLMs like GPT-4, this all seems reasonable to account for and assume the LLM is capable of processing, given appropriate guidance/frameworks (of which may be just classical programming).
Yes. This is an active research area. See https://github.com/yingpengma/Awesome-Story-Generation, which is not up to date.
The better you make this infinite narrative generator, the more complicated the world gets and the less compelling it gets to actually interact with any one story.
Stories thrive by setting their own context. They should feel important to the viewer. An open world with infinite stories can't make every story feel meaningful to the player. So how does it make any story feel meaningful? I suppose the story would have to be global, in which case, it crowds out the potential for fractal infinite storylines - eventually, all or at least most are going to have to tie back to the Big Bad Guy in order to feel meaningful.
Local stories would just feel mostly pointless. In Minecraft, all (overworld) locales are equally unimportant. Much like on Earth, why should you care about the random place you appeared in the world? The difference is that on Earth you tend to develop community as you grow and build connections to the place you live, which can build loyalty. In addition, you only have one shot, and you have real needs that you must fulfill or you die forever. So you develop some otherwise arbitrary loyalties in order to feel security in your needs.
In Minecraft there's zero pressure to develop loyalty to a place except for your own real-life time. And when that becomes a driving factor, why wouldn't you pick a game designed to respect your time with a self-contained story? (Not that infinite games like Minecraft are bad, but they aren't story-driven for a good reason).
Now, a game like Dwarf Fortress is different because you build the community, the infrastructure, the things that make you care about a place. But it already has infinite story generation without AI and I'm not sure AI would improve on that model.
I'd say AAA games have been trending toward "less fun" for at least half a decade. So this sounds like a natural next step.
- SimCity where you can read a newspaper about what's happening in your city that actually reflects the events that have occurred with interesting perspectives from the residents.
- Dwarf Fortress, but carvings, artwork, demons, forbidden beasts, etc get illustrations dynamically generated via stable diffusion (in the style of crude sketches to imply a dwarf made it perhaps?)
- Dwarf Fortress, again, but the elaborate in-game combat comes with a "narrative summary" which conveys first hand experiences of a unit in the combat log, which while detailed, can be otherwise hard to follow.
- Any fantasy RPG, but with a minstrel companion who follows you around and writes about what you do in a silly, judgy way. The core dialogue could be baked in by the developers, but the stories this minstrel writes could be dynamically generated based on the player's actions. Example: "He was a whimsical one, who decided to take a detour from his urgent hostage rescue mission to hop up and down several hundred times in the woods while trying on various hats he had collected. I have no idea what goes through this man's mind..."
I'm not sure if there is a word for it, but the kernel here is that everything is indirectly dictated by the player's actions and the game's existing systems. The LLM/AI stuff isn't in charge of coming up with novel stories and core content; it's in charge of making the game more immersive by helping with the roleplay. I think this is the area where it can thrive the most.
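The minstrel idea boils down to one pattern: the game logs concrete events it already tracks, and the model only dramatizes that log. A minimal sketch, with hypothetical event strings and prompt wording:

```python
# Sketch: the game supplies a factual event log; the LLM's only job is to
# retell it in character. It is never asked to invent quests or mechanics.
def minstrel_prompt(events):
    log = "\n".join(f"- {e}" for e in events)
    return (
        "You are a judgy minstrel following a hero around. Retell the "
        "logged events below as one short, silly diary entry. Do not "
        "invent events that are not in the log.\n" + log
    )

# Hypothetical events, as the game's systems might record them:
events = [
    "accepted urgent hostage rescue quest",
    "hopped up and down 347 times in the woods",
    "tried on 12 collected hats",
]
print(minstrel_prompt(events))
```

Because the log is the ground truth, the generated prose can be as florid as you like without ever contradicting what actually happened in the game.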
OTOH, lots of games come with DLC that add new stories with the same mechanics. There might be some additions or changes, but if you really like the mechanics, you can try it with a different plot. Remnant II has sucked a ton of my time because of that.
How so?
I could totally see generative AI add a ton more variety to crowds, random ambient sentences by NPCs (that are often notoriously just a rotation of a handful of canned lines that get repetitive soon), terrain etc., while still being guided by a human-created high level narrative.
Imagine being able to actually talk your way out of a tricky situation in an RPG with a guard, rather than selecting one out of a few canned dialogue options. In the background, the LLM could still be prompted by "there's three routes this interaction can take; see which one is the best fit for what the player says and then guide them to it and call this function".
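One way to wire up that guard interaction, sketched under assumptions: `call_llm` stands in for any chat-model call (system prompt and user text in, text out), and the route-extraction convention is invented for illustration. The key point is that the model improvises the dialogue, but the game keeps control of which pre-authored outcome fires:

```python
# Three pre-authored outcomes the designer controls:
ROUTES = {
    "bribe": "The guard pockets the coin and looks away.",
    "bluff": "The guard reluctantly waves you through.",
    "fail": "The guard raises the alarm.",
}

SYSTEM_PROMPT = (
    "You are a city guard blocking the gate. The player will try to talk "
    "their way past you. Classify their approach as one of: bribe, bluff, "
    "fail. Reply in character, then on the last line output ROUTE=<choice>."
)

def resolve(player_line, call_llm):
    """Free-form dialogue in, one of three designer-approved outcomes out."""
    reply = call_llm(system=SYSTEM_PROMPT, user=player_line)
    route = reply.rsplit("ROUTE=", 1)[-1].strip().lower()
    # Unrecognized output falls back to a safe default route.
    return reply, ROUTES.get(route, ROUTES["fail"])

# Stubbed model call so the sketch runs without a real LLM:
def fake_llm(system, user):
    return "Hmph. Fine, but be quick about it.\nROUTE=bluff"

dialogue, outcome = resolve("I'm the captain's cousin, let me pass.", fake_llm)
print(outcome)  # The guard reluctantly waves you through.
```

A production version would use an actual structured-output or function-calling API rather than parsing a `ROUTE=` line, but the shape is the same: the LLM picks among routes, it doesn't create them.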
Worst case, you get a soulless, poorly written game with very eloquent but ultimately uninteresting characters. Some games are already that today – minus the realistic dialogue.
Yes, sure, but that's not what I was responding to. AI adding detail, not infinite quest lines, is possibly a good use case.
> Worst case, you get a soulless, poorly written game with very eloquent but ultimately uninteresting characters. Some games are already that today – minus the realistic dialogue.
Some games, yes... why do we want more of those? Anyway, that's not the worst case. Worst case is incomprehensible dialogue.
That’s actually a use case I can understand, and what’s more, I think that humans could generate training data (story “prototypes”?) that somehow (?) expand the phase space of story-types.
Ironic, though: we can build AI that could be creative, but it’s humans who have to use science and logic, because AI cannot?
It seems like it’d be more useful to have the model generate the raw artifacts, world map, etc. and let the engine do the actual rendering.
Now, imagine training it on thousands of hours of PoV drone footage from Ukraine, and then using that to train autonomous agents.
If game assets are cheap to generate you’ll see small teams or even solo developers willing to take more creative risks
So I see the most likely outcome is a lot of dogshit and Steam being forced to make draconian moves to protect the integrity of the store.
I don’t see why AI will be any different. All that’s changed is ratio of potential creators to the general population. Most of it is going to be slop regardless because of economic incentives.
If AI pushes that up to 98%, that means you have to look through 5 times as much crap to get the good stuff.
Expect something similar if video games and interactive 3D become cheap to produce.
Filtering is a much easier problem to solve and abundance a preferable scenario.
It's a great idea. We want more than an open-world. We want an open-story.
Open-story games are going to be the next genre that will dominate the gaming industry, once someone figures it out.
I think fully conversational games (voice to voice) with dynamic story lines are only a decade or two away, pending a minor breakthrough in model distillation techniques or consumer inference hardware. Unlike self driving cars or AGI the technology seems to be there, it’s just so new no one has tried it. It’ll be really interesting to see how game designers and writers will wrangle this technology without compromising fun. They’ll probably have to have a full agentic pipeline with artificial play testers running 24/7 just to figure out the new “bugspace”.
Can’t wait to see what Nintendo does, but that’s probably going to take a decade.
"There’s no question in my mind that such software could generate reasonably good murder mysteries, action thrillers, or gothic romances. After all, even the authors of such works will tell you that they are formulaic. If there’s a formula in there, a deep learning AI system will figure it out.
Therein lies the fatal flaw: the output will be formulaic. Most important, the output won’t have any artistic content at all. You will NEVER see anything like literature coming out of deep learning AI. You’ll see plenty of potboilers pouring forth, but you can’t make art without an artist.
This stuff will be hailed as the next great revolution in entertainment. We’ll see lots of prizes awarded, fulsome reviews, thick layers of praise heaped on, and nobody will see any need to work on the real thing. That will stop us dead in our tracks for a few decades."
Those beautiful worlds took a lot of money to make and the studios are smart enough to realize consumers are apathetic/stupid enough to accept much lower quality assets.
The top end of the AAA market will use this sparingly for the junk you don't spend much time on - stuff the intern was doing before.
The bottom of the market will use this for virtually everything in their movie-to-game pipeline of throwaway games. These are the games designed just to sucker parents and kids out of $60 every month, the games that don't even follow the story of the movie and likely make it worse.
Strangely enough this is where the industry makes the vast majority of its day-to-day walking around cash.
For example, right now if you save an entire village from an attacking tribe of orcs, only a handful of NPCs even say anything, just a nice little "thanks for saving our town!" and then 2 villages over the NPCs are completely unaware of a mighty hero literally solo tanking an entire invading army.
Why is that?
Well, you'd need lots of somewhat boring but important dialogue written, and you'd need tons of voice lines recorded.
Both those are now solvable problems with generative AI. AI generated dialogue is now reasonably high quality, not "main character story arc" high quality, but "idle shop keeper chit chat" quality for sure, it won't break immersion at least. And the quality of writing from AI is fine for 2 or 3 sentences here and there.
I'll soon be releasing a project showing this off at https://www.tinytown.ai/. The NPC dialogue is generated by a small LLM that can be run locally, and the secret of even high-quality voice models is that they don't require a lot of memory to run.
I predict that in another 4 or 5 years we'll see a lot of models run at the edge on video game consoles and home PCs, fleshing out game worlds.
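The "two villages over, nobody noticed" problem is largely a prompting problem once dialogue generation is cheap: broadcast world events into each NPC's chit-chat prompt, filtered by how far news would plausibly travel. A sketch, with the event format and fame-vs-distance rule invented for illustration:

```python
# Sketch: world events carry a "fame" radius; NPCs only gossip about
# events famous enough to have reached them. All names are hypothetical.
def npc_prompt(npc_role, world_events, distance_km):
    heard = [e for e in world_events if e["fame"] >= distance_km]
    rumors = "; ".join(e["text"] for e in heard) or "nothing notable"
    return (
        f"You are a {npc_role} in a fantasy village. Recent rumors you "
        f"have heard: {rumors}. Say one short line of idle chit-chat "
        "that fits these rumors."
    )

events = [{"text": "a lone hero repelled an orc invasion", "fame": 10}]

# A shopkeeper 5 km away has heard the news; a farmer 50 km away has not.
print(npc_prompt("shopkeeper", events, distance_km=5))
print(npc_prompt("farmer", events, distance_km=50))
```

Feeding these prompts to a small local model gives exactly the "idle shop keeper chit chat" tier of quality described above, while keeping the facts of the rumor under the game's control.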
What this should tell you instead is that things look really bad on the training-data side once you start scraping billions of game streams on the internet; it's hard to imagine a bigger chunk of training data than that. Stagnation incoming.
This also means that my dreams will keep looking like this iteration of Genie 2, but compute will scale up and the worlds won't look anything like my dreams anymore in the next versions (it's already more colorful anyway).
I remember image generation used to look like dreams too in the beginning. Now it doesn't look anything like that.
For the time being I will gloss over the fact this might just be a consumer facing product for Google that ends up having nothing to do with younger developers.
I'm torn between two ideas:
a. Show kids awesome stuff that motivates them to code
b. Show kids how to code something that might not be as awesome, but they actually made it
On the one hand you want to show kids something cool and get them motivated. What Google is doing here is certainly capable of doing that.
On the other hand I want to show kids what they can actually do and empower them. The days of making a game on your own in your basement are mostly dead, but I don't think that means being someone who can control a large part of your vision, both technical and non-technical, is any less important.
Not everyone is the same either. I have met kids who would never spend a few hours learning some Python with pygame to get a few rectangles and sprites on screen, but who might get more interested if they saw something this flashy. But experience also tells me those kids are far less likely to get much value from a tool like this beyond entertainment.
I have a 14 year old son myself and I struggle to understand how he sees the world in this capacity sometimes. I don't understand what he thinks is easy or hard and it warps his expectations drastically. I come from a time period where you would grind for hours at a terminal pecking in garbage from a magazine to see a few seconds of crappy graphics. I don't think there should be meaningless labor attached to programming for no reason, but I also think that creating a "cost" to some degree may have helped us. Given two programs to peck into the terminal, which one do you peck? Very few of us had the patience (and lack of sanity) to peck them all.
Lighting, gravity, character animation and what not internalized by the model... from a single image...!
A key reason why current Large Multimodal Models (LMMs) still have inferior visual understanding compared to humans is their lack of deep comprehension of the 3D world. Such understanding requires movement, interaction, and feedback from the physical environment. Models that incorporate these elements will likely yield much more capable LMMs.
As a result, we can expect significant improvements in robotics and self-driving cars in the near future.
Simulations + Limited robot data from labs + Algorithms advancement --> Better spatial intelligence
which will lead to a positive feedback loop:
Better spatial intelligence --> Better robots --> More robot deployment --> Better spatial intelligence --> ...
I love the advancement of the tech but this still looks very young and I'd be curious what the underlying output code looks like (how well it's formatted, documented, organized, optimized, etc.)
Also, this seems oddly related to the recent post from WorldLabs https://www.worldlabs.ai/blog. Wonder if this was timed to compete directly and overtake the related news cycle.
Games are about interactions, and this actively works against it. You don't want the model to infer mechanics, the designer needs deep control over every aspect of it.
People mentioned using this for prototyping a game, but that's completely meaningless. What would it even mean to use this to prototype something? It doesn't help you figure out anything mechanically or visually. It's just, "what if you were an avatar in a world?" What do you do after you run around with your random character controller in your random environments?
I think the most useful part of this is the world generation part, not the mechanics inference part.
People sell entire franchises off of a few pre-rendered generic-fantasy still images, so I would have to disagree with the premise that this is useless as a visual concept tool.
I agree with your notions about integration into an existing game.