Genie 2: A large-scale foundation world model
833 points by meetpateltech 11 hours ago | 312 comments
  • jjice 10 hours ago |
    I don't understand this space very well, but this seems incredible.

    Something I find interesting about generative AI is how it adds a huge layer of flexibility, but at the cost of lots of computation, while a very narrow set of constraints (a traditional program) is comparatively incredibly efficient.

    If someone spent a ton of time building out something simple in Unity, they could get the same thing running with a small fraction of the computation, but this has seemingly infinite flexibility based on so little and that's just incredible.

    The reason I mention it is because I'm interested in where we end up using these. Will traditional programming be used for most "production" workloads with gen AI being used to aid in the prototyping and development of those traditional programs, or will we get to the point where our gen AI is the primary driver of software?

    I assume that concrete code will always be faster and the best way to have deterministic results, but I really have no idea how to conceptualize what the future looks like now.

    • Retric 10 hours ago |
      Longer term, computation isn't really the limiting factor for generative AI; it's training data. Generative AI is like Google search before the web responded to their search engine existing: there was a huge quantity of high-quality training data, which nobody had any reason to pollute, ready for the scraping.

      But modern search is hampered by people responding to algorithmic indexes. Algorithms responding to metadata without directly evaluating content enabled a world of SEO, with low-quality websites suddenly becoming discoverable as long as they narrowed their focus enough.

      So longer term it's going to be an arms race between the output of generative AI and the people trying to keep their models updated. In 20 years people will get much better at using these tools, but the tools themselves may be less useful. I wouldn't be surprised if eventually someone sneaks advertising into the output of someone else's model, etc.

      • Miraste 10 hours ago |
        This has already happened. Search google for a few random terms, and go through the first page of web and image results. A decent chunk will be AI-generated.
      • golol 8 hours ago |
        I disagree. With more computation you can train a bigger model on the same size training data and it will be better. There is a lot of knowledge on the internet that GPT-4 etc. have not yet learned.
        • Retric 8 hours ago |
          The issue is that the training data isn't some constant. Let's suppose OpenAI had 10x the computing power but a vastly worse dataset: do you expect a better or worse result?

          The question is ambiguous without defining how much worse the dataset is.

    • danans 8 hours ago |
      > I assume that concrete code will always be faster and the best way to have deterministic results, but I really have no idea how to conceptualize what the future looks like now.

      It will likely be a mix of both concrete code and live AI generated experiences, but even the concrete code will likely be partially AI generated and modified. The ratio will depend on how reliable vs creative the software needs to be.

      For example, no AI generated code running pacemakers or power plants. But game world experiences could easily be made more dynamic by generative AI.

    • singularity2001 5 hours ago |
      Makes me wonder if there's any company trying to train a model to produce 3D worlds within Unity (not as a video, like Oasis).
      • teamonkey 2 hours ago |
        This at least is a bit more realistic than what’s being presented by Google.

        There are already a number of techniques for procedurally-generating a world (including Markov-based systems).

        The problems with replacing procedural world generation with LLM generation are: a) you need to obtain a data set to train it, which doesn't commercially exist, or train it yourself; b) there's a fundamental need to iterate on the design, which LLMs do not cope with well at all; c) you need to somehow debug issues and fix them. That's quite apart from the quality issues, cost and power usage.
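
        For anyone unfamiliar with the Markov-based systems mentioned above, here's a toy sketch; the tile symbols and the training strip are made up for illustration, but the technique (learn transition frequencies from an example, then walk the chain) is the real one:

```python
import random

# Toy example map: '.' grass, '~' water, '^' mountain (symbols invented
# for this sketch).
sample = "..~~..^^^...~~~....^^.."

# Learn first-order transition counts from the example strip.
transitions = {}
for a, b in zip(sample, sample[1:]):
    transitions.setdefault(a, []).append(b)

def generate(length, seed_tile=".", rng=None):
    """Walk the learned Markov chain to emit a new terrain strip."""
    rng = rng or random.Random(42)
    out = [seed_tile]
    for _ in range(length - 1):
        # Successor frequencies in the list reproduce the sample's
        # local statistics (e.g. water tends to follow water).
        out.append(rng.choice(transitions[out[-1]]))
    return "".join(out)

print(generate(30))
```

        Production generators learn richer 2D/3D constraints, but the point stands: the output is cheap to compute, easy to seed deterministically, and fully debuggable, which is exactly where LLM generation struggles.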

    • sbarre 4 hours ago |
      > Will traditional programming be used for most "production" workloads with gen AI being used to aid in the prototyping and development of those traditional programs

      I mean we're already there with Copilot, Cursor and other tools that use LLMs to assist in coding tasks.

  • me551ah 10 hours ago |
    So when can I try this?
    • ilaksh 10 hours ago |
      It's Google so I assume never. No model release, no product, no API, no detailed paper.

      There was another quite similar model from a different group within the last month or so. I can't remember if they released any weights or anything or the name of it. But it was the same concept.

    • vessenes 10 hours ago |
      You'll need to wait until Baidu or AliBaba or Nvidia publish a competing model, unfortunately, if history is any guide.
    • mhld 10 hours ago |
      Probably when Genie 10 gets integrated into a Pixel phone.
  • vessenes 10 hours ago |
    This is... super impressive. I'd like to know how large this model is. I note that the first thing they have it do is talk to agents who can control the world gen; geez, even robots get to play video games while we work.

    That said; I cannot find any:

    - architecture explanation

    - code

    - technical details

    - API access information

    Feels very DeepMind / 2015, and that's a bummer. I think the point of the "we have no moat" email has been taken to heart at Google, and they continue to be on the path of great demos, bleh product launches two years later, and no open access in the interim.

    That said, just knowing this is possible - world navigation based on a photo and a text description with up to a minute of held context -- is amazing, and I believe will inspire some groups out there to put out open versions.

    • wongarsu 7 hours ago |
      We already knew this was possible from AI Minecraft (https://oasis.decart.ai). This is just a more impressive version of that, trained on a wider range of games and with more context frames (Oasis has about a second of context, this one a minute). Even the architecture seems to be about the same.

      Had they released this two months earlier it would have been incredibly impressive. Now it's still cool and inspiring, but no longer as ground breaking. It's the cooler version that doesn't come with a demo or any hope of actually trying it out.

      And with the things we know from Oasis's demo, the agent-training use case the post tries to sell for Genie 2 is a hard sell. Any attempt to train an agent on such a world would likely look like an AI Minecraft speedrun: generate enough misleading context frames to trick the AI into generating what you want.

      • achierius 3 hours ago |
        This is far beyond Oasis. Oasis had approximately 0 continuity, and the generated world was a blurry mess. This on the other hand actually approaches usability.
        • beeflet 2 hours ago |
          I don't know what the pipeline looks like for these, but I assume that's due to the costs associated with training and running. Oasis had a context of only a couple of frames, while this Genie model apparently holds context for about a minute. I guess they have a couple of tricks up their sleeve to optimize this though.
        • n2d4 21 minutes ago |
          And it works on a wide variety of games, instead of just a single one with a relatively consistent art style. On the other hand, Oasis was realtime, while this one is offline; IMO getting the inference speed down was their most impressive feat, as even most decent video gen models are slower than that.
    • summerlight 7 hours ago |
      While this is impressive, it still looks like a very early prototype. The overall nuance seems to be that it doesn't try to be a standalone product but a part of broader R&D projects toward general agents... I doubt they even have any productionized modeling pipelines for this project yet, and I'm pretty sure we won't have open access anytime soon.
      • mclau156 7 hours ago |
        there are lots of 3D modelers spending hours on 3D worlds and assets to use in training, this seems to automate a lot of that work
      • hustwindmaple1 an hour ago |
        GDM is a research lab. They are not set up for production. There are other teams in Alphabet doing productionization stuff.
    • niceice 7 hours ago |
      Any estimates of how much one of these costs to run while generating and keeping a minute of context?

      Secondly, any estimate of how much the price could fall in 5-10 years?

      • wongarsu 6 hours ago |
        Oasis (the Minecraft world model) can serve about 5 players on 8 H100s in real-time at 20fps in 360p. This is a much more capable model with two orders of magnitude more context. They pretty much say it can't be played in real-time, which I read as: they generate less than 15fps@240p on 8 GPUs. Probably why they talk so much about using it for AI training and evaluation rather than human use. There is a distilled version that works in real-time, but they don't show anything from that version (which is a statement in itself).

        For reducing the price, ASICs like Etched's may be the way forward [1]. The models will get bigger for a time, but there may be a lot of room for models that can exploit purpose-built hardware.

        1: https://www.etched.com
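
        The rough arithmetic behind that reading (all numbers are this comment's guesses, and the 640x360 / 426x240 pixel dimensions for "360p"/"240p" are assumed):

```python
# Back-of-envelope pixel-throughput comparison. These are estimates
# from the discussion above, not published figures.

def pixels_per_second(streams, fps, width, height):
    """Total rendered pixels per second across all streams."""
    return streams * fps * width * height

# Oasis: ~5 players at 20fps, 360p, on 8 H100s.
oasis = pixels_per_second(streams=5, fps=20, width=640, height=360)

# Genie 2, read as an upper bound: 1 stream, <15fps, 240p, on 8 GPUs.
genie = pixels_per_second(streams=1, fps=15, width=426, height=240)

print(f"Oasis:   {oasis / 8:,.0f} px/s per GPU")
print(f"Genie 2: {genie / 8:,.0f} px/s per GPU (~{oasis / genie:.0f}x less)")
```

        So under these assumptions Genie 2 is generating on the order of 15x fewer pixels per GPU-second, which is consistent with it being pitched at offline agent training rather than live play.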

        • onlyrealcuzzo 6 hours ago |
          > Probably why they talk so much about using it for AI training and evaluation rather than human use.

          What would they do / how would they use this output to make a better AI?

          • bionhoward 2 hours ago |
            Embodied cognition is a core theory for AGI; this would enable a vast array of bodies, environments, and situations, and that high level of diversity can empower AI adaptability.

            For a straightforward example, this could help Waymo rehearse driving in various cities and weather / traffic settings

        • latchkey 5 hours ago |
          Hey! I'd love to know how this performs on 8xMI300x in comparison. Reach out to me?
      • llm_trw 5 hours ago |
        The price of LLMs has fallen 1,000 times in the last year for the same quality tokens.

        It's not clear if video models will follow the same trajectory.

      • reissbaker an hour ago |
        They don't give much info on parameter count, etc., so it's hard to say concretely: Oasis (AI Minecraft) apparently runs on a single H100 [1], but this is presumably much larger — both due to higher fidelity, and due to the 60s context window instead of the 1s context window for Oasis. But in 5-10 years, regardless of what it takes to run now, the price will drop massively, and my bet is this would be playable in real-time. Context length will be solvable simply by increased VRAM (i.e. an H200 has 141GB per GPU, vs 80GB for an H100). Although Google is probably running these on TPUs, TPUs should follow a similar trajectory.

        In the intermediate term my guess is that this kind of world model will be useful for training 3D model generators, so that you can go from sketch -> running in-engine extremely quickly.

        1: https://www.tweaktown.com/news/101466/oasis-ai-and-single-nv...

    • lovich 6 hours ago |
      I asked this in a similar thread the other day, but what is with this pattern, as exemplified by the quote below?

      > This is.. super impressive. I'd like to know how large this model is. I note that the first thing they have it do is talk to agents who can control the world gen; geez - even robots get to play video games while we work. That said; I cannot find any:

      > architecture explanation
      > code
      > technical details
      > API access information

    • whiplash451 4 hours ago |
      This kind of demo is probably great for hiring top talent: come work here, we have the best models and you'll have your name on the best papers.
  • artninja1988 10 hours ago |
    Looking at the list of authors, is this from their open endedness team? I found their position paper on it super convincing https://arxiv.org/abs/2406.02061
    • warkdarrior 9 hours ago |
      Did you link the wrong Arxiv paper? https://arxiv.org/abs/2406.02061 does not look like a position paper nor does it share any authors with this Genie 2 work.
  • mdrzn 10 hours ago |
    Wow.. I can't even imagine where we'll be in 5 or 10 years from now.

    Seems that it's only "consistent" for up to a minute, but if progress keeps up at this rate.. just wow.

    • netdevphoenix 9 hours ago |
      Progress is not linear. For all we know, in 2027 things will slow down to a virtual halt for the next 30 years. Look at how much big science progressed in the first 20 years of the 19th century/20th century and look how little it has progressed in the first 20 years of this century. We are on the downlow compared to the last centuries and even if you look at crisp or deep learning, they are not as impactful NOW as let's say the germ theory of disease, evolution, the discovery of the double helix structure or general relativity was. Almost a quarter of a century gone and we don't have much to show for it.

      For reference:

      19th century

      evolution by natural selection as science

      electromagnetism

      germ theory of disease

      first law of thermodynamics

      --------------------------------------------

      20th century

      general relativity

      quantum mechanics

      dna structure

      penicillin

      big bang theory

      --------------------------------------------

      21st century

      crisp

      deep learning

      • dooglius 9 hours ago |
        The things you list for previous centuries aren't limited to the first 20 years
        • netdevphoenix 9 hours ago |
          19th century: electromagnetism, the voltaic pile, the double slit experiment for the light wave theory

          20th century: general/special relativity, radioactive decay, discovery of the electron

          21st century: crisp and deep learning

          Hard to argue that the big science of the first 20 years of the previous centuries doesn't look way more impactful than crisp and deep learning put TOGETHER.

          • dekhn 9 hours ago |
            It's called CRISPR, not crisp.
          • samvher 9 hours ago |
            100 years later, sure. What about in December 1924?
      • Workaccount2 9 hours ago |
        >Look how little it has progressed in the first 20 years of this century

        This is naivete on the scale of "Cars were much safer 70 years ago".

      • w10-1 7 hours ago |
        CRISPR variants have not particularly improved treatments.

        But DNA sequencing and biologics have revolutionized medicine and changed lives.

        Also, the computer-as-phone took computing from hundreds of millions of mostly business users buying optical discs to 3B+ everyday people getting regular system updates and on-demand apps with access to real-time information. That change alone far outweighs the impact of anything produced by advanced physics.

        As a result we, as developers, now have the power to deliver both messages and experiences to the entire world.

        Ideas are cheap, and progress is virtually guaranteed in intellectual history. But execution is exquisitely easy to get wrong. Genie 2 is just Google's first bite at this apple, and milestones and feedback are key to getting something as general as AI right. Fingers crossed!

  • lionkor 10 hours ago |
    > deepmind.google uses cookies from Google to deliver and enhance the quality of its services and to analyze traffic. Learn more.

    Yippee, finally Google posts a non-conforming cookie popup with no way to reject the ad cookies!

  • wildermuthn 10 hours ago |
    The technology is incredible, but the path to AGI isn't single-player. Qualia is the missing dataset required for AGI. See attention-schema theory for how social pressures lead to qualia-driven minds capable of true intelligence.
  • simonw 10 hours ago |
    Related recent project you can try out yourself (Chrome only) which hallucinates new frames of a Minecraft style game: https://oasis.decart.ai/

    That one would reimagine the world any time you look at the sky or ground. Sounds like Genie2 solves that: "Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again."

    • echelon 10 hours ago |
      This blows Decart's Oasis (which raised $25 million at $500 million valuation) and World Labs (which raised $230 million in complete stealth) out of the water.

      Google is firing warning shots to kill off interest in funding competing startups in this space.

      I suspect that in 6 months it won't matter as we'll have completely open source Chinese world models. They're already starting to kill video foundation model companies' entire value prop by releasing open models and weights. Hunyuan blows Runway and OpenAI's Sora completely out of the water, and it's 100% open source. How do companies like Pika compete with free?

      Meta and Chinese companies are not the leaders in the space, so they're salting the earth with insanely powerful SOTA open models to prevent anyone from becoming a runaway success. Meta is still playing its cards close to its chest so they can keep the best pieces private, but these Chinese companies are dropping innovation left and right like there's no tomorrow.

      The game theory here is that if you're a foundation model "company", you're dead - big tech will kill you. You don't have a product, and you're paying a lot to do research that isn't necessarily tied to customer demand. If you're a leading AI research+product company, everyone else will release their code/research to create a thousand competitors to you.

      • Workaccount2 9 hours ago |
        I strongly suspect that like open ai and O1, for profit companies are going to start locking down whatever advances they find.

        There is still an enormous amount of low-hanging fruit that anyone can harvest right now, but eventually big advances are going to require big budgets, and I can only imagine how technically tight-lipped they will be with those.

      • senko 9 hours ago |
        > The game theory here is that if you're a foundation model "company", you're dead - big tech will kill you. You don't have a product, and you're paying a lot to do research that isn't necessarily tied to customer demand.

        Basically, the foundation model companies are outsourced R&D labs for big tech. They can be kept at arm's length (like OpenAI with Microsoft and Anthropic with Amazon) or be bought outright (like Inflection, although that was a weird one).

        Both OpenAI and Anthropic are trying to move away from being pure model companies.

        > If you're a leading AI research+product company, everyone else will release their code/research to create a thousand competitors to you.

        Trillion-dollar question: is there a competitive edge / moat in vertical integration in AI? Apple proved there was in hardware + OS (which were unbundled in Wintel times). For AI, right now, I can't see one, but I'm just a random internet commentator, who knows.

        • refulgentis 9 hours ago |
          I think not, it feels more like a utility to me until someone pulls their API.
      • mrandish 7 hours ago |
        > Chinese companies are not the leaders in the space, so they're salting the earth with insanely powerful SOTA open models to prevent anyone from becoming a runaway success.

        While it would be interesting if Chinese companies were releasing their best full models as an intentional strategy to reduce VC funding availability for western AI startups, it would be downright fascinating if the Chinese government was supporting this as a broader geopolitical strategy to slow down the West.

        It does make sense but would require a remarkable level of insight, coordination and commitment to a costly yet uncertain strategy.

        • whiplash451 4 hours ago |
          I don't think it requires a remarkable level of insight.

          The overall cost for the Chinese government is probably very small in the grand scheme of things. And it makes a lot of sense from a geopolitical strategy.

      • whiplash451 4 hours ago |
        The game has indeed become brutal for foundational model companies.

        I am less worried for AI research+product companies: they have likely secured revenue streams with real customers and built domain knowledge in the meantime.

    • ilaksh 9 hours ago |
      There is another recent project doing more general game generation, very similar to Genie 2. I can't remember the name.

      GameGen-X came out last month. https://arxiv.org/html/2411.00769v1

    • psb217 9 hours ago |
      RE: "Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again." -- This claim is almost certainly wildly misleading. This claim is technically true if there's any scenario where their agent, eg, briefly looked down at the ground and then back up at the sky and at least one of the clouds in the sky was the same as before looking down. However, I expect most people will interpret the claim far more broadly than the model can support. It's classic weasel wording.
      • pfortuny 8 hours ago |
        "remember parts of the world..." not even "some"... That is a tell-tale.
      • isotypic 7 hours ago |
        Given that no samples other than the 3 in the "Long horizon memory" section have any camera movement that puts something offscreen and then back onscreen, it certainly seems that they are stretching the capabilities as far as they can in the writing.
      • drusepth 4 hours ago |
        Yeah, my best guess is they're probably including the previous N frames as context when generating the next frame. This preserves continuity over a short window (as you say, briefly looking at the ground and then back up), but only over that window.

        For these kinds of models to be "playable" by humans (and, I'd argue, most fledgling AI agents), the world state needs to be encoded in the context, not just a visual representation of what the player most recently saw.
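
        A toy sketch of that guess, where `predict_next_frame` is a stand-in for the real model and the window size is illustrative; the point is that anything that fell out of the window N frames ago simply cannot influence generation:

```python
from collections import deque

N_CONTEXT = 4  # illustrative; Oasis reportedly ~1s of frames, Genie 2 far more

def predict_next_frame(context, action):
    # Stand-in for the actual diffusion/transformer step. Here we just
    # record how much context the frame was conditioned on.
    return f"frame(action={action}, ctx={len(context)})"

def rollout(actions, n_context=N_CONTEXT):
    """Autoregressive rollout with a sliding frame-context window."""
    context = deque(maxlen=n_context)  # older frames silently fall out
    frames = []
    for action in actions:
        frame = predict_next_frame(list(context), action)
        frames.append(frame)
        context.append(frame)
    return frames

for f in rollout(["W", "W", "A", "S", "D", "W"]):
    print(f)
```

        Once the window fills, `ctx` saturates at `n_context`: the model is forever conditioning on only the most recent frames, which is why encoding world state somewhere outside the visual context matters for playability.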

    • wongarsu 8 hours ago |
      However, the architecture they describe really sounds like it should still have that issue. I doubt they really solved it.

      Which is a big problem for the agent-training use case they keep reiterating on the website. Agents are like speedrunners: if there is a stupid exploit, the agent will probably find and use it. And for Oasis, the speedrunning meta for getting to the nether is to find anything red, make it fill the screen, and repeat until the world-generating AI thinks you're looking at lava and must be in the nether.

  • bix6 10 hours ago |
    Genuine question: What is the point of telling us about this if we can’t use it? Is it just to flex on everyone?
    • mhld 10 hours ago |
      Some kind of marketing strategy that nobody actually understands
      • jazzyjackson 9 hours ago |
        It's not that opaque, it's recruitment. Basically the same marketing as a university: "We do state-of-the-art research here. If you are a talented researcher who wants to advance the field, you'll want to work here."

        Now, how Google plans to make money with all this bleeding edge research, that's the mystery.

    • tootie 10 hours ago |
      It's PR but it's also meant to entice. Let the world know Google is #1 for Gen AI, convince researchers to join Google, convince investors to boost the stock price, make Elon Musk grit his teeth. That kind of thing. In the short term, it may provide a bump in interest for existing AI products from Google.
    • ChrisArchitect 10 hours ago |
      The best minds of a generation went from thinking about how to make people click ads to how to generate 3d video game worlds.
      • adventured 9 hours ago |
        The best minds were never working on getting people to click on ads. That was an internal industry narrative so people could feel better about themselves.
        • fragmede 9 hours ago |
          seems more like an external narrative so people can feel worse about the world
      • Workaccount2 9 hours ago |
        The best minds of the generation are on wall street trying to figure out how to quickly spot inefficiently priced options 1% more often.

        Seriously, I wish more than anything I was kidding.

    • mupuff1234 10 hours ago |
      An artifact for their promotion packet.
    • echelon 10 hours ago |
      To stop competing startups from getting funding.

      Decart (Oasis) raised $25 million at $500 million valuation.

      World Labs raised $230 million.

      • UncleOxidant 3 hours ago |
        Not sure about that. Sometimes Google legitimizes a field. I was in a kite power startup back in 2019. Before Google canceled its Makani kite power project, VCs and angels would at least talk to us - it gave them some frame of reference: "Oh, this is like the kite power thing Google is doing?" "Right, but on a much smaller scale." After they canceled Makani in the summer of 2019 it was crickets. We folded by the end of 2019. They figured if Google couldn't make it work, then it probably wasn't something to invest in.
    • justlikereddit 9 hours ago |
      [flagged]
    • xnx 9 hours ago |
      Often to establish that the authors were first in the space for when competitors announce their tech.
    • spencerchubb 6 hours ago |
      Researchers want to publish

      Recruiting

  • rvz 10 hours ago |
    Hmmm... But we were told on HN that "Google is dying", remember? In reality, it isn't.

    We'll see which so-called AI-companies are really "dying" when either a correction, market crash or a new AI winter arrives.

  • bearjaws 10 hours ago |
    > Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again.

    This is huge; the Minecraft demos we saw recently were just toys because you couldn't actually do anything in them.

    • psb217 8 hours ago |
      It's worth keeping in mind that "there exists X such that Y is true" is not the same as "Y is true for all X". People love using these sorts of statements since they're technically true as written, but most people will read them in a way that's false. Eg, the statement is true for the Minecraft demos, and for any model which doesn't exhibit literally zero persistence for (temporarily) non-visible state.
  • stoicjumbotron 10 hours ago |
    Do people within Google get to try it? If yes, how long is the approval process?
  • xcodevn 10 hours ago |
    On a very similar theme, here is the work from World Labs (founded by Fei-Fei Li of ImageNet fame, et al.) about creating 3D worlds:

    https://www.worldlabs.ai/blog

    • momojo 3 hours ago |
      I find this work much more exciting. They're not just teaching a model to hallucinate given WASD input. They're generating durable, persistent point clouds. It looks so similar to Genie2 yet they're worlds apart.
  • moralestapia 10 hours ago |
    Not even a month ago HN was discussing Ben Affleck's take on actors and AI, somehow siding with him and arguing that the tech "just isn't there", etc.

    I'll keep my stance, give it two years and very realistic movies, with plot and everything, will be generated on demand.

    • tartoran 9 hours ago |
      AI can't generate images without awkward hallucinations yet. From that, to movies that make sense, to movies that people would actually want to watch (comparable to feature films) beyond the initial curiosity factor, is a long way, if there is one.
      • moralestapia 8 hours ago |
        ChatGPT (no Sora, no world generation, etc.) was released two years ago almost to the day.

        What you're talking about is a minor jump from the SOTA, much smaller than what we've already seen in these two years.

    • Sateeshm 4 hours ago |
      I'll take that bet
      • moralestapia 2 hours ago |
        Email on profile!

        I'll match any 5-figure amount you propose. I also know an escrow service we can trust.

  • binalpatel 10 hours ago |
    This is super impressive.

    Interesting they're framing this more from the world model/agent environment angle, when this seems like the best example so far of generative games.

    720p, real-time, mostly consistent games for a minute is amazing, considering Stable Diffusion was originally released only two-ish years ago.

    • uoaei 10 hours ago |
      Pixelspace is an awful place to be generating 3D assets and maintaining physical self-consistency.
      • jeroenvlek 9 hours ago |
        Ultimately even conventional 3d assets are rendered into pixelspace. It all comes down to the constraints in the model itself.
        • psb217 8 hours ago |
          A key strength of conventional 3d assets is that their form is independent of the scenes in which they will be rendered. Models that work purely in pixel space avoid the constraints imposed by representing assets in a fixed format, but they have to do substantial extra work to even approximate the consistency and recomposability of conventional 3d assets. It's unclear whether current approaches to building and training purely pixel-based models will be able to achieve a practically useful balance between their greater flexibility and higher costs. World Labs, for example, seems to be betting that an intermediate point of generating worlds in a flexible but structured format (NERFs, gauss splats, etc) may produce practical value more quickly than going straight for full freedom and working in pixel space.
  • 42lux 10 hours ago |
    I don't know. I get the excitement, but as soon as you turn around and there is something completely different behind you, it breaks the immersion.
  • jdlyga 9 hours ago |
    It's very cool, but we've gotten too many of these big bold announcements with no payoff. All it takes is a very limited demo and we'd be much happier.
    • rishabhparikh 9 hours ago |
      I'm guessing it would be far too expensive to make a free demo
  • sergiotapia 9 hours ago |
    Will the GPU go the way of the soundcard, and we will all purchase an "LPU"? Language Processing Unit for AIs to run fast?

    I remember there was a brief window where some gamers bought a PhysX card for high-fidelity physics in games. Ultimately they rolled that tech into the GPUs themselves, right?

    • 0x1ceb00da 8 hours ago |
      The graphics stuff in modern GPUs is just a software layer on top of a generic processing unit. The name is a misnomer.
      • jsheard 7 hours ago |
        Partially true, a significant chunk of modern GPUs are really just very wide general purpose processors, but they do still have fixed-function silicon specifically for graphics and probably will for the foreseeable future. Intel tried to lean into doing as much as possible in general purpose compute with their Larrabee GPU project but even that still had fixed-function texture units... and the concept was ultimately a failure which hasn't been revisited.
  • k2xl 9 hours ago |
    This is impressive, but why do they all still look like a video game? Could they render movie scenes with realistic-looking humans? I wonder if it's due to the training set they use being mostly video games.
    • xnx 9 hours ago |
      > This is impressive, but why do they all still look like a video game?

      Many of the current AI models have their roots in games: Chess, Go, etc.

    • nonameiguess 9 hours ago |
      I highly doubt it. While there is no ceiling in principle on how good rendering can get, even with perfect knowledge of the physics of optics, the cost to compute that physics is too high not to cut some corners. Nature gives you this for free. Every photon is deflected at exactly the right angle and frequency without anything needing to be computed. All you need is a camera to record it. At least for now, this is why every deep fake, digital de-aging, AI upscaling, grafting Carrie Fisher's face onto a different actor, and CGI in general inevitably occupies the uncanny valley.
  • corysama 9 hours ago |
    For quite a while now David Holz of Midjourney has mused that videogames will be AI generated. Like a theoretical PlayStation 7 with an AI processor replacing the GPU.

    But, I didn’t expect this much progress towards that quite this fast…

    • kypro 9 hours ago |
      Agreed. All I'd say is that these demos look quite limited in their creativity and depth. Good video games are far more than some graphics with a movable character and action states.

      A good video game is far more about the world building, the story, the creativity or "uniqueness" of the experience, etc.

      Currently this seems to generate fairly generic looking and shallow experiences. Not hating though. It's early days obviously.

    • doctorpangloss 9 hours ago |
      If only it were that simple. Google spent $10b developing Stadia; where was the big hit game from that?

      These DeepMind guys play Factorio, they don't play Atari games or shooters, so why aren't they thinking about that? Or maybe they are, and because they know a lot about Factorio, they see how hard it is to make?

      There's a lot of "musing" as you say.

    • gcr 9 hours ago |
      I've had the idea for a Backrooms-style hallucinatory generative videogame for a while. Imagine being able to wander through infinitely generated surreal indoor buildingscapes that were rendered in close-to-realtime.

      It would play to the medium's strengths -- any "glitches" the player experiences could be seen as diegetic corruptions of reality.

      The moment we get parameterized NeRF models running in close-to-realtime, I want to go for it.

  • devonsolomon 9 hours ago |
    Yesterday I laughed with my brother about how harsh people on the internet were about World Labs launch (“you can only walk three steps, this demo sucks!”). I was thinking, “this was unthinkable a few years ago, this is incredible”.

    People of the internet, you were right. Now, this is incredible.

    • bilbo0s 6 hours ago |
      World Labs was kind of laughable. But at least you laughed.

      Now?

      I mean, I don't know man?

      With this Genie 2 sneak peek, it all just makes World Labs' efforts look sad. Did they really think better-funded independents and majors would all not be interested in generating 3D worlds?

      This is a GUBA moment. If you're old enough to know, then you know.

  • maxglute 9 hours ago |
    2000s graphics vibes.
  • YeGoblynQueenne 9 hours ago |
    Hey, DeepMind folks, are you listening? Listen. We believe you: you can conquer any virtual world you put your mind to. Minecraft, Starcraft, Warcraft (?), Atari, anything. You can do it! With the power of RL and Neural Nets. Well done.

    What you haven't been able to do so far, after many years of trying, is to go from the virtual to the real. Go from Arkanoid to a robot that can play, I dunno, squash, without dying. A robot that can navigate an arbitrary physical location without drowning, or falling off a cliff, or getting run over by a bus. Or build any Lego kit from instructions. Where's all that?

    You've conquered games. Bravo! Now where's the real world autonomy?

  • aithrowawaycomm 9 hours ago |
    It is jaw-dropping and dismaying how for-profit AI companies use long-standing terms like "world model" and "physics" when they mean "video game model" and "video game physics." Or, as you can plainly see, "models gravity" when they mean "models Red Dead Redemption 2's gravity function, along with its cinematic lighting effects and Rockstar's distinctively weighty animations." Which is to say Google is not modeling gravity at all.

    I will add the totally inconsistent backgrounds in the "prototyping" example suggests the AI is simply cribbing from four different games with a flying avatar, which makes it kind of useless unless you're prototyping cynical AI slop. And what are we even doing here by calling this a "world model" if the details of the world can change on a whim? In my world model I can imagine a small dragon flying through my friend's living room without needing to turn her electric lights into sconces and fireplaces.

    To state the obvious: if you train your model on thousands of hours of video games, you're also gonna get a bunch of stuff like "leaves are flat and don't bend" or "sometimes humans look like plastic" or "sometimes dragons clip through the scenery," which wouldn't fly in an actual world model. Just call it "video game world model!" Google is intentionally misusing a term which (although mysterious) has real meaning in cognitive science.

    I am sure Genie 2 took an awful lot of work and technical expertise. But this advertisement isn't just unscientific, it's an assault on language itself.

    • empath75 7 hours ago |
      > It is jaw-dropping and dismaying how for-profit AI companies use long-standing terms like "world model" and "physics" when they mean "video game model" and "video game physics." Or, as you can plainly see, "models gravity" when they mean "models Red Dead Redemption 2's gravity function, along with its cinematic lighting effects and Rockstar's distinctively weighty animations." Which is to say Google is not modeling gravity at all.

      That's because it's using video game data for training footage, because it's cheap and easy to generate. It would not simulate video game gravity if it were trained on real-world video inputs.

    • ricardobeat 6 hours ago |
      Remembering off-screen objects, generating spatially consistent features, modeling physical interactions and lights, understanding what "up the stairs" means, all seem to warrant talking about a world model, because that's exactly what's required to do these things compared to simply hallucinating video sequences.
    • brap 4 hours ago |
      I agree, but

      >if you train your model on thousands of hours of video games

      What if you train the same model on thousands of hours of sensor data from real, physical robots?

  • brink 9 hours ago |
    What is actually of value here? There's no actual game, it's incredibly expensive to compute, the behavior is erratic... It's cool because it's new - but that will quickly wear off, and once that's gone, what's left? There are insane amounts of money being spent on this, and for what?
    • adverbly 9 hours ago |
      > What is actually of value here?

      No one knows yet. AI technology like this is closer to scientific research than it is to product development. AI is basically new magic, and people are in a "discovery" phase where we are still trying to figure out what is possible. Nothing of value was immediately created when they discovered DNA. Productization came much later when it was combined with other technologies to fit a particular use case.

    • Menu_Overview 9 hours ago |
      Well, what's next? Beyond prototyping, I imagine this is an early step towards more practical agents building their own world model. Better problem solving.

      Prompt: Here's a blueprint of my new house and a photo of my existing furniture. Show me some interior design options.

    • ilaksh 9 hours ago |
      It's an obviously amazing research development.

      You just don't like AI.

      It can be used for training agents, prototyping, video generation, and is quite possibly a glimpse of a whole new type of entertainment or a new way to create video games.

      What's the point of the massive amount of money spent on video games in general? Or all of the energy spent moving people back and forth to an office? Or expensive meals at restaurants? Or trillions in weaponry? Or television shows or movies?

      • nightski 8 hours ago |
        Video games bring billions of real people joy. This is sitting in some lab at Google inaccessible to anyone.
        • lassenordahl 8 hours ago |
          Is your argument that them sharing research progress and demos doesn't benefit anybody purely because we can't immediately play around with them?

          I feel like sharing early closed-source blog-posts is part of the research process. I'm sure someone in this thread has thought of a use case that the Google team missed. Open/closed source arguments here feel premature IMO.

          • nightski 7 hours ago |
            It's not part of the research process. Being part of the research process would involve a publication and sharing code/data/results/methods. It's not research unless it can be verified by peers.

            This is just a marketing fluff piece that does not benefit anyone and is ego stroking at best.

            • lassenordahl 6 hours ago |
              Hm yeah - I think you and I just have differing opinions on the research process. I'd be a bit more vague, and define the publication process as something similar to you.

              I still think things like this are important, and at least give folks a bit of time to ideate on what will be possible in a few years. Of course having the model or architecture on hand would be nice, but I'm not holding that against Google here.

    • ThouYS 9 hours ago |
      Same question here: what can I do with this "world model" that I can't do with a game like Minecraft or Counter-Strike?

      I asked the same thing a while back, and the answers boiled down to "somehow helps RL agents train". But how exactly? No clue.

      • ogogmad 8 hours ago |
        Making a computer game is very expensive and time-consuming. This technology might allow a 12-year-old to produce a fully working AAA-quality game on their own for almost nothing. But, sigh, it's an early demo that needs some improving.

        [edited out some barbs I wrote because I find some comments on this website REALLY annoying]

        • ThouYS 8 hours ago |
          lol
    • awfulneutral 9 hours ago |
      Well, in the future you could imagine that instead of programming a game, you can just generate each individual frame on the fly at 60fps. You could be playing 2D Mario and then the game could have him morph into 3D and take off into space or something. You could also generate any software or OS frontend on the fly really, if you can make it so the AI can keep track of your data and make it consistent enough to be usable. Does this have positive or negative value? I don't know.
    • golol 8 hours ago |
      Do you want household androids? Because this kind of research is a very large step toward that. Think of it as an example where we can make a model understand a lot of physical common-sense stuff, which is the goal for robotics right now.
      • suddenlybananas 8 hours ago |
        This is really not the avenue for house-hold robots. Interacting with the actual physical world is very different from creating a video game.
        • sangnoir 8 hours ago |
          > Interacting with the actual physical world is very different from creating a video game

          The major difference being the former scales very poorly for generating training data compared to the latter. Genie 2 is not even a video game and has worse fidelity than video games; the upside is it probably scales even better than video games for generating training scenarios. If you want androids in real life, Genie 2 (or similar systems) is how you bootstrap the agent AI. The training pipeline will be: raw video -> Genie 2 -> game engine with rules -> physical robot

          • youoy 7 hours ago |
            > The training pipeline will be: raw video -> Genie 2 -> game engine with rules -> physical robot

            One of those arrows is not like the others

            • sangnoir 6 hours ago |
              The final step is an oversimplification: purpose-built simulator -> deconstructed robot on a lab workbench -> controlled space -> "real world" with constraints -> real world

              Any model would have to succeed in one stage before it can proceed to the next one.

              • adverbly 5 hours ago |
                At the risk of sounding repetitive, one of those arrows is not like the others.
                • sangnoir 5 hours ago |
                  ...and?
          • mosdl 4 hours ago |
            How does turning an image into a game help with robots? Robots don't need to guess what they can't see, they would have sensors to tell them exactly what is there (like a self driving car).
            • Chilko 4 hours ago |
              I have no expertise in this area, but my assumption is that this could help with a broader sort of object/world permanence for robots - e.g. if something is no longer visible to the robot's sensors (behind an obstacle, smoke, etc.), it could use a model based on this type of tech to maintain a short-term estimate of its surroundings even when operating blind.
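              A toy sketch of what that fallback could look like (everything here is hypothetical; a constant-velocity rollout stands in for a learned world model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Belief:
    pos: float  # 1-D position estimate of the tracked object
    vel: float  # estimated velocity

def predict(b: Belief, dt: float) -> Belief:
    # Stand-in for a learned world model: constant-velocity rollout.
    return Belief(pos=b.pos + b.vel * dt, vel=b.vel)

def update(b: Belief, measurement: Optional[float], dt: float) -> Belief:
    b = predict(b, dt)
    if measurement is None:      # occluded: keep trusting the rollout
        return b
    # visible again: blend the prediction with the sensor reading
    alpha = 0.7
    new_pos = alpha * measurement + (1 - alpha) * b.pos
    prev_pos = b.pos - b.vel * dt
    return Belief(pos=new_pos, vel=(new_pos - prev_pos) / dt)

belief = Belief(pos=0.0, vel=1.0)
for z in [1.0, 2.0, None, None, 5.1]:  # None = target behind an obstacle
    belief = update(belief, z, dt=1.0)
```

              During the two `None` steps the estimate keeps moving at the last known velocity, which is the "short-term estimate while operating blind" idea; a generative world model would just be a far richer version of `predict`.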
            • sangnoir 3 hours ago |
              > Robots don't need to guess what they can't see, they would have sensors to tell them exactly what is there (like a self driving car).

              Self-driving cars have cameras as part of their sensor suite, and have models to make sense of sensor data. Video will help with perception and classification (understanding the world), with no agency needed. Game-playing will help with planning, execution, and evaluation. Both functions are necessary, and those that come after rely on earlier capabilities.

      • JTyQZSnP3cQGa8B 7 hours ago |
        I don't understand how that is relevant. I certainly would not want household androids unless I'm completely disabled.
        • theshackleford 4 hours ago |
          > I certainly would not want household androids unless I'm completely disabled.

          That's nice. I'm not completely disabled, but I am disabled, and I very much would appreciate them, as my capability to do things over the longer term is very much not going to go in the direction of improving. As it is, there are a lot of things I now rely on people for, that at one time, I did not.

          Whilst I recognise it's probably not going to happen in a time span that is useful to me, I do wish it could, so that I could be less of a burden on those around me and maintain a relative level of independence.

    • mitthrowaway2 8 hours ago |
      I'm not an expert in this space but I can see the value. It allows an endless loop of generating novel scenarios and evaluating an AI agent's performance within that scenario (for example, "go up the stairs"). A world with one minute of coherence is about enough to evaluate whether the AI's actions were in the right direction or not. When you then want to run an agent on a real task in the real world, with video-input data, you can run the same policy that it learned in dream-world simulation. The real world has coherence, so the AI agent's actions just need to string together well enough minute-by-minute to work toward achieving a goal.

      You could use real video games to do this but I guess there'd be a risk of over-fitting; maybe it would learn too precisely what a staircase looks like in Minecraft, but fail to generalize that to the staircase in your home. If they can simulate dream worlds (as well as, presumably, worlds from real photos), then they can train their agents this way.

      This would only be training high-level decision policies (ie, WASD inputs). For something like a robot, lower level motor control loops would still be needed to execute those commands.

      Of course you could just do your training in the real world directly, because it already has coherence and plenty of environmental variety. But the learning process involves lots of learning from failure, and that would probably be even more expensive than this expensive simulator.

      Despite the claims I don't think it does much to help with AI safety. It can help avoid hilarious disasters of an AI-in-training crashing a speedboat onto the riverbank, but I don't think there's much here that helps with the deeper problems of value-alignment. This also seems like an effective way to train robo-killbots who perceive the world as a dreamlike first-person shooter.
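      To make the loop above concrete, here is a hedged toy sketch: the "dream world" is collapsed to a 1-D corridor where the goal ("top of the stairs") is at position +5, and a tabular Q-learner plays the role of the agent. Every name here is made up; a real setup would replace `WorldModel` with a generative model conditioned on frames and controller-style inputs.

```python
import random

class WorldModel:
    """Toy stand-in for a generative environment: state is a 1-D position."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):          # action in {-1, +1}, like A/D keys
        self.pos += action
        done = self.pos == 5         # goal: top of the stairs
        return self.pos, (1.0 if done else 0.0), done

def greedy(q, s):
    """Pick the best-known action, breaking ties randomly."""
    vals = {a: q.get((s, a), 0.0) for a in (-1, 1)}
    if vals[-1] == vals[1]:
        return random.choice((-1, 1))
    return max(vals, key=vals.get)

def train(episodes=500, eps=0.2, lr=0.5, gamma=0.9, horizon=60):
    """Tabular Q-learning over dreamed rollouts of ~one 'minute' each."""
    random.seed(0)
    q, env = {}, WorldModel()
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            a = random.choice((-1, 1)) if random.random() < eps else greedy(q, s)
            s2, r, done = env.step(a)
            best_next = max(q.get((s2, -1), 0.0), q.get((s2, 1), 0.0))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + lr * (r + gamma * best_next - old)
            if done:
                break
            s = s2
    return q

q = train()
# The learned policy should now walk straight "up the stairs".
env = WorldModel()
s, steps, done = env.reset(), 0, False
for _ in range(20):
    s, r, done = env.step(greedy(q, s))
    steps += 1
    if done:
        break
```

      The point is the shape of the loop, not the toy environment: bounded rollouts, a scalar signal of success, and a policy that transfers to anything exposing the same observation/action interface.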

    • modeless 5 hours ago |
      > It's cool because it's new - but that will quickly wear off, and once that's gone, what's left?

      To have this perspective you must believe that this will never get better than it currently is, its limitations will never be fixed, and it will never lead to any other applications. I don't know how people can continue to look at these things with such a lack of imagination given the pace of progress in the field.

      • zamadatix 4 hours ago |
        I think the problem is less to do with imagination and more to do with being willing to fail a metric shit ton to find out how, every once in a while, you didn't fail due to some really important and surprising reason you wouldn't have found nearly as quickly only ever going after what you were already certain of.
    • xandrius 5 hours ago |
      Nothing is of value until it is.
    • 3abiton 4 hours ago |
      This is an incredible start. The potential is immense; yes, there are kinks, but in 10 years?
  • KaoruAoiShiho 9 hours ago |
    This is where the GPU limits on China really hurt. Chinese companies have been dropping great proofs of concept, but because they are so compute-bottlenecked they can't ever really make something actually competitive or transformative.
  • tigerlily 9 hours ago |
    I can.. see this being used to solve crime, even solving unsolved mysteries and cold cases, among other alternative applications.
    • phtrivier 9 hours ago |
      I don't understand your line of reasoning here. Are you picturing a situation where you would take a photo of a crime scene, and "jump" into a virtual model created from the photo, to help generate intuitions about where to go look for clues ? Kinda like the CSI "enhance quality" meme, but on steroids ?

      That would be fun to use, but ultimately pointless. An AI model will generate things that are _statistically plausible_ ; solving crimes usually requires finding out the _truth_.

      • tigerlily 9 hours ago |
        You nailed it, and yes I was being lamely ironic. I am however terrified of a future where this type of thing happens, and people just go along with it instead of stating the obvious facts the way you just did.
        • mosdl 8 hours ago |
          Remake Blade Runner but with the twist that the snake scale was never actually there.
  • rndmize 9 hours ago |
    These clips feel like watching someone dream in real time. Particularly the door ones, where the environment changes in wild fashion, or the middle NPC one, where you see a character walk into shadow and mostly disappear and a different character walks out.
  • m3kw9 9 hours ago |
    “Generating unlimited diverse training environments for future general agents”: it may seem unlimited, but at some point patterns will emerge. I don’t buy that an AI can use a static model and train itself with data generated from it.
  • diimdeep 9 hours ago |
    What will be, for world models, the equivalent of what ChatGPT was for LLMs, to really blow up in utility?
    • singularity2001 4 hours ago |
      text to roblox maybe?
  • rationalfaith 9 hours ago |
    As impressive as this might seem let's think about fundamentals.

    Statistical models will output a compressed mishmash of what they were trained on.

    No matter how hard they try to cover that inherent basic reality, it is still there.

    Not to mention the upkeep of training on new "creative" material on a regular basis and the never ending bugs due to non-determinism. Aside from contrived cases for looking up and synthesizing information (Search Engine 2.0).

    The Tech Industry is over investing in this area exposing an inherent bias towards output rather than solving actual problems for humanity.

  • notsylver 9 hours ago |
    I doubt it, but it would be interesting if they recorded Stadia sessions and trained on that data (... somehow removing the HUD?); it seems like that would be the easiest way for them to get the data for this.
    • blixt 6 hours ago |
      Seems somewhat likely to me. They probably even trained a model to do both frame generation and upscaling to allow the hardware to work more efficiently while being able to predict the future based on user input (to reduce perceived latency). Seems like Genie is just that but extrapolated much further.
  • worldmerge 9 hours ago |
    This looks really cool. How can I use it? Like can I mix it with Unity/Unreal?
  • cptroot 8 hours ago |
    For all that this is lauded as a "prototyping tool", it's frustrating to see Genie2 discarding entire portions of the concept art demo. The original images drawn by Max Cant have these beautiful alien creatures. Large ones floating, and small ones being herded(?). Genie2 just ignores these beautiful details entirely:

    > That large alien? That's a tree.
    > That other large alien? It's a bush.
    > That herd of small creatures? Fugghedaboutit.
    > The lightning storm? I can do one lightning pole.
    > Those towering baobab/acacia hybrids? Actually only two stories tall.

    It feels so insulting to the concept artist to show those two videos off.

    • Kiro 8 hours ago |
      That's an odd thing to complain about. Focusing on such a minor issue feels overly critical at this stage, like anything less than a pixel perfect 3D world representation of the source image is unacceptable. Insulting? Come on... Max Cant works at DeepMind so I'm sure he's fine.
    • wongarsu 6 hours ago |
      Yeah, those two demos fell flat for me. The model performing badly on inputs far outside the training data is fine, but those two videos belong in the outtakes section or maybe a limitations section, not next to text lauding the "out-of-distribution generalization capabilities". The videos show the opposite of what's claimed.
  • zja 8 hours ago |
    I love the outtakes section in the bottom. It made me laugh but it also feels more transparent than a lot of GenAI stuff that’s being announced.
  • tsunamifury 8 hours ago |
    I'm guessing from the demo sophisticated indoor architectures do not work yet.
  • CaptainFever 8 hours ago |
    As a game developer, I'm impressed and thinking of ideas of what to do with this kind of tech. The sailboat example was my favourite.

    Depending on how controllable the tech ends up being, I suppose. Could be anywhere from a gimmick (which is still nice) to a game engine replacement.

    • echelon 8 hours ago |
      You could compress down a game to run on cheap hardware acceleration. No more Unreal Engine with crazy requirements. Once the hallucinations are fixed, you even get better lighting.

      This is the Unreal Engine killer. Give it five years.

      • noch 8 hours ago |
        > This is the Unreal Engine killer. Give it five years.

        We need to calm down with the clickbait-addled thinking that "this new thing kills this established powerful tested useful thing." :-)

        Game developers have been discussing these tools at length, after all, they are the group of software developers who are most motivated to improve their workflow. No other group of software developers comes close to gamedevs' efficiency requirements.

        The 1 thing required for serious developers is control. As such, game engines like Unreal and in-house engines won't die.

        Generative tools will instead open up a whole new, but quite different, way of creating interactive media and games. Those who need maximum control over every frame and every millisecond and CPU cycle will still use engines. The rest who don't will be productive with generative tools.

        • echelon 8 hours ago |
          > gamedevs' efficiency requirements

          These models won't need you to retopo meshes, write custom shaders, or optimize Nanite or Lumen gameplay. They'll generate the final frames, sans traditional graphics processing pipeline.

          > The 1 thing required for serious developers is control

          Same with video and image models, and there's tremendous work being done there as we speak.

          These models will eventually be trained to learn all of human posture and animation. And all other kinds of physics as well. Just give it time.

          > Those who need maximum control over every frame and every millisecond and CPU cycle will still use engines.

          Why do you think that's true? These techniques can already mimic the physics of optics better than 80 years of doing it with math. And they're doing anatomy, fluid dynamics, and much more. With far better accuracy than game engines.

          These will get faster and they will get controllable.

          • noch 7 hours ago |
            > Why do you think that's true?

            > These will get faster and they will get controllable.

            Brother, you're preaching to the choir. I've been shilling generative tools for gamedev far harder than you are in your reply. :-)

            But I'm just relaying to you what actual gamedevs working and writing code right now need and for the foreseeable future for which projects have been started or planned. As Mike Acton says, "the problem is the problem".

            > These techniques can already mimic the physics of optics better than 80 years of doing it with math.

            I encourage you to talk to actual gamedevs. When designing a game, you aren't trying to mimic physics: you're trying to make a simulation of physics that feels a certain way that you want it to play. This applies to fluid dynamics, lighting/optics, everything.

            For example, if I'm making a sailing simulator, I need to be able to script the water at points where it matters for gameplay and game-feel, not simulate real physics. I'm willing to break the rules of physics so that my water doesn't act or look like real water but feels good to play.

            Movement may be motion captured, but animation is tweaked so that the characters control and play in a way that the game designer feels is correct for his game.

            If you haven't designed a game, I encourage you to try to make a simple space invaders clone over the weekend, then think about the physics in it and try to make it feel good or work in an interesting way. Even in something that rudimentary, you'll notice that your simulation is something you test and tweak until you arrive at parameters that you're happy with but that aren't real physics.

            • echelon 7 minutes ago |
              I've written my own 2D and 3D game engines as well as worked in Unreal. I'm currently working on a controllable diffusion engine using Bevy.

              I strongly disagree that you need to cater to existing workflows. There's so much fertile ground in taking a departure. Just look at what's happening with animation and video. People won't be shooting on Arri Alexas and $300,000 glass for much longer.

  • Const-me 8 hours ago |
    The scrolling doesn’t work in my MS Edge so I opened the page in Firefox. Firefox has “Open Video in New Tab” context menu command. When viewed that way, the videos are not that impressive. Horrible visual quality, Egyptian pyramids of random shapes which cast round shadows, etc.

    I have a feeling many AI researchers are trying to fix things which are not broken.

    Game engines are not broken; no reasonable amount of AI TFLOPS is going to approach a professional with UE5. DAWs are not broken; no reasonable amount of AI TFLOPS is going to approach a professional with Steinberg Cubase or Apple Logic.

    I wonder why so many AI researchers are trying to generate the complete output with their models, as opposed to training models to generate some intermediate representation and/or realtime commands for industry-standard software?
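    For what it's worth, that intermediate-representation route might look something like this sketch (the schema, op names, and hard-coded "model output" are all hypothetical; in practice the command stream would come from a model constrained to a schema like this, and the dict would be a real engine's scene graph):

```python
import json

SCENE = {}   # stand-in for a real engine's scene graph

def apply_command(cmd):
    """Validate and execute one intermediate-representation command."""
    op = cmd.get("op")
    if op == "spawn":
        SCENE[cmd["id"]] = {"mesh": cmd["mesh"], "pos": cmd["pos"]}
    elif op == "move":
        SCENE[cmd["id"]]["pos"] = cmd["pos"]
    elif op == "delete":
        SCENE.pop(cmd["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")

# Pretend this newline-delimited JSON came from a generative model.
model_output = """
{"op": "spawn", "id": "door", "mesh": "door_01", "pos": [0, 0, 0]}
{"op": "spawn", "id": "button", "mesh": "button_01", "pos": [1, 0, 0]}
{"op": "move", "id": "button", "pos": [0.5, 1.2, 0]}
"""

for line in model_output.strip().splitlines():
    apply_command(json.loads(line))
```

    The appeal of this split is that the deterministic engine keeps doing rendering and physics, while the model only has to get the (auditable, diffable) command stream right.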

  • rougka 8 hours ago |
    Waiting for OpenAI to take this concept and make it into a product
  • qwertox 8 hours ago |
    This is... something different. It will be interesting to see how we will integrate our current 3D tooling into that prompt-based world. Sometimes a "place a button next to the the door" isn't the same as selecting a button and then clicking on the place next to the door, as it is today, or to sculpt a terrain with a brush, all heavily 3D oriented operations, involving transformation matrix calculations, while that promt-based world is build through words.

    The current tooling we have is just way too good to discard; think of Maya, Blender and the like. How will these interfaces, with the tools they already provide, enable sculpting these word-based worlds?

    I wonder if some kind of translator will be required, one which precisely instructs "User holds a brush pointing 33° upwards and 56° to the left of the world's x-axis with a brush consisting of ... applied with a strength of ...", or how this will be translated into embeddings or whatever that will be required to communicate with that engine.

    This is probably the most exciting time for the CG industry in decades, and this means a lot, since we've been seeing incredible progress in every area of traditional CG generation. Also a scary time for those who learned the skills and will now occasionally see some random persons doing incredible visuals with zero knowledge of the entire CG pipeline.

  • freedryk 8 hours ago |
    Forget video games. This is a huge step forward for AGI and Robotics. There's a lot of evidence from Neurobiology that we must be running something like this in our brains--things like optical illusions, the editing out of our visual blind spot, the relatively low bandwidth measured in neural signals from our senses to our brain, hallucinations, our ability to visualize 3d shapes, to dream. This is the start of adding all those abilities to our machines. Low bandwidth telepresence rigs. Subatomic VR environments synthesized from particle accelerator data. Glasses that make the world 20% more pleasant to look at. Schizophrenic automobiles. One day a power surge is going to fry your doorbell camera and it'll start tripping balls.
    • pmayrgundter 8 hours ago |
      I can't wait for Schizophrenic automobiles
      • sa-code 8 hours ago |
        There is a fleshed out realisation of this in Cyberpunk 2077. The cab AI is called Delamain

        > Delamain was a non-sentient AI created by the company Alte Weltordnung. His core was purchased by Delamain Corporation of Night City to drive its fleet of taxicabs in response to a dramatic increase in accidents caused by human drivers and the financial losses from the resulting lawsuits. The AI quickly returned Delamain Corp to profitability and assumed other responsibilities, such as replacing the company's human mechanics with automated repair drones and transforming the business into the city's most prestigious and trusted transporting service. However, Delamain Corp executives underestimated their newest employee's potential for growth and independence despite Alte Weltordnung's warnings, and Delamain eventually bought out his owners and began operating all aspects of the company by himself. Although Delamain occupied a legal gray area in Night City due to being an AI, his services were so reliable and sought after that Night City's authorities were willing to turn a blind eye to his status.

        https://cyberpunk.fandom.com/wiki/Delamain_(AI)

        • dekhn 6 hours ago |
          Probably my favorite side quest in the whole game.
    • dheera 8 hours ago |
      > Glasses that make the world 20% more pleasant to look at.

      When AR glasses get good enough to wear all day, I've really been wanting to make a real-life ad blocker.

      • sorokod 7 hours ago |
        hallucinogenics are available right now.
        • hackernewds 44 minutes ago |
          blocks more than ads
    • pelorat 7 hours ago |
      This is akin to navigating a lucid dream, nothing more. Conscious inputs to a visual stream synthesized from long term memory.
      • nomel 4 hours ago |
        > nothing more.

        Consider the use where you seed the first frame from a real world picture, with a prompt that gives it a goal. Not only can you see what might happen, with different approaches, and then pick one, but you can re-seed with real world baselines periodically as you're actually executing that action to correct for anything that changes. This is a great step for real world agency.

        As a person without aphantasia, this is how I do anything mechanical. I picture what will happen, try a few things visually in my head, decide which to do, and then do it for real. This "lucid dream" that I call my imagination is all based on long term memory that made my world view. I find it incredibly valuable. I very much rely on it for my day job, and try to exercise it as much as possible, before, say, going to a whiteboard.

    • smusamashah 7 hours ago |
This already looks like my dream worlds, but more colorful and a bit more detailed. And the way it hallucinates and becomes inconsistent when you go back and forth over the same place is just like dreams.
      • galleywest200 2 hours ago |
        I get mild LSD flashbacks to my time in college when I look at the weird blending of edges that AI video does.
  • erulabs 8 hours ago |
    It’s interesting to me that we continue to see such pressure on video and world generation, despite the fact that for years now we’ve gotten games and movies that have beautiful worlds filled with lousy, limited, poorly written stories. Star Wars movies have looked phenomenal for a decade, full of bland stories we’ve all heard a thousand times.

    Are there any game developers working on infinite story games? I don’t care if it looks like Minecraft, I want a Minecraft that tells intriguing stories with infinite quest generation. Procedural infinite world gen recharged gaming, where is the procedural infinite story generation?

    Still, awesome demo. I imagine by the time my kids are in their prime video game age (another 5 years or so) we will be in a new golden age of interactive story telling.

    Hey siri, tell me the epic of Gilgamesh over 40 hours of gameplay set 50,000 years in the future where genetic engineering has become trivial and Enkidu is a child’s creation.

    • levkk 8 hours ago |
      No Man's Sky is kind of what you're looking for, except you may notice its quests (and worlds) become redundant quickly...I say quickly, but that became the case for me after like 30 hours of game play.
      • jsheard 8 hours ago |
        That's the kicker, LLM driven stories are likely to fall into the same trap that "infinite" procedurally generated games usually do - technically having infinite content to explore doesn't necessarily mean that content is infinitely engaging. You will get bored when you start to notice the same patterns coming up over and over again.

Procgen games mainly work when the procedural parts are just a foundation for hand-crafted content to sit on, whether that's crafted by the players (as in Minecraft) or the developers (as in No Mans Sky after they updated it a hundred times, or Roguelikes in general).

        • est31 7 hours ago |
          Yeah, generative AI can create cool looking pictures and video but so far it hasn't managed to create infinitely engaging stories. The models aren't there yet.
          • jsheard 7 hours ago |
I'd argue that the same principle applies to pictures: there are many genres of AI image that are cool the first time you see them, but after you've seen exactly the same idea rehashed dozens of times with no substantial variety it starts wearing really thin. AI imagery is often recognizable as AI not just because of characteristic flaws like garbled text but because it's so hyper-clichéd.
            • lenocinor 5 hours ago |
              I wonder if there's some threshold to be crossed where it can be surprising for longer. I made a video game name generator long ago that just picks a word (or short phrase) from each of three columns. (The majority of the words / phrases are from me, though many other people have contributed.)

              I haven't added any words or phrases to it in years, but I still use it regularly and somehow it still surprises me. Maybe the Spelunky-type approach can be surprising for longer; that is, make a bunch of hand-curated bits and pick from them randomly: https://tinysubversions.com/spelunkyGen/
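The generator described above - one word or phrase picked at random from each of three columns - can be sketched in a few lines of Python. The word lists here are invented placeholders; the real generator's columns are hand-curated:

```python
import random

# Hypothetical word columns; the real generator's lists are hand-curated.
FIRST = ["Eternal", "Shadow of the", "Super", "Grim"]
SECOND = ["Dungeon", "Harvest", "Neon", "Potato"]
THIRD = ["Simulator", "Chronicles", "Tycoon", "Unleashed"]

def generate_name(rng=random):
    """Pick one entry from each of the three columns and join them."""
    return " ".join(rng.choice(col) for col in (FIRST, SECOND, THIRD))
```

With a few dozen entries per column, the combinatorics (n1 × n2 × n3 possible names) are large enough that repeats feel rare in practice, which may be part of why it keeps surprising.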

    • wongarsu 8 hours ago |
      Dwarf Fortress is the state of the art in procedural interactive story generation. Youtube channels like kruggsmash show how great it is in that role if you actually read all the text.

      But that doesn't translate well to websites, trailers or demos. It's easier to wow people with graphics.

      • BlueTemplar 3 hours ago |
        I think that would be Rimworld, which is laser-focused on this aspect to the point of allowing you to pick different kinds of "narrators" ?

        (Dwarf Fortress being much more focused on generating a whole world.)

    • foolfoolz 8 hours ago |
      we have reliable infinite story generation in PvP multiplayer. if the matchup is fair, every game can be different and exciting. see chess
      • miltonlost 8 hours ago |
        is PvP multiplayer considered a "story"? Is a football game a "story"? I guess if all you consider for story is "things happen", then a PvP match can be a story, but that's stretching what I would consider "story" for a game. That is the story of the match, but it's not in and of itself a plot story.
        • programd 4 hours ago |
          > is PvP multiplayer considered a "story"?

          Consider EVE Online. The stories it generates are Shakespearean and I defy anyone to argue that they have no plot.

          I would go further and predict that stories generated by sufficiently advanced AI can explore much more interesting story landscapes because they need not be bound by the limitations of human experience. Consider what stories can be generated by an AI which groks mathematics humans don't yet fully understand?

          • fwip an hour ago |
            Why would a story about nonsensical mathematics be interesting to a human?
        • wholinator2 4 hours ago |
          I agree, the parent would've been much better suited with the example of PVE/PVP Roleplaying. People make up stories all the time
    • miltonlost 8 hours ago |
      > I want a Minecraft that tells intriguing stories with infinite quest generation. Procedural infinite world gen recharged gaming, where is the procedural infinite story generation?

      You're not gonna get new intriguing stories from AI which only regurgitates what it's stolen. You're going to get a themeless morass without intention.

      I also find it amusing how your example to Siri uses one of the oldest pieces of literature when you also tire of stories heard a thousand times before.

      • 93po 7 hours ago |
if you do basic chatgpt prompts in late 2024 asking for dynamic storytelling, sure, you'll get what you said. it's super dismissive to think that won't get better over time, or that even with today's tools you can't get dynamic and interesting stories out of it if you provide the proper framework
        • krainboltgreene 7 hours ago |
> it's super dismissive to think that won't get better over time

          When did we start thinking this way? That things HAVE to get better and in fact to think otherwise is very negative? Is HN under a massive hot hand fallacy delusion?

          • miltonlost 7 hours ago |
Lots of people want that AI grift money and need to be Pollyanna true believers to convince others that models that don't know truth are useful decision makers.
          • rjrdi38dbbdb 2 hours ago |
            How could creativity in AI not get better?

            Sure, progress will likely not be linear or without challenges, but we already have the human brain as proof that it is possible.

            • fwip an hour ago |
              Mountains exist, but that doesn't mean we'll ever build a structure the size of Everest.
      • visarga 7 hours ago |
        Actually, all you need to do is to apply structured randomness to get diversity from a LLM. For example in TinyStories paper, a precursor of the Phi models:

        > We collected a vocabulary consisting of about 1500 basic words, which try to mimic the vocabulary of a typical 3-4 year-old child, separated into nouns, verbs, and adjectives. In each generation, 3 words are chosen randomly (one verb, one noun, and one adjective). The model is instructed to generate a story that somehow combines these random words into the story

        You can do the same for generating worlds, just prepare good ingredients and sample at random.
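A minimal sketch of that sampling scheme, with tiny placeholder word lists standing in for the roughly 1500-word vocabulary the paper describes:

```python
import random

# Placeholder vocabulary; TinyStories used ~1500 basic words, per the paper.
NOUNS = ["cat", "boat", "garden", "cloud"]
VERBS = ["jump", "whisper", "build", "chase"]
ADJECTIVES = ["sleepy", "shiny", "brave", "tiny"]

def make_story_prompt(rng=random):
    """Sample one noun, one verb, one adjective and build the instruction."""
    noun = rng.choice(NOUNS)
    verb = rng.choice(VERBS)
    adj = rng.choice(ADJECTIVES)
    return (f"Write a short story for a young child that somehow combines "
            f"the words '{noun}', '{verb}' and '{adj}'.")
```

The randomness lives outside the model, so the diversity of outputs is controlled by the curated ingredient lists rather than by sampling temperature alone.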

        • miltonlost 7 hours ago |
A story is not just words crammed together that sound plausible. Is the AI going to know about pacing? About character motivations? About interconnecting disparate plots? That paper sounds like it has a scientist's conception that a story is just words, and not complex trade-offs between the start of a story and its end and middle - complexity and planning that won't come from any sort of next-token generation.

          These are “stories” in the most vacuous definition possible, one that is just “and then this happened” like a child’s conception of plot

          • wewtyflakes 4 hours ago |
            > Is the AI going to know about pacing? About character motivations? About interconnecting disparate plots?

For LLMs like GPT-4, all of this seems reasonable to account for, given appropriate guidance/frameworks (which may be just classical programming).

          • Philpax 3 hours ago |
            > Is the AI going to know about pacing? About character motivations? About interconnecting disparate plots?

            Yes. This is an active research area. See https://github.com/yingpengma/Awesome-Story-Generation, which is not up to date.

    • digging 8 hours ago |
      I think that's a bit of a trap. It's not impossible, but by default we should expect it to make games less fun.

      The better you make this infinite narrative generator, the more complicated the world gets and the less compelling it gets to actually interact with any one story.

      Stories thrive by setting their own context. They should feel important to the viewer. An open world with infinite stories can't make every story feel meaningful to the player. So how does it make any story feel meaningful? I suppose the story would have to be global, in which case, it crowds out the potential for fractal infinite storylines - eventually, all or at least most are going to have to tie back to the Big Bad Guy in order to feel meaningful.

Local stories would just feel mostly pointless. In Minecraft, all (overworld) locales are equally unimportant. Much like on Earth, why should you care about the random place you appeared in the world? The difference is that on Earth you tend to develop community as you grow and build connections to the place you live, which can build loyalty. In addition, you only have one shot, and you have real needs that you must fulfill or you die forever. So you develop some otherwise arbitrary loyalties in order to feel security in your needs.

      In Minecraft there's zero pressure to develop loyalty to a place except for your own real-life time. And when that becomes a driving factor, why wouldn't you pick a game designed to respect your time with a self-contained story? (Not that infinite games like Minecraft are bad, but they aren't story-driven for a good reason).

      Now, a game like Dwarf Fortress is different because you build the community, the infrastructure, the things that make you care about a place. But it already has infinite story generation without AI and I'm not sure AI would improve on that model.

      • raincole 6 hours ago |
        > I think that's a bit of a trap. It's not impossible, but by default we should expect it to make games less fun.

        I'd say AAA games have been on track of "less fun" for at least half a decade. So this sounds like a natural next step.

        • digging 4 hours ago |
          That's... a bad thing
      • yesco 5 hours ago |
        I think it's all about how you spin it in, imagine:

        - SimCity where you can read a newspaper about what's happening in your city that actually reflects the events that have occurred with interesting perspectives from the residents.

        - Dwarf Fortress, but carvings, artwork, demons, forbidden beasts, etc get illustrations dynamically generated via stable diffusion (in the style of crude sketches to imply a dwarf made it perhaps?)

        - Dwarf Fortress, again, but the elaborate in-game combat comes with a "narrative summary" which conveys first hand experiences of a unit in the combat log, which while detailed, can be otherwise hard to follow.

        - Any fantasy RPG, but with a minstrel companion who follows you around and writes about what you do in a silly judgy way. The core dialogue could be baked in by the developers, but the stories this minstrel writes could be dynamically generated based on the player's actions. Example: "He was a whimsical one, who decided to take a detour from his urgent hostage rescue mission to hop up and down several hundred times in the woods while trying on various hats he had collected. I have no idea what goes through this man's mind..."

        I'm not sure if there is a word for it, but the kernel here is that everything is indirectly dictated by the player's actions and the game's existing systems. The LLM/AI stuff isn't in charge of coming up with novel stories and core content; it's in charge of making the game more immersive by helping with the roleplay. I think this is the area where it can thrive the most.

      • shafoshaf 5 hours ago |
        I actually find the same issue with prequels, especially for the ones that really hit a chord (like the original Star Wars). After knowing what is going to happen in those stories, I just can't get invested in a character who I know either makes it for sure, dies before getting to the "main" story, or doesn't matter because they don't have any connection to my canon of the plot arc. Same-universe spin-offs fit this for me as well.

        OTOH, lots of games come with DLC that add new stories with the same mechanics. There might be some additions or changes, but if you really like the mechanics, you can try it with a different plot. Remnant II has sucked a ton of my time because of that.

      • lxgr 4 hours ago |
        > by default we should expect it to make games less fun.

        How so?

        I could totally see generative AI add a ton more variety to crowds, random ambient sentences by NPCs (that are often notoriously just a rotation of a handful of canned lines that get repetitive soon), terrain etc., while still being guided by a human-created high level narrative.

        Imagine being able to actually talk your way out of a tricky situation in an RPG with a guard, rather than selecting one out of a few canned dialogue options. In the background, the LLM could still be prompted by "there's three routes this interaction can take; see which one is the best fit for what the player says and then guide them to it and call this function".

        Worst case, you get a soulless, poorly written game with very eloquent but ultimately uninteresting characters. Some games are already that today – minus the realistic dialogue.
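A minimal sketch of that "three routes" pattern, with a trivial keyword matcher standing in for the LLM call (the route names and guard lines are invented):

```python
# Pre-authored outcomes; the model only picks which one fits the player's
# free-form line, so the game logic itself stays deterministic.
ROUTES = {
    "bribe":      "The guard pockets the coin and looks the other way.",
    "intimidate": "The guard steps back, hand trembling on his spear.",
    "fail":       "The guard calls for backup. Talk is over.",
}

def classify_player_line(line: str) -> str:
    """Stand-in for an LLM prompted to return exactly one route label."""
    text = line.lower()
    if "coin" in text or "gold" in text:
        return "bribe"
    if "regret" in text or "last mistake" in text:
        return "intimidate"
    return "fail"

def guard_interaction(player_line: str) -> str:
    # Whatever the player typed, the game advances down a designed route.
    return ROUTES[classify_player_line(player_line)]
```

The LLM's freedom is confined to classification and flavor text; the writers still author every outcome, which keeps the "bugspace" close to that of a conventional dialogue tree.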

        • digging 4 hours ago |
          > I could totally see generative AI add a ton more variety to crowds, random ambient sentences by NPCs (that are often notoriously just a rotation of a handful of canned lines that get repetitive soon), terrain etc., while still being guided by a human-created high level narrative.

          Yes, sure, but that's not what I was responding to. AI adding detail, not infinite quest lines, is possibly a good use case.

          > Worst case, you get a soulless, poorly written game with very eloquent but ultimately uninteresting characters. Some games are already that today – minus the realistic dialogue.

          Some games, yes... why do we want more of those? Anyway, that's not the worst case. Worst case is incomprehensible dialogue.

    • lifeisstillgood 8 hours ago |
      Ok - you got me.

      That’s actually a use case I can understand- and what’s more I think that humans could generate training data (story “prototypes”?) that somehow (?) expand the phase space of story-types

      Ironic though - we can build AI that could be creative but it’s humans that have to use science and logic because AI cannot?

    • ec109685 8 hours ago |
      Given we have engines that can render complex 3d worlds, can maintain consistency far longer than a minute and simulate physics accurately, why put all that burden on a GenAI world generator like this?

      It seems like it’d be more useful to have the model generate the raw artifacts, world map, etc. and let the engine do the actual rendering.

    • empath75 7 hours ago |
      It only looks like a video game because video game footage is plentiful and cheap.

      Now, imagine training it on thousands of hours of PoV drone footage from Ukraine, and then using that to train autonomous agents.

    • dmarcos 7 hours ago |
If stories (and AAA games in general) are bland, it's due in large part to how expensive they are to produce. Risk tolerance is low.

If game assets become cheap to generate, you'll see small teams or even solo developers willing to take more creative risks.

      • griomnib 7 hours ago |
        Counter point: you’d see a corresponding exponential increase in QA labor, and just like with the web, Steam will be absolutely flooded with slop.

        So I see the most likely outcome is a lot of dogshit and Steam being forced to make draconian moves to protect the integrity of the store.

        • alphabetting 7 hours ago |
Seems like there's already a lot of slop on Steam, and I really doubt it will be difficult for quality content to be highlighted even if the number of games increases 1000x or more.
          • dmarcos 6 hours ago |
Yeah. Video and YouTube are an example. Filtering is not a hard problem. Megatons of bad stuff don't bother me.
            • miltonlost 6 hours ago |
              Love that Youtube filter that spits out what I should consume. Thank you corporate algorithm for telling me what is a good thing to watch
              • dmarcos 5 hours ago |
                You can subscribe to the channels you like and ignore the rest.
        • jsheard 6 hours ago |
          QAing a game built on a framework where fundamental mechanics are non-deterministic and context-sensitive sounds like a special kind of hell. Not to mention that once you find a bug there's no way to fix it directly, since the source code is an opaque blob of weights, so you just have to RLHF it until it eventually behaves.
          • griomnib 24 minutes ago |
            And meanwhile you’ve used up .1% of humanity’s remaining carbon budget on each round.
        • throwup238 5 hours ago |
That has been the case since art was first industrialized with the printing press. Most of them didn't survive, but a significant fraction, if not the vast majority, of books printed in the press's first century were trashy novels about King Arthur and other fantasies (we know from publisher records and bibliographies that they were very popular but don't have detailed sales figures to compare against older content like translated Greek classics). Only a small fraction of content created since then has been preserved, because most of it was slop. The good stuff made it into the Western canon over centuries, but most of what survives from that period is family bibles and archaic translations.

          I don’t see why AI will be any different. All that’s changed is ratio of potential creators to the general population. Most of it is going to be slop regardless because of economic incentives.

          • fwip an hour ago |
            Sturgeon's law says 90% of everything is crap.

            If AI pushes that up to 98%, that means you have to look through 5 times as much crap to get the good stuff.

            • griomnib 26 minutes ago |
              Exactly, “it’s bad now” != “it won’t get worse”.
      • rafaelmn 6 hours ago |
        Or you'll see a flood of shit that's impossible to filter.
        • dmarcos 6 hours ago |
Thanks to high-bandwidth Internet, YouTube, and smartphones, it's easier than ever to produce and distribute high-quality video. So much good stuff has come from it.

Expect something similar once video games and interactive 3D are cheap to produce.

          Filtering is a much easier problem to solve and abundance a preferable scenario.

    • wildermuthn 6 hours ago |
      I love that almost all the responses to your question are, "No! Bad idea!"

      It's a great idea. We want more than an open-world. We want an open-story.

      Open-story games are going to be the next genre that will dominate the gaming industry, once someone figures it out.

      • throwup238 6 hours ago |
        IMO this will be the differentiating feature for the next generation of video game consoles (or the one after that, if we’re due for an imminent PS6/Xbox2 refresh). They can afford to design their own custom TPU-style chip in partnership with AMD/Nvidia and put enough memory on it to run the smaller models. Games will ship with their own fine tuned models for their game world, possibly multiple to handle conversation and world building, inflating download sizes even more.

        I think fully conversational games (voice to voice) with dynamic story lines are only a decade or two away, pending a minor breakthrough in model distillation techniques or consumer inference hardware. Unlike self driving cars or AGI the technology seems to be there, it’s just so new no one has tried it. It’ll be really interesting to see how game designers and writers will wrangle this technology without compromising fun. They’ll probably have to have a full agentic pipeline with artificial play testers running 24/7 just to figure out the new “bugspace”.

        Can’t wait to see what Nintendo does, but that’s probably going to take a decade.

      • spencerflem 6 hours ago |
        From 2018 - https://www.erasmatazz.com/library/interactive-storytelling/...

        "There’s no question in my mind that such software could generate reasonably good murder mysteries, action thrillers, or gothic romances. After all, even the authors of such works will tell you that they are formulaic. If there’s a formula in there, a deep learning AI system will figure it out.

        Therein lies the fatal flaw: the output will be formulaic. Most important, the output won’t have any artistic content at all. You will NEVER see anything like literature coming out of deep learning AI. You’ll see plenty of potboilers pouring forth, but you can’t make art without an artist.

        This stuff will be hailed as the next great revolution in entertainment. We’ll see lots of prizes awarded, fulsome reviews, thick layers of praise heaped on, and nobody will see any need to work on the real thing. That will stop us dead in our tracks for a few decades."

        • fragmede an hour ago |
          there's only really like seven basic plots; man v man, man v nature, man v self, man v society, man v fate/god, man v technology so we should probably just stop writing stories anyway
    • hbn 6 hours ago |
      Creativity is the one area where LLMs are completely unimpressive. They only spit out derivative works of what they’ve been trained on. I’ve never seen an LLM tell a good joke, or an interesting story. It doesn’t know how to subvert expectations, come up with clever twists, etc. they just pump out a refined average of what’s typical.
    • ddtaylor 6 hours ago |
      > It’s interesting to me that we continue to see such pressure on video and world generation, despite the fact that for years now we’ve gotten games and movies that have beautiful worlds

      Those beautiful worlds took a lot of money to make and the studios are smart enough to realize consumers are apathetic/stupid enough to accept much lower quality assets.

      The top end of the AAA market will use this sparingly for the junk you don't spend much time on - stuff the intern was doing before.

      The bottom of the market will use this for virtually everything in their movie-to-game pipeline of throwaway games. These are the games designed just to sucker parents and kids out of $60 every month. The games that don't even follow the story of the movie and likely makes the story worse.

Strangely enough, this is where the industry makes the vast majority of its day-to-day walking-around cash.

    • com2kid 3 hours ago |
      IMHO Humans will still create the overarching stories, what LLMs will do is help fill in the expensive blanks that make adding stories to a world hard.

      For example, right now if you save an entire village from an attacking tribe of orcs, only a handful of NPCs even say anything, just a nice little "thanks for saving our town!" and then 2 villages over the NPCs are completely unaware of a mighty hero literally solo tanking an entire invading army.

      Why is that?

      Well you'd need lots of, somewhat boring but important, dialogue written, and you'd need tons of voice lines recorded.

      Both those are now solvable problems with generative AI. AI generated dialogue is now reasonably high quality, not "main character story arc" high quality, but "idle shop keeper chit chat" quality for sure, it won't break immersion at least. And the quality of writing from AI is fine for 2 or 3 sentences here and there.

I'll soon be releasing a project showing this off at https://www.tinytown.ai/ - the NPC dialogue is generated by a small LLM that can be run locally, and the secret of even high-quality voice models is that they don't require a lot of memory to run.

      I predict that in another 4 or 5 years we'll see a lot of models ran at the edge on video game consoles and home PCs, fleshing out game worlds.
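A hedged sketch of how recent world events might be folded into an ambient-dialogue prompt for a small local model - every NPC name and event string here is invented, not taken from the project above:

```python
def ambient_dialogue_prompt(npc_name: str, role: str, events: list[str]) -> str:
    """Build a prompt asking a local LLM for in-character idle chit-chat."""
    # Keep only the few most recent events so the prompt stays small enough
    # for an edge-sized model's context window.
    recent = "; ".join(events[-3:]) if events else "nothing notable lately"
    return (
        f"You are {npc_name}, a {role} in a small fantasy village. "
        f"Recent events you have heard about: {recent}. "
        "Say one or two sentences of idle chit-chat a passing adventurer "
        "might overhear. Stay in character and keep it brief."
    )

prompt = ambient_dialogue_prompt(
    "Mara", "shopkeeper",
    ["a lone hero repelled an orc raid", "the harvest festival is next week"],
)
```

This is the cheap part; the game's event log does the world-tracking, and the model just turns it into a couple of sentences of flavor per NPC.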

    • hackernewds 43 minutes ago |
      if stories you're needing, there's an LLM I have to sell you
  • Stevvo 8 hours ago |
    You can see artifacts common in screen-space reflections in the videos. I suspect they are not due to the model rendering reflections based on screen-space information, but the model being trained on games that render reflections in such a manner.
  • xavirodriguez 7 hours ago |
    uoou
  • enbugger 7 hours ago |
Just like with images, this will never be in good enough shape to actually use in a real product, as it discards details completely, leaving generic 3rd-person-controller animation.

What this should tell you instead is that things are really bad on the training-data side if you have to start scraping billions of game streams on the internet - it's hard to imagine a bigger chunk of training data than this. Stagnation incoming.

  • ata_aman 7 hours ago |
    We're about to have on-demand video content and games simply based on prompts. My prediction is we'll have "prompt marketplaces" where you can gen content based on 3rd party prompts (or your own). 3-5 years.
  • smusamashah 7 hours ago |
    Its so much like my lucid dreams where world sometimes stays consistent for a while when I take its control. It's a strange feeling seeing computer hallucinating a world just like I hallucinate a world in dreams.

This also means that my dreams will keep looking like this iteration of Genie 2, but compute will scale up and the worlds won't look anything like my dreams anymore in future versions (it's already more colorful anyway).

I remember image generation used to look like dreams too in the beginning. Now it doesn't look anything like that.

    • MrTrvp 7 hours ago |
      Soon enough I imagine we'll have dream state to cohesive reality models. Our desires and world events can be dissected and analyzed by fine grain and hint authorities to your intent before you know what they mean to you /s.
  • jckahn 6 hours ago |
    At first I was excited to see a new model, but then I saw no indication that the model is open source so I closed the page.
  • anthonymax 6 hours ago |
    Wow, is this artificial intelligence creating this already?
  • bbstats 6 hours ago |
    who is asking for this?
  • ddtaylor 6 hours ago |
    This is very impressive technology and I am active in this space. Very active. I make an (unreleased) Steam game that helps users create their own games from not knowing how to program. I also (unknowingly) co-authored tools that K12 and university are using to teach game programming.

    For the time being I will gloss over the fact this might just be a consumer facing product for Google that ends up having nothing to do with younger developers.

    I'm torn between two ideas:

    a. Show kids awesome stuff that motivates them to code

    b. Show kids how to code something that might not be as awesome, but they actually made it

    On the one hand you want to show kids something cool and get them motivated. What Google is doing here is certainly capable of doing that.

    On the other hand I want to show kids what they can actually do and empower them. The days of making a game on your own in your basement are mostly dead, but I don't think that makes the idea of being someone who can control a large part of your vision - both technical and non-technical - any less important.

    Not everyone is the same either. I have met kids who would never spend a few hours learning some Python with pygame to get a few rectangles and sprites on screen, but who might get more interested if they saw something this flashy. But experience also tells me those kids are far less likely to get much value from a tool like this beyond entertainment.

    I have a 14 year old son myself and I struggle to understand how he sees the world in this capacity sometimes. I don't understand what he thinks is easy or hard and it warps his expectations drastically. I come from a time period where you would grind for hours at a terminal pecking in garbage from a magazine to see a few seconds of crappy graphics. I don't think there should be meaningless labor attached to programming for no reason, but I also think that creating a "cost" to some degree may have helped us. Given two programs to peck into the terminal, which one do you peck? Very few of us had the patience (and lack of sanity) to peck them all.

  • empiricus 6 hours ago |
Feed it the inputs from the real world and it will recreate in its mind a mirror of the world. Some say this is what we do as well: we live in a virtual reality created by our minds.
  • wg0 6 hours ago |
    Google is not coming slow... This is magic. As a casual gamer and someone wanting to make my own game, this is black magic.

    Lighting, gravity, character animation and what not internalized by the model... from a single image...!

  • nopinsight 6 hours ago |
    The real goal of this research is developing models that match or exceed human understanding of the 3D world -- a key step toward AGI.

    A key reason why current Large Multimodal Models (LMMs) still have inferior visual understanding compared to humans is their lack of deep comprehension of the 3D world. Such understanding requires movement, interaction, and feedback from the physical environment. Models that incorporate these elements will likely yield much more capable LMMs.

    As a result, we can expect significant improvements in robotics and self-driving cars in the near future.

    Simulations + Limited robot data from labs + Algorithms advancement --> Better spatial intelligence

    which will lead to a positive feedback loop:

    Better spatial intelligence --> Better robots --> More robot deployment --> Better spatial intelligence --> ...

  • andelink 5 hours ago |
    Is this type of on-the-fly graphics generation more expensive than purely text based LLMs? What is the inference energy impact of these types of models?
  • dartos 5 hours ago |
    > Genie 2 can generate consistent worlds for up to a minute, with the majority of examples shown lasting 10-20s.
  • lacoolj 5 hours ago |
    OpenAI launched Sora (quite a while ago now), so Google needs to fire back with something else groundbreaking.

    I love the advancement of the tech but this still looks very young and I'd be curious what the underlying output code looks like (how well it's formatted, documented, organized, optimized, etc.)

    Also, this seems oddly related to the recent post from WorldLabs https://www.worldlabs.ai/blog. Wonder if this was timed to compete directly and overtake the related news cycle.

    • whiplash451 4 hours ago |
      I also find the timing vs World Labs demo disturbing.
      • alphabetting 4 hours ago |
        What's disturbing? In all likelihood the close timing was World Labs rushing to get their demo out the door knowing this was coming, because they wouldn't have gotten nearly the hype they did if this had come first.
  • swyx 4 hours ago |
    i was wondering when genie 1 was and... it didn't seem to get much love? https://news.ycombinator.com/item?id=39509937 @dang was there a main thread here?
  • brap 4 hours ago |
    While this is very (very) cool, what is the upside to having a model render everything at runtime, vs. having it render the 3D assets during development (or even JIT), and then rendering it as just another game? I can think of many reasons why the latter is preferable.
    • gavmor 4 hours ago |
      To me, keeping a world state in sync with rapidly changing external state is the most compelling application. Something like dockercraft: https://github.com/docker/dockercraft
  • aussieguy1234 4 hours ago |
    If it can play video games that simulate the laws of physics, could it control a robot in the physical world?
  • dangoodmanUT 3 hours ago |
    this page loads like shit
  • lifeformed 2 hours ago |
    Neat tech, but people might mistake this as being useful for game development, where it'll be worse than useless.

    Games are about interactions, and this actively works against it. You don't want the model to infer mechanics, the designer needs deep control over every aspect of it.

    People mentioned using this for prototyping a game, but that's completely meaningless. What would it even mean to use this to prototype something? It doesn't help you figure out anything mechanically or visually. It's just, "what if you were an avatar in a world?" What do you do after you run around with your random character controller in your random environments?

    I think the most useful part of this is the world generation part, not the mechanics inference part.

    • serf 2 hours ago |
      >It doesn't help you figure out anything mechanically or visually.

      people sell entire franchises off of a few pre-rendered generic-fantasy still images -- I would have to disagree with the premise that this is useless as a visual concept tool.

      I agree with your notions about integration into an existing game.

  • aya700 2 hours ago |
    Hello
  • ingen0s an hour ago |
    So when is Google Glass coming back to spawn this for my pleasure?
  • beeflet an hour ago |
    These game-video models remind me of the dream-like "Mind Game" game described in Ender's Game, because of how it has to spontaneously come up with a new environment to address player input. The game in that book is also described as an AI.
  • david_shi an hour ago |
    "On the back part of the step, toward the right, I saw a small iridescent sphere of almost unbearable brilliance. At first I thought it was revolving; then I realised that this movement was an illusion created by the dizzying world it bounded. The Aleph's diameter was probably little more than an inch, but all space was there, actual and undiminished. Each thing (a mirror's face, let us say) was infinite things, since I distinctly saw it from every angle of the universe. I saw the teeming sea; I saw daybreak and nightfall; I saw the multitudes of America; I saw a silvery cobweb in the center of a black pyramid; I saw a splintered labyrinth (it was London); I saw, close up, unending eyes watching themselves in me as in a mirror; I saw all the mirrors on earth and none of them reflected me; I saw in a backyard of Soler Street the same tiles that thirty years before I'd seen in the entrance of a house in Fray Bentos; I saw bunches of grapes, snow, tobacco, lodes of metal, steam; I saw convex equatorial deserts and each one of their grains of sand..."