Image Editing with Gaussian Splatting
231 points by Hard_Space a day ago | 60 comments
  • carlosjobim a day ago |
    This is honestly genius. If I understand it correctly, instead of manipulating pixels, you turn any 2D image to a 3D model and then manipulate that model.
    • papamena a day ago |
      Yes! This really feels next-gen. After all, you're not actually interested in editing the 2D image, that's just an array of pixels, you want to edit what it represents. And this approach allows exactly that. Will be very interesting to see where this leads!
      • riggsdk a day ago |
        Or analogous to how you convert audio waveform data into frequencies with the fast Fourier transform, modify it in the frequency domain, and convert it back into a waveform again.
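
        As a rough sketch of that roundtrip (my own illustration, not anything from the article): transform a waveform into the frequency domain with NumPy, attenuate part of the spectrum, and transform it back.

          import numpy as np

          sr = 44100                                      # sample rate in Hz
          t = np.arange(sr) / sr                          # one second of audio
          wave = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

          spectrum = np.fft.rfft(wave)                    # waveform -> frequency domain
          freqs = np.fft.rfftfreq(len(wave), d=1 / sr)
          spectrum[freqs > 1000] *= 0.1                   # edit in the frequency domain
          filtered = np.fft.irfft(spectrum, n=len(wave))  # back to a waveform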

        Their examples do, however, look a bit like distorted pixel data. The hands of the children seem to warp with the cloth, something they could have easily prevented.

        The cloth also looks very static despite being animated, mainly because its shading never changes. If they had more information about the scene from multiple cameras (or perhaps inferred it from the color data), the Gaussian splat would be more accurate and could even use the altered angle/surface normal after modification to cleverly simulate the changed specular highlights as it animates.

    • anamexis a day ago |
      The type of 3D model, Gaussian splatting, is also pretty neat and has been getting a lot of attention lately.

      There's been some good previous discussion on it here, like this one:

      Gaussian splatting is pretty cool https://news.ycombinator.com/item?id=37415478

      • kranke155 a day ago |
        Gaussian splatting is clearly going to change a lot of things in 3D assets; it's a surprise to see it doing the same for 2D here.
  • aDyslecticCrow a day ago |
    Now THIS is the kind of shit I signed up for when AI started to become able to understand images properly: no shitty prompt-based generators that puke out the most generalised version of every motif while draining the life from the whole illustration industry.

    It's just good-ass tooling for making cool-ass art. Hell yes! Finally, there is some useful AI tooling that empowers artistic creativity rather than draining it.

    Pardon the French; I just think this is too awesome for normal words.

    • squigz a day ago |
      Generative art hasn't been what you're describing for a long time.
      • tobr a day ago |
        Hasn’t been, or has been more than?
        • squigz a day ago |
          Has been more than.
    • visarga a day ago |
      Your fault for poor prompting. If you don't provide distinctive prompts, you can expect generalised answers.
      • aDyslecticCrow a day ago |
        Let's say you want to rotate a cat's head in an existing picture by 5 degrees, as in the most basic example suggested here. No prompt will reliably do that.

        A mesh-transform tool and some brush touchups could. Or this tool could. Diffusion models are too uncontrollable, even in the most basic examples, to be meaningfully useful for artists.

        • 8n4vidtmkvmk a day ago |
          No, but you could rotate the head with traditional tools and then inpaint the background and touch up the neckline. It's not useless, just different.
    • jsheard a day ago |
      Yep, there's a similar refrain amongst 3D artists who are begging for AI tools which can effectively speed up the tedious parts of their current process like retopo and UV unwrapping, but all AI researchers keep giving them are tools which take a text prompt or image and try to automate their entire process from start to finish, with very little control and invariably low quality results.
      • aDyslecticCrow a day ago |
        There have been some really nice AI tools for generating bump and diffuse maps from photos. So you could photograph a wall and get a detailed meshing texture with good light scatter and depth.

        That's the kind of awesome tech that got me into AI in the first place. But then prompt generators took over everything.

        • jsheard a day ago |
          Denoising is another good practical application of AI in 3D: you can save a lot of time without giving up any control by rendering an almost noise-free image and then letting a neural network clean it up. Intel did some good work there with their open-source OIDN library, but then genAI took over, and now all the research focus is on trying to completely replace precise 3D rendering workflows with diffusion slot machines, rather than continuing to develop smarter AI denoisers.
      • treyd a day ago |
        Because the investors funding development of those AI tools don't want to try to empower artists and give them more freedom, they want to try to replace them.
        • ChadNauseam a day ago |
          The investors want to make money, and if they make a tool that is usable by more people than just experienced 3D artists who are tired of retopologizing their models, that both empowers many more people and potentially makes them more money.

          Aside from that, it's impossible for tools to replace artists. Did cameras replace painting? I'm sure they reduced the demand for paintings, but if you want to create art and paint is your chosen medium, it has never been easier. If you want to create art and 3D models are your chosen medium, the existence of AI tools for 3D model generation from a prompt doesn't stop you. However, if you want to create a game and you need a 3D model of a rock or something, you're not trying to make "art" with that rock; you're trying to make a game, and a 3D model is just something you need to do that.

    • doe_eyes a day ago |
      There's a ton of room for using today's ML techniques to greatly simplify photo editing. The problem is, these are not billion-dollar ideas. You're not gonna raise a lot of money at crazy valuations by proposing to build a tool for relighting scenes or removing unwanted objects from a photo. Especially since there is a good chance that Google, Apple, or Adobe will just borrow your idea if it pans out.

      On the other hand, you can raise a lot of money if you promise to render an entire industry or an entire class of human labor obsolete.

      The end result is that far fewer people are working on ML-based dust or noise removal than on tools that are generating made-up images or videos from scratch.

    • CaptainFever a day ago |
      I share your excitement for this tool that assists artists. However, I don't share the same disdain for prompt generators.

      I find it enlightening to view it in the context of coding.

      GitHub Copilot assists programmers, while ChatGPT replaces the entire process. There are pros and cons though:

      GitHub Copilot is hard to use for non-programmers, but can be used to assist in the creation of complex programs.

      ChatGPT is easy to use for non-programmers, but is usually restricted to making simple scripts.

      However, this doesn't mean that ChatGPT is useless for professional programmers either, if you just need to make something simple.

      I think a similar dynamic happens in art. Both types of tools are awesome, they're just for different demographics and have different limitations.

      For example, using the coding analogy: MidJourney is like ChatGPT. Easy to use, but hard to control. Good for random people. InvokeAI, Generative Fill and this new tool are like Copilot. Hard to use for non-artists, but easier to control and customise. Good for artists.

      However, I do find it frustrating how most of the funding in AI art tools goes towards the easy-to-use side, instead of the easy-to-control side (this doesn't seem to be the case in coding, where Copilot is better developed than ChatGPT for coding). More funding and development for the easy-to-control type would be very welcome indeed!

      (Note: ControlNet is probably a good example of easy-to-control. There's a very high skill ceiling in using Stable Diffusion right now.)

      • aDyslecticCrow a day ago |
        Good analogy. Yes, controllability is severely lacking, which is what makes diffusion models a very bad tool for artists. The current tools, even Photoshop's best attempt to implement them as a tool (smart infill), are situational at best. Artists need controllable specialized tools that simplify annoying operations, not prompt generators.

        As a programmer, I find Copilot a pretty decent tool, thanks to its good controllability. ChatGPT is less so, but it is decent for finding the right keywords or libraries I can look up later.

    • TheRealPomax a day ago |
      Except this is explicitly not AI, nor is it even tangentially related to AI. This is a normal graphics algorithm, the kind you get from really smart people working on render-pipeline maths.
      • aDyslecticCrow a day ago |
        > nor is it even tangentially related to AI

        It's not a deep neural network, but it's a machine learning model. In very simple terms, it minimizes a loss by iteratively refining an estimated set of splats, which is about as much machine learning as old-school KNN or SVM.
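
        To make that concrete, here's a toy sketch of that kind of loop (my own PyTorch illustration, not the paper's method): a set of 2D Gaussians gets refined by gradient descent on a photometric loss against a target image, with naive additive blending instead of proper alpha compositing to keep it short.

          import torch

          H, W, N = 64, 64, 100                  # image size, number of Gaussians
          target = torch.rand(H, W, 3)           # stand-in for the real target image

          # Learnable splat parameters: position, size, colour, opacity.
          pos = torch.rand(N, 2, requires_grad=True)
          log_scale = torch.full((N,), -3.0, requires_grad=True)
          colour = torch.rand(N, 3, requires_grad=True)
          opacity = torch.zeros(N, requires_grad=True)

          ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
          grid = torch.stack([xs, ys], dim=-1)   # (H, W, 2) pixel coordinates in [0, 1]

          opt = torch.optim.Adam([pos, log_scale, colour, opacity], lr=0.01)
          for step in range(500):
              d2 = ((grid[None] - pos[:, None, None]) ** 2).sum(-1)          # (N, H, W) squared distances
              weights = torch.sigmoid(opacity)[:, None, None] * torch.exp(-d2 / torch.exp(log_scale)[:, None, None])
              render = (weights[..., None] * colour[:, None, None]).sum(0)   # (H, W, 3) additive blend
              loss = ((render - target) ** 2).mean()                         # photometric loss
              opt.zero_grad()
              loss.backward()
              opt.step()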

        AI means nothing as a word; it is basically as descriptive as "smart" or "complicated". But yes, it's a very clever algorithm invented by clever people that is finding some nice applications.

        • TheRealPomax 18 hours ago |
          Whether you agree with what it means or not, the word AI most definitely has a meaning today, more so than ever, and that meaning is not what we (myself included; I have a master's in AI from the before-times) used to use it for. Today, AI exclusively refers to (extremely) large neural networks.
          • aDyslecticCrow 7 hours ago |
            If that is the definition, then I agree; calling this AI would downplay how clever this algorithm really is.

            But most marketing firms disagree. AI has now absorbed the terms "big data" and "algorithm" in many places. The new Ryzen AI processor, Apple Intelligence, NVIDIA AI upscaling, and HP AI printer all refer to much smaller models or algorithms.

  • doctorpangloss a day ago |
    When a foreground object is moved, how are the newly visible contents of the background filled?
    • aDyslecticCrow a day ago |
      It probably isn't.

      The most logical use of this is to replace mesh-transform tools in Photoshop or Adobe Illustrator. In this case, you probably work with a transparent map anyway.

      • doctorpangloss a day ago |
        Why do Gaussian splats benefit you for mesh-transform applications? Name one, and think deeply about what is going on. The applications are generally non-physical transformations, so having a physical representation is worse, not better; and then, the weaknesses almost always involve foreground-versus-background separation.

        Another POV is: well, generative AI solves the issue I am describing, which raises the question of why these guys are so emphatic about their thing not interacting with the generative AI thing. If they are not interested in the best technical solutions, what do they bring to the table besides vibes, and how would they compete against even vibesier vibes?

        • aDyslecticCrow a day ago |
          Mesh transform is extensively used to create animations and warp perspectives. The most useful kind of warping is emulating perspective and rotation. Gaussian splats allow more intelligent warping of perspective without manually moving every vertex by eye.
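
          For comparison, this is the kind of manual perspective warp I mean (an illustrative OpenCV sketch with a made-up file name, not the splat-based approach): you pick the corner correspondences by eye and let the homography do the rest.

            import cv2
            import numpy as np

            img = cv2.imread("cat.png")                  # hypothetical input image
            h, w = img.shape[:2]

            src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])              # original corners
            dst = np.float32([[30, 10], [w - 10, 0], [w, h], [0, h - 20]])  # corners placed by eye
            M = cv2.getPerspectiveTransform(src, dst)    # 3x3 homography from the 4 point pairs

            warped = cv2.warpPerspective(img, M, (w, h))
            cv2.imwrite("cat_warped.png", warped)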

          Foreground-background separation is entirely uninteresting. Masking manually is relatively easy, and there are good semi-intelligent tools that make it painless. Sure, it's a major topic discussed within AI papers for some reason, but from an artist's perspective, it doesn't matter much. Masking out from the background is generally step one in any image manipulation process, so why is that a weakness?

    • vessenes a day ago |
      The demos show either totally internal modifications (bouncing blanket changing shape / statue cheeks changing) or isolated, white-background images that have been clipped out. Based on the description of how they generate the splats, I think you'd auto-select the item out of the background, do this with it, then paste it back.

      The splatting process uses a pretty interesting idea, which is to imagine two cameras: one the current "view" of the image, the other 180 degrees opposite looking back, but at a "flat" mirror image of the front. This is going to constrain the splats away from having weird rando shapes. You will emphatically not get the ability to rotate something along a vertical axis here (e.g. "let me just see a little more of that statue's other side"). You will instead get a nice method to deform / rearrange.
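
      One way to read that two-camera setup (my interpretation in NumPy, not the authors' code): the second "view" is just the horizontally mirrored image, assigned to a camera posed 180 degrees opposite the first.

        import numpy as np

        front_view = np.random.rand(256, 256, 3)         # stand-in for the input image
        back_view = front_view[:, ::-1, :]                # flat mirror image for the opposite camera

        # Camera-to-world poses: identity for the front camera, a 180-degree yaw for the back one.
        front_pose = np.eye(4)
        back_pose = np.eye(4)
        back_pose[:3, :3] = np.diag([-1.0, 1.0, -1.0])    # rotate 180 degrees about the vertical axis
        back_pose[2, 3] = 4.0                             # place it on the far side of the subject (arbitrary distance)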

  • zokier a day ago |
    Isn't it quite a leap to go from a single image to a usable 3DGS model? The editing part seems like a relatively minor step afterwards. I thought that 3DGS typically required multiple viewpoints, like photogrammetry.
    • chamanbuga a day ago |
      This is what I initially thought too; however, I have already witnessed working demos of 3DGS from a single viewpoint, armed with additional auxiliary data that is contextually relevant to the subject.
    • vessenes a day ago |
      It's not "real" 3D -- the model doesn't infer anything about unseen portions of the image. They get 3D-embedded splats out of their pipeline, and then can do cool things with them. But those splats represent a 2D image, without inferring (literally or figuratively) anything about hidden parts of the image.
    • dheera a day ago |
      Yeah exactly, this page doesn't explain what's going on at all.

      It says it uses a mirror image to do a Gaussian splat. How does that infer any kind of 3D geometry? An image and its mirror are explainable by a simple plane and that's probably what the splat will converge to if given only those 2 images.

  • chamanbuga a day ago |
    I learned about 3D Gaussian Splatting from the research team at work just 2 weeks ago, and they demoed some incredible use cases. This tech will definitely become mainstream in camera technologies.
    • spookie a day ago |
      Having some sort of fast camera view position and orientation computation with colmap + initial point prediction + Gaussian splatting for 5 minutes + CloudCompare normal estimation and 3D recon yields some incredible results.

      Much better than NeRF in my experience. There is, however, a need to clean the point cloud yourself and stuff like that.

  • TheRealPomax a day ago |
    These underlines make reading the text pretty difficult; it might be worth making the links a little less prominent to aid legibility.
  • squidsoup a day ago |
    I've been exploring some creative applications of Gaussian splats for photography/photogrammetry, which I think have an interesting aesthetic. There are stills of flowers on my Instagram if anyone is interested: https://www.instagram.com/bayardrandel
    • echelon a day ago |
      These are great! What software do you use, and what does your pipeline look like?

      If you wanted to capture a full 3D scene, my experience with photogrammetry and NeRFs has been that it requires a tremendously large dataset that is meticulously captured. Are Gaussian splat tools more data efficient? How little data can you get away with using?

      What are the best open source Gaussian Splat tools for both building and presenting? Are there any that do web visualization particularly well?

      I might have to get back into this.

      • squidsoup a day ago |
        Thanks very much! I use Polycam on iOS for photogrammetry and generating Gaussian splats from stills. It seems to work remarkably well, but has a subscription fee (given there's processing on their servers, this seems reasonable). Typically, building a splat model takes about 30-50 stills for good results, depending on the subject.

        The only open source tool I use in my workflow is CloudCompare (https://www.danielgm.net/cc/), for editing/cleaning point cloud data.

        For animation I primarily use Touch Designer, which is a node-based visual programming environment, exporting splats as point clouds, and Ableton/misc instruments for sound.

        No idea about web visualisation, but interesting idea!

        • rburnsanims a day ago |
          Have you tried Kiri vs Polycam?

          I was using Kiri's dev mode and then running that through nerfstudio to make NeRFs, and I'm wondering if Polycam might give higher quality, but I can't seem to find anyone else who's been doing this. I guess I might have to do some tests to compare.

          +1 to this workflow. TouchDesigner's Point Transform TOP is great for aligning too.

          • squidsoup 21 hours ago |
            No, afraid not, sorry; I've only used Polycam. TouchDesigner is such a pleasure - I can't remember when I last found creative software as fun and interesting to explore.
  • whywhywhywhy 21 hours ago |
    The examples don't look like anything beyond what can be done with the Puppet Warp effect in Photoshop/After Effects.
  • nakedrobot2 7 hours ago |
    This seems like an absolutely terrible idea! I thought this was going to be about editing Gaussian splats, which is sorely needed. Instead, it's about turning a few pixels into a Gaussian splat in order to edit them?! My god, talk about using a nuclear bomb to kill a fly!