Their examples do, however, still look a bit like distorted pixel data. The hands of the children seem to warp with the cloth, something they could have easily prevented.
The cloth also looks very static despite being animated, mainly because its shading never changes. With more information about the scene from multiple cameras (or perhaps inferred from the color data), the Gaussian splat would be more accurate and could even incorporate the altered angle/surface normal after modification, cleverly simulating the changed specular highlights as it animates.
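To illustrate the point about highlights (this is my own hypothetical sketch, not anything from the tool): if each splat carried a surface normal, the changed highlight after deformation could be re-approximated with a standard Blinn-Phong specular term.

```python
import numpy as np

# Hypothetical illustration: if the splat tracked surface normals, a
# deformed surface's highlight could be recomputed with Blinn-Phong.

def specular(normal, light_dir, view_dir, shininess=32.0):
    """Blinn-Phong specular intensity for unit-length input vectors."""
    half = light_dir + view_dir
    half = half / np.linalg.norm(half)          # the "halfway" vector
    return max(np.dot(normal, half), 0.0) ** shininess

light = np.array([0.0, 0.0, 1.0])
view = np.array([0.0, 0.0, 1.0])

before = specular(np.array([0.0, 0.0, 1.0]), light, view)  # facing camera
# After deformation the normal tilts away from the light/view direction,
# so the specular highlight dims.
tilted = np.array([0.0, np.sin(0.3), np.cos(0.3)])
after = specular(tilted, light, view)
print(before > after)  # True: tilting the normal weakens the highlight
```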
There's been some good previous discussion on it here, like this one:
Gaussian splatting is pretty cool https://news.ycombinator.com/item?id=37415478
It's just good-ass tooling for making cool-ass art. Hell yes! Finally, there is some useful AI tooling that empowers artistic creativity rather than drains it.
Pardon the French; I just think this is too awesome for normal words.
A mesh-transform tool and some brush touchups could. Or this tool could. Diffusion models are too uncontrollable, even in the most basic examples, to be meaningfully useful for artists.
That's the kind of awesome tech that got me into AI in the first place. But then prompt generators took over everything.
Aside from that, it's impossible for tools to replace artists. Did cameras replace painting? I'm sure they reduced the demand for paintings, but if you want to create art and paint is your chosen medium, it has never been easier. If you want to create art and 3D models are your chosen medium, the existence of AI tools for 3D model generation from a prompt doesn't stop you. However, if you want to create a game and you need a 3D model of a rock or something, you're not trying to make "art" with that rock; you're trying to make a game, and a 3D model is just something you need to do that.
On the other hand, you can raise a lot of money if you promise to render an entire industry or an entire class of human labor obsolete.
The end result is that far fewer people are working on ML-based dust or noise removal than on tools that generate made-up images or videos from scratch.
I find it enlightening to view it in the context of coding.
GitHub Copilot assists programmers, while ChatGPT replaces the entire process. There are pros and cons though:
GitHub Copilot is hard to use for non-programmers, but can be used to assist in the creation of complex programs.
ChatGPT is easy to use for non-programmers, but is usually restricted to making simple scripts.
That doesn't mean ChatGPT is useless for professional programmers, though, if you just need to make something simple.
I think a similar dynamic happens in art. Both types of tools are awesome, they're just for different demographics and have different limitations.
For example, using the coding analogy: MidJourney is like ChatGPT. Easy to use, but hard to control. Good for random people. InvokeAI, Generative Fill and this new tool are like Copilot. Hard to use for non-artists, but easier to control and customise. Good for artists.
However, I do find it frustrating how most of the funding in AI art tools goes towards the easy-to-use side instead of the easy-to-control side (coding doesn't seem to share this imbalance, since Copilot is better developed than ChatGPT's coding). More funding and development for the easy-to-control type would be very welcome indeed!
(Note: ControlNet is probably a good example of easy-to-control. There's a very high skill ceiling in using Stable Diffusion right now.)
As a programmer, I find Copilot a pretty decent tool, thanks to its good controllability. ChatGPT less so, but it is decent for finding the right keywords or libraries I can look up later.
It's not a deep neural network, but it is a machine learning model. In very simple terms, it minimizes a loss while refining an estimated mesh, which makes it about as much machine learning as old-school KNN or SVM.
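To make that sense of "machine learning" concrete, here's a toy sketch (entirely hypothetical numbers, not the tool's actual code): fitting a single 1D Gaussian's mean and amplitude to a target signal by gradient descent on a squared loss. Real Gaussian splatting optimizes millions of anisotropic 3D Gaussians against a render-vs-photo loss, but the principle is the same.

```python
import numpy as np

# Toy version of "ML = minimizing a loss": fit one 1D Gaussian's
# mean (mu) and amplitude (amp) to a target by gradient descent.

x = np.linspace(-5, 5, 200)
target = 2.0 * np.exp(-0.5 * (x - 1.0) ** 2)   # "ground truth" signal

mu, amp = 0.0, 1.0                              # rough initial estimate
lr = 0.05
for _ in range(500):
    g = np.exp(-0.5 * (x - mu) ** 2)
    residual = amp * g - target                 # prediction error
    # Analytic gradients of the squared loss w.r.t. amp and mu
    d_amp = 2.0 * np.sum(residual * g)
    d_mu = 2.0 * np.sum(residual * amp * g * (x - mu))
    amp -= lr * d_amp / len(x)
    mu -= lr * d_mu / len(x)

print(f"mu={mu:.2f}, amp={amp:.2f}")  # converges toward mu=1.0, amp=2.0
```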
AI means nothing as a word; it is basically as descriptive as "smart" or "complicated". But yes, it's a very clever algorithm invented by clever people that is finding some nice applications.
But most marketing firms disagree. AI has now absorbed the terms "big data" and "algorithm" in many places. The new Ryzen AI processor, Apple intelligence, NVIDIA AI upscaling, and HP AI printer all refer to much smaller models or algorithms.
The most logical use of this is to replace mesh-transform tools in Photoshop or Adobe Illustrator. In this case, you probably work with a transparent map anyway.
Another POV: well, generative AI solves the issue I am describing, which should raise the question of why these guys are so emphatic about their thing not interacting with the generative AI thing. If they are not interested in the best technical solutions, what do they bring to the table besides vibes, and how would they compete against even vibesier vibes?
Foreground-background separation is entirely uninteresting. Masking manually is relatively easy, and there are good semi-intelligent tools that make it painless. Sure, it's a major topic discussed within AI papers for some reason, but from an artist's perspective, it doesn't matter much. Masking out the background is generally step one in any image manipulation process, so why is that a weakness?
The splatting process uses a pretty interesting idea, which is to imagine two cameras: one the current “view” of the image, the other 180 degrees opposite looking back, but at a “flat” mirror image of the front. This constrains the splats away from having weird rando shapes. You will emphatically not get the ability to rotate something along a vertical axis here (e.g. “let me just see a little more of that statue’s other side”). You will instead get a nice method to deform/rearrange.
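Geometrically, the two-camera setup boils down to a reflection. This sketch (my reading of the idea, not the paper's actual code) places the virtual second camera by reflecting the real camera's position across a mirror plane through the object:

```python
import numpy as np

# Sketch of the two-camera intuition: the virtual "back" camera is the
# real camera's position reflected across a mirror plane through the
# object, so the splats are constrained from both sides.

def reflect_across_plane(point, plane_point, plane_normal):
    """Reflect `point` across the plane given by a point and a normal."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = np.dot(point - plane_point, n)   # signed distance to the plane
    return point - 2.0 * d * n

cam = np.array([0.0, 0.0, 5.0])          # real camera, 5 units in front
mirror_cam = reflect_across_plane(cam,
                                  np.array([0.0, 0.0, 0.0]),   # plane point
                                  np.array([0.0, 0.0, 1.0]))   # plane normal
print(mirror_cam)  # [ 0.  0. -5.] -- directly behind, looking back
```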
It says it uses a mirror image to do a Gaussian splat. How does that infer any kind of 3D geometry? An image and its mirror are explainable by a simple plane and that's probably what the splat will converge to if given only those 2 images.
Much better than NeRFs in my experience. You do, however, need to clean up the point cloud yourself, and stuff like that.
If you wanted to capture a full 3D scene, my experience with photogrammetry and NeRFs has been that it requires a tremendously large dataset that is meticulously captured. Are Gaussian splat tools more data efficient? How little data can you get away with using?
What are the best open source Gaussian Splat tools for both building and presenting? Are there any that do web visualization particularly well?
I might have to get back into this.
The only open source tool I use in my workflow is CloudCompare (https://www.danielgm.net/cc/), for editing/cleaning point cloud data.
For animation I primarily use TouchDesigner, which is a node-based visual programming environment, exporting splats as point clouds, and Ableton/misc instruments for sound.
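For the "exporting splats as point clouds" step, a minimal sketch of pulling xyz positions out of a splat export. This assumes an ASCII PLY file whose first three vertex properties are x/y/z (an assumption; real splat exports are often binary and carry many extra per-Gaussian properties, where a tool like CloudCompare does the conversion):

```python
# Minimal ASCII PLY position reader (assumes x, y, z are the first
# three vertex properties, which is common but not guaranteed).

def read_ply_positions(path):
    with open(path) as f:
        n_verts = 0
        for line in f:                       # parse the header
            line = line.strip()
            if line.startswith("element vertex"):
                n_verts = int(line.split()[-1])
            elif line == "end_header":
                break
        points = []
        for _ in range(n_verts):             # one vertex per line
            x, y, z = f.readline().split()[:3]
            points.append((float(x), float(y), float(z)))
    return points
```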
No idea about web visualisation, but interesting idea!
I was using Kiri's dev mode and then running that through nerfstudio to make NeRFs, and I'm wondering if Polycam might give higher quality, but I can't seem to find anyone else who's been doing this. I guess I might have to do some tests to compare.
+1 to this workflow. TouchDesigner's Point Transform TOP is great for aligning too.