Instead, the claim is that it’s “nearly™” solved, so the proof being an abandoned repo from half a decade ago actually speaks volumes: it’s solved except for the hard part, and nobody knows how to solve the hard part.
I don’t think you can truly “solve” any problem if you think about it.
If you need a model to move, you give it to a rigger. If the mesh is bad, they will first need to remesh it or the rigging won't work right. This is the problem. They will solve this problem using a manual, labor-intensive process. It's not particularly difficult; any 3D artist who considers themselves a professional ought to be able to do it. But it's not the sort of thing where you just press a button, turn some knobs, and you're done either. It takes a lot of work, and in particularly bad cases it's easier to just start from scratch - remeshing a garbage mesh is indeed harder than modeling from scratch in many cases. Once either of these mechanisms has been applied, the problem will be solved and the rigger can move on to rigging.
So yes, algorithms exist that pretend to remesh, and every professional modeling system has one built in (because your sales guys don't want to be the ones without one), but professionals do not use them in production environments (my original claim, if you recall) because their results are so bad. Indeed, I'm told several meme accounts exist dedicated to how badly these tools screw things up when folks do try to take the shortcut.
If this project were aiming to solve the problem (which is solvable, as I have just explained), they would not have given up 5 years ago. Because it sure isn't solved now.
It gets you somewhat closer, but it's not a fix.
Moreover, depending on what you have at hand, the resolution of your remeshing might destroy a LOT of detail or be unable to accommodate thin sections.
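To make the resolution trade-off concrete, here's a minimal sketch using the trimesh library (the file name and pitch values are made up for illustration, and voxel remeshing is just one crude approach):

    import trimesh

    mesh = trimesh.load("scan.obj")
    # voxelize, then extract a surface; anything thinner than the pitch vanishes
    coarse = mesh.voxelized(pitch=2.0).marching_cubes
    # a finer pitch keeps thin sections but explodes the triangle count
    fine = mesh.voxelized(pitch=0.1).marching_cubes
    print(len(coarse.faces), len(fine.faces))

A single resolution knob can't both preserve detail and keep the mesh light, which is the point.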
Retopo isn't a solved problem. It's only solved for really basic, convex meshes.
In my field, analog IC design, when we hit a wall we often do a literature review with a colleague, and more often than not the results are not relevant for commercial application. Forget about Monte Carlo; sometimes there aren't even full PVT corners.
In research one learns that most (almost all) papers oversell their results, and a lot of stuff is hidden in the "Limitations" section. This is a significant problem, but not that big a problem within academia, as everybody, at least within the field, knows to take the results with a grain of salt. But those outside academia, or outside the field, often don't take this into account.
Academic papers should be read a bit like marketing material or pitch decks.
The wireframe is going to be unrecognizably bad.
Still a ways to go.
Expectation vs. reality: https://i.imgur.com/82R5DAc.png
It absolutely does. But great, let's look forward to Printables being ruined by off-model nonsense.
If the topology is a disaster...no.
If you're hand massaging every poly you're rather defeating the purpose.
That's a bit of an overstatement. Fixing normals is far less time consuming than creating a mesh from scratch. This is particularly a win for people who lack the artistic skill to create the meshes in the first place.
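For instance, the normals pass can be largely automated; here's a minimal sketch with the trimesh library, assuming the generated file loads as a single mesh (file names are hypothetical):

    import trimesh

    mesh = trimesh.load("generated.obj")
    mesh.fix_normals()  # make face winding consistent and normals point outward
    mesh.export("generated_fixed.obj")

Anything beyond that (topology, UVs) is still manual work, but flipped normals specifically are a quick fix.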
I've got a lot of technical 3D skills after using 3DSMax for years as a hobby. Unfortunately I lack the artistic skills to create good looking objects. This would definitely allow me to do things I couldn't before.
> Given a text prompt provided by the user, Stage I creates […] a 3D mesh.
That's a long way of saying, no, I don't think that this introduces a component that specifically goes 2d -> 3d from a single 2d image.
- Image Input to 3D model Output
- 3D model(format) as Input
Question: What is the current state of the art commercially available product in that niche?
But for 3D gen, it's using a model that is more flexible:
It can be conditioned on text or image.
He still needs a moat with its own ecosystem, like the iPhone.
That probably means a bunch of H100s now for this Meta 3D Gen thing, and other yet-unannounced things still incubating in a womb of datasets.
Every time I see text-to-3D, it’s ALWAYS textured. That is the obvious giveaway that it is still garbage.
Show me text to wireframe that looks good and I’ll get excited.
Yes, there are Gaussian splatting, NeRF, and derivatives, but their outputs _really don't look good_. It's also necessary to have the surface remeshed if you go that route, and then you need to retexture it.
Crazy thing, being able to see things to scale and so close up :)
I should have clarified, but yes, I was talking about the extracted surface geometry.
If you use "*" instead of "_" you can write in italic :) just a thought
Maybe people mistakenly think that most standalone Quest games don't have those maps because they don't work? Well, that's not the case. Standalone games (especially on Q2 vs Q3) just have a very low performance budget. You strip out what you can to make your game render at 90 fps for each eye (each eye uses a different camera perspective, so each frame the scene has to be rendered twice).
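The double render is literally a loop over eyes; a minimal per-frame sketch, where render(), scene, head_pose, eye_offset, and projection are hypothetical stand-ins for an engine's API:

    for eye in ("left", "right"):
        view = head_pose @ eye_offset[eye]    # offset the camera to that eye
        render(scene, view, projection[eye])  # full scene draw, twice per frame

So every triangle, shader, and texture lookup is paid for twice, which is exactly why the extra maps get stripped.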
It doesn't feel like an expansive world - it's the same few basic building blocks combined in every possible combination. It doesn't feel intentional or interesting.
I mean, yes it's obvious because the GPU is only so powerful. The difference against my Xbox is night-and-day.
But even if VR is unforgiving of it, it's simply what we've got, at least on affordable devices. These models seem to be perfectly fine for current mainstream VR. Maybe Apple Vision is better, I don't know.
The real fun begins when rigging gets automated. Then full AI scene generation of all the models… then add agency… then the trip never ends.
The existing ones:
- Meshy https://www.meshy.ai/ one of the first movers in this space, though its quality isn't that great
- Rodin https://hyperhuman.deemos.com/rodin newer but folks are saying this is better
- Luma Labs has a 3D generator https://lumalabs.ai/genie but doesn't seem that popular
SAI showed Stable Diffusion 3 pictures of women lying on grass. If you haven’t been following SD3…
https://arstechnica.com/information-technology/2024/06/ridic...
If by "hold for replication outside the specific circumstances of one study" you mean "useful for real-world problems," as implied by your previous comment, then I don't think you are correct.
From a quick search, it seems there are multiple definitions of Reproducibility and Replicability, with some using the words interchangeably, but the most favorable one I found to what you are saying is this definition:
>Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
>[...]
>In general, whenever new data are obtained that constitute the results of a study aimed at answering the same scientific question as another study, the degree of consistency of the results from the two studies constitutes their degree of replication.[0]
However, I think this holds true for a lot of the ML research going on. The issue is not that the solutions do not generalize; it's that the solution itself is not useful for most real-world applications. I don't see what replicability has to do with it. You can train a given model with a different but similar dataset and you will get the same quality of non-useful results. I'm not sure exactly what definition of replicability you are using, though; if there is one I missed, please point it out.
You have to look at this as stepping stone research.
The potential target market is significantly different in scale (I assume; I haven't tried to estimate either). The potential competitors are... already in existence. It seems more likely now that we'll succeed at good 3d-generative-AI than it seemed, before we got good 2d-generative-AI, that we would succeed at that...
The problem is that the output you get is just baked meshes. If the object connects together or has a few pieces, you'll have to essentially undo some of that work. There are similar problems with textures, as the AI doesn't work the way other artists normally do.
All of this is also on top of the output being basically garbage. Input photos ultimately fail in ways that would require so much work to fix that it invalidates the concept. By the time you start to get something approaching decent output, you've put in more work or money than just having someone make it to begin with, while essentially also losing all control over the art pipeline.
[1]: https://github.com/CLAY-3D/OpenCLAY
Or, just throw a PS1 filter on top and make some retro games
Sure, the results are excellent.
> Or, just throw a PS1 filter on top and make some retro games
There are so many creative ways to use these workflows. Consider how much people achieved with NES graphics. The biggest obstacles are tools and marketplaces.
I don’t have time to leave a longer reply, and I still need to read over their entire white paper later tonight, but I’m surprised to see someone who claims to be an artist be convinced that this is “incredible”.
The paper doesn't show topology, UVs or the output texture, so we're left to assume the models look something like what you'd find when using photogrammetry: triangulated blobs with highly segmented UV islands and very large textures. Fine for background elements in a 3D render, but unsuitable for use in a game engine or real-time pipeline.
In my job I've sometimes been given 3D scans and asked to include them in a game. They require extensive cleanup to become usable, if you care about visual quality and performance at all.
That being said, I wonder if the use of signed distance fields (SDFs) results in bad topology.
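It plausibly does: the usual way to get a mesh out of an SDF is marching cubes over a sampled grid, and the triangles follow the grid rather than the surface's natural edge flow. A minimal sketch of the idea (not Meta's actual pipeline) with numpy and scikit-image:

    import numpy as np
    from skimage import measure

    # sample a sphere's signed distance field on a 64^3 grid
    x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
    sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5
    verts, faces, normals, values = measure.marching_cubes(sdf, level=0.0)
    # even a perfect sphere comes out as grid-aligned triangle soup
    print(len(faces), "triangles")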
I saw a paper released earlier this week that seems to build "game-ready" topology --- stuff that might actually be riggable for animation. https://github.com/buaacyw/MeshAnything
A low poly model with good topology can be very easily subdivided and details extruded for higher definition ala Ian Hubert's famous vending machine tutorial: https://www.youtube.com/watch?v=v_ikG-u_6r0
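A rough illustration of how cheap that densification step is, sketched with the trimesh library (file name hypothetical; subdivide() does a midpoint split, quadrupling the face count per pass without changing the shape):

    import trimesh

    low = trimesh.load("lowpoly_machine.obj")
    high = low.subdivide().subdivide()  # 16x the faces, same surface
    print(len(low.faces), len(high.faces))

The extruding and detailing still happen in a tool like Blender, but good topology is what makes the subdivided result pleasant to work with.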
And of course, I'm sure those folks in Shanghai who made the MeshAnything paper did not have access to the datasets or compute power the Meta team had.
I'll use it to upscale 8x all meshes and textures in the original Mafia and Unreal Tournament, write a good bye letter to my family and disappear.
I think the kids will understand when they grow up.
When the person then emerges from this virtual world, it'll be like an egg hatching into a new birth, having learned the lessons in their virtual cocoon.
If you don't like this idea, it's an interesting thought experiment regardless, as we can't verify we're not already in a form of this.
3D has an extremely steep learning curve once you try to do anything non-trivial, especially in terms of asset creation for VR etc. But my real interest is where this leads in terms of real-world items. One of the major hurdles is that in the real world we aren't as forgiving as we are in VR/games. I'm not entirely surprised to see that most of the outputs are "artistic" ones, but I'm really interested to see where this ends up when we can give AI combined inputs from text/photos/LIDAR etc. and have it make the model for a physical item that can be 3D printed.
[1] https://www.technicalchops.com/articles/ai-inputs-and-output...
Me too. My first thought when seeing 2D AI generated images was that 3D would be a logical next step. Compared to pixels, there's so much additional data to work with when training these models I just assumed that 3D would be an easier problem to solve than 2D image generation. You have 3D point data, edges, edge loops, bone systems etc and a lot of the existing data is pretty well labeled too.
Still, getting from static models to something that animates is necessary :(
It will be interesting to see if AI art (and AI 3D models) will mean that we instead see interesting games created by programmers without having to hire any artists.
What I do not look forward to is the predictable spam flood of games created without either artists or programmers.
Generating the assets with AI or buying them from the asset store is not going to change the number of generic games put out there, I think; at best, AI gen can make some of them a bit more unique.
Just let the market sort it out. I for one can't wait for the next Cyriak or Sakupen, that can wield the full power of AI assistance to create their unique vision of what a game can be.
Really good games will still employ really good artists.
I'm an artist and a gentleman coder and I'm disgusted and offended by careless work. But I don't think I need to die on the hill of stopping infinite crappy game mills from having access to infinite crappy art.
[edit] I'm also just bitter after years working on pretty great original art / mechanics driven casual games that only garnered tiny devoted fan bases, and so I assume that when it comes to the kinds of long tail copycat games you're talking about, especially with AI art, no one's going to bother playing them anyway.
It seems as odd to me as bemoaning the way word processors let people write novels without even being good typists.
Picasso produced 50,000 works in his career[1], about two per day every day. So probably considerably more on some days.
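(A rough sanity check, assuming a working life of about 75 years: 50,000 / (75 × 365) ≈ 1.8 per day, so "about two per day" holds up.)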
It’s harder to find data on great art from relative novices. But consider the opposite — how much bad art is there from people who put their 10,000 hours or whatever in? I’m willing to believe some correlation between time spent and quality, but I am not willing to believe that tools that make artists more efficient necessarily reduce quality.
1. https://www.guggenheim.org/teaching-materials/selections-fro...
What I look for is that the artist knows what they want and that the ideas they're putting on the page are thoughtful, coherent, original, and well-executed in a style that's unique enough to justify hiring them personally. And the ability to hone ideas into visual form is not innate, nor have I ever seen it successfully done by someone who didn't spend countless hours trying and failing first.
For example, upper management, who spend time looking at and approving art pieces, almost never understand that altering them is going to make them worse. "Add something here" or "take this out", generally undermine the piece when coming from someone not trained and experienced. Writing prompts is much the same as being a manager. You never get exactly the result you expect for what you asked, but that is also because you did not have the exact vision in your own mind of how it would look before it was executed.
Practice is about developing that vision. Once you have that vision, execution is the easy part, and you don't really need a tool to draw it for you. In any case, the tool will not draw it the way you see it.
So yes, a songwriter who's written tons of songs can suddenly write a good one in 30 minutes. Most of my best songs were written longhand with no edits. That happens sometimes after writing hundreds of songs that you throw away.
Similarly, I've been coding for 25 years. Putting my fingers on the keys and typing out code is the easy part. I don't need copilot to do that for me. I don't really need a fancy IDE. What practice gives is the ability to see the best way to do something and how it fits into a larger project.
If a tool could read the artist's mind and draw exactly what the artist sees, it would be crystal clear that 10,000 hours of trial and error in image-making results in a thought process that makes great art possible (if the artist is capable of it at all). The effort is mostly in the process of developing that mental skill set.
1. 3D is actually a broad collection of formats and not a single thing or representation. This is because of the deep relation between surface topology and render performance.

2. 3D is much more challenging to use in any workflow. It's much more physical, and the ergonomics of 3DOF make it naturally hard to place in as many places as 2D.

3. 3D is much more expensive to produce per unit of value, in many ways. This is why, for example, almost every indie web comic artist draws in 2D instead of 3D. In an AI-first world it might be less "work" but will still be leaps and bounds more expensive.
In my opinion, the media that have the most appeal for genAI are basically (in order)
- images
- videos
- music
- general audio
- 2D animation
- tightly scoped 3D experiences such as avatars
- games
- general 3D models
My conclusion from being in this space was that there’s likely a world where 3D-style videos generated from pixels are more poised to take off than 3D as a data type.
USD was designed for letting very large teams collaborate on making animated movies.
It's actually terrible as an interchange format. e.g. The materials/shading system is both overengineered and underdefined. It is entirely application specific.
For the parts of USD that do work well, we have better and/or more widely supported standards, like GLTF and PBR.
It was a very dumb choice to push this anywhere.
3D need not be so complicated! We've kinda made it complicated, but a simplification wave is likely coming.
The big unlock for 3d though will have to be some better compression technology. Games and 3d experiences are absolutely massive.
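For a sense of the scale involved (my numbers, assuming uncompressed float32 vertex attributes): a 1M-vertex mesh with positions (12 B), normals (12 B), and UVs (8 B) per vertex is 32 MB before index buffers and textures, which is why formats like GLTF added Draco mesh compression.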
Probably many areas that we already use 3D assets/texturing for. Maybe objects to fill out an architectural render, CG in movies/TV shows, 3D printing, or just as an inspiration/mock-up to build off of. I'd imagine this generator is less useful for product design/manufacturing at the moment due to the lack of precise constraints - but maybe once we get the equivalent of ControlNets.
If weights are released, it may also serve as a nice foundation model, or synthetic data generator, for other 3D tasks (including non-generative tasks like defect detection), in the same way Stable Diffusion and Segment Anything have for 2D tasks.
> I cannot see VR replacing the interactions we have. It requires cumbersome, expensive hardware
Currently sure, but it's been a reasonably safe bet that hardware will get smaller and cheaper. Something like the Bigscreen Beyond already has a fairly small form factor.
But, I feel you're basing judgement of a 3D generator on one currently-niche potential use of 3D assets, that being VR/AR user interfaces (and in particular ones intended to replace a phone rather than, for instance, the interactive interfaces within VR games/experiences).
> The potential availability of very expensively (in terms of computing power) generated assets doesn't change that
Even just comparing computing power and not the human labour required, this is probably going to be an extremely cheap way to generate assets. The paper reports 30 seconds for AssetGen, then a further 20 seconds for TextureGen - both being feed-forward generators. They don't mention which GPU, but previous similar models have run in a couple of minutes on consumer GPUs.
Well played, Meta.