24 fps * 52 facial 3D markers * 16-bit packed delta planar-projected offsets (x, y) = 19.968 kbps
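A quick sanity check on that arithmetic, in Python (assuming the 16 bits per marker covers the packed (x, y) delta pair, roughly 8 bits per axis; that packing is my reading of the figure, not something stated elsewhere in the thread):

    fps = 24                # frames per second
    markers = 52            # facial 3D markers
    bits_per_marker = 16    # packed delta (x, y) offset per marker

    bitrate_bps = fps * markers * bits_per_marker
    print(bitrate_bps / 1000, "kbps")  # -> 19.968 kbps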
And this is done in Unreal games on a potato graphics card all the time:
https://apps.apple.com/us/app/live-link-face/id1495370836
I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
https://www.unrealengine.com/en-US/metahuman
The artifacts in raster image data are nowhere near what a reasonable model can achieve even at low resolutions. =3
On the other hand, AI videos can easily be mistaken for real people or hyper-realistic physical sculptures.
https://img-9gag-fun.9cache.com/photo/aYQ776w_460svvp9.webm
There's something basic about how light works that traditional computer graphics still fails to grasp. Looking at its output and comparing it to what AI generates is like comparing the work of an amateur to that of an artist. Sure, maybe the artist doesn't always draw all five fingers, but they somehow capture the essence of the image in a seemingly random arrangement of light and dark strokes, while the amateur just tries to do their best and fails in some very significant ways.
One could rely on the media encoder to garble the output enough to look more plausible (people on potato devices are used to looking at garbage content). However, at the end of the day the "uncanny valley" effect takes over every time, even for live-action data in an auto-generated asset, as the missing data can't be "Magically" recovered with 100% certainty.
Bye =3
In movies it can be done with enough manual tweaking by artists and a lot of photographic content around to borrow a sense of reality from.
"Potato" devices by which I assume you mean average phones, currently have better resolutions than PCs had very recently and a lot still do (1080p).
And a photo at 480p still looks more real than anything CGI (not AI).
Your signature is hilarious. I won't comment about the reasons because I don't want this whole thread to get flagged.
https://www.youtube.com/watch?v=vJG698U2Mvo
Several 8-bit games had their own aesthetic charm, but were at least fun...
Cheers, =3
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
For the record, I found LivePortrait to be well within the uncanny valley. It looks great for AI-generated avatars, but the difference is very perceptually noticeable on familiar faces. Still, it's great.
There is "Is identical", "looks identical" and "has lost sufficient detail to clearly not be the original." - being able to differentiate between these three states is useful.
The other two are variations of lossy.
Calling one of them "perceptually lossless" is cheating, to the disadvantage of algorithms that honestly advertise themselves as lossy while still achieving "looks identical" compression.
It's also used in the first paragraph of the Wikipedia article on the term "transparency" as it relates to data compression.
No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.
I always intend to figure out how that works, because I don't feel a lot of audiophiles are actually speaking the truth in many cases, lol. Still, I don't know; I can't remember my sources to figure it out for myself :/
Are you sure? After all, you can effectively summarize a meeting in plain text, which is extremely restricted in comparison to the original input. Granted, the exact manner of speech and motion and all the subtleties should also be included to be fair, but that information is still far too limited to fill a 20 kbps bandwidth.
We need far more bandwidth only because we don't yet have an efficient way to reconstruct the input faithfully from such highly condensed information. Whenever we actually could, we ended up with a very efficient lossy algorithm that still preserves enough information for us humans. Unless you are strictly talking about lossless compression (which is rather irrelevant to this particular topic), we should expect much more compression in the future, even though it might not be feasible today.
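As a rough sketch of how little bandwidth a plain-text transcript actually needs (the speaking rate and bytes-per-word figures below are my own ballpark assumptions, not numbers from this thread):

    words_per_minute = 150   # typical conversational speaking rate (assumption)
    bytes_per_word = 6       # average English word plus a space, ASCII (assumption)

    bits_per_second = (words_per_minute / 60) * bytes_per_word * 8
    print(bits_per_second / 1000, "kbps")  # ~0.12 kbps, a tiny fraction of 20 kbps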
As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.
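For reference, a minimal way to produce that setting with the stock ffmpeg CLI (wrapped in Python here; the file names are just illustrative, and an ffmpeg build with libx264 is assumed):

    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "input.mp4",   # source file (illustrative name)
        "-c:v", "libx264",             # H.264 via libx264
        "-crf", "18",                  # the "perceptually lossless" quality target
        "-preset", "slow",             # slower preset for better compression
        "output.mp4",
    ], check=True)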
It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.
[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...
"As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach
"no perceived loss" is a perfectly internally consistent and sensible concept and is actually orthogonal to whether it's actually lossless or lossy.
For instance an actually lossless block of data could be perceptually lossy if displayed the wrong way.
In fact, even actual lossless data is always actually lossy, and only ever "perceptually lossless", and there is no such thing as actually lossless, because anything digital is always only a lossy approximation of anything analog. There is loss both at the ADC and at the DAC stage.
If you want to criticize a term for being nonsense misleading dishonest bullshit, then I guess "lossless" is that term, since it never existed and never can exist.
In that scenario it certainly would not be `transparent`, i.e. visually without any lossy artifacts. But to your perception it would look lossless.
The future is going to be weird.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.
It's very cool that this is possible, but the compression use case is indeed... a bit far-fetched. An insanely large model requiring the most expensive consumer GPU on both ends, while at the same time being so limited in bandwidth (22 kbps), is a _very_ narrow scenario.
Though I somewhat doubt even 22 kbps is available generally.
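Some rough arithmetic on the figures quoted above (fp16 weights and 24 fps are my assumptions, not numbers from the article):

    liveportrait_params = 130e6   # quoted parameter count
    dcvc_params = 20e6
    bytes_per_param = 2           # fp16 (assumption)

    print(liveportrait_params * bytes_per_param / 1e6, "MB of weights")  # ~260 MB
    print(dcvc_params * bytes_per_param / 1e6, "MB of weights")          # ~40 MB

    bitrate_bps = 22_000          # the 22 kbps budget mentioned above
    fps = 24                      # assumption
    print(bitrate_bps / fps, "bits per frame")  # ~917 bits/frame

So the decoder drags around hundreds of megabytes of weights while the entire per-frame payload has to fit in under a kilobit.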
BTW, this is the best sci-fi book ever.
Definitely worth giving it a try if you're a programmer, just for the fact that it's written by another programmer: the opening scene where they find a bunch of rules written down and just follow them reminds me of ACPI; the discussion of public-key cryptography and shipping drives full of one-time pads around the galaxy; the "compression scheme" with the video.
It’s certainly true that Vinge doesn’t spend much time on the engineering details, but I find him unusually clear on “imagine if we had this kind of impossible-now technology, but the rest of what we know about physics remained, how would people behave?”
He was, after all, a physics professor.
Rainbows End is much clearer on this than his distant-future stuff, of course.
Actually, he was a mathematics and computer science teacher at San Diego State University.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
However, it does raise an interesting property: if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera, and then the capture software can stop you from looking at your taskbar or off into space.