24 fps * 52 facial 3D markers * 16-bit packed delta planar-projected offsets (x, y) = 19.968 kbps
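A quick sanity check on that arithmetic, in Python (assuming the 16 bits per marker covers the packed (x, y) delta pair, roughly 8 bits per axis; that packing is my reading of the figure, not something stated elsewhere in the thread):

    fps = 24                # frames per second
    markers = 52            # facial 3D markers
    bits_per_marker = 16    # packed delta (x, y) offset per marker

    bitrate_bps = fps * markers * bits_per_marker
    print(bitrate_bps / 1000, "kbps")  # -> 19.968 kbps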
And this is done in Unreal games on a potato graphics card all the time:
https://apps.apple.com/us/app/live-link-face/id1495370836
I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
https://www.unrealengine.com/en-US/metahuman
The artifacts in raster image data are nowhere near what a reasonable model can achieve even at low resolutions. =3
On the other hand, AI videos can easily be mistaken for real people or hyper-realistic physical sculptures.
https://img-9gag-fun.9cache.com/photo/aYQ776w_460svvp9.webm
There's something basic about how light works that traditional computer graphics still fails to grasp. Looking at its output and comparing it to what AI generates is like comparing the work of an amateur to that of an artist. Sure, maybe the artist doesn't always draw all five fingers, but they somehow capture the essence of the image in a seemingly random arrangement of light and dark strokes, while the amateur just tries to do their best and fails in some very significant ways.
One could rely on the media encoder to garble the output enough to look more plausible (people on potato devices are used to looking at garbage content). However, at the end of the day the "uncanny valley" effect takes over every time, even for live-action data in an auto-generated asset, as the missing data can't be "Magically" recovered with 100% certainty.
Bye =3
In movies it can be done with enough manual tweaking by artists and a lot of photographic content around to borrow a sense of reality from.
"Potato" devices by which I assume you mean average phones, currently have better resolutions than PCs had very recently and a lot still do (1080p).
And a photo at 480p still looks more real than anything CGI (not AI).
Your signature is hilarious. I won't comment about the reasons because I don't want this whole thread to get flagged.
https://www.youtube.com/watch?v=vJG698U2Mvo
Several 8-bit games had their own aesthetic charm, but were at least fun...
Cheers, =3
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
For the record, I found LivePortrait to be well within the uncanny valley. It looks great for AI-generated avatars, but the difference is very perceptually noticeable on familiar faces. Still, it's great.
There is "Is identical", "looks identical" and "has lost sufficient detail to clearly not be the original." - being able to differentiate between these three states is useful.
The other two are variations of lossy.
Calling one of them "perceptually lossless" is cheating, to the disadvantage of algorithms that honestly advertise themselves as lossy while still achieving "looks identical" compression.
It's also used in the first paragraph of the Wikipedia article on the term "transparency" as it relates to data compression.
No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.
I always intend to figure out how that works, because I don't feel a lot of audiophiles are actually speaking the truth in many cases, lol. Still, I don't know; I can't remember my sources to figure it out for myself :/
Are you sure? After all, you can effectively summarize a meeting in plain text, which is extremely restricted in comparison to the original input. Granted, the exact manner of speech and motion and all the subtleties should also be included to be fair, but that information is still far too limited to fill a 20 kbps bandwidth.
We need far more bandwidth only because we don't yet have an efficient way to reconstruct the input faithfully from such highly condensed information. Whenever we actually could, we ended up with a very efficient lossy algorithm that still preserves enough information for us humans. Unless you are strictly talking about lossless compression (which is rather irrelevant to this particular topic), we should expect much more compression in the future, even though it might not be feasible today.
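As a rough sketch of how little bandwidth a plain-text transcript actually needs (the speaking rate and bytes-per-word figures below are my own ballpark assumptions, not numbers from this thread):

    words_per_minute = 150   # typical conversational speaking rate (assumption)
    bytes_per_word = 6       # average English word plus a space, ASCII (assumption)

    bits_per_second = (words_per_minute / 60) * bytes_per_word * 8
    print(bits_per_second / 1000, "kbps")  # ~0.12 kbps, a tiny fraction of 20 kbps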
As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.
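For reference, a minimal way to produce that setting with the stock ffmpeg CLI (wrapped in Python here; the file names are just illustrative, and an ffmpeg build with libx264 is assumed):

    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "input.mp4",   # source file (illustrative name)
        "-c:v", "libx264",             # H.264 via libx264
        "-crf", "18",                  # the "perceptually lossless" quality target
        "-preset", "slow",             # slower preset for better compression
        "output.mp4",
    ], check=True)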
It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.
[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...
"As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach
"no perceived loss" is a perfectly internally consistent and sensible concept and is actually orthogonal to whether it's actually lossless or lossy.
For instance an actually lossless block of data could be perceptually lossy if displayed the wrong way.
In fact, even actual lossless data is always actually lossy, and only ever "perceptually lossless", and there is no such thing as actually lossless, because anything digital is always only a lossy approximation of anything analog. There is loss both at the ADC and at the DAC stage.
If you want to criticize a term for being nonsense misleading dishonest bullshit, then I guess "lossless" is that term, since it never existed and never can exist.
In that scenario it certainly would not be `transparent`, i.e. visually without any lossy artifacts. But to your perception it would look lossless.
The future is going to be weird.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.
It's very cool that this is possible, but the compression use case is indeed... a bit far-fetched. An insanely large model requiring the most expensive consumer GPU on both ends, while at the same time being so limited in bandwidth (22 kbps), is a _very_ narrow scenario.
Though I somewhat doubt even 22 kbps is available generally.
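Some rough arithmetic on the figures quoted above (fp16 weights and 24 fps are my assumptions, not numbers from the article):

    liveportrait_params = 130e6   # quoted parameter count
    dcvc_params = 20e6
    bytes_per_param = 2           # fp16 (assumption)

    print(liveportrait_params * bytes_per_param / 1e6, "MB of weights")  # ~260 MB
    print(dcvc_params * bytes_per_param / 1e6, "MB of weights")          # ~40 MB

    bitrate_bps = 22_000          # the 22 kbps budget mentioned above
    fps = 24                      # assumption
    print(bitrate_bps / fps, "bits per frame")  # ~917 bits/frame

So the decoder drags around hundreds of megabytes of weights while the entire per-frame payload has to fit in under a kilobit.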
BTW, this is the best sci-fi book ever.
Definitely worth giving it a try if you're a programmer, just for the fact that it's written by another programmer: the opening scene where they find a bunch of rules written down and just follow them reminds me of ACPI; the discussion of public-key cryptography and shipping drives full of one-time pads around the galaxy; the "compression scheme" with the video.
It’s certainly true that Vinge doesn’t spend much time on the engineering details, but I find him unusually clear on “imagine if we had this kind of impossible-now technology, but the rest of what we know about physics remained, how would people behave?”
He was, after all, a physics professor.
Rainbows End is much clearer on this than his distant-future stuff, of course.
Actually, he was a mathematics and computer science teacher at San Diego State University.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
However, it does raise an interesting property: if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera, and then the capture software can stop you from looking at your taskbar or off into space.