2025-02-02
First Online Date: 2024-10-09
Date Posted: 2024-12-05
Date Published: 2025-02-01 (It's being "published" at a conference)
> Concretely, our expression reveals differences in 1116 OS-browser combination pairs (94.9 %).
Very cool to see that they've even gone as far as inferring elements like the likelihood of MS Office being installed on your computer by checking the width of a container with the font 'Leelawadee' specified:
> As this font is a non-free Microsoft font for the Thai Language, we do not expect users without Microsoft Office to have it installed
There is a lot of really interesting information in here beyond what you might have figured out on your own if you've played around with abusing CSS before. So many things that had just never, and probably would never have, occurred to me to try.
It is definitely worth a read (or skim) over the paper to see the lengths they went to in order to figure out some of the unique elements to fingerprint on.
https://news.ycombinator.com/item?id=32165103
In their case, the (shell of a) font file goes a little further and encodes the version of the TeamViewer client that installed it.
Try visiting something like https://abrahamjuliot.github.io/creepjs/ [1] on "identical" incognito mobile devices or desktops and you'll get completely different fingerprint ids
[1] This isn't even the best fingerprint extraction out there, just an easy-to-use open source one; there are some crazy advanced techniques not implemented in it.
What IS the best tool? What other techniques do you know of that it doesn't implement?
> you being logged in in different services that any website can check
how so?
The best fingerprinting tools aren't open source; they're anti-botting services like CAPTCHA providers and probably ad networks.
This particular service implements several popular fingerprinting techniques, but there are so many ways to measure the same thing that even if your fingerprint looks fine on one test, a different test of the same measure could detect it as unique. For example, a user font fingerprint could be implemented via JS tests, canvas rendering tests, or CSS sheets (like in this paper).
The tests that offer the highest degree of hardware variability and uniqueness that I've seen deal with rendering of text and images on canvas.
> how so?
By loading an image that can only be accessed if you're logged in to your Google / Facebook / Twitter accounts and checking whether the image request returned an error. There's a repo that implements this for >30 different websites, but I can't remember its name rn. I'll edit this comment later if I remember what it was called.
I don't understand how this would work? Wouldn't there have to be some kind of cookie/storage that is accessible to third parties in order to know this? AFAIK this is exactly what angered people about Flash due to their use of cross-domain capable "super cookies".
Click the explanation & protection sections for info on how it works
I think this also assumes you are not using any kind of isolation for your tabs. What I don't understand though, is how it could figure out that I am logged into google even though I have third-party cookies disabled.
If you go for a stock browser without changing anything - that means you can't install ublock origin, or noscript, or adjust the cookie settings.
If the fingerprint detects you're running your browser in a VM? Because your canvas/webgl stuff reveals a graphics card that is only seen on VMs, or your mouse movement is characteristic of the way host OSes pass mouse movement to guest OSes? That's an unusual characteristic.
If you freeze the VM and everyone else installs updates? Your configuration will gradually become unusual because of its age.
And of course if you've got a 4k screen but you run your VM at 1920x1080, the gain in anonymity has come at the cost of most of your screen real estate.
Also, if you do manage to completely resist tracking by IP address, by cookies, and by browser fingerprints? Your reward is that Cloudflare and Google ReCaptcha will give you endless tedious challenges. ReCaptcha has a special extra-slow mode, specifically to punish people like you. I hope you like clicking fire hydrants!
The Captcha services don't particularly care, since obviously they don't want to punish people on a fresh system. They care far, far more about whether I'm going through a commercial VPN, doubly so if I'm using Tor. But if I'm really worried about IP tracking, I usually run it through my university network.
Of course, a sufficiently-motivated fingerprinting service can surmount any barrier in theory, with typographic analysis and whatnot. But in practice, websites tend not to care to an extraordinary extent.
E.g., I remember one person who was convinced that Google/YouTube used your specific IP address (not just your cookies, or your geo-IP location) as a major part of ad targeting. But lo and behold, the whole VM setup consistently gave me a generic set of ads, as did just wiping my browser cookies. Of course, their explanation was "Of course they're detecting that you're trying to sniff them out, so they're only giving you generic ads to uphold the conspiracy!" As if cookies + geo-IP weren't more than enough for 99.9% of users they want to display ads to.
1. Measure element dimensions and detect installed fonts (measure a piece of text set in a specific font to see if it's installed)
2. CSS functions (e.g. calc) that produce different results across browsers/systems
3. Detecting browser-specific CSS property differences (e.g. render a file input, measure it)
seems like you have to allow `@container` checks or something similar for this to work in order to then make your network request `#something { background-image: url('/x-browser-y-os-detected'); }`
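To make the `@container` plus `background-image` idea concrete, here's a rough sketch in Python that just generates such a stylesheet. Everything specific in it is my assumption, not the paper's actual expression: the `ch`-sized probe element, the `/probe/...` URL scheme, the `f0`/`hit-80` class names, and the matching markup (something like `<div class="f0"><div class="hit-80"></div></div>`). The idea is that `10ch` resolves to a different pixel width depending on which font in the list is actually available, and each container query that matches triggers a request to a distinct URL.

```python
from urllib.parse import quote


def font_probe_css(fonts, widths_px, fallback="monospace", beacon_prefix="/probe"):
    """Build a stylesheet with one ch-sized container per font and
    container-query rules that request a distinct URL per width threshold.

    All names and URL paths here are hypothetical, chosen for illustration.
    """
    rules = []
    for i, font in enumerate(fonts):
        name = f"f{i}"
        # The container is 10ch wide. 'ch' follows the first available font,
        # so its pixel width should differ when `font` is installed.
        rules.append(
            f'.{name} {{ font-family: "{font}", {fallback}; font-size: 16px; '
            f"width: 10ch; container-type: inline-size; container-name: {name}; }}"
        )
        for w in widths_px:
            # One beacon URL per (font, threshold); the server sees which fire.
            url = f"{beacon_prefix}/{quote(font)}/{w}"
            rules.append(
                f"@container {name} (min-width: {w}px) {{ "
                f".{name} > .hit-{w} {{ background-image: url('{url}'); }} }}"
            )
    return "\n".join(rules)


if __name__ == "__main__":
    print(font_probe_css(["Leelawadee"], [80, 96]))
```

Since the rules use `min-width`, every threshold at or below the real width matches, so the server would take the largest threshold it saw per font. The thresholds themselves would have to be calibrated against real devices, which this sketch doesn't attempt.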
some browsers try to randomize fingerprintable parameters but that's easy to detect
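One way the detection might work, as a minimal sketch: read the same parameter several times in one session and flag anything that doesn't come back identical. The function name and the sample data are made up for illustration, and this only catches noise that changes on every read; a browser that keeps its noise stable for the whole session would need a different check.

```python
def randomized_parameters(samples):
    """Return the names of parameters whose repeated readings disagree.

    `samples` maps a parameter name to a list of readings taken in one
    session. A stable parameter yields identical readings each time;
    per-read randomization does not.
    """
    return sorted(
        name for name, readings in samples.items() if len(set(readings)) > 1
    )


if __name__ == "__main__":
    # Hypothetical readings: the canvas hash changes per read, the width doesn't.
    session = {
        "canvas_hash": ["a1", "b7", "c3"],
        "screen_width": [1920, 1920, 1920],
    }
    print(randomized_parameters(session))
```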
1) You can fingerprint devices using CSS
2) You can make server calls using CSS to exfiltrate the client-side data
Stopping (2) would limit the utility of (1).
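For a sense of what the server side of (2) could look like, here's a small Python sketch that groups the probe URLs each client requested and hashes the set into an ID. The `/probe/` prefix, the `(client_key, path)` log format, and the 16-character truncated SHA-256 are all assumptions for illustration, not something taken from the paper.

```python
import hashlib
from collections import defaultdict


def fingerprints_from_requests(requests, beacon_prefix="/probe/"):
    """Derive a fingerprint per client from the beacon URLs it requested.

    `requests` is an iterable of (client_key, path) pairs, roughly what a
    server access log would record. Paths outside `beacon_prefix` are ignored.
    """
    seen = defaultdict(set)
    for client_key, path in requests:
        if path.startswith(beacon_prefix):
            seen[client_key].add(path)
    # Sort so the ID doesn't depend on the order the requests arrived in.
    return {
        client: hashlib.sha256("\n".join(sorted(paths)).encode()).hexdigest()[:16]
        for client, paths in seen.items()
    }


if __name__ == "__main__":
    log = [
        ("a", "/probe/Leelawadee/96"),
        ("a", "/probe/x/80"),
        ("b", "/probe/x/80"),
        ("b", "/probe/Leelawadee/96"),
        ("c", "/probe/x/80"),
        ("c", "/favicon.ico"),
    ]
    for client, fp in fingerprints_from_requests(log).items():
        print(client, fp)
```

Clients "a" and "b" requested the same set of probe URLs and end up with the same ID, while "c" gets a different one. If the stylesheet can't trigger those requests in the first place, there is nothing for the server to hash, which is the point about stopping (2).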