edit: While the title says "personal", Jensen did say this was aimed at startups and similar, so not your living room necessarily.
The only thing it really competes with is the Mac Studio for LocalLlama-type enthusiasts and devs. It isn't cheap enough to dent the used market, nor powerful enough to stand in for bigger cards.
Running a ~96GB model isn't cheap elsewhere (with 128GB of unified memory, 25% is often reserved for the CPU), so maybe it will win there.
[0] https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
Maybe there will be storage options of 1, 2, and 4TB and optional 25/100/200/400 Gbit interfaces. Or maybe everything except the CPU/GPU is constant, with 50%, 75%, or 100% of the CPU/GPU cores enabled so they can bin their chips.
It's basically the successor to the AGX Orin and in line with its pricing (considering it comes with a fast NIC). The AGX Orin had RTX 3050 levels of performance.
I hope to see new Jetsons based on Blackwell sometime in 2026 (they tend to be slow to release those).
It uses different Arm processor cores than Digits, i.e. Neoverse V3AE, the automotive-enhanced version of Neoverse V3 (which is the server version of Cortex-X4). According to rumors, NVIDIA Thor might have 14 Neoverse V3AE cores in the base version, and there is also a double-die version.
The GPU of NVIDIA Thor is also a Blackwell, but probably with a very different configuration than in NVIDIA Digits.
NVIDIA Thor, like Orin, is intended for high reliability applications, like in automotive or industrial environments, unlike NVIDIA Digits, which is made with consumer-level technology.
However, we do know that it offers 1/4 the TOPS of the new 5090. It will be less powerful than the $600 5070. Which, of course, it will be, given the power limitations.
The only really compelling value is that Nvidia memory-starves its desktop cards so severely. It's the small opening that Apple found, even though Apple's FP4/FP8 performance is a world below what Nvidia is offering. So purely from that perspective this is a winning product, as 128GB opens up a lot of possibilities. But from a raw performance perspective, it's actually going to pale compared to other Nvidia products.
At FP32 (and FP16, assuming the consumer cards are still neutered), the 5090 apparently does ~105-107 TFLOPS, and the full GB202 ~125 TFLOPS. That means a non-neutered GB202-based card could hit ~250 TFLOPS of FP16, which lines up neatly with 1 PFLOP of FP4.
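As a rough sanity check of that scaling (a sketch in Python; the TFLOPS figures are the approximate ones quoted above, and the 2x-per-precision-halving assumption is mine, not an official spec):

    # Approximate dense-throughput scaling from the figures quoted above.
    fp32_full_gb202 = 125              # ~TFLOPS, full GB202 at FP32 (approximate)
    fp16 = 2 * fp32_full_gb202         # assume 2x per precision halving -> 250 TFLOPS
    fp4 = 2 * 2 * fp16                 # two more halvings -> 1000 TFLOPS = ~1 PFLOP
    print(fp16, fp4)                   # 250 1000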
In reality, FP4 is more-than-linearly efficient relative to FP32. They quoted FP4 and not FP8/FP16 for a reason. I wouldn't be too surprised if it doesn't even support FP32, or maybe even FP16. Plus, they likely cut RT cores and other graphics-related features, making for a smaller and therefore more power-efficient chip, because they're positioning this as an "AI supercomputer" and this hardware doesn't make sense for most graphical applications.
I see no reason this product wouldn't come to market - besides the usual supply/demand. There's value for a small niche and particular price bracket: enthusiasts running large q4 models, cheaper but slower vs. dedicated cards (3x-10x price/VRAM) and price-competitive but much faster vs. Apple silicon. It's a good strategic move for maintaining Nvidia's hold on the ecosystem regardless of the sales revenue.
A bit hard to tell what's on offer on the GPU side; I wouldn't be surprised if it lands somewhere in the RTX 4070 to 5070 range.
If the price/perf is high enough, $3k wouldn't be a bad deal. I suspect a Strix Halo (better CPU cores, 256GB/sec memory interface, likely slower GPU cores) will be better price/perf: same max unified memory, and cheaper.
A lot of people have been justifying their Mac Studio or Mac Pro purchases by the potential for running large AI models locally. Project Digits will be much better at that for cheaper. Maybe it won't compile Chromium as fast, but that's not what it's for.
[0]: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
The processor uses completely different cores, and the GPU is somewhere around a 5070 in TOPS.
Seems like the storage will have options, because it's "up to 4TB". Unsure if there will be differently binned CPUs (clock or number of cores), or whether the ConnectX NIC will be optional or offered at different speeds.
https://www.databricks.com/blog/llm-inference-performance-en...
[0]: https://newsroom.arm.com/blog/arm-nvidia-project-digits-high...
That said, enthusiasts do help drive a lot of the improvements to the tech stack so if they start using this, it’ll entrench NVIDIA even more.
Do we need more of those? We need plumbers and people that know how to build houses. We are completely full on founders and executives.
True passion for one's career is rare, despite the clichéd platitudes encouraging otherwise. That's something we should encourage and invest in regardless of the field.
I mean, this is awfully close to being "Her" in a box, right?
Also, it’s $3000. For that you could buy subscriptions to OpenAI etc and have the dystopian partner everywhere you go.
Also, I don't particularly want my data to be processed by anyone else.
Or efficiency gains in hardware and software catchup making current price point profitable.
If that is true, their path to profitability isn't super rocky. Their path to achieving their current valuation may end up being trickier though!
We still schedule "bi-weekly" meetings.
We can't agree on which way charge goes in a wire.
Have you seen the y-axis on an economists chart?
Plus, YouTube and Google Images are already full of AI-generated slop, and people are already tired of it. "AI fatigue" amongst the majority of general consumers is a documented thing. Gaming fatigue is not.
It is. You may know it as the "I prefer to play board games (and feel smugly superior about it) because they're ${more social, require imagination, $whatever}" crowd.
"The global gaming market size was valued at approximately USD 221.24 billion in 2024. It is forecasted to reach USD 424.23 billion by 2033, growing at a CAGR of around 6.50% during the forecast period (2025-2033)"
Much of the growth in gaming of late has come from exploitive dark patterns, and those dark patterns eventually stop working because users become immune to them.
They did not collapse, they moved to smartphones. The "free"-to-play gacha portion of the gaming market is so successful it is most of the market. "Live service" games are literally traditional game makers trying to grab a tiny slice of that market, because it's infinitely more profitable than making actual games.
>those dark patterns eventually stop working because users become immune to them.
Really? Slot machines have been around for generations and have not become any less effective. Gambling of all forms has relied on the exact same physiological response for millennia. None of this is going away without legislation.
Slot machines are not a growth market. The majority of people wised up to them literal generations ago, although enough people remain susceptible to maintain a handful of city economies.
> They did not collapse, they moved to smartphones
Agreed, but the dark patterns being used are different. The previous dark patterns became ineffective. The level of sophistication of psychological trickery in modern f2p games is far beyond anything Farmville ever attempted.
The rise of live service games also does not bode well for infinite growth in the industry, as there are only so many hours in the day for playing games, and even the evilest player-manipulation techniques can only squeeze so much blood from a stone.
The industry is already seeing the failure of new live service games to launch, possibly analogous to what happened in the MMO market when there was a rush of releases after WoW. With the exception of addicts, most people can only spend so many hours a day playing games.
Do I buy a MacBook with a silly amount of RAM when I only want to mess with images occasionally?
Do I get a big Nvidia card, topping out at 24GB - still small for some LLMs, but at least I could occasionally play games on it?
No. There's already too much porn on the internet, and AI porn is cringe and will get old very fast.
The cutting edge will advance, and convincing bespoke porn of people's crushes/coworkers/bosses/enemies/toddlers will become a thing. With all the mayhem that results.
This hardware is only good for current-generation "AI".
(example: a thumbnail for a YT video about a video game, featuring AI-generated art based on that game. because copyright reasons, in my very limited experience Dall-E won't let you do that)
I agree that AI porn doesn't seem a real market driver. With 8 billion people on Earth I know it has its fans I guess, but people barely pay for porn in the first place so I reallllly dunno how many people are paying for AI porn either directly or indirectly.
It's unclear to me if AI generated video will ever really cross the "uncanny valley." Of course, people betting against AI have lost those bets again and again but I don't know.
I needed an uncensored model in order to, guess what, make an AI draw my niece snowboarding down a waterfall. All the online services refuse on basis that the picture contains -- oh horrors -- a child.
"Uncensored" absolutely does not imply NSFW.
How so?
Only 40% of gamers use a PC, a portion of those use AI in any meaningful way, and a fraction of those want to set up a local AI instance.
Then someone releases an uncensored, cloud based AI and takes your market?
Titanic - so about to hit an iceberg and sink?
Suppose you're a content creator and you need an image of a real person or something copyrighted like a lot of sports logos for your latest YouTube video's thumbnail. That kind of thing.
I'm not getting into how good or bad that is; I'm just saying I think it's a pretty common use case.
Incredible fumble for me personally as an investor
And if you truly did predict that Nvidia would own those markets and that those markets would be massive, you could have also bought Amazon, Google or heck even Bitcoin. Anything you touched in tech would have made you a millionaire, really.
Surely a smaller market than gamers or datacenters.
It’s purely an ecosystem play imho. It benefits the kind of people who will go on to make potentially cool things and will stay loyal.
100%
The people who prototype on a 3k workstation will also be the people who decide how to architect for a 3k GPU buildout for model training.
It will be massive for research labs. Most academics have to jump through a lot of hoops to get to play with not just CUDA, but also GPUDirect/RDMA/Infiniband etc. If you get older/donated hardware, you may have a large cluster but not newer features.
Also why AWS is giving Trainium credits away for free.
I have a bit of an interest in games too.
If I could get one platform for both, I could justify 2k maybe a bit more.
I can't justify that for just one half: running games on Mac, right now via Linux: no thanks.
And on the PC side, nvidia consumer cards only go to 24gb which is a bit limiting for LLMs, while being very expensive - I only play games every few months.
Maybe (LP)CAMM2 memory will make model usage just cheap enough that I can have a hosting server for it and do my usual midrange gaming GPU thing before then.
I do hope that a AMD Strix Halo ships with 2 LPCAMM2 slots for a total width of 256 bits.
No one goes to an Apple store thinking "I'll get a laptop to do AI inference".
Performance is not amazing (roughly 4060 level, I think?) but in many ways it was the only game in town unless you were willing and able to build a multi-3090/4090 rig.
Since the current MacOS comes built in with small LLMs, that number might be closer to 50% not 0.1%.
If what you say is true, you were among the first 100 people on the planet doing this; which, btw, further supports my argument about how extremely rare that use case is for Mac users.
I wonder how it would go as a productivity/tinkering/gaming rig? Could a GPU potentially be stacked in the same way an additional Digit can?
Plus you have fast interconnects, if you want to stack them.
I was somewhat attracted by the Jetson AGX Orin with 64 GB RAM, but this one is a no-brainer for me, as long as idle power is reasonable.
Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.
I don't think Digits will perform well either.
If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.
They are good for single-batch inference and have very good tok/sec/user. Ollama works perfectly on a Mac.
And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment and why give more away for free? Good for the company and investors, bad for us consumers.
Qwen 2.5 32B on openrouter is $0.16/million output tokens. At your 16 tokens per second, 1 million tokens is 17 continuous hours of output.
Openrouter will charge you 16 cents for that.
I think you may want to reevaluate which is the real budget choice here
Edit: elaborating, that extra 16GB ram on the Mac to hold the Qwen model costs $400, or equivalently 1770 days of continuous output. All assuming electricity is free
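Working that out explicitly (a sketch; the $0.16/M-token price, 16 tok/s, and $400 RAM upgrade figures are the ones quoted above):

    # Break-even between +16GB of Mac RAM and paying the API per output token.
    tok_per_sec = 16
    hours_per_million = 1_000_000 / tok_per_sec / 3600   # ~17.4 hours per 1M tokens
    api_cost_per_million = 0.16                          # USD, output tokens
    ram_upgrade = 400                                    # USD for +16GB
    days = ram_upgrade / api_cost_per_million * hours_per_million / 24
    print(round(hours_per_million, 1), round(days))      # ~17.4 h, ~1800 days (same ballpark as the ~1770 above)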
And log everything too?
I think it isn't about enthusiasts. To me it looks like Huang/NVDA is pushing further a small revolution using the opening provided by the AI wave - up until now the GPU was an add-on to the general computing core, onto which that computing core offloaded some computing. With AI that offloaded computing becomes de facto the main computing, and Huang/NVDA is turning the tables by making the CPU just a small add-on to the GPU, with some general computing offloaded to that CPU.
The CPU being located that "close", and with unified memory - that would stimulate development of parallelization for a lot of general computing so that it would be executed on the GPU, very fast that way, instead of on the CPU. For example, a classic of enterprise computing - databases, the SQL ones: a lot, if not (with some work) everything, in these databases can be executed on a GPU with a significant performance gain vs. the CPU. Why isn't it happening today? Load/unload onto the GPU eats into performance, the complexity of having only some operations offloaded to the GPU is very high in dev effort, etc. Streamlined development on a platform with unified memory will change that. That way Huang/NVDA may pull the rug out from under the CPU-first platforms like AMD/INTC and would own both - the new AI computing as well as a significant share of the classic enterprise one.
No, they can’t. GPU databases are niche products with severe limitations.
GPUs are fast at massively parallel math problems, they aren't useful for all tasks.
Today. For reasons like the ones I mentioned.
>GPUs are fast at massively parallel math problems, they aren't useful for all tasks.
GPUs are fast at massively parallel tasks. Their memory bandwidth is about 10x that of a CPU, for example. So typical database operations that are massively parallel in nature, like join or filter, would run about that much faster.
The majority of computing can be parallelized and thus benefit from being executed on a GPU (with unified memory of a practically usable size for enterprise, like 128GB).
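As a toy illustration of the kind of operation being discussed, here is a minimal columnar filter + aggregate in CuPy (using CuPy as a stand-in is my choice for illustration; this assumes a CUDA GPU and a CuPy install):

    # Toy "SELECT SUM(price * qty) WHERE price > 500" over columns held in GPU memory.
    import cupy as cp

    n = 100_000_000
    prices = cp.random.uniform(0, 1000, n, dtype=cp.float32)  # price column on the GPU
    qty = cp.random.randint(1, 100, size=n)                   # quantity column on the GPU

    mask = prices > 500                           # the WHERE clause, evaluated in parallel
    revenue = cp.sum(prices[mask] * qty[mask])    # the aggregate over the filtered rows
    print(float(revenue))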
https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
"The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation."
https://www.nvidia.com/en-us/data-center/grace-cpu-superchip...
"Grace is the first data center CPU to utilize server-class high-speed LPDDR5X memory with a wide memory subsystem that delivers up to 500GB/s of bandwidth "
As far as I can see, that is about 4x that of Zen 5.
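That ratio roughly checks out if you assume a desktop Zen 5 with dual-channel DDR5-8000 (my assumption; slower DIMMs would make the gap even bigger):

    # Rough bandwidth ratio: Grace's quoted 500 GB/s vs. a dual-channel DDR5-8000 desktop.
    grace_bw = 500                          # GB/s, per the datasheet quote above
    zen5_bw = 2 * 8 * 8000 / 1000           # 2 channels x 8 bytes x 8000 MT/s = 128 GB/s
    print(round(grace_bw / zen5_bw, 1))     # ~3.9x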
Given workload A, how much of the total runtime would JOIN or FILTER take in contrast to the storage engine layer, for example? My gut feeling tells me not much, since to see the actual gain you'd need to be able to parallelize everything, including the storage engine.
IIRC all the startups building databases around GPUs failed to deliver in the last ~10 years. All of them are shut down if I am not mistaken.
How about attaching SSD-based storage to NVLink? :) Nvidia does have the direct-to-memory tech and uses wide buses, so I don't see any issue for them to direct-attach arrays of SSDs if they feel like it.
>IIRC all the startups building databases around GPUs failed to deliver in the last ~10 years. All of them are shut down if I am not mistaken.
As I already said - the model of a database offloading some ops to a GPU with its own separate memory isn't feasible, and those startups confirmed it. Especially when the GPU would have 8-16GB while the main RAM can easily be 1-2TB with 100-200 CPU cores. With 128GB of unified memory like on the GB10 the situation looks completely different (that Nvidia allows only 2 to be connected by NVLink is just market segmentation, not a real technical limitation).
In other words, and hypothetically, if you can improve logical plan execution to run 2x faster by rewriting the algorithms to make use of GPU resources, but physical plan execution remains bottlenecked by the storage engine, then the total sum of gains is negligible.
But I guess there could perhaps be some use case where this could prove to be a win.
On the other hand, with a $5000 MacBook Pro, I can easily load a 70B model and have a "full" MacBook Pro as a plus. I am not sure I fully understand the value of these cards for someone who wants to run personal AI models.
Also, I'm unfamiliar with Macs - is there really a MacBook Pro with 256GB of RAM?
Mac Pro [0] is a desktop with M2 Ultra and up to 192GB of unified memory.
Those Macs with unified memory are a threat he is immediately addressing. Jensen is a wartime CEO from the looks of it; he's not joking.
No wonder AMD is staying out of the high end space, since NVIDIA is going head on with Apple (and AMD is not in the business of competing with Apple).
The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up to a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.
[...]
AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.
Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.
Strix Halo is a replacement for the high-power laptop CPUs from the HX series of Intel and AMD, together with a discrete GPU.
The thermal design power of a laptop CPU-dGPU combo is normally much higher than 120 W, which is the maximum TDP recommended for Strix Halo. The faster laptop dGPUs want more than 120 W only for themselves, not counting the CPU.
So any claims of being surprised that the TDP range for Strix Halo is 45 W to 120 W are weird, like the commenter has never seen a gaming laptop or a mobile workstation laptop.
Normally? Much higher than 120W? Those are some pretty abnormal (and dare I say niche?) laptops you're talking about there. Remember, that's not peak power - thermal design power is what the laptop should be able to power and cool pretty much continuously.
At those power levels, they're usually called DTR: desktop replacement. You certainly can't call it "just a laptop" anymore once we're in needs-two-power-supplies territory.
I do not know what the proportion of gaming laptops and mobile workstations vs. thin-and-light laptops is. While obviously there must be many more light laptops, gaming laptops cannot be a niche product, because there are too many models offered by a lot of vendors.
My own laptop is a Dell Precision, so it belongs to this class. I would not call Dell Precision laptops a niche product, even if they are typically used only by professionals.
My previous laptop was some Lenovo Yoga that also belonged to this class, having a discrete NVIDIA GPU. In general, any laptop having a discrete GPU belongs to this class, because the laptop CPUs intended to be paired with discrete GPUs have a default TDP of 45 W or 55 W, while the smallest laptop discrete GPUs may have TDPs of 55 W to 75 W, but the faster laptop GPUs have TDPs between 100 W and 150 W, so the combo with CPU reaches a TDP around 200 W for the biggest laptops.
I can't find the exact Youtube video, but it's out there.
I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.
This is a genius move. I am more baffled by the insane form factor that can pack this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn’t dream about this stuff even 1 year ago. What a time to be alive!
About that... Not like there isn't a lot to be desired from the linux drivers: I'm running a K80 and M40 in a workstation at home and the thought of having to ever touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.
Xeon Phi failed for a number of reasons, but one where it didn't need to fail was availability of software optimised for it. Now we have Xeons and EPYCs, and MI300C's with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies for Itanium - the very first thing Intel should have made sure it had was good Linux support. They could have it before the first silicon was released. Itaium was well supported for a while, but it's long dead by now.
Similarly, Sun failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (still is, even if a bit niche), and looking the other way on x86 Solaris helped people learn and train on it. Oracle Cloud could, at least, offer it on cloud instances. Would be nice.
Now we see IBM doing the same - there is no reasonable entry level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted on a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBMi, which is also POWER), or even for Linux on POWER, because it's just too easy to build it on other, competing platforms. You can get AIX, IBMi and even IBMz cloud instances from IBM cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).
Sad to see that big companies like Intel and AMD don't understand this, but they've never come to terms with the fact that software killed the hardware star.
And it's not like they were never bitten (Intel has) by this before.
Still unforgivable that their new CPUs hit the market without excellent Linux support.
A real shame it's not running mainline Linux - I don't like their distro based on Ubuntu LTS.
I have to agree the desktop experience of the Mac is great, on par with the best Linuxes out there.
Windows has always been a barrier to hardware feature adoption for Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around to providing hardware support.
Any OS optimizations in Windows had to go through Microsoft. So say you added some instructions, custom silicon or whatever, to speed up enterprise databases, or provide high-speed networking that needed some special kernel features, etc. - there was always Microsoft in the way.
And not just the foot-dragging in communication; even getting a line to their tech people was a problem.
Microsoft would look at every single change: whether or not it would challenge their monopoly, whether or not it was in their business interest, whether or not it kept you, the hardware vendor, in a subservient role.
AMD/Intel work directly with Microsoft when shipping new silicon that requires OS support.
Now they have some competition. This is relatively new, and Satya Nadella reshaped the company because of that.
IBM should see some entry-level products as loss leaders.
Not sure it'd be competitive in price with other workstation-class machines. I don't know how expensive IBM's S1012 deskside is, but with only 64 threads it'd be a meh workstation.
They were propelled by the unexpected LLM boom. But plan 'A' was robotics, in which Nvidia has invested a lot for decades. I think their time is about to come, with Tesla's humanoids at $20-30k and Chinese ones already selling for $16k.
0. https://www.macstadium.com/blog/m4-mac-mini-review
1. https://www.apple.com/mac/compare/?modelList=Mac-mini-M4,Mac...
I’m so tired of this recent obsession with the stock market. Now that retail is deeply invested it is tainting everything, like here on a technology forum. I don’t remember people mentioning Apple stock every time Steve Jobs made an announcement in the past decades. Nowadays it seems everyone is invested in Nvidia and just want the stock to go up, and every product announcement is a mean to that end. I really hope we get a crash so that we can get back to a more sane relation with companies and their products.
That's the best time to buy. ;)
Apple M chips are pretty efficient.
Here's a link to the part of the keynote where he says this:
WSL1 was "Linux API on top of NT kernel picoprocesses", WSL2 is "Linux VM on top of Hyper-V"
https://www.microsoft.com/en-us/research/project/drawbridge/
https://learn.microsoft.com/en-us/archive/blogs/wsl/windows-...
https://www.zdnet.com/article/under-the-hood-of-microsofts-w...
?
Yeah starting at $3,000. Surely a cheap desktop computer to buy for someone who just wants to surf the web and send email /s.
There is a reason why it is for "enthusiasts" and not for the general wider consumer or typical PC buyer.
That end of the market is occupied by Chromebooks... AKA a different GNU/Linux.
For general desktop use, as you described, nearly any piece of modern hardware, from a RasPI, to most modern smartphones with a dock, could realistically serve most people well.
The thing is, you need to serve both low-end use cases like browsing and high-end dev work via workstations, because even for the "average user" there is often one specific program they need to rely on, and which has limited support outside the OS they have grown up with. Of course, there will be some programs like desktop Microsoft Office which will never be ported, but still, Digits could open the doors to some devs working natively on Linux.
A solid, compact, high-performance, yet low power workstation with a fully supported Linux desktop out of the box could bridge that gap, similar to how I have seen some developers adopt macOS over Linux and Windows since the release of the Studio and Max MacBooks.
Again, we have yet to see independent testing, but I would be surprised if anything of this size, simplicity, efficiency and performance was possible in any hardware configuration currently on the market.
A Nvidia Project Digit/GB10 for $3k with 128GB ram does sound tempting. Especially since it's very likely to have standard NVMe storage that I can expand or replace as needed, unlike the Apple solution. Decent linux support is welcome as well.
Here's hoping, if not I can fall back to a 128GB ram AMD Strix Halo/395 AI Max plus. CPU perf should be in the same ballpark, but not likely to come anywhere close on GPU performance, but still likely to have decent tokens/sec for casual home tinkering.
Did see vague claims of "starting at $3k", max 4TB nvme, and max 128GB ram.
I'd expect AMD Strix Halo (AI Max plus 395) to be reasonably competitive.
It's the best "dev board" setup I've seen so far. It might be part of their larger commercial plan but it definitely hits the sweet spot for the home enthusiast who have been pleading for more VRAM.
[0]: https://newsroom.arm.com/blog/arm-nvidia-project-digits-high...
For programs dominated by irregular integer and pointer operations, like software project compilation, 10 Arm Cortex-X925 + 10 Cortex-A725 should have a similar throughput with a 16-core Strix Halo, but which is faster would depend on cooling (i.e. a Strix Halo configured for a high power consumption will be faster).
There is not enough information to compare the performance of the GPUs from this NVIDIA Digits and from Strix Halo. However, it can be assumed that NVIDIA Digits will be better for ML/AI inference. Whether it can also be competitive for training or for graphics remains to be seen.
Are you projecting based on Arm's stated improvements from their last gen? In that case, what numbers are you using as your baseline?
That means a total of 80 execution pipelines for NVIDIA Digits, 48 execution pipelines for Snapdragon Elite and 128 equivalent execution pipelines for Strix Halo, taking into account only the complete execution pipelines, otherwise for operations like FP addition, which can be done in any pipeline, there would be 256 equivalent execution pipelines for Strix Halo.
Because the clock frequencies for multithreaded applications should be similar, if not better for Strix Halo, there is little doubt that the throughput for applications dominated by array operations should be at least 128/80 for Strix Halo vs. NVIDIA Digits, if not much better, because for many instructions even more execution pipelines are available and Zen 5 also has a higher IPC when executing irregular code, especially vs. the smaller Cortex-A725 cores. Therefore the throughput of NVIDIA Digits is smaller or at most equal in comparison with the throughput of 10 cores of Strix Halo.
On the other hand, for integer/pointer processing code, the number of execution units in a Cortex-X925 + a Cortex-A725 is about the same as in 2 Zen 5 cores. Therefore the 20 Arm cores of NVIDIA Digits have about the same number of execution units as 20 Zen 5 cores. Nevertheless, the occupancy of the Zen 5 execution units will be higher for most programs than for the Arm cores, especially because of the bigger and better cache memories, and also because of the lower IPC of Cortex-A725. Therefore the 20 Arm cores must be slower than 20 Zen 5 cores, probably only equivalent to about 15 Zen 5 cores, but the exact equivalence is hard to predict, because it depends on the NVIDIA implementation of things like the cache memories and the memory controller.
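One way those 80/48/128 pipeline counts could be reconstructed (the per-core figures below are my assumptions about what is being counted, not confirmed specs):

    # Assumed 128-bit-equivalent SIMD pipes per core (guesses, for illustration only).
    digits     = 10 * 6 + 10 * 2        # 10x Cortex-X925 (~6 pipes) + 10x Cortex-A725 (~2) -> 80
    snapdragon = 12 * 4                 # 12 cores x ~4 NEON pipes                          -> 48
    strix_fma  = 16 * 2 * (512 // 128)  # 16x Zen 5, 2 full FMA pipes at 512-bit            -> 128
    strix_add  = 16 * 4 * (512 // 128)  # all 4 FP pipes can do FP addition                 -> 256
    print(digits, snapdragon, strix_fma, strix_add)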
Assuming they are not limited by power or heat dissipation I would say that is about as good as it gets.
The hardware is pretty damn good. I am only worried about the software.
NVidia works closely with Microsoft to develop their cards, all major features come first in DirectX, before landing on Vulkan and OpenGL as NVidia extensions, and eventually become standard after other vendors follow up with similar extensions.
Wait, what do you mean exactly? Isn't WSL2 just a VM essentially? Don't you mean it'll run on Linux (which you also can run on WSL2)?
Or will it really only work with WSL2? I was excited as I thought it was just a Linux Workstation, but if WSL2 gets involved/is required somehow, then I need to run the other direction.
Edit: Sorry, I fucked up my math. I wanted to do 40x52x4, $4/hr being the cloud compute price, but that is actually ~$8,300, so it is actually equivalent to about 4.5 months of cloud compute. 40 hours because I presume that this will only be used for prototyping and debugging, i.e. during office hours.
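For reference, the break-even math with the corrected figures (a sketch; the $4/hr rate is the one assumed above):

    # How long $3,000 of office-hours-only cloud compute lasts at ~$4/hr.
    price = 3000                         # USD, Project Digits
    cloud_rate = 4                       # USD per hour (assumed above)
    hours = price / cloud_rate           # 750 hours
    weeks = hours / 40                   # ~18.8 forty-hour weeks
    print(hours, round(weeks, 1), round(weeks / 4.33, 1))   # 750.0, 18.8, ~4.3 months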
This is a marketplace, not cloud pricing.
If this thing was available six months ago I would have bought it instead!
However apparently 10 of the cores are the Cortex-X925 CPUs, which are a serious upgrade. Basically 10 performance cores and 10 efficiency cores that should be pretty competitive with any current apple CPU.
Seems close enough that it might well come down to if your application uses SVE (which the X925 has) or SME (which apple has). I believe generally SVE is much easier to use without using Apple proprietary libraries.
Or if you need significant memory bandwidth: the Apple M4 peaks at around 200GB/sec or so for the CPU; the other 300GB/sec or so is available for the GPU.
Seems quite plausible that 10x X925 and 10x A725 might well have more collective performance than Apple's 12 P-cores + 4 E-cores. But sure, it's a bit early to tell, and things like OS, kernel, compiler, thermal management, libraries, etc. will impact actual real-world performance.
Generally I'd expect the Nvidia Project Digits 10 P-cores + 10 E-cores + healthy memory system to be in the same ballpark as the Apple M4 Max.
They mention 1 PFLOP for FP4, GB200 is 40 PFLOP.
Specs we've seen suggest the GB10 features a 20-core Grace CPU and a GPU that manages about a 40th of the performance of the twin Blackwell GPUs used in Nvidia's GB200 AI server.
It's a garden hermit. Imagine a future where everyone has one of those (not exactly this version but some future version): it lives with you, it learns with you, and unlike cloud-based SaaS AI you can teach it things immediately and diverge from the average to your advantage.
In the past, in Europe, some wealthy people used to look after a scholar living on their premises so they could ask them questions, etc.
> Later, suggestions of hermits were replaced with actual hermits – men hired for the sole purpose of inhabiting a small structure and functioning as any other garden ornament.
In the Furiosa context, it's a bit like a medicine man or shaman, then. A private, unreliable source of verbal hand me downs, whose main utility is to make elites feel like they have access to knowledge without needing to acquire it for themselves or question its veracity.
We really are entering a new dark age.
All the indicators are there:
Instead of leaders like Charlemagne who unified the Frankish domain, stabilized society, and promoted education and culture, we now have leaders who want to dismantle society, education and use culture for wars.
Long-distance ocean trade routes since the 1950s have taken international commerce to another level for humans, but this is being challenged now by aging/leaking tankers, unruly piracy at transit choke points, communication cable destruction, etc.
Loss of interest in classical learning and the arts, where dystopian, murder or horror movies, music and books are now the best sellers, as WW3 seems to be on many people's minds.
While innovations are still occurring for improved navigation and agricultural productivity, the Earth's ecosystem collapse is in full effect.
I wish it could be reversed somehow.
I mean, fair. Very bad hermit-ing.
(Terry Pratchett has a fun parody of this in one of the Discworld books; the garden hermit gets two weeks' holidays a year, which he spends in a large city.)
Maybe it will still make sense to have your personal AI in some data center, but on the other hand, there is the trend of governments and mega corps regulating what you can do with your computer. Try going out of the basics, try to do something fun and edge case - it is very likely that your general availability AI will refuse to help you.
when it is your own property, you get the chance to overcome restrictions and develop the thing beyond the average.
As a result, having something that can do things that no other else can do and not having restrictions on what you can do with this thing can become the ultimate superpower.
Personally I think Strix Halo workstations may come with expandable memory, storage and free PCIe slots. But then you have to deal with ROCm...
Ideally we can configure things like Apple Intelligence to use this instead of OpenAI and Apple's cloud.
But it's clear that everyone's favorite goal is keiretsuification. If you're looking for abnormal profits, you can't do better than to add a letter to FAANG. Nvidia already got into the cloud business, and now it's making workstations.
The era of specialists doing specialist things is not really behind us. They're just not making automatic money, nor most of it. Nvidia excelled in that pool, but it too can't wait to leave it. It knows it can always fail as a specialist, but not as a keiretsu.
https://www.okdo.com/wp-content/uploads/2023/03/jetson-agx-o...
I wonder what the specifications are in terms of memory bandwidth and computational capability.
That said, you can probably boot a Debian or Gentoo system using the Nvidia provided kernel if need be.
Anyone willing to guess how wide?
"According to the Grace Blackwell's datasheet- Up to 480 gigabytes (GB) of LPDDR5X memory with up to 512GB/s of memory bandwidth. It also says it comes in a 120 gb config that does have the full fat 512 GB/s."
via https://www.reddit.com/r/LocalLLaMA/comments/1hvj1f4/comment...
"up to 512GB/s of memory bandwidth per Grace CPU"
https://resources.nvidia.com/en-us-data-center-overview/hpc-...
I'd be happy to be wrong, but I don't see anything from Nvidia that implies a 512 bit wide memory interface on the Nvidia Project DIgits.
This is more accurately a descendant of the HPC variants like the article talks about - intentionally meant to actually be a useful entry level for those wanting to do or run general AI work better than a random PC would have anyways.
I'm mildly skeptical about performance here: they aren't saying what the memory bandwidth is, and that'll have a major impact on tokens-per-second. If it's anywhere close to the 4090, or even the M2 Ultra, 128GB of Nvidia is a steal at $3k. Getting that amount of VRAM on anything non-Apple used to be tens of thousands of dollars.
(They're also mentioning running the large models at Q4, which will definitely hurt the model's intelligence vs FP8 or BF16. But most people running models on Macs runs them at Q4, so I guess it's a valid comparison. You can at least run a 70B at FP8 on one of these even with fairly large context size, which I think will be the sweet spot.)
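A common rule of thumb for batch-1 decoding is tokens/sec ≈ memory bandwidth / bytes of weights read per token. A sketch under that assumption (the ~500GB/s figure is the speculated one discussed elsewhere in the thread, not a confirmed spec, and this ignores KV cache and other overheads):

    # Rough batch-1 decode speed: each generated token streams all active weights once.
    def tok_per_sec(bandwidth_gb_s, params_billions, bytes_per_param):
        return bandwidth_gb_s / (params_billions * bytes_per_param)

    print(round(tok_per_sec(500, 70, 1.0), 1))   # 70B at FP8 -> ~7.1 tok/s
    print(round(tok_per_sec(500, 70, 0.5), 1))   # 70B at Q4  -> ~14.3 tok/s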
Do I understand that right? It seems way too cheap.
Main issue is that the RAM they're using here isn't the same as what's in GPUs.
This isn't competing with cloud, it's competing with Mac Minis and beefy GPUs. And $3000 is a very attractive price point in that market.
I do appreciate that my MBP can run models though!
I get what you're saying, but there are also regulations (and your own business interest) that expect data redundancy/protection, which keeping everything on-site doesn't seem to cover.
“tinybox red and green are for people looking for a quiet home/office machine. tinybox pro is for people looking for a loud compact rack machine.” [0]
>the size of several ATX desktops
For $40,000, a Tinybox pro is advertised as offering 1.36 petaflops processing and 192 GB VRAM.
For about $6,000 a pair of Nvidia Project Digits offer about a combined 2 petaflops processing and 256 GB VRAM.
The market segment for Tinybox always seemed to be people that were somewhat price-insensitive, but unless Nvidia completely fumbles on execution, I struggle to think of any benefits of a Tinygrad Tinybox over an Nvidia Digits. Maybe if you absolutely, positively, need to run your OS on x86.
I'd love to see if AMD or Intel has a response to these. I'm not holding my breath.
2 PFLOPS at FP4.
256 GB RAM, not VRAM. I think they haven't specified the memory bandwidth.
Also, the Tinybox's memory bandwidth is 8064 GB/s, while the Digits seems to be around 512 GB/s, according to speculation on Reddit.
Moreover, Nvidia's announced their RTX 5090s priced at $2k, which could put downward pressure on the price of Tinybox's 4090s. So the Tinybox green or pro models might get cheaper, or they might come out with a 5090-based model.
If you're the kind of person that's ready to spend $40k on a beastly ML workstation, there's still some upside to Tinybox.
It’s obviously not guaranteed to go this route, but an LLM (or similar) on every desk and in every home is a plausible vision of the future.
One can only wish for this, but Nvidia would be going against the decades-long trend to emaciate local computing in favor of concentrating all compute on somebody else's linux (aka: cloud).
Also I consider this a dev board. Soon this tech will be everywhere, in our phones, computers...
You could already plug that into your home assistant and have your own Star Trek computer you can ask questions of. And NVIDIA seems to know this is the future, and they were the first in the market.
This is really a game changer.
They should make a deal with Valve to turn this into 'superconsole' that can run Half Life 3 (to be announced) :)
I'm bracing for a whole new era of insufferable binary blobs for Linux users, and my condolences if you have a non-ultra-mainstream distro.
MediaTek, a market leader in Arm-based SoC designs, collaborated on the design of GB10, contributing to its best-in-class power efficiency, performance and connectivity.
I assume that means USB and such peripherals is MediaTek IP, while the Blackwell GPU and Grace CPU is entirely NVIDIA IP.
That said, NVIDIA hasn't been super-great with the Jetson series, so yeah, will be interesting to see what kind of upstream support this gets.
[1]: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
The owner of the market, Illumina, already ships their own bespoke hardware chips in servers called DRAGEN for faster analysis of thousands of genomes. Their main market for this product is in personalised medicine, as genome sequencing in humans is becoming common.
Other companies like Oxford Nanopore use on-board GPUs to call bases (i.e., from raw electric signal coming off the sequencer to A, T, G, C) but it's not working as well as it could due to size and power constraints. I feel like this could be a huge game changer for someone like ONT, especially with cooler stuff like adaptive sequencing.
Other avenues of bioinformatics, such as most day-to-day analysis software, are still very CPU- and RAM-heavy.
It is of course possible that these chips enable analyses that are currently not possible or are prohibited by cost, but at least for now the limiting factor for genomics will not be compute, but the cost of sequencing (which is currently $400-500 per genome).
I've worked in a project some years ago where we were using data from genome sequencing of a bacteria. Every sequenced sample was around 3GB of data and sample size was pretty small with only about 100 samples to study.
I think the real revolution will happen because code generation through LLMs will allow biologists to write 'good enough' code to transform, process and analyze data. Today to do any meaningful work with genome data you need a pretty competent bioinformatician, and they are a rare breed. Removing this bottleneck is what will allow us to move faster in this field.
At $3,000, it will be considerably cheaper than alternatives available today (except for SoC boards with extremely poor performance, obviously). I also expect that Nvidia will use its existing distribution channels for this, giving consumers a shot at buying the hardware (without first creating a company and losing consumer protections along the way).
$3000 gets me a 64-core Altra Q64-22 from a major-enough SI today: https://system76.com/desktops/thelio-astra-a1-n1/configure
And of course if you don't care about the SI part, then you can just buy that motherboard & CPU directly for $1400 https://www.newegg.com/asrock-rack-altrad8ud-1l2t-q64-22-amp... with the 128-core variant being $2400 https://www.newegg.com/asrock-rack-altrad8ud-1l2t-q64-22-amp...
For certain applications, e.g. for those with many array operations, the 20 cores of Digits might match 40 cores of Altra at equal clock frequency, but the cores of Digits are likely to also have a higher clock frequency, so for some applications the 20 Arm cores of Digits may provide a higher throughput than 64 Altra cores, while also having a much higher single-thread performance, perhaps about double.
So at equal price, NVIDIA Digits is certainly preferable as a workstation instead of a 64-core Altra. As a server, the latter should be better.
There have not been any published benchmarks demonstrating the speed of Cortex-X925 in a laptop/mini-PC environment.
In smartphones, Cortex-X925 and Snapdragon Elite have very similar speeds in single thread.
For multithreaded applications, 10 big + 10 medium Arm cores should be somewhat faster than 12 Snapdragon Elite.
The fact that NVIDIA Digits has a wider memory interface should give it even more advantages in some applications.
The Blackwell GPU should have much better software support in graphics applications, not only in ML/AI, in comparison with the Qualcomm GPU.
So NVIDIA Digits should be faster than a Qualcomm laptop, but unless one is interested in ML/AI applications the speed difference should not be worth the more than double price of NVIDIA.
> lots of people will buy these machines to get an AArch64 Linux workstation—even if they are not interested in AI or Nvidia GPUs.
Still I expect the Nvidia systems will be easier to get, especially for (de jure) consumers.
If one can skip buying a gaming rig with a 5090 at its likely absurd price, then this $3k becomes a lot easier for dual-use hobbyists to swallow.
Edit: the 5090 is $2k.
The 5090 surprised me with the two slot height design while having a 575W power budget.
Putting a screen next to your product that doesn't have video out would be quite disingenuous though. So I'd be surprised if it has zero output at all.
...I guess there is a risk that it has an output but it's more like a CPU iGPU style basic output rather than being powered by the main GPU.
>This paper describes how the performance of AI machines tends to improve at the same pace that AI researchers get access to faster hardware. The processing power and memory capacity necessary to match general intellectual performance of the human brain are estimated. Based on extrapolation of past trends and on examination of technologies under development, it is predicted that the required hardware will be available in cheap machines in the 2020s.
and this is about the first personal unit that seems well ahead of his proposed specs. (He estimated 0.1 petaflops. The nvidia thing is "1 petaflop of AI performance at FP4 precision").
Future versions will get more capable and smaller, portable.
Can be used to train new types models (not just LLMs).
I assume the GPU can do 3D graphics.
Several of these in a cluster could run multiple powerful models in real time (vision, llm, OCR, 3D navigation, etc).
If successful, millions of such units will be distributed around the world within 1-2 years.
A p2p network of millions of such devices would be a very powerful thing indeed.
If you think RAM speeds are slow for transformer inference, imagine what 100Mbps would be like.
If this hypothetical future is one where mixtures of experts is predominant, where each expert fits on a node, then the nodes only need the bandwidth to accept inputs and give responses — they won't need the much higher bandwidth required to spread a single model over the planet.
https://s3.amazonaws.com/cms.ipressroom.com/219/files/20250/...
Source: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
Not sure if that isn't expected though? Likely most people wouldn't even notice, and the company can say they're dogfooding some product I guess.
It doesn't have to be the two Enter/Backspace/Shift keys. The keyboard layout seems almost identical to the Azio L70 keyboard (at least the keys).
Never underestimate how lazy companies with a ~$3 trillion market cap can be.
Absolutely, I'm all for dogfooding! But when you do, make sure you get and use good results, not something that looks like it was generated by someone who just learned about Stable Diffusion :)
Just like Mac OS is free when you buy a Mac, having the latest high-quality LLM for free that just happens to run well on this box is a very interesting value-prop. And Nvidia definitely has the compute to make it happen.
While I'm quite the "AI" sceptic I think it might be interesting to have a node in my home network capable of a bit of this and that in this area, some text-to-speech, speech-to-text, object identification, which to be decent needs a bit more than the usual IoT- and ESP-chips can manage.
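For example, for the speech-to-text piece, something as small as this is the kind of workload I mean (a sketch assuming the open-source openai-whisper package; the model size and file name are placeholders):

    # Minimal local speech-to-text sketch; `pip install -U openai-whisper`.
    import whisper

    model = whisper.load_model("base")               # placeholder model size; uses a GPU if available
    result = model.transcribe("doorbell_clip.wav")   # placeholder audio file
    print(result["text"])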
"DGX OS 6 Features The following are the key features of DGX OS Release 6:
Based on Ubuntu 22.04 with the latest long-term Linux kernel version 5.15 for the recent hardware and security updates and updates to software packages, such as Python and GCC.
Includes the NVIDIA-optimized Linux kernel, which supports GPU Direct Storage (GDS) without additional patches.
Provides access to all NVIDIA GPU driver branches and CUDA toolkit versions.
Uses the Ubuntu OFED by default with the option to install NVIDIA OFED for additional features.
Supports Secure Boot (requires Ubuntu OFED).
Supports DGX H100/H200."
The Nvidia Jetson Nano, an SBC for "AI", debuted with an already-aging custom Ubuntu 18.04, and when 18.04 went EOL, Nvidia abandoned it completely without any further updates to its proprietary JetPack or drivers; without them, the whole machine learning stack (CUDA, PyTorch, etc.) became useless.
I'll never buy an SBC from Nvidia unless all the SW support is upstreamed to the Linux kernel.
In general, Nvidia's relationship with Linux has been... complicated. On the one hand, at least they offer drivers for it. On the other, I have found few more reliable ways to irreparably break a Linux installation than trying to install or upgrade those drivers. They don't seem to prioritize it as a first class citizen, more just tolerate it the bare minimum required to claim it works.
That was years ago, but it happened multiple times and I've been very cautious ever since.
HDR support is still painful, but that seems to be a Linux problem, not specific to Nvidia.
1. https://www.datacenterdynamics.com/en/news/nvidia-updates-ge...
They've also significantly improved support for wayland and stopped trying to force eglstreams on the community. Wayland+nvidia works quite well now, especially after they added explicit sync support.
... as in remember the time a ransomware hacker outfit demanded they release the drivers or else .....
https://www.webpronews.com/open-source-drivers-or-else-nvidi...
> Nvidia's relationship with Linux has been... complicated.
For those unfamiliar with Linus Torvalds' two-word opinion of Nvidia:
Maybe if Nvidia makes it to four trillion in market cap they'll have enough spare change to keep these older boards properly supported, or at least upstream all the needed support.
https://github.com/archlinuxarm/PKGBUILDs/pull/1580
Edit: It's been a while since I did this, but I had to manually build the kernel, overwrite a dtb file maybe (and Linux_for_Tegra/bootloader/l4t_initrd.img) and run something like this (for xavier)
sudo ./flash.sh -N 128.30.84.100:/srv/arch -K /home/aeden/out/Image -d /home/aeden/out/tegra194-p2972-0000.dtb jetson-xavier eth0
(I guess we can put aside the issue of Nvidia's closed source graphics drivers for the moment)
In any case, Ubuntu is what it comes with.
This is more like a micro-DGX then, for $3k.
Compute is evolving way too rapidly to be setting-and-forgetting anything at the moment.
This isn't the 80s when compute doubled every 9 months, mostly on clock scaling.
Revolutionary developments are: multi-layer wafer bonding, chiplets (collections of interconnected dies in one package) and backside power delivery. We don't need the transistors to keep getting physically smaller, we need more of them, and at increased efficiency, and that's exactly what's happening.
There is still progress being made in hardware, but for most critical components it's looking far more logarithmic now as we're approaching the physical material limits.
I believe Nvidia published some numbers for the 5000 series that showed DLSS-off performance (which allowed a fair comparison to the previous generation) on the order of 25%, then removed them.
Thankfully the 3rd party benchmarks that use the same settings on old and new hardware should be out soon.
In 4 years, you'll be able to combine 2 of these to get 256gb unified memory. I expect that to have many uses and still be in a favorable form factor and price.
I can only think of raspberry pi...
But the impression I get from this device is that it's closer in spirit to the Grace Hopper/datacenter designs than it is the Tegra designs, due to both the naming, design (DGX style) and the software (DGX OS?) which goes on their workstation/server designs. They are also UEFI, and in those scenarios, you can (I believe?) use the upstream Linux kernel with the open source nvidia driver using whatever distro you like. In that case, this would be a much more "familiar" machine with a much more ordinary Linux experience. But who knows. Maybe GH200/GB200 need custom patches, too.
Time will tell, but if this is a good GPU paired with a good Arm Cortex design, and it works more like a traditional Linux box than the Jetson series, it may be a great local AI inference machine.
https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
The 5090 has 1.8TB/s of MBW and is in a whole different class performance-wise.
The real question is how big of a model will you actually want to run based on how slowly tokens generate.
$100M, 2.35MW, 6000 ft^2
>>Designed for AI researchers, data scientists, and students, Project Digits packs Nvidia’s new GB10 Grace Blackwell Superchip, which delivers up to a petaflop of computing performance for prototyping, fine-tuning, and running AI models.
$3000, 1kW, 0.5 ft^2
Beyond that, the factors seem reasonable for 2 decades?
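Worked out, the factors are (a sketch; the 1kW and 0.5 ft^2 figures are the rough ones quoted above):

    # Improvement ratios between the two systems quoted above (rough figures, not measurements).
    cost = 100_000_000 / 3000        # ~33,000x cheaper
    power = 2_350_000 / 1000         # 2,350x less power
    area = 6000 / 0.5                # 12,000x less floor space
    print(round(cost), round(power), round(area))
    print(round(cost ** (1 / 20), 2))    # ~1.68x per year in cost over two decades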
Being expressed as 1 10 100 1000 is not really different than being expressed as 1 2 3 4. There's still only four bits, i.e. 16 different possible values, no matter how we decide to express that in human-friendly terms.
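For concreteness, a sketch enumerating the representable values, assuming the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit, as in MXFP4; an assumption, since the exact FP4 format isn't stated here):

    # Enumerate an assumed E2M1 FP4 format: 16 bit patterns, 15 distinct values (+0/-0 collapse).
    def e2m1(sign, exp, man):
        mag = man * 0.5 if exp == 0 else (1 + man * 0.5) * 2 ** (exp - 1)
        return -mag if sign else mag

    values = sorted({e2m1(s, e, m) for s in (0, 1) for e in range(4) for m in range(2)})
    print(values)   # -6, -4, -3, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 3, 4, 6 (as floats)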
Isn't it actually FP64?
https://s3.amazonaws.com/cms.ipressroom.com/219/files/20250/...
First product that directly competes on price with Macs for local inferencing of large LLMs (higher RAM). And likely outperforms them substantially.
Definitely will upgrade my home LLM server if specs bear out.
This goes against every definition of cloud that I know of. Again proving that 'cloud' means whatever you want it to mean.
Joking aside, personally will buy this workstation in a heartbeat if I have the budget to spare, one in the home and another in the office.
Currently I have a desktop/workstation for AI workloads with a similar 128GB of RAM that I bought a few years back for around USD 5K, without the NVIDIA GPU that I bought earlier for about USD 1.5K - a total of about USD 6.5K without a display monitor. This is the same price as a NeXT workstation (with a monitor) when it was sold back in 1988, without adjusting for inflation (now around USD 18K), but it is more than 200 times faster in CPU speed and has more than 1000 times the RAM capacity of the original 25 MHz CPU and 4 MB RAM, respectively. The later updated version of the NeXT had a graphics accelerator with 8 MB VRAM; since my workstation has an RTX 2080, that is about 1000 times more. I believe the updated NeXT with the graphics accelerator is the one that was used to develop the original Doom software [1].
If NVIDIA can sell the Project Digits Linux desktop at USD 3K with similar or more powerful configurations, it's going to be a winner and will probably sell by the truckload. It has a NeXT workstation vibe to it - the machine that was used to develop the original WWW and Doom software. Hopefully it will be used to develop many innovative programs, but now using the open-source Linux software ecosystem, not a proprietary one.
The latest Linux kernel now has real-time capability for a more responsive desktop experience, and as the saying goes, good things come to those who wait.
[1] NeXT Computer: