Robots are a branch of industrial manufacturing machinery. That is not, historically, a high-margin business. It also demands high reliability and long machine life.
Interestingly, there's a trend towards renting robots by the working hour. It's a service - the robot company comes in, sets up robot workers, services them as needed, and monitors them remotely. The robot company gets paid for each operating hour. Pricing is somewhat below what humans cost.[1]
[1] https://bernardmarr.com/robots-as-a-service-a-technology-tre...
The end user usually doesn't have the expertise to even maintain the systems, nor does it make sense for them to do it in-house.
Charging per item of work (operating hour or thing processed) allows use of consultants but keeps incentives aligned between all parties (maximize uptime/productivity).
Lots of dotcom busts in the late 90s were concepts that worked 10-15 years later. We just did not have broadband and smartphones. Battery and AI tech are quite likely the missing pieces robotics lacked in the past.
Cheap semiconductors as well.
Fabricating a chip on a 28nm or 40nm process is extremely commodified nowadays. These are the same processes that were used to fabricate an Nvidia Tesla or an i7 or Xeon barely a decade ago, so the raw compute power available at commodity prices is insane.
Just about every regional power has the ability to fabricate an Intel i7 or Nvidia Tesla equivalent nowadays.
And most regional powers have 3-7 year plans to build domestic 14nm fabrication capacity as well now. A number of firms like Taiwan's PSMC have made a killing selling the end-to-end IP and workflow for fabrication.
This will probably take off once Amazon finally gets robots that can do unboxing, picking, and boxing. They've been trying for years to get that to work. Amazon already has robots doing most of the lifting and carrying, but people still handle each item.
People have been trying to do bin picking fulfillment with robots since the 1980s. Swisslog, Brightpick, and Universal Robotics have all demoed this, but so far it's not working well enough to take over. It's getting close, though.
"Slow but steady" I would call it.
While it's not Rome, the operating areas for Waymo, at least in San Francisco, are not all grids of modern wide streets either.
I'm still puzzled why Waymo insists on not having any remote driving, or any remote advising of cars on where/how to get themselves out of a situation. Yes, that would cost a little more - but this is early stages so it's just a little more money at this point (lol) - in exchange for avoiding embarrassing PR bullshit about cars self-honking at each other, or rides stuck in an infinite hesitation loop, or not knowing what to do when there is a traffic cone on the hood. I haven't seen any convincing arguments for not having that. Anyone heard a legitimately good tech or liability reason? I doubt I would have missed it but...
I am not entirely sure that is solved. And certainly not years ago. And it is only close in the US, where the training data comes from. That doesn't mean it could be used in Japan (where they are doing testing now), driving on the other side of the road with a very different culture and traffic.
You could easily use the same logic to say humans haven’t solved driving yet either!
It's like learning to code in JS on a 2024 MacBook pro and thinking you can "just" transfer your skills to cobol on 1970s hardware because both are "programming"
I’m simply talking about “300 days of sun” as being the limiting factor. You extrapolated the rest.
I still think it'll do well because even if you need to hire 1 person to remotely monitor every 10 cars (I doubt Waymo has anywhere near that many support staff) it's still better than having to pay 10 drivers who may or may not actually be good at driving. But to really take over they'll need to be much more independent.
2 rides went fine, though neither was particularly challenging. On the third, though, the car decided to head down a narrow side street where a pickup in front was partially blocking the road while making a dropoff. There was enough space to just squeeze by, and it was clear the truck expected the car to. A few cars turned in behind the Waymo, effectively trapping it, as it didn't know how to proceed. The dropoff eventually completed and it was able to pull forward.
re: region, I’d like to see it take on more challenging conditions, like in India for example where things are chaotic even for human drivers. I doubt that it’ll survive over here.
There is absolutely no meaningful signal about a system’s safety that can be derived from one person using a system for two weeks.
At best it can only demonstrate that a system is wildly unsafe.
There is a very large chasm of 9s between one person being able to detect an unsafe system in two weeks of use and actually having a truly safe system.
Self driving is robotics. Simple as that.
Building a robot that can cook or fold a t-shirt, for example, is much harder.
Your observations from this short time window aren't enough to prove the safety of something where the stakes are life and death.
But factory robots haven't propelled Kuka, Fanuc, ABB, UR, Staubli and peers to anything like the levels of success nvidia is already at. A market big enough to accommodate several profitable companies with market caps in the tens of billions might not drive much growth for a company with a trillion-dollar market cap.
nvidia has several irons in the fire here. Industrial robot? Self-driving car? Creepy humanoid robots? Experimental academic robots? Whatever your needs are, nvidia is ready with a GPU, some software, and some tutorials on the basics.
That's because the past year of robotics advancements (e.g. https://www.physicalintelligence.company/blog/pi0, https://arxiv.org/abs/2412.13196) has been driven by advances in machine learning and multimodal foundation models. There has been very little change in the actual electronics and mechanical engineering of robotics. So it's no surprise that the traditional hardware leaders like Kuka and ABB are not seeing massive gains so far. I suspect they might get the Tesla treatment soon when the Chinese competitors like unitree start muscling into the humanoid robotics space.
Robotics advancements are now AI driven and software defined. It turned out that adding a camera and tying a big foundation model to a traditional robot is all you need. Wall-E is now experiencing the ImageNet moment.
Perhaps I wasn't explicit enough about the argument I was trying to make.
Revenue in business is about selling price multiplied by sales volumes, and I'm not sure factory robot sales volumes are big enough to 'drive future growth' for nvidia.
According to [1] there were 553,000 robots installed in factories in 2023. Even if every single one of those half a million robots needed a $2000 GPU that's only $1.1 billion in revenue. Meanwhile nvidia had revenue of 26 billion in 2023, and 61 billion in 2024.
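Running that back of the envelope explicitly (the $2000-per-robot GPU price is the assumption above, not a real SKU price):

    # Back-of-envelope: can factory-robot GPUs move the needle for nvidia?
    # Figures from the comment above; the per-robot GPU price is an assumption.
    robots_installed_2023 = 553_000       # IFR, industrial robots installed in 2023
    gpu_price_per_robot = 2_000           # assumed average selling price per robot

    robot_gpu_tam = robots_installed_2023 * gpu_price_per_robot
    nvidia_revenue_2023 = 26e9            # nvidia revenue, USD
    nvidia_revenue_2024 = 61e9

    print(f"Robot GPU TAM:       ${robot_gpu_tam / 1e9:.1f}B")                      # ~$1.1B
    print(f"Share of 2023 rev:   {robot_gpu_tam / nvidia_revenue_2023:.1%}")        # ~4%
    print(f"Share of 2024 rev:   {robot_gpu_tam / nvidia_revenue_2024:.1%}")        # ~2%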
Many of those robots will be doing basic, routine things that don't need complex vision systems. And 54% of those half a million robot arms were sold in China - sanctions [2] mean nvidia can't export even the 4090 to China, let alone anything more expensive. Machine vision models are considered 'huge' if they reach half a gigabyte - industrial robots might not need the huge GPUs that LLMs call for.
So it's not clear nvidia can increase the price per GPU to compensate for the limited sales volumes.
If nvidia wants robotics to 'drive future growth' they need a bigger market than just factory automation.
[1] https://ifr.org/img/worldrobotics/2023_WR_extended_version.p... [2] https://www.theregister.com/2023/10/19/china_biden_ai/
https://www.reddit.com/r/interestingasfuck/comments/1h1i1z1/...
But if you're just buying the arm itself? There are quality robot arms, like the €38,928 UR10e [1], that are within reach of SMEs. No multi-million-dollar budget required.
[1] https://shop.wiredworkers.io/en_GB/shop/universal-robots-ur1...
And those duties can be achieved with today’s mechanics — they just need good control, which is now seeing ferocious progress
As an example, imagine you are given a height map, a 2D discrete search space overlaid on the height map, 4 legs, and robot dynamics for every configuration of the legs in their constrained workspace. Find the optimal toe placement of the 4 legs. Although a GPU isn't designed exactly to deal with this sort of problem, if it's framed as a reduction problem it still significantly outperforms a multi-core CPU.
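A minimal sketch of that framing, with a made-up cost function (terrain roughness plus distance from a nominal stance) and PyTorch standing in for hand-written CUDA: score every candidate cell for every leg in parallel, then argmin-reduce.

    import torch

    # Hypothetical example: pick the best toehold for each of 4 legs on a height map.
    # The cost function here is invented; a real controller would also score
    # kinematic reachability and stability margins.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    H, W = 512, 512
    height_map = torch.randn(H, W, device=device)                  # terrain heights
    nominal = torch.tensor([[100., 100.], [100., 400.],
                            [400., 100.], [400., 400.]], device=device)  # nominal toe positions, one per leg

    # Roughness proxy: local gradient magnitude of the height map.
    gy, gx = torch.gradient(height_map)
    roughness = (gx ** 2 + gy ** 2).sqrt()                         # (H, W)

    # Distance of every grid cell from each leg's nominal position.
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing="ij")
    cells = torch.stack([ys, xs], dim=-1)                          # (H, W, 2)
    dist = (cells[None] - nominal[:, None, None]).norm(dim=-1)     # (4, H, W)

    # Total cost per leg per cell, evaluated for all cells at once on the GPU.
    cost = roughness[None] + 0.01 * dist                           # (4, H, W)

    # Argmin reduction: best cell index for each leg.
    best = cost.flatten(1).argmin(dim=1)
    toe_rows, toe_cols = best // W, best % W
    print(list(zip(toe_rows.tolist(), toe_cols.tolist())))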
Maybe they try a completely different approach with reinforcement learning and a ton of parallel simulations?
I think the "object detection" goes quite far beyond the classic "objection detection" bounding boxes etc we're used to seeing. So not just a pair of x,y coords for the bounding box for e.g. a mug of coffee in the robot's field of view, but what is the orientation of the mug? where is the handle? If the handle is obscured, can we infer where it might be based on what we understand for what a mug typically looks like and plan our gripper motion towards it (and at 120hz etc)? Is it a solid mug, or a paper cup (affects grip strength/pressure)? Etc etc. Then there is the whole thing about visually show the robot once what you are doing, and it automatically "programs" itself to repeat the tasks in a generalised way etc. Then you could probably spawn 100 startups just on hooking up a LLM to tell a robot what to do in a residential setting (make me a coffee, clear up the kitchen, take out the trash etc)
This has all been possible before of course, but could it be done "on device" in a power efficient way? I am guessing they are hoping to sell a billion or two chips + boards to be built directly into things, so that your next robotic vacuum or lawn mower or whatever will be able to respond to you yelling at it and not mangle your pets/small children in the process.
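To make the "beyond bounding boxes" point concrete, here's a hypothetical sketch of the kind of output a manipulation-oriented perception stack would produce versus a plain detector. Every class name and field here is invented for illustration:

    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    # A classic detector stops at this:
    @dataclass
    class BoundingBox:
        x_min: float; y_min: float; x_max: float; y_max: float
        label: str
        score: float

    # The kind of output a grasping pipeline actually needs (hypothetical schema):
    @dataclass
    class GraspableObject:
        label: str                          # "ceramic mug" vs "paper cup" -> grip force
        pose: np.ndarray                    # 4x4 transform, object in camera frame
        handle_pose: Optional[np.ndarray]   # estimated even if occluded
        handle_confidence: float
        max_grip_force_n: float             # inferred from material class
        grasp_candidates: list              # ranked 6-DoF gripper poses (np.ndarray each)

    def plan_grasp(obj: GraspableObject) -> np.ndarray:
        """Prefer the handle if we're reasonably confident about where it is."""
        if obj.handle_pose is not None and obj.handle_confidence > 0.6:
            return obj.handle_pose
        return obj.grasp_candidates[0]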
I eagerly await the day when I have a plug and play robot platform that can tell the difference between my young children and a fox, and attack the fox shitting/shredding something small and fluffy in the garden but ignore the kids
Similar to autonomous vehicles, doing complex multi sensor things very quickly.
Surgical robotics is a great example, lots of cool use cases coming out in that field.
In the past two years two very important developments appeared around imitation learning and LLMs. Some starting points for this rabbit hole:
1. HuggingFace LeRobot: https://github.com/huggingface/lerobot
2. ALOHA: https://aloha-2.github.io/
Aloha is a great example of that. It's great for demos, like the one where their robot "cooked" (not really) one shrimp, but if you wanted to deploy it to real people's houses you'd have to train it for every task in every house over a few hours at a time. And "a task" is still at the level of "cook (not really) one shrimp". You want to cook (not really) noodles? It's a new task and you have to train it all over again from scratch. You want it to fold your laundry? OK but you need to train it on each piece of laundry you want it to fold, separately. You want it to put away the dishes? Without exaggeration you'd have to train it to handle each dish separately. You want it to pick up the dishes from the kitchen? Train for that. You want it to pick up the dishes from the living room? Train for that. And so on.
It is so miserably disappointing that it could bring on a new AI winter on its own, if Google were dumb enough to try to make it into a product and market it to people.
Robot maids and robot butlers are a long way away. Yeah but you can cook one shrimp (not really) with a few hours of teleoperation training in your kitchen only. Oh wow. We could never cook (not really) one shrimp before. I mean we could but this uses RL and so it's just one step from AGI.
It's nonsense on stilts.
I believe it will take on the order of 100M hours of training data of doing tasks in the real world (so, not just YouTube videos), and much larger models than we have now, to make general-purpose robotics work, but I also believe that this will happen.
I've saved your comment to my favorites and hope to revisit it in 10 years.
I concluded that it couldn't be done with classical machine vision, and that this "neural network" nonsense wasn't going to catch on. Very slow, computationally inefficient, full of weirdos making grandiose claims about "artificial intelligence" without the results to back it up, and they couldn't even explain how their own stuff worked.
These days - you want to find the boundary between cut and uncut grass, even though lighting levels can change and cloud cover can change and shadows can change and reflections can change and there's loads of types of grass and grass looks different depending on the angle you look from? Just label some data and chuck a neural network at it, no problemo.
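As a rough illustration of what "label some data and chuck a neural network at it" means in practice, a minimal fine-tuning sketch, assuming you've already collected and hand-masked a folder of mower-camera frames (the data loader is a placeholder; building that dataset is the real work):

    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

    # Fine-tune a pretrained segmentation model to label each pixel as cut grass,
    # uncut grass, or everything else. Assumes `loader` yields (image, mask) batches.
    NUM_CLASSES = 3
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT)
    model.classifier[4] = torch.nn.Conv2d(256, NUM_CLASSES, kernel_size=1)  # new output head
    model = model.to(device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss()

    def train_epoch(loader):
        model.train()
        for images, masks in loader:          # images: (B,3,H,W) floats, masks: (B,H,W) class ids
            images, masks = images.to(device), masks.to(device)
            logits = model(images)["out"]     # (B, NUM_CLASSES, H, W)
            loss = criterion(logits, masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

At inference you argmax over the class dimension per pixel and trace the cut/uncut boundary from the resulting mask.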
Didn't work as well as I'd hoped back in those days though, as you could lose carrier lock if you got too close to trees (or indeed buildings), and our target market was golf courses which tend to have a lot of trees. And in those days a dual-frequency RTK+IMU setup was $20k or more, which is expensive for a lawnmower.
I find that even though signals get significantly weaker under trees, mine still works wonderfully in a complex large garden scenario. It will depend on your exact unit/model, as well as their firmware and how it chooses to deal with these scenarios.
This is all pretty much automated by nvidia’s toolkits, and you can do it cheaply on rented hardware before dropping your pretrained model into cheap kit - what a time to be alive.
If only.
Having been faced with the same problem in the real world:
1) There isn't a data bank of millions of images of cut / uncut grass
2) If there were, there's always the possibility of sample bias. E.g. all the cut photos happen to have been taken early in the day and the uncut ones late in the day, and we get a "time-of-day" detector. Sample bias is oddly common in vision data sets, and machine learning can latch onto very complex sample bias
3) With something like a lawnmower, you don't want it to kill people or run over flowerbeds. There can be actual damages. It's helpful to be able to understand and validate things.
Most machine vision algorithms I actually used in projects (small n) made zero use of neural networks, and 100% use of classical algorithms I understand.
Right now, the best analogy is NLP at the BERT stage. At that point, neural techniques were helpful for some tasks, and achieved stochastically interesting performance, but were well below the level of general use, and 95% of what I wanted to do used classical NLP. IF I had a large data set AND could do transfer learning from BERT AND didn't need things to work 100% of the time, BERT was great.
Systems like DALL-E and the reverse are moving us in the right direction. Once we're at GPT / Claude / etc.-level performance, life will be different, and there's a light at the end of the tunnel. For now, though, the ML machinery is still a pretty limited way to go.
Think of it this way. What's cheaper:
1) A consulting project for a human expert in machine vision (tens or hundreds of thousands of dollars)
2) Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)
If you don’t have the second, how can you trust the first? Without the dataset to test on, your human experts will deliver you slop and be confident about it. And you will only realise the many ways their hand-finessed algorithms fail once you are trying to field the algorithm.
> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.
Best to not mix concerns though. Not killing people with an automatic lawnmower is about the right mechanical design, appropriately selected slow speed, and bumper sensors. None of this is an AI problem. We don’t have to throw out good engineering practices just because the product uses AI somewhere. It is not an all or nothing thing.
The flowerbed avoidance question might or might not be an AI problem depending on design decisions.
> Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)
I think that you are overestimating the effort here. The database doesn’t have to be so huge. Transfer learning and similar techniques have reduced the data requirements by a lot. If all you want is a grass height detector you can place stationary cameras in your garden, collect a bunch of data and automatically label them based on when you mowed the grass. That will obviously only generalise to your garden, but if this is only a hobby project maybe that is all you want? If this is a product you intend to sell to the general public then of course you need access to a lot of different gardens to test it on. But that is just the nature of product testing anyway.
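A small sketch of that auto-labelling idea, assuming the frames carry capture timestamps in their filenames and you keep a log of when you mowed (the filename format, paths, and regrowth window are all assumptions):

    from datetime import datetime
    from pathlib import Path

    # Label stationary-camera frames as "cut" or "uncut" purely from when the lawn was mowed.
    # Assumes frames are named like garden_2024-06-01T14-30-00.jpg and you keep a mow log.
    MOW_TIMES = [datetime(2024, 6, 1, 15, 0), datetime(2024, 6, 15, 10, 30)]
    REGROWTH_DAYS = 7   # after this long, treat the grass as "uncut" again (rough assumption)

    def label_frame(path: Path) -> str:
        ts = datetime.strptime(path.stem.split("_", 1)[1], "%Y-%m-%dT%H-%M-%S")
        last_mow = max((m for m in MOW_TIMES if m <= ts), default=None)
        if last_mow is None:
            return "uncut"
        return "cut" if (ts - last_mow).days < REGROWTH_DAYS else "uncut"

    labels = {p.name: label_frame(p) for p in Path("frames").glob("garden_*.jpg")}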
One of the key things is that if you don't understand how things work, your test dataset needs to be the world. A classical system can be analyzed, and you can pick a test dataset which maximally stresses it. You can also engineer environments where you know it will work, and 9 times out of 10, part of the use of classical machine vision in safety-critical systems is to understand the environments it works in, and to only use it in such environments.
Examples:
- Placing the trackball sensor inside of the mouse (or the analogue for a larger machine) allows the lighting and everything else to be 100% controlled
- If it's not 100% controlled, in an industrial environment, you can still have well-understood boundaries.
You test beyond those bounds, and you understand that it works there, and by interpolation, it's robust within the bounds. You can also analyze things like error margin since you know if an edge detection is near the threshold or has a lot of leeway around it.
One of the differences with neural networks is that you don't understand the failure modes, so it's hard to know the axes to test on. Some innocuous change in the background might throw it completely. You don't have really meaningful, robust measures of confidence, so you don't know if some minor change somewhere won't throw things. That means your test set needs to be many orders of magnitude bigger.
For nitpickers: You can do sensitivity analysis, look at how strongly things activate, or a dozen other things, but the keywords there were "robust" and "meaningful."
1. Test datasets can be a lot smaller than training datasets.
2. For tasks like image segmentation, having a human look at a candidate segmentation and give it a thumbs up or a thumbs down is much faster than having them draw out the segments themselves.
3. If labelling needs 20k images segmented at 1 minute per image but testing only needs 2k segmentation results checked at 5 seconds per image, you can just do the latter yourself in a few hours, no outsourcing required.
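The arithmetic behind point 3, using those per-image times:

    # Labelling effort vs. verification effort, figures from point 3 above.
    train_images, label_minutes_each = 20_000, 1
    test_images, check_seconds_each = 2_000, 5

    labelling_hours = train_images * label_minutes_each / 60     # ~333 hours -> outsource
    checking_hours = test_images * check_seconds_each / 3600     # ~2.8 hours -> do it yourself
    print(f"labelling: {labelling_hours:.0f} h, checking: {checking_hours:.1f} h")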
I will admit that "no problemo" made it sound easier than it actually is. But in the past I considered it literally impossible whereas these days I'm confident it is possible, using well known techniques.
> There isn't a data bank of millions of images of cut / uncut grass
True - but in my case I literally already had a robot lawnmower equipped with a camera. I could have captured a hundred thousand images pretty quickly if I'd known it was worth the effort.
> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.
I agree - at the time I was actually exploring a hybrid approach which would have used landmarks for navigation when close enough to detect the landmarks precisely, and cut/uncut boundary detection for operating in the middle of large expanses of grass, where the landmarks are all distant. And a map for things like flowerbeds, and a LIDAR for obstacle tracking and safety.
So the scope of what I was aiming for was literally cut/uncut grass detection, not safety-of-life human detection :)
I hoped by doing so I could produce respectable results without the need to spend $$$$$ on a dual-frequency RTK GPS & IMU system.
If you’re trying to predict something within the manifold of data on the internet (which is incredibly vast, but not infinite), you will do very well with today’s LLMs. Building an internet-scale dataset for another problem domain is a monumental task, still with significant uncertainty about “how much is enough”.
People have been searching for the right analogy for “what type of company is OpenAI most like?” I’ll suggest they’re like an oil company, but without the right to own oil fields. The internet is the field, the model is the refining process (which mostly yields the same output but with some variations - not dissimilar from petroleum products), and the process / model is a significant asset. And today, Nvidia is the only manufacturer of refining equipment.
If you take the analogy further, while oil was necessary to jumpstart the petrochemical industry, biofuels and synthetic oil could potentially replace the natural stuff while keeping the rest of the value chain intact (maybe not economical, but you get the idea). Is there a post-web source of data for LLMs once the well has been poisoned by bots? Maybe interactive chats?
https://arxiv.org/abs/2406.09246
It turns out you can take a vision-language foundation model that has a broad understanding of visual and textual knowledge and fine-tune it to output robot actions given a sequence of images and previous actions.
This approach beats all previous methods by a wide margin and transfers across tasks.
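The core trick, roughly, is to discretize each continuous action dimension into bins and have the fine-tuned model predict the bin indices as tokens. A toy sketch of that encoding (the bin count and action ranges are illustrative, not the paper's exact values):

    import numpy as np

    # Toy action tokenization for a vision-language-action model: map each continuous
    # action dimension (e.g. end-effector delta pose + gripper) to one of N bins, so
    # actions become a short token sequence the language model can learn to predict.
    N_BINS = 256
    ACTION_LOW = np.array([-0.05, -0.05, -0.05, -0.2, -0.2, -0.2, 0.0])   # illustrative ranges
    ACTION_HIGH = np.array([0.05,  0.05,  0.05,  0.2,  0.2,  0.2, 1.0])

    def actions_to_tokens(action: np.ndarray) -> np.ndarray:
        scaled = (action - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)       # -> [0, 1]
        return np.clip((scaled * (N_BINS - 1)).round(), 0, N_BINS - 1).astype(int)

    def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
        return ACTION_LOW + tokens / (N_BINS - 1) * (ACTION_HIGH - ACTION_LOW)

    a = np.array([0.01, -0.02, 0.0, 0.05, 0.0, -0.1, 1.0])
    t = actions_to_tokens(a)
    print(t, tokens_to_actions(t))   # round-trips to within one bin width

Fine-tuning is then, roughly, ordinary next-token prediction over (images, instruction, action-token) sequences; see the paper for the actual recipe.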
Visually determining weight, texture, and how durable something is can be done with those systems, so long as we have a training set.
* Mapping. Nowadays generating a dense grid of costs can be done insanely fast on GPU (see the sketch after this list). There's just no excuse to not use a GPU on every robot so it can build a fast map, unless you move at snail speed.
* Computer Vision. Classical depth mapping is best done on a GPU. Classical computer vision object detection has fallen away to the rise in ML-based CV for segmentation. Some (IMHO) overzealous practitioners are trying to eat away at estimation and tracking, which IMHO will recede a little since there was nothing wrong with the estimators (just Bayesian stats) to begin with, it was always the measurements. Still, for detection (and sometimes association), ML on GPU is the way to go and that will very likely not change. It has gotten so good that you can get away without using other sensors and just deploying a vision system (though I don't recommend it, but this is what Tesla does). This is an obvious case for one (or one more) GPU on every robot.
* Planning - End to end planning is eating traditional planning now, similar to CV. There are some areas where this is an obvious win (e.g., complex manipulation tasks), and some areas where some overzealous overreach is happening (e.g., simpler planning tasks like routing). But ML on GPUs is here to stay for all planning tasks, especially when estimating costs from complex data, even if a classical planner uses those costs. And I'd be remiss if I didn't mention policy-based planning, which does a huge amount of training to generate essentially a fast lookup table for actions. Deployment of these types of planners often requires a very good estimator to determine what state you are in - and this is a great area for ML, mapping real world messy data to a clean state lookup. I think this can typically be done without a GPU, due to training prior to deployment, but if you have a GPU already (see prior two), you will find this is a good use of it.
* Low-level planning / Controls - Shares a small overlap with above, but mostly concerned with fast responses to transient data and stabilizing the system. I've heard, but not seen directly, that learned policies are coming into vogue here. But regardless, it is a common thread that a network can assist with estimating costs and states to allow a traditional controls system to operate more reliably. I doubt this will necessitate a GPU, but like above, will gladly use it if required and available.
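On the mapping point above, a minimal sketch of what "a dense grid of costs on the GPU" looks like, assuming you already have obstacle points in the robot frame (PyTorch standing in for a hand-rolled CUDA kernel):

    import torch

    # Build a dense 2D costmap from obstacle points: each cell's cost falls off with
    # distance to the nearest obstacle. All cells are evaluated in parallel on the GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    GRID, RES = 400, 0.05                       # 400x400 cells at 5 cm -> 20 m x 20 m map
    obstacles = torch.rand(200, 2, device=device) * GRID * RES   # (N, 2) points in metres

    ys, xs = torch.meshgrid(torch.arange(GRID, device=device),
                            torch.arange(GRID, device=device), indexing="ij")
    cell_centres = torch.stack([ys, xs], dim=-1).float() * RES + RES / 2   # (GRID, GRID, 2)

    # Distance from every cell to every obstacle, then min-reduce over obstacles.
    d = torch.cdist(cell_centres.reshape(-1, 2), obstacles)       # (GRID*GRID, N)
    nearest = d.min(dim=1).values.reshape(GRID, GRID)

    INFLATION = 0.5                                               # metres
    costmap = torch.clamp(1.0 - nearest / INFLATION, min=0.0)     # 1 at obstacles, 0 past inflation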
To add to this, consider that we're generally not talking about discrete, gaming-type GPUs, we're talking about purpose built robotics-targeted embedded systems that speak native CUDA. The Jetson family, in particular.
Previously it was far too overpriced for most uses (except for someone developing a certified automotive device), but at the new price and performance it has become competitive with the existing alternatives in the same $150 to $300 price range, which are based on Intel, AMD, MediaTek, Qualcomm or Rockchip CPUs.
i just want a tiiiny gpu for $10 so i can run smaller models at higher speed than possible with xtensa/rp2040 having limited simd support etc.
Neural accelerators are coming to MCUs. The just-released STM32N6 is probably among the best. Alif, with the U55/U85, has been out for a little while. The Maxim MAX78000 has had a CNN accelerator for a couple of years. More will come in the next few years - though not from Nvidia any time soon.
If politicians had real skin in the game there'd be far less war.
What they want is border and population control that involves very few ordinary citizens, in large part in expectation of something like hundreds of millions of climate refugees. After having spent a couple of years killing and maiming poor people with almost nowhere to go, you tend to need quite a bit of medical care, and you usually join the anti-war movement regardless of whether you got a college degree out of it or not.
I find it likely we'll see gun mounted robodog patrols along occidental borders within ten years from now, after having tested it on populations elsewhere.
Their bet is that AI will unlock robotics use and they don't want to be simply compute providers, they want to innovate on the whole chain, software, hardware, services, everything.
Their position is quite unique as their R&D is basically financed by their future competitors, they are making bank while going where the puck will be.
Because you are pooling computer resources across many more users you have better amortized cost than if you had a workstation on every home that was idle 99% of the time. Of course, latency and bandwidth to the cell tower is worse than wifi, but better than if the compute is done on a remote server.
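A rough illustration of that amortization argument (the utilization and hardware numbers are assumptions, not measurements):

    # Cost per useful compute-hour: dedicated home workstation vs. pooled edge server.
    # All numbers are illustrative assumptions.
    hw_cost = 2_000                 # hardware cost, USD
    lifetime_hours = 3 * 365 * 24   # 3-year life

    util_home = 0.01                # workstation idle 99% of the time
    util_pooled = 0.50              # shared server kept half-busy across many users

    cost_home = hw_cost / (lifetime_hours * util_home)
    cost_pooled = hw_cost / (lifetime_hours * util_pooled)
    print(f"${cost_home:.2f}/useful-hour at home vs ${cost_pooled:.3f}/useful-hour pooled")  # ~50x cheaper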
I have no idea of which model will succeed for which use cases, but the idea is sound.
Again: which computation paradigm works best depends on the use case. It is not an all or nothing situation.
As much as there are market drivers to make the cloud attractive to businesses, and legitimate reasons why cloud solutions are better than alternatives, there is a real desire in a growing number of people for a solution that they can tinker with, but that also has a polished UI so they don't have to tinker with it when they don't want to.
I'm still surprised they did not create an App Store for AI. Basically lock everything down and make developers pay a % of their revenue, Apple style.
At enterprise scale, locked down marketplaces don't work. They act as a forcing factor for larger organizations to build in house because no one wants vendor lock-in or to lose money via an arbitrage.
This is a major reason why you'll see large deals pushing for enhanced customization options or API parity, as larger customers have the ability to push back against vendor lock-in.
Furthermore, a relatively open market (eg. NGC) acts as a loss-leader by allowing a community to develop using a corporate standard, thus allowing you to build stickiness without directly impacting a customer's bottom-line
Fundamentally, a company driven by Enterprise revenue (eg. Nvidia) will have a different marketplace structure from a B2C product such as Apple's App Store where purchasers have little power.
If this was true, we'd have more mobile computing platforms. Large enterprises publish in Apple's AppStore.
Purchasers from Apple's App Store are primarily individual consumers. It is a B2C play.
Monetizing an Nvidia marketplace such as NGC would be foolhardy as the primary users/"purchasers" are organizations with budgets and procurement power. It is an Enterprise B2B play.
In enterprise sales, the power differential between (mid- and upper-market) customers and vendors is in the customer's favor, as they have significant buying power and thus a higher user acquisition cost. The upside is revenue is much higher, margins are better, and you can differentiate on product as commodification is difficult.
This is less so in consumer facing sales as customers have significantly weaker buying power, but conversely have a much lower user acquisition cost at scale. Hence, a growth-based GTM approach is critical, as you need customers in aggregate to truly unlock revenue at scale.
Also, I don't believe the balance of power is tipping towards business customers, as nvidia is basically the only relevant player.
If you think smartphone customers and server customers are evaluating hardware based on the same criteria, then why isn't Apple the leading datacenter hardware OEM?
> Large enterprises publish in Apple's AppStore.
Yeah? Where's my Pro Tools download on the App Store? Where's my Cinema4D download? Can I get Bitwig Studio from there? Hell, is iTerm2 or Hammerspoon even available there?
Large enterprises very explicitly don't publish on the MacOS App Store because it is a purely raw deal. If you're developing a cross-platform app (which most large enterprises do), then you've already solved all the problems the App Store offers to help with. It's a burdensome tax for anyone that's not a helpless indie, and even the indies lack the negotiating power that makes the App Store profitable for certain enterprises.
Because Apple doesn't want to be in the B2B space.
More tactically, excessive charging on a marketplace pushes vendors away from selling on AWS Marketplace and makes them develop alternative deployment methods, which reduces the stickiness of AWS, as hyperscalers are commodified nowadays.
Motorola learned that the hard way 40 years ago when pushing excessively restrictive OEM and Partnership rules compared to IBM.
AWS is only as strong as its Partnership ecosystem, as companies that are purchasing tend to use 80-90 different apps along with their cloud.
Basically, Enterprise Sales shows hallmarks of a Stag Hunt Game, so a mutually beneficial pricing strategy amongst vendors (AWS, AWS Partners such as Nvidia, MSP) is ideal.
But you can say exactly the same thing about large companies publishing (consumer apps) in the App Store. Why would they want vendor lock-in?
I think there are many early innovators that fail in later stage growth because of this issue.
But off the shelf mini PCs are much more user friendly for existing software IME.
Thankfully, with ARM being so widespread and continuing to grow, this won't matter as much.
I'd love you to point me in the direction of an off-the-shelf mini PC that has 64gb of addressable memory and CUDA support.
On the other hand, if you force the CUDA support condition and any automatic translation of CUDA programs is not accepted as good enough, then this mandates the use of a discrete NVIDIA GPU, which can be provided only by a mini-ITX mini-PC.
There are mini-ITX boards with laptop Ryzen 7940HX or 7945HX CPUs, at prices between $400 and $550. To such a board you must add 64 GB of DRAM, e.g. @ $175, and a GPU, e.g. a RTX 4060 at slightly more than $300.
Without a discrete GPU, a case for a mini-ITX motherboard has a volume of only 2.5 liter. With a discrete GPU like RTX 4060, the volume of the case must increase to 5 liter (for cases with PCIe extenders, which allow a smaller volume than typical mini-ITX cases).
So your CUDA condition still allows what can be considered an off-the-shelf mini-PC, but mandating CUDA raises the volume from the 0.5 L of a NUC-like mini-PC to 5 L and the price is also raised 2 or 3 times.
This of course unless you choose an Orin for CUDA support, but that will not give you 64 GB of DRAM, because NVIDIA has never provided enough memory in any of their products, unless you accept to pay a huge overprice.
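Tallying up the parts quoted above (the case/PSU and NUC-style baseline prices are my assumptions):

    # Rough build cost for a CUDA-capable mini-ITX box, using the prices quoted above.
    board_low, board_high = 400, 550     # mini-ITX board with mobile Ryzen 7940HX/7945HX
    dram_64gb = 175
    rtx_4060 = 300
    case_psu = 150                       # not priced in the comment; assumed extra

    total_low = board_low + dram_64gb + rtx_4060 + case_psu
    total_high = board_high + dram_64gb + rtx_4060 + case_psu
    nuc_style_baseline = 450             # assumed price of a 0.5 L NUC-like mini-PC

    print(f"CUDA mini-ITX build: ${total_low}-{total_high}")       # ~$1025-$1175
    print(f"vs NUC-style box:    ~${nuc_style_baseline} (roughly 2-3x cheaper)")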
Coming from web / app dev this was my very least favorite part of working on the software side of robotics with ROS.
To be brutally honest, you aren't the primary persona in the robotics space.
If you have limited resources (as any organization does), the PM for DevEx will target customers with the best "bang-for-buck" from a developer effort to revenue standpoint.
Most purchasers and users in the robotics and hardware space tend to be experienced players in the hardware, aerospace, and MechE world, which has different patterns and priorities from a purely software world.
If there is a case to be made that there is a significant untapped market, it makes sense for someone like you to go it on your own and create an alternate offering via your own startup.
The article references a "ChatGPT moment" for physical robotics, but honestly I think the ChatGPT moment has kind of come and gone, and the world still runs largely as it ever did. Probably not the best analogy, unless they're just talking about buckets of VC money flowing into the space to fund lots of bad ideas, which would be good for NVIDIA financially.
As an admitted non-expert in this field, I guess the one thing that really annoys me about articles like this is the lack of a concrete vision. It's like Boston Dynamics and their dancing robots, which while impressive, haven't really amounted to much outside of the lab. The last thing I remember reading was a military prototype to carry stuff for infantry that ended up being turned down because it was too loud.
The article even confirms this general perspective, ending with "As of right now, we don’t have very effective tools for verifying the safety and reliability properties of machine learning systems, especially in robotics. This is a major open scientific question in the field,” said Rosen."
So whatever robot you're developing is incredibly complex, to be trusted with heavy machinery or around consumers directly, while being neither verifiably safe nor reliable.
Sorry, but almost everything in this article sounds like a projection of AI-hype onto physical robotics, with all the veracity of "this is good for Bitcoin". Sounds like NVIDIA is doing right by its shareholders though.
As the LLM, generative AI, etc. bubble begins to deflate, with investors and companies finding it hard to make profits from those AI use cases, Nvidia needs to pivot. This article indicates that Nvidia is betting on robotics as the next driving force to sustain the massive interest in their products. Personally, I don't see how robotics can provide that same driving force; investors will find it hard to squeeze profit out of it, and they'll be back to searching for the next hype. It's like Nvidia is trying to create a market to justify their products and continued development, similar to what Meta tried, to spectacular failure, with the Metaverse for their virtual products.
After the frenzy that sustained these compute products transitioned from big data, to crypto, and now, to AI, I'm curious what the next jump will be; I don't think the "physical AI" space of robotics can sustain Nvidia in the way that they're hoping.
On the investment side, it's hard to say that since ROIC is still generally up and to the right. As long as that continues, so will investment.
The biggest gap I see is expected if you look at past trends like mobile and the internet: in the first wave of new tech there's a lot of trying to do the old things in the new way, which often fails or gives incremental improvements at best.
This is why the 'new' companies seem to be doing the best. I've been shocked at so many new AI startups generating millions in revenue so quickly (billions with OpenAI, but that's a special case). It's because they're not shackled to past products, business models, etc.
However, there are plenty of enterprise companies trying to integrate AI into existing workflows and failing miserably. Just like when they tried to retrofit factories with electricity. It's not just plug and play in most cases, you need new workflows, etc. That will take years and there will be plenty more failures.
The level of investment is staggering though, and might we see a crash at some point? Maybe, but likely not for a while since there's still so much white space. The hardest thing with new technologies like this is not to confuse the limits of our imagination with the limits of reality (and that goes both ways).