If they did not follow the steps needed to replicate (skipping pre-training, using less compute, etc.) and then failed, what's wrong with calling out the flaws in their attempted "replication"?
Actually, people should criticize flawed papers. That's how science works! When you publish scientific papers, you should expect criticism if there's something that doesn't look right.
The only way to avoid that is to get critical feedback before publishing the paper, and it's not always possible, so then the scientific debate happens in public.
Though, the broader question is how useful the results of the original paper are to other people who might do the same thing.
> It's not a value judgement, just doesn't help his case at all.
Calling it "bullying" looks like a value judgment to me. Am I missing something?
To me, Dean's response is quite sensible, particularly given his claims that the other papers made serious mistakes and have potential conflicts of interest.
Second, even a healthy environment can be undermined by lack of skills or resources, intellectual dishonesty, or conflicts of interest.
That's not an argument made in the linked tweet. His claim is "they couldn't replicate it because they didn't follow the steps", which seems like a very reasonable claim, regardless of the motivation behind making it.
Either the research is as much of a breakthrough as is claimed and Google is about to pull way ahead of all these other "idiots" who can't replicate their method even when it is described to them in detail, or the research is flawed and overblown and not as effective as claimed. This seems like exactly the sort of question the market will quickly decide over the next couple of years and not worth arguing over.
Why do a non-zero amount of people have seemingly religious beliefs about this topic on one side or the other?
n.b. you're on a social news site
> pull way ahead of all these other "idiots"
Pulling way ahead sounds sufficient, not necessary. Can we prove it's not the case? Let's say someone says that's why Gemini inference is so cheap. Can we show that's wrong?
> "idiots"
?
The guys at EDA companies care because Google's result makes them look like idiots when you take the paper at face value, and it does advance the state of the art a bit. They have been working hard for marginal improvements, and the idea that some team of ML people can come in and make a big splash with something like this is offensive to them. Furthermore, the result is not that impressive and does not generalize enough to be useful to them (and competent teams at these companies absolutely have checked).
The fact that the result is so minor is the reason that this is so contentious.
https://sparktoro.com/blog/is-google-losing-search-market-sh...
If anything, now is the best time for TPUs to grow, and I'd say investing in TPUs gave Google an edge. No other large-scale LLM has been trained on anything but NVIDIA GPUs; Gemini is the only exception. Every big company is scrambling to make its own hardware in the AI era, while Google already has it.
Everyone I know who worked with TPUs loves how well they scale. Sure, Jax has a learning curve, but it's not a problem, especially given the performance advantages it brings.
Google does indeed lock in their own ROI by deciding not to compete with AMD / Graphcore etc., but that also rooflines their total market. If they were to come up with a compelling Android-based, Jetson-like edge product, and if demand for said product eclipsed total GPU demand (robotics explosion?), then they might have a ramp to compete with NVidia. But the USB TPUs and phone accelerators today are just toys. And toys go to the Google graveyard, because Googlers don't build gardens; they treat everything like toys and throw them away when they get bored.
Because lots of engineers are being told by managers "Why aren't we using that tool?" and a bunch of engineers are stuck saying "Because it doesn't actually work." aka "Google is lying through their teeth." to which the response is "Oh, so you know better than Google?" to which the response is "Yeah, actually, I fucking do. Now piss off and let me finish timing closure on this goddamn block that is already 6 weeks late."
Now can you understand why this is a bit contentious?
Marketing "exaggerations" from authority can cause huge amounts of grief.
In my little corner of the world, I had to sit and defend against the lies that a startup with famous designers was putting out about power consumption while we were designing similar chips in the space. I had to go toe to toe with Senior VPs over it, and I had to stand my ground and defend my team, who had analyzed things dead on. All this occurred in spite of the fact that the startup had no silicon. In addition, I knew the famous designers involved would happily lie straight to your face: I had worked with them before, been lied to straight to my face, and had to clean up the mess when they left the company.
To be fair, it is also the only time I have had a Senior VP remember the kerfuffle and apologize when said startup finally delivered silicon: not only were the real numbers not what they had claimed, they weren't even close to the ones we were getting.
If you have personal experience with Jeff Dean et al that you're willing to share, I'd be interested in hearing about it.
From where I'm sitting it looks like, "Google spent a fortune on deep learning, and got a small but real win. People who don't like Google failed to follow Google's recipe and got a large and easily replicated loss."
It's not even clear that Google's approach is feasible right now for companies not named Google. It is not clear that it works on other classes of chip. It is not clear that the technique will grow beyond what Google already got. It is really not clear that anyone should be jumping on this.
But there is a world of difference between that, and concluding that Google is lying.
From where I'm sitting it looks like Google cooked the books maximally, barely beat humans let alone state of the art algorithms, published a crappy article in Nature because it would never have passed editorial muster at something like DAC or an IEEE journal and now have to browbeat other people who are calling them out on it.
And that's the best interpretation we can cough up.
I'll go further, we don't even have any raw data that says that they actually did beat the humans. Some of the humans I know who run P&R are REALLY good at what they do. The data could be completely made up. Given how much scientific fraud has come out lately, I'm amazed at the number of people defending Google on this.
Where I'm from, we call what Google is doing both "lying" and "bullying".
Look, Google can easily defuse this in all manner of ways. Publish their raw data. Run things on testbenches and benchmarks that the EDA tools vendors have been running on for years. Run things on the open source VLSI designs that they sponsored.
What I suspect happened is that Google's AI group has gotten used to being able to make hyperbolic marketing claims which are difficult to verify. They then poked at place and route, failed, and published an article anyway because someone's promotion is tied to this. They expected that everybody would swallow their glop just like every other time, be mostly ignored and the people involved can get their promotions and move on.
Unfortunately, Google is shoveling bullshit around something that has objective answers; real money is at stake; and they're getting rightfully excoriated for it.
Whoops.
Likewise the importance of spending 20x as much money on the training portion seems easy to verify, and significant.
That they would fail to properly test against industry standard workbenches seems reasonable to me. This is a bunch of ML specialists who know nothing about chip design. Their background is beating everyone at Go and setting a new state of the art for protein folding, and not chip design. If you dismiss those particular past accomplishments as hyperbolic marketing, that's your decision. But you aren't going to find a lot of people in these parts who agree with you.
If you think that those were real, but that a bunch of more recent accomplishments are BS, I haven't been following closely enough to have an opinion. The stuff that crossed my radar since AlphaFold is mostly done at places like OpenAI, and not Google.
Regardless, the truth will out. And what Google is claiming for itself here really isn't all that impressive.
"If Cheng et al. had reached out to the corresponding authors of the Nature paper, we would have gladly helped them to correct these issues prior to publication" (https://arxiv.org/pdf/2411.10053)
That's how you actually do a reproduction study - you reach out to the corresponding authors and make sure you do everything exactly the same. But at this point, it's hard to imagine the AlphaChip folks having much patience with them.
I don't think it's easier to get into DAC / an IEEE journal than Nature.
Their human baseline was the TPU physical design team, with access to the best available tools: rdcu.be/cmedX
and this is still the baseline to beat in order to get used in production, which has happened for multiple generations of TPU.
TPU is export controlled and super confidential -- multi-billion dollar IP! -- so I don't see raw data coming out anytime soon.
If the Nature paper had made it clear that RL is not seriously expected to work on non-TPU chips, it would probably have been rejected. If RL works on many other chips, then evidence should be easy to publish.
As I said, the truth will out. I was just unhappy with the case previously being offered.
Discussions like this are _how_ the market decides whether or not this achievement is real.
The tweet just says that the reproduction attempt didn't actually follow the original methodology. There is no claim that the authors of the replication attempt were "idiots" or anything similar; you just made that up. The obviously fallacious logic in "they couldn't replicate it ..., therefore it's replicable" is also a total fabrication on your part.
Making a novel claim implies its _claimed_ replicability.
"You did not follow the steps" is calling them idiots.
The only inference I made is that he's pressed to comment. He could have said nothing... instead he's lashing out publicly, because other people were unable to replicate it. If there's no problem replicating the work, why hasn't that happened? Any other author would be worried if a publication about their work said "it's not replicable" and would try their best to help replicate it... but somehow that doesn't apply to him.
But that was explicitly limited to 8 hours for all setups. Do they have another paper that shows that you can't increase the number of hours of a smaller GPU setup to compensate?
Jeff drank so much kool aid he forgot what water is.
Where does it say that? Dean outlines explicit steps that the authors missed in the tweet.
This is easy to debunk from the Google side: release a tool. If you don't want to release a tool, then it's unsubstantiated and you don't get to publish. Simple.
That having been said:
1) None of these "AI" tools have yet demonstrated the ability to classify "This is datapath", "This is array logic", "This is random logic". This is the BIG win. And it won't just be a couple of percentage points in area or a couple of days saved when it works--it will be 25%+ in area and months in time.
2) Saving a couple of percentage points in random logic isn't impressive. If I have the compute power to run EDA tools with a couple of different random seeds, at least one run will likely be a couple percentage points better (a minimal sketch of such a sweep follows this list).
3) I really don't understand why they don't do stuff on analog/RF. The patterns are smaller and much better matches to the kind of reinforcement learning that current "AI" is suited for.
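To make point 2 concrete, here is a minimal sketch of such a seed sweep. The tool name, flags, and report format are hypothetical placeholders, not any real vendor's CLI; the only point is "run N seeds, keep the best":

    # Minimal sketch of a random-seed sweep around a place-and-route run.
    # "pnr_tool", its flags, and the report format are hypothetical placeholders,
    # not any real vendor CLI; adapt to whatever tool and QoR metric you use.
    import re
    import subprocess

    SEEDS = range(8)

    def run_pnr(seed: int) -> float:
        """Run one placement with the given seed; return total wirelength (lower is better)."""
        subprocess.run(
            ["pnr_tool", "-design", "block.v", "-seed", str(seed),
             "-report", f"run_{seed}.rpt"],
            check=True,
        )
        with open(f"run_{seed}.rpt") as f:
            # Assume the report contains a line like "total_wirelength: 123456.7".
            match = re.search(r"total_wirelength:\s*([\d.]+)", f.read())
        return float(match.group(1))

    results = {seed: run_pnr(seed) for seed in SEEDS}
    best_seed = min(results, key=results.get)
    print(f"best seed: {best_seed}, wirelength: {results[best_seed]:.1f}")
    print(f"spread across seeds: {max(results.values()) - min(results.values()):.1f}")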
I put this snake oil in the same category as "financial advice"--if it worked, they wouldn't be sharing it and would simply be printing money by taking advantage of it.
None of this has been done. This is table stakes if you want to talk about your EDA algorithm advancement. If this weren't coming out of Google, everybody would laugh it out of the room (see what happened to a similar publication with similar claims from a Chinese source--everybody dismissed it out of hand--rightfully so even though that paper was MUCH better than anything Google has promulgated).
Extraordinary claims require extraordinary evidence. Nothing about AlphaChip even reaches ordinary evidence.
If they hadn't gotten a publication in Nature for effectively a failure, this would be way less contentious.
> Nothing about AlphaChip even reaches ordinary evidence.
Your reply is wildly confident and dismissive. If correct, why did Nature choose to publish?
I read your comment, but I'm not following -- or maybe I disagree with it -- I'm not sure yet.
"Snake oil" is an emotionally loaded term that raises the temperature of the conversation. That usually makes having a conversation harder.
From my point of view, AlphaGo, AlphaZero, AlphaFold were significant achievements. Agree? Are you claiming that AlphaChip is not? Are you claiming they are perpetrating some kind of deception or exaggeration? Your numbered points seem like valid criticisms (I haven't evaluated them closely), but even if true, I don't see how they support your "snake oil" claim.
Peer review doesn't mean as much as Elsevier would like you to believe. Plenty of peer-reviewed research is absolute trash.
In and of itself, "Being published in a peer reviewed journal" does not place the contents of a paper beyond reproach or criticism.
If a paper / experiment is done with intellectual honesty, great! If it doesn’t make a big splash, fine.
Looking up the thread, you can see the context. Many of us pushed back against vague claims that AlphaChip was "snake oil". Like good engineers, we split apart the problem into clearer concepts. The "snake oil" proponents did not offer compelling replies, did they? Instead, they retreated to irrelevant points that have no bearing on making sense of the "snake oil" claim.
Sometimes technical people forget to bring their "debugging" skills to bear on conversations. There is a metaphorical connection; good debuggers would disambiguate terms, decompose the problem, answer questions, find cruxes, synthesize, find clearer terms, generate alternative explanations, and so on.
Every scientist will tell you that "peer reviewed" is not a mark of quality, correctness, impact, value, accuracy, whatever.
Scientists care about replication. More correctly, they care that your work can be built upon. THAT is evidence of good science.
These things you mentioned had obvious benchmarks that were easily surpassed by the appropriate "AI". The evidence that they were better wasn't just significant, it was obvious.
This leaves the fact that with what appears to be maximal cooking of the books, the only thing AlphaChip seems to be able to beat is human, manual placement and not anything algorithmic--even from many, many generations ago.
Trying to pass that off as a significant "advance" in a "scientific publication" borders on scientific fraud and should definitely be called out.
The problem here is that I am certain that this is wired to the career trajectories of "Very Important People(tm)" and the fact that it essentially failed miserably is simply not politically allowed.
If they want to lie, they can do that in press releases. If they want published in something reputable, they should have to be able to provide proper evidence for replication.
And, if they can't do that, well, that's an answer itself, no?
If true, your stated concerns with the AlphaChip paper -- selective benchmarking and potential overselling of results - reflect poor scientific practice and possible intellectual dishonesty. This does not constitute scientific fraud, which occurs when the underlying method/experiment/rules are faked.
If the paper has issues with how it positions and contextualizes its contribution, criticism is warranted, sure. But don't confuse this with "scientific fraud".
Some context: for as long as benchmark suites have existed, people rightly comment on which benchmarks should be included and how they should be weighted.
These air quotes suggest the commenter above doesn't think the paper qualifies as a scientific publication. Such a characterization is unfair.
When I read the Nature article titled "Addendum: A graph placement methodology for fast chip design" [1], I see writing that more than meets the bar for a scientific publication. For example:
> Since publication, we have open-sourced a software repository [21] to fully reproduce the methods described in our paper. External researchers can use this repository to pre-train on a variety of chip blocks and then apply the pre-trained model to new blocks, as was done in our original paper. As part of this addendum, we are also releasing a model checkpoint pre-trained on 20 TPU blocks [22]. For best results, however, we continue to recommend that developers pre-train on their own in-distribution blocks [18], and provide a tutorial on how to perform pre-training with our open-source repository [23].
[1]: https://www.nature.com/articles/s41586-024-08032-5
[18]: Yue, S. et al. Scalability and generalization of circuit training for chip floorplanning. In Proc. 2022 International Symposium on Physical Design 65–70 (2022).
[21]: Guadarrama, S. et al. Circuit Training: an open-source framework for generating chip floor plans with distributed deep reinforcement learning. GitHub https://github.com/google-research/circuit_training (2021).
[23]: Guadarrama, S. et al. Pre-training. GitHub https://github.com/google-research/circuit_training/blob/mai... (2021).
However, it's hard to see how being provably two years behind the first, even within your own company, in an incredibly hot area where people are doing tons of work, suddenly makes you second. By that logic I might still be in time to claim the silver for the 100m at the Paris olympics if I pop over there in the next 18 months or so.
I can see you created this account just to comment on this thread so I'm sure you have more inside information than I do given that I'm really not connected to this in any way. Enjoy your work at Google Research. I think you guys do cool stuff. It's a shame in my opinion that you choose to damage your credibility by making (and defending) such obviously false claims rather than concentrating on the genuinely innovative work you have done advancing the field.
Sure, there are some techniques in financial markets that are only valuable when they are not widely known. But claiming this pattern applies universally is incorrect.
Publishing a technique doesn't prove it doesn't work. (Stating it this way makes it fairly obvious.)
DeepMind, like many AI research labs, publishes important and useful research. One might ask "is a lab leaving money on the table by publishing?" Perhaps a better question is "What 'game' is the lab playing, and over what time scale?"
Given infinite time and compute - maybe the approach is significantly better. But that’s just not practical. So unless you see dramatic shifts - no one is going to throw away proven results on your new approach because of the TTM penalty if it goes wrong.
The EDA industry is (has to be) ultra conservative.
> The EDA industry is (has to be) ultra conservative.
What is special about EDA that requires it to be more conservative?
> None of these "AI" tools have yet demonstrated the ability to classify "This is datapath", "This is array logic", "This is random logic".
Sounds like a good objective, one that could be added to the training parameters. Or maybe it isn't needed (AI can 'understand' some concepts without explicit tagging).
> If I have the compute power to run EDA tools with a couple of different random seeds, at least one run will likely be a couple percentage points better.
Then do it?! How long does it actually take to run? I know EDA tools creators are bad at some kinds of code optimization (and yes, it's hard) but let's say for a company like Intel, if it takes 10 days to rerun a chip to get 1% better, that sounds like a worthy tradeoff.
> I put this snake oil in the same category as "financial advice"--if it worked, they wouldn't be sharing it and would simply be printing money by taking advantage of it.
Yeah I don't think you understood the problem here. Good financial advice is about balancing risks and returns.
> EDA companies are garbage
I don't understand this comment. Can you please explain? Are they unethical? Or do they write poor software?
EDA companies are gatekeeping monopolies. They absolutely abuse their monopoly position to extract huge chunks of money out of companies, and are pretty much single-handedly responsible for the fact that the hardware startup ecosystem is moribund compared to that of the software startup ecosystem.
They have been horrible liars about performance and benchmarketing for decades. They dragged their feet miserably over releasing Linux versions of their software because they were extracting money based upon number of CPU licenses (everything was on Sparc which was vastly inferior). Their software hasn't really improved all that much over decades--mostly they benefited from Moore's Law. They have made a point of stifling attempts at interoperability and open data exchange. They have bought lots of competitors mostly to just shut them down. I can go on and on.
The EDA companies aren't quite Oracle--but they're not far off.
This is one of the reasons why Google is getting pounded over this--maybe even unfairly. People in the field are super sensitive about bullshit claims from EDA vendors--we've heard them all and been on the receiving end of the stick far too many times.
This was the case before EDA companies even appeared. Hardware is hard because it's manufacturing. You can't "iterate quickly", every iteration costs millions of dollars and so does every mistake.
This is true for injection molding and yet we do that all the time in small businesses.
A mask set for an older technology can be in the range of $50K-$100K. That's right about the same price as injection molds.
The main difference is that Solidworks is about $25K while Cadence, et al, is about a megabuck.
Yes but not single-handedly -- it's them and the foundries, hand-in-hand.
No startup can compete with Synopsys because TSMC doesn't give out the true design rules to anybody smaller than Apple for finfet processes. Essentially their DRC+LVS software has become a DRM-encoded version of the design rule manual.
Agreed with most of what you mentioned, but not that EDA companies are no worse than Oracle; at least Oracle is still supporting popular and useful open-source projects, namely MySQL, VirtualBox, etc.
What open-source design software are these EDA companies currently supporting, even though most of their software originated from open-source EDA tools out of UC Berkeley, etc.?
This is a fallacious argument. A better chip design process does not eliminate all other risks like product-market fit or the upfront cost of making masks or chronic mismanagement.
In short, I think the Nature authors have made some reasonable criticisms regarding the training methodology employed by the ISPD authors, but the extreme compute cost and runtime of AlphaChip still make it non-competitive with commercial autofloorplanners and AutoDMP. Regardless, I think the ISPD authors owe the Nature authors an even more rigorous study that addresses all their criticisms. Even if they just try to evaluate the pre-trained checkpoint that Google published, that would be a useful piece of data to add to the debate.
A GPU costs $1-2/hr on the cloud market. So, ~$100-200 for inference, and ~$800-1600 for pre-training, which amortizes across chips. Cloud prices are an upper bound -- most CS labs will have way more than this available on premises.
In an industry context, these costs are completely dwarfed by the rest of the chip design process. (For context, the licensing costs alone for most commercial EDA software are in the millions of dollars.)
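A back-of-the-envelope version of that arithmetic (the GPU-hour counts are assumptions implied by the dollar figures above, not numbers taken from the paper):

    # Back-of-the-envelope cloud-cost estimate. The GPU-hour counts are assumptions
    # implied by the dollar figures above, not numbers taken from any paper.
    gpu_price_per_hr = (1.0, 2.0)   # typical cloud range, $/GPU-hour
    inference_gpu_hours = 100       # assumed per-chip inference budget
    pretrain_gpu_hours = 800        # assumed one-off pre-training budget

    for label, hours in [("inference (per chip)", inference_gpu_hours),
                         ("pre-training (amortized)", pretrain_gpu_hours)]:
        lo, hi = (hours * p for p in gpu_price_per_hr)
        print(f"{label}: ${lo:.0f}-${hi:.0f}")
    # -> inference (per chip): $100-$200
    # -> pre-training (amortized): $800-$1600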
Are the other methods scalable in that way?
See appendix of the Nature article: rdcu.be/cmedX
The modern versions of that hill climb also use some RL (placing and routing chips is sort of like a game), but not in the way Jeff Dean wants it to be done.
Then they could pre-train on chips that are in-distribution for that task.
See also section 3.1 of their response paper, where they describe a comparison against commercial autoplacers: https://arxiv.org/pdf/2411.10053
It is possible that the pre-training step may overfit to a particular class of chips or may fail to converge given a general sample of chip designs. That would make the pre-training step unable to be used in the setting of a commercial EDA tool. The people who do know this are the people at EDA companies who are smart and not arrogant and who benchmarked this stuff before deciding not to adopt it.
If you want to make a good-faith assumption (that IMO is unwarranted given the rest of the paper), the people trying to replicate Google's paper may have done a pre-training step that failed to converge, and then didn't report it. That failure to converge could be due to ineptitude, but it could be due to data quality, too.
In fact, existing algorithms such as naive simulated annealing can be easily augmented with ML (e.g. using state embeddings to optimize hyperparameters for a given problem instance, or using a regression model to fine-tune proxy costs to better correlate with final QoR). Indeed, I strongly suspect commercial CAD software is already applying ML in many ways for mixed-placement and other CAD algorithms. The criticism against AlphaChip isn't about rejecting any application of ML to EDA CAD algorithms, but rather the particular formulation they used and objections to their reported results / comparisons.
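As a toy illustration of that kind of augmentation, here is a minimal sketch of simulated annealing over macro positions where the proxy cost is a learned linear blend of wirelength and congestion estimates. The netlist, weights, and cost model are made up for illustration; this is not the AlphaChip method or any commercial tool's algorithm:

    # Toy sketch: simulated annealing over macro positions, where the proxy cost is a
    # learned linear blend of wirelength and congestion estimates ("regression model
    # fine-tuning proxy costs"). All numbers and the cost model are illustrative only.
    import math
    import random

    random.seed(0)
    GRID = 32                                        # placement grid is GRID x GRID cells
    NETS = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]  # toy netlist: 2-pin nets between macros
    N_MACROS = 4

    def wirelength(pos):  # half-perimeter wirelength over 2-pin nets
        return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]) for a, b in NETS)

    def congestion(pos):  # crude proxy: macros stacked on the same cell
        return sum(pos.count(c) - 1 for c in set(pos))

    # Pretend these weights came from a regression fit against final (post-route) QoR.
    W_WL, W_CONG = 1.0, 25.0
    def proxy_cost(pos):
        return W_WL * wirelength(pos) + W_CONG * congestion(pos)

    def anneal(max_temp=5.0, steps=20000):
        pos = [(random.randrange(GRID), random.randrange(GRID)) for _ in range(N_MACROS)]
        cost = proxy_cost(pos)
        for step in range(steps):
            temp = max_temp * (1 - step / steps) + 1e-6
            i = random.randrange(N_MACROS)
            old = pos[i]
            pos[i] = (random.randrange(GRID), random.randrange(GRID))  # propose a random move
            new_cost = proxy_cost(pos)
            if new_cost > cost and random.random() > math.exp((cost - new_cost) / temp):
                pos[i] = old         # reject the move
            else:
                cost = new_cost      # accept the move
        return pos, cost

    placement, cost = anneal()
    print("final proxy cost:", cost)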
The Nature authors already presented such a study in their appendix:
"To make comparisons fair, we ran 80 SA experiments sweeping different hyperparameters, including maximum temperature (10^−5, 3 × 10^−5, 5 × 10^−5, 7 × 10^-5, 10^−4, 2 × 10^−4, 5 × 10^−4, 10^−3), maximum SA episode length (5 × 10^4, 10^5) and seed (five different random seeds), and report the best results in terms of proxy wirelength and congestion costs in Extended Data Table 6"
Non-paywalled Nature article link: rdcu.be/cmedX
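For what it's worth, the sweep described in that quote is easy to enumerate; a minimal sketch with the parameter values copied from the quote:

    # Enumerate the SA hyperparameter sweep described in the quoted appendix:
    # 8 max temperatures x 2 max episode lengths x 5 seeds = 80 experiments.
    from itertools import product

    max_temps = [1e-5, 3e-5, 5e-5, 7e-5, 1e-4, 2e-4, 5e-4, 1e-3]
    max_episode_lengths = [5e4, 1e5]
    seeds = range(5)

    configs = list(product(max_temps, max_episode_lengths, seeds))
    print(len(configs))  # -> 80; launch one SA run per config, keep the best proxy cost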
One key argument in the rebuttal against the ISPD article is that the resources used in their comparison were significantly smaller. To me, this point alone seems sufficient to question the validity of the ISPD work's conclusions. What are your thoughts on this?
Additionally, I noticed that the neutral tone of this comment is quite a departure from the strongly critical tone of your article toward the AlphaChip work (words like "arrogance", "disdain", "hyperbole", "belittling", "hostile" for AlphaChip authors, as opposed to "excellent" for a Synopsys VP.) Could you share where this difference in tone originates?
I believe this is a fair criticism, and it could be a reason why the ISPD Tensorboard shows divergence during training for some RTL designs. The ISPD authors provide their own justification for their substitution of training time for compute resources in page 11 of their paper (https://arxiv.org/pdf/2302.11014).
I do not think it changes the ISPD work's conclusions however since they demonstrate that CMP and AutoDMP outperform CT wrt QoR and runtime even though they use much fewer compute resources. If more compute resources are used and CT becomes competitive wrt QoR, then it will still lag behind in runtime. Furthermore, Google has not produced evidence that AlphaChip, with their substantial compute resources, outperforms commercial placers (or even AutoDMP). In the recent rebuttal from Google (https://arxiv.org/pdf/2411.10053), the only claim on page 8 says Google VLSI engineers preferred RL over humans and commercial placers on a blind study conducted in 2020. Commercial mixed placers, if configured correctly, have become very good over the past 4 years, so perhaps another blind study is warranted.
> Additionally, I noticed that the neutral tone of this comment is quite a departure from the strongly critical tone of your article
I will openly admit my bias is against the AlphaChip work. I referred to the Nature authors as 'arrogant' and 'disdainful' with respect to their statement that EDA CAD engineers are just being bitter ML-haters when they criticize the AlphaChip work. I referred to Jeff Dean as 'belittling' and 'hostile' and using 'hyperbole' with respect to his statements against Igor Markov, which I think is unbecoming of him. I referred to Shankar as 'excellent' with respect to his shrewd business acumen.
That said, on page 8, the paper says that 'standard licensing agreements with commercial vendors prohibit public comparison with their offerings.' Given this inherent limitation, what alternative approach could have been taken to enable a more meaningful comparison between CT and CMP?
Perhaps the Cadence license agreement signed by a corporation is different than the one signed by a university. In such a case, they could partner with a university. But I doubt their license agreement prevents any public comparison. For example, see the AutoDMP paper from NVIDIA (https://d1qx31qr3h6wln.cloudfront.net/publications/AutoDMP.p...) where on page 7 they openly benchmark their tool against Cadence Innovus. My suspicion is they wish to keep details about the TPU blocks they evaluated under tight wraps.
If publicizing comparisons of CMPs is as permissible as you suggest, have you seen a publication that directly compares a Cadence macro placement tool with a Synopsys tool? If I were the technically superior party, I’d be eager to showcase the fairest possible comparison, complete with transparent benchmarks and tools. In the CPU design space, we often see standardized benchmarking tools like SPEC microbenchmarks and gaming benchmarks. (And IMO that's part of why AMD could disrupt the PC market.) Does the EDA ecosystem support a similarly open culture of benchmarking for commercial tools?
Do you have any evidence to claim this? The whole point of this thread is that the direct comparisons might have been insufficient, and even the author of "The Saga" article who's biased against the AlphaChip work agreed.
> Granted, Google is belittling these comparisons, but that's what you'd expect.
This kind of language doesn't help any position you want to advocate.
About "the potential to disrupt", a potential is a potential. It's an initial work. What I find interesting is that people are so eager to assert that it's a dead-end without sufficient exploration.
Maybe you know more such published papers than I do, or you know the reasons why there aren't many. Somehow this lack of follow-up over three years suggests a dead-end.
As for "belittle", how would you describe the scientific term "regurgitating" used by Jeff Dean? Also, the term "fundamentally flawed" in reference to a 2023 paper by two senior professors with serious expertise and track record in the field, that for some reason no other experts in the field criticize? Where was Jeff Dean when that paper was published and reported by the media?
Unless Cheng and Kahng agree with this characterization, Jeff Dean's timing and language are counterproductive. If he ends up being wrong on this, what's the right thing to do?
That's the ISPD paper referenced many times in this whole thread.
> Stronger Baselines
Re: "Stronger baselines", the paper "That Chip Has Sailed" says "We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline." What is your take on this claim?
As for 'regurgitating,' I don’t think it helps Jeff Dean’s point either. Based on my and vighneshiyer's discussion above, describing the work as "fundamentally flawed" does not seem far-fetched. If Cheng and Kahng do not agree with this, I believe they can publish another invited paper.
On 'belittle,' my main issue was with your follow-up phrase, 'that’s what you’d expect.' It comes across as overly emotional and detracts from the discussion.
Regarding the lack of follow-ups (that I am aware of), the substantial resources required for this work seem beyond what academia can easily replicate. Additionally, according to "the Saga" article, both non-Jeff-Dean authors had left Google; their Twitter/X/LinkedIn profiles suggest they recently returned to Google and worked on this "Sailing Chip" paper.
Personally, I hope they reignite their efforts on RL in EDA and work toward democratizing their methods so that other researchers can build new systems on their foundation. What are your thoughts? Do you hope they improve and refine their approach in future work, or do you believe there should be no continuation of this line of research?
To clarify "you'd expect" - if Jeff Dean is correct, he'd deny problems and if he's wrong he'd deny problems. So, his response carries little information. Rationally, this should be done by someone else with a track record in chip implementation.
Additionally, in case you forgot to answer, what is your wish for the future of this line of research? Do you hope to see it improve the EDA status quo, or would you prefer the work to stop entirely? If it is the latter, I would have no intention of continuing this conversation.
If only. The comparison in Cheng et al. is the only public comparison with CMP that I can recall, and it is pretty suss that this just so happens to be a very pro-commercial-autoplacer study. (And, Cheng et al. have cited 'licensing agreements' as a reason for not giving out the synthesized netlists necessary to reproduce their results.)
Reminded a bit of Oracle. They likewise used to (and maybe still?) prohibit any benchmarking of their database software against that of another provider. This seems to be a common move for solidifying a strong market position.
Like, for a CPU, you want to be sure it behaves properly for the given inputs. Anyone remember that floating point error in, was it Pentium IIs or Pentium IIIs?
I mean, I guess if the chip is designed for AI, and AIs are inherently nonguaranteed output/responses, then the AI chip design being nonguaranteed isn't any difference in nonguarantees.
Unless it is...
The same way you verify a human-generated one.
> Anyone remember that floating point error in, was it Pentium IIs or Pentium IIIs?
That was 1994. The industry has come a long way in the intervening 30 years.
Didn't Intel screw up design recently? Oxidation and degradation of ring bus IIRC.
I think you're asking a different question, but in the context of the OP researchers are exploring AI for solving deterministic but intractable problems in the field of chip design and not generating designs end to end.
Here's an excerpt from the paper.
"The objective is to place a netlist graph of macros (e.g., SRAMs) and standard cells (logic gates, such as NAND, NOR, and XOR) onto a chip canvas, such that power, performance, and area (PPA) are optimized, while adhering to constraints on placement density and routing congestion (described in Sections 3.3.6 and 3.3.5). Despite decades of research on this problem, it is still necessary for human experts to iterate for weeks with the existing placement tools, in order to produce solutions that meet multi-faceted design criteria."
The hope is that Reinforcement Learning can find solutions to such complex optimization problems.
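To make that framing concrete, here is a heavily simplified sketch of sequential macro placement cast as an episodic decision problem (state = partial placement, action = a grid cell for the next macro, reward = negative proxy wirelength at the end). It is a toy illustration of the formulation, not the paper's actual environment or reward:

    # Toy sketch of sequential macro placement as an RL episode: the agent places one
    # macro per step; the terminal reward is the negative proxy wirelength.
    # Illustrative only, not the Nature paper's environment.
    import random

    GRID = 16
    MACROS = 5
    NETS = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]  # toy 2-pin nets between macros

    def episode(policy):
        placed = {}                                   # macro id -> (x, y); the "state"
        for macro in range(MACROS):
            x, y = policy(placed, macro)              # "action": pick a cell for this macro
            placed[macro] = (x, y)
        wl = sum(abs(placed[a][0] - placed[b][0]) + abs(placed[a][1] - placed[b][1])
                 for a, b in NETS)
        return -wl                                    # terminal reward: shorter wires are better

    def random_policy(placed, macro):
        return random.randrange(GRID), random.randrange(GRID)

    # A learned policy would replace random_policy; here we just sample a baseline.
    rewards = [episode(random_policy) for _ in range(1000)]
    print("mean reward of random placement:", sum(rewards) / len(rewards))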
> Despite decades of research on this problem, it is still necessary for human experts to iterate for weeks with the existing placement tools, in order to produce solutions that meet multi-faceted design criteria.
Ironically, this sounds a lot like building a bot to play StarCraft, which is exactly what AlphaStar did. I had no idea that EDA layout is still so difficult and manual in 2024. This seems like a very worthy area of research.
I am not an expert in AI/ML, but is the ultimate goal something like: train on as many open-source circuit designs as possible to build a base, then solve IC layout problems via reinforcement learning, similar to AlphaStar, and finally use the trained model to do inference during IC layout?
These are chip layouts used for fabbing chips. I don't think you will find many open source designs.
EDA companies work closely with foundries (TSMC, Samsung, GlobalFoundries). This is the bleeding-edge stuff to get the best performance for NVIDIA or AMD or Intel.
As an individual, it's very hard and expensive to fab your chip (though there are companies that pool multiple designs).
Surely all of the people who did the work that the innovation rests on should be confident they will be relevant, involved, comfortable, and safe in the post-innovation world?
And yet it’s not clear they should feel this way. Luddism seems an unfounded ideology over the scope of history since the origin of the term. But over the period since “AI” entered the public discussion at the current level? Almost two years exactly? Making the Luddite agenda credible has seemed a ubiquitous talking point.
Over that time frame technical people have been laid off in staggering numbers, a steadily-shrinking number of employers have been slashing headcount and posting EPS beats, and “AI” has been mentioned in every breath. It’s so extreme that even sophisticated knowledge of the kinds of subject matter that goes into AlphaChip is (allegedly) useless without access to the Hopper FLOPs.
If the AI Cartel was a little less rapacious, people might be a little more open to embracing the AI revolution.
The whole publication process seems dishonest, starting from publishing in Nature (why not ISSCC or something similar?)
Why would you publish in ISSCC when you can get into Nature?
Assuming Google isn’t lying, a lot of controversy would go away if they actually released their benchmark data for independent people to look at. They are still refusing to do so: https://cacm.acm.org/news/updates-spark-uproar/ Google thinks we should simply accept their conclusions by fiat. And don’t forget about this:
Madden further pointed out that the “30 to 35%” advantage of RePlAce was consistent with findings reported in a leaked paper by internal Google whistleblower Satrajit Chatterjee, an engineer who Google fired in 2022 when he first tried to publish the paper that discredited the “superhuman” claims Google was making at the time for its AI approach to chip design.
It is entirely appropriate to make “personal attacks” against Jeff Dean, because the heart of the criticism is that his personality is dishonest and authoritarian: he publishes suspicious research and fires people who dissent.[1] Jeff Dean hypocritically sneering about the critique being a conference paper is especially galling. What an unbelievable asshole.
(I agree with you in principle; my example above is meant to show that standards for things such as reproducibility aren't easily defined. There are so many factors to consider.)
Specifically:
> In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article), robbing our learning-based method of its ability to learn from other chip designs
But in the Circuit Training Google repo[1] they specifically say:
> Our results training from scratch are comparable or better than the reported results in the paper (on page 22) which used fine-tuning from a pre-trained model.
I may be misunderstanding something here, but which one is it? Did they mess up when they did not pre-train or they followed the "steps" described in the original repo and tried to get a fair reproduction?
Also, the UCSD group had to reverse-engineer several steps to reproduce the results so it seems like the paper's results weren't reproducible by themselves.
[1]: https://github.com/google-research/circuit_training/blob/mai...
So no contradiction: pretrain on old designs then finetune on new design, vs train on everything mixed together throughout. Finetuning can cause catastrophic forgetting. Both could have better performance than not including old designs.
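A minimal sketch of the two regimes being contrasted (the model class, design names, and training loop are placeholders, not the Circuit Training repo's API):

    # Sketch of the two training regimes contrasted above. The Model class, design
    # names, and train_on loop are placeholders, not the Circuit Training repo's API.
    class Model:
        """Trivial stand-in for a placement policy; tracks which designs it has seen."""
        def __init__(self):
            self.steps_per_design = {}
        def update(self, design):
            self.steps_per_design[design] = self.steps_per_design.get(design, 0) + 1

    def train_on(model, designs, steps):
        for _ in range(steps):
            for design in designs:
                model.update(design)   # hypothetical gradient step on this design
        return model

    old_designs = ["tpu_block_a", "tpu_block_b"]
    new_design = "new_block"

    # Regime 1: pre-train on previously seen blocks, then fine-tune on the target block
    # (fine-tuning can cause catastrophic forgetting of the old blocks).
    m1 = train_on(Model(), old_designs, steps=1000)
    m1 = train_on(m1, [new_design], steps=100)

    # Regime 2: train on everything mixed together throughout ("from scratch" on the mixture).
    m2 = train_on(Model(), old_designs + [new_design], steps=1100)

    print(m1.steps_per_design)   # the new block sees only the final 100 fine-tuning steps
    print(m2.steps_per_design)   # all blocks, including the new one, are seen throughout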
The Circuit Training repo was just going through an example. It is common for an open-source repo to describe simple examples for testing / validating your setup --- that does not mean this is how you should get optimal results in general. The confusion may stem from their statement that, in this example, they produced results that were comparable with the pre-trained results in the paper. This is clearly not a general repudiation of pre-training.
If Cheng et al. genuinely felt this was ambiguous, they should have reached out to the corresponding authors. If they ran into some part of the repo they felt they had to "reverse-engineer", they should have asked about that, too.
[8] Prior to publication of Cheng et al., our last correspondence with any of its authors was in August of 2022 when we reached out to share our new contact information.
[9] In contrast, prior to publishing in Nature, we corresponded extensively with Andrew Kahng, senior author of Cheng et al. and of the prior state of the art (RePlAce), to ensure that we were using the appropriate settings for RePlAce."
So, collective action problems are nearly a statistical certainty across a wide variety of situations. And yet we still "blame" individuals? We should know better.
He's not the first Jeffery with a lot of power who doesn't care.
Phrasing it this way isn't useful. Talking about choice in the abstract doesn't help with a game-theoretic analysis. You need costs and benefits too.
There are many people who face something like a prisoner's dilemma (on Twitter, for example). We could assess the cost-benefit of a particular person leaving Twitter. We could even judge them according to some standards (ethical, rational, and so on). But why bother?...
...Think about major collective action failures. How often are they the result of just one person's decisions? How does "blaming" or "judging" an individual help make a situation better? This effort on blaming could be better spent elsewhere, such as understanding the system and finding leverage points.
There are cases where blaming/guilt can help, but only in the prospective sense: if a person knows they will be blamed and face consequences for an action, it will make that action more costly. This might be enough to deter that decision. But do you think this applies in the context of the "do I leave Twitter?" decision? I'd say very little, if at all.
Cross-posting to a Mastodon account is not that hard.
I look at this from two viewpoints. One is that it's good that he spends most of this time and energy doing research/management and not getting bogged down in culture war stuff. The other is that those who have all this power ought to wield it a tiny tiny bit more responsibly. (IMHO social influence of the elites/leaders/cool-kids are also among those leverage points you speak of.)
Also, I'm not blaming him. I don't think it's morally wrong to use X. (I think it's mentally harmful, but X is not unique in this. Though character limit does select for "no u" type messages.) I'm at best cynically musing about the claimed helplessness of Jeff Dean with regards to finding a forum.
But the best (probably only) way to put downward pressure on that is via internal incentives, controls, and culture. Push hard enough for some percentage gain per cadence with no upper bound, and promote the folks who reliably deliver it without checking whether the win was there to begin with? This is scale-invariant: it could be in a pod, a department, a company, a hedge fund that owns much of those companies, a fund of those funds, the federal government.
Sooner or later your leadership is substantially penetrated by the unscrupulous. We see this in academia with the spate of scandals around publications. We see this in finance with, well, who can even count that high anymore. You see Holmes and SBF in prison, but the folks they funded are still at the apex of relevance, and everyone who didn't just fall off a turnip truck knows that everyone from that clique has carried that ideology with them and has better lawyers now.
There’s an old saw that a “fish rots from the head”. We can’t look at every manner of shadiness and constant scandal from the iconic leaders of our STEM industry and say “good for them, they outsmarted the system” and expect any result other than a broad-spectrum attack on any honest, fair, equitable status quo.
We all voted with our feet (and I did my share of that too before I quit in disgust) for a “might makes right” quasi-religious system of ideals, known variously as Objectivism, Effective Altruism, and Capitalism (of which it is no kind). We shouldn’t be surprised that everything is kind of tarnished sticky now.
The answer today? I don’t know. Work for the less bad as opposed to more bad companies, speak out at least anonymously about abuses, listen to the leaders speak in interviews and scrutinize it. I’m open to suggestions.
"To be clear, we do NOT have evidence to believe that RL outperforms academic state-of—art and strongest commercial macro placers. The comparisons for the latter were done so poorly that in many cases the commercial tool failed to run due to installation issues." and that's supposedly a screenshot from an internal presentation done by Jeff Dean.
https://regmedia.co.uk/2023/03/26/satrajit_vs_google.pdf
As an outsider, I find it very difficult to judge if Chatterjee was a bad and expensive hire (because he suppressed good results by coworkers) or if he was a very valuable employee (because he tried to prevent publishing false statements).
1. preventing bad things
2. preventing bad things in a way that makes all the junior members on the receiving end feel bullied
So judging from the article alone, it's either suppressing good results or 2. above, neither of which is valuable in my book
According to a Google investigator's sworn statement, he admitted that he didn't have evidence to suspect the AlphaChip authors of fraud: "he stated that he suspected that the research being conducted by Goldie and Mirhoseini was fraudulent, but also stated that he did not have evidence to support his suspicion of fraud".
I feel like if someone persistently makes unsupported allegations of fraud, they should not be surprised if they get shown the door.
The comparison against commercial autoplacers might be this one (from That Chip Has Sailed - https://arxiv.org/pdf/2411.10053):
"In May of 2020, we performed a blind internal study[12] comparing our method against the latest version of two leading commercial autoplacers. Our method outperformed both, beating one 13 to 4 (with 3 ties) and the other 15 to 1 (with 4 ties). Unfortunately, standard licensing agreements with commercial vendors prohibit public comparison with their offerings."
[12] - "Our blind study compared RL to human experts and commercial autoplacers on 20 TPU blocks. First, the physical design engineer responsible for placing a given block ranked anonymized placements from each of the competing methods, evaluating purely on final QoR metrics with no knowledge of which method was used to generate each placement. Next, a panel of seven physical design experts reviewed each of the rankings and ties. The comparisons were unblinded only after completing both rounds of evaluation. The result was that the best placement was produced most often by RL, followed by human experts, followed by commercial autoplacers."
Also, you are using an unreviewed document from Google not published in any conference to counter published papers with specific results, primarily the Cheng et al paper. Jeff Dean did not like that paper, so he can take it up with the conference and convince them to unpublish it. If he can't, maybe he is wrong.
Perhaps you are biased toward Google, but why do you think we should trust a document that was neither peer-reviewed nor published at a conference?
Legal nitpick - you can get away with alleging pretty much whatever you want in a legal complaint. You can't even be sued for defamation if it turns out later you were lying.
Jeff Dean isn't saying that Cheng et al. should be unpublished; he's saying that they didn't run the method the same way. It is perfectly fine for someone to try changing the method and report what they found. What's not fine is to claim that this means that Google was lying in their study.
Google claimed their new algorithm was a breakthrough. If this were so, the algorithm would have helped design chips in many different cases. Now, the defense is that it only works for some inputs, and those inputs cannot be shared. This is not a serious defense and looks like a coverup.
He even ran a study internally (with Markov), but, as the AlphaChip authors describe:
In 2022, it was reviewed by an independent committee at Google, which determined that “the claims and conclusions in the draft are not scientifically backed by the experiments” [33] and “as the [AlphaChip] results on their original datasets were independently reproduced, this brought the [Markov et al.] RL results into question” [33]. We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline. We still do not know how Markov and his collaborators produced the numbers in their paper. (https://arxiv.org/pdf/2411.10053)
AI Alone Isn't Ready for Chip Design - https://news.ycombinator.com/item?id=42207373 - Nov 2024 (2 comments)
That Chip Has Sailed: Critique of Unfounded Skepticism Around AI for Chip Design - https://news.ycombinator.com/item?id=42172967 - Nov 2024 (9 comments)
Reevaluating Google's Reinforcement Learning for IC Macro Placement (AlphaChip) - https://news.ycombinator.com/item?id=42042046 - Nov 2024 (1 comment)
How AlphaChip transformed computer chip design - https://news.ycombinator.com/item?id=41672110 - Sept 2024 (194 comments)
Tension Inside Google over a Fired AI Researcher’s Conduct - https://news.ycombinator.com/item?id=31576301 - May 2022 (23 comments)
Google is using AI to design chips that will accelerate AI - https://news.ycombinator.com/item?id=22717983 - March 2020 (1 comment)
For instance:
> Much of this unfounded skepticism is driven by a deeply flawed non-peer-reviewed publication by Cheng et al. that claimed to replicate our approach but failed to follow our methodology in major ways. In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article),
This could easily be written more succinctly, and with less bias, as:
> Much of this skepticism is driven by a publication by Cheng et al. that claimed to replicate our approach but failed to follow our methodology in major ways. In particular the authors did no pre-training,
Calling the skepticism unfounded or deeply flawed does not make it so, and pointing out that a particular publication is not peer reviewed does not make its contents false. The authors would be better served by maintaining a more neutral tone rather than coming off accusatory and heavily biased.
Was the TPU physical design team also taken in? And also MediaTek? And also TF-Agents, which publicly said they re-produced the AlphaChip method and results exactly?
Someone mentioned here before that Google folks have been using "hyperbole". So, if MediaTek can clarify how they are using AlphaChip, everyone wins.
One interesting aspect of this, though, is the vice versa: whilst Google has oodles of compute, Synopsys has oodles of data to train on (if, and this is a massive if, they can get away with training on customer IP).