For context encoding (prefill), it's always close to as fast as a model with a similar number of active params.
For running it on your own, the issue is going to be fitting all the params on your GPU. If you're loading off disk anyway, this will be faster, but if it forces you to put weights on disk, it will be much slower.
Only when talking about how fast it can produce output. From a capability point of view, it makes sense to compare the larger number of parameters. I suppose there's also a "total storage" comparison too, since didn't they say these are 8-bit model weights, where Llama is 16-bit?
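Back-of-the-envelope on the "total storage" angle (a rough sketch; the ~389B total-parameter figure is the reported one, and 8/16-bit are treated as exactly 1 and 2 bytes per weight):

    # Weight storage only, ignoring KV cache, activations, and runtime overhead.
    def weight_gb(params_billion: float, bytes_per_param: float) -> float:
        """Approximate size of the weights in GB (1e9 params * bytes/param ~= GB)."""
        return params_billion * bytes_per_param

    hunyuan_total = 389  # reported total params (billions); treat as approximate
    llama_total = 405

    print(f"Hunyuan-Large @ 8-bit  : ~{weight_gb(hunyuan_total, 1):.0f} GB")
    print(f"Llama 3.1 405B @ 16-bit: ~{weight_gb(llama_total, 2):.0f} GB")

So on pure footprint, the 8-bit release is roughly half of Llama 3.1 405B at 16-bit, despite a similar total parameter count.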
Apparently 20% of Nvidia's quarterly revenue is booked in Singapore where shell companies divert product to China: https://news.ycombinator.com/item?id=42048065
EDIT: The HN title, which previously made this claim, has since been changed. But as HN user swyx pointed out, Tencent is also claiming this is open source, e.g.: "The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry".
Edit: Also, if you don't want to follow or deal with EU law, you don't do business in the EU. People here regularly say if you do business in a country, you have to follow its laws. The opposite also applies.
https://news.bloomberglaw.com/ip-law/openais-aggressive-cour...
Can any lawyer on here defend OpenAI's request? Or is the article not characterizing it well in the quote?
Model weights could be treated the same way phone books, encyclopedias, and other collections of data are treated. The copyright is over the collection itself, even if the individual items are not copyrightable.
Encyclopedias are copyrightable. Phone books are not.
e.g. if I upload Marvels_Avengers.mkv.onnx and it reliably reproduces the original (after all, it's just a fact that the first byte of the original file is 0xF0, etc.)
Judge the output, not the system.
IIRC, this is wrong. Independent creation is a valid (but almost impossible to prove) defense in US copyright law.
This example is not an independent creation, but your reasoning seems wrong.
Are they, or are they collections of probabilities? If they are probabilities, and those probabilities change from model to model, that seems like they might be copyrightable.
If Google, OpenAI, Facebook, and Anthropic each train a model from scratch on an identical training corpus, they would wind up with four different models that had four differing sets of weights, because they digest and process the same input corpus differently.
That indicates to me that they are not a collection of facts.
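A toy illustration of that point (nothing like a real training pipeline, just the seed-dependence): identical data, identical architecture, identical training loop, different random init, different final weights.

    import numpy as np

    def train_tiny_mlp(seed: int, steps: int = 2000, lr: float = 0.1) -> np.ndarray:
        """Fit a 1-hidden-layer net to a tiny XOR-style dataset; only the seed differs."""
        rng = np.random.default_rng(seed)
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        y = np.array([[0], [1], [1], [0]], dtype=float)
        W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 1))
        for _ in range(steps):
            h = np.tanh(X @ W1)
            err = h @ W2 - y
            W2 -= lr * h.T @ err / len(X)
            W1 -= lr * X.T @ ((err @ W2.T) * (1 - h**2)) / len(X)
        return np.concatenate([W1.ravel(), W2.ravel()])

    w_a, w_b = train_tiny_mlp(seed=0), train_tiny_mlp(seed=1)
    print(np.allclose(w_a, w_b))  # False: same data, different weights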
At some point, with sufficiently many hyperparameters being chosen, that starts becoming a creative decision. If 5 parameters are available and all are left at the default, then no, that's not creative. If there are ten thousand, and all are individually tweaked to yield what the user wants, is that creative?
Not to mention that all of these companies write their own algorithms to do the training, which can introduce other small differences.
It depends on the jurisdiction. The US Supreme Court ruled that phone books are not copyrightable in the 1991 case Feist Publications, Inc., v. Rural Telephone Service Co. However, that is not the law in the UK, which generally follows the 1900 House of Lords decision Walter v Lane that found that mere "sweat of the brow" is enough to establish copyright – that case upheld a publisher's copyright on a book of speeches by politicians, purely on the grounds of the human effort involved in transcribing them.
Furthermore, under its 1996 Database Directive, the EU introduced the sui generis database right, which is a legally distinct form of intellectual property from copyright, but with many of the same features, protecting mere aggregations of information, including phone directories. The UK has retained this after Brexit. However, EU directives give member states discretion over the precise legal mechanism of their implementation, and the UK used that discretion to make database rights a subset of copyright – so, while in EU law they are a technically distinct type of IP from copyright, under UK law they are an application of copyright. EU law only requires database rights to have a term of 15 years.
Do not be surprised if in the next couple of years the EU comes out with an "AI Model Weights Directive" establishing a "sui generis AI model weights right". And I'm sure the US Congress will be interested in following suit. I expect OpenAI / Meta / Google / Microsoft / etc. will be lobbying for them to do so.
> "By open-sourcing the Hunyuan-Large model"
In particular, restrictions on ML models will leave you without access to extremely powerful resources that are available to people in other countries, and to people in your own country who don't mind operating outside the law. Copyright maximalism is not, in fact, a good thing, and neither is overbearing nanny-statism. Both will ultimately disempower you.
It doesn't matter if an individual personally has access to ML models, because government and/or huge corporations will ensure that individuals cannot use them for anything that would threaten government or corporate interests.
This unfettered explosion of ML growth is disempowering all of us. Those with power are not using these tools to augment us, they are hoping to replace us.
Never mind that I've gotten things done with ChatGPT that would otherwise have taken much longer, or not gotten done at all. If this is what "disempowerment" feels like, bring it on.
Although the tech is nowhere near ready to make it happen, I would be very happy to be "replaced" by AI. I have better things to do than a robot's job. You probably do, too.
Say what you want about ML models, they will get better at a rate that outpaces any possible self-improvement on your part. (Maybe you've noticed that those jokes about six-fingered people aren't aging particularly well.) The same is true for me, and I don't want to be left behind as that happens. At the national scope, countries that act to restrict or impede progress in this area will be outcompeted dramatically in the long run.
[0] https://arxiv.org/pdf/2411.02265 [1] https://llm.hunyuan.tencent.com/
They use:
- 16 experts, of which one is activated per token
- 1 shared expert that is always active

In summary, that makes around 52B active parameters per token instead of the 405B of Llama 3.1.
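To see roughly where 52B comes from (a sketch; the 8B/22B split below is an assumed illustration, not the paper's actual breakdown):

    def active_params_b(non_expert_b: float, expert_b: float,
                        shared_active: int, routed_active: int) -> float:
        """Parameters touched per token, in billions: dense parts plus the active experts."""
        return non_expert_b + expert_b * (shared_active + routed_active)

    # Assumed split for illustration: ~8B attention/embeddings/etc., ~22B per expert.
    print(active_params_b(non_expert_b=8, expert_b=22, shared_active=1, routed_active=1))  # 52.0

Total parameters, by contrast, count all 16 routed experts plus the shared one, which is how you end up near 400B total while only ~52B are touched per token.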
Anyone have some background on this?
There are many places where the model might be used that could count as high-risk scenarios and require lots of controls. Also, we have:
"GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs). Providers must notify the Commission if their model meets this criterion within 2 weeks. The provider may present arguments that, despite meeting the criteria, their model does not present systemic risks. The Commission may decide on its own, or via a qualified alert from the scientific panel of independent experts, that a model has high impact capabilities, rendering it systemic.
In addition to the four obligations above, providers of GPAI models with systemic risk must also:
- Perform model evaluations, including conducting and documenting adversarial testing to identify and mitigate systemic risk.
- Assess and mitigate possible systemic risks, including their sources.
- Track, document and report serious incidents and possible corrective measures to the AI Office and relevant national competent authorities without undue delay.
- Ensure an adequate level of cybersecurity protection."
They may not want to meet these requirements.

Is there a reason this number was chosen?
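For scale, a back-of-the-envelope check against that 10^25 line using the common ~6·N·D heuristic (N = parameters active per token, D = training tokens; 52B and ~7T are the figures I recall from the paper, so treat the result as rough):

    def train_flops(active_params: float, tokens: float) -> float:
        """Rule-of-thumb training compute: ~6 FLOPs per (active) parameter per token."""
        return 6 * active_params * tokens

    flops = train_flops(active_params=52e9, tokens=7e12)
    print(f"~{flops:.1e} FLOPs vs. 1e25 threshold -> {'over' if flops > 1e25 else 'under'}")

By that estimate this model lands a factor of a few under the threshold, so the line seems drawn at frontier-scale training runs rather than at releases like this one.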
We've come so astonishingly far in like two years. I have no idea what AI will do in another year, and it's thrilling.
When splitting models layer by layer, users in r/LocalLLaMA have reported good results with interconnects as slow as PCIe 3.0 x4 (~4 GB/s). For tensor parallelism, the interconnect requirements are higher, but the upside can be speed that scales with the number of GPUs the model is split across (whereas layer-by-layer operates like a pipeline, so it isn't necessarily faster than what a single GPU can provide, even when splitting across 8 GPUs).
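The reason such a thin link is fine for the layer-by-layer case: only the per-token hidden state crosses each split, not weights. A sketch with assumed numbers (the hidden size and throughput here are illustrative, not this model's exact specs):

    def split_traffic_mb_s(hidden_size: int, bytes_per_act: int, tokens_per_s: float) -> float:
        """Activations crossing one pipeline split during decoding, in MB/s."""
        return hidden_size * bytes_per_act * tokens_per_s / 1e6

    # Assumed values: ~6400 hidden size, FP16 activations, 20 tok/s.
    print(split_traffic_mb_s(hidden_size=6400, bytes_per_act=2, tokens_per_s=20))  # ~0.26 MB/s

That's nothing against 4 GB/s. Tensor parallelism instead exchanges partial results at every layer across all GPUs, which is where the interconnect pressure comes from.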
Once you have something with 192 GB it gets interesting. You could probably fit 7 experts at FP8 per GPU. At FP16 it would probably only fit 3 per card, requiring 9 again.
I'd say they missed the sweet spot a bit for the current memory layout of cards. With a slightly smaller model, or one expert less, one should be able to run it on 8 H100s at FP8, or 2 B100s at FP8, or even on 4 B100s at FP16, if I calculated correctly.
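The rough arithmetic behind that (weights only, using the reported ~389B total; KV cache and activation memory come on top):

    def weights_gb(total_params_b: float, bytes_per_param: float) -> float:
        """Weights alone, in GB; KV cache and activations come on top."""
        return total_params_b * bytes_per_param

    total_b = 389  # reported total params (billions); treat as approximate
    for prec, bpp in [("FP8", 1), ("FP16", 2)]:
        print(f"{prec}: ~{weights_gb(total_b, bpp):.0f} GB "
              f"(vs 2x192 = 384 GB, 4x192 = 768 GB, 8x80 = 640 GB)")

Both the 2x192 GB FP8 and 4x192 GB FP16 configurations miss by only about 10 GB of weights, hence the "one expert less" remark.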
They're actually a best case for CPU inference vs dense models. I usually run DeepSeek 2.5 quanted to q8, but if this model works well I'll probably switch to it once support hits llama.cpp.
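The reason MoE suits CPUs: decode speed is roughly bounded by memory bandwidth over the active weights, not the total. A crude sketch (the 200 GB/s bandwidth is an assumed example, not a measurement):

    def est_tok_per_s(active_params_b: float, bytes_per_param: float, mem_bw_gb_s: float) -> float:
        """Crude upper bound on decode speed: each active weight read once per token."""
        return mem_bw_gb_s / (active_params_b * bytes_per_param)

    bw = 200  # GB/s, assumed figure for a multi-channel workstation/server board
    print(f"MoE, 52B active @ q8: ~{est_tok_per_s(52, 1, bw):.1f} tok/s")
    print(f"Dense, 405B @ q8    : ~{est_tok_per_s(405, 1, bw):.1f} tok/s")

Same RAM footprint either way, but per-token reads scale with active parameters, which is why ~52B active can be tolerable on CPU where 405B dense is not.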