These jumpers or shortcuts do create connections between the relevant new concepts in the model. But because they connect those concepts directly instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become de-emphasized, leading to forgetting of previously held associations.
Because of this, full fine-tuning generally produces better results than LoRA, especially when forgetting of existing training would be detrimental.
Or, to further oversimplify the issue in SE terms: LoRA == monkeypatching. (Is this a kind of intruder dimension?)
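To make the analogy a bit more concrete, here is a minimal PyTorch sketch of what the "patch" actually is for a single linear layer. The class name, rank, alpha, and layer size are made-up illustrative values, not anything from the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a low-rank 'shortcut' patched on top of it."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # original weights stay untouched
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(0.01 * torch.randn(r, d_in))  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))         # up-projection, starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # Full fine-tuning would adjust base.weight itself; LoRA only learns the
        # rank-r correction B @ A that gets added on top (the "monkeypatch").
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # only A and B train: 2 * 8 * 4096 parameters
```

The original 4096 x 4096 weight never changes; everything new has to squeeze through the rank-8 bottleneck, which is the "shortcut" being described above.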
Also, LoRA-tuning an extensively tuned model occasionally provokes full-on delusional "insanity" or gibberish seizures.
I have had really good luck, though, using a highly tuned model as the training basis for a LoRA and then applying that LoRA adapter to the base version of that model. I'm not sure why that seems to work better than training the same LoRA directly on the base model.
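For anyone who wants to try that recipe, it looks roughly like this with Hugging Face PEFT. The model names are placeholders and the hyperparameters are arbitrary; this is just the shape of the workflow, not what the parent commenter actually ran:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

FINETUNED = "org/heavily-finetuned-model"   # placeholder for the "highly tuned" model
BASE = "org/base-model"                     # placeholder for its base version

# 1) Train the LoRA on top of the heavily fine-tuned model.
model = AutoModelForCausalLM.from_pretrained(FINETUNED)
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_cfg)
# ... run your usual training loop on `model` here ...
model.save_pretrained("my-lora-adapter")

# 2) Apply the resulting adapter to the *base* model instead.
base = AutoModelForCausalLM.from_pretrained(BASE)
patched = PeftModel.from_pretrained(base, "my-lora-adapter")
```

This only works because the fine-tuned and base checkpoints share the same architecture and module names, so the adapter's target modules line up on both.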
But no, it's another "AI model discussion." Y'all need to start picking names for things that don't collide with others'. "Rapid Unique Sentence Training", for preloading language models with non-orthogonal sentences. "Phillips-Young Training Hyper-Orthogonal Networks", using the work of Phillips and Young to restructure orthogonal networks to be hyper-dense.
You're thinking of LoRa radio, from "Long Range." There's one of you in every LoRA comment section; I have a hard time believing it's an actual mistake made in good faith anymore.
This sounds interesting, but I can't see that they do much with this result. Are they saving it for a follow-up paper? If their whole paper is about a big problem with LoRA and they then find what looks like an easy solution to that problem, I would think that would warrant more than a paragraph just before the conclusion.
It would also have been interesting if they had included the DoRA method; they reference it briefly, and that paper claims DoRA's learning behavior resembles that of full fine-tuning (rough sketch of the idea below).
But perhaps this paper is deliberately focused on LoRA behavior, and a separate paper comparing the various improvements would be better.
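For anyone who hasn't read the DoRA paper: the gist is to decompose the pretrained weight into a magnitude and a direction, let the low-rank update act only on the direction, and learn the magnitude separately. Here's a toy sketch following the paper's W' = m * V / ||V||_c formulation; the shapes and the column-wise norm axis are my reading of that equation, not code from either paper:

```python
import torch
from torch import nn

d, k, r = 6, 4, 2                               # toy sizes: out dim, in dim, LoRA rank

W0 = torch.randn(d, k)                          # frozen pretrained weight, W0 in R^{d x k}
A = nn.Parameter(0.01 * torch.randn(r, k))      # LoRA down-projection
B = nn.Parameter(torch.zeros(d, r))             # LoRA up-projection, starts at zero
m = nn.Parameter(W0.norm(dim=0, keepdim=True))  # magnitude vector, initialized to ||W0||_c

def dora_weight():
    V = W0 + B @ A                              # only the direction gets the low-rank update
    return m * V / V.norm(dim=0, keepdim=True)  # W' = m * V / ||V||_c (column-wise norm)

x = torch.randn(k)
y = dora_weight() @ x                           # forward pass with the recomposed weight
```

At initialization B is zero and m equals ||W0||_c, so W' starts out exactly equal to W0, just like plain LoRA; the difference is that the magnitude and direction are then trained as separate degrees of freedom.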