https://myswamp.substack.com/p/improving-accessibility-using...
Maybe I'll redo it and add in 1.5 Flash-8B; it's so cheap it doesn't hurt to add it lol.
https://substack.com/profile/107132439-michael-barajas/note/...
> To make this model as useful as we can, we are doubling the 1.5 Flash-8B rate limits, meaning developers can send up to 4,000 requests per minute (RPM).
You can even compare the rate limits here: https://ai.google.dev/pricing
Most editors can easily support LLMs via a fill-in-the-middle (FIM) operation mode; see the sketch below.
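
A minimal sketch of what that looks like: the editor splits the buffer at the cursor and wraps the two halves in sentinel tokens before sending them to a code model. The <PRE>/<SUF>/<MID> sentinels here follow the Code Llama convention and are only an assumption for illustration; other FIM-capable models use different markers, so check the model's documentation.

    # Build a fill-in-the-middle prompt from the text around the cursor.
    # Sentinel tokens are the Code Llama style; adjust per model.
    def build_fim_prompt(before_cursor: str, after_cursor: str) -> str:
        """Assemble prefix and suffix into a single FIM prompt string."""
        return f"<PRE> {before_cursor} <SUF>{after_cursor} <MID>"

    prompt = build_fim_prompt(
        before_cursor="def add(a, b):\n    return ",
        after_cursor="\n\nprint(add(1, 2))\n",
    )
    print(prompt)  # the completion endpoint is asked to fill in the middle

The editor then inserts the model's completion at the cursor position, which is why FIM works so naturally for inline suggestions.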
But I do wonder: how well does Gemini 1.5 Pro / Flash recall from the context window? For example, back when both ChatGPT and Claude offered an 8k context window, Claude was still far ahead at recalling what you'd said, while ChatGPT tended to forget tokens after a while, so you had to remind it.
As for the recall performance, I can't really speak from my own experience; you should try it yourself :) A quick needle-in-a-haystack check like the one below is an easy way to do that.
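
A rough sketch of such a check, assuming the google-generativeai Python package and an API key in an environment variable (both are setup assumptions, not something from this thread): bury one fact deep inside filler text and ask the model to retrieve it.

    # Rough needle-in-a-haystack recall check for Gemini 1.5 Flash.
    # Assumes: pip install google-generativeai, GEMINI_API_KEY set.
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Hide a "needle" fact inside a long wall of filler text.
    filler = "The quick brown fox jumps over the lazy dog. " * 2000
    needle = "The secret passphrase is 'violet-asteroid-42'."
    prompt = filler + needle + filler + "\n\nWhat is the secret passphrase?"

    response = model.generate_content(prompt)
    print(response.text)  # should mention 'violet-asteroid-42' if recall holds

Varying the filler length and where the needle sits (start, middle, end) gives a feel for how recall degrades across the context window.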