There are a lot of things that don't show up in Elo scores. For one, they won't reflect that you can't prompt women's faces in Flux. We can only speculate why.
I feel like AI should just be treated as fair use as long as it's not 100% blatantly a literal clone of the original work.
Ideogram and Flux both have their own broad set of limitations that are non-technical and unpublished. IMO they are not really motivated by legal concerns, other than the lack of transparency itself.
So maybe the real issue is transparency, and the hazy legal climate means there is none. You can't go anywhere and see a detailed list of the dataset collection and captioning decisions behind proprietary models. The Open Model Initiative, trying to make a model, did publish their opinions, and they're not getting sued anytime soon. However, their opinions are an endless source of conflict.
It is actually making it harder to use the technology to represent women characters, which is so ironic. That said, I could just lEaRn tO dRaW or pAy aN aRtIsT, right? The discourse around this is so shitty.
"our most advanced and efficient model yet"
"a significant step forward in our mission to empower creators"
I get it, you can't sell things if you don't market them, and you can't make a living making things if you don't sell them, but it's exhausting.
What does "state of the art" mean? That it's using the latest "cutting edge" model technology?
When Apple releases a new iPhone Pro Max, it's "state of the art". When they release a new iPhone SE, there's an argument to be made that it's not, because it uses two-year-old chips. But what would it even mean for BFL to release a model that wasn't "state of the art"?
> our most advanced and efficient model yet
Yes, likewise, this is how technology companies work. They release something and then the next thing they release is more advanced.
> a significant step forward in our mission to empower creators
Going from 12 seconds to 4 seconds is a significant speed boost, but does it move the needle on their mission to empower creators? These are their words, not mine. It's a technical achievement and impressive incremental progress, but are there users out there who are more empowered by this? Significantly more empowered!?
Did you miss the first Flux release? Black Forest Labs aren't screwing around. The team consists of many of the _actual_ originators of Stable Diffusion's research (which was effectively co-opted by Emad Mostaque, who is likely a sociopath).
That's not what "state of the art" means, and if it did it would still be hollow marketing jargon, because there are specific and meaningful ways to say that FLUX1.1 [pro] outperforms all competitors (and they do say so, later in the press release)
Your confusion about what "state of the art" means is exactly why marketers still use the phrase even though it has been overused and worn out since at least the 1980s. State of the art means something is "new", that it is the "latest development", and that it incorporates "cutting edge" technology. The implication is that new is better, and that the "state of the art" is an improvement over what came before. (And to be clear, that's often true! Including in this case!) But that's not what the phrase actually means; it just means that something is new. And every press release is about something new.
FLUX1.1 [pro] would be state of the art even if it was worse than the previous version. Stable Diffusion 2.0 was state of the art when it was released.
But you keep pretending that closed-source AI is a sustainable comparison.
That’s why I said I run Flux Dev.
- Take your morning to the next level!
Some comparisons against DALL-E 3.
https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
> ... photo with the text "FLUX 1.1 [Pro]", ..., must say "1.1", ...
...And of course, it does not.
I managed to get it running on an old computer with a 2060 Super, taking ~1.5 minutes per image gen. People are generating on a 1080.
It's about 6 lines of Python.
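For anyone curious, a minimal sketch of what that looks like with the diffusers library, assuming the schnell checkpoint (the CPU-offload call is what lets it squeeze onto a 2060 Super's 8 GB):

    import torch
    from diffusers import FluxPipeline

    # FLUX.1 schnell is the fast, openly licensed checkpoint
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
    pipe.enable_sequential_cpu_offload()  # fits on ~8 GB cards, at a speed cost
    image = pipe("a lighthouse at dusk, film photo",
                 guidance_scale=0.0, num_inference_steps=4).images[0]
    image.save("out.png")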
For goodness' sake, the Met in New York has a massive trove of open, CC0-licensed art. Dear BFL, please ease up a bit on this and add some art-art to your models; they will be better as a result.
I actually use Flux to generate images for its prompt adherence, then pull the result in as a canny/depth ControlNet with more established models like RealVis, unstableXL, etc.
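Roughly this kind of pipeline, sketched with diffusers and OpenCV; the SDXL ControlNet and RealVis checkpoint names here are the public Hugging Face ones, so swap in whatever you actually use:

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    # Extract a Canny edge map from the Flux render
    flux_image = np.array(Image.open("flux_output.png").convert("RGB"))
    edges = cv2.Canny(flux_image, 100, 200)
    canny = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # Regenerate with an SDXL checkpoint, guided by the edge map
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "SG161222/RealVisXL_V4.0", controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")
    pipe("portrait photo, studio lighting", image=canny).images[0].save("restyled.png")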
I suspect the answer to this will be LoRAs. Two examples that stick out are:
- Flux Tarot v1 [0]
- Flux Amateur Photography [1]
Both of these do a great job of combining all the benefits of Flux with custom styles that seem to work quite well.
[0] https://huggingface.co/multimodalart/flux-tarot-v1
[1] https://civitai.com/models/652699?modelVersionId=756149
Here are a few I've recently trained: https://civitai.com/user/dvyio
A reasonable number of training images (50 or so), then around 2,000 steps of training for a new style.
Many of them work well with Flux, particularly if they're illustration-based. Some don't seem to work at all, so I didn't upload those!
I have a MacBook Air so I train using the various API providers.
For training a style, I use Replicate: https://replicate.com/ostris/flux-dev-lora-trainer/train
For training a concept/person, I use fal: https://fal.ai/models/fal-ai/flux-lora-fast-training
With fal, you can train a concept in around 2 minutes and only pay $2. Incredibly cheap. (You could also use it for training a style if you wanted to. I just found I seem to get slightly better results using Replicate's trainer for a style.)
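You can also kick off the Replicate trainer from its Python client instead of the web form; a rough sketch, where the version hash, zip URL, and destination are placeholders you'd fill in from the trainer's page:

    import replicate

    training = replicate.trainings.create(
        # copy the current version hash from the trainer's page
        version="ostris/flux-dev-lora-trainer:<version-hash>",
        input={
            "input_images": "https://example.com/style-images.zip",  # ~50 images
            "trigger_word": "MYSTYLE",
            "steps": 2000,
        },
        destination="your-username/flux-style-lora",
    )
    print(training.status)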
If you do end up training one on yourself with fal, it should ultimately take you here (https://fal.ai/models/fal-ai/flux-lora) with your new LoRA pre-filled.
Then:
1. Find a style LoRA you like and copy the URL of its .safetensors file (on Civitai, go to any style and copy the URL from the download button; you can also find LoRAs on Hugging Face)
2. Click 'Add item' and paste that .safetensors URL as the second LoRA, remembering to include in your prompt the trigger word for yourself (you set this when you start the training) and the trigger word for the style (it tells you on the Civitai page)
3. Play with the strength for the LoRAs if you want it to look more like you or more like the style, etc.
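The same stacking works from fal's Python client if you'd rather script it; a sketch assuming the fal-ai/flux-lora endpoint, with placeholder LoRA URLs and trigger words:

    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux-lora",
        arguments={
            # both trigger words go in the prompt, as described above
            "prompt": "photo of TOK person, SNLSTYLE title card style",
            "loras": [
                {"path": "https://your-host/person-lora.safetensors", "scale": 1.0},
                {"path": "https://your-host/style-lora.safetensors", "scale": 0.8},
            ],
        },
    )
    print(result["images"][0]["url"])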
-----
If you want a style LoRA to try, this one of SNL title cards I trained actually makes some great photographic images. https://civitai.com/models/773477/flux-lora-snl-portrait (the download link would be https://civitai.com/api/download/models/865105?type=Model&fo...)
-----
There's a lot of trial and error to get the best combinations. Have fun!
I want to make a LoRA of Prokudin-Gorskii photographs from the Library of Congress collection, and they have thousands of photos, so I'm curious whether auto-generating the captions for the images works well at that scale.
I have a preset in there that I sometimes use to generate captions using GPT-4o.
If you use Replicate, they'll also generate captions for you automatically if you wish. (I think they use LLaVA behind the scenes.) I typically use this just because it's easier, and seems to work well enough.
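If you'd rather caption locally (say, for thousands of Library of Congress scans), an open captioner works too; a sketch with BLIP via transformers, writing one .txt per image in the layout most LoRA trainers expect:

    from pathlib import Path
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-large")

    # Caption every image in photos/ and save a sidecar .txt next to it
    for path in Path("photos").glob("*.jpg"):
        inputs = processor(Image.open(path).convert("RGB"), return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=40)
        caption = processor.decode(out[0], skip_special_tokens=True)
        path.with_suffix(".txt").write_text(caption)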
I have not used Runpod or airgpu, and I'm not affiliated with either.
Similar deal with Replicate: an A100 there is over $5/hr, whereas on Runpod it's $1.64/hr.
And if you use the "serverless" services, the pricing becomes even more astronomical; as you note, $1/minute is unreasonably expensive: that's over 20x the cost of renting 8xH100s on Runpod's "Secure Cloud" (and 8xH100s are extreme overkill for finetuning image generators: even 1xH100 would be sufficient, meaning it's actually 160x markup).
Here are a few in Degas style I made after training for 2,500 steps. I'd love to hear what you think of them. To my (untrained) eye, they seem a little too defined, perhaps?
A possible solution may be to incorporate artificial images in the training data. So, create an initial LoRA with the original Degas images and generate 500 images. From those generated images, pick the ones that most resemble Degas. Add those to the training set and train again. Repeat until (hopefully) it learns the correct style.
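In pseudocode, something like the loop below; load_images, train_lora, generate, and curate_by_hand are all hypothetical stand-ins for whatever trainer and eyeballing process you actually use:

    # hypothetical helpers: swap in your real trainer/pipeline/curation
    dataset = load_images("degas_originals/")
    for round in range(3):
        lora = train_lora(dataset, steps=2500)
        candidates = [generate(lora, "painting in the style of Degas")
                      for _ in range(500)]
        # keep only the generations that most resemble real Degas
        dataset += curate_by_hand(candidates)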
Whatever they're doing at Midjourney is still impressive. No training needed and a better result.
Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.
It feels like they just removed names from the datasets to make it worse at recreating famous people and artists.
But that real art still exists, and can still be found, so what exactly is the loss here?
And between 1995 and 2022, the amount of art produced surpasses the cumulative output of all other periods of human history.
I suspect the same goes for art styles. There's such huge variety that they'd really be better served by separate models.
When I ask for a man playing accordion, it’s usually a somewhat flawed piano accordion, but If I ask for a woman playing accordion, it’s usually a button accordion. I’ve also seen a few that are half-button, half-piano monstrosities.
Also, if I ask for “someone playing accordion”, it’s always a woman.
No one in the image space wants to admit it, but well over half of your user base wants to generate hardcore NSFW with your models and they mostly don’t care about any other capabilities.
Comparisons of a similar prompt using Midjourney 6.1.
Also, flux (schnell, dev) can be run on your local machine.
If you really want to use a paid service, Ideogram is probably the best one out there that balances quality with adherence. DALL-E 3 also has good adherence as well though the quality can sometimes be iffy, and it's very puritanical in terms of censorship.
I had similar issues trying to paint an "I cast non-magic missile" meme with a fantasy wizard using a missile launcher. No model out there (I've tried SD, SDXL, FLUX.1 dev, and now this FLUX1.1 pro) knows what a missile launcher looks like (neither as a generic term nor as any specific system), and none has a clue how one is held, so they all draw really weird contraptions.
And, sure, that's what LoRAs are for, if I can figure out how to train one for FLUX in a way that would actually produce something meaningful (my pitiful attempts at SDXL LoRA training were... less than stellar, and FLUX is quite different from everything else). Although that's probably not worth it just for making a meme picture...
Flux is amazing, but I find it requires a very literal description, which pushes the "creative work" back to the text itself. That can certainly be a good thing, just a bit less gratifying to non-visual types like myself. :)
I wonder, only somewhat jokingly, if one could make text generators which "imagine" detailed fantastical scenes, suitable for feeding to a text to image model.
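Only half-joking, but that's trivially doable today; a sketch with the OpenAI Python client, where the model name and prompt wording are just assumptions (any chatty LLM would work):

    from openai import OpenAI

    client = OpenAI()
    idea = "a wizard's workshop"
    scene = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Invent a detailed, fantastical scene based on "
                       f"'{idea}'. Describe it in one literal, visual "
                       "paragraph suitable as a text-to-image prompt.",
        }],
    ).choices[0].message.content
    print(scene)  # paste into Flux, or pipe it straight into your image pipeline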
This is miles ahead of most other image generation models available today.
edit: never mind, it's a macOS app
Ironically, I am afraid to type the website out and will keep it unknown here. My account could be suspended because of this. It had already reached -1 karma. It's better to keep my account alive.
I just tried this Flux1.1 pro page (prompt: "A sad Macintosh user who is upset because his computer can't play games") and was very impressed by the detail and "understanding" this model has.