From Twitter/X:
Today we're premiering Meta Movie Gen: the most advanced media foundation models to date.
Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike.
More details and examples of what Movie Gen can do: https://go.fb.me/kx1nqm
Movie Gen models and capabilities

Movie Gen Video: A 30B parameter transformer model that can generate high-quality, high-definition images and videos from a single text prompt.
Movie Gen Audio: A 13B parameter transformer model that can take a video input along with optional text prompts for controllability to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound — delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.
Precise video editing: Using a generated or existing video and accompanying text instructions as input, it can perform localized edits such as adding, removing or replacing elements, or global changes like background or style changes.
Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.
We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.
It being 30B gives me hope.
[0] https://ai.meta.com/blog/generative-ai-text-images-cm3leon/
Considering that Facebook/Meta releases blog posts titled "Open Source AI Is the Path Forward" but then refuses to actually release any Open Source AI, I'm guessing the answer is a hard "No".
They might release it under usage restrictions though, like they did with Llama, although probably only the smaller versions, to limit the output quality.
Just because pytorch is Open Source doesn't mean everything Meta AI releases is Open Source, not sure how that would make sense.
The dataset for Llama 3 is described only as "A new mix of publicly available online data," which is not exactly open or even very descriptive. That could be anything.
And no, the training code for Llama 3 isn't available; the response from a Meta employee was: "However, at the moment-we haven't open sourced the pre-training scripts".
Here is the Llama source code, you can start training more epochs with it today if you like: https://github.com/meta-llama/llama3/blob/main/llama/model.p...
It's rumored Llama 3 used FineWeb, but you're right that they at least haven't been transparent about that: https://huggingface.co/datasets/HuggingFaceFW/fineweb
For models I prefer the term "open weight", but to assert they haven't open sourced models at all is plainly incorrect.
Correct me if I'm wrong, but that's the code for doing inference?
A Meta employee told me just the other day: "However, at the moment-we haven't open sourced the pre-training scripts". Can't imagine they would be wrong about it?
https://github.com/meta-llama/llama-recipes/issues/693
> For models I prefer the term "open weight"
Personally, "Open" implies that I can download them without signing a license agreement with Meta, and that I can do whatever I want with them. But I understand the community seems to think otherwise, especially considering the messaging Meta has around Llama and how little the community is pushing back on it.
So Meta doesn't allow downloading the Llama weights without accepting the terms from them, doesn't allow unrestricted usage of those weights, doesn't share the training scripts nor the training data for creating the model.
The only thing that could be considered "open" would be that I can download the weights after signing the terms. Personally I wouldn't make the case that that's "open" as much as "possible to download", but again, I understand others understand it differently.
Apologies - you are right and I was wrong. I would edit my comments but they're past the edit window, will leave a comment accordingly.
In fact, the more realistic the deepfakes become, the less harmful actual revenge porn and stolen sex videos can be, because of plausible deniability.
This "good deepfakes will prevent harm because of plausible deniability" is absurd copium, and utterly divorced from reality.
Speak to victims some time. You are not helping them.
That's something that can be fixed in a future release or you can fix it right now with some filters in post in your pipeline.
This has been the default excuse for the last 5+ years. I won't hold my breath.
AI in general is from 1950, or more generally from whenever the abacus was invented. This very website runs on AI, and always has. I would implore us to speak more precisely if we're criticizing stuff: "LLMs" came around (in force) in 2023, both for coherent language use (ChatGPT 3.5) and image use (DALL-E 2). The predecessors were an order of magnitude less capable, and going back 5 years puts us back in the era of "chatbots", aka dumb toys that can barely string together a Reddit comment on /r/subredditsimulator.
> AI so far has given us ability to mass produce shit content of no use to anybody
Good AI goes largely undetected, for the simple reason that it closely matches the distribution of non-AI content.
Controversial aside: This is same bias that results in non-passing trans people being representative of the whole. Passing trans folk simply blend in.
Today we have these realistic videos that are still in the uncanny valley. That's insane progress in the span of a year. Who knows what it will be like in another year.
Let'em cook.
2 years ago we had decent video generation for clips
7 months ago we got Sora https://news.ycombinator.com/item?id=39393252 (still silence since then)
With these things, like DALL-E 1 and GPT-3, the original release of the game changer often comes ca. 2 years before people can actually use it. I think that's what we're looking at.
I.e. it's not as fast as you think.
And isn’t this model open source…? So we get access to it, like, momentarily? Or did I miss something?
The only interesting questions now have nothing to do with capability but with economics and raw resources.
In a few years, or less, clearly we'll be able to take our favorite books and watch unabridged, word-for-word copies. The quality, acting, and cinematography will rival the biggest budget Hollywood films. The "special effects" won't look remotely CG like all of the newest Disney/Marvel movies -- unless you want them to. If publishers put up some sort of legal firewall to prevent it, their authors, characters, and stories will all be forgotten.
And if we can spend $100 of compute and get something I described above, why wouldn't Disney et al throw $500m at something to get even more out of it, and charge everyone $50? Or maybe we'll all just be zoo animals soon (Or the zoo animals will have neuralink implants and human level intelligence, then what?)
That would be a boring movie.
I don't think so at all. You're thinking a movie is just the end result that we watch in theaters. Good directing is not a text prompt, good editing is not a text prompt, good acting is not a text prompt. What you'll see in a few years is more ads. Lots of ads. People who make movies aren't salivating at this stuff but advertising agencies are because it's just bullshit content meant to distract and be replaced by more distractions.
But at the same time, while it's true that the end result is far more than simply making good images, LLMs are weird interns at everything, with all the negatives that implies as well as the positives. They're not likely to produce genuinely award-winning content all by themselves, even if they do better when you ask them for something "award winning". Still, it's certainly conceivable that we'll see AI do all these things competently at some point.
I'm also expecting, before 2030, that video game pipelines will be replaced entirely. No more polygons and textures, not as we understand the concepts now, just directly rendering any style you want, perfectly, on top of whatever the gameplay logic provided.
I might even get that photorealistic re-imagining of Marathon 2 that I've been wanting since 1997 or so.
This isn't an intentional "feature" of these models; rather, it's kind of an inherent part of how such models work — they learn associations between tokens and structural details of images. Artists' names are tokens like any other, and artists' styles are structural details like any other.
So, unless the architecture and training of this model are very unusual, it's gonna at least be able to give you something that looks like e.g. a "pencil illustration."
There were movies with horrible VFX that still sold perfectly well at the time.
But then, a lot of people have financial reasons to ignore the problem. Which is too bad, because it's hindering the creation of stuff that's actually useful.
These models do seem like they could be great photorealism/stylization shaders. And they are also pretty good at stuff like realistic explosions, fluid renders etc. That stuff is really hard with CG.
I won't argue whether text-to-speech qualifies as AI, but I agree they must be making bank.
Might also be an AI voice-changer (i.e. speech2speech) model.
These models are most well-known for being used to create "if [famous singer] performed [famous song not by them]" covers — you sing the song yourself, then run your recording through the model to convert the recording into an equivalent performance in the singer's voice; and then you composite that onto a vocal-less version of the track.
But you can just as well use such a model to have overseas workers read a script, and then convert that recording into an "equivalent performance" in a fluent English speaker's voice.
Such models just slip up when they hit input phonemes they can't quite understand the meaning of.
(If you were setting this up for your own personal use, you could fine-tune the speech2speech model like a translation model, so it understands how your specific accent should map to the target. [I.e., take a bunch of known sample outputs, and create paired inputs by recording your own performances of them.] This wouldn't be tenable for a big low-cost operation, of course, as the recordings would come from temp workers all over the world with high churn.)
But the people who position themselves to profit from the energy consumption of the hardware will profit from all of it: the LLMs, the image generators, the video generators, etc. See discussion yesterday: https://news.ycombinator.com/item?id=41733311
Imagine the number of worthless images being generated as people try to find one they like. Slop content creators iterate on a prompt, or maybe create hundreds of video clips hoping to find one that gets views. This is a compute-intensive process that consumes an enormous amount of energy.
The market for chips will fragment, margins will shrink. It's just matrix multiplication and the user interface is PyTorch or similar. Nvidia will keep some of its business, Google's TPUs will capture some, other players like Tenstorrent (https://tenstorrent.com/hardware/grayskull) and Groq and Cerebras will capture some, etc.
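To illustrate the "it's just matrix multiplication" point, here is a minimal sketch in plain Python of the one primitive all of these chips compete on (a naive stand-in for what PyTorch dispatches to cuBLAS, TPU matrix units, or any other vendor's backend; names and shapes here are purely illustrative):

```python
def matmul(a, b):
    # Naive matrix multiply: the core operation behind every GPU,
    # TPU, and AI-accelerator pitch. Vendors compete on doing this
    # faster and with less energy, not on doing something different.
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Because the workload reduces to this single primitive behind a common interface (PyTorch or similar picks the backend), switching hardware vendors becomes largely a question of price and supply, which is why margins are likely to shrink as the market fragments.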
But at the root of it all is the electricity demand. That's where the money will be made. Data centers need baseload power, preferably clean baseload power.
Unless hydro is available, the only clean baseload power source is nuclear fission. As we emerge from the Fukushima bear market where many uranium mining companies went out of business, the bottleneck is the fuel: uranium.
That said, this channel has been producing videos since well before ChatGPT 3.5/4, so at the very least they probably started with human-written scripts.
The image had all the telltale signs of being AI generated (too much detail, the lights & shadows were the wrong scale, the focus of the lens was odd for the kind of photo, etc). I checked that other group, and sure enough, they claim to be about sharing "miniature dioramas" but all they share is AI-generated crap.
And in the original group, which I'm a member of and is full of people who actually create dioramas -- let's say they are "subject matter experts" -- nobody suspected anything! To them, who are unfamiliar with AI art, the photo was of a real hand-made diorama.
At this point, looking at a big tech social media feed, I would expect that everything is, or at least could be, gen AI content.
I could see it being pretty shocking if I hadn't, but I honestly can't imagine how I'd miss that.
> I could see it being pretty shocking if I hadn't, but I honestly can't imagine how I'd miss that.
The point of the video wasn't to count correctly, but to see the gorilla
Essentially, send me a video of something I care about and I will only look for that thing. Most people are not detectives, and even most would-be detectives aren’t yet experts.
Who even needs AI generated videos when you can just act out absurdity and pretend it's real?
There's a morbid path from the grainy Iraq war and earlier shaky footage, through IS propaganda (which at the time was basically the most intense combat footage ever released), to the Ukraine war, which took it to its morbid conclusion: endless drone video deaths and edited clips 30+ minutes long covering day-long engagements and defenses.
And yes, to answer your belief that there is none: there is loads of "cinematic body cam footage" out there now.
Fake images play into that, but they don't need to be AI generated for that to be true, it's been true forever.
The only regret I have about that is losing video as a form of evidence. CCTV footage and the like are a valuable tool for solving crimes. That’s going to be out the window soon.
I have yet to find examples of this
* https://www.reddit.com/gallery/1fvs0e1
* https://old.reddit.com/r/StableDiffusion/comments/1fak0jl/fi...
- The monkey in hotspring video, if not for its weird beard...
- The koala video I would have mistaken for hollywood-quality studio CGI (although I would know it's not real because koalas don't surf... do they?)
- The pumpkin video if played at 1/4 resolution and 2x speed
- The dog-at-Versailles video edit
If the videos are that good, I'm sure I already can't distinguish between real photos and the best AI images. For example, ThisPersonDoesNotExist isn't even very recent, but I wouldn't be able to tell whether most of its output is real or not, although it's limited to a certain style of close-up portrait photography.
Not to take away from your point but it's more limited than one might think from this phrase. As an exercise, open that page and scroll so the full image is on your screen, then hover your mouse cursor within the iris of one of the eyes, refresh and scroll again. (Edit: I just noticed there's a delayed refresh button on the page, so one can click that and then move their mouse over the eye to skip a full page refresh.) I've yet to see a case where my mouse cursor is not still in line with the iris of the next not-person.
There's an entire pattern of reels that are basically just ripped-off-content with enough added noise to (I presume) avoid content detection filters. Then the comments have links to scam sites (but are labelled as "the IMDB page for this content").
Always important to bear in mind that the examples they show are likely the best examples they were able to produce.
Many times over the past few years a new AI release has "wowed" me, but none of them resulted in any sudden overnight changes to the world as we know it.
VFX artists: You can sleep well tonight, just keep an eye on things!
Everyone should sleep well tonight, but only because we’ll look out for each other and fight for just distribution of resources, not because the current job market is stable. IMO :)
Here's an example thread: https://www.reddit.com/r/vfx/comments/1e4zdj7/in_the_climate...
I am not trying to be negative, but it is the reality that ML/LLMs have eliminated entire industries. Medical transcription, for example, is essentially gone.
What is the required amount of labor humans should have to do?
that’s quite a productive thing. art has tremendous value to society.
why don’t we automate the washing machine more instead of automating the artist?
Automating more in the real world is much (much) harder than grabbing the low-hanging fruits in the digital world.
As someone who's worked in the industry previously and is still quite involved, very few studios are using it because of the lack of direction it can take and the copyright quagmire. There are lots of uses of ML in VFX, but those aren't necessarily GenAI.
GenAI hasn’t had an effect on the industry yet. It’s unlikely it will for a while longer. Bad business moves from clients are the bigger drain, including not negotiating with unions and a marked decline in streaming to cover lost profits.
Pumpkin patch - Not sitting on the grass, not wearing a scarf, no rows of pumpkins the way most people would imagine.
Sloth - that's not really a tropical drink, and we can't see enough of the background to call it a "tropical world".
Fire spinner - not wearing a green cloth around his waist
Ghost - Not facing the mirror, obviously not reflected the way the prompter intended. No old beams, no cloth-covered furniture, not what I would call "cool and natural light". This is probably the most impressively realistic-looking example, but it almost certainly doesn't come close to matching what the prompter was imagining.
Monkey - boat doesn't have a rudder, no trees or lush greenery
Science lab - no rainbow wallpaper
This seems like nitpicking, and again I can't overstate how unbelievable the technology is, but the process of making any kind of video or movie involves translating a very specific vision from your brain to reality. I can't think of many applications where "anything that looks good and vaguely matches the assignment" is the goal. I guess stock footage videographers should be concerned.
This all matches my experience using any kind of AI tool. Once I get past my astonishment at the quality of the results, I find it's almost always impossible to get the output I'm looking for. The details matter, and in most cases they are the only thing that matters.
But better training can't fix the more general problem that I'm describing. Perfect-looking videos aren't useful if you can't get the model to follow your instructions.
But this limits promotion where actors do interviews and sell the movie to the public. It also limits an actor doing something crazy that tanks a movie like a tweet.
But for plenty of other masters - think Cassavetes, Mike Leigh, even PTA - the actor's outstanding talent and instincts bring something to the script and vision that is outside of their prescriptive powers. Their focus is essentially setting up a framework for magic to happen inside of.
I avoid updating to each new Firefox version, because from time to time they break some features important for me.
I use Firefox on my personal device: the website worked fine though took an extra "hiccup" to load compared to Edge (version 131.0 on Windows).
Are you ready to become a penguin in all of your posts to maximise aquatic engagement? I am.
The clothing changes also have pretty rough edges, or just look like they're floating over the original model. The 3D glasses one looked atrocious. The lighting changes are also pretty lacking.
Are you still impressed though?
It's only going to get better, faster, cheaper, easier.[a]
Sooner than anyone could have expected, we'll be able to ask the machines: "Turn this book into a two-hour movie with the likeness of [your favorite actor/actress] in the lead role."
Sooner than anyone could have expected, we'll be able to have immersive VR experiences that are crafted to each person.
Sooner than anyone could have expected, we won't be able to identify deepfakes anymore.
We sure live in interesting times!
---
[a] With apologies to Daft Punk: https://www.youtube.com/watch?v=gAjR4_CbPpQ
I don't think you get "The Green Mile" from something like this.
Local art, local actors, local animations telling stories about local culture. A netflix for every city, even neighborhoods. That's going to be crazy fun.
There is so much that has to be conveyed in making a film, if you want it to say something particular.
And? What's the problem with that? You seem to be locked in a "prompt to get a movie" mindset.
...yet
The limitations of reality seem to have a positive effect on the overall process of film making, for whatever reason. I expect generative AI film will be at least as bad. Gonna be hard to get an entire well-crafted film out of them.
The same AI can still raise the minimum bar for quality. Or replace YouTubers and similar while they're still learning how to be good in the first place.
No idea where we are in this whole process yet, but it's a continuum not a boolean.
Pinnacle is not the word I'd use. Race to the bottom, least possible effort, plausibly deniable quality, gross exploitation, capitalist bottom line - those are all things I'd use to describe current "art" awards like Grammy, Oscars, Cannes, etc.
The media industry is run by exploiting artists for licensing rights. The middle men and publishers add absolutely nothing to the mix. Google or Spotify or platforms arguably add value by surfacing, searching, categorizing, and so on, but not anywhere near the level of revenue capture they rationalize as their due.
When anyone and everyone can produce a film series or set of stories or song or artistic image that matches their inner artistic vision, and they're given the tools to do so without restriction or being beholden to anyone, then we're going to see high quality art and media that couldn't possibly be made in the grotesquely commercial environment we have now. These tools are as raw and rough and bad performing as they ever will be, and are only going to get better.
Shared universes of prompts and storylines and media styles and things that bring generative art and storytelling together to allow coherent social sharing and interactive media will be a thing. Kids in 10 years will be able to click and create their own cartoons and stories. Parents will be able to engage by establishing cultural parameters and maybe sneak in educational, ethical, and moral content designed around what they think is important. Artists are going to be able to produce every form of digital media and tune and tweak their vision using sophisticated tools and processes, and they're not going to be limited by budgets, politics, studio constraints, State Department limitations, wink/nod geopolitical agreements with nation states, and so on.
Art's going to get weird, and censorship will be nigh on impossible. People will create a lot of garbage, a lot of spam, low effort gifs and video memes, but more artists will be empowered than ever before, and I'm here for it.
Any accolades, be they from professional groups, people's-choice awards, Rotten Tomatoes, or IMDb ratings.
> Race to the bottom, least possible effort, plausibly deniable quality, gross exploitation, capitalist bottom line - those are all things I'd use to describe current "art" awards like Grammy, Oscars, Cannes, etc.
I find them ridiculous in many ways, but no, one thing they're definitely not is a race to the bottom.
If you want to see what a race to the bottom looks like, The Room has a reputation for being generally terrible, "bad movie nights" are a thing, and Mystery Science Theater 3000's schtick is to poke fun at bad movies.
> The media industry is run by exploiting artists for licensing rights.
Yes
> The middle men and publishers add absolutely nothing to the mix. Google or Spotify or platforms arguably add value by surfacing, searching, categorizing, and so on, but not anywhere near the level of revenue capture they rationalize as their due.
I disagree. I think that every tech since a medium became subject to mass reproduction (different for video and audio, as early films were famously silent) has pushed things from a position close to egalitarianism towards a winner-takes-all. This includes Google: already-popular things become more popular, because Google knows you're more likely to engage with the more popular thing than the less popular thing. This dynamic also means that while anyone will be free to make their own personal vision (although most of us will have all the artistic talent of an inexperienced Tommy Wiseau), almost everyone will still only watch a handful of them.
> Art's going to get weird, and censorship will be nigh on impossible.
Bad news there, I'm afraid. AI you can run on your personal device is quite capable of being used by the state to drive censorship at the level of your screen or your headphones.
Nearly all the movies that go to theaters aren't "meaningful art". Not only that but what's meaningful to you isn't necessarily what's meaningful to others.
If someone can get their own personal "Godzilla VS The Iron Giant" crossover made into a feature-length film it will be meaningful to them.
No but what they are is expensive, flashy, impressive productions which is the only reason people are comfortable paying upwards of $25 each to see them. And there's no way in the world that an AI movie is going to come anywhere close to the production quality of Godzilla vs Kong.
And like, yeah, their example videos at the posted link are impressive. How many attempts did those take? Are they going to be able to maintain continuity of a character's appearance from one shot to the next to form a coherent visual structure? How long can these shots be before the AI starts tripping over itself and forgetting how arms work?
We will be the old ones going "back in my day, you had to actually shoot movies on a camera! And background objects had perfect continuity!" And they will roll their eyes at us and retort that nobody pays attention to background objects anyway.
But I have faith that people will notice the difference. The current generation may not care about autotune, but that doesn't mean another generation won't. People rediscover differences and decide what matters to them.
When superhero movies were new, almost everyone loved them. I was entranced. After being saturated with them... the audience dropped off. We started being dissatisfied with witty one-liners and meaningless action. Can you still sell a superhero movie? Sure. Like all action movies, they internationalize well. But the domestic audiences are declining. It makes me think of Westerns. At one time, they were a Hollywood staple. Now, not so much. Yes, they still make them, and a good one will do fine, but a mediocre one... maybe not.
The previous generation's concern about autotune was also flatly wrong. Autotune was used by a few prominent artists then, and is more widely used now, as an aesthetic choice, for the sound it creates, which is distinctly not natural singing: the effect was produced by running the autotune plugin at a much, much higher setting than was intended for regular use.
Tone correction occurs in basically every song production now, and you never hear it. Hell, newer tech can perform tone correction on the fly for live performances, and the actual singing being done on the stage can be swapped out on the fly with pre-recorded singing to let the performer rest, or even just lipsync the entire thing but still allow the performer to jump in when they want to and ad-lib or tweak delivery of certain parts of songs.
The autotune controversy was just wrong from end to end. When audio engineers don't want you to hear them correcting vocals, you don't hear it. I'd be willing to buy another engineer being able to hear tone correction in music, but if a layman says they can, sorry, I assume that person's full of shit.
And maybe they won't have a problem with it, like you say, maybe that'll just be their "normal" but that seems so fucking sad to me.
For newer in-copyright works, public libraries commonly offer Libby:
https://company.overdrive.com/2023/01/25/public-libraries-le...
It gives anyone with a participating-system library card free electronic access to books and magazines. And it's unlikely that librarians themselves will be adding AI book-slop to the title selection.
To be clear, I'm not talking great literature. I'm talking Clifford the Big Red Dog type stuff.
That said I still have a number of problems with this assertion:
It will absolutely be down, in part, to economic necessity. Amazon's platform is already dealing with a glut of shitty AI books and the key way they get ahead in rankings is being cheaper than human-created alternatives, and they can be cheaper because having an AI slop something out is way less expensive and time consuming than someone writing/illustrating a kid's book.
Moreover, our economy runs on the notion that the easier something is to do, the more likely people are to do it at scale, and vetting your kids media is hard and annoying as a parent at the best of times: if you come home from working your second job and are ready to collapse, are you going to prepare a nutritious meal for your child and set them up with insightful, interesting media? No you're going to heat up chicken nuggets and put them in front of the iPad. That's not good, but like, what do you expect poor parents to do here? Invent more time in the day so they can better raise their child while they're in the societal fuckbarrel?
And yes, before it goes into that direction, yes this is all down to the choices of these parents, both to have children they don't really have the resources to raise (though recent changes to US law complicates that choice but that's a whole other can of worms) and them not taking the time to do it and all the rest, yes, all of these parents could and arguably should be making better choices. But ALSO, I do not see how it is a positive for our society to let people be fucked over like this constantly. What do we GAIN from this? As far as I can tell, the only people who gain anything from the exhausted-lower-classes-industrial-complex are the same rich assholes who gain from everything else being terrible, and I dunno, maybe they could just take one for the team? Maybe we build a society focused on helping people instead of giving the rich yet another leg up they don't need?
This is what I mean by "complicated factors of culture and habit." An iPad costs more than an assortment of paper books. Frozen chicken nuggets cost more than basic ingredients. But the iPad and nuggets are faster and more convenient. The kids-get-iPad-and-nuggets habit is popular with middle-income American families too, not just poor families where parents work two jobs. The economic explanation is too reductive.
I'm not trying to say that this is the "fault" of parents or of anyone in particular. When the iPad came out I doubt that Apple engineers or executives thought "now parents can spend less time engaging with children" or that parents thought of it as "a way to keep the kids quiet while I browse Pinterest" but here we are.
Is it too much to ask for a hint of caution with regard to our most vulnerable populations brains?
Circa 2020 a huge number of fed-up good-intentioned engineers and designers quit. It had no effect, at all.
To be clear: I am not saying that engineers need to be better at preventing this stuff. I am saying regulators need to demand that companies be careful, and study how this stuff is going to affect people, not just yeet it into the culture and see what happens.
There is a vast difference between a formulaic hollywood movie and some guy with a camera. If I say "Godzilla vs. The Iron Giant" what is the plot? Who is the good guy? Why does the conflict take place?
AI will come up with something. Will it be compelling even to the audience of one?
As a toy, maybe. As an artistic experience... not convinced.
You still aren't getting it. Movie directors aren't making these decisions either.
What they are doing, is listening to market focus groups and checking off boxes based on the data from that.
Market-focus-group-driven decisions for a movie are just as much of an "algorithm", if not more so, than when a literal computer makes the decision.
That's not art. It's the same as if a human ran an algorithm by hand and used that to make a movie.
If it were really all just market decisions, directors would have no influence. This is not remotely the case. Nor are they paid as though that were true.
I assure you, they don't do surveys on the punchiness and strategy used by foley artists; the slope and toe of the film stock chosen for cut scenes by the DP or that those cut scenes should be shot like cut scenes instead of dream sequences; the kind of cars they use; how energetic the explosions are; clothing selection and how the costumes change situationally or throughout the film; indescribably nuanced changes in the actor's delivery; what fonts go on the signs; which props they use in all of the sets and the strategies they use to weather things; what specific locations they shoot at within an area and which direction they point the camera, how the grading might change the mood and imply thematic connections, subtle symbolism used, the specifics of camera movements, focus, and depth of field, and then there's the deeeep world of lighting... All of those things and a million others are contributions from individual artists contributing their own art in one big collaborative project.
You're imagining "pls write film" but the case of being able to film something and then adjust and tweak it, easily change backdrops etc could lead to much higher polish on creations from smaller producers.
Would the green mile be any less hard hitting if the lights flickering were caused by an AI alteration to a scene? If the mouse was created purely by a machine?
The printing press led to publishing works being reachable by more people so we got tons of garbage but we also got those few individual geniuses that previously wouldn't have been able to get their works out.
I see similarities in indie video/PC games recently too. Once the tech got to the point that an individual or small group could create a game, we got tons of absolute garbage but also games like Cave Story and Stardew Valley (both single creators IIRC).
Anything that pushes the bar down on the money and effort needed to make something will result in way more of it being made. It also hopefully makes it possible for those rare geniuses to give us their output without the dilution of having to go through bigger groups first.
I'm also excited from the perspective that this decouples skills in the creative process. There have to be people out there with tremendous story telling and movie making skills who don't have the resources/connections to produce what they're capable of.
To do something similar, this has to allow the director (or whoever is prompting the AI) to control all meaningful choices so that they get more or less the movie they intend. That seems far away from what is demonstrated.
So now a director with a limited budget but with a good vision and understanding of the tools available has a better chance to realize their vision. There will be tons of crap put out by this tool as well. But I think/hope that at least one person uses it well.
But because it will make shooting a movie more accessible to people with limited budgets, the movie studios, who literally gatekeep access to their sets and moviemaking equipment, are going to have a smaller moat. The distribution channels will still need to select good films to show in theaters, TV, and streaming, but the industry will probably be changing in a few years if this development keeps pace.
I'm not against tools for directors, but the thing is, directors tell actors things and get results. Directors hire cinematographers and work with them to get the shots they want. Etc. How does that happen here?
Also, as someone else mentioned, there is the general problem that heavily CG movies tend to look... fake and uncompelling. The real world is somehow just realer than CG. So that also has to be factored into this.
It will start out with more believable green screen backgrounds and b roll. Used judiciously, it will improve immersion and cost <$10 instead of thousands. The actors and normal shots will still be the focus, but the elements that make things more believable will be cheaper to add.
Have you ever noticed that explosions look good? Even in hobby films? At some point it became easy to add a surprisingly good looking explosion in post. The same thing will happen here, but for an increasing amount of stuff.
Which doesn't mean it won't keep happening (economics), but it doesn't necessarily mean any improvement in movie quality.
https://www.vulture.com/article/movies-fire-computer-generat...
But maybe you do get Deadpool & Wolverine 3
Guess where the money is?
But, in reality, even making that kind of film is miles away from these examples.
Well, given the studios still hold the copyright, they can severely constrain supply to keep profits up.
My suspicion is that this kind of stuff gradually reduces some of the labor involved in making films and allows studios to continue padding their margins.
It's a powerful tool. A painting isn't better because the artist made their own paint. A movie made with an IRL camera may not be better than one made with an AI camera.
However, I see an interesting middle ground appear: a talented writer could utilize the AI tooling to produce a movie based upon their own works without having to involve Hollywood, both giving more writers a chance to put their works in front of an audience as well as ensuring what's produced more closely matches their material (for better or worse).
The algorithms and people making content for the algorithm were trends that have dominated for years already.
None of that is "real" art, when you are just making something optimized for an algorithm.
Similarly, a film director "just" gives guidance to a bunch of people: actors, camera operators, etc. Do you consider the movie his creation, even if he didn't directly perform any action? A photographer just has to push a button and the camera captures an image. Is the output still considered his creation? Yes and yes, so I think we should consider the same with AI-assisted art forms. Maybe the real topic is the level of depth and sophistication in the art (just like the difference between your iPhone pictures and a professional photographer's) but in my opinion this is orthogonal to it being human or AI generated.
To be honest so far we have mostly seen AI video demos which were indeed quite uninteresting and shallow, but now filmmakers are busy learning how to harness these tools, so my prediction is that in no time you will see high quality and captivating AI generated films.
(1) https://artefact-ai-film-festival.com/golden-hours-66f869b36... Please consider liking it!
Excellent work!
If yall needed evidence of these tools giving everyday people the ability to make emotion-tugging creations, I'll send you a picture of the tears!
Now I'm thinking I can finally make the (IMO) dope music videos that come to me sometimes when I'm listening to a song I really love.
[1] The first minutes of UP
I liked it.
What happened to the mom? Why does the kid get older and younger looking? why does the city flicker in the beginning? which kid is his in the ballet performance? why do they randomly have "lazy eye"? i could keep going but i think we all get my point.
I can intuit the tropes used by the AI to convey meaning, and i'd be willing to list them all with relevant links for the paltry sum of $50. Be warned, it will be a very large list. Tropes and "memes" are doing 100% of the heavy lifting of this "art".
Sorry, human. As someone who stopped creating art on a daily basis due to market dilution (read: it's too hard to build a fanbase that i care about), i am very critical of most "art" produced anyhow.
this is dogshit.
Not that cyber-bullying and usurpation schemes escalating to a whole new level are any less of a concern in the aftermath, to be clear.
Most people will not notice if the soundtrack to a new TV show is made by a 5 word AI prompt of "exciting build-up suspense scene music" while they're pouring money into their mobile gacha game to get the "cute girl, anime, {color} {outfit}" prompt picture that is SSS rank.
You or I might not care for AI slop, but it's a lot cheaper to produce for Netflix or Zynga or Spotify or whatever, and if they go this route, they don't have to pay for writers, actors, illustrators, songwriters, or licensing for someone else's product. They'll just put their own AI content on autoplay after what you're currently watching, and hope most people don't care enough to stop it and choose something else.
Ever notice how they never show anyone moving quickly in these clips?
Counterpoint: home "studio" recording has been feasible for decades, but music execs are not ruffled. Sure, you get a Billie Eilish debut album once a generation, but the other 99.99% of charting music is from the old guard. The media/entertainment machine is so much bigger than just creating raw material.
As they say, porn is always the leading spear of technology. It's something to keep an eye on (no pun intended) to understand how society will accept/integrate generated content.
However, commercially it seems like a niche within the existing structures of porn. Mostly competing with the market for animated stuff. At least that's where it is right now, and it's already at photorealistic parity with human content creators.
We have not had the ability to create interesting ai porn vids yet. How would we? Meta just showed movie gen.
But I'm pretty sure the short, subtly moving clips of women have already gotten one or the other going. Just wait a little until you can really create what you are looking for.
I'm not sure exactly what models the account owners use, but I think it's a mix of Stable Diffusion video touched up with Adobe tooling.
What scares me most is that in my opinion, by far the best prompt writers are the ones who are deeply "motivated" and "experienced" with prompting. Often the best prompters have only one hand on the keyboard at a time.
I can trivially fine-tune and create more art from certain artists in an hour than they have produced themselves in their whole careers. This makes a lot of people very upset.
This sounds marginally above fanfiction, so I do think it'll be very easy to tell them apart. "Terminator, except with Adam Sandler and set on Mars" is a cute, gimmicky idea, not a competitor to serious work.
I'm sure this can be used to create entertaining movies that are fun and wacky. I don't think it can create impactful movies.
It's no more ridiculous than saying what makes a painting impactful is the brush strokes. But if I copy Picasso's work stroke for stroke, why am I not Picasso? After all, the dumbass paints like a child, admittedly! How could someone like him ever be considered a great painter?
However, merely describing something is not doing the thing. Otherwise, the business analysts at my company would be software engineers. No, I make the software, and they describe it.
The end-goal here is humanless automation, no? Then I'm not sure your assumption holds up. If there's no human, I question the value.
You may question the value but if it's anything like rugs you won't be in the majority. People pay a significant premium for artisanal handmade rugs, but more than 95% of the rugs people use are machine made, because they're essentially indistinguishable from handmade ones, much cheaper, and just as functional.
I think you can do this with video too, just more challenging right now.
On social media platforms, typically the most popular content triggers the strongest emotions. It's rage-bait however, or sadness bait, or any other kind of emotional manipulation. It tricks the human mind and drives up engagement, but I don't think that is indicative of its value.
To be clear, I'm certain that's not what you're doing, and the music is good. But I think it's complicated enough that triggering emotions isn't enough data to ascertain value.
I don't know, exactly, what combination of measurements are needed to ascertain value. But I'm confident human-ness is part of the equation. I think if people are even aware of the fact a human didn't make something they lose interest. That makes the future of AI in entertainment dicey, and I think that's what fuels the constant dishonesty around AI we're seeing right now. Art is funny in how it works because, I think, intention does matter. And knowledge about the intention matters, too. It maybe doesn't make much sense, but that's how I see it.
Meanwhile, as someone who has been engaged with the AI art community for years, and spent years volunteering part-time as a content moderator for Midjourney, the process of creating art via AI with intentionality is deeply human.
As an MJ mod, I have seeeeeeen things.... It's like browsing through people's psyche. Even in public portfolios people bare their souls because they assume no one will bother to look. People use AI to process the world, their lives, their desires, their trauma. So much of it is straight-up self-directed art therapy. Pages and pages, thousands of images stretching over weeks, sometimes months, of digging into the depths of their selves.
Now go through that process to make something you intend to speak publicly from the depth of your own soul. You don't see much of that day to day because it is difficult. It's risky at a deeply personal level to expose yourself like that.
But, be honest: How much deeply personal art do you see day to day? You see tons of ads and memes. But, to find "real art" you have to explicitly dig for it. Shitposting AI images is as fun and easy as shitposting images from meme generators. So, no surprise you see floods of shitposts everywhere. But, when was the last time you explicitly searched out meaningful AI art?
You bring up a good point - very little. But, to be fair, those people aren't necessarily trying to convince me it's art.
I think you're mostly right but I am a little caught up on the details. I think it's mostly a thing of where the process is so different, and involves no physical strokes or manipulation, that I doubt it. And maybe that's incorrect.
However, I also see a lot of people who don't know how to do art pretending like they've figured it all out. I see the problem with that too. It wouldn't be such a problem if people didn't take such an overly-confident stance in their abilities. I mean, it's a little offensive for that guy mucking around for an hour to act like he's da Vinci. And maybe he's in the minority, I wouldn't know, I don't have that kind of visibility into the space.
I think a lot of the friction comes from that. Shitposts are shitposts, but I mean... we call them shitposts, you know? They, the people that make them, call them shitposts. There's a level of humility there I haven't necessarily seen with "AI Bros".
I think, if you really love art, AI can be a means to create a product but it can also be a starting point to explore the space. Explore styles, explore technique, explore the history. And I think that might be missing in some cases.
For a personal example, I'm really into fashion and style. I love clothes and always have. But it's really been an inspiration to me to create clothes, to sew. I've done hand sewing, many machine stitches too. And I don't need to - I could explore this in a more "high-level" context, and just curate clothing. But I think there's value in learning the smaller actions, including the obsolete ones.
https://www.clairesilver.com/collections
from the POV of fashion illustration. Her "corpo|real" collection took something like 9 months to create and was published nine months ago.
To use a real-world example: if the Renaissance-era patrons had merely written down their preferences and had work made to match those preferences, it's highly unlikely that you'd have gotten the Mona Lisa or David.
Which is to say that, there will definitely be some interesting and compelling art made with AI tools. But it will be made by a specific person with an artistic vision in mind, and not merely an algorithm checking boxes.
At best, you'll get something like a generic sitcom. The idea that "all visual stimulus will be satisfied by hypertuned AI models" doesn't line up with how people experience the arts, at all.
I don't want to carry mechanical solutions labelled culture - deterministic enough, despite hallucinations - into the next generation that follows my own. It's an impressive advancement for automation, sure, but just not a value worth sharing as human development.
That being said, I think GenAI could be a valuable addition in any blueprint-/prototype-/wireframing phase. But, ironically, it positions itself in stark contrast to what I would consider my standards to contemporary brainstorming, considering the current Zeitgeist:
- truthful to history and research (GenAI is marketing and propaganda)
- aware of resources (GenAI is wasteful computing)
- materialistic beyond mere capitalistic gains (GenAI produces short-lived digital data output and isn't really worth anything)
Out of curiosity, what is it that people do with these things? Do they put them on TikTok?
The demos were made by nerds (said with love) with a limited time window. Wait until the creatives get a hold of the tool.
I've been doing this with ChatGPT, except it's more of a "turn into a screenplay" then "create a graphic of each scene" and telling it how I want each character to look. It works pretty well but results in more of a graphic novel than a movie. I've definitely been waiting for the video version to be available!
Many creative works these days require the effort and input of so many people, so much time, and so much money that they can't have a specific creative vision. Mediums like books, comics, indie movies, and very low budget indie games, where the end product was created by the smallest number of people, have the most potential to be interesting and creative. They can take risks. This doesn't mean they will be good, most aren't, but it means that the range of quality is much broader, with some having a chance to shine in ways which big budget projects just can't. The issue with small teams and small budgets is that they are inherently limited in what they can create. Better tools allow smaller groups of people to make things that previously would have required an entire studio, but without diluting the creative vision.
Will this also result in a tidal wave of low effort garbage? Of course it will. But that can be ignored.
RIP Pika and ElevenLabs… tho I guess they always can offer convenience and top tier UX. Still, gotta imagine they’re panicking this morning!
Upload an image of yourself and transform it into a personalized video. Movie Gen’s cutting-edge model lets you create personalized videos that preserve human identity and motion.
Given how effective the still images of Trump saving people in floodwater and fixing electrical poles have been despite being identifiable as AI if you look closely (or think…), this is going to be nuts. 16 seconds is more than enough to convince people, I'm guessing the average video watch time is much less than that on social media. Also, YouTube shorts (and whatever Meta's version is) is about to get even worse, yet also probably more addicting! It would be hard to explain to an alien why we got so unreasonably good at optimal content to keep people scrolling. Imagine an automated YouTube channel running 24/7 A/B experiments for some set of audiences…
If one watches movies, reads books, etc. just to pass the time, maybe this would be some kind of boon. But for those of us looking for meaningful commentary on life, looking to connect with other human beings, this would be some circle of hell. It's some kind of solipsism.
If you disagree with that, you're basically saying La Jetee isn't art, which would be a hard sell.
There are a LOT of choices in making a movie, and if you just let the AI make them, you are getting "random" (uncontrolled) choices. I don't think that is going to compare favorably to the real thing.
If you can specify all that, then it's just a tool. Cool. But it's still going to take pro-level skills to use it.
But would AI be able to quote Vertigo, like La Jetee does? Doesn't art, at least to some degree, require intent (including all intentional subversions of that intent dogma, of course)?
Of course AI can never truly experience being human, it has no emotions, but it is excellent at mimicry and it can certainly provide a meaningful outside perspective.
Is there anything to say about humanity that is not in the training corpus already?
Nothing AI has yet done has demonstrated anything at the level of art or mastery. I guess I'm unconvinced that throwing a million stories into the blender and synthesizing is going to produce a compelling one.
That prediction became true for like 5% of the population, everyone else is probably stupider than they were before, thanks to social media.
Similarly, I think your prediction will apply to a small subset of humanity.
I get that this is kind of a fundamental line in the sand for most of the "AI art" going around, and it seems like most people fall on one side or the other. "I consume art for entertainment" vs "I interact with art to experience the human condition".
I also don't want to say that AI Art has no value, because I think as a tool to help artists realize their vision it can be very useful! I just don't think that art entirely made by AI is interesting.
The problem: In my limited playing with these tools they don't quite hit the mark, and I would easily be able to tweak something if I had all the layers used. I imagine in the future products could be used to tweak this to match what I think the output should be....
At least the code generation tools are providing source code. Imagine them only giving compiled bytecode.
It so happens that there are innumerable samples of prose and source code and rendered songs and videos and images to use as this training data.
But that's not so much the case for professional workflows (outside of software development).
If the tools can evolve to generating usefully detailed and coherent media projects instead of just perceptually convincing media assets, it's going to be a while before they get there.
As it stands, the only chance you have of depicting a consistent story across a series of shots is image-to-video, presuming you can use LoRAs or similar techniques to get the seed photos consistent in themselves.
Is it available for use now? Nope
When will it be available for use? On FB, IG and WhatsApp in 2025
Will it be open sourced? Maybe
What are they doing before releasing it? Working with filmmakers, improving video quality, reducing inference time
It’s already nearly impossible to find quality content on the internet if you don’t know where to look.
We will finally achieve the dream of everything being in public domain!
I wonder if it will be the same with AI. When you can have anything for nothing, it has no value. So the digital world will have little meaning.
More likely the average person will happily lap up AI generated slop.
> the technical skills necessary to make movies, draw, or pluck strings
AI will (hopefully) be an accelerator for the people still putting in the hours. At least it is for coding
The act of creating teaches you to be better at creating, in that way and in that context. This is why people with practice and expertise (e.g., professional artists, like screenwriters and musicians) can reliably create new things.
I don't agree. There's some skill, some theory, behind it. But mastering this alone is almost worthless.
There's a huge overlap between creatives and mental illness, particularly bipolar disorder. It seems perfectly mentally stable people lack that edge and insight. To me, that signals there is some magic behind it.
And it's magic because it must not be rational and it must not make sense, because the neurotypical can't see it.
I think it's sort of like how you can beat professional poker players with an algorithm that's nonsensical. They're professionals so they're only looking at rational moves; they don't consider the nonsensical.
That's the biggest edge, commitment.
To think that you _need_ to be neurodivergent to be an artist is nonsensical, and stating that mastering the craft itself is worthless indicates a lack of respect for artists' work.
I'm baffled by this type of comment here in all honesty. Really, broaden your horizons.
You will notice I never said this.
All I said, and is true, is there is a correlation between being an artist and being neurodivergent.
> stating mastering the craft itself is worthless
Where did I say this too?
It appears you're having an argument with a ghost. You're correct, that argument is baffling! I wonder then why you made it up if you're just gonna get baffled by it? Seems like a waste of time, no?
Look, art is two things: perspective and skill. One without the other is worthless.
I can have near perfect skill and recreate amazing works of art. And I will get nowhere. Or, I can have a unique and profound perspective but no skill, and then nobody will be able to decipher my perspective!
> But mastering this alone is almost worthless.
> And it's magic because then it must not be rationale and it must not make sense, because the neurotypical can't see it.
Not trying to take them out of context, but specifying them. You mention, from my understanding, that mastering is almost worthless without the magic, and the magic only being there if you're neurodivergent.
This implies one cannot be a proper artist if not neurodivergent. Now, I could be misinterpreting it, so I apologize in advance.
> There's a huge overlap between creatives and mental illness
Keyword overlap, but I don't think it's 100%
Magic is maybe not the right word here, but I do think it's indescribable. It's some sort of perspective.
But I stand by this: > that mastering is almost worthless without the magic
How, exactly, you obtain the magic is kind of unknown. But I do think you need it. Because skill alone is just not worth much outside of economics. You can make great corporate art, but you're not gonna be a great artist.
I think if you're perfectly rationally minded, you're going to struggle a lot to find that magic. I shouldn't say it's impossible, but I think it's close to.
I don’t know if neurodivergence might have any overlap, but I wouldn’t be surprised if a study revealed it to be as correlated as the fact that most rich people were born into wealthy families.
Take poaching eggs for example. Let’s say you automate that 100% so as a human you never need to do it again. Well, how good are your omelettes then? It’s a similar activity — keeping eggs at the right temperature and agitation for the right amount of time. Every new thing you learn to do with eggs — poaching, scrambling, omelettes, soft-cooking for ramen — will teach you more about eggs and how to work with them.
So the more you automate your cooking with eggs the worse you get at all egg-related things. The KitchenBot-9000 poaches and scrambles perfect eggs, so why bother? And you lose the knowledge of how to do it, how to tell the 30-second difference between “not enough” and “too much.”
The public can create vast amounts of spectacular original content right now using Dalle, MidJourney, Stable Diffusion - they have very little interest in doing so. Only a tiny fraction of the population has demonstrated that it cares whatsoever about generative media. It's a passing curiosity for a flicker of an instant for the masses.
The hilariously fantastical premise of: if we just give people massive amounts of time, they'll dedicate their brains to creativity and exploration and live exceptionally fulfilling lives - we already know that's a lie for the masses. That is not what they do at all if you give them enormous amounts of time, they sit around doing nothing much at all (and if you give them enormous amounts of money to go with it, they do really dumb things with it, mostly focused on rampant consumerism). The reason it doesn't work is because all people are not created equal, all people are not the same, all brains are not wired the same, the masses are mimics, they are unable & unwilling to originate as a prime focus (and nothing can change that).
Csound: To make a sine tone, we'll describe the oscillator in a textfile as if it were a musical instrument. You can think of this textfile as a blueprint for a kind of digital orchestra. Later we'll specify how to "play" this orchestra using another text file, called the score.
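The "hello world" of synthesis described above, a plain sine tone, can also be rendered procedurally in a few lines. A minimal sketch in Python rather than Csound (the filename `sine_a440.wav` and the amplitude/duration choices are arbitrary, not from any particular tutorial):

```python
import math
import struct
import wave

SR = 44100    # sample rate in Hz
FREQ = 440.0  # A4
DUR = 2.0     # seconds
AMP = 0.5     # peak amplitude as a fraction of full scale

# Render the sine tone sample-by-sample, the job Csound's `oscil`
# opcode does from its function-table lookup.
samples = [
    AMP * math.sin(2 * math.pi * FREQ * n / SR)
    for n in range(int(SR * DUR))
]

# Write a 16-bit mono WAV file.
with wave.open("sine_a440.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 16-bit samples
    f.setframerate(SR)
    f.writeframes(b"".join(
        struct.pack("<h", int(s * 32767)) for s in samples
    ))
```

The contrast makes the grandparent's point: Csound splits the "instrument" (orchestra file) from the "performance" (score file), so the same description can be played many ways, whereas this script bakes both into one procedure.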
Most human-created art is rather bad. I used to go to a lot of art openings, and we'd look at some works and ask "will this have been tossed in five years?"
That said, AI is probably a threat to roles in the entertainment industry. But it's also worth noting that much of the creativity was being sucked out of entertainment well before AI arrived.
And of course if you can combine skills with sculpture with graphic design you're getting more specialized and are more likely to make a living - even if the field of graphic design is decimated by AI. That's generally how I feel about my skills as a programmer. I'm not just a programmer. So even if AI does most of the work with coding I can still write code for income as long as it's not the only reason I'm getting paid.
So far AI doesn’t seem very good at the creative element.
We heard it again when electronic music started becoming a thing.
Formula 1 wouldn’t exist if the blacksmiths had their way.
The unknown scares people because they are afraid of their known paradigms being shattered. But the new things ahead are often beyond anything of which we could ever dream.
Be optimistic.
Conversations I have with people in real life almost always come back to this point. Most people find AI stuff novel, but few find it particularly interesting on an artistic level. I only really hear about people being ecstatic about AI online, by people who are, for lack of a better term, really online, and who do not have the skills, know-how, or ability, to make art themselves.
I always find the breathless joy that some people express at this stuff with confusion. To me, the very instant someone mentions "AI generated" I just instantly find it un-interesting artistically. It's not the same as photoshop or using digital art suites. It's AI generated. Insisting on the bare minimum human involvement as a feature is just a non-starter for me if something is presented as art.
I'll wait to see if the utopian vision people have for this stuff comes to fruition. But I have enough years of seeing breathless positivity for some new tech curdle into resignation that it's ended up as ad focused, bland, MBA driven, slop, that I'm not very optimistic.
You can make the guidance as superficial or detailed as you like. Input detailed descriptions, use real images as reference, you can spend a minute or a day on it. If you prompt "cute dog" you should expect generic outputs. If you write half a screen with detailed instructions, you can expect it to be mostly your contribution. It's the old "you're holding it wrong" problem.
BTW, try to input an image in chatGPT or Claude and ask for a description, you will be amazed how detailed it can get.
You want a painting of your dog. You send the painter dozens of photos of your dog. You describe your dog in rapturous, incredible, detail. You receive a painting in response. Did you make that painting? Were you the artist in any normal parlance?
When you use ChatGPT or Claude you're signing up to receive the image generated in response to your prompt, not to create that image. Your involvement is always lessened.
You might claim you made that image, but then you would be like a company claiming they made the response to their brief, or the dog owner insisting they were the painter, which everyone would consider nonsensical if not plain wrong. Are they collaborators? Maybe. But the degree of collaboration in making the image is very very small.
The symphony conductor just waves her hands reading the score, does she make music? The orchestra makes all the sounds. She just prompts them. Same for movie director.
Exactly, isn't it amazing? You can travel the latent space of human culture in any direction. It's an endless mirror house where you can explore. I find it an inspiring experience, it's like a microscope that allows zooming into anything.
How familiar are you with what is possible and how much human effort goes towards achieving it?
Photography, digital painting, 3D rendering -- these all went through a phase of being panned as "not real art" before they were accepted, but they were all eventually accepted and they all turned out to have their own type of merit. It will be the same for AI tools.
> Photography, digital painting, 3D rendering
You still make these. You sit down and form the art.
When you use AI you don't make anything, you ask someone else to make it, i.e. you've commissioned it. It doesn't really matter if I sit down for a portrait and describe in excruciating detail what I want, I'm still not a painter.
It doesn't even matter, in my eyes, how good or how shit the art is. It can be the best art ever, but the only reason art, as a whole, has value is because of the human aspect.
Picasso famously said he spent his childhood learning how to paint professionally, and then spent the rest of his life learning how to paint like a child. And I think that really encapsulates the meaning of art. It's not so much about the end product, it's about the author's intention to get there. Anybody can paint like a child, very few have the inclination and inspiration to think of that.
You can see this a lot in contemporary art. People say it looks really easy. Sure, it looks easy now, because you've already seen it and didn't come up with it. The coming up with it part is the art, not the thing.
When you use a camera you don't make anything. You press a button and the camera makes it. You haven't even described it.
When you use photoshop you don't make anything. You press buttons and the software just draws the pixels for you. It doesn't make you a painter.
When you use 3D rendering software you don't make anything. You tell the computer about the scene and the computer makes it. You've barely commissioned it.
It's easy to be super reductive. Easy but wrong.
It's the difference between making a house with wood and making a house by telling someone to make a house. One is making a house, one isn't.
The problem with AI is that it's natural language. So there's no skill there, you're describing something, you're commissioning it. When I do photoshop, I'm not describing anything, I'm modifying pixels. When I do 3D modeling, I'm not describing anything, I'm doing modeling.
You can say that those more formal specifications are the same as a description. But they're not. Because then why aren't the business folks programmers? Why aren't the people who come up with the requirements software engineers? Why are YOU the engineer and not them?
Because you made it formally, they just described it. So you're the engineer, they're the business analysts.
Also, as a side note, it's not at all reductive to say people who use AI just describe what they want. That is literally, actually, what they do. There's no more secret sauce than that - that is where the process begins and ends. If that makes it seem really uninspired then that's a clue, not an indicator that my reasoning is broken.
You can get into prompt engineering and whatever, I don't care. You can be a prompt engineer then, but not an artist. To me it seems plainly obvious nobody has any trouble applying this to everyone else, but suddenly when it's AI it's like everyone's prior human experience evaporates and they're saying novel things.
AI makes it easy to generate ten thousand random images. Making something of interest still requires a lot of digging in the tools and in your self.
Not that that isn't a skill in and of itself. I just don't think it's a creative skill. What you're creating is the description, not the product.
The better you get the closer you can get to your original vision.
> Photography, digital painting, 3D rendering
Those are not the same as AI. Using AI is akin to standing beside a great pianist and whispering into his ear that you want "something sad and slow" and then waiting for him to play your request. You might continue to give him prompts, but you're just doing that. In time, you might be called a "collaborator", but your involvement begins at the bare minimum and you have to justify that you're more involved; the pianist doesn't, the pianist is making the music.
You could record the song and do more to the recording, or improv along with your own instrument. But just taking the raw output again and again is simply getting a response to your prompt again and again.
The prompts themselves are actually more artistic, as they venture into surrealist poetry and prose, but the images are almost always much less interesting artistically than the prompts would suggest.
Ok, now I know you're watching through hate goggles. Fortunately, not everyone will bring those to the party.
> Using AI is akin... [goes on to describe a clueless iterative prompting process that wouldn't get within a mile of the front page]
You've really outed yourself here. If you think it's all just iterative prompting, you are about 3 years behind the tools and workflows that allow the level of quality and consistency you see in the best AI work.
> https://www.reddit.com/r/greentext/comments/zq91wm/anons_dis...
Which is exactly the opposite of what the artists claim to want. But god is it hilarious following the anti-AI artists on Twitter who end up having to apologize for liking an AI-generated artwork pretty much as a daily occurrence. I just grab my popcorn and enjoy the show.
Every passing day the technologies making all of this possible get a little bit better and every single day continues to be the worst it will ever be. They'll point to today's imperfections or flaws as evidence of something being AI-generated and those imperfections will be trained out with fine tuning or LoRA models until there is no longer any way to tell.
E: A lot of them also don't realize that besides text-to-image there is image-to-image for more control over composition as well as ControlNet for controlling poses. More LoRA models than you can imagine for controlling the style. Their imagination is limited to strictly text-to-image prompts with no human input afterwards.
AI is a tool not much different than Photoshop was back when "digital artists aren't real artists" was the argument. And in case anyone has forgotten: "You can't Ctrl+Z real art".
Ask any fractal artists the names they were called for "adjusting a few settings" in Apophysis.
E2:
We need more tests such as this. The vast majority of people can't identify AI content nearly as well as they think they can - even people familiar with AI who "know what to look for".
https://www.tidio.com/blog/ai-test/
Artworks (3/4) | Photos (6/7) | Texts (3/4) | Memes (2/2)
Fun excerpt by the way:
> Respondents who felt confident about their answers had worse results than those who weren’t so sure
> Survey respondents who believed they answered most questions correctly had worse results than those with doubts. Over 78% of respondents who thought their score is very likely to be high got less than half of the answers right. In comparison, those who were most pessimistic did significantly better, with the majority of them scoring above the average.
Well put. This is also my experience. And I'm no AI doom-monger or neo-Luddite.
Yes, I've noticed this. The people who are excited about it usually come off as opportunistic (hence the "breathless joy"), and not really interested in letting whatever art/craft they want to make deeply change them. They just want the recognition of being able to make the thing without the formative work. (I hesitate to point this out, anticipating allegations of elitism.)
Plus, really online people tend to dominate online discussions, giving the impression that the public will be happy to consume only AI generated things. Then again, the public is happy to consume social media engagement crap, so I'm very curious what the revealed preference is here.
The value in learning this stuff is that it changes you. I'll be forever indebted to my guitar teacher partially because he teaches me to do the work, and that evidence of doing the work is manifest readily, and to play the long, long game.
For video, it's possible AI can feed into the overall creative pipeline, but I don't see it replacing the human touch. If anything, it opens up the industry to less-technical people who can spend more time focusing on the human touch. Even if the next big film has AI generation in it, if it came from someone with a fascinating story and creative insight, I'll still likely appreciate it.
But I don't think human creativity is going anywhere. Unless there is some breakthrough that moves it far beyond anything we've seen so far, AI will always be trailing behind us. Human creativity might become a more boutique product, like heirloom tomatoes, but there will always be people who value it.
Any AI content that's good, and there are a few of them, actually has plenty of human creativity in it.
There are some AI artists beginning to emerge, and some AI-generated personas out there who are interesting, but they are interesting only because the people behind them made them interesting.
I am not fatalistic at all for the creatives. AI is going to wipe out the producers and integrators (people who specialize in putting things together, like coders who code when tasked, painters who paint when commissioned, musicians who play once provided with the score), not the creatives.
The GOTCHA, IMHO, will be people not developing skills because the machine can do it, but I guess maybe they will develop the skills that make the machine sing.
I have little faith in an optimistic view of human nature where we voluntarily turn more toward more intellectual or worthy pursuits.
On one hand, entertainment has often been the seed that drives us to make the imagined real, but the adjacent possible of rewarding adventure/discovery/invention only seems to get more unaffordable and out of reach. Intellectual revolutions are like gold rushes. They require discovery, that initial nugget in a stream, the novel idea that opens a door to new opportunities that draws in the prospectors. Without fresh opportunity, there is no enthusiasm and we stew in our juices.
I suspect the only thing that might save us from total solipsistic brain-in-vat immersion in entertainment... is something like GLP-1 type agonists. If they can help us resist a plate of Danish, maybe they can protect us from barrages of Infinite Jest brain missiles from Netflix about incestuous cat wizards or whatever. Who knows what alternatives this new permanently medicated society, Pharma-Sapiens, might pursue instead though.
Reading these threads sometimes feels like a bad idea, because you just get new sad ideas about how things will almost certainly be used to make it worse, beyond the ones you can come up with on your own.
I think the musicians that are barely hanging on at this point would prefer to create over having to slog around on tours to pay their health insurance. But nobody is paying for creation.
It's a compelling thought - we all like hope - and I think it might be realistic if all of humanity were made up of the same kind of people who read hacker news.
But is this not what the early adopters of the internet thought? I wasn't there - this is all second hand - but as far as I know people felt that, once everyone gained the ability to learn anything and talk to anyone, anywhere, humanity would be more knowledgeable, more thoughtful, and more compassionate. Once everyone could effortlessly access information, ignorance would be eliminated.
After all, that's what it was like for the early adopters.
But it wasn't so in practice.
I worry that hopeful visions of the future have an aspect of projecting ourselves onto humanity.
Seems more likely we'll just plug ourselves into ever more addicting dopamine machines. That's certainly the trend so far anyway.
AI generated works will find a place beside human generated works.
It may even improve the market for 'artsy' films and great acting by highlighting the difference a little human talent can make.
It's not the art that's at risk, it's the grunt work. What will shift is the volume of human-created drek that employed millions to AI-created drek that employs tens.
My limited understanding is that AI could generate Netflix top 10 hits that mostly recycle familiar jokes. The creators made a great product, but I expect anyone who attended film school would rather try something new; the only issue is that Netflix won't foot the bill (I know, they take a few Oscar swings a year now).
Recent examples: TV Glow, Challengers, Strange Darling. All movies with specific, unique perspectives, visuals, acting choices, scripts, shots, etc. Think about the perspective in The Wire, The Sopranos, Curb Your Enthusiasm. There is plenty of great work that is obviously nearly impossible for an AI to reproduce, and I hope that AI "art" is taxed in a way that funds human projects.
I follow a lot of the new AI gen crowd on Twitter. This community is made up of a lot of creative industry people. One guy who worked in commercials shared a recent job he was on for a name brand. They had a soundstage, actors, sound people, makeup, lighting, etc. setup for 3 days for the shoot. Something like 25 people working for 3 days. But behind that was about 3 months of effort if one includes pre-production and post-production. Think about editing, color correction, sound editing, music, etc.
Your creative children may live in a world where they can achieve a similar result themselves. Perhaps as a small team, one person working on characters, one person doing audio, one person writing a script. Instead of needing tens of thousands of dollars of rented equipment and 25 experts, they will be able to take ideas from their own head and realize them with persistence and AI generation.
I honestly believe these new tools will unlock potential beyond what we can currently imagine.
Curious if anybody has a solution or if this works for that
- Every script in Hollywood will now be submitted with a previs movie.
- Manga to anime converters.
- Online commercials for far more products.
Manga to anime already exists.
Commercials, particularly for social/online, already happening as well.
I can see myself paying a little too much to have a local setup for this.
It's going to be interesting to see how that plays out when you can make just about any kind of media you wish. (Especially when you can mix this as a form of 'embodiment' to realize relationships with virtual agents operated by LLMs.)
Yes, it is. You should try it.
Welcome to making music lol. Since there is so much of it, you have to make the absolute best to even be considered. And then, because so many people make the absolute best, people only care about the persona making the music (as great as you are, you aren’t Taylor Swift, Kendrick Lamar, Damon Albarn). Your friends will never care about your music just because you are friends, don’t fall into that trap. Also nobody cares about music without good lyrics, because again, there is just so much instrumental content out there that sounds the same, lyrics differentiate it with a human, emotional element.
Just make stuff for fun. Listen to it every now and then and feel the magic of “hehe I made that”
Well, that's an exaggeration if I've ever seen one. Firstly, so much of current chart music has atrocious lyrics. And secondly, instrumental music is very popular.
Not to sound too crass, but a parallel could be drawn to smelling one's own farts and wondering why no one else appreciates the smell.
With all due respect, how could there be when at the click of a button you can generate entire songs? You didn't come up with the chord progression, the structure, the melodic motifs, or the lyrics.
My attachment to my works is directly proportional to the amount of effort it took to create them.
It's not the craft that drives attachment in this case but the emotional resonance of something that you think should exist finally existing.
Author's attachment is to a large degree based on the false notion that they somehow contributed to the creation process.
The generic, frigid, un-interesting "product" that is produced by said AI is why no one other than the prompter is moved by the result.
My point wasn’t to debate the merit of generated music, it was simply to highlight the effect I described.
Production requires specifying very precise requirements, which the current gen AI is unable to follow. Even at the most fuzzy production level, like "a song with strings and a choir", Suno will generate something completely irrelevant. And if you try to go deeper, say a classic Moog synth line in the chorus, don't expect to generate something meaningful.
I won't argue that in the most broad sense, prompt engineering is a creative process. Picking which shoes to wear to work is also a creative process. My argument is that this has barely anything to do with the process of music composition or production. You can literally reuse the same prompt to generate an image or a poem.
Both Suno and Udio allow paid subscribers to upload their own clips to extend from. It works for setting up a beat or extending a full composition from a DAW.
Suno's is more basic than Udio's, which allows inpainting and can create intros as well as extensions, but the tools are becoming more and more powerful for existing musicians. With Udio you can remix the uploaded clip, so you can create the chord progression and melody using one set of instruments or styles (or hum it) and transform it into another.
I also use this feature all the time to move compositions from one service to the other. Suno is better at generating intros and interesting melodies while Udio is better at the editing afterwards.
It’s not a sense of pride or accomplishment. I don’t know what it is. Maybe a small amount of pride. It’s hard to say. But there is a definite connection that feels different listening to songs I requested vs those that other people have.
“I want a funny road trip movie starring Jim Carrey and Chris Farley, based in Europe, in the fall, where they have to rescue their mom, played by Lucille Ball, from making the mistake of marrying a character played by an older Steve Martin.”
10 minutes later your movie is generated.
If you like it, you save it, share it, etc.
You have a queue of movies shared by your friends that they liked.
Content will be endless and generated.
Sub one percent of people are going to be willing to put in the hours to do it.
The bulk of the spammed created content will be: the masses very briefly playing with the generative capabilities, producing low quality garbage that after five minutes nobody is interested in and then the masses will move on to the next thing to occupy a moment of their time. See: generative image media today. So few people care about the crazy image creation abilities of MidJourney or Flux, that you'd think it didn't exist at all (other than the occasional related headline about deepfakes and or politics).
The extreme majority will all watch the same things, just as they do today. High quality AI content will be difficult to produce and will be nearly as limited in the future as any type of high quality content is today. The masses will stick to the limited, high quality media and disregard the piles of garbage. Celebrity will also remain a pull for content; nothing about that will ever change (and celebrity will remain scarce, which will assist in limiting what the masses are interested in).
By and large people only want to go where other people are at. Nothing about AI will change that, it's a trait that is core to humanity. The way that applies to content is just the same as it does a restaurant: content is a mental (and sometimes physical) destination experience just as a restaurant or vacation trip is.
Though a reason we would gravitate towards common media more is if what someone brought up in the comments here comes to pass, and celebrities/actors license their likeness to studios only, and amateur tools are not licensed to use them. Though I think there will always be crafty/illegal ways around this. Also, likeness probably won't be worth much, if we can generate any type of character we like anyway. I, for one, couldn't be happier for celebrities and the cultural obsession around them to disappear.
Plebs will get the mass produced stuff, just like it has been for junk food.
In the information case, even if you wanted to sell good quality, verifiable content, how are you going to keep up with the verification costs, or pay people when someone can just dupe your content and automate its variations?
People who are poor don't have the luxury of time, and verifications cannot be automated.
Most people don't work in infosec or Trust and Safety, so this discussion won't go anywhere, but please just know - we don't have the human bandwidth to handle these outcomes.
Bad actors are more prolific and effective than good ones, because they don't have to give a shit about your rules or assumptions.
I do hope that more talented people will have more leverage to create without the traditional gatekeeping, but I also doubt this will happen as the gatekeepers are all funding AI tooling as well.
I'd rather have those people work on climate change solutions.
The argument should never be about reducing energy usage, rather it should be about how we generate that energy in a clean, renewable way.
Better to spend 10x the amount of energy on humans that will give the same result?
At least Microsoft and Google are in a race to be CO2-neutral.
And all of these clusters can also do research, and partially they already do.
It's valid criticism, but we need to stop CO2 production in a lot of other industries before we do that for datacenters. Datacenters save a lot more CO2 (just think about not having to drive to a bank to do banking).
My startup develops AI for the nuclear power industry to drive process, documentation, and regulatory efficiency. We like to say "AI needs nuclear and nuclear needs AI".
Big tech has finally realized/gone public that casually saying things like "we're building our next 1GW datacenter" is uhh, problematic[0].
For some time now there has been significant interest/activity in wiring up entire datacenters to nuclear reactors (existing Gen 2, SMRs, etc):
https://finance.yahoo.com/news/nvidia-huang-says-nuclear-pow...
https://www.ans.org/news/article-5842/amazon-buys-nuclearpow...
https://www.yahoo.com/news/microsoft-signs-groundbreaking-en...
https://www.cnbc.com/2024/09/10/oracle-is-designing-a-data-c...
https://thehill.com/policy/technology/4913714-google-ceo-eye...
[0] - https://www.npr.org/2024/07/12/g-s1-9545/ai-brings-soaring-e...
My mind instantly assumes it's a money thing and they're just wanting to charge millions for it, therefore out of reach for the general public. But then with Meta's whole stance on open AI models, that doesn't seem to ring true.
runwayml.com
pika.art
hailuoai.com
lumalabs.ai
This and Sora are particularly annoying, though, for how they put together these huge flashy showcases like they're announcing some kind of product launch and then... nothing. Apparently there's value in just flexing your AI-making muscle now and then.
I pay for Runway right now for experiments and it works. The problem is that maybe 1 out of 10 prompts results in something usable. And when I say usable I have pretty low standards. Since the model pumps out 5 or 10 second clips you have to be pretty creative, since the models still struggle with keeping any kind of consistency between shots. Things like lighting, locations, and characters can all morph within/between clips.
The issue isn't quality exactly, it is like 80% there. When it works, it is capable of blowing your mind. You can get something that looks like it is a bonafide Hollywood shot. But that is a single 5 second or 10 second clip. So far there is no easy way to reliably piece those together to make even a 1 minute long TikTok.
The real problem is the cost. Since you have to sometimes do 10 prompts to get a single acceptable shot it is like a 10x multiplier on the cost per second of video. That can get very expensive for even short experiments.
that's probably where the quality is, but not the billions
We offer both image-to-video (same situation as Runway, need a few attempts to make something awesome) and video-to-video (under the name "Restyle 2.0") - this is our newest tool and is highly reliable, i.e. you can get complex motion (kissing, handshakes, boxing, skateboarding, etc) with controllable changes to input video (changing outfits, characters, backgrounds, styles).
Unlike Runway and Kling, we currently offer a simple UNLIMITED plan for just $10/mo. Check it out! https://alpha.nim.video
They do exist. Luma AI DreamMachine is pretty cool, as well as Kling, Minimax, etc. But they aren't anything like what Sora or this appear to be. They work, but these, while likely cherry-picked, are still a whole new breed of video generation. But who knows if we'll ever actually get to use them, or if we're just supposed to reflect on them and think about how cool and impressive Facebook and OpenAI are.
- How many clicks of Generate are budgeted for?
- How many clicks should each user’s quota be?
- How much advertising revenue will be earned per click?
- Why should they give away a million dollars?
Right now, AI costs for this are so high that offering this feature ‘for free’ would bankrupt a small country in a matter of days, if everyone on Meta used it once. It doesn’t particularly matter what the exact cost is: it’s simply not tolerable to anyone who owes payment for the services provided.
This is also why the AI industry is trying to figure out how to shift as much AI processing as possible to devices without letting users copy their models to profit off of the training research spend.
Facebook absolutely does not have a fleet of GPUs idling that could suddenly spring into action to generate a billion of these videos, nor do they have power stations on standby ready to handle the electricity load.
Those videos are a good measure for monitoring AI video improvement.
I made this with it (after training a Flux Lora on myself)
https://vm.tiktok.com/ZGdJ6uSh1/
Also interesting - blog post from someone who actually got to use Sora https://www.fxguide.com/fxfeatured/actually-using-sora/
TL;DR: it’s still quite frustrating to use
Yeah, we might get the bad killer robots. But it's more likely this will make it unnecessary to wonder where on this blue planet you can still live when we power the deserts with solar and go to space. Getting clean nutrition and environment will be within reach. I think that's great.
As with all technology: Yes a car is faster than you. And you can buy or rent one. But it's still great to be healthy and able to jog. So keep your brains folks and get some skills :)
The model is not released and probably won’t be for a while.
And it probably costs Meta-scale infra to fine-tune to your needs.
#cabincrew
#scarletjohanson
#amen
But I'm worried about this tech being used for propaganda and disinformation.
Someone with a $1K computer and enough effort can generate a video that looks real enough. Add some effects to make it look like it was captured by CCTV or another low-res camera.
This is what we know about, who knows what's behind NDAs or security clearances.
These are smooth and consistent, with no sliding (except the sloth floating in water, where the stones on the right are moving at a much higher rate than the approaching dock) and no things appearing out of nowhere. Editing seems not as high quality (the candle-to-bubble example).
To me, the fact that these didn't induce nausea while being very high quality makes this the best among current video generators.
I am grossed out by this. My instinct is to avoid AI slop. The interesting part to me is: What next? Where do we go? Will "human" forums be pushed further into the obscurity of the internet? Or will it go so far that we all start preferring to meet in person? I'm clueless here.
I was rather thinking classical cryptography baked into generative networks.
Maybe in the future?
I'm a worldcoiner but so far it's just been free money.
For a long time people have speculated about The Singularity. What happens when AI is used to improve AI in a virtuous circle of productivity? Well, that day has come. To generate videos from text you need video+text pairs to train on. They get that text from more AI. They trained a special Llama3 model that knows how to write detailed captions from images/video and used it to consistently annotate their database of approx 100M videos and 1B images. This is only one of many ways in which they deployed AI to help them train this new AI.
They do a lot of pre-filtering on the videos to ensure training on high quality inputs only. This is a big recent trend in model training: scaling up data works but you can do even better by training on less data after dumping the noise. Things they filter out: portrait videos (landscape videos tend to be higher quality, presumably because it gets rid of most low effort phone cam vids), videos without motion, videos with too much jittery motion, videos with bars, videos with too much text, video with special motion effects like slideshows, perceptual duplicates etc. Then they work out the "concepts" in the videos and re-balance the training set to ensure there are no dominant concepts.
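A filtering pass like that can be sketched in a few lines of Python. All field names and thresholds below are hypothetical, just to make the heuristics concrete; this is not Meta's actual pipeline:

```python
def keep_video(meta: dict) -> bool:
    """Return True if a clip passes the (made-up) quality heuristics."""
    # Prefer landscape: drops most low-effort phone-cam videos.
    if meta["width"] <= meta["height"]:
        return False
    # Reject static clips and overly jittery ones.
    if not (0.1 <= meta["motion_score"] <= 0.9):
        return False
    # Reject clips dominated by overlay text or letterbox bars.
    if meta["text_coverage"] > 0.05 or meta["has_bars"]:
        return False
    return True

def dedupe(clips: list[dict]) -> list[dict]:
    """Keep one clip per perceptual hash (near-duplicate removal)."""
    seen, out = set(), []
    for c in clips:
        if c["phash"] not in seen:
            seen.add(c["phash"])
            out.append(c)
    return out

clips = [
    {"width": 1920, "height": 1080, "motion_score": 0.4,
     "text_coverage": 0.0, "has_bars": False, "phash": "a1"},
    {"width": 720, "height": 1280, "motion_score": 0.4,   # portrait: dropped
     "text_coverage": 0.0, "has_bars": False, "phash": "b2"},
    {"width": 1920, "height": 1080, "motion_score": 0.4,  # near-duplicate
     "text_coverage": 0.0, "has_bars": False, "phash": "a1"},
]
kept = dedupe([c for c in clips if keep_video(c)])
print(len(kept))  # 1
```

At 100M videos each of these checks has to be a cheap, precomputed signal, which is presumably why they lean on simple metadata and dedicated classifiers rather than anything per-frame at filter time.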
You can control the camera because they trained a dedicated camera motion classifier and ran that over all the inputs, the outputs are then added to the text captions.
The text embeddings they mix in are actually a concatenation of several models. There's MetaCLIP providing the usual understanding of what's in the request, but they also mix in a model trained on character-level text so you can request specific spellings of words too.
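That concatenation step is easy to picture in code. A toy sketch, where the two encoders are random stand-ins with made-up dimensions rather than the real MetaCLIP or character-level models:

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_encoder(prompt: str) -> np.ndarray:
    # Stand-in for a CLIP-style sentence-level embedding.
    return rng.standard_normal(768)

def char_encoder(prompt: str) -> np.ndarray:
    # Stand-in for a character-level encoder, which lets the model
    # see exact spellings (useful for text that appears in the video).
    return rng.standard_normal(256)

def text_condition(prompt: str) -> np.ndarray:
    # The generator is conditioned on the concatenation of both views.
    return np.concatenate([semantic_encoder(prompt), char_encoder(prompt)])

print(text_condition('A sign reading "MOVIE GEN"').shape)  # (1024,)
```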
The AI sheen mentioned in other comments mostly isn't to do with it being AI but rather because they fine-tune the model on videos selected for being "cinematic" or "aesthetic" in some way. It looks how they want it to look. For instance they select for natural lighting, absence of too many small objects (clutter), vivid colors, interesting motion and absence of overlay text. What remains of the sheen is probably due to the AI upsampling they do, which lets them render videos at a smaller scale followed by a regular bilinear upsample + a "computer, enhance!" step.
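The "render small, then upsample" trick is worth seeing concretely. Here is a plain bilinear upsample in NumPy, the cheap first half of that two-step process (a learned enhancement model would then refine the result); purely illustrative, not Meta's code:

```python
import numpy as np

def bilinear_upsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Upsample a 2D array by `scale` using bilinear interpolation."""
    h, w = img.shape
    # Sample positions in source coordinates (pixel centers), clamped
    # so edge pixels are repeated rather than extrapolated.
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    wy = (ys - y0)[:, None]  # vertical blend weights
    wx = (xs - x0)[None, :]  # horizontal blend weights
    a = img[y0][:, x0]           # top-left neighbours
    b = img[y0][:, x0 + 1]       # top-right
    c = img[y0 + 1][:, x0]       # bottom-left
    d = img[y0 + 1][:, x0 + 1]   # bottom-right
    return (a * (1 - wy) * (1 - wx) + b * (1 - wy) * wx
            + c * wy * (1 - wx) + d * wy * wx)

small = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
big = bilinear_upsample(small, 2)
print(big.shape)  # (4, 4)
```

Bilinear interpolation is smooth but blurry, which is exactly why a learned "enhance" pass afterwards is attractive: the diffusion model only pays for a small canvas while the upsampler restores detail.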
They just casually toss in some GPU cluster management improvements along the way for training.
Because Movie Gen was trained on Llama3-generated captions, it expects much more detailed and higher-effort captions than users normally provide. To bridge the gap they use a modified Llama3 to rewrite people's prompts to be more detailed and more consistent with the training set. They dedicate a few paragraphs to this step; it nonetheless involves a ton of effort, with distillation for efficiency, human evals to ensure rewrite quality, etc.
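The shape of that rewrite step can be shown with a toy template. The real system uses a fine-tuned Llama3; everything here (the template text, the camera default) is a made-up stand-in:

```python
# Toy illustration of the prompt-rewrite idea: expand a terse user
# prompt into the detailed, consistent caption style the model was
# trained on. Hypothetical template, not Meta's actual rewriter.

REWRITE_TEMPLATE = (
    "A cinematic, well-lit shot of {subject}. "
    "The camera {camera}. Natural lighting, vivid colors, "
    "no overlay text."
)

def rewrite_prompt(user_prompt: str, camera: str = "slowly pans right") -> str:
    return REWRITE_TEMPLATE.format(subject=user_prompt, camera=camera)

print(rewrite_prompt("a cute dog on a beach"))
# A cinematic, well-lit shot of a cute dog on a beach. The camera
# slowly pans right. Natural lighting, vivid colors, no overlay text.
```

An LLM rewriter does the same thing but contextually: it infers plausible camera motion, lighting, and composition details instead of pasting fixed boilerplate.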
I can't even begin to imagine how big of a project this must have been.
> Upload an image of yourself and transform it
> into a personalized video. Movie Gen’s
> cutting-edge model lets you create personalized
> videos that preserve human identity and motion.
A stalker’s dream! I’m sure my ex is going to love all the videos I’m going to make of her! Jokes aside, it’s a little bizarre to me that they treat identity preservation as a feature while competitors treat that as a bug, explicitly trying not to preserve the identity of generated content to minimize deepfake reputation risk.
Any woman could have flagged this as an issue before this hit the public.
Both unhealthy
As someone who has worked on payments infrastructure before, it's probably nice if your first thought is what great things an aunt can buy for her niece, but you're better off asking what bad actors can do with your software, or you're in for a bad surprise.
I would expect nothing less of Zuck than to imbue a culture of “tech superiority at all costs” and only focus on the responsible aspect when it can be a sales element.
Step 1. Train AI on pornographic videos
Step 2. Feed AI images of your ex
Step 3. Profit
why be extra weird and use a personal reference?
* product made without use of AI or any unnatural components. pure mountain iron
https://ai.meta.com/blog/movie-gen-media-foundation-models-g...
What I hope (since I am building a story telling front-end for AI generated video) is that they consider b2c and selling this as a bulk service over an api.
I’m here looking at users and wondering - the content pipelines are broader, but the exit points of attention and human brains are constant. How the heck are you supposed to know if your content is valid?
During a recent Apple event, someone on YT had an AI-generated video of Tim Cook announcing a crypto collaboration; it had 100k views before it was taken down.
Right now, all the videos of rockets falling on Israel can be faked. Heck, the responses on the communities are already populated by swathes of bots.
It’s simply cheaper to create content and overwhelm society level filters we inherited from an era of more expensive content creation.
Before anyone throws the sink at me for being a Luddite or raining on the parade - I’m coming from the side where you deal with the humans who consume content, and then decide to target your user base.
Yes, the vast majority of this is going to be used to create lovely cat memes and other great stuff.
At the same time, it takes just 1 post to act as a lightning rod and blow up things.
Edit:
From where I sit, there are 3 levels of issues.
1) Day to day arguments - this is organic normal human stuff
2) Bad actors - this is spammers, hate groups, hackers.
3) REALLY Bad actors - this is nation states conducting information warfare. This is countries seeding African user bases with faked stories, then using that as a basis for global interventions.
This is fake videos of war crimes, which incense their base and overshadow the harder won evidence of actual war crimes.
This doesn’t seem real, but political forces are about perception, not science and evidence.
Maybe get a job where interviewers are biased against my actual look and pedigree
Just ignore everyone else’s use of the tool
That's precisely the hard part!
https://www.goodreads.com/book/show/203092.The_Information_B...
> After the era of the atomic bomb, Virilio posits an era of genetic and information bombs which replace the apocalyptic bang of nuclear death with the whimper of a subliminally reinforced eugenics. We are entering the age of euthanasia.
Not attempting to justify their actions or the outcomes, just that media itself is and has been long known to be a powerful weapon, like the fabled story of a city besieged by a greater army, who opened their gates to the invaders knowing that the invaders were lead by a brilliant strategist.
The invader strategist, seeing the gates open, deduced that there must be a giant army laying in wait and that the gates being open were a trap, and so they turned and left.
Had they entered they would have won easily, but the medium of communication, an open gate before an advancing horde, was enough in and of itself to turn the tide of a pitched battle.
When we reach the point where we can never believe what we see or hear or think on our own, how will we ever fight?
Again, plain common sense just works, most of the times.
We've basically been living in a privileged and brief time in human history for the last 100-200 years, where you could mostly trust your eyes and ears to learn about events that you didn't directly witness. This didn't exist before photography and sound recording: if you didn't witness an event personally, you could only rely on trust in other human beings that told you about it to know if it actually happened. The same will soon start to be true again, if it isn't already: a million videos from random anonymous strangers showing something happening will mean nothing, just like a million comments describing it mean nothing today.
This is not a brave new world of post-truth such as the world has never seen before. It is going back to basically the world we had before photo, video, and sound recordings.
I think I would not like to live in a world in which democracy isn’t the predominant form of government. The ability of the typical person to understand and form their own opinions about the world is quite important to democracy, and journalism does help with that. But I guess the modern version of image and video heavy journalism wasn’t the only thing we had the whole time; even as recent as the 90’s (I’m pretty sure; I was just a kid), newspapers were a major source. And somehow America was invented before photojournalism, but of course that form of democracy would be hard for us to recognize nowadays…
It is only when we got these portable video screens that stuff like YouTube and TikTok became really important news sources (for better or worse; worse I would say). And anyway, people already manage to take misleading or out of context videos, so it isn’t like the situation is very good.
Maybe AI video will be a blessing in disguise. At some point we’ll have to give up on believing something just because we saw it. I guess we’ll have to rely on people attesting to information, that sort of thing. With modern cryptography I guess we could do that fairly well.
Edit: Another way of looking at it: basically no modern journalist or politician has a reputation better than an inanimate object, a photo or video. That’s a really bizarre situation! We’re used to consulting people on hard decisions, right? Not figuring out everything by direct observation.
So we're not going all the way back, but the era of believing strangers because they have photographic or video proof is drawing to a close.
Sure, Tim Cook can sign a video so I know he is the one who published it - though watching it on https://apple.com does more or less the same thing. But if the video is showing some rockets hitting an air base, the cryptography doesn't do anything to tell you if these were real rockets or its an AI-generated video. It's your trust in Tim Cook (or lack thereof) that determines if you believe the video or not.
Practically speaking, no one is going to check provenance when scrolling through Reddit sitting on the pot.
True endstage adtech will require attention modeling of individuals so that you can predict target response before presenting optimized material.
It's not just a step back, it's a step into black. Each person has to maintain an encrypted web of trust and hope nobody in their trust ring is compromised. Once they are, it's not even clear that in-person conversations aren't contaminated.
Just like the ability to emulate the writing style of your trusted humans was (somewhat) commonplace in the time in which you'd only talk to distant friends over letters.
> Once they are, it's not clear even in person conversations aren't contaminated.
How exactly could any current or even somewhat close technology alter my perception of what someone I'm talking to in-person is saying?
Otherwise, the points about targeting are fair - PR/propaganda has already advanced considerably compared to even 50 years ago, and more personalized propaganda will be a considerable problem, regardless of medium.
The rate of production is incomparable, no matter how apt the parallels may seem.
read simulacra and simulation: https://0ducks.wordpress.com/wp-content/uploads/2014/12/simu...
or this essay from pre-war germany https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_...
I feel that it’s not appreciated, that we are (were) part of an information ecosystem / market, and this looks like the dawn of industrial scale information pollution. Like firms just dumping fertilizer into the waterways with no care to the downstream impacts, just a concern for the bottom line.
How would you know that the British burned down the White House during the War of 1812? Anyone could fake a paper document saying so. (Except many people were illiterate.)
As far as I can see you need institutions you can trust.
Maybe I’m not well informed but there seem to be no example for the issues you describe with photos.
I believe it’s actually worse than you think. People believe in narratives, in stories, in ideas. These spread.
It has been like this forever. Text, pictures, videos are merely ways to proliferate narratives. We dismiss even clear evidence if it doesn’t fit our beliefs and we actively look for proof for what we think is the truth.
If you want to "fight" back you need to start on the narrative level, not on the artifact level.
It will not make you creative. It will not give you taste or talent. It is a technical tool that will mostly be used to produce cheap garbage unless you develop the skills to use it as a part of your creative toolkit -- which should also include many, many other things.
Impressive on the relative quality of the output. And of the productivity gains, sure.
But meh on the substance of it. It may be a dream for (financial) producers. For the direct customers as well (advertisement obviously, again). But for creators themselves (who are to be their own producers at some point, for some)?
On the maker side, art/work you don't sweat upon has little interest and emotional appeal. You shape it about as much as it shapes you.
On the viewer side, art that's not directed and produced by a human has little interest, connection and appeal as well. You can't be moved by something that's been produced by someone or something you can't relate to. Especially not a machine. It may have some accidental aesthetic interest, much like generative art had in the past. But uninhabited by someone's intent, it's just void of anything.
I know it's not the mainstream opinion, but Generative AI every day sounds more and more like cryptocurrencies and NFTs and these kinds of technologies that did not find _yet_ their defining problem to which they could be a solution.
[1] https://www.dw.com/en/whatsapp-in-india-scourge-of-violence-...
Though maybe there's hope if..
1. All deepfake image & video tech is required to add watermark labels, and all websites that publish such content are forced to label it as fake too.
2. Crazy idea, but a govt-issued Internet ID (ID.me is the closest to that now, having to use it to file taxes with the IRS) where your personal reputation and credit score are affected by publishing fake/scam/spam crap on the Internet, effectively helping to destroy it. I want good actors on the web, not ones that are out for a buck and in turn destroying it.
I can accept the premise that TikTok is trying to do this. Do we have any objective measurement on how effective it has been?
I’m not suggesting TikTok themselves is trying to do this, but it (and twitter, instagram, facebook, etc etc) is shaping people’s world views.
My default perspective is that because humans are so adaptable, every technology shapes our world views. TikTok and Instagram impact us, but so does the plow and shovel. We have research that shows IG harming self-image in some segments of teen girls; what I have not seen evaluated much is how Youtube DIY videos bring self-esteem through teaching people skills on how to make things. These platforms also connect people - my wife had a very serious but rare complication in pregnancy, and her mental health was massively improved by being able to connect with a group of women who had been through/were going through something similar.
My overall point is that it's not very interesting to me to say that technology shapes our world views. Which views? In which way, to what extent? Is it universal, or a subpopulation? Are there prior indications, or does it incept these views? Which views? How much good or harm? How do we balance that?
But what we are left with is a very small view through the keyhole of a door into a massive room that is illuminated with a flickering flashlight. We then glom onto whatever evidence supports our biases and preconceptions, ignoring that which is unstated, unpopular, or violates our sense of the world.
Like cool a movie doesn’t need to cost $200 million or whatever.
Imagine if those creative types were freed up to do something different. What would we see? Better architecture and factories? Maybe better hospitals?
At the level of image/video synthesis: Some leading companies have suggested they put watermarks in the content they create. Nice thought, but open source will always be an option, and people will always be able to build un-watermarked tools.
At the level of law: You could attempt to pass a law banning image/video generation entirely, or those without watermarks, but same issue as before– you can't stop someone from building this tech in their garage with open-source software.
At the level of social media platforms: If you know how GANs work, you already know this isn't possible. Half of image generation AI is an AI image detector itself. The detectors will always be just about as good as the generators; that's how the generators are able to improve themselves. It is, I will not mince words, IMPOSSIBLE to build an AI detector that works long-term. Because as soon as you have a great AI content classifier, it's used to make a better generator that outsmarts the classifier.
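A toy illustration of the point: if the generator gets to see the detector's decision boundary, it simply moves toward the "real" side of it. Everything below (the Gaussian "content", the midpoint classifier, the step size) is made up to show the dynamic, not how real GANs are trained:

```python
import random

random.seed(0)

REAL_MEAN = 5.0  # "real content" lives around this value

def detector_threshold(real, fake):
    # Simplest possible classifier: the midpoint between the two means.
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(real) + mean(fake)) / 2

gen_mean = 0.0  # the generator starts far from the real distribution
for step in range(50):
    real = [random.gauss(REAL_MEAN, 1.0) for _ in range(200)]
    fake = [random.gauss(gen_mean, 1.0) for _ in range(200)]
    threshold = detector_threshold(real, fake)
    # The generator treats the detector's boundary as a training signal
    # and nudges its output toward the "real" side of it.
    gen_mean += 0.2 * (threshold - gen_mean)

print(abs(gen_mean - REAL_MEAN) < 0.5)  # True: the detector's own signal closed the gap
```

The better the detector separates real from fake, the more useful its boundary is as a training signal, which is exactly why a great classifier can't stay ahead for long.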
So... smash the looms..?
I think cryptographic signing and the classic web of trust approaches are going to prove the most valuable tools in doing so, even if they're definitely not a panacea.
Downside is that large original video assets would need to be published, for such verification to work.
Are you concerned about predicting the direction or "real" state of your national economy? Videos aren't going to give you that. Largely, you can't know. Heavily curated statistical reports compiled and published by national agencies can only give you a clear view in retrospect. Are you concerned that a hurricane might be heading your way and you need to leave? Don't listen to videos on social media. Listen to your local weather authority. Are you concerned about whether X candidate for some national office really said a thing? Why? Are any of these people's characters or policy positions really that unclear that the reality or unreality of two seconds worth of words coming out of their mouths are going to sway your overall opinion one way or another?
Things you should actually care about:
- How are your family and friends doing? Ask them directly. If you can't trust the information you get back, you didn't trust them to begin with.
- How should you live your life? Stick with the classics here, man. Some combination of Aristotle, Ben Graham, and the basic AHA guidelines on diet and exercise will get you 95% of the way there.
- How do you fix or clean or operate some equipment or item X that you own? Get that information from the manufacturer.
Things you shouldn't care about:
- Is the IDF or Hamas committing more atrocities?
- Does Kamala Harris really support sex changes for convicted felons serving prison sentences funded by public money?
- Can Koalas actually surf?
Accept at some point that you can't know everything at all times and that's fine. You can know the things that matter. Get information from sources you actually trust, as in individual people or specific organizations you know and trust, not anonymous creators of text on Reddit. If you happen to be a national strategic decision maker that actually needs to know current world events, you're in luck. You have spy agencies and militaries that fully control the entire chain of custody from data collection to report compilation. If they're using AI to show you lies, you've got bigger problems anyway.
We already implicitly do this: if a news outlet we trust publishes a photo and does not state that they are unsure of its veracity we assume that it is an authentic photo. Using cryptographic signing that news outlet could explicitly state that they have determined the photo to be real. They could add any type of signed statement to any bit of information, really. Even signing something as being fake could be done, with the resulting signed information being shareable (although one would imagine that any unsigned information would be extremely suspect anyway).
The web of trust approach is to have a distributed system of trust that allows for less institutional parties to be able to earn trust and provide 'trusted' information, but there are also plenty downsides to it. A similar distributed system that determines trustworthiness in a more robust way would be preferable, but I am not aware of one.
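The signing idea is simple in code. Below is a sketch that attaches a signed statement to a hash of the media, using an HMAC as a stand-in for a real signature (a production system would use an asymmetric scheme like Ed25519 so anyone can verify without the outlet's secret key):

```python
import hashlib
import hmac
import json

OUTLET_KEY = b"newsroom-secret"  # stand-in; real systems use a key pair

def sign_statement(media_bytes, statement):
    # Bind the statement to the exact bytes of the media via its hash.
    payload = json.dumps({
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
        "statement": statement,
    }, sort_keys=True).encode()
    tag = hmac.new(OUTLET_KEY, payload, hashlib.sha256).hexdigest()
    return payload, tag

def verify(payload, tag):
    expected = hmac.new(OUTLET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

payload, tag = sign_statement(b"<video bytes>", "verified authentic by our staff")
print(verify(payload, tag))         # True
print(verify(payload + b"x", tag))  # False: any tampering breaks it
```

Note this only proves who attested and that nothing changed since; it says nothing about whether the attestation itself is honest, which is where the trust problem comes back in.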
In my (historically unpopular) opinion we have two optional choices outside of but still allowing for this anonymous free-for-all:
A private company like Facebook uses a privileged system of identification and authentication based on login/password/2FA and relying on state-issued identification verification,
OR, what I feel is better, a public institution that uses a common system based on PKI and state-issued identification, eg, the DMV issuing DoD Common Access Cards.
Trusting districts and nation-states could sign each other's issuing authorities.
The benefits are multifaceted! It helps authenticate the source of deep fakes. It helps fight astroturfing, foreign or otherwise. It helps to remove private companies fueled by advertising revenue from being in a privileged position of identification, etc, etc.
I totally understand any downvotes but I would prefer if you instead engaged me in this conversation if you disagree.
I'd love to have this picked apart instead of just feeling bummed out.
Like pretty much any tool involving detection of / protection from erroneous things, it's forever a cat and mouse game. There will always be new viruses, jailbreaks, banned content, 0-days etc. AI detection is no different.
If yellow newspapers were able to push us to war despite us knowing that "the written word was never reliable to start with", what will be the impact of the combination of this technology and the internet used against a population that has been conditioned over generations to trust video.
You can go on Youtube to see charlatans peddle all sorts of convenient truths with no evidence.
You don't even need AI. The bug is in the human wetware.
I think the issue with trust is rooted elsewhere - in social relations, politics, and not in AI generated content.
Do you read the news at all? If you can't trust any of them, then why even bother?
Hard to say anything is impossible off of one point - but discrimination afaik is generally seen as the easier problem of the two, given you only need to give a binary output as opposed to a continuous one.
function detectAI() {
  return Math.random() < Math.pow(0.5, (new Date()).getFullYear() - 2023) ? "Not AI" : "AI";
}
This should increase in accuracy over time.
That's why you make it punishable by potential prison time if you create or disseminate a non-watermarked video generated in this way.
Digital minimalism is looking more and more attractive.
Before you downvote, don't get this as a belittling the effort and all the results, they are stunning, but as a sincere question.
I do plenty of photography, I do a lot of videography. I know my way around Premiere Pro, Lightroom and After Effects. I also know a decent amount about computer vision and cg.
If I look at the "edited" videos, they look fake. Immediately. And not a little bit. They look like they were put through a washing machine full of effects: too contrasty, too much gamma, too much clarity, too low levels, like a baby playing with the effect controls. Can't exactly put my finger on it, but comparing the "original" videos to the ones that simply change one element, like the "add blue pom poms to his hands", it changes the whole video, and makes the whole video a bit cartoony, for lack of a better word.
I am simply wondering why?!
Is that a change in general through the model that processes the video? Is that something that is easy to get rid of in future versions, or inherently baked into how the model transforms the video?
You can also watermark plain text by generating "invisible" patterns.
Of course, in all these cases, the watermarks are trivial to remove: just re-encode the output with an open model. Which is why I hope there will be no federal law that tries to enforce something that is categorically unenforceable.
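For a sense of how flimsy text watermarks are, here's a toy one using zero-width characters, plus the trivial "re-encoding" step that erases it (both entirely made up for illustration; real schemes bias token choices statistically, but are removed just as easily by paraphrasing):

```python
ZW = {"0": "\u200b", "1": "\u200c"}  # zero-width space / non-joiner

def watermark(text, bits):
    # Hide one bit before each word after the first, until bits run out.
    words = text.split(" ")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        mark = ZW[bits[i]] if i < len(bits) else ""
        out.append(mark + word)
    return " ".join(out)

def strip_watermark(text):
    # Any normalisation that drops non-printing characters kills the mark.
    return text.replace("\u200b", "").replace("\u200c", "")

marked = watermark("this text was written by a model", "1011")
print(marked == "this text was written by a model")                   # False: mark embedded
print(strip_watermark(marked) == "this text was written by a model")  # True: mark gone
```

The marked text renders identically to the original, and one pass of cleanup restores it exactly, which is the commenter's point about unenforceability.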
I will now review some of the standout clips.
That alien thing in the water is horrifying. The background fish look pretty convincing, except for the really flamboyant one in the dark.
I guess I should be impressed that the kite string seems to be rendered every frame and appears to be connected between the hand and the kite most of the time. The whole thing is really stressful though.
drunk sloth with weirdly crisp shadow should take the top slot from girl in danger of being stolen by kite.
man demonstrates novel chain sword fire stick with four or five dimensions might be better off in the bin...
> The camera is behind a man. The man is shirtless, wearing a green cloth around his waist. He is barefoot. With a fiery object in each hand, he creates wide circular motions. A calm sea is in the background. The atmosphere is mesmerizing, with the fire dance.
This just reads like slightly clumsy lyrics to a lost Ween song.
This is _the worst that machines will ever be at this task_, and most of the improvements that need to be made are a matter of engineering ingenuity, which can be translated to research dollars.
Seriously though. This is the company that is betting hard on VR goggles. And these are engines that can produce real time dreams, 3d, photographic quality, obedient to our commands. No 3d models needed, no physics simulations, no ray tracing, no prebuilt environments and avatars. All simply dreamed up in real time, as requested by the user in natural language. It might be one of the most addictive technologies ever invented.
Could be, but it's a bit dystopian to imagine that the government would have a say on the images you can generate, locally and in real time, and send straight to your own eyes, don't you think? Dystopian and very difficult to enforce, too.
They're not really showing signs of slowing down either. Hey, Zuck, always thought you were kind of lame in the past. But maybe you weren't a one trick pony after all.
Deepmind. Protein folding and solving math problems is just less sexy.
Especially based on the examples on this site, it's not a far reach to say that they will start to generate video ads of you (yes, YOU! your face! You've already uploaded hundreds of photos for them to reference!) using a specific product and showing how happy you are because you bought it. Imagine scrolling Instagram and seeing your own face smelling some laundry detergent or laughing because you took some prescription medicine.
Anyone able to update a dinosaur?
Anything longer than a single clip is just a bunch of these clips stitched together.