And I say all that completely slack-jawed that this is possible.
I bet that if you select a British accent you will get fewer of them.
Hmm.... Scottish, Welsh, Irish (Nor'n) or English? If English, North or South? If North, which city? Brummie? Scouse? If South, London? Cockney or Multicultural London English [0]?
[0] https://en.wikipedia.org/wiki/Multicultural_London_English
Castlebridge is 10 minutes away by car. Madness!
https://accentbiasbritain.org/accents-in-britain/
Also, we have yet to precisely define what is meant by 'British'. This probably needs a "20 falsehoods people believe about..."-type article.
They do not mean Irish or Scottish accents; if they did, they would have said exactly that, because those accents are quite different from standard (British) English accents. So different, in fact, that even Americans can readily tell the difference, when they frequently have some trouble telling English and Australian accents apart.
Also, to most English speakers, "English accent" doesn't make much sense, because "English" is the language. It sounds like saying a German speaker, speaking German, has a "German accent". Saying "British accent" differentiates the language (English, spoken by people worldwide) from the accent (which refers to one part of one country that uses that language).
Imagine being stuck on a call with this.
> "Hey, so like, is there anything I can help you with today?"
> "Talk to a person."
> "Oh wow, right. (chuckle) You got it. Well, before I connect you, can you maybe tell me a little bit more about what problem you're having? For example, maybe it's something to do with..."
"How's it going. We're gonna start by taking you back to your 2022 favorites, starting with the sweet sounds of XYZ". There's very little you can tweak about it, the suggestions kinda suck, but you're getting a fake friend to introduce them to you. Yay, I guess..
Listening to this on 1.75x speed is excellent. I think the generated speaking speed is slow for audio quality, bc it'd be much harder to slow down the generated audio while retaining quality than vice versa.
A lot of people are just like that IRL.
They cannot just say "the food was fine", it's usually some crap like "What on earth! These are the best cheese sticks I've had IN MY EN TI R E LIFE!".
We may not know that a given speaker is a GenX Methodist from Wisconsin who grew up at skate parks in the suburbs, but we hear clusters of speech behavior that let our brain go "yeah, I'm used to things fitting together in this way sometimes".
These don't have that.
Instead, they seem to mostly smudge together behaviors that are just generally common in aggregate across the training data. The speakers all voice interrupting acknowledgements eagerly, they all use bright and enunciated podcaster tone, they all draw on similar word choice, etc -- they distinguish gender and each have a stable overall vocal tone, but no identity.
I don't doubt that this'll improve quickly though, by training specific "AI celebrity" voices narrowed to sound more coherent, natural, identifiable, and consistent. (And then, probably, leasing out those voices for $$$.)
As a tech demo for "render some vague sense of life behind this generated dialog" this is pretty good, though.
now it's bad voice actors, in 2 years it'll be great ones
The protest itself is exactly the kind of thing that will be avoided by replacing humans, demonstrated writ large for the people with the chequebook.
I can understand the spirit of protest and why it occurs, but it just seems so out-of-line strategically/tactically when used against automation that's taking jobs.
Just the order of events is kind of funny to me, and this applies to automation-job-taking protest the world over: a technique is demonstrated that displaces workers, the workers then picket and refuse to work -- understandable, but faced with the current prospect of "This mechanism performs similar work for cheaper", it seems counter-productive to then demonstrate the worst-case scenario for the patron: a work stoppage that an automated workforce would never experience, alongside legal fees that would never be encountered had they an automated workforce.
That all said, protest is one of the only weapons in the arsenal of the working -- it just feels as if the argument against automation is one of the places where that technique rings hollow.
In the case of media/movies/literature/etc, I think the power to force corporations to value humans is solely in the hands of the consumer -- and unfortunately that's such an unorganized 'group' that it's unlikely they will establish any kind of collective action that would instantiate change.
- guy whose genAI product will definitely be used to spam zero-effort slop all over the internet.
We are witnessing in real time the answer to why 'The Matrix' was set when it was. Once AI takes over there is no future culture.
This is a big problem that needs to be talked about more; the end goal of AI seems to be quite grim for jobs and generally for humans. Where will all this pure profit lead? If all advertising is generated, who will want to have anything to do with all the products they’re advertising?
In general, I have a feeling double-digit growth forever is impossible. Facebook and Google both reported YoY growth of 15%+ this week iirc, and I have a feeling they are only able to achieve this by destroying either competitors or adjacent industries rather than by "making the pie bigger". It will end at some point.
As people get fed up with AI generated crap, companies will start to pay very good money to the few remaining good human creatives in order to differentiate themselves. The field will then be seen as desirable, people will start working hard to get these jobs, companies will take apprentices hoping they will become masters later, etc... We may lose a generation, but certainly not the entire future.
Of course, it is just one of many possible futures, but I think the most likely if you take your assumptions as a postulate. It may turn out that AIs end up not displacing creative jobs too much, or going the other way, that AIs end up being truly creative, building their own culture together with humans, or not.
Step 0. Some People make novel art like a jingle that is unlike anything yet.
Step 1. Early use of said jingle creates a buzz and generated good sales results.
Step 2. It gets copied everywhere and by everyone. It is now a meme.
This is the step where I think generative AI can help. Slightly transform existing art to fit a particular purpose. This lets businesses save money by not paying humans to do this work.
Problem is, we don't know where the next person will come from or when this step 0 will happen... especially when we soak up all the "slack" and send all the "money" to the top, because let's face it, that's how it will work. The money "saved" from AI won't make goods and services cheaper by any significant measure. We will still have to pay as much as we can afford to pay.
I’ve learnt things and been exposed to ideas developing software for work that I simply wouldn’t have if I was only doing it in my spare time.
> the majority of podcasts are from a group of generic white guys
When I listened to the audio samples before coming to the comments, I thought: "oh, like those totally lifeless and bland U.S. accents from podcasts, YT, etc."
I wouldn't associate it with skin colour or gender though at all. I've no idea why you'd go there - any skin colour and any gender is absolutely welcomed into the fold of U.S. cultural production, if they can produce bland generic "content" sincerely enough, it seems to me.
Disclaimer: many U.S. accents are interesting and wonderful (Colorado; Tom Waits), they don't all sound generic and bland. I have U.S. friends therefore I can pass judgment (TM).
Please don't think that I'm trying to suggest... anything. It's just that I'm getting used to reading this pattern in the output of LLMs. "While this and that is great...". Maybe we're mimicking them now? I catch myself using these disclaimers even in spoken language.
This is good, but certainly not yet great.
In general, people find the back and forth between the "hosts" engaging, and it also gives them time to digest the content.
In a similar vein, I’m glad they told me it was a funny story, because otherwise I wouldn’t have known.
The problem is that people talking over each other is not a format I long to listen to.
I often would like to listen to a blog post instead of reading it, but haven't found an easy, quick solution yet.
I tried piping text through OpenAI's tts-1-hd model, and it is the first one I ever found that is human-like enough for me to like listening to it. So I could write a tool for my own use case that pipes the text to tts-1-hd and plays the audio. But maybe there is already something with a public web interface out there?
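The whole tool would only be a few lines. A minimal sketch, assuming the official openai Python SDK, an OPENAI_API_KEY in the environment, and mpv installed for playback (all assumptions on my part; also, tts-1-hd caps a single request at 4096 characters, so long posts would need chunking):

    # Read text on stdin, synthesize it with tts-1-hd, and play the result.
    import subprocess
    import sys
    import tempfile

    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    resp = client.audio.speech.create(
        model="tts-1-hd",
        voice="alloy",            # any of the built-in voices
        input=sys.stdin.read(),   # max 4096 characters per request
    )

    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        f.write(resp.content)     # MP3 bytes
        path = f.name

    subprocess.run(["mpv", "--really-quiet", path], check=False)

Pipe the extracted article text into that and it covers my use case, minus the public web interface.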
And it's Eleven Labs quality -- which, unless I've fallen behind the times, is the highest-quality TTS by a margin.
I did a bit of research and it seems to be, by far, the highest-quality TTS engine that is free and lets you do things like pause and continue.
There are other options that have higher-quality voices, but they aren't free.
Is this related to LLMs, or is this a completely different branch of AI and just a coincidence? I am curious.
Astounding
On the bright side, you can stop watching these channels and have more time for serious things.
What are some examples? I haven't encountered this.
Almost all of the results will not consist of 'jazz' in any real sense, but instead a collection of uncanny melodies and chord progressions that wander around going nowhere, traditionally accompanied by an obscenely eye-offending diffusion model-generated mishmash of seasonal tropes and incongruent interior design choices. Often, it's MIDI bossa nova presumably written by either a machine or someone who's only ever heard a few bars of music at a time and has no idea that 'feel' or 'soul' are a thing.
Because when I search "jazz" on YT I'm just getting legit music videos and jazz playlists -- stuff like Norah Jones, top 100 jazz classics playlists, etc.
But I assume that search results are personalized.
Sure. I just tried in private browsing mode, and got mostly the same. Here are a few of the very first results I get for 'jazz':
https://www.youtube.com/watch?v=xhL3Cb740VY
https://www.youtube.com/watch?v=8UXFapv_kFI
https://www.youtube.com/watch?v=nKNnzbi-v9E
https://www.youtube.com/watch?v=ABmQvH5K75w
https://www.youtube.com/watch?v=-jgEswq9ZlI
Some are worse than others.
I'm not sure all (or any) of it actually is AI though. I assume that's coming very soon, but I suspect this stuff is cynically and methodically hand-composed.
By the way: I have nothing against generative composition! Brian Eno has been doing this stuff longer than anyone else, and it's very cool. I'm sure you could make some 'generative jazz' that's actually distinctive and artistic, but this isn't it.
Nor for that matter Mozart, who wrote simple algorithmic compositions powered by dice. These were common musical games in his day.
1. Voice acting for low-budget/no-budget animations and games.
2. Billions of youtube "top 50 building demolitions" where the forgettable presentation is narrated by forgettable AI. Now we'll get "podcast style" conversation narration over those videos. Instead of bailing after 30 sec with regret, you might make a whole minute.
3. Reaction videos? Sometimes I weaken. I want to see a random person's reaction to their "first time listening" to the famous song they somehow have never heard until this moment. If we humans lower ourselves to reaction videos, we'll watch/listen to AI chatting to itself about things we love. Once the content gets "spicy", beyond the potato salad google demos, the floodgates will open. God help us.
I think they absolutely will, because "resonating" is not a material phenomenon, it's something people decide that they're doing. Your connection with an actor on television is not an actual connection. Most of acting is learning the times and length to be silent while making a particular face (dictated by the director) in order for the audience to project feelings and thoughts onto you. You're thinking about your camera blocking, or your groceries, and your audience sees you thinking about some plot point in a fictional world.
I've got a theory that we severely damaged a generation of girls by inundating them with images of girls their own age singing songs and acting parts all written and directed by middle-aged men - ones who chose as a profession to write songs in the voices of, write fiction in the voices of, and to direct, photograph and choreograph in person, tween girls. Their models of themselves have come from looking at these depictions of girls, who were never allowed to speak for themselves, and resonating.
We've all been on those webinars where it's clear -- despite the infusions (on cue) of "enthusiasm" from the speaker attempting to make it sound more natural and off-the-cuff -- that they are reading from a script.
It's a difficult-to-mask phenomenon for humans.
That all said, I actually have more grace for an AI sounding like this than I do for a human presenter reading from a script. Like, if I'm here "live" and paying attention to what you're saying, at least do me the service of truly being "here" with me and authentically communicating vs. simply reading something.
If you're going to simply read something, then just send it to me to read too - don't pretend it's a spontaneously synchronous communication.
frontier garbage.