Creates hyper-realistic voice clones from just 3 seconds of audio

anyvoice.net ・ 54 points ・ blacktechnology ・ 4 months ago ・ 47 comments

xnx ・ 4 months ago

What model is this using? I've had good results with e2-f5-tts running locally via Pinokio. You can also run it online for free: https://huggingface.co/spaces/mrfakename/E2-F5-TTS

porjo ・ 4 months ago

Thanks, I got a better result with this than Anyvoice.

bugglebeetle ・ 4 months ago

Sure, just let me submit my voice for cloning to a closed-source online service of unknown provenance. What could ever go wrong?

dvh ・ 4 months ago

That's why you submit a politician's voice instead.
- dunham ・ 4 months ago
  
  It would be fun to have a clone of Majel Barrett's voice for something like Siri or Alexa.
- HanClinto ・ 4 months ago
  
  Yeah, but they have you read a specific text, so not as much of an option if you use the primary demo.
  Seems like a heck of a nice way to gather a training set! :)
  
  unsnap_biceps ・ 4 months ago
  
  The "upload audio" feature doesn't require any specific text.
  
  lubujackson ・ 4 months ago
  
  Cue reference to "Sneakers"...

delgaudm ・ 4 months ago

Hey there /u/blacktechnology, could you email me a few seconds of your voice so I can upload it to this site and see how the cloning goes? I'd love to see what I could do with a copy of your voice. Kthxbye.

F7F7F7 ・ 4 months ago

Is this Reddit? The snark in the comment and the /u/ thing had me confused for a minute.

superkuh ・ 4 months ago

I submitted an 8-second clip of speech, and the resulting synthesized speech did not sound like the same voice. Too bad.

infogulch ・ 4 months ago

I hope you have a nice voice, I'll be listening to it try to sell me an extended car warranty for the next 3 months.
- superkuh ・ 4 months ago
  
  It was not my voice. It was spoken word clips from a song I legally purchased, the clips uploaded in the context of fair use.

esperent ・ 4 months ago

We've been advertising to get someone to take over the lease on a commercial building. Surprisingly, we've had several what seem like very obvious scam attempts: people stringing us along, not trying to bargain (we're in a country where haggling is the norm, and people always try to bargain), asking us to wait unreasonable amounts of time, and finally, when pressed, breaking down into logical inconsistencies. So, not even good scam attempts.

I was wondering, what's the point? I mean, it's a building. You pay money, you sign the lease (in person), you get the use of the building. No money, no building. Where's the scam opportunity?

The only thing I can think of is that they're trying to get enough data and personal info to clone our voices and use that to try and gain access to bank accounts or to scam our relatives. Even if I'm wrong in this case, this seems like a major new vulnerability in society. I mean, if someone who sounded (and with video AI, perhaps even looked) exactly like me called up my mother and pretended I'd been violently robbed or had an accident, she'd transfer money in a heartbeat.

I'm considering that I should set up some kind of code system with my family for this. As in, if I ever end up in a situation where I need help, I'll use a particular code phrase. If the caller doesn't know it, assume it's an AI clone.

0x20cowboy ・ 4 months ago

> I'm considering that I should set up some kind of code system with my family for this. As in, if I ever end up in a situation where I need help, I'll use a particular code phrase.

You absolutely should. And include something for videos (like FaceTime calls), especially if these members of your family are boomers+.
- esperent ・ 4 months ago
  
  Have you done it yourself? What did you use, a phrase or code, something like that?
  
  floydnoel ・ 4 months ago
  
  In Terminator 2, Arnold Schwarzenegger's character uses the family pet's name to determine that the foster parents were dead. I think something like this, knowledge-based and easy to remember in a crisis, would work best.

trod1234 ・ 4 months ago

There are potentially several reasons why people might do these things without acting in good faith.
One of them is potentially cloning your voice; another is market manipulation. Logical inconsistencies tend to mean it's AI-driven, which bad actors have embraced to mount cost-effective attacks.
I call these attacks because they are resource-drain attacks intended to impose costs and cause interference.
The market-manipulation aspects are quite subtle, and what I'm about to discuss applies to just about any channel where you receive a signal from a person.
First, time is cost, and in an open society a bad actor can impose that cost through deceit, by stringing people along and through other interference.
You can also saturate the environment so that business cannot be done. Jamming the medium.
A business pays its people to do business (labor), and when they are interacting with an AI, the cost asymmetry is striking. Real customers are mixed in with fake bad actors, and there is a conversion ratio for successful business transactions.
Now consider what happens when that same business is flooded with offers for the same thing, and instead of 1 in 100, it's 1 in 10,000. Each prospective transaction carries a cost, and that cost is sunk. Those costs scale until you cannot make a profit, go out of business, and fail. Even if you ignore most of them, if you can't tell the good from the bad, there is no way to optimize cost effectively.
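A back-of-the-envelope sketch of the conversion-ratio point above, assuming a hypothetical fixed handling cost of $5 per inquiry (the commenter gives no figure; the number and the function name are illustrative only):

```python
# Hypothetical handling cost per inquiry, in dollars (an assumption, not from the thread).
COST_PER_INQUIRY = 5.0


def cost_per_deal(cost_per_inquiry: float, inquiries_per_deal: int) -> float:
    """Sunk handling cost accumulated before one genuine transaction closes."""
    return cost_per_inquiry * inquiries_per_deal


for ratio in (100, 10_000):
    print(f"1 in {ratio:,}: ${cost_per_deal(COST_PER_INQUIRY, ratio):,.0f} per successful deal")
# prints:
# 1 in 100: $500 per successful deal
# 1 in 10,000: $50,000 per successful deal
```

Whatever the real per-inquiry cost is, going from 1 in 100 to 1 in 10,000 multiplies the cost of each real deal by 100.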
The same kind of ratio appears as the signal-to-noise ratio in communications. If you can't differentiate between the signal and the noise, no communication is possible. This is communications jamming and interference.
Then there are the targeted psychological effects of the manufactured, distorted reflected appraisal you form of the situation. People use reflected appraisal in their judgments every day; these insights are often feelings about things you've observed, and by and large they may not accurately reflect reality when someone manipulates them deliberately.
If all you see is that transactions have become impossible to make, and you can't judge actual demand, you may make mistakes and have to shut down, or sell your business for pennies. This concentrates the market, and some people are always the winners in this type of exchange (generally not you).
You may also do other things based on those manufactured distortions, and few question whether the reflected appraisal they get is actually correct. It largely happens below conscious perception, and these entities are targeting that.
In many respects, it's a form of torture and mental coercion, often without you recognizing or knowing about it; few actually react correctly.
Joost Meerloo, Robert Lifton, and Robert Cialdini wrote books covering these subjects.
Similar things are happening in the job market with ghost jobs. By imposing cost tortuously, you create a market floor where new entities can't enter the market and some entities already in it go out of business. It's fairly easy to identify who benefits the most from such arrangements.
Interference, sabotage, sieving, concentration, and brittle failures.
Who might stand to gain? Large corporations in the same business sector, and nation-states on a pre-war footing seeking to destabilize and demoralize before a hot war.
A collapse of the market is something nation-states might try to induce, because that is, in effect, the collapse of the economy.
Chaos drives smaller, debt-dependent companies out of business, to be gobbled up by large companies, concentrating the market sector.
With regard to AI cloning, yes, you should set up subtle challenge-response phrases and code words.
Code words are vulnerable to being recorded and reused. They are sticky because they're hard to memorize, and they aren't one-time use.
Challenge-response phrasing may simply be part of the conversation, but with special meaning only between you and them.
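The difference between a reusable code word and a one-time challenge-response can be sketched in Python, assuming a hypothetical list of question/answer pairs agreed in person beforehand. The class name and phrases are illustrative, not a real product or the commenter's exact scheme; the point is that each pair is retired after one use, so a recording of a past exchange can't be replayed.

```python
import secrets


class OneTimeChallenges:
    """Hypothetical one-time challenge/response list shared within a family.

    Each pair is agreed in person ahead of time and retired after a single
    use, so replaying a recorded answer to an old challenge fails.
    """

    def __init__(self, pairs: dict[str, str]) -> None:
        self._unused = dict(pairs)  # challenge -> expected response

    def pick_challenge(self) -> str:
        # Pick an unused challenge at random so a caller can't predict it.
        # Raises IndexError once every pair has been used up.
        return secrets.choice(list(self._unused))

    def verify(self, challenge: str, response: str) -> bool:
        # pop() retires the challenge whether or not the answer is right.
        expected = self._unused.pop(challenge, None)
        if expected is None:
            return False  # unknown or already-used challenge
        return expected.strip().lower() == response.strip().lower()


family = OneTimeChallenges({
    "How's the old blue bike?": "Still has the flat tire",
    "Did you water the cactus?": "Only on Sundays",
})
c = "How's the old blue bike?"
print(family.verify(c, "still has the flat tire"))  # True
print(family.verify(c, "Still has the flat tire"))  # False: already used, so a replay fails
```

A static code word is the degenerate case of this with a single pair that is never retired, which is exactly why a recording of it stays useful to an attacker.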
- esperent ・ 4 months ago
  
  > You can also saturate the environment so that business cannot be done. Jamming the medium
  
  I can't help but wonder if your huge, conspiracy-laden comment is intended to do exactly this.
  
  trod1234 ・ 4 months ago
  
  Perhaps you should be more specific in your comments if you don't want to run afoul of the HN no-snipe policy. Backhanded, baseless comments are snipes.
  As it stands, your comment is structured to discredit without a rational basis, as a nullification, and commits the fallacy of overgeneralization.
  In all: improper reasoning, false association, and false implication. You are actually doing the very thing you falsely claim others are doing.
  Reckless and baseless claims can demonstrate intent through negligence. Did you intend that?

croemer ・ 4 months ago

Getting an error: "Failed to generate voice"

HeatrayEnjoyer ・ 4 months ago

I am hitting this error as well. I was additionally unable to create an account. Seems beta?
- blacktechnology ・ 4 months ago
  
  fixed
  
  croemer ・ 4 months ago
  
  No, still doesn't work. The progress bar now reaches 95% in only 5 seconds, then sits there for dozens of seconds. Broken.
  
  windsignaling ・ 4 months ago
  
  Still getting the same error I was (and the same error mentioned by parent commenter) when this was first posted.

krainboltgreene ・ 4 months ago

Getting a 500 from the HTTP API, and there's also a `debugger` statement in the JavaScript.

ge96 ・ 4 months ago

3 seconds? That's crazy

"Huuhhhhhhhhhhh"

I wonder what their "fox jump" sentence is

mk_stjames ・ 4 months ago

A "Panphonic Poem" is what may do well here. As in...

  The pleasure of Shawn’s company
  Is what I most enjoy.
  He put a tack on Ms. Yancey’s chair
  When she called him a horrible boy.
  At the end of the month he was flinging two kittens
  Across the width of the room.
  I count on his schemes to show me a way now
  Of getting away from my gloom.

As discussed here:

https://literalminded.wordpress.com/2006/05/05/a-panphonic-p...

And recited very famously, in part and slightly modified, here:

https://www.youtube.com/watch?v=CgX4uJSj00Y

sailfast ・ 4 months ago

Default for me was: “What a beautiful day it is today, with bright sunshine and gentle breeze. Let's talk about the future of artificial intelligence.”
That said, I'm not going to be submitting a sample because [reasons]

croemer ・ 4 months ago

Duplicate: https://news.ycombinator.com/item?id=42641987

croemer ・ 4 months ago

The title is editorialized, it should be something like: "Anyvoice - AI Voice Cloning"

croemer ・ 4 months ago

This is almost certainly against GDPR: there's no indication whatsoever of which legal entity holds the data, how long it is stored, or on which servers and where.

xqcgrek2 ・ 4 months ago

Has anyone tried multiple iterations? That is, upload a real voice, get its synthesized version, upload synthesized version 1 to get synthesized version 2, rinse and repeat...

abeppu ・ 4 months ago

Perhaps Alvin Lucier reading his "I am sitting in a room" text would be ideal.

0_____0 ・ 4 months ago

I'm surprised you were able to repost this so quickly.

To reiterate, among my friends, if you use a tool like this to clone my voice for any reason, you are dead to me.

nwroot ・ 4 months ago

Getting "Failed to generate" over and over.

croemer ・ 4 months ago

I'm disappointed that people upvote something that so consistently fails. I've tried at least 5 times, and it never worked. Not once.

gamblor956 ・ 4 months ago

This was a great way for them to collect a lot of free voice data to train their model.

inerte ・ 4 months ago

Every time there's a voice recognition post here, someone comments about acquiring data. Why is this method better than having access to all of the video and podcast sites on the internet?
- rahimnathwani ・ 4 months ago
  
  You can get people to utter the same sentence.
  
  gamblor956 ・ 4 months ago
  
  Righto. Everyone is saying the same thing so it's the cleanest data set you can get.
  
  mmh0000 ・ 4 months ago
  
  https://www.youtube.com/watch?v=ksb3KD6DfSI
  Just feed the AI broadcasts from "local" news stations.

clueless ・ 4 months ago

Has anybody tried this and gotten a good result?

mxuribe ・ 4 months ago

Immediately, I thought that cybersecurity is now ruined for the distant future. Imagine, if you will, a starship captain ready with a plot to overcome the evil plaguing their crew... and all they need to do is override the starship computer's safety controls with the captain's own voice override authorization... but, alas, early in 2025 a tech company developed the means by which said evil entity could re-override the captain's voice auth... and block the captain's plan... thereby dooming the entire crew of the starship.

This is why we cannot have nice things, not now nor in the far-off future! All of our uniqueness will be more easily duplicated. Thankfully, I won't upload any of my voice recordings, and I will continue to walk around in my Faraday cage suit. /s

montag ・ 4 months ago

Yep, this is a real Star Trek TNG episode, S4 E3 “Brothers”
- mxuribe ・ 4 months ago
  
  Lol, yep, good one!