Regardless of the model you’re using, the tech itself was developed and fine-tuned on stolen artwork with the sole purpose of replacing the artists who made it.
That’s not how that works. You can train a model on licensed or open data, and they didn’t make it to spite you; even if a large group of grifters are using it that way, those aren’t the ones developing it.
If you’re going to hate something, at least base it on reality and try to avoid being so black-and-white about it.
You CAN train a model on licensed or open data. But we all know they didn’t keep it to just that.
Yeah, the corporations didn’t. That doesn’t mean you can’t, or that people aren’t doing that.
Is everyone posting Ghibli-style memes using ethical, licensed or open-data models?
No, they’re using a corporate model that was trained unethically. I don’t see what your point is, though. That’s not inherent to how LLMs or other AIs work, that’s just corporations being leeches. In other words, business as usual in capitalist society.
You’re right about it not being inherent to the tech, and I sincerely apologize if I insist too much despite that. This will be my last reply to you. I hope I gave you something constructive to think about rather than just noise.
The issue, and my point, is that you’re defending a technicality that doesn’t matter in real-world usage. Nearly no one uses non-corporate, ethical AI. Most organizations working with it aren’t starting from scratch, because doing so is disadvantageous or outright unfeasible resource-wise. Instead, they use pre-existing corporate models.
Edd may not be technically right, but he is practically right. The people he’s referring to are extremely unlikely to be using or creating completely ethical datasets/AI.
You’re right and I need to stop doing it. That’s a good reminder to go and enjoy the fresh spring air 😄
… but that’s why people are against it.
The existing models were all trained on stolen art
Corporate models, yes
Link to this non-corporate, ethically sourced AI plz, because I’ve heard a lot but I’ve never seen it
https://huggingface.co/Mitsua/mitsua-diffusion-one
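If you want to try it: the model card shows it loading through the standard diffusers StableDiffusionPipeline, so something roughly like this should work (the prompt and output filename are just examples):

```python
# Rough usage sketch for Mitsua Diffusion One via Hugging Face diffusers,
# following the standard StableDiffusionPipeline pattern its model card uses.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Mitsua/mitsua-diffusion-one",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

prompt = "a watercolor landscape, soft morning light"  # example prompt
image = pipe(prompt).images[0]
image.save("mitsua_sample.png")  # example output path
```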
I think his argument is that the models initially needed lots of data to verify and validate their current operation. Subsequent advances may have allowed those models to be created cleanly, but those advances relied on tainted data, thus making the advances themselves tainted.
I’m not sure I agree with that argument. It’s like saying that if you invented a cure for cancer that relied on morally bankrupt means, you shouldn’t use that cure. I’d say there should be a legal process against the person who did the illegal acts, but once you have discovered something, it stands on its own two feet. Perhaps, however, there should be some kind of reparations given to the people who were abused in that process.
It’s not true; you can just train a model from the ground up on properly licensed or open data, and you don’t have to inherit anything. What you’re talking about is called fine-tuning, which is where you “re-train” an existing model to do something specific because it’s much cheaper than training from the ground up.
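To make that distinction concrete, here’s a minimal toy sketch in PyTorch; the network, the data, and the “pretrained.pt” checkpoint are made-up stand-ins for illustration, not any real model:

```python
# Toy contrast between "training from scratch" and "fine-tuning" in PyTorch.
import torch
import torch.nn as nn

# A tiny stand-in "backbone" network plus a task-specific head.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
head = nn.Linear(64, 10)
model = nn.Sequential(backbone, head)

x = torch.randn(8, 32)            # fake batch of inputs
y = torch.randint(0, 10, (8,))    # fake labels
loss_fn = nn.CrossEntropyLoss()

# 1) Training from scratch: every parameter starts random and gets updated.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
loss_fn(model(x), y).backward()
opt.step()

# 2) Fine-tuning: start from saved (pretrained) weights and only update part
#    of the network, which is far cheaper than re-training everything.
# model.load_state_dict(torch.load("pretrained.pt"))  # hypothetical checkpoint
for p in backbone.parameters():
    p.requires_grad = False        # freeze the expensive-to-train part
opt = torch.optim.Adam(head.parameters(), lr=1e-4)    # only the head updates
opt.zero_grad()
loss_fn(model(x), y).backward()
opt.step()
```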
I don’t think that’s what they are saying. It’s not that you can’t now; it’s that initially people did need to use a lot of data. Then they found tricks to improve training on less, but these tricks came about after people saw what was possible. Since they initially needed such data, their argument goes, and we wouldn’t have been able to improve upon the techniques if we didn’t know that huge neural nets trained on lots of data were effective, subsequent models are tainted by the original sin of requiring all this data.
As I said above, I don’t think that subsequent models are necessarily tainted, but I find it hard to argue with the fact that the original models did use data they shouldn’t have, and that without it we wouldn’t be where we are today. That seems unfair to the uncompensated humans who produced the data set.
I actually think it’s very interesting how nobody in this community seems to know or understand how these models work, or even vaguely follow their open-source development. The first models didn’t have this problem; it was when OpenAI realized there was money to be made that they started scraping the internet and training illegally, and consequently a billion other startups did the same, because that’s how Silicon Valley operates.
This is not an issue of AI being bad; it’s an issue of capitalist incentive structures.
Cool! What’s the effective difference for my life that your insistence on nuance has brought? What’s the difference between a world where no one should have AI because the entirety of the tech is tainted with abuse, and a world where no one should have AI because the entirety of the publicly available tech is tainted with abuse? What should I, a consumer, do? Don’t say 1,000 hours of research on every fucking jpg; you know that’s not the true answer, just from a logistical standpoint.
Name one that is “ethically” sourced.
And “open data” is a funny thing to say. Why is it open? Could it be open because the people who made it didn’t expect it to be abused for AI? When a pornstar posted a nude picture online in 2010, do you think they imagined that someone would use it to create deepfakes of random women? Please be honest. And yes, a picture might not actually be “open data”, but it highlights the flaw in your reasoning. People don’t think about what could be done to their stuff in the future as much as they should, but they certainly can’t predict the future.
Now ask yourself that same question about any profession. Please be honest and tell us: is that “open data” not just another way to abuse the good intentions of others?
Brother, if someone made their nudes public domain, then that’s on them.
Wow, nevermind, this is way worse than your other comment. Victim blaming and equating the law to morality; name a more popular duo with AI bros.
I can’t make you understand more than you’re willing to understand. Works in the public domain are forfeited for eternity; you don’t get to come back in 10 years and go “well actually, I take it back”. That’s not how licensing works. That’s not victim blaming; that’s telling you not to license your nudes in such a manner that people can use them freely.
The vast majority of people don’t think in legal terms, and it’s always possible for something to be both legal and immoral. See: slavery, the actions of the Third Reich, killing or bankrupting people by denying them health insurance… and so on.
There are teenagers, even children, who posted works which have been absorbed into AI training without their awareness or consent. Are literal children to blame for not understanding laws that companies would later abuse when they just wanted to share and participate in a community?
And AI companies aren’t using merely licensed material; they’re using everything they can get their hands on. If they’re pirating, you bet your ass they’ll use your nudes if they find them, public domain or not. Revenge porn posted by an ex? Straight into the data.
So your argument is:
It’s legal
But:
What’s legal isn’t necessarily right
You’re blaming children before companies
AI makers actually use illegal methods, too
It’s closer to victim blaming than you think.
The law isn’t a reliable compass for what is or isn’t right. When the law is wrong, it should be changed. IP law is infamously broken in how it advantages and gets (ab)used by companies. For a few popular examples: see how YouTube mediates between companies and creators, Nintendo suing everyone they can (it costs the victims more than it does Nintendo), and everything Disney did to IP legislation.
Okay, but I wasn’t arguing morality or about children posting nudes of themselves. I’m just telling you that works submitted into the public domain can’t be retracted, and that there are models trained exclusively on open data, which a lot of AI haters don’t know about, don’t understand, or won’t acknowledge. That’s all I’m saying. AI is not bad; corporations make it bad.
The law isn’t a reliable compass for what is or isn’t right.
Fuck yea it ain’t. I’m the biggest copyright and IP law hater on this platform, and I’ll get ahead of the next 10 replies by saying no, it’s not because I want to enable mindless corporate content scraping; it’s because human creativity shouldn’t be boxed in. It should be shared freely, lest our culture be lost.
Oh boy, here we go, downvotes again.