• drkt@scribe.disroot.org · +13 −58 · 2 days ago

    Oh boy, here we go, downvotes again.

    regardless of the model you’re using, the tech itself was developed and fine-tuned on stolen artwork with the sole purpose of replacing the artists who made it

    That’s not how that works. You can train a model on licensed or open data, and the people developing the tech didn’t make it to spite you. A large group of grifters may be using it that way, but they aren’t the ones developing it.

    If you’re going to hate something, at least base it on reality and try to avoid being so black-and-white about it.

    • sixty@sh.itjust.works · +17 · 2 days ago

      You CAN train a model on licensed or open data. But we all know they didn’t keep it to just that.

      • drkt@scribe.disroot.org · +2 −10 · 1 day ago

        Yeah, the corporations didn’t, but that doesn’t mean you can’t, or that people aren’t doing exactly that.

        • mke@programming.dev · +13 −1 · 1 day ago

          Is everyone posting Ghibli-style memes using ethical, licensed, or open-data models?

          • drkt@scribe.disroot.org · +2 −1 · 1 day ago

            No, they’re using a corporate model that was trained unethically. I don’t see what your point is, though. That’s not inherent to how LLMs or other AIs work; that’s just corporations being leeches. In other words, business as usual in capitalist society.

            • mke@programming.dev · +3 · 1 day ago

              You’re right about it not being inherent to the tech, and I sincerely apologize if I insist too much despite that. This will be my last reply to you. I hope I gave you something constructive to think about rather than just noise.

              The issue, and my point, is that you’re defending a technicality that doesn’t matter in real-world usage. Nearly no one uses non-corporate, ethical AI. Most organizations working with it aren’t starting from scratch, because doing so is disadvantageous or outright unfeasible resource-wise; instead, they use pre-existing corporate models.

              Edd may not be technically right, but he is practically right. The people he’s referring to are extremely unlikely to be using or creating completely ethical datasets/AI.

              • drkt@scribe.disroot.org · +1 · 1 day ago

                The issue, and my point, is that you’re defending a technicality that doesn’t matter in real-world usage.

                You’re right and I need to stop doing it. That’s a good reminder to go and enjoy the fresh spring air 😄

    • pretzelz@lemmy.world · +18 −1 · 2 days ago

      I think his argument is that the models initially needed lots of data to verify and validate their current operation. Subsequent advances may have allowed those models to be created cleanly, but those advances relied on tainted data, thus making the advances themselves tainted.

      I’m not sure I agree with that argument. It’s like saying that if you invented a cure for cancer that relied on morally bankrupt means, you shouldn’t use that cure. I’d say that there should be a legal process against the person who did the illegal acts, but once you have discovered something, it stands on its own two feet. Perhaps, however, some kind of reparations should be given to the people who were abused in the process.

      • drkt@scribe.disroot.org · +3 −2 · 1 day ago

        I think his argument is that the models initially needed lots of data to verify and validate their current operation. Subsequent advances may have allowed those models to be created cleanly, but those advances relied on tainted data, thus making the advances themselves tainted.

        It’s not true; you can train a model from the ground up on properly licensed or open data, and you don’t have to inherit anything. What you’re describing is called fine-tuning, which is where you “re-train” an existing model to do something specific, because that’s much cheaper than training from the ground up.
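
        A minimal sketch of the difference, assuming the Hugging Face transformers library is installed and using “gpt2” purely as an example checkpoint:

        ```python
        from transformers import AutoConfig, AutoModelForCausalLM

        # Fine-tuning: start from weights someone else already trained,
        # inheriting whatever data went into them.
        base = AutoModelForCausalLM.from_pretrained("gpt2")

        # Training from the ground up: the same architecture with randomly
        # initialized weights, so nothing is inherited from anyone's dataset.
        config = AutoConfig.from_pretrained("gpt2")
        scratch = AutoModelForCausalLM.from_config(config)
        ```

        Either model can then be trained on whatever corpus you choose; only the first carries the pretrained weights, and their provenance, with it.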

        • pretzelz@lemmy.world · +7 · 1 day ago

          I don’t think that’s what they are saying. It’s not that you can’t now; it’s that initially people did need to use a lot of data. Then they found tricks to improve training on less, but those tricks came about only after people saw what was possible. Since they initially needed such data, the argument goes, and we wouldn’t have been able to improve upon the techniques if we didn’t know that huge neural nets trained on lots of data were effective, subsequent models are tainted by the original sin of requiring all this data.

          As I said above, I don’t think that subsequent models are necessarily tainted, but I find it hard to argue with the fact that the original models did use data they shouldn’t have and that without it we wouldn’t be where we are today. Which seems unfair to the uncompensated humans who produced the data set.

          • drkt@scribe.disroot.org · +2 · 1 day ago

            I actually find it very interesting how nobody in this community seems to know or understand how these models work, or even vaguely follow their open-source development. The first models didn’t have this problem; it was when OpenAI realized there was money to be made that they started scraping the internet and training on it illegally, and consequently a billion other startups did the same, because that’s how Silicon Valley operates.

            This is not an issue of AI being bad, it’s an issue of capitalist incentive structures.

            • BoulevardBlvd@lemmy.blahaj.zone · +1 · 1 day ago

              Cool! What’s the effective difference for my life that your insistence on nuance has brought? What’s the difference between a world where no one should have AI because the entirety of the tech is tainted with abuse, and a world where no one should have AI because the entirety of the publicly available tech is tainted with abuse? What should I, a consumer, do? Don’t say 1,000 hours of research on every fucking JPG; you know that’s not the true answer just from a logistical standpoint.

    • Tartas1995@discuss.tchncs.de · +11 −4 · 2 days ago

      Name one that is “ethically” sourced.

      And “open data” is a funny thing to say. Why is it open? Could it be open because the people who made it didn’t expect it to be abused for AI? When a porn star posted a nude picture online in 2010, do you think they considered that someone would one day use it to create deepfakes of random women? Please be honest. And yes, a picture might not technically be “open data,” but it highlights the flaw in your reasoning. People don’t think about what could be done with their stuff in the future as much as they should, and they certainly can’t predict the future.

      Now ask yourself that same question for any profession. Please be honest and tell us: is that “open data” not just another way to abuse the good intentions of others?

        • mke@programming.dev · +12 −1 · 1 day ago

          Wow, never mind, this is way worse than your other comment. Victim blaming and equating the law to morality: name a more popular duo with AI bros.

          • drkt@scribe.disroot.org · +2 −3 · 1 day ago

            I can’t make you understand more than you’re willing to understand. Works in the public domain are forfeited for eternity; you don’t get to come back in 10 years and go “well, actually, I take it back.” That’s not how licensing works. And that’s not victim blaming; that’s telling you not to license your nudes in such a manner that people can use them freely.

            • mke@programming.dev · +3 · edited · 1 day ago

              The vast majority of people don’t think in legal terms, and it’s always possible for something to be both legal and immoral. See: slavery, the actions of the Third Reich, killing or bankrupting people by denying them health insurance, and so on.

              There are teenagers, even children, who posted works which have been absorbed into AI training without their awareness or consent. Are literal children to blame for not understanding laws that companies would later abuse when they just wanted to share and participate in a community?

              And AI companies aren’t using merely licensed material; they’re using everything they can get their hands on. If they’re already pirating, you bet your ass they’ll use your nudes if they find them, public domain or not. Revenge porn posted by an ex? Straight into the training data.

              So your argument is:

              • It’s legal

              But:

              • What’s legal isn’t necessarily right
              • You’re blaming children before companies
              • AI makers actually use illegal methods, too

              It’s closer to victim blaming than you think.

              The law isn’t a reliable compass for what is or isn’t right. When the law is wrong, it should be changed. IP law is infamously broken in how it advantages, and gets abused by, companies. For a few popular examples: see how YouTube mediates between companies and creators, Nintendo suing everyone it can (which costs the victims more than it does Nintendo), and everything Disney did to IP legislation.

              • drkt@scribe.disroot.org · +1 −1 · 1 day ago

                Okay, but I wasn’t arguing morality, or about children posting nudes of themselves. I’m just telling you that works submitted into the public domain can’t be retracted, and that there are models trained exclusively on open data, which a lot of AI haters don’t know, don’t understand, or won’t acknowledge. That’s all I’m saying. AI is not bad; corporations make it bad.

                The law isn’t a reliable compass for what is or isn’t right.

                Fuck yeah it ain’t. I’m the biggest copyright and IP law hater on this platform, and I’ll get ahead of the next 10 replies by saying no, it’s not because I want to enable mindless corporate content scraping; it’s because human creativity shouldn’t be boxed in. It should be shared freely, lest our culture be lost.