The TLDR here, IMO, is simply stated: the OSAID fails to require public reproducibility of the scientific process of building these systems, because the OSAID fails to place sufficient requirements on the licensing and public disclosure of training sets for so-called “Open Source” systems. The OSI refused to add this requirement because of a fundamental flaw in their process; they decided that “there was no point in publishing a definition that no existing AI system could currently meet”. This fundamental compromise undermined the community process and amplified the role of stakeholders who would financially benefit from OSI’s retroactive declaration that their systems are “open source”. The OSI should have refrained from publishing a definition for now, and instead labeled this document as “recommendations”.
I really can’t overstate how much respect I have for Kuhn and the SFC. If RMS and the FSF are the Free Software movement’s past, Kuhn and the SFC are its future, and I can’t imagine anyone better to carry that particular torch.
Oh, wow. Should be pretty obvious that something isn’t open source, …well… unless the source is open…
You would think that, but even here on the Fediverse, where many users have an affection for technology and are generally wary of AI, I’ve seen people gobbling up the Open Source label when the model was open weights at best.
I’ve also seen that. And I’m not even sure whether I want to hold it against them. For some reason it’s an industry-wide effort to muddy the waters and slap “open source” on their products. From the largest company, which chose to have “Open” in its name but opposes transparency with every fibre of its being, to Meta, the current pioneer(?) of “open sourcing” LLMs, to the smaller underdogs who pride themselves on publishing their models that way… They’ve all homed in on the term.
And lots of journalists and bloggers pick up on it too. I personally think terms should be well-defined. And open source had a well-defined meaning. I get that it’s complicated with the transformative nature of AI, copyright… But I don’t think reproducibility is in question here at all. Of course we need that; it’s core to something being open. And I don’t even understand why the OSI claims it isn’t achievable… Didn’t we have datasets available up until LLaMA 1, along with an extensive scientific paper that enabled people to reproduce the model? And LLMs aside, we sometimes have that with other kinds of machine learning…
(And by the way, this is an old article, from the end of October last year.)
That’s a really beautiful & concise way of putting it <3
You see this on GitHub already. People publish paper results and manuals, along with a few files, and treat that as if it were open source. And this isn’t limited to LLMs: people with CNN papers or crawlers publish a few files and their results on GitHub as if it were open source. I think this is a clash between current scientific-community thinking plus Big Tech on one side, and Free Software and Free Culture initiatives on the other.
Additionally, you can’t expect anything Microsoft or Meta touches to remain untainted for long.