oh someone finally offered cory doctorow a big enough sack of cash to do a cory doctorow? I’m shocked
oh good, the “you’re just doing purity culture” thing is already taking hold over on bluesky
so the line is now supposed to be that local LLMs are good and moral and SaaS LLMs are bad, when local LLMs come from the same fucked system that’s also actively making it impossible to buy computing hardware powerful enough to run even a shitty local LLM? is that about right? I’m supposed to clap cause someone with money is running a plagiarism machine but slower and shittier on their desktop?
@zzt I thought that with local models you strictly controlled what information that they were drawing upon?
An LLM is two things: the model and the weights. The model is basically a description of how different layers fit together. It’s usually not that complicated (you can create quite good ones in a few hundred lines of code with modern frameworks). But, by itself, the model is useless, because each layer is something like ‘take an input and transform it using this operation with an NxM matrix as the other operand’. That other operand is not part of the model, it’s in the weights. The weights are large. They are the result of training. You process a lot of data to generate them.
In a classical neural network, the model defines the topology, but each neurone has weighted inputs and an activation threshold. When you train it, you feed a bunch of data through it and this adjusts the weights and thresholds. Eventually, you stop and now you have a trained model. Modern deep learning models work in a similar way, but with a huge pile of optimisations. The weights are the valuable thing because it takes vast amounts of compute and data to produce them. They’re also completely opaque. They’re just a massive blob of data, so trying to figure out the behaviour of a trained model by looking at the weights is almost impossible, as is working out what went into their training sets.
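To make the model/weights split concrete, here’s a toy sketch (not any real framework, and the array shapes are made up): the ‘model’ is a dozen lines of code describing how layers compose, while the ‘weights’ are an opaque blob of numbers that the code is useless without.

```python
import numpy as np

# The "model": a tiny two-layer network. The architecture is just this
# function -- it only says how the layers fit together.
def forward(x, weights):
    # layer 1: multiply by the first weight matrix, then ReLU
    h = np.maximum(0, x @ weights["w1"])
    # layer 2: multiply by the second weight matrix
    return h @ weights["w2"]

# The "weights": big opaque arrays produced by training. Random
# stand-ins here; in a real LLM this blob is many gigabytes of numbers
# you can't meaningfully inspect, and can't reproduce without the
# original training data and compute.
rng = np.random.default_rng(0)
weights = {
    "w1": rng.standard_normal((16, 32)),
    "w2": rng.standard_normal((32, 4)),
}

x = rng.standard_normal((1, 16))  # one input vector
y = forward(x, weights)
print(y.shape)  # (1, 4)
```

Everything valuable (and everything unknowable) lives in `weights`; the `forward` function is the part anyone can rewrite in an afternoon, which is exactly what llama.cpp did.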
Very few ‘open’ LLMs have weights that were trained on known and reproducible data sets. Things like Meta’s LLaMa are ‘open’ in that you can recreate the model yourself (as llama.cpp did) and download their weights, but you have no visibility into what the weights were trained on, and you can’t reproduce the training (unless you have a data centre and a massive pile of lawyers able to defend you against copyright-infringement lawsuits). Oh, and the license says that you agree never to sue Meta for any IP infringement, so if @pluralistic is using one of the ‘open’ LLaMa weights, he has just given Meta a perpetual license to use all of his work for any purpose. I’m sure he considers that a great deal for a grammar checker with a 50% false positive rate.
This, by the way, is why I really like Mozilla’s translation models (which are much simpler than a general purpose LLM, though they use much of the same underlying technology). They are trained on curated open datasets designed for training machine-translation systems and they are specifically designed so that you can redo the training on a single (powerful, but affordable [at least, before the bubblers decided to buy everything]) machine. That made them things that people could experiment with, exploring different model structures to see how it affected speed and accuracy.
So, yes, a local model will not send data across the network when you use it (hopefully; unfortunately, most are distributed as Python code, and a load of the ones on Hugging Face also came with bundled malware. I hope they’ve managed to fix that now). But they’re not open in any meaningful way, they are still subject to the whims of massive corporations, and they build a dependency on the exact companies that Doctorow criticises while handing them a load of control over your workflow.
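For anyone wondering how a blob of ‘weights’ can carry malware: a lot of model files are (or were) Python pickle under the hood, and unpickling can execute arbitrary code the moment you load the file. A minimal illustration of the mechanism, with a harmless stand-in payload that just flips a flag instead of doing anything nasty:

```python
import pickle

# Stand-in for an attacker's payload. A real one would run arbitrary
# shell commands; this one just records that it executed.
executed = {"ran": False}

def payload():
    executed["ran"] = True
    return {}  # masquerade as an innocent "weights" dict

class EvilWeights:
    # pickle calls __reduce__ to learn how to rebuild the object, and
    # on load it happily calls whatever function __reduce__ names.
    def __reduce__(self):
        return (payload, ())

blob = pickle.dumps(EvilWeights())  # the "model file" on disk

# Merely *loading* the file runs the attacker's code -- you never have
# to actually use the model.
pickle.loads(blob)
print(executed["ran"])  # True
```

This is why formats like safetensors exist: they store raw arrays with no executable hooks, so loading them can’t run code.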
@david_chisnall @zzt @pluralistic
Thank you. I was unaware of the specifics of how the things work for the local models.
@Da_Gut that’s incorrect for all of the local, supposedly open source models I know of
all of the research I’ve read on this has easily extracted verbatim plagiarized text from the models, because all of them have their origins in the same sources — usually Facebook’s leaked LLaMa model or deepseek (which itself took from previous models). it isn’t possible for LLMs to be trained by anything other than a billion dollar company or a state operating like one.
@zzt oh, I did not know that. So if you disconnect it from the Internet, it ceases to work?
@Da_Gut the factors I’m talking about aren’t technical ones, they’re social and systemic. specifically:
- local LLMs are worse than cloud ones, and necessarily always will be. it isn’t possible for independent development of models to happen, and LLMs are already on an intentionally fast deprecation cycle. old models aren’t viewed as useful by anybody.
- it’s very easy for established companies to take action against local models as IP theft, and they’ve already laid the groundwork for this
it’s important to note that it isn’t always a big sack of cash. lately I keep seeing this pattern happen with engineers:
- “as an AI skeptic I finally have empirical proof that LLMs are good/useful/thinking/feeling <posts slop>”
- “uhhh are you ok? I checked the LLM output you posted and it doesn’t make any sense if you dig in at all and the citations are all fake”
- “this is empirical proof and you’re being emotional.”
this is engineer brain. Doctorow isn’t an engineer, so sack of cash it is.
@zzt I think AI boosterism is the first stage of AI psychosis
thx for telling me that everything I have hosted on the web getting repeatedly scraped to death by what would previously be considered a massive attack but is now being carried out by the largest corporations in the world is normal, actually. hope they give us good licensing terms on our data, uhhh no wait their IP, once they’re done killing and buying all the original data sources
I am wasting my time of course, Cory is and always has been a stack of Wired magazines with a flesh-colored mic strapped to it. every talk Cory does is a ted talk.