You mean at the tail end of a thread that opened with me pointing at the environmental costs?
Exactly! Hence my confusion. If you care about energy costs, then shouldn’t saving energy be a good thing? Why would the benefit be 0?
Weren’t you just telling me that the environmental cost has no impact on your stance?
Count yourself lucky. My front burner has become a secondary backburner and I’ve moved on to using a portable cooktop.
It sounds like you don’t like how LLMs are currently used, not their power consumption.
I agree that they’re a dead end. But I also don’t think they need much improvement over what we currently have. We just need to stop jamming them where they don’t belong and leave them be where they shine.
Yeah, they operate very opaquely, so we can’t know the true cost, but based on what I can know with certainty given models I can run on my own machines, the numbers seem reasonable. In any case, that’s not really relevant to this discussion. Treat it as a hypothetical, then work out the math later to figure out where we want to be and what threshold we should be setting.
Indeed. Though what we should be thinking about is not just the cost in absolute terms, but in relation to the benefit. GPT-4 is one of the more expensive models to run right now, and you can accomplish very good results with their smaller GPT-4o mini at 0.5% of the energy cost[1]. That’s the cost of running 0.07 LED bulbs over an hour, or running 1 LED bulb over 0.07 hours (i.e. about 4min). If that saves you 5min of time writing an email while the room is lit with a single LED bulb and your computer is drawing energy, that might just be worth it, right?
[1] Estimated by using https://huggingface.co/spaces/genai-impact/ecologits-calculator and the pricing difference between GPT-4o, 4o mini, and 3.5 (https://openai.com/api/pricing/). The assumption I’m making is that the total hardware and energy cost scales linearly with the API pricing.
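To make the arithmetic above easy to check, here is a minimal sketch of the two steps: scaling an energy estimate by the API price ratio (the linear-scaling assumption from the footnote) and converting bulb-hours into minutes. The function names and the price values are hypothetical placeholders, not real pricing; only the 0.5% ratio and the 0.07 bulb-hours come from the comment.

```python
def scale_energy_by_price(baseline_energy, baseline_price, target_price):
    """Estimate a model's energy use from a baseline model's energy use.

    Assumption (from the footnote): energy scales linearly with API price.
    """
    return baseline_energy * (target_price / baseline_price)


def bulb_hours_to_minutes(bulb_hours):
    """Convert 'one bulb running for N hours' into minutes."""
    return bulb_hours * 60


# Placeholder prices in arbitrary units, chosen only so the ratio is 0.5%.
relative_energy = scale_energy_by_price(
    baseline_energy=1.0,   # baseline normalized to 1
    baseline_price=200.0,  # hypothetical
    target_price=1.0,      # hypothetical
)
print(f"relative energy: {relative_energy:.3%}")

# 0.07 bulb-hours is the figure quoted in the comment.
print(f"single-bulb runtime: {bulb_hours_to_minutes(0.07):.1f} min")
```

If the linear-scaling assumption is off, only `scale_energy_by_price` needs to change; the bulb conversion is plain unit arithmetic.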
The energy usage is mainly on the training side with LLMs. Generating afterwards is fairly cheap. Maybe what you want is to have fewer companies trying to train their own models from scratch and encourage collaborating instead?
Is it the training process that you take issue with or the usage of the resulting model?
We’ve been doing this in RL research with Minecraft as well (see MineDojo). An excerpt from the GitHub page:
MineDojo […] provides open access to an internet-scale knowledge base of 730K YouTube videos, 7K Wiki pages, 340K Reddit posts.
Again, no one has run into legal issues with this yet either, but this also isn’t as ubiquitous as Atari, nor has it been around for as long.
Did you mean to respond to a different comment? I have no idea what happened in the VP debate.
The very first response I gave said you just have to reframe state.
And I said “an augmented state space would make it Markovian”. Is that not what you meant by reframing the state? If not, then apologies for the misunderstanding. I do my best, but I understand that falls short sometimes.
Reinforcement learning research has been using Atari games as standard benchmarks for over a decade now and no one has faced legal issues yet.
I’m not familiar with the term “beam” in the context of LLMs, so that’s not factored into my argument in any way. LLMs generate text based on the history of tokens generated thus far, not just the last token. That is by definition non-Markovian. You can argue that an augmented state space would make it Markovian, but you can say that about any stochastic process. Once you start doing that, both become mathematically equivalent. Thinking about this a bit more, I don’t think it really makes sense to talk about a process being Markovian or not without a wider context, so I’ll let this one go.
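For anyone following along, here is a minimal sketch of the augmented-state point, using a toy generator rather than a real LLM. The sampling rule is a made-up assumption: the next-token distribution depends on counts over the entire history. Viewed token by token, that is non-Markovian, since two histories ending in the same token give different distributions. Viewed with the whole history as the state, each transition depends only on the current state.

```python
import random

VOCAB = ["a", "b", "c"]


def next_token_probs(history):
    """Toy rule (hypothetical): a token becomes more likely each time it
    has already appeared, so the whole history matters, not just the
    last token."""
    counts = [1 + history.count(tok) for tok in VOCAB]
    total = sum(counts)
    return [c / total for c in counts]


def step(state, rng):
    """One transition on the augmented state space.

    The state is the full tuple of tokens so far, so the next state
    depends only on the current state.
    """
    token = rng.choices(VOCAB, weights=next_token_probs(state), k=1)[0]
    return state + (token,)


# Same last token ("b"), different earlier tokens, different distributions.
print(next_token_probs(("a", "a", "b")))
print(next_token_probs(("c", "c", "b")))

rng = random.Random(0)
state = ()
for _ in range(5):
    state = step(state, rng)
print(state)
```

The catch is the one already mentioned: this trick works for any stochastic process, so calling the result "Markov" doesn't narrow anything down.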
nitpick that makes communication worse
How many readers do you think know what “Markov” means? How many would know what “stochastic” or “random” means? I’m willing to bet that the former is a strict subset of the latter.
It’s in reference to your complaint about the imprecision of “stochastic process”. I’m not disagreeing that molecular diffusion is a stochastic process. I’m saying that if you want to use “Markov process” to describe a non-Markovian stochastic process, then you no longer have the precision you’re looking for and now molecular diffusion also falls under your new definition of Markov process.
That’s basically like saying that typical smartphones are square because it’s close enough to rectangle and rectangle is too vague of a term. The point of more specific terms is to narrow down the set of possibilities. If you use “square” to mean the set of rectangles, then you lose the ability to do that and now both words are equally vague.
Everyone’s weird in their own ways. It’s just that one of them is trying to convince people that weird is bad while simultaneously trying to court their votes.
Stochastic process
Or maybe had to simultaneously work multiple full time jobs and a weekend job to make ends meet?
Why settle for good enough when you have a term that is both actually correct and more widely understood?
Until you chop off their legs. Then BMI spikes again.