let's not
I'm @froztbyte more or less everywhere that matters
let's not
that list undercounts far more than I expected it to
when digging around I happened to find this thread which has some benchmarks for a diff model
it's apples to square fenceposts, of course, since one llm is not another. but it gives something to presume from. if g4dn.2xl gave them 214 tok/s, and if we make the extremely generous presumption that tok==word (which, well, no; cf. strawberry), then any Use Deserving Of o3 (let's say 5~15k words) would mean you need a tok-rate of 1000~3000 tok/s for a "reasonable" response latency ("5-ish seconds")
so you'd need something like 5x g4dn.2xl just to shit out 5000 words with dolphin-llama3 in "quick" time. which, again, isn't even whatever the fuck people are doing with openai's garbage.
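(for anyone who wants to poke at the arithmetic, here it is as a trivial python sketch; the 214 tok/s is from the linked thread, the rest are my stated presumptions, tok==word handwave included)

```python
# back-of-envelope check of the numbers above, under the stated
# (generous, wrong) presumption that 1 token == 1 word
target_latency_s = 5        # "reasonable" response latency
g4dn_2xl_toks = 214         # tok/s, from the linked benchmark thread

for words in (5_000, 15_000):   # "Use Deserving Of o3"
    needed_rate = words / target_latency_s      # required tok/s
    instances = needed_rate / g4dn_2xl_toks     # presuming linear scaling
    print(f"{words} words in {target_latency_s}s -> "
          f"{needed_rate:.0f} tok/s, ~{instances:.1f}x g4dn.2xl")
# 5000 words  -> 1000 tok/s, ~4.7x g4dn.2xl
# 15000 words -> 3000 tok/s, ~14.0x g4dn.2xl
```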
utter, complete, comprehensive clownery. era-redefining clownery.
but some dumb motherfucker in a bar will keep telling me it's the future. and I get to not boop 'em on the nose. le sigh.
following on from this comment, it is possible to get it turned off for a Workspace Suite Account, but you have to take it up with Workspace Support directly (the "?" button from admin view)
(otherwise you'll get some made-up bullshit from a person trying to buy time or Case Success or whatever, simply because they don't have the privileges to do what you're asking)

hopefully you spend less time on this than the 40-something minutes I had to (a lot of which was spent watching some poor support bastard start-stop typing for minutes at a time because they didn't know how to respond to my request)
also, my inbox earlier:
24661 N + Jan 21 Apple Developer ( 42K) Explore the possibilities of Apple Intelligence.
so, for an extremely unscientific demonstration, here (warning: AWS may try hard to get you to engage with Explainer[0]) is an instance of an aws pricing estimate for big handwave "some gpu compute"
and when I say "extremely unscientific", I mean "I largely pulled the numbers out of my ass". even so, they're not entirely baseless, nor just picking absolute maxvals and laughing
parameters/assumptions made:
(and before we get any fucking ruleslawyering dumb motherfuckers rolling in here about accuracy or whatever: get fucked kthx. this is just a very loosely demonstrative example)
so you'd have a variable buffer of 50…150 instances, featuring 3.2…9.6TiB of RAM for working set size, 800…2400 vCPU, 50…150 nvidia t4 cores, and 800…2400GiB gpu vram
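(for the curious: those totals fall out of a per-instance shape of 16 vCPU / 64GiB RAM / 1x T4 with 16GiB vram, i.e. roughly a g4dn.4xlarge; that shape is my inference from the totals, not something stated in the estimate)

```python
# fleet totals from an assumed per-instance shape (g4dn.4xlarge-ish);
# the shape is my guess, picked because it reproduces the totals above
VCPU, RAM_GIB, T4S, VRAM_GIB = 16, 64, 1, 16

for n in (50, 150):
    print(f"{n} instances: {n * RAM_GIB}GiB RAM (~{n * RAM_GIB / 1000:.1f}T), "
          f"{n * VCPU} vCPU, {n * T4S} t4s, {n * VRAM_GIB}GiB vram")
# 50  -> 3200GiB RAM (~3.2T), 800 vCPU, 50 t4s, 800GiB vram
# 150 -> 9600GiB RAM (~9.6T), 2400 vCPU, 150 t4s, 2400GiB vram
```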
let's presume a perfectly spherical ops team of uniform capability[3] and imagine that we have some lovely and capable active instance prewarming and correct host caching and whatnot. y'know, things to reduce user latency. let's pretend we're fully dynamic[4]
so, by the numbers, then
1y times 4h daily gives us 1460h (in seconds, that's 5256000). this extremely inaccurate full-of-presumptions number gives us "service-capable life time". the times your concierge is at the desk, the times you can get pizza delivered.
x3 to get to lifetime matching our spot commit, x50…x150 to get to "total possible instance hours". which is the top end of our sunshine and rainbows pretend compute budget. which, of course, we still have exactly no idea how to spend. because we don't know the real cost of servicing a query!
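(same deal, the lifetime arithmetic in runnable form:)

```python
# "service-capable life time": 4h/day over a year, then scaled out to
# the 3y commit and the 50..150 instance buffer
hours_per_year = 4 * 365                 # 1460h
secs_per_year = hours_per_year * 3600    # 5256000 seconds

for n in (50, 150):
    total = secs_per_year * 3 * n        # x3 for the commit, xN instances
    print(f"{n} instances: {total} total possible instance-seconds")
# 50 -> 788400000, 150 -> 2365200000
```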
but let's work backwards from some made-up shit, using numbers The Poor Public gets (vs numbers Free Microsoft Credits will imbue unto you), and see where we end up!
so that means our baseline:
=200k/y per ops/whatever person we have
3y of 4h-daily at 50 instances = 788400000 seconds. at 150 instances, 2365200000 seconds.
so we can say that, for our deeply Whiffs Ever So Slightly values, a second's compute on the low instance-count end is $0.01722755 and $0.00574252 at the higher instance-count end! which gives us a bit of a handle!
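(worth noting: the all-in baseline dollar figure never appears above, but you can back-solve it from those two rates; both multiply back out to the same ~$13.58M over the 3y, which at least shows the division is self-consistent)

```python
# back-solving the implied all-in 3y budget: per-instance-second rate
# times total instance-seconds should agree at either fleet size
for rate, secs in ((0.01722755, 788_400_000), (0.00574252, 2_365_200_000)):
    print(f"${rate}/s x {secs}s = ${rate * secs:,.0f}")
# both land on ~$13.58M, i.e. the same made-up baseline budget
```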
this, of course, entirely ignores parallelism, n-instance job/load/whatever distribution, database lookups, network traffic, allllllll kinds of shit. which we can't really have good information on without some insider infrastructure leaks anyway. so let's just pretend to look at the compute alone.
so what does $1000/query mean, in the sense of our very ridiculous and fantastical numbers? since the units are now The Same, we can simply divide things!
at the 50 instance mark, we'd need to hypothetically spend 58046.56 instance-seconds. that's 0.672 days of linear compute!
at the 150 instance mark, 174139.97 instance-seconds! 2.016 days of linear compute!
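(same sketch, simply dividing the $1000 by each per-instance-second rate:)

```python
# what a $1000 query buys at our made-up per-instance-second rates
for label, rate in (("50 instances", 0.01722755), ("150 instances", 0.00574252)):
    secs = 1000 / rate
    print(f"{label}: {secs:,.2f} instance-seconds "
          f"= {secs / 86400:.3f} days of linear compute")
# 50  -> 58,046.56 s (~0.672 days); 150 -> 174,139.97 s (~2.016 days)
```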
so! what have we learned? well, we've learned that we couldn't deliver responses to prompts in Reasonable Time at these hardware presumptions! which, again, are linear presumptions. and there's gonna be a fair chunk of parallelism and other parts involved here. but even so, turns out it'd be a bit of a sizable chunk of compute allocated. to even a single prompt response.
[0] - a product/service whose very existence I find hilarious; the entire suite of aws products is designed to extract as much money from every possible function whatsoever, leading to complexity, which they then respond to by… producing a chatbot to "guide users"
[1] - yes yes I know, the world is not uniform and the fucking promptfans come from everywhere. I'm presuming amerocentric design thinking (which imo is probably not wrong)
[2] - let's pretend that the calculators' presumption of 4h persistent peak load and our presumption of short-duration load approaching 4h cumulative are the same
[3] - oh, who am I kidding, you know it's gonna be some dumb motherfuckers with ansible and k8s and terraform and chucklefuckery
This must be mentioned in the acknowledgements
wat
in that spirit: Loserus Inamericus
(I don't know if that scans, I have no latin skills and I don't feel like breaking out information to check)
there's probably a fair couple more. tracing anything de beers-related (or a good couple of other industries) will probably indicate a couple more
(my hypothesis is: the kinds of people that flourished under apartheid, the effect that had on local-developed industry, and then the "wider world" of opportunities (prey) they got to sink their teeth into after apartheid went away; doubly so because staying ZA-only is extremely limiting for ghouls of their sort - it's a fixed-size pool, and the still-standing apartheid-vintage capital controls are Limiting for the kinds of bullshit they want to pull)
just to be clear: not providing excuses for felon. just think it's unlikely that that was the avenue
opinion: the AWB is too afrikaans for it to be likely that that is where he picked up his nazi shit. then-era ZA still had a lot of AF/EN animosity, and in a couple of biographies of the loon you hear things like "he hated life in ZA as a kid because … {bullying}", and a non-zero amount of that may have stemmed from AF bullying EN
(icbw - I definitely haven't studied the history of the loon's tendencies - but I can speak to at least part[0] of the ZA attitude)
([0] - I wasn't alive at the time it would've mattered to him, but other bits of the cultural attitudes lasted well into my youth)
I recall seeing something of this sort happening on goog for about 12~18mo - every so often a researcher post does the rounds where someone finds Yet Another way goog is fucking it up
the advertising dept has completely captured all mindshare and it is (demonstrably) the only part that goog-the-business cares about
whooshing noises
sidenote: TGP ftw and janet fucking rocks
it's all activitypub, so yep you can cross-follow and cross-post
(there's definitely jank, too, but it works)
nfi how phone managed that typo, but now I'm leaving it in
not to be forgotten: the long, long years of windows malware pulling exactly this shit
the bit about it that I find subtly glorious (in how remarkably fuckwitted it is) is the baseline idea of "intellectual horsepower"
I'm not surprised that this is a view they (of the company that's effectively going "just 12 more DCs bro it'll be enough compute bro I promise bro just watch") hold and consider in such a simple mechanism-rating scale
but it is funny as fuck
reasonable point, but still leaves the ethical parts of it (e.g. it gives more ammo to the kinds of creeps who make non-consensual imagery)