Why is no one talking about how unproductive it is to have to verify every "hallucination" ChatGPT gives you?

phoneymouse@lemmy.world · 8 hours ago

Why is no one talking about how unproductive it is to have to verify every "hallucination" ChatGPT gives you?

sp3ctr4l@lemmy.zip · edit-2 59 minutes ago

I just tried out Gemini.

I asked it several questions of the form ‘are there any things in category x which are also in category y?’

It would often confidently reply ‘No, here’s a summary of things that meet all your conditions to fall into category x, but sadly none also fall into category y’.

Then I would reply, ‘wait, you don’t know about thing gamma, which does fall into both x and y?’

To which it would reply ‘Wow, you’re right! It turns out gamma does fall into x and y’ and then give a bit of a description of how/why that is the case.

After that, I would say ‘… so you… lied to me. ok. well anyway, please further describe thing gamma that you previously said you did not know about, but now say that you do know about.’

And that is where it gets … fun?

It always starts with an apology template.

Then, if it’s a topic that it has almost certainly been manually dissuaded from talking about, it lies again and says ‘actually, I do not know about thing gamma, even though I just told you I did’.

If it is not a topic that it has been manually dissuaded from talking about, it does the apology template and then also further summarizes thing gamma.

…

I asked it ‘do you write code?’ and it gave a moderately lengthy explanation of how it is comprised of code, but does not write its own code.

Cool, not really what I asked. Then I commanded it: ‘write an implementation of bogo sort in Python 3.’

… and then it does that.
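For reference, the kind of answer that prompt asks for is short. This is a minimal sketch, not Gemini's actual output; the names `is_sorted` and `bogo_sort` and the optional `rng` parameter are illustrative assumptions:

```python
import random


def is_sorted(items):
    """Return True if every element is <= the one after it."""
    return all(a <= b for a, b in zip(items, items[1:]))


def bogo_sort(items, rng=None):
    """Shuffle a copy of `items` until it happens to be sorted.

    Expected running time grows factorially with the input length,
    so only try this on very small lists.
    """
    rng = rng or random.Random()
    result = list(items)  # work on a copy, leave the input untouched
    while not is_sorted(result):
        rng.shuffle(result)
    return result


if __name__ == "__main__":
    print(bogo_sort([3, 1, 2]))
```

Passing a seeded `random.Random` makes runs repeatable, which helps if you want to check the output yourself rather than trust it.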

…

Awesome. Hooray. Billions and billions of dollars for a shitty way to reform web search results into a conversational form, which is very often confidently wrong and misleading.

taladar@sh.itjust.works · 5 minutes ago

And then more money spent on adding that additional garbage filter to the beginning and the end of the process which certainly won’t improve the results.

Nurse_Robot@lemmy.world · 31 minutes ago

sigh people do talk about this, they complain about it non-stop. These same people probably aren’t using it as intended, or are deliberately trying to farm a “gotcha” response. AI is a very neat tool which can do a lot of things well, but it’s important to recognize its limitations. I don’t use it for things I don’t understand because I won’t recognize if it’s spitting out nonsense, but for topics I do understand it’s hard to overstate how efficient and time saving it is.

Rekorse@sh.itjust.works · 22 minutes ago

Efficiency depends on the cost, doesn’t it?

TrickDacy@lemmy.world · 3 hours ago

Probably because they’re not checking them

Sam_Bass@lemmy.world · 2 hours ago

They’re trying not to lose money on the developments

1stTime4MeInMCU@mander.xyz · 6 hours ago

I’m convinced people who can’t tell when a chat bot is hallucinating are also bad at telling whether something else they’re reading is true or not. What are you reading online that you’re not fact-checking anyway? If you’re writing a report, you don’t pull the first fact you find and call it good; you need to find a couple of citations for it. If you’re writing code, you don’t just write the program and assume it’s correct; you test it. It’s just a tool, and I think most people are coping because they’re bad at using it.

BluesF@lemmy.world · 5 hours ago

Yeah. GPT models are in a good place for coding tbh, I use it every day to support my usual practice, it definitely speeds things up. It’s particularly good for things like identifying niche python packages & providing example use cases so I don’t have to learn shit loads of syntax that I’ll never use again.

Aceticon@lemmy.world · 2 hours ago

In other words, it’s the new version of copying code from Stack Overflow without going to the trouble of properly understanding what it does.

Rekorse@sh.itjust.works · 20 minutes ago

Pft, you must have read that wrong, it’s clearly turning them into master programmers one query at a time.

hoshikarakitaridia@lemmy.world · 7 hours ago

Because in a lot of applications you can bypass hallucinations.

getting sources for something
as a jump off point for a topic
to get a second opinion
to help argue for or against your position on a topic
get information in a specific format

In all these applications you can bypass hallucinations because either its task is non-factual, or it’s verifiable while prompting, or because you will be able to verify in any of the subsequent tasks.

Just because it makes shit up sometimes doesn’t mean it’s useless. Like an idiot friend, you can still ask it for opinions or something and it will definitely start you off somewhere helpful.

WalnutLum@lemmy.ml · 4 hours ago

All LLMs are text completion engines, no matter what fancy bells they tack on.

If your task is some kind of text completion or repetition of text provided in the prompt context LLMs perform wonderfully.

For everything else you are wading through territory you could probably handle more easily using other methods.

ms.lane@lemmy.world · 6 hours ago

Also just searching the web in general.

Google is useless for searching the web today.

fibojoly@sh.itjust.works · 1 hour ago

Not if you want that thing that everyone is on about. Don’t you want to be in with the crowd?! /s

ohwhatfollyisman@lemmy.world · 5 hours ago

so, basically, even a broken clock is right twice a day?

dev_null@lemmy.ml · 2 hours ago

Yes, but for some tasks mistakes don’t really matter, like “come up with names for my project that does X”. No wrong answers here really, so an LLM is useful.

Rekorse@sh.itjust.works · 17 minutes ago

How is that faster than just picking a random name? No one picks software based on name.

onionsinmypores@sh.itjust.works · edit-2 2 hours ago

No, maybe more like, even a functional clock is wrong every 0.8 days.
https://superuser.com/questions/759730/how-much-clock-drift-is-considered-normal-for-a-non-networked-windows-7-pc

The frequency is probably way higher for most LLMs though lol

Snowclone@lemmy.world · edit-2 7 hours ago

I only use it for complex searches with results I can usually parse myself, like a ‘‘list 30 typical household items without descriptions or explanations with no repeating items’’ kind of thing.

ohwhatfollyisman@lemmy.world · 5 hours ago

great value for all that energy it expends, indeed!

Varyk@sh.itjust.works · 6 hours ago

it’s because everyone stopped using it, right?

at least months ago?

Eheran@lemmy.world · 6 hours ago

Remember when you had to have extremely niche knowledge of “banks” in a microcontroller to be able to use PWM on 2 pins with different frequencies?

Yes, I remember what a pile of shit it was to try and find out why xyz is not working while x and y and z work on their own. GPT usually gets me there after some tries. Not to mention how much faster most of the code is there, from A to Z, with only little to tweak to get it where I want (since I do not want to be hyper specific and/or it gets those details wrong anyway, as would a human without massive context).

snooggums@lemmy.world · 7 hours ago

Because most people are too lazy to bother with making sure the results are accurate when they sound plausible. They want to believe the hype, and lack critical thinking.

Chip_Rat@lemmy.world · 11 minutes ago

I don’t want to believe any hype! I just want to be able to ask “hey ChatGPT, I’m looking for a YouTube video by Technology Connections where he discusses dryer heat pumps” and not have it spit out “it’s called ‘The neat ways your dryer heat pumps save energy!’”

And it is not; that video doesn’t exist. And it’s even harder to disprove at first glance because the LLM is mimicking what Alex would have called the video. So you look and look with your sister’s very inefficient PS4 controller-to-YouTube interface… And finally ask it again and it shy flowers you…

But I swear he talked about it ?!?! Anyone?!?

ms.lane@lemmy.world · 6 hours ago

This sounds awfully familiar, like almost exactly what people were saying about Wikipedia 20 years ago…

julietOscarEcho@sh.itjust.works · 4 hours ago

Pretty weak analogy. Wikipedia was technologically trivial and did a really good job of avoiding vested interests. Also the hype is orders of magnitude different; no one ever claimed Wikipedia was going to lead to superhuman intelligences or to the replacement of swathes of human creative/service workers.

Actually, since you mention it, my hot take is that Wikipedia might have been a more significant step forward in AI than OpenAI/latest generation LLMs. The creation of that corpus is hugely valuable in training and benchmarking models of natural language. Also, it actually disrupted an industry (conventional encyclopedias), whereas I’m struggling to think of anything that LLMs have replaced in the same way thus far.

callcc@lemmy.world · 6 hours ago

It’s usually good for ecosystems with good and loads of docs. Whenever docs are scarce the results become shitty. To me it’s mostly a more targeted search engine without the crap (for now)

pixxelkick@lemmy.world · 7 hours ago

Gippity is pretty good at getting me 90% of the way there.

It usually sets me up with at least all the terms etc. that I now know to google, whereas before I wouldn’t even know what I was looking for in the first place.

Also not gonna lie, search engines are even worse than gippity for accuracy often.

And I’ve had to fight with so many cases of garbage documentation lately that gippity genuinely does the job better, because it has all the random comments from issues and solutions in its data.

Usually once I have my sort of key terms I need to dig into, I can use youtube/google and get more specific information though, and that’s the last 10%

damnthefilibuster@lemmy.world · 6 hours ago

What are you talking about? I don’t verify anything that ChatGPT gives me.

NoiseColor@lemmy.world · 6 hours ago

You have to understand it well enough to know what stuff you can rely on. On the other hand nowadays there are often sources there, so it’s easy to check.

RedditWanderer@lemmy.world · edit-2 7 hours ago

Big businesses know; they even ask people like me to put extra measures in place. I like to call it the Concorde effect. You’re trying to make a plane that can shove air out of the way faster than it wants to move, and this takes an enormous amount of energy that isn’t worth the time saved, or the cost. Even if you have a higher airspeed when it works, if your plane doesn’t make it to its destination it isn’t “faster”.

We hear a lot about the downsides of AI, except that doesn’t fit the big corpo narrative and people don’t really care enough. If you’re just a consumer who has no idea how this really works, the investments companies make into shoving it everywhere make it seem like it’s not a problem, and it looks like there’s only AI hype and no party poopers.