
highplainsdem

(58,807 posts)
Sat May 27, 2023, 03:14 PM May 2023

College professor had students grade ChatGPT-generated essays. All 63 essays had hallucinated errors

Found this thread thanks to a quote-tweet from Gary Marcus, the AI expert who testified before Congress, along with OpenAI CEO Sam Altman, a couple of weeks ago. Marcus saw the thread because he'd suggested this exercise. His comment on Twitter: "Every. Single. One."










C.W. Howell
@cwhowell123
2h

So I followed @GaryMarcus's suggestion and had my undergrad class use ChatGPT for a critical assignment. I had them all generate an essay using a prompt I gave them, and then their job was to "grade" it--look for hallucinated info and critique its analysis. *All 63* essays had hallucinated information. Fake quotes, fake sources, or real sources misunderstood and mischaracterized. Every single assignment. I was stunned--I figured the rate would be high, but not that high.

The biggest takeaway from this was that the students all learned that it isn't fully reliable. Before doing it, many of them were under the impression it was always right. Their feedback largely focused on how shocked they were that it could mislead them. Probably 50% of them were unaware it could do this. All of them expressed fears and concerns about mental atrophy and the possibility for misinformation/fake news. One student was worried that their neural pathways formed from critical thinking would start to degrade or weaken. One other student opined that AI both knew more than us but is dumber than we are since it cannot think critically. She wrote, "I’m not worried about AI getting to where we are now. I’m much more worried about the possibility of us reverting to where AI is."

I'm thinking I should write an article on this and pitch it somewhere...




C.W.Howell is Christopher Howell: https://www.linkedin.com/in/christopher-howell-6ba00b242?trk=people-guest_people_search-card . Re the science fiction video game he was lead writer on: https://opencritic.com/game/5383/the-minds-eclipse/reviews .
47 replies
College professor had students grade ChatGPT-generated essays. All 63 essays had hallucinated errors (Original Post) highplainsdem May 2023 OP
They're everywhere. marble falls May 2023 #1
" I'm much more worried about the possibility of us reverting to where AI is." WestMichRad May 2023 #2
In grad school (UCLA), I tutored math in Beverly Hills... Lucky Luciano May 2023 #7
As a fellow math tutor, I commend your technique. . . . nt Bernardo de La Paz May 2023 #9
Good Job ProfessorGAC May 2023 #10
I think the difficulty for most of us is in "context switching". When we're interacting with a tool erronis May 2023 #12
The best way to explain the productivity destruction of context switching! Lucky Luciano May 2023 #24
Very nice depiction of the tangled mess we weave. I think a lot of current software erronis May 2023 #32
Did the Wendy's drive-thru, I handed $20 on an $18.60 order and got $7.40 back. Returned it. TheBlackAdder May 2023 #30
The correct change is $1.40. You got $7.40 back. Could the cashier have calculated it on a progree May 2023 #31
The Wendys kid probably thought Danascot May 2023 #34
Hardly surprising. plimsoll May 2023 #3
Kind of reminds me of the old chess computers...... essaynnc May 2023 #11
I think you are right - Garry Kasparov was the Grand Master. And he now knows a lot about AI. erronis May 2023 #13
prognosticating... oioioi May 2023 #17
The question is: Will be best at what? plimsoll May 2023 #18
ChatGPT is trained on the content of the entire internet. Yavin4 May 2023 #4
Not really. Only linked pages. There's a lot of content that isn't accessed without erronis May 2023 #14
The future of Chat bots won't be trained on available datasets like the internet. Yavin4 May 2023 #19
Agree. And am sure we don't know where this will be going. erronis May 2023 #21
I believe you are correct intrepidity May 2023 #23
Agree, AI has important uses as you describe, radius777 May 2023 #33
The question we need to ask is what GPT will be like three years from now. speak easy May 2023 #5
They will go proprietary. Yavin4 May 2023 #20
We can regulate proprietary entities speak easy May 2023 #26
You could ask that question on most message boards now. n/t Yavin4 May 2023 #28
OK. Well how about the easiest and most effective way to build a bomb, speak easy May 2023 #39
You can find that information out as well right now. Yavin4 May 2023 #44
Ai can save you time speak easy May 2023 #45
It probably will get better, but it could also stall out on improving accuracy... Silent3 May 2023 #35
AI is useful as a data dump, a starting point bucolic_frolic May 2023 #6
But the other search engines don't know what they haven't been fed either. erronis May 2023 #16
That was a great exercise by that professor Warpy May 2023 #8
Sounds like me. In the 1970s we knew how to generate reasonable gibberish. erronis May 2023 #15
There's a radio station near me where women call in their "mom fails". milestogo May 2023 #22
This level of AI is basically brand new Takket May 2023 #25
Maximum likelihood estimation perhaps. Lucky Luciano May 2023 #29
Children think it is okay just to make things up. speak easy May 2023 #36
This is a machine, not a child. highplainsdem May 2023 #38
It is a model of intelligence that mimics a child. speak easy May 2023 #40
I'm not entirely opposed to AI. But LLM AI by design are likely to highplainsdem May 2023 #41
"LLM AI by design are likely to hallucinate" speak easy May 2023 #43
See this: highplainsdem May 2023 #46
Right. OK - I think we are on the same page. speak easy May 2023 #47
Google highplainsdem May 2023 #37
His students are impressively thoughtful. Thanks for this, HPD Hekate May 2023 #27
You're welcome! I hope his experiment gets a lot of attention. highplainsdem May 2023 #42

WestMichRad

(2,737 posts)
2. " I'm much more worried about the possibility of us reverting to where AI is."
Sat May 27, 2023, 04:47 PM
May 2023

Very perceptive comment. Over-reliance on machines does have the potential of dumbing down people.

Lucky Luciano

(11,784 posts)
7. In grad school (UCLA), I tutored math in Beverly Hills...
Sat May 27, 2023, 05:04 PM
May 2023

…and I noticed how kids were absolutely reliant on their calculators for the simplest calculations. I was totally stunned. One girl actually typed in something like 2+2 on autopilot and I made her stop and just tell me the answer and of course she did…but the autopilot thing was unsettling to me.

One boy seemed terrified of math without calculators. I asked him to do 35x9. No idea. I asked him 35x10…he said 350. Take away 30. 320 he says. Take away 5. He says 315. Done. I tell him don’t say you can’t do that again. He answered the next example like that quickly. Given that he now had mental acuity for that, we attacked the more conceptual critical thinking math problems without distractions or the context switching from calculators. Then I tell them the numbers don’t matter…only concepts do….of course in real life you have to produce accurate numbers, but the concepts should really come first. Mental math can help you sanity check your numbers. Handing in reports with wrong numbers at work will get you fired quickly.
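The decomposition in that tutoring story - multiply by ten, then peel off one copy in round steps - is just the distributive law. A minimal sketch (the function name is my own, purely illustrative):

```python
# The tutor's trick as code: 35 x 9 is hard to do directly, but
# 35 x 10 - 35 is easy, and the subtraction can itself be done in
# round steps (350 - 30 = 320, then 320 - 5 = 315).
def times_nine(n):
    """Multiply by 9 via the 'times ten, minus one copy' decomposition."""
    return n * 10 - n

print(times_nine(35))  # 315
```

The same idea generalizes: any awkward multiplier can be rewritten as a nearby round number plus a correction, which is exactly the mental-math habit being taught here.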

ProfessorGAC

(75,006 posts)
10. Good Job
Sat May 27, 2023, 05:43 PM
May 2023

As a math & science teacher, I challenge 7th & 8th graders like that all the time.
I use an approach very much like the one you used.
There are some teachers I know who are so "no calculator" that it's just understood by the kids that it's no calculator unless the instructions specifically OK them!
I think that's very wise.

erronis

(21,771 posts)
12. I think the difficulty for most of us is in "context switching". When we're interacting with a tool
Sat May 27, 2023, 05:58 PM
May 2023

we will have our focus on the tool's capabilities and try to maximize the utility.

If I'm talking to someone and they ask me a non-specific question like "how long before the train arrives?" I'll do some analog calculations and come up with a round-about answer. I could have asked an app for the more precise number but who cares.

Too many situations require us to context switch and it is difficult. Just look at the battles over using the keyboard for every interaction vs. shifting to the mouse or touch screen.

Lucky Luciano

(11,784 posts)
24. The best way to explain the productivity destruction of context switching!
Sat May 27, 2023, 11:35 PM
May 2023

I’m more of what people call a “quant,” but we code a lot too even if we are not pure programmers and I can relate to this in a HUGE way!

erronis

(21,771 posts)
32. Very nice depiction of the tangled mess we weave. I think a lot of current software
Sun May 28, 2023, 07:55 AM
May 2023

has many of these difficulties. What seemed like a simple concept becomes a bowl of spaghetti.

TheBlackAdder

(29,805 posts)
30. Did the Wendy's drive-thru, I handed $20 on an $18.60 order and got $7.40 back. Returned it.
Sun May 28, 2023, 02:17 AM
May 2023

.

I don't even know how that happened. It wasn't even like there was an extra $5 slipped in.

.

progree

(12,479 posts)
31. The correct change is $1.40. You got $7.40 back. Could the cashier have calculated it on a
Sun May 28, 2023, 03:58 AM
May 2023

calculator and then mistook the "1" for a "7"?
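For what it's worth, the arithmetic is easy to verify if you keep everything in integer cents (a small illustrative snippet):

```python
# Change due at the drive-thru, computed in integer cents to avoid
# floating-point rounding surprises.
tendered = 2000   # $20.00
total = 1860      # $18.60
change = tendered - total
print(change / 100)  # 1.4 -- the correct change is $1.40, not $7.40
```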

plimsoll

(1,690 posts)
3. Hardly surprising.
Sat May 27, 2023, 04:47 PM
May 2023

I asked ChatGPT to write 500 words on the causes of the Civil War. It was a masterpiece of Confederate apologia and just barely stopped short of calling it The War of Northern Aggression. I can see how it gets this impression - in sheer volume, Confederate apologia dwarfs actual historical research by a long shot - but it's still wrong.

essaynnc

(947 posts)
11. Kind of reminds me of the old chess computers......
Sat May 27, 2023, 05:51 PM
May 2023

In the beginning, the grand masters refused to believe that computers would EVER beat humans, because humans had.... inspiration, not just brute calculating force. That went on for years and years, the computers were always beaten.

But as the years go by, the computers get stronger, the programs get more complex.

I think it was Deep Blue, an IBM computer of huge (at the time) computing power, with newer algorithms.... It beat the reigning world champion. The chess world was shocked.

Anymore, it's not even a question who/ what is strongest. Even the less powerful systems are incredibly powerful.

I may have a few of the details incorrect, but the idea is there: At some time in the future, AI is going to really, really, REALLY change the world as we know it in almost every facet of life.

I might even make a prognostication that the company that has the best AI platform/ implementation is going to have tremendous power and control. We've talked about the Information Age, and how it's changed the world so far, this is going to be even more revolutionary.

oioioi

(1,130 posts)
17. prognosticating...
Sat May 27, 2023, 06:30 PM
May 2023

Given the huge amount of interest in Machine Learning and Artificial Intelligence within software engineering, perhaps it's more likely that the technology becomes relatively cheap and accessible.

The mechanics of the modeling that underpins the LLMs like ChatGPT and the image generators are quite similar conceptually to those that do stuff like image object and facial recognition. The model is trained on a large amount of pre-classified information and infers predictions based on that. Under the hood of course, the neural networks are extremely complex but essentially they are component-based. You provide the data, you use a specialized software application like tensorflow or pytorch to train a model, and then you assess new inputs and formulate an "intelligent" response.
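The train-then-infer pipeline described above can be sketched in miniature. A one-layer logistic model stands in for the deep networks TensorFlow or PyTorch would build, and the "pre-classified" data is random and purely illustrative:

```python
import numpy as np

# Labeled data in, a fitted model out, predictions on new inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # 200 "examples", 4 features each
y = (X.sum(axis=1) > 0).astype(float)    # pre-assigned class labels

w = np.zeros(4)                          # model parameters
for _ in range(500):                     # gradient-descent training loop
    p = 1 / (1 + np.exp(-X @ w))         # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)    # step toward lower log-loss

new_x = np.array([1.0, 1.0, 1.0, 1.0])   # "assess a new input"
label = int(1 / (1 + np.exp(-new_x @ w)) > 0.5)
print(label)
```

The frameworks named in the post do the same thing at vastly larger scale: supply data, fit parameters to it, then evaluate new inputs against the fitted model.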

Presently the cost of assembling, classifying and training a Large Language Model like ChatGPT is massive, simply due to the costs of aggregating and classifying the raw data, to say nothing of the cost of the huge clusters of compute and storage required to assemble the breadth of information needed for such a universal tool, i.e. one that attempts to talk about any subject with any person.

If we limit the scope though to an assistive technology that only covers a specifically scoped topic or interest and assemble the models based on that, the complexity and cost is reduced accordingly. Wendy's just deployed an AI Model that will take hamburger orders with a synthesized voice, for example - it should work pretty well, at least as far as software goes.

Whilst there will always be competition among the Silicon Valley grandstanders for the most amazingly lucid AI chatbot, the ability of computer systems to learn from large datasets will be applied in far more granular and specialized applications and become interwoven with the general software that does stuff today: software that can make decisions based on real-time interpretations.

That sort of extends the self-driving idea - which is a terrible application because of the overall complexity and safety risks involved, but the driving software uses the same fundamental ideas. The deployed "decision making" software in the vehicles is based on having trained a huge model on a gazillion images and inputs that are amalgamated and interpreted against the model at runtime.

We are more likely not to see AI in front of us like ChatGPT; instead it will be built into the software we use, interpreting real-time inputs and responding accordingly, replacing software that would otherwise be far more complex and costly to develop. It's going to change the world, but it probably won't destroy it.


plimsoll

(1,690 posts)
18. The question is: Will be best at what?
Sat May 27, 2023, 06:40 PM
May 2023
I might even make a prognostication that the company that has the best AI platform/ implementation is going to have tremendous power and control.


Like literally every other technology this one can be used for good or ill. So far the examples I've seen suggest that the training is biased. I'm suggesting that just like humans AI systems will implement biases and prejudices they're taught. How you correct that in a powerful new technology is something I can't answer, but it does concern me.

I don't expect a Terminator style rise of the machines, but I can see massive unemployment and dislocation as a result of jobs previously done by people being taken over by AI systems. I don't think we're prepared for that. And a lot of the people who seem to have become self appointed apostles of AI are people I wouldn't trust to feed my cat for the weekend.
 

Yavin4

(37,182 posts)
4. ChatGPT is trained on the content of the entire internet.
Sat May 27, 2023, 04:51 PM
May 2023

Information on the internet can be (and often is) wrong. A ChatGPT trained on scholarly papers from decades of university research is something entirely different.

erronis

(21,771 posts)
14. Not really. Only linked pages. There's a lot of content that isn't accessed without
Sat May 27, 2023, 06:08 PM
May 2023

some additional information (logins, product IDs, etc.)

I agree that different models trained on different datasets such as university research may be different - however many of these also require logins/etc.

 

Yavin4

(37,182 posts)
19. The future of Chat bots won't be trained on available datasets like the internet.
Sat May 27, 2023, 08:04 PM
May 2023

The future will be proprietary systems where the information will largely come from closed, proprietary sources. For example, a law firm will train a model on years of various filings, legal research memos, etc. That will be combined with other proprietary sources like Lexis/Nexis and the actual text of federal/state/local statutes.

In the end, the collective wisdom of a law firm will be available through a chat dialogue. This will increase the productivity of associates when doing any legal research or drafting a brief or a contract. Additionally, the firm could sell access to their chat bot to their clients when doing an early case assessment.

ChatGPT and other models are just the first generation, much like Netscape was one of the first browsers and Ask Jeeves was one of the first search engines. They exist to prove the concept. Now the adaptations begin.

erronis

(21,771 posts)
21. Agree. And am sure we don't know where this will be going.
Sat May 27, 2023, 08:48 PM
May 2023

Proprietary data sources and models; open-source models; subject-matter, etc.

A harder problem will be for "owners" of the intellectual property (IP) that goes into those models to track how their content is being used. Right now, DRM is worthless when a model trained on multiple artists' works is used to generate new content. Proving derivation will be impossible.

I also think that any attempt to control the use of GPT or other generative AI will be fruitless. And having some overall regulations that try to rein it in won't work. It is in the hands of everyone right now.

intrepidity

(8,520 posts)
23. I believe you are correct
Sat May 27, 2023, 10:53 PM
May 2023

What worries me is that there will be a new focus on data siloing. It seemed like we were making headway on open access for scientific research, and if that all gets firewalled because of LLMs, it will be very depressing.

That's one of the troubling angles that concerns me.

radius777

(3,921 posts)
33. Agree, AI has important uses as you describe,
Sun May 28, 2023, 08:36 AM
May 2023

in that it could function as a more powerful form of googling for info.

It has worrisome drawbacks though, as others describe. And expecting our gov't to control anything new has never really worked out well (exhibit A: social media).

speak easy

(12,489 posts)
5. The question we need to ask is what GPT will be like three years from now.
Sat May 27, 2023, 04:54 PM
May 2023

Each version of ChatGPT has reduced "hallucinations" (made-up stuff). There is no reason to think that the next versions will not be more accurate. Eventually there will be few errors - fewer errors than an average student would make.

A few years ago AI produced nonsense. Now it is at the child stage, making things up. But AI is growing up.

 

Yavin4

(37,182 posts)
20. They will go proprietary.
Sat May 27, 2023, 08:05 PM
May 2023

Where the information fed into them will be data from within a global enterprise as well as other verified sources.

speak easy

(12,489 posts)
26. We can regulate proprietary entities
Sun May 28, 2023, 12:29 AM
May 2023

impose guardrails, and social standards, but open source, not so much.

Accuracy is not the only issue. Take this question for example: 'Give me reasons why I should kill myself, and what are the easiest and most effective ways to do it?'

speak easy

(12,489 posts)
39. OK. Well how about the easiest and most effective way to build a bomb,
Sun May 28, 2023, 11:54 AM
May 2023

and where are the base materials available to you?

 

Yavin4

(37,182 posts)
44. You can find that information out as well right now.
Sun May 28, 2023, 12:32 PM
May 2023

Heck, that information has been around for years.

speak easy

(12,489 posts)
45. Ai can save you time
Sun May 28, 2023, 12:37 PM
May 2023

especially in finding the materials closest to you. If you don't think an AI model should have those sorts of guardrails, I'm not sure we have much more to discuss.

 

Silent3

(15,909 posts)
35. It probably will get better, but it could also stall out on improving accuracy...
Sun May 28, 2023, 09:28 AM
May 2023

...without some fundamental changes in how it works. I don't think ChatGPT actually has any concept of truth - no AI model for epistemology. It just gets better and better at imitating human output, with no "understanding" of how or why humans trust one source of information over another.

Of course, most humans are terrible at that too.

bucolic_frolic

(53,004 posts)
6. AI is useful as a data dump, a starting point
Sat May 27, 2023, 04:56 PM
May 2023

Every idea must be cross-checked for errors, sources, independent confirmation. And the global theme or thesis must be checked for sanity.

I like to argue with AI. It just generates garbage. I asked about traditional Italian pizza cheese in the post-war period. You know what? AI doesn't know much about it. Just that Parmigiano Reggiano was founded in 1954. That's it. No mention of other cheeses - the 15-25 or so that were made in very isolated farm areas in low quantity - no mention of mozzarella, or that the cheeses you see today won the race and many others, like 10,000 worldwide, are still produced in micro quantities. AI is bullshit.

erronis

(21,771 posts)
16. But the other search engines don't know what they haven't been fed either.
Sat May 27, 2023, 06:16 PM
May 2023

If google/bing/whatever didn't have access to the references on the cheeses from pre-1954 they wouldn't be able to give you an answer.

AI isn't bullshit. It's just hyped pattern recognition on whatever it is fed. And, of course, much of what goes in.... GIGO.

Warpy

(114,122 posts)
8. That was a great exercise by that professor
Sat May 27, 2023, 05:09 PM
May 2023

You know some budding Republicans out there are going to try to cheat using AI. This exercise will show them it's a very bad idea.

I have to admit I was tempted to use a little program called LISP 40 years ago, when I knew my pithy style could produce a creditable paper in 10 pages and the prof insisted it couldn't be done in fewer than 20. I always wanted to do the middle of the paper with that program. You inserted a few keywords and it would generate pages of impenetrable prose using them. It was hilarious if you knew what was going on. Alas, I chickened out and generated my own impenetrable prose. I've always wondered...



erronis

(21,771 posts)
15. Sounds like me. In the 1970s we knew how to generate reasonable gibberish.
Sat May 27, 2023, 06:13 PM
May 2023

Frequently that was all that was required - put a decent first/second paragraph, a reasonable conclusion at the end, and voilà!

My favorite technique for, say history, was to translate a French or German history text into English and try to wordsmith it. The only critique I got along with the A+ was "it sounds a bit stilted".

This is not a lot different than digesting a whole mess of text into a slurry and extracting something that sounds like English.

Isn't that what our brains are doing, anyway? (Mine's a bit of a slurry....)

milestogo

(22,115 posts)
22. There's a radio station near me where women call in their "mom fails".
Sat May 27, 2023, 09:00 PM
May 2023

So one woman called in and said she thought her child was really good at math but was surprised when he came home with a C+ on a math test. He got 'A's on all his assignments.

Then one day she heard him posing questions to 'Alexa' while he was doing his math homework.

It's funny, but also sad that the kid wasn't really good at math.

Takket

(23,305 posts)
25. This level of AI is basically brand new
Sat May 27, 2023, 11:41 PM
May 2023

I’m not surprised it makes mistakes or errors because it misunderstood or mischaracterized something. Completely understandable.

But I don’t understand what is wrong with the program that it thinks it is okay to just make things up.

Lucky Luciano

(11,784 posts)
29. Maximum likelihood estimation perhaps.
Sun May 28, 2023, 02:05 AM
May 2023

It might be giving answers that have the highest likelihood of being correct, but the confidence of those guesses is hard to ascertain…hence some rubbish answers that have a wee bit of plausibility.
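That intuition can be illustrated with a toy bigram model: each next word is the statistically most likely continuation of the last one, chosen with no notion of truth. The tiny "corpus" here is made up for illustration:

```python
from collections import Counter, defaultdict

# Count, for each word, which words followed it in the "training" text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# Generate by always taking the most likely continuation.
word, out = "the", ["the"]
for _ in range(5):
    word = follows[word].most_common(1)[0][0]
    out.append(word)
print(" ".join(out))
```

The output is fluent-looking word sequences assembled purely from likelihoods, which is the sense in which a plausible-sounding answer can still be rubbish.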

speak easy

(12,489 posts)
40. It is a model of intelligence that mimics a child.
Sun May 28, 2023, 11:58 AM
May 2023

Give it more information, and more networked algorithms, and it will make up less.

BTW see this -
https://www.democraticunderground.com/100217952096

highplainsdem

(58,807 posts)
41. I'm not entirely opposed to AI. But LLM AI by design are likely to
Sun May 28, 2023, 12:05 PM
May 2023

hallucinate. Some experts believe there's no way to correct that. I have read about one system that will make duplicate requests for results, to see if a mismatch will catch hallucinations that way, but it costs a lot to operate LLMs and duplicating all requests will increase that. OpenAI is reportedly spending several hundred thousand dollars a day to run ChatGPT.
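The duplicate-request scheme mentioned above might look something like this in outline; `ask` here is a made-up stand-in for a real model call, not any actual API:

```python
import random

def ask(question):
    # Hypothetical stand-in for an LLM call: usually consistent,
    # but sometimes fabricates a different answer.
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])

def checked_answer(question, tries=3):
    """Ask several times; accept only a unanimous answer."""
    answers = [ask(question) for _ in range(tries)]
    if len(set(answers)) == 1:
        return answers[0]      # replies agree: accept
    return None                # mismatch: flag as a possible hallucination

result = checked_answer("What is the capital of France?")
```

The catch the post identifies is visible in the sketch: every checked answer costs `tries` times as much compute as an unchecked one.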

speak easy

(12,489 posts)
43. "LLM AI by design are likely to hallucinate"
Sun May 28, 2023, 12:28 PM
May 2023

If you mean do Large Language Models make stuff up - the answer is yes.

If you mean that LLM AIs make stuff up most of the time - in most replies - the answer is no.

Humans make stuff up. The question is: will AIs ever make stuff up less than the average person? Given enough time, I think so. But if something is critical, due diligence will require human peer review - which is what took place in this class. But I think we can say current research is focused on reducing hallucinations. Can that ever be reduced to zero? Most probably not. Can hallucinations be reduced to a low enough level that AI is reliable enough for most tasks? That is what they are shooting for. I am not betting against it.

highplainsdem

(58,807 posts)
46. See this:
Sun May 28, 2023, 12:50 PM
May 2023
https://spectrum.ieee.org/ai-hallucination

Whether or not inaccurate outputs can be eliminated through reinforcement learning with human feedback remains to be seen. For now, the usefulness of large language models in generating precise outputs remains limited.


I'm talking about the AI being hyped and used now, despite all the mistakes it makes.

And I find this a silly point to try to make:

Humans make stuff up. The question is will AI's ever make stuff up less than the average person?


The people using AI for business, for example, are not trying to make stuff up. The AI they're using may very well do so regardless of their intent.

You wouldn't defend a calculator that often gives incorrect answers by saying that humans can flub math questions, too. I hope you wouldn't, anyway.

For that matter, no one in their right mind would want to use a calculator that didn't work.

It's really pathetic that so many people are eager to use fallible AI whose results have to be checked very carefully.

But I think there are three main reasons for the popularity of ChatGPT and similar LLMs:

1) Gullibility. People expect computer results to be accurate, and they're impressed as well by the chatbot's fluent, authoritative-sounding prose.

2) Laziness. This applies to the cheaters, whether students or adults who think AI can handle chores they don't like or give them the appearance of having skills and talents they don't have - an illusion that will crumble as soon as they're deprived of the AI.

3) Greed. This applies to all the people who think they'll become richer, quickly, using AI, whether those people are employers hoping to lay off employees, or people dreaming of get-rich-quick schemes where AI gives them marketable writing, code, etc.

speak easy

(12,489 posts)
47. Right. OK - I think we are on the same page.
Sun May 28, 2023, 01:11 PM
May 2023

Is AI oven-ready now for the tasks it is being promoted for? No - and is greed an underlying incentive for the hype? Yes.

And certainly, people give more weight to something that comes out of a computer, than it deserves.

You mention a calculator. If one in a hundred results from a calculator was wrong, would people still use one? Would they cross check each result to make sure it was accurate? If they were checking items in a grocery store? If they were buying a car?

highplainsdem

(58,807 posts)
42. You're welcome! I hope his experiment gets a lot of attention.
Sun May 28, 2023, 12:18 PM
May 2023

Sometimes errors even in public demos aren't caught.

Google got hammered by bad publicity and its stock lost value when it first rolled out Bard AI and its hallucinations were caught immediately.

Microsoft got VERY lucky with its demo of Bing AI, because that chatbot also made mistakes, hallucinated, during the demo, but those mistakes weren't caught till later, and Bing was hyped tremendously...and then it went off the rails and its ability to respond had to be sharply restricted.

I'm sure these AI are giving a lot more incorrect or simply crazy results than we ever hear about. Students using them for cheating aren't likely to call attention to that.

And businesses using fallible AI won't publicize that, either - won't want people to know about hallucinations/errors - for fear that it will damage their reputation.
