Different Approach to Language in OLab

Authors:

David Topps, Corey Wirun, Mahdi Husseini, Michelle Cullen

Not ChatGPT

How is the approach we are taking with conversational agents in OLab different from ChatGPT?

Over the past decade, we have explored a variety of approaches for incorporating natural language understanding into OLab.(1–3) Indeed, there is a long history in virtual patients of trying to introduce natural language. Our stance is that, while this is engaging (and cute) at first sight, there are generally only a few areas in any given scenario where constructed responses are important (https://olab.ca/constructed-responses-in-olab/).

Our work with TTalk since 2013 has shown just what can be done with a simple chat-based interface, linked to the powerful virtual scenario engine in OLab. This has been shown to be cost-effective, scalable and extensible, with high learning impact. But it does depend to a degree on a human element, which is both a strength and a limitation.

More recently, in our DFlow-related work, we have been incorporating more intelligent conversational agents in a manner that is limited in both scope and risk. Given the most recent developments with Microsoft’s AI Bing and ChatGPT, we are glad we have been cautious.(4) It would have been disastrous to unleash an unfettered ChatGPT in certain high-risk scenarios.

Part of what has made OLab and TTalk so effective in the past ten years is our success in creating scenarios that present a safe space, or more accurately a brave space (somewhere you can be brave enough to try new things), that shields learners from toxic risks and outcomes.

The Eloquent Ignoramus

We have all encountered students and colleagues who are full of confidence, wonderfully eloquent on any topic, but who bluster their way through a discussion with very little grasp of the problems and context at hand. (As one colleague remarked recently: a politician in a box.)

This has been our experience with ChatGPT when unleashed as a simple search prompt (as in AI Bing; see ChatGPT Transcript Examples). In the past month, there has been a litany of amusing examples of what can go wrong when ChatGPT is let loose with no supervision.

FACETS and Factored Cognition

Our work over the past four years with cognitive computing platforms, such as IBM Watson, has shown us that these AI-related platforms can be very useful in supplying previously unavailable services. Rather than trying to adopt a one-size-fits-all approach, we have used these services as add-ons and helpers for OLab.

Our current approach, Factored Agents for Cognitive Educational Tasks (FACETS), treats these services like widgets that can be plugged into a virtual scenario and called when needed. We are working on FACETS to assist in a number of cognitive tasks, but the chief focus of this article is natural language understanding.
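As a rough illustration of the widget idea, the sketch below registers small single-purpose helpers by name and lets a scenario node call one only when it needs it. All names here (FacetRegistry, keyword_match, the sample learner text) are hypothetical and are not taken from OLab; this is a minimal sketch under those assumptions, not a description of how FACETS is implemented.

```python
from typing import Callable, Dict

# A facet is a small, single-purpose helper: text in, result out.
Facet = Callable[[str], dict]


class FacetRegistry:
    """Hypothetical registry that holds facets by name."""

    def __init__(self) -> None:
        self._facets: Dict[str, Facet] = {}

    def register(self, name: str, facet: Facet) -> None:
        self._facets[name] = facet

    def call(self, name: str, text: str) -> dict:
        # A scenario node asks for a facet by name only when it needs one.
        if name not in self._facets:
            raise KeyError(f"no facet registered as {name!r}")
        return self._facets[name](text)


def keyword_match(text: str) -> dict:
    """Toy language-understanding facet: spot a few expected terms."""
    expected = {"chest", "pain", "breath"}
    words = {w.strip(".,?!").lower() for w in text.split()}
    return {"matched": sorted(expected & words)}


registry = FacetRegistry()
registry.register("keyword_match", keyword_match)

result = registry.call("keyword_match", "Do you have any chest pain?")
print(result)
```

Because each facet sits behind the same plain function signature, one helper could in principle be swapped for another without touching the scenario that calls it.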

This Factored Cognition approach is also being explored by other groups who are working with Large Language Models to assist in natural language understanding.(5) A recent development in academic search tools, Elicit, is an excellent example of how a factored cognition approach can greatly accelerate the literature search process.

As with many of these tools, a first stab at Elicit is sometimes underwhelming: while it can produce one-line summaries of a paper, these summaries should be treated with caution. However, when used properly, Elicit can produce effective results in 15 minutes that would normally take 15 hours with traditional literature search tools.

For those who work in this area, it is instructive to delve into how Elicit uses various AI-based language models: it is a very layered approach, making best use of the different levels and scope of these tools, to assemble a collection of information that is useful and usable.(6) Their research group is very open about how they engage these different layered functions and is open to collaboration on maximizing the utility of these services.

A Driving Analogy

It is useful at this point to employ an analogy from car driving to illustrate how these different forms of assistance can make us safer on the road. Few cars are left on the road that do not have some form of ABS built into them. Increasingly, we are seeing in high performance models a very high degree of sophistication in smart traction control and intelligent brake-assist that take into consideration multiple factors such as road grip (at each wheel), steering angle, throttle, engine RPM, brake pedal pressure, temperature and roll angle. On our SUV, these factors are computed 50,000 times per second – that is a little bit better than pumping the brakes! And it can save your bacon if you are surprised by a moose on Highway 40.

As a comparison, take the vaunted claims made by Tesla for its Full Self-Driving (FSD) option. While this is a step forwards, it is clearly unsafe to imply that it can do everything (something that Musk and colleagues are likely to find expensive in the lawsuits currently being pursued in California(7)). In reality, while it can provide some clever assistance in certain situations, it is crucial not to let it drive unattended.

This is the same situation at present with the use of ChatGPT. It has great promise… but not as great as the promises being made. Microsoft is currently rather embarrassed.(8)

Layered approach with PODDS

We hope that this helps to illustrate why we are taking such a layered approach with PODDS. At the central core, or kernel, of PODDS, we are using our tried and trusted TTalk approach which has proven so effective over the past decade.(9)

Around this kernel, we are layering on various other FACETS, the factored cognition layers that can be employed in a context that is appropriate to the risk and complexity of the task at hand. We almost consider these to be Seed PODDS: they put forward the germ of an idea to learners and introduce some basic concepts, so that their time with the facilitator in TTalk is made far more useful.

These are lightweight conversational agents with limited scope based on DFlow and similar technologies, with a smooth handoff to TTalk as the conversation gets into more nuanced areas where phrasing and context assume much greater importance.
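The handoff described above can be pictured as a simple router: a limited-scope agent answers only when it recognises the learner's input with enough confidence, and otherwise the turn is passed to the human facilitator. The intent table, the threshold value, and the function names below are assumptions made for illustration; they do not reflect the DFlow or TTalk interfaces.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical scoped intents: trigger word -> canned reply.
INTENTS = {
    "hello": "Hello, I am the clinic assistant. What brings you in today?",
    "appointment": "I can note that you would like an appointment.",
}

CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off, chosen for illustration


@dataclass
class Turn:
    handled_by: str  # "agent" or "facilitator"
    reply: Optional[str]


def score(text: str, trigger: str) -> float:
    """Crude confidence: 1.0 if the trigger word appears, else 0.0."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return 1.0 if trigger in words else 0.0


def route(text: str) -> Turn:
    """Answer in scope, otherwise hand off to the human facilitator."""
    best_trigger, best_score = None, 0.0
    for trigger in INTENTS:
        s = score(text, trigger)
        if s > best_score:
            best_trigger, best_score = trigger, s
    if best_trigger is not None and best_score >= CONFIDENCE_THRESHOLD:
        return Turn("agent", INTENTS[best_trigger])
    return Turn("facilitator", None)  # nuanced input goes to a person


simple = route("Hello there")
nuanced = route("I am scared to tell my family about the diagnosis")
print(simple.handled_by, nuanced.handled_by)
```

A real agent would use a proper intent classifier rather than word matching, but the design choice is the same: the automated layer declines anything outside its narrow scope instead of improvising.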

This ability to layer up and down, using the appropriate complexity of tool where needed, is far more flexible and extensible than expecting the Large Language Model to do everything. FACETS developed for one topic area can be more easily repurposed for other topic areas: think of them as Lego blocks in cognitive computing.

References

1. Cullen M, Sharma N, Topps D. Turk Talk: human-machine hybrid virtual scenarios for professional education. MedEdPublish [Internet]. 2018 [cited 2018 Nov 22];7(4). Available from: https://www.mededpublish.org/manuscripts/2062
2. Topps D, Cullen M. Turk Talk hybrid natural language processing (NLP) virtual scenario example [Internet]. OpenLabyrinth, editor. Harvard Dataverse; 2019. Available from: https://doi.org/10.7910/DVN/66HUCD
3. Cullen M, Topps D, Consortium OlD. Turk Talk for nursing: case series [Internet]. Calgary: Scholars Portal Dataverse; 2020. Available from: https://doi.org/10.5683/SP2/S25FUY
4. The creepiness of conversational AI has been put on full display [Internet]. Big Think. 2023 [cited 2023 Feb 25]. Available from: https://bigthink.com/the-present/danger-conversational-ai/
5. Stuhlmueller A, Jungwon B. Factored Cognition | Ought [Internet]. [cited 2023 Feb 18]. Available from: https://ought.org/research/factored-cognition
6. Stuhlmueller A, Jungwon B. Elicit: Language Models as Research Assistants [Internet]. [cited 2023 Feb 18]. Available from: https://www.lesswrong.com/posts/s5jrfbsGLyEexh4GT/elicit-language-models-as-research-assistants
7. Eisenstein PA. Class-Action Suit Accuses Tesla Of ‘Deceptive’ And ‘Misleading’ Claims About Autopilot, Full Autonomy [Internet]. Forbes Wheels. 2022 [cited 2023 Feb 25]. Available from: https://www.forbes.com/wheels/news/class-action-suit-tesla/
8. Kim T. Microsoft’s Bing AI Chatbot Fails to Live Up to Its Own Hype [Internet]. [cited 2023 Feb 25]. Available from: https://www.barrons.com/articles/microsoft-bing-ai-chatbot-fails-live-up-hype-dd95caa6
9. Cullen M, Topps D. OSCE vs TTalk cost analysis [Internet]. Calgary: OHMES, University of Calgary; 2019 [cited 2019 May 7]. Available from: https://dataverse.scholarsportal.info/dataset.xhtml?persistentId=doi%3A10.5683%2FSP2%2FRJXRWC