Understanding AI and Large Language Models: Spiders Webs and LSD.

The following light-hearted script was for an evening talk at the London Stock Exchange for the Enterprise Technology Meetup in June 2023. The speech is based on research with Dr Roser Pujadas of UCL and Dr Erika Valderrama of Umeå University in Sweden.

—–

Last Tuesday the news went wild as industry and AI leaders warned that AI might pose an “existential threat” and that “Mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war”[1]. I want to address this important topic, but I also want to paint my own picture of what I think is wrong with some of the contemporary focus on AI, and of why we need to expand the frame of reference in this debate to think in terms of what I will term “Algorithmic Infrastructure”[2].

But before I do that I want to talk about Spiderman. Who has seen the new Spiderman animated movie? I have no idea why I went to see it since I don’t like superheroes or animated movies! We had childcare, didn’t want to eat, so ended up at the movies, and it beat Fast and Furious 26… Anyway, I took two things from this – the first was that most of the visuals looked like someone was animating on LSD, and the second was that everything was connected in some spider’s web of influence and connections. And that’s what I am going to talk about – LSD and spider’s webs.

LSD … Lysergic acid diethylamide – commonly known to cause hallucinations in humans.

Alongside concerns such as putting huge numbers out of work, spoofing identity, and affecting democracy through fake news is the concern that AI will hallucinate and so provide misinformation, or just tell plain falsehoods. But AIs like LLMs haven’t taken LSD – they are just identifying and weighing the erroneous data they were supplied. The problem is that they learn – like a child learns – from their experience of the world. LLMs and reinforcement learning AI are a kind of modern-day Pinocchio, being led astray by each element of language or each photo they experience.

Pinocchio can probably pass the Turing Test that famously asks “can a machine pass off as a human” – though it’s always dubious whether all computer science professors could pass off as a human.

The problem with the Turing Test is that it accepts a fake human – it does not demand humanity or human-level responses. In response, philosopher John Searle’s “Chinese Room Argument” from 1980 argues something different. Imagine yourself alone in a room, following a computer programme for responding to Chinese characters slipped under the door. You know nothing of Chinese, and yet by following the programme for manipulating the symbols and numerals you send appropriate strings of Chinese characters out under the door, and this leads those outside to mistakenly assume you speak Chinese. Your only experience of Chinese is the symbols you receive – is that enough?

Our Pinocchios are just machines locked inside the room of silicon they inhabit. They can only speak Chinese by following rules from the programme they got – in our case, the exposure of Pinocchio’s neural network to the data it was fed in training.
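For the programmers in the room, here is a toy sketch of Searle's room. The rulebook entries and the fallback reply are invented purely for illustration, and a real LLM learns statistical weights rather than consulting an explicit lookup table, so treat this as a cartoon of the argument rather than a description of how any actual model works.

```python
# A toy "Chinese Room": the occupant matches incoming symbols against a
# rulebook and passes back whatever the rulebook dictates.
# RULEBOOK and FALLBACK are hypothetical, made up for this illustration.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",
    "你会说中文吗？": "会，我说得很流利。",
}
FALLBACK = "请再说一遍。"


def room(note: str) -> str:
    """Return the reply the rulebook dictates for the note under the door.

    Nothing here consults the meaning of the symbols; it is pure matching.
    """
    return RULEBOOK.get(note, FALLBACK)


print(room("你会说中文吗？"))
```

From outside the door the replies look fluent, yet the function never touches what any symbol means. That is the gap the argument points at.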

For an LLM or any ML solution … their “programme” is based on the rules embedded in the data they have ingested, compared, quantified and explored within their networks and pathways. LLM Pinocchio is built from documents gleaned from the internet. This is impressive because language is not just words, but “a representation of the underlying complexity” of the world, observes Percy Liang, a professor at Stanford University – except where it isn’t, I would argue.

Take the word “Love” or “Pain” – what does it actually mean? No matter how much you read, only a human can experience these emotions (and if you are bereaved you realise how little text can prepare you). Can anything other than a human truly understand them?

Or another way, as Wittgenstein argued – can a human know what it is like to be a lion – and could a lion ever explain that to a human? So can our Pinocchios ever know what it is to be a human?

But worse – how can a non-lion ever truly know whether it has managed to simulate being a lion? How can the LLM police itself, since it has never experienced our reality, our lives, our culture, our way of being? It will never be able to know whether it is tripping on an LSD false-world or living in the real, expressed and experienced world.

If you don’t believe in the partiality of data then think of the following example (sorry about this): defecating. We all did it, probably today, but our LLM Pinocchio will never really know that. Nobody ever does it in books or in movies, and seldom in written documents. We all experience it and we all know about it as an experience, but no LLM will have anything to say on it – except from a medical perspective.

This is sometimes called the frame problem, and it is easy to reveal how much context is involved in language (but less so in data).

Take another example – imagine a man and a woman. The man says “I am leaving you!” – the woman asks “Who is she?” You instinctively know what happened, what it means, and where it fits in social conversation. LLMs can answer these questions within the scope of human imagining and human writing – not in their own logic or understanding. My one-year-old experiences the world and lives within it (including lots of pooing) … an LLM does not.

Pinocchios can learn from great-quality data (e.g. playing Go or Atari video games) or poor-quality data (e.g. most data in the real world). Data, like language, is always culturally situated. Choices are made about what to keep, and sensors are designed to capture what we believe in and choose to record. For example, in the UK death records of the seventeenth century (around the time of the plague) you could die of excessive drinking, fainting in the bath, Flox, being found dead in the street, Grief, HeadAche…

So now we need to think about what world the LLM or AI does live in… and so we turn back to Spiderman … or rather back to the spider’s web of connections in the crazy multi-verse universe it talks about.

LLMs and other networks learn from a spider’s web of data.

At the moment most people talk about AI and LLMs as a “product” – a thing – which we interact with. We need to avoid this firm/product-centric position (Pujadas et al 2023) and instead think of webs of services within an increasingly complex API-AI Economy.

In reality LLMs, ML etc. are services – with an input (the training data and stream of questions) and an output (answers). This makes them perfectly amenable to integration into the digital infrastructure of cloud-based services which underpins our modern economy. This is where my team’s AI research is leading.

We talk about Cloud Service Integration as the modern-day enterprise development approach in which these Pinocchios are woven together and configured to provide business services through ever more Application Programming Interface (API) connected services. We have seen an explosion of this type of cloud service integration in the last decade, as cloud computing has reduced the latency of API calls such that multiple requests can occur within a normal transaction (e.g. opening a webpage can involve a multitude of different API calls to a multitude of different service companies, who themselves call upon multiple APIs). This is the spider’s web of connected AI-enabled services, each taking inputs, undertaking very complex processing, and providing outputs. Each service, though, has training data from the past experiences of that service (which may or may not be limited or problematic data), and its output drives the nature of the next.
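A back-of-the-envelope sketch gives a feel for why the web matters more than any single Pinocchio. The error rates below are invented numbers, and the assumption that services err independently of one another is a simplification made for the arithmetic, not a finding from our research.

```python
import math


def chance_of_hidden_error(error_rates):
    """Probability that at least one service in a chain returns a flawed output.

    Assumes (hypothetically) that each service errs independently, so the
    chance that every service is clean is the product of (1 - rate).
    """
    all_clean = math.prod(1.0 - rate for rate in error_rates)
    return 1.0 - all_clean


# Illustrative, made-up figures: each service is wrong one time in a thousand.
single = chance_of_hidden_error([0.001])
web = chance_of_hidden_error([0.001] * 1000)
print(f"one service: {single:.1%}, web of 1000 services: {web:.1%}")
```

Under these assumed figures, a service that is wrong one time in a thousand looks trustworthy on its own, yet a transaction that touches a thousand such services has roughly a 63% chance of carrying at least one flawed output, with nothing in the final answer to say which service produced it.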

So to end, my worry is not that a rogue AI trips out on LSD… rather that we build an algorithmic infrastructure in which it is simply impossible to identify hallucinations, bias and unethical practices within potentially thousands of different Pinocchios within the spider’s web of connected, interlinked services that forms the algorithmic infrastructure.

Thank you.

[1] Statement on AI Risk | CAIS (safe.ai)

[2] Pujadas, Valderrama and Venters (2023) Forthcoming presentation at the Academy of Management Conference, Boston, USA.

Spiderman image (cc): https://commons.wikimedia.org/wiki/File:Spiderman.JPG by bortescristian used with thanks.
