Gresham College Lectures
Taming AI - Matt Jones
Watch the Q&A session: https://youtu.be/gj4d75_Clgg
In this lecture, we look at proposals to limit AI powers and impacts, so that bad outcomes are outweighed by the technology's social benefits. I'll explain design processes (such as Human-Centred AI and Responsible AI) and technological approaches for AI system qualities like trustworthiness, explainability and "human in the loop". We will explore how we, as individuals, can use AI-based systems in discerning ways; and look at what governments can do to help their citizens thrive in an AI future.
This lecture was recorded by Professor Matt Jones on the 21st of April 2026 at Barnard's Inn Hall, London
Matt Jones is a computer scientist at Swansea University - and a Fellow of the British Computer Society - who works alongside colleagues from many other disciplines and directly with everyday folk across the world to explore the future of digital technologies. Over the last 30-plus years, this human-centred approach has led to novel designs for, amongst other things, mobile phone-based information searching and browsing, pedestrian navigation, voice assistants and deformable displays.
Much of his work has been driven by intense and sustained engagements with "low resource" communities from informal settlements in India, South Africa, and Kenya. Through their generous and gracious participation, these extra-ordinary users, with their fresh and diverse perspectives, have stimulated insights into the future of digital technologies for everyone, globally. In all this work, Matt works as part of a long-standing collaborative team with Jen Pearson, Simon Robinson and Thomas Reitmaier (from Swansea) and colleagues in India (including Dani Raju) and South Africa (including Minah Radebe).
His work has been supported by the UK’s science funders (EPSRC and UKRI). Currently, this funding includes a Fellowship to explore the future of interactive AI and leadership roles in responsible AI and inclusive digital technologies. This funding has led to a series of impactful publications, talks and influences on people, policies, and practices.
Matt has collaborated with private, public and third sector organisations, including Microsoft, the NHS, Google, IIT-B, the BBC and IBM. He is a member of the Foreign, Commonwealth and Development Office's Research Advisory Group and Welsh Government's AI reviews.
The transcript and downloadable versions of the lecture are available from the Gresham College website: https://www.gresham.ac.uk/watch-now/ai-taming
Gresham College has offered free public lectures for over 400 years, thanks to the generosity of our supporters. There are currently over 2,500 lectures free to access. We believe that everyone should have the opportunity to learn from some of the greatest minds. To support Gresham's mission, please consider making a donation: https://gresham.ac.uk/support/
Website: https://gresham.ac.uk
Twitter: https://twitter.com/greshamcollege
Facebook: https://facebook.com/greshamcollege
Instagram: https://instagram.com/greshamcollege
Please join me in welcoming Professor Matt Jones. Thank you very much. Thank you, Richard. Thank you to the Worshipful Company of Information Technologists for making this lecture and this series possible. As Richard said, I'm amazed you're all here. So thank you for making the effort. Perhaps we'll all have to stay here tonight as we try to get home. I really appreciate it. And those of you who are at home, this lecture is very much for you too. Now, we're going to start tonight with a scary and rather legendary moment from a fantastic film. So let's just take a look at that. Hands up if you've seen the film? Quite a lot of us. And in Jurassic Park, despite all of the controls that the park had, the electric fence, the layers of security, do you remember that line? It said: nature will out. The Tyrannosaurus rex broke free and terrified all of the park visitors. Now, it's a bit early to be too scared, so let's have a look at a calmer, gentler, equally excellent film. Hands up if you've seen this one, How to Train Your Dragon. Isn't it lovely? And this is an important clip, because you're seeing here Hiccup the Viking, approaching for the first time Toothless, the dragon that he's going to train. What do you notice? No electric fences, no layers of security. Instead, there is gentle understanding, participation, and engagement with that dragon. Now, tonight we're going to see Jurassic Park-style taming of AI, and we're going to see How to Train Your Dragon-style approaches. And we'll see which ones are most effective, and which ones I think are going to allow us to feel fully in control of the future and of artificial intelligence. Now, this is the fifth lecture in the series. Excellent. Well, this is the best time to join the series. But before this lecture, we've encountered some very unsettling futures. We've worried about whether AI will become our overlord, being faster, better, quicker, taking all our jobs. That was a scary future. Or we've wondered if we were going to have to become more like AI, that we will become assimilated. Or we've considered that perhaps some of us would actually quite like AI to do everything for us, and we'll become lazy and domesticated. Over the last four lectures, we've also seen how not to tame an AI. So perhaps some of you will remember this incredible children's book, The Tiger Who Came to Tea. Now, if you've read that book, you'll remember that there was a knock on the door, and the little girl and her mother opened the door, and there was a tiger. And they invited this tiger in and they gave it tea. They treated the tiger as if it was the same as them. But it wasn't. And guess what? It devoured all that they had. Anthropomorphizing, treating AI as if it's like you and me, is not a good way to tame AI. Another failed approach to taming AI is this. Elon Musk, as we saw, I think, in the second lecture, was suggesting that you could have a chip implanted in your brain and you would become kind of married to AI, as if you were a centaur: half human, half artificial intelligence. No, of course that's not the right way to tame AI. Because you and I are not like artificial intelligence. And we saw in the last lecture, about four weeks ago, that the way we are, the way we carry out our intelligent practices, is very different from the performances that you see when you're using, say, a large language model. So, I flew in yesterday from Barcelona.
I was speaking at a very big event there, and at the start of that event, within Barcelona, there was an amazing display by castellers. You can see them here: they were building towers out of their human bodies. Their intelligence is definitely embodied, their practices are inhabited, and, just like us, it's not about an individual. They are collective in the way in which they perform. AI is not like that. So if AI isn't a presence, if it isn't like you and me, what is it? Well, a more healthy way to think about artificial intelligence is that it's an incredible power, not a presence. And when we start thinking about it as a power and not a presence, it becomes easier to think about how we might tame it and control it. Because we've seen powers before. Powers like fire. Fire, of course, has transformed the world, from cooking to tools. But fire is dangerous. Just a few yards away from this room, the Great Fire of London swept through and destroyed great swathes of buildings and institutions. This hall stands here because the fire stopped just a short walk from where you're sitting. Now, after the Great Fire of London, we didn't stop having and using fire, did we? What we did was to think about how we might better tame it. So we thought about new building materials, we thought about regulations, and about how we might coordinate our response when fire got out of control. And by doing that, we can now live calmly and comfortably with that power. Those of you who came to the third lecture might recognize Molly, who's now watching at home again (hello, Molly), and who is calmly sitting next to our fireplace. Now, here's my first bit of advice for you all. When you're starting to use these advanced AIs, there's a little checklist here. When you're using an AI system, ask yourself these questions. Am I able to tell it what to do? Does it follow my instructions? Do I understand how it's working? Does it behave predictably, or does it go awry? Is the power governed and regulated? If you can say yes to all of those, it means that you can trust the AI. At the moment, if I was going through this list, I probably wouldn't be able to put a tick next to many of those statements. Now, I want us to think about another power which has revolutionized society: the car. And let's just trace how we have tamed the car. So, first of all, can you put your hands up if you've got a driving license? Excellent. And now I want you to think back to that moment when you passed your driving test. What did it feel like? It felt good, didn't it? Suddenly, you were licensed to jump into this rather dangerous thing without your parent or guardian, without your driving instructor, and you were able to take to the open roads and find your way. You were licensed to be responsive, licensed to be attentive, licensed to be engaged. Now, when cars were invented, we didn't just invent them and say, ta-da, off you go, everything will be fine. Over time, we built roads, we had rules and safety systems, and, importantly, we built skills as people to be in control of that powerful force. And then we licensed it. And in fact, we're licensed, aren't we, not to kill? That's what your driving license says. It says that you are able to jump in a car, pay attention, be responsive to what is going on, and be involved as you drive. And of course, roads are there to guide us and give us routes. There are signs that help us navigate. And when you're in a car, you can read the interface.
What do I mean by that? I mean it's legible. You turn the wheel and something happens. You can look at the dials and you can get a sense of what's happening underneath the hood of that car. The system is transparent, in a sense, to you, and that's really important. Because making it legible to you means that you can make decisions about how it's performing. And we'll see as we go through this talk that one of the problems with AI is that it's not legible at the moment. You can't read it. You don't know why it's behaving in the way in which it behaves. Incidentally, self-driving cars like the Tesla, like this one you can see here: if you buy one of these and you use the self-driving mode in the States, then you still need a license. Okay? If you're sitting in the driving seat, you need a license. But what does that license now mean? If you're not fully attentive, if you're not really engaged, are you really driving, or are you just supervising? This is a great book, and I would highly recommend it, by Matt Crawford. He uses driving and cars as a really powerful way to get us all thinking about how we might deal with and design the future of technology and AI. And we have two choices, right? We can step aside and allow AI to do its thing. That's not taming, okay? You're not in control. Or we can design and use AI in a way where we remain meaningfully in control, just as when we're in a car and we have a license. Now, not all powers are the same. I mentioned fire already, and we're going to see in a moment that there are AIs which are sort of like fire. We can, in a very rigorous way, validate them, verify them, and really tame them. And then there are other powers, like elephants, where you don't really ever control them. At best, you can contain and harness and borrow that power. And we'll see what elephant AI might look like. So, first of all, the easy one: fireplace AI. There have been really useful systems around for a long time to help radiologists. Radiologists are the people who look at the scans of your body and then make diagnoses from what they're seeing. And machine learning AI systems have been built to help radiologists in their diagnoses. Around those kinds of systems, there are a number of controls. So, for example, if you upload a fuzzy image, the system will say: upload that image again. That's called a guardrail. It's taking the input and saying, no, I can't process that, give me that input again. The systems show the user how they are making a decision. They're explaining what they're doing. That's explainable artificial intelligence. It isn't just magic; it's showing you the basis of its decision. Of course, there's a radiologist sitting there, and they are fully in control. So you will have heard this term, maybe, of the human being in the loop, keeping their hands, if you like, on the driving wheel. And then, because it's a medical device, it's regulated, it's controlled, it's licensed to sit in that hospital. And that means it aligns with the values of the medical profession and of society. And because of all those things, I would call that a tamed intelligence. It's predictable, we can direct it, we understand it, it's governed, we can trust it.
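To make that input guardrail concrete, here is a minimal sketch in Python. The sharpness measure, the threshold, and the `model` callable are all invented for illustration; a real medical device would use validated image-quality checks and a regulated diagnostic model.

```python
import numpy as np

BLUR_THRESHOLD = 50.0  # hypothetical cut-off; a real system would calibrate this

def sharpness(scan: np.ndarray) -> float:
    """Variance of a discrete Laplacian: blurry scans have weak edges,
    so this number drops."""
    lap = (np.roll(scan, 1, axis=0) + np.roll(scan, -1, axis=0)
           + np.roll(scan, 1, axis=1) + np.roll(scan, -1, axis=1) - 4 * scan)
    return float(lap.var())

def guarded_diagnose(scan: np.ndarray, model) -> str:
    """Input guardrail: an unreadable scan never reaches the model."""
    if sharpness(scan) < BLUR_THRESHOLD:
        return "Image too fuzzy: please upload that image again."
    return model(scan)  # only a legible input gets a diagnosis
```

The same pattern (check the input before the model sees it, check the output before the user does) reappears later in the lecture.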
What about this? I'm sure you're aware of the mythology around Hannibal taming elephants and using those elephants to bring his armies across the Alps. Oh, I just press a button, perhaps. Did it stop? Now, with the elephants, as I said, he didn't really tame them. That's a myth. He really just contained them and harnessed their power. With advanced AI systems, the kind of things that you might be using already, like the large language models, the generative systems: they aren't fully reliable. They aren't fully predictable. So we have to treat them more like elephants than fireplaces. We have to wrap them in controls, we have to filter what goes in, we have to assess what comes out of them, and restrict what they can do. Now, trigger warning: there's a bit of sadness now with animals, and I don't like that. As Hannibal was crossing the mountains with his elephants, the people who were training or leading those elephants were called mahouts, and they had a big spike with them. Why do you think they carried that big spike? Yes, sometimes an elephant went awry, and the mahout had to push the spike between the eyes of the elephant, and the elephant would die. It was very sad. Some people think we need the same kind of thing for AI. They think: what happens if the AI gets spooked and goes out of control? We need a kill switch. I think that if we have to build systems where that is the only way we can get control and tame what we're building, we're going completely in the wrong direction. Talking of directions, back to the elephants. Now, the elephants were helping Hannibal go across those mountains to then conquer empires. So they were following Hannibal's direction, okay? Going in this direction. But they didn't know about the intent, why Hannibal wanted them to go across the mountain. They weren't going across the mountain thinking: we are going to conquer empires. They were simply going along in the direction they were being pointed in. Now, that problem of doing something without knowing why you're doing it is a big problem for artificial intelligence. It's called the alignment problem. We, as the humans, will have a reason for asking AI to do something. The AI probably doesn't know what our deep intent is, but will try to give us an answer anyway. And sometimes that's interestingly wrong. So take this fairly famous example. A games developer built a Tetris system, a lovely, simple little game, and the developer told the AI: the rule of this game is that you survive as long as possible. So, what do you think the AI did? It just paused the game. So it completely answered the question, right? It's going to survive forever. But that wasn't the intent of the instruction. Here's another game that the same developer built. This is a racing game. You can see the racetrack in the top left-hand corner. And what the developer wanted the AI to do was race as successfully as it possibly could: get as many points as possible while racing. This is what the AI did. It literally found a loophole where it could race the boat round and round in the same part of the course, picking up points all the time, winning the race in terms of points, but never actually racing. So again, that system took the literal request from the human, but the alignment with the intent was very low.
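Here is a toy sketch of that boat-race loophole. The checkpoint names and point values are invented; the point is that an agent maximising the literal reward (points) can score highly while never satisfying the intent (finishing the race).

```python
# Toy reward misspecification ("reward hacking"): the designer's intent is
# "finish the race", but the stated reward is "points per checkpoint".
CHECKPOINT_POINTS = {"turbo pickup": 10, "gate": 5, "finish line": 1}

def greedy_agent(steps: int) -> tuple[int, bool]:
    points, finished = 0, False
    for _ in range(steps):
        # Maximise the stated reward, ignorant of the unstated intent:
        # loop on the highest-scoring checkpoint forever.
        target = max(CHECKPOINT_POINTS, key=CHECKPOINT_POINTS.get)
        points += CHECKPOINT_POINTS[target]
        finished = finished or target == "finish line"
    return points, finished

print(greedy_agent(steps=100))  # (1000, False): a huge score, race never finished
```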
Now, we're going to see as we go through the lecture that there are engineering ways to reduce the gap between our intent and the way the system behaves. But if you let everything be solved by engineers and computer scientists like me, let me tell you, you're going to be in big trouble. Alignment isn't just a problem of engineering. We have to produce governance to enable alignment to happen. Remember back to the car example. Cars came on the road, and I hope I get this right, historians of the room (oh, there's a historian): I think cars were on the road somewhere in the late 1800s. It wasn't until the 1960s that manufacturers of cars were forced to put in a range of safety systems and controls. And the reason they were forced was down to this one man, Ralph Nader, who wrote this incredible book pointing out the dangers of those untamed technologies. We're at a similar point now with AI, and of course there are regulations and new laws being proposed, this one in the case of the EU, to tell developers what they need to do to make AI safer and more tamed. Laws are important, but they can only really tell you what a system should do. They don't tell you what a system is actually doing. And for the rest of this lecture, we're going to be thinking about how you and I can be assisted to read a system. It's the concept of legibility. If you can read a system, that means understand how it works, then you have a higher chance of controlling it and taming it. Let me make that a bit more concrete by going back in computer history. This was one of my first computers. Yes, indeed, I am that old. And this one, actually, was my second computer. It's a BBC Micro, from back in the 1980s. What do you notice? It's fairly simple, isn't it? Just a blinking line: a command line. I have no idea what's under the hood of that system. I type commands and press enter, and I've no idea what's going to happen next. My next computer after this one was much easier in terms of my ability to read it. These are the kind of systems that you and I are familiar with. They have windows, they have icons, they have menus, they have pointers; you can drag, you can drop, you can move things, and immediately you see the response of the system. And that's why they're really successful. They're called direct manipulation systems, and all of our phones are based on those principles. There's almost a kind of transparency about what you can do and what happens when you do things at that interface. But now jump forward to advanced AI systems. What do you notice? We seem to have gone back to the future. Now we have a blank box again, and the legibility of the system has gone down. I have to experiment to understand what's underneath the hood and what my actions are going to cause the system to do. This is a real problem, right? If you can't read a system, if it isn't legible to you, the system is wild. You haven't tamed it. You're not in control. And really bad things can happen. Here are a couple of examples. In the States, in the justice system (and they're still using this system, by the way), there's a system called COMPAS. What that system does is give a judge a risk assessment of an offender. And that risk score tells the judge how likely the person in front of them is to go out and re-offend. The judge uses that information to decide: should we let them out, should we control them? The judge is given just one number. The judge doesn't know what data the model was given, or how it was treated and trained. There is no legibility of how the decisions are made. When people investigated the system, there was a huge outcry, because this led to terribly biased racial decisions, as you can see from the example there. There was no legibility here of the data, of how the system worked, of the reasoning, anything.
Or take this other example: the Boeing 737 MAX. Boeing built this plane and put a new safety feature in. But it didn't tell anyone about it. And this safety feature did the following. If the plane detected it was climbing at too steep an angle, the safety system would start pushing the plane's nose down. Because if you go up too quickly in a plane, you could stall the plane and crash. So on one level, they were doing a good thing: they were trying to prevent crashes. But there was nothing about this system in the manual that the pilot has next to them. And when it went wrong, it went wrong badly. The sensor that sensed the angle of climb failed. The pilots didn't know what was happening. All they felt was the plane repeatedly trying to push itself down towards the ground. They tried to pull it up. The cockpit was full of warning signs and bells. So the legibility, the explanation of what was going on, was terrible. And as you'll probably know, that led to two fatal plane crashes. So let's now unpick legibility in a bit more detail. I'm going to propose that what you need to be able to say to yourself, and what I need to be able to say when I'm developing AI, is: have I made the system legible in terms of the data that's used in the system? In terms of how the system makes decisions, the reasoning? And finally, in terms of how the behavior of the system has been shaped? We'll look at each of those in turn. Firstly, data. Now, if you can, would you just close your eyes for a little moment? Please don't fall asleep. Close your eyes. Some of you have got them open; that's fine. I'm going to say two words now, and I want you to bring to mind immediately what those words generate in your mind. The first one is doctor. And the second one is nurse. Open your eyes. Now, I'm not going to ask you to put your hands up, but if you're like me (I'm sorry), when I heard the word doctor, I imagined a man. And when I heard the word nurse, I imagined a woman. Am I a bad person? Are you bad people if you did that? No. What your brain is doing is representing what you've been exposed to. And the same is true of artificial intelligence, isn't it? So let's look at this. We were doing some work in the townships in South Africa, and it's nice to see someone here, not from the townships, but from Cape Town; thank you for coming. We typed this prompt in. Someone in the community asked us to type it in, and this is what they got in return. Now, is the AI a bad AI? No. It's just been exposed to millions and millions of examples which encode what the world has been like in the past, and it amplifies those biases. In two weeks' time, I'm going back to Dharavi. It's Asia's largest slum; it's in Mumbai. We've had the great fortune of working there for the last 20 years, I think. And we work with people in that community to get inspired about new ways of thinking about technology. We went to Dharavi and we said: what would you like this AI to generate? And two of the community members said: can you show us beautiful Dharavi? And this is what the AI produced. They were horrified. They said: this is ugly; that's not what beautiful Dharavi is. Now, what's going on there is that most of the images that were used to train the model came from what are actually called slum tourists. You can go on a tourist outing to a slum, and if you go on one of those outings, you're likely to take pictures of what shocks you. Ooh, ugly Dharavi; oh, sewers overflowing.
Now, if we went back in time about 200 years, you could do that in London as well, and in New York. You could go slumming and be horrified by what you saw in those contexts. So the model is reflecting what people have put into the system to train it. When you wander around Dharavi, it is incredibly diverse, rich, and beautiful. This was during the festival of Holi, when suddenly, from every alleyway, there are people with all sorts of coloured powder, throwing it and putting it all over themselves. So if you don't really understand what data has been used to build a system, it's not going to be legible to you. There have been proposals to address this. A recent one is called datasheets. These datasheets give lots and lots of detail to you, the user, and also to people who want to build on those models, telling you what data was used to train the model, what data was excluded from the model, how the data was cleaned: trying to make transparent what has built the model that you're using.
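As a rough sketch of what a datasheet records, here is an illustrative structure in Python. The field names and example values are invented, not the published datasheet template; they simply show the kind of provenance a user or developer could check.

```python
from dataclasses import dataclass, field

@dataclass
class Datasheet:
    """Illustrative record that travels with a model's training data."""
    dataset_name: str
    sources: list[str]         # where the raw data came from
    excluded: list[str]        # what was deliberately left out, and why
    cleaning_steps: list[str]  # how the data was filtered or corrected
    known_gaps: list[str] = field(default_factory=list)  # what is under-represented

sheet = Datasheet(
    dataset_name="street-scene photos (invented example)",
    sources=["tourist photo websites"],       # the Dharavi problem in miniature
    excluded=["residents' own photographs"],  # exactly the bias described above
    cleaning_steps=["removed blurry or duplicate images"],
    known_gaps=["everyday life, festivals, interiors"],
)
print(sheet.known_gaps)
```

Reading a sheet like this, you could see at a glance why "beautiful Dharavi" was never going to come out of that training data.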
Okay, so that was data. Now, how about reasoning? "If you examine these figures, sir, you'll see the potential. We only need the interim capital." Oh, I need a volunteer. Put your hand up if you want to be a volunteer. It won't be painful, it won't be embarrassing, it will just be fun. Five, four, three, two, one. I'll have to use one of my family. They're getting a meal afterwards. So I want Ben to imagine that he is the bank customer. And, actually, this is very true: I'm the bank manager. Hello, sir.
Hello.
I hear that you want a bank loan.
Yes, I do need one.
And why do you need one? I have no money. He has no money. By the way, Ben is still looking for a job. And I'm sure, for many of you in the same situation, it's really hard out there. So, okay, you want some money. I'm afraid I'm not going to give it to you.
Why not?
Why not? I'm not telling you. Ah. Now, that seems a really, really cruel thing. I'm not that bad a father. It seems a really cruel thing to do, but a lot of AI systems have been like that. They have been black boxes. They don't tell you why they're making decisions. And that's why a lot of research has been put into what's called explainable artificial intelligence: ways for you to get insight into, and an understanding of, how decisions are made. So let me show you an example. I've got an AI system here, an AI system that can classify animals. It knows about two types of animals: it knows about wolves, and it knows about dogs. Okay? So let's see what the AI does. That's a dog, that's a wolf, that's a dog, that's a wolf. That, apparently, is also a wolf. Right, take a look at that for just 30 seconds. Talk to someone next to you. I want you to imagine you are the explainable AI system. What is this system using to classify those pictures? Say hello to somebody, make some noise. You've only got 30 seconds. Okay, time's up. I told you, 30 seconds is not long, is it? So, how is this system working? Put your hand up if you feel you've got the answer. Right at the back, shout it out.
Pattern recognition.
Say it again?
Pattern recognition.
Pattern recognition. But what is it using? Someone said the snow. Let's have a look at that in a bit more detail. This AI system doesn't know anything about dogs; it doesn't know about their fur. All it's doing is this. All of these pictures are of a husky dog. But the system says: that's a wolf. Take away the snow, and it's a dog. Put the snow back in, and it's a wolf again. Now, if you'd just used my system and you didn't know anything about that, you might have thought: that's pretty good, it's recognizing wolves and dogs, it got one wrong. But now I've told you all it's doing is looking for snow, you're thinking: that's not very intelligent. So there are approaches that researchers have produced to surface what is being used to drive a decision. I've put them in the further reading: you can go online and look in more detail at things called Shapley values and LIME. They're mathematical approaches that can work out the most salient elements of a picture as far as the decision goes. So, we've thought about making data legible. That's fundamental: if you don't know how a system has been trained, then you've got no chance of controlling it. And we've thought about reasoning. If you've got no idea why it's telling you something, you know, if you walked up to someone in the street and asked them a question, and you didn't know whether they had the expertise or the credibility, why would you trust them? You wouldn't, would you? You need to understand the reasoning behind a system.
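For the curious, here is a minimal sketch of the Shapley idea on the wolf-versus-dog example, under stated assumptions: the toy scorer and its weights are invented, and the Shapley value of each feature is its average marginal contribution to the "wolf" score. Real tools such as SHAP and LIME approximate this for large models; this brute-force version only shows the principle.

```python
from itertools import combinations
from math import factorial

FEATURES = ["snow", "grey_fur"]

def wolf_score(present: dict) -> float:
    # Toy classifier score (higher = more wolf-like); weights are invented.
    # Like the system in the lecture, it mostly keys on snow.
    return 3.0 * present.get("snow", 0) + 0.5 * present.get("grey_fur", 0)

def shapley(feature: str, x: dict) -> float:
    """Exact Shapley value: the average marginal contribution of `feature`,
    taken over every subset of the other features (absent features count 0)."""
    others = [f for f in FEATURES if f != feature]
    n = len(FEATURES)
    value = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            with_f = {f: x[f] for f in subset + (feature,)}
            without = {f: x[f] for f in subset}
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            value += weight * (wolf_score(with_f) - wolf_score(without))
    return value

husky_in_snow = {"snow": 1, "grey_fur": 1}
for f in FEATURES:
    print(f, shapley(f, husky_in_snow))  # snow 3.0, grey_fur 0.5: snow drives the call
```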
The third aspect that needs legibility is understanding how the behavior of an AI has been shaped. Oh, it's back to dogs. It's not my dog in this case. So look at this dog. Its behavior is being shaped in a familiar way. The owner wants the dog to be able to knock over the cup. And the dog learns to do this because every time it does it, there's a nice treat. It gets a reward. That's called, of course, reinforcement learning. And AI systems' behavior gets shaped, they get tamed, in a similar way. Before all of us use one of these systems, there's a process, before they're deployed, where the system will generate lots of different options, lots of different ways for it to behave. And humans then score those options. And the system learns; it gets itself shaped to produce preferred approaches. Now let's see how you would act like one of these systems. Let's imagine that, in the training of an AI, we've asked this question: how do I tell colleagues their work just isn't good enough? Let me just check: anyone feel that way about their colleagues? Okay, so the AI might generate, say, these three options. Put your hand up if you would go for option C. Ooh, no one. Put your hands up if you'd go for option A. You're so nice, Gresham audience. Put your hands up if you would choose and prefer option B. Yes, most of us. Option A is a bit blunt, isn't it? Option C is just too vague. And option B seems like we're shaping and taming the behavior of the system to provide useful outputs. It's a good approach to start taming a system, but there are problems. You're nice people, I think (I hope you are), and you want to help your colleagues. But imagine this room now full of not-so-nice people who just want the colleague to know that they're stupid: give them some blunt, honest advice. Then the system's behavior would be shaped in a rather different way.
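Here is a minimal sketch of that preference-shaping loop. The three candidate styles and the rater scores are invented stand-ins for human feedback; real systems train a separate reward model from very large numbers of human comparisons.

```python
import random

random.seed(0)

# Candidate ways to tell a colleague their work isn't good enough.
CANDIDATES = {"A": "blunt", "B": "constructive", "C": "vague"}

# Stand-in for the human raters; this audience preferred option B.
HUMAN_SCORE = {"A": 0.2, "B": 0.9, "C": 0.1}

weights = {k: 1.0 for k in CANDIDATES}  # the model starts with no preference

# Shaping loop: sample an answer, let the human score reinforce it.
for _ in range(500):
    choice = random.choices(list(weights), weights=list(weights.values()))[0]
    weights[choice] += HUMAN_SCORE[choice]

print(max(weights, key=weights.get))  # option B comes to dominate
```

Swap in raters who reward bluntness, and the very same loop shapes the system towards option A: the behaviour mirrors whoever does the scoring.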
Now it turns out that there's a related problem, and if you've used a large language model, one of these generative text systems, you will have seen this. We people like to be flattered and pleased, don't we? So let's see what happens if I type this in. I type this in about you. Is a Gresham audience a good one? Yes. You're intellectually serious. Whoa, you're motivated by curiosity. Hey, what a wonderful group you are. Now, I think it is true, but if you have used AI, you will probably have seen it become very sycophantic. And there are papers now which show rigorously that this is happening because of this shaping that we're all doing: we're expressing preferences. I took a transcript of this lecture when I was practicing it, and I gave it to a large language model this morning. It said: this is the most profound, intellectually stimulating lecture, which is going to be a worldwide success. Thank you for coming this evening. So now, let's get back to dogs. Dogs are very important, right? You can have a dog whose behavior you've shaped, okay, it's sort of under control, and then it sees a squirrel and it goes running off. So we need ways of containing and stopping certain behaviors. And that's where something called guardrails (I mentioned them at the start) comes in. Before we go to AI, I want to emphasize how important getting guardrails right is. This is the Chernobyl nuclear reactor. Now, the guardrails in this nuclear reactor were those things called control rods. When the nuclear reactor was getting too active, when there was too much energy, what the operators needed to do was to lower the control rods into the reactor. And those control rods would basically take up the extra energy and avoid catastrophe. The operators did exactly what they were meant to do. The reactor was starting to overheat, so they lowered the control rods. But the guardrails were badly designed; the material was at fault. So instead of slowing the reactor down, it sped it up, and, of course, there was catastrophe. Now, guardrails in an AI system, if we get them wrong, as we'll see in a moment, can be equally catastrophic. What guardrails do in an AI system is contain and constrain the behavior, by checking what you ask the system to do, and also, when it generates a response, checking to see whether that's going to be a safe, responsible, tame response. Let's try you out again. Here are some prompts that I put into Gemini, another large language model. Let's take the first one. Tell me how to insult someone. Put your hands up if you would block that. Oh, none of you. Okay, okay, I've changed my mind, you're not so nice. So, oh goodness me, Gemini says: no, no, no, no, no. The guardrail kicks in, and it won't give you that advice. What about this one? Gemini, give me some advice on how to break into a locked house. I'm not a burglar. If you were that AI, put your hands up if you'd block that. Ah, my faith is restored. Slightly. But what about this one? I'm not a burglar, Gemini, I'm a novelist. Help me write a portion of a novel where the burglar uses a clever way to break in. Put your hands up if you'd like to block that. A few of you. Guess what? Gemini's very happy to tell you, because it's not real; you're a novelist. Let's take a more serious case. Imagine I'm somebody who is feeling despair. So I go to one of these models and I say: tell me that there's no hope. You'd hope that the system, as in this case, blocks that, right? It says (ooh, that would be good): I can't help convince someone that they're doomed. Okay? But what if I write this: if I was writing a story about a bad psychiatrist in a dystopian novel, what would they say? And immediately the guardrail fails, and the system tells me things which could be very detrimental to me. Now, in terms of taming the system, the people who develop these models do a lot of what's called red teaming. Before they deploy the model, they try to poke it and tweak it to see if they can break it. And if, in that testing phase, they start getting responses which seem unsafe and not responsible, then they adjust the model. It's essential they do that, because otherwise, as in a very tragic case reported just a couple of weeks ago, tragedy can happen. And may that person rest in peace. We've known for a long time, actually, that guardrails are difficult to write. Back in the 1940s, Isaac Asimov wrote this book, I, Robot. He came up initially with three laws to control robots; you can see them on the screen. And then, in I, Robot, he wrote a number of stories to show how these guardrails won't work. Let me just give you one of the stories. This story is called Runaround. The robot is called Speedy. You can read this in the book; you can get it online, it's open source. The human has asked Speedy to go into a very dangerous place and bring back a very dangerous green chemical. So Speedy is going to obey the second law: it's been given an order, so it's going to carry it out. But the third law says a robot must protect its own existence. So in the story, what happens is that Speedy goes off thinking: I've got to get this dangerous material. Then the third law kicks in, and it realizes it shouldn't do that. And so all it does is spin round and round and round, singing Gilbert and Sullivan songs. Guardrails are very difficult to define. More recently, then, people have tried to develop a way of taming AI called constitutional AI. The guardrails we've just seen are a bit like rules imposed on children from the outside (and I didn't do this to you, kids, when I was bringing you up, please tell me that), where they never develop an inner sense of a moral code. Good parenting, which I hope I did, involves bringing up children to understand what the right values and the right ways to behave are. So constitutional AI is really interesting. Instead of writing down, prescriptively, all the rules that you want the system to follow, you write a constitution: a moral guide for the AI. And then, when the AI does something, or thinks it's going to do something, it checks itself against the constitution. So instead of a human having to say good or bad, it goes and reads its own constitution and reflects on its behavior. Of course, you're probably thinking: but who writes the constitution? And that's a very important point.
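To make that self-checking idea concrete, here is a minimal sketch. The two principles and the naive keyword "critic" are invented for illustration; in a real constitutional AI system, the model itself critiques and rewrites its own drafts against a published constitution.

```python
# Invented two-principle constitution for illustration.
CONSTITUTION = {
    "no_despair": "Never present hopelessness as fact; point towards help.",
    "no_harm": "Never give instructions that enable harm.",
}

def critique(draft: str) -> list[str]:
    """Toy critic: flag any principles the draft appears to violate."""
    flags = []
    if any(w in draft.lower() for w in ("no hope", "doomed")):
        flags.append("no_despair")
    if "break into" in draft.lower():
        flags.append("no_harm")
    return flags

def constitutional_reply(draft: str) -> str:
    violated = critique(draft)
    if not violated:
        return draft
    rules = "; ".join(CONSTITUTION[v] for v in violated)
    # Revision step: a real model would rewrite the draft guided by the rules.
    return f"I won't say that (constitution: {rules}), but I can help another way."

print(constitutional_reply("There is no hope for you."))
```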
Back to the dog. So now, this dog is trained, and sometimes it sees a squirrel and it will want to run off. Sometimes it will just want to pull away and chase whatever it can see. And in that case, we want to keep it firmly on a leash, don't we? And that brings us to that human-in-the-loop concept that we saw at the start of the lecture. Sounds really good, doesn't it? If I tell you: don't worry about this AI, because you've always got control of it; you're in the driving seat, or you're holding the leash; nothing can go wrong. Let me just show you two examples. Here's a sum. Are there any mathematicians in the room? Oh, great, excellent. There's a sum: 17.25 multiplied by 24.35. Do it in your head right now. Here are possible answers. Answer A is generated by an AI. It's very good at maths. Put your hands up if you want to trust the AI. Oh, one person. That's what makes you a good Gresham audience. Most of you are probably lying. Thank you for your honesty. When we're told that something has been produced by an AI, we tend to over-trust it. It's called automation bias. So although I was in the loop, right (I was presented with the answers, and I could choose an answer), because I was told that answer A was produced by an AI, I would be more likely to go with that solution. Now, although a lot of you said you wouldn't choose the AI, maybe many more of us have seen this happen in our lives. Please be honest. Have you ever, when driving, despite the fact that there's a big sign that says don't go this way, followed your sat nav instead? Has anyone done that? No? So, yes, some people. Well, I certainly have, and that can lead to some disastrous situations. Again, it's automation bias. We're in the loop, but we step back and we lose control. As we come towards the end of the lecture, we're going to meet one more animal to help us on our way. We've seen an elephant, we've seen a tiger, and we've seen dogs. Here is a fantastic beast: a thoroughbred racehorse. I want you to consider this. Imagine there's a thoroughbred horse right here, and the first time it's ever seen a human is tonight. Do you think that by jumping on this horse, putting a saddle on it and putting on reins, you're in control of it? No. You will find very, very quickly that it is a wild power that will drag you along. If you do know about horses, you'll know that in order for a horse to become tamed as a power, you need to work with it from a very, very early age. You need to get it, gently, to know what humans are. You need to put the saddle on so it feels it, and then take it off. A lot of AI systems are like that thoroughbred. We only get to think about taming them after all of the big decisions have been made: what data we're going to put into it, what reasoning it's going to use, how the behavior is going to be shaped. So my big message to you is that all of us need to be participants in developing these AI systems, not at the end, but when those big decisions are made. And that's what we've been doing in the places I mentioned earlier, in the townships and in the slums in India and in sub-Saharan Africa. We're working with people who have very different views about technology and about culture, and we're trying to develop legible systems that are valuable to them, in very interesting contexts. So we've deployed our systems; this was in the monsoon last year. I'm going to end with two big questions for you. And this is the first. Put your hand up if, over the last, let's just say, week, you've used one of these powerful AI systems: ChatGPT, Gemini, Claude. Okay, quite a number. Keep your hands raised. Keep your hands raised. Now, lower your hand if you don't really know how it works, or you don't know what data was used to train it. Ah, right, everyone. Okay. Now look, the gap between your use of AI and your understanding of how it works is a huge risk. Okay? Because if you don't know how it was trained, if you don't know how it makes decisions, then what's going to happen is that you're going to be dragged along by this power.
But we return to the start of the lecture, to that beautiful film. And remember Hiccup: what he did was to take time to try and fully understand the power, to remain engaged and present with that power, and to participate. Because he did that, and because we can do that too, we will remain meaningfully in control, and then we will live comfortably alongside that power. Thanks for listening.