Gresham College Lectures

Human-led AI

November 29, 2023 Gresham College

Is Artificial Intelligence fundamentally different from previous technological advancements?

This lecture will examine the opportunities and threats of the impending AI revolution, asking if AI differs from past technology waves and exploring measures to ensure AI safety.

It will introduce 'Human-led AI', a paradigm which emphasises human control and supervision over AI, to mitigate potential hazards whilst also harnessing the power of this dynamic technology.


This lecture was recorded by Dr Marc Warner on 27 November 2023 at Barnard's Inn Hall, London.

The transcript and downloadable versions of the lecture are available from the Gresham College website:
https://www.gresham.ac.uk/watch-now/human-led-ai

Gresham College has offered free public lectures for over 400 years, thanks to the generosity of our supporters. There are currently over 2,500 lectures free to access. We believe that everyone should have the opportunity to learn from some of the greatest minds. To support Gresham's mission, please consider making a donation: https://gresham.ac.uk/support/

Website:  https://gresham.ac.uk
Twitter:  https://twitter.com/greshamcollege
Facebook: https://facebook.com/greshamcollege
Instagram: https://instagram.com/greshamcollege

So why are some people concerned? Imagine I had a general-purpose robot, and I'm sitting in Faculty's offices. We've just moved in and we still haven't managed to get our coffee machine plumbed in. Alright, okay, bloody hell, robot. Just open the door, go through it, as the robot steps on the cat. Okay, next day: robot, don't smash the door, don't step on the cat, get me a coffee as fast as you can. So it runs downstairs and just steals a coffee from the hand of the person who's just got served their coffee. Okay, robot, don't smash the door, don't step on the coffee. And so the robot races in, starts making the coffee, comes back totally burned, like, you know, half the robot. It was, okay, robot, don't smash the door. Why don't do this? It sounds small, but hopefully it illustrates the actual difficulty of specifying a precise objective function for something that doesn't have common sense the way humans do. When I ask somebody to get me a coffee, there's an enormous amount of common sense that goes alongside that, that we simply don't know how to put into computers right now. And so the robot coffee example, maybe that's a little bit silly, but what happens if we start making very powerful AIs and we start putting them to work on really important problems? And we know from mythology, humans have known for thousands of years, that actually this is a hard problem, right? King Midas wanted to be wealthy. He found out the difficulty of specifying an objective function to be what you actually want, not what you say. So this is Sam Altman from OpenAI.
I mean, I think the best case is so unbelievably good that it's hard for me to even imagine. I can sort of think about what it's like when we make more progress of discovering new knowledge with these systems than humanity has done so far.
But I can't quite, I think the good case is just so unbelievably good that you sound like a really crazy person to start talking about it. I'm more worried about an accidental misuse case in the short term where, you know, someone gets a super powerful, it's not like the AI wakes up and decides to be evil. But I can see the accidental misuse case clearly, and that's super bad.
Okay, so that's Sam Altman from OpenAI talking about this, and then this is Dario Amodei. He was actually the scientist probably most prominently associated with building GPT-3 at OpenAI, and he then moved to start his own startup, Anthropic, which is maybe the third biggest of the kind of...
Yeah, I think it's popular to give these percentage numbers and, you know, the truth is that I'm not sure it's easy to put a number to it. I think I've often said that my chance that something goes really quite catastrophically wrong on the scale of human civilization might be somewhere between 10 and 25% when you put together the risk of something going wrong with the model itself, with people or organizations or nation states misusing the model, or it kind of inducing conflict among them, or just some way in which society can't handle it. Again, this stuff about curing cancer, I think if we can avoid the downsides, then this stuff about curing cancer, extending the human lifespan, solving problems like mental illness. And I think one of the big motivators for reducing that 10 to 25% chance is, you know, how great it'll <laugh>, you know, is trying to increase the good part of the pie. It's great to be part of it.
It's great to be one of the ones building it and causing it to happen, but there's a certain robustness to it. And, you know, I find more meaning, when this is all over, I think I personally will feel I've done more to contribute to whatever utopia results.
The lesson that I wanted to take from this is, one, powerful AI is probably coming. And two, once you put AI to important decisions, those decisions intrinsically contain a notion of ethics. And so this discussion about how we use this technology has to involve everyone. And so that's what I'm hoping my talk is gonna lay the foundations for today. And then where should we be? AI over its history has really had these two branches, one the kind of, what we now call good old-fashioned AI. Now, learning things from data may sound extremely mysterious. We want to separate cats from dogs and we have a couple of measurements about them, maybe their weight and their tooth length or whatever. But a computer doesn't know that. So all the dogs are on one side and all the cats are on the other. And it can see that, okay, I've got 100% dogs on one side, but basically 50/50 dogs and cats on the other. And then it makes a new guess. Now it's got 100% cats on one side, two thirds dogs and one third cats on the other, slightly better. And so if a new data point comes in like this, a triangle like this, the computer can just look at the boundary and make a guess. So there's a limited amount that I can fit to, and this is where neural networks come in. They have the ability to do non-linear fitting, and they have millions, billions, or even trillions of parameters these days. And so neural networks let us take in thousands of inputs, give out outputs into thousands of categories and do this kind of fitting, but not just in two-dimensional space.
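As a rough illustration of the cats-and-dogs boundary described above, here is a minimal Python sketch. It is not the method shown on the lecture slides: it is a simplified stand-in (a "decision stump") that only tries vertical or horizontal boundaries, and every measurement in ANIMALS is invented for the example. The function names are likewise hypothetical.

```python
# Hypothetical measurements: (weight in kg, tooth length in mm, label).
ANIMALS = [
    (3.5, 8.0, "cat"), (4.0, 9.0, "cat"), (4.5, 10.0, "cat"), (5.0, 9.0, "cat"),
    (15.0, 18.0, "dog"), (20.0, 20.0, "dog"), (25.0, 22.0, "dog"), (30.0, 25.0, "dog"),
]


def accuracy(feature, threshold, data):
    """Fraction of animals labelled correctly if values above the threshold count as dogs."""
    correct = 0
    for row in data:
        guess = "dog" if row[feature] > threshold else "cat"
        correct += guess == row[2]
    return correct / len(data)


def fit_boundary(data):
    """Try a boundary between each pair of neighbouring values and keep the best guess."""
    best = None
    for feature in (0, 1):  # 0 = weight, 1 = tooth length
        values = sorted({row[feature] for row in data})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            score = accuracy(feature, threshold, data)
            if best is None or score > best[0]:
                best = (score, feature, threshold)
    return best


def predict(boundary, weight, tooth_length):
    """Classify a new point by looking at which side of the boundary it falls on."""
    _, feature, threshold = boundary
    return "dog" if (weight, tooth_length)[feature] > threshold else "cat"


boundary = fit_boundary(ANIMALS)
print(boundary)                       # (score, feature index, threshold)
print(predict(boundary, 6.0, 11.0))   # a new, unlabelled animal
```

Each candidate threshold plays the role of one "guess" in the lecture: the computer scores how well it separates the two groups and keeps the best one. A neural network replaces the single straight boundary with a flexible, non-linear one in many dimensions.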
So you won't be able to visualize it, but sort of keep in your mind the picture of three-dimensional space and volumes in three-dimensional space. And then if the new point falls inside that, making that your prediction. So I'm sure basically everyone, can people just sort of nod if they know what, yeah, okay, good. So "a dog has four", and we show this to ChatGPT and we say, please predict the next word. And then we compare the two things, 'cause we know the real sentence and we know its prediction, and we say, ChatGPT, you got that wrong. Now adjust those boundaries, the equivalent of that line, but in this high-dimensional space of all words, adjust those boundaries such that you predict "legs" next time. Okay, so you're now basically at the cutting edge of AI in terms of how these things work.
How do we conceptualize these on a single axis? And then perhaps ChatGPT sits somewhere in the middle. Like kind of like adding two numbers together. And so this here is the compute used on some of the famous algorithms for the last 50 years. And this outlier point at the top there is GPT-4. That's gone up 10 million times in the last 10 years. But to show the next year or two, I have to shrink all of those data points down to this. And so if we estimate, in the first 18 years of life, roughly how many flop equivalents do we think the human brain uses? So effectively the training time up to 18, where do we think it falls on this plot? Of course, big caveats on whether we can actually do these calculations effectively.
Now, something that will run through many people's heads is, aren't human beings somehow special? Is this not too mechanistic, too reductionist, to try and compare the compute in this really obvious fashion? And in fact I claim that that kind of magical thinking has actually misled science in a bunch of circumstances.
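As an aside on the next-word training step described earlier ("a dog has four", predict, compare with the real word, adjust), here is a toy Python sketch. It is only an illustration under stated assumptions, not how ChatGPT is actually implemented: the three-word vocabulary, the single score per word, and the learning rate are all invented, and the update is a plain softmax and cross-entropy gradient step.

```python
import math

# Hypothetical tiny vocabulary of candidate next words for "a dog has four ...".
VOCAB = ["legs", "wheels", "wings"]

# One adjustable score per candidate word; a real model has billions of parameters.
scores = {word: 0.0 for word in VOCAB}


def probabilities(scores):
    """Turn raw scores into probabilities that sum to 1 (softmax)."""
    exps = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def training_step(scores, true_word, learning_rate=1.0):
    """Compare the prediction with the real next word and nudge the scores towards it."""
    probs = probabilities(scores)
    for word in scores:
        target = 1.0 if word == true_word else 0.0
        # Gradient descent on cross-entropy loss: raise the true word, lower the rest.
        scores[word] += learning_rate * (target - probs[word])
    return probs


before = probabilities(scores)
for _ in range(20):
    training_step(scores, "legs")  # "You got that wrong, predict 'legs' next time."
after = probabilities(scores)

print(before)  # every word equally likely before training
print(after)   # "legs" is now by far the most likely next word
```

The point of the sketch is the loop: predict, compare against the known sentence, adjust. Repeating that over an enormous amount of text is what the lecture means by adjusting the boundaries in the high-dimensional space of all words.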
We now know that in our galaxy, well, we're not even the center of our solar system, never mind the center of our galaxy. In biology, we were given a very, very special place, and nowadays we think we evolved like all the other animals. In quantum mechanics, for a decent period of time, probably a couple of decades, people genuinely wondered whether it required consciousness to decohere a wave function. I'm happy to answer questions about it at the end if people are interested. But that does then genuinely provoke the question, what is in the top right of this plot? And so this is where people will start talking about super intelligences.
Now, people's default notion immediately when thinking about this is to sort of anthropomorphize it as either a god or a devil and nothing in between. I think it's better to think of a super intelligence as this sort of disembodied idea: it has some goals, whatever we've either programmed it with or it's learned, and it can make moves in the real world with whatever actions it has available to it, and it has an understanding of the world. Whereas Sam said, and Dario would've said but didn't quite, it's actually these accidental misuse cases that are more concerning.
Like a totally crucial question here. So nobody has built one as far as we know, and so we can't say for certain it's possible, but we can say that there is no known theory of science that prevents it. They can send signals faster, they can store signals for longer times, they can retrieve things with higher veracity. And so I think the extraordinary claim that demands extraordinary evidence is positing that a super intelligence can't be built, rather than that it can. So what could we do about AI? So let's take a slightly hypothetical scenario.
And so we set it going and it looks around at the world and it says, okay, I'm slightly anthropomorphizing it here. And it recognizes that there are actors out there whose livelihoods depend on the production of carbon in the atmosphere. Now, that's not to say there aren't ideas about how you create an AI with an off switch. So a professor called Stuart Russell at Berkeley has a set of ideas where you insist that the intelligence maintains great uncertainty about human preferences. It's actually not obvious that if you just let things happen and train the most powerful agents you could, there would be an obvious off switch in the way you might imagine to begin with.
However, in this green area, there's a lot of safe region. So in the top right there is a whole set of potential technologies that we can't currently understand and can't currently control. In the green region, we do understand them and we can control them. Artificial intelligence has been used totally safely for decades in the real world. It'll build great technology companies that will actually meaningfully change the way we live. And powerful AI can be a path to solving these. But critics will say, you know, it is unknown and it's not obviously safe. And the dynamics of building it are genuinely complicated. Individuals, nations, companies, they all have reasons to race a bit more than they'd like.
But because we want to encourage a love of learning, we think it's well worth it. We never make you pay for lectures, although donations are needed. And if you haven't already, click the follow or subscribe button from wherever you are listening right now.
Is it really realistic that we'll regulate a hypothetical unknown technology effectively in a way that actually ends up making things safer? Poor regulation could easily make things worse if it privileges bad actors over good actors. Because I think there are two fundamental, crucial questions that underlie this, that enable us to have a bit more thoughtful discussion. Do we have a short time? Now, five or 10 years ago, for the long time you would've found people who said we probably have a century or two. And now the people who say the long time are like, you know, a few decades, and the people who say a short time say two to five years. And then how easy would it be to what we call align a super intelligence, but basically get a super intelligence to care about us?
I think we have a long time and it's gonna be really easy to get it to care about us. I think it's gonna be an extremely hard problem to get it to care about us, to align it to our values. We've got ages to figure it out. The technologies that you work on to try and make things safe right now aren't actually gonna be the technologies that it's built with. Nobody knows where the actual world really sits. Or is it gonna be easy? Actually I think probably the right attitude is to kind of fill in all the regulations on this two-by-two and then say, how do we mitigate the worst downsides and take advantage of most of the upsides?
And so what would a better version be? Perhaps two paths that strike me as safer. And the other is to start in the red zone, but use kind of formal mathematical guarantees that what we're doing is actually safe. And so, you know, one of the things that seems important is in the early stages of training these models, you know, we protect intellectual property carefully with copyright and people demand, or people rightly get, the value of what they're doing.
Before you can build a nuclear power station, you have to give all kinds of quantitative numbers to demonstrate the safety of your nuclear power reactor, which means that we do have relatively few nuclear incidents. Of course, the contrary side to that is that actually they have sufficiently limited the application of nuclear power that now more people die of the effects of air pollution from coal than die from nuclear incidents.
This seems like an important concept that we can lift across for AI from the pharmaceutical industry. So before we put a drug out into the world, we are very careful to make sure it's tested in every regard possible so that it can be as safe as possible. Although of course the downside to this one is that people say this process is now taking too long and is too expensive, and is actually harming people by preventing good medicines from getting out, rather than purely being a safety factor.
And finally, in aerospace, if there's ever an accident, there's a very truth-seeking investigation on trying to figure out what happened, a sort of very carefully blame-free culture so that we really understand exactly what failed, so that we can make these things safer over time.
So I'll just take two minutes to explain a little bit about this approach. So we've designed it to be both powerful and trusted, and that means that there are sort of three basic tenets for every system that's built according to these principles. Every algorithm has a governance mechanism controlling its actions. It's modular, so that each component can be separately tested and understood. So it means that you really can build in this incremental approach, making sure things are safe as you go. And it is fundamentally human first. So it is explainable both structurally and algorithmically.
And so basically what this means is humans can actually look at the systems and understand what they're doing and why, in such a way that they can make good choices about whether to implement it or not.
So what should we do about AI? But that is unfortunately the sort of position that we are in, in the debate in AI today. And so at the very least, we need to start carving out these two categories. The red zone, I think it is sensible to be cautious. The problem is, it's not totally clear what the right way to be cautious is. Is the sensible path to regulate, with all the downsides and complexities of regulation, or should we build in a more open fashion, in the kind of open source ideals of many eyes make light bugs, or make shallow bugs? And crucially, if we are to build these things, what values should AI represent? And while this may seem slightly unsatisfying as an ending, building transformative AI is going to force us to decide what it means to be human. We are going to have to decide what we put into these algorithms, what we care about, and what we are willing to fight for. And at the moment, we are just in the middle of the story. There is no ending right now. The end would depend on the decisions made by people in this room. Thank you very much.
Gosh, Marc, that was a brilliant sort of sweep across everything. I'm rather in awe of what you were talking about. Now, my name's Richard Harvey. Let's start with bias. And I suppose <laugh> a biased AI perhaps isn't the sort of killer AI that one might have imagined, but it's certainly a bad AI.
Certainly, and so yes, AI can be biased. It certainly depends on the training data, although it is interesting in that once you put something into maths, it's actually often easier to fix faster, because human brains, well, we've tried for centuries, maybe millennia, to remove bias from human brains. But just one slight kind of tangent to that question.
So people talk about AI ethics as different to AI safety and, you know, there are all sorts of inside baseball fights between whose bit is more important. I like to use the analogy of a car: I want seat belts and I want a catalytic converter, because I care about the short-term risks of crashes, but I care about the long-term risks of global warming. And I think insisting that you have to do one or you have to do the other of those is just fundamentally a mistake.
Yeah, yeah. Do you know of any progress on that, you know, so for characterizing intelligence for example, or even characterizing malevolence in any numerical way?
So I think people feel like as we step forward in our knowledge of AI, we can sort of make analogies back to the human brain, but I don't know that anyone would want to.
Now, I'm not sure if that assertion is completely <laugh> true, but perhaps behind it is the, how general, how do you measure generality of an AI?
Well, I mean, so these are huge, huge questions. I think I, sorry, I've There's metro question. And so I think it seems fairly clear that we are making progress, in that GPT-3 and 4 have substantially increased our ability to have a single algorithm solve tests in very different domains. So I basically don't think I agree with the underlying assumption of the question.
Mm-Hmm, <affirmative>.
Yeah, I think we are making progress. How fast is really gonna depend on what extra scientific breakthroughs we need. So I don't think that LLMs, these large language models, the name for the algorithm that sits behind ChatGPT, I don't think those are enough for general intelligence in the way that most people mean it. But we get into some quite detailed specific questions around definitions here.
Mm-Hmm, <affirmative>, yeah. Need like PhDs these days.
And that can't be human led in this regard.
So let's talk about a circumstance that Faculty would work in. So, you know, in digital marketing there are these exchanges where transactions happen on the millisecond, and obviously a human being is not gonna execute every one of those transactions, but the human being can set the framework for the strategy, how much you want to spend in what given time on what types of campaign, and then can set the governance protocols that sit around the algorithm. If you spend this much in this unit time, then stop trading these kinds of things.
Yeah, great answer. Yeah, here's a good one.
Yeah, I think, I don't know that we have to be super precise though for the purposes of, oops, ensuring safety.
Maybe the boundary doesn't really matter. What matters is exactly, you know, the end points.
Exactly. And then in some sense there's a fuzzy boundary to the red zone and then it's gonna be our decision on how much risk we choose to take in that boundary.
And there's an interesting question here, which I've temporarily lost, but let me paraphrase, which is, so AI is very dependent on its training data and the questioner is asking about data poisoning.
In the detail there are many, like the kind of coffee robot failure mode, there are many, many examples of sort of equivalent failure modes that can happen, including a sort of data poisoning such that the coffee robot thinks it's executing on what I asked, but in fact is doing something completely different. And people are working on a variety of paths, including just making sure the data's reasonably good, to kind of solve that.
So when you are talking about AI, you are implicitly talking about not only the algorithm, but also the data that feeds it, this sort of classic ChatGPT problem, isn't it, that it,
For sure.
It's ingesting stuff that might not have been safe in the first place.
So, I mean, of course the way our algorithms work at the moment mostly is that they get trained and then that trained model goes into production. <laugh>, ask someone here. Like, that is a magnificent opportunity to transform education.
Yeah, great answer. So I, so do you guess can't remember.
Anyway, what I would say is, I think there will turn out to be some notion like, so in computing we have this notion of a universal classical computer. And so that was for a long time thought to be it: if you just had those logic gates, you could perform any computation.
There's, there must be a hundred here. Sorry about that. For those of you in person, I guess you're around for a few minutes so people can badger you. Other than that, Marc, thank you very much indeed.