Gresham College Lectures

Logarithms: Mobile Phones, Modelling & Statistics?

June 20, 2024 Gresham College

Logarithms were perhaps once thought of as just an old-fashioned way to do sums on slide rules. But they underpin much of modern life, from modelling the COVID pandemic to Claude Shannon’s mathematical theory of information (which makes mobile phones a reality) and making sense of Cristiano Ronaldo’s crazy Instagram follower numbers.

This lecture will explore the basics and history of logarithms, and then show how they are a natural way to represent many models and datasets.


This lecture was recorded by Oliver Johnson on 22nd May 2024 at Barnard's Inn Hall, London

The transcript of the lecture is available from the Gresham College website:
https://www.gresham.ac.uk/watch-now/logarithms

Gresham College has offered free public lectures for over 400 years, thanks to the generosity of our supporters. There are currently over 2,500 lectures free to access. We believe that everyone should have the opportunity to learn from some of the greatest minds. To support Gresham's mission, please consider making a donation: https://gresham.ac.uk/support/

Website:  https://gresham.ac.uk
Twitter:  https://twitter.com/greshamcollege
Facebook: https://facebook.com/greshamcollege
Instagram: https://instagram.com/greshamcollege


Well, thank you very much. It's lovely to be here, and it's an absolute honor to be following some great speakers who have given this lecture before. And thank you for coming. I know there are distractions tonight: there's an election, there's all kinds of things happening, and you've come out in the rain to a maths talk, so you are my people. Thank you <laugh>.

So, because it's a maths lecture, I thought we should start with a maths question. Here's a nice easy one just to get you warmed up. Is anybody in the room going to tell me the answer to this? Oh, okay. This isn't good. I've obviously misjudged the audience a bit <laugh>. Okay, so let's spoil it. There's the answer. I'm a bit disappointed you didn't know this one. Let's try something easier then. Who can tell me the answer to this one? Yes, 72. 72, fantastic. Okay, that's good. Now, my claim is that these two questions are as easy as each other. My claim is that these are the same question, and I want to try to explain the sense in which that's true. You'll see there's a little bit of subtle colour coding here, some greens coming in; the greens are there for a reason that we'll see as we go on.

To be honest, when I got the invitation to speak at Gresham, it took me about 30 seconds, if that, to decide to accept. One reason is that what I'm talking about kind of started here. Not in this building, because as I understand it Gresham wasn't in this building at the time, but some of what I'm going to talk about was done by Sarah's long-distant predecessor, and some by a predecessor of the Professor of Astronomy. So it's almost a homecoming for some of this material.

What I'm going to talk about is logarithms and their applications. And one of the applications, slightly self-referentially, is things going viral. The last time I gave this lecture I asked the same question with different numbers, because I thought somebody might have cheated, watched the lecture and cribbed the answers. But this is the number that was interesting: 12.3 million. Oxford put a clip of me giving a maths lecture on Instagram, and that's how many views it got. So what I want to talk about is not just logarithms, but things going viral, things spreading, things growing fast.

Now, we've all had rather more experience of this lately than we might like. I don't know who remembers this slide. This was maybe one of the most fateful slides of the whole pandemic. This was the 21st of September 2020, just as we were beginning to move into the second wave, and at this stage Chris Whitty and Patrick Vallance gave a press conference. They were worried; they could see what was coming. And they presented this slide. What they said was: if doubling occurs every seven days, what would it look like? Now, they were very clear here: they were not saying this is a prediction.
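In symbols, the projection on that slide amounts to nothing more than repeated doubling. As a worked restatement (my notation, taking the slide's starting figure of roughly 3,000 cases a day on 15 September):

```latex
\mathrm{cases}(t) \approx 3000 \times 2^{t/7} \quad (t \text{ days after 15 September}),
\qquad
\mathrm{cases}(28) = 3000 \times 2^{4} = 48{,}000 \approx 49{,}000 .
```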
They were not saying this is definitely what's going to happen. They were saying: if it did double every seven days, this is what would happen. Now, as far as we're concerned, there's nothing to argue with here, right? This is true. What they're saying is: at this point there are 3,000 cases (this is the 15th of September, which is where they've got data up to); you double that every seven days, four times over, so you multiply by 16, and you end up with this kind of scary growth. But I think from a public relations, communications point of view, this was not a good graph to use, because you look at the blue data up to this point and you look at the red data, and, okay, it's not a prediction, it's a projection, but still, it doesn't look like there's much of a link between the two. You look at this and it doesn't feel like a trajectory we're about to go on immediately, right?

I think there's also a slightly unfortunate choice of colour scheme, because some of you may have seen this: this is Disco Stu from The Simpsons, who did a similar sort of extrapolation. Disco Stu is sitting there saying, "If disco record sales double throughout the 1970s...", all this kind of thing. So: is Disco Stu right? Were Whitty and Vallance right? This is the kind of question that we'd like to answer.

Now, obviously we have the luxury of knowing what happened. We know that there was a second wave, we know that we did go back into lockdown, and all these things did happen. But equally, Whitty and Vallance's prediction-that-wasn't-a-prediction, the projection, didn't happen. This slide is slightly cheeky, and I'll say a bit about that in a second, but broadly speaking, the curve was meant to go up like this, and these are various ways of plotting the data: reporting date, specimen date, seven-day average, all this stuff we used to worry about a lot. What you can see is that, broadly speaking, the projection took off and the actual data didn't, or not to the same extent. Now, this slide comes from Carl Heneghan's group in Oxford. He was a sceptic of all of this stuff, and he was presenting this to say: what a ridiculous prediction, this was obviously all overblown. But still, as I say, the second wave did happen. One sense in which this is cheeky is that he's run it on a week further than he should have: the projection ran out four weeks, to the 14th, when it would reach 49,000, and he's run it on another eight or nine days. So it's gone further than it should have done; he's not even comparing with the projection that was made. And you can see that this has the effect of visually compressing the growth that we did see. But still, we can't argue with the basic point: it's not the case that on the date when Whitty and Vallance said there might be 49,000 cases, there were 49,000 cases. So: a PR problem. What should we have done? If I had been trying to present these numbers, and I'm very glad I wasn't, here's the graph I would have shown.
What we have here is the same thing: the data up to a certain point, with the particular days of data plotted as red dots. And what you can see is that instead of a line that flattens and then swooshes up, I've got a straight line going through it. Now, this is what the talk is about. You'll notice I've done something weird with the y-axis. If you go back to the previous graph, the spacings are in some sense additive: the step from zero to 25,000 is the same size as the step from 25,000 to 50,000. The steps are additive amounts. My steps are different; my steps are multiplicative. For example, my step from 1,000 to 2,000 is the same as my step from 5,000 to 10,000, which is the same as from 10,000 to 20,000. This is a log scale. And this is what I want to evangelize for: log scales are the right way to plot certain things, and this is an example of one of them.

One thing you'll notice is that, sort of for free, I got a doubling time out of this. Whitty and Vallance picked their particular number: seven days, if it doubled every seven days. I'm not saying "if". What I'm saying is that if I put a straight line through the data at this point, then the time that line takes to go from 1,000 to 2,000 turns out to be 12.4 days. So it's slower, but it's still doubling. And because it's a straight line, each doubling step takes the same amount of time: 12.4 days down here, and also 12.4 days up at the top.

Okay, so the obvious question is: what happened? I said the Whitty and Vallance thing didn't happen; what if I'd done this graph? Well, run it on, and there's the data. What you can see is that this very crude projection, plot it on a log scale and draw a straight line through it, did a pretty good job of picking out what would happen into the future, by squashing the axis up so that this kind of growth sits on a straight line. It's not saying we would end up at 50,000; it says we would end up at 20,000. But the point is, that was enough. Doubling is doubling. It maybe takes slightly longer to get to the stage where you run out of hospital beds, but you're going to get there eventually, and this sort of projection tells you how fast.

So that's the theme of the lecture. Now, I wanted to go back a little in history. First of all, I want to tell you what a logarithm is. I know you're here in the rain for a maths lecture, but I just want to check that we're all on board. So here is a simpler problem than the one I did at the start: four times eight is 32. Hopefully everybody's on board with that. Yes? Good. Okay, I'm seeing some nods.
Now, obviously we know that because we learned our times tables, but there's another way to think about it. Again, I've been slightly cunning here: when I say four, the way I'm thinking about four is as two times two, which, and you'll notice this is in green, I'm thinking of as two squared. There are two factors of two in my four. The eight, again (I'm lucky, or I cheated, depending on how you think about it), is made of twos: eight is two times two times two, three factors of two, so what I have is two cubed. Again in green. So now when I come to do this multiplication, to know that the answer is 32, what I need to say is: I've got two factors of two from the four, I've got three factors of two from the eight, and two plus three is five. In total I've got five factors of two in the product. And so what I've managed to do is turn a problem about multiplication, which is hard, into a problem about addition, which is easy.

That's the idea: just by counting the factors of two, seeing how many of them there are. What's driving this, and you can see it in green, is that two to the two times two to the three is two to the two-plus-three, which is two to the five. That's what's underlying this equation, the four times eight is 32 that we know. We can also think of it as an equation to do with adding things up. So we can focus just on the numbers in green, the two and the three and the five, and these are the logarithms. If I were to define formally what a logarithm is, I'd say it's the inverse of the exponential function and all that kind of thing; it's the thing that you put in the power, on the top, to give the answer that you want. And the rule is this: when the numbers multiply, the logarithms add. The fact that it's five is two plus three, adding together the logarithms.

And so my silly maths question at the beginning: I cheated. I found a table of powers of two online. I looked up two to the 31: it's the first of the big numbers I wrote down. I looked up two to the 41: it's the second. So that scary-looking product at the beginning is two to the 31 times two to the 41, which is two to the 72, and 72 was the answer we had from the back. So it's a cheat. Clearly there's a sense in which I've slightly deceived you, because most numbers aren't powers of two; most of the time I can't do this trick. Except I sort of can, because even if a number isn't a perfect power, similar things work. If you'd given me two big scary numbers, in theory I could have gone back to my powers of two table and looked them up.
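The whole trick fits in a few lines of Python. This is a minimal sketch of my own (not something from the lecture), using math.log2 as the "table of powers of two":

```python
import math

# Multiplication via addition: if x = 2^a and y = 2^b, then x*y = 2^(a+b).
x = 2 ** 31
y = 2 ** 41
a = math.log2(x)     # 31.0 -- "looking up" the exponent in the table
b = math.log2(y)     # 41.0
print(a + b)         # 72.0, so x * y = 2^72

# The same trick works when the numbers aren't perfect powers of two:
# the exponents simply aren't whole numbers.
p = math.log2(12_300_000)      # about 23.55
q = math.log2(600_000_000)     # about 29.16
print(2 ** (p + q))            # about 7.38e15 = 12.3 million x 600 million
```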
The answers wouldn't have been exactly whole numbers, but each number could be represented as two to the something; I could look up what the somethings are, add them together, and do the multiplication the same way.

Now, I mentioned this already, but this is where I'm again using green as a code: green was for logarithms, and green is also going to be for Gresham. So here are a couple of people. Generally, if you talk to mathematicians and ask who invented logarithms, the name people would come up with would probably be Napier. Napier is on the left here. But somebody who maybe doesn't get as much credit as he should is Briggs. Briggs was, as I say, the first Gresham Professor of Geometry, and actually (the Wikipedia page is very good on this) Napier's logarithms aren't logarithms as we know them. They don't satisfy exactly the relationship we want: instead of the logarithm of the product being the sum of the logarithms, there's an extra term, an extra weird factor in there. It was Briggs who straightened this out. It was Briggs who realized there was a better way of setting it up, that what we were aiming for was this product rule. So Briggs was the person who figured this out. That's one Gresham connection.

The other Gresham connection: let's do a show of hands, who here has used a slide rule? Yes; I just about missed this myself, but a lot of people in this room have used a slide rule. What you'll notice is that a slide rule looks exactly like my log scale on the graph: the step from one to two is the same as the step from two to four, which is the same as the step from four to eight. A slide rule is a physical device with a log scale built into it, and when you multiply numbers on a slide rule you are using these laws of logarithms, exactly the same property I used with my numbers. And again there's a Gresham connection. The idea of a log scale was invented by Gunter, who was, I think, the third Gresham Professor of Astronomy. He came up with the idea that instead of plotting the numbers themselves, you could plot the logs, and that this was an interesting thing to do. And Oughtred turned this into a physical device. Oughtred said: if we had a piece of wood that was log x long, and a piece of wood that was log y long, we could put those two pieces of wood end to end, and the total length of wood would be the log of the product. So it's a nice chain of ideas: Napier came up with the thing, Briggs refined it, Gunter invented the log scale, Oughtred turned it into a device. It's a nice story of how mathematics often works: these things build on one another. It's often not the case that one person has the whole idea; there's often teamwork.

Okay. So what I'd like to do now is to try to explain why this trick with the log scale and the pandemic worked.
So my claim, and I had innumerable arguments with people on Twitter about this for about two years, is that a log scale is the right way to represent a pandemic. We sort of saw this already from the fact that when we plotted the numbers, they lay along a line. But was that a coincidence? What's the reason for it? Well, on some level, the point is that epidemics tend to multiply; there's a multiplicative nature built into them, and this is why they're scary. The idea is that, roughly speaking, each infected person infects a similar number of people. This is the famous R number that we used to worry about a lot: if the R number is three and I'm infected, on average I will infect three people; those three people will each go on to infect three people, which is three times three; and so you get exponential growth coming out of it.

Now, the next stage of the argument on Twitter is that people say, "But exponential growth doesn't go on forever." Well, no, it doesn't; clearly it can't. Take these numbers: if I infect three people, and they infect three people each, and so on, you can quickly see that within a certain number of generations you will run out of people. There aren't enough people in the world; the thing has to run out of steam. But the point is that it can go on exponentially for much longer than we'd like. It does run out of steam eventually, but particularly in the early waves of the pandemic, before we had any vaccines, the exponential phase could be uncomfortably long. So we'd like to understand how this happens.

And actually, it turns out that nearly a hundred years ago we understood this. There's a classic SIR paper from 1927 that sets out mathematical equations for how epidemics evolve. It's a toy model; it's not a model that's perfect or captures everything, but it's a model that works pretty well for a long time and gives you a good idea of what's going on. The idea is that there are three types of people. There are people who are Susceptible, meaning they haven't been infected yet. There are people who are Infected: they've got it, and they're in danger of giving it to other people. And there are people who are Recovered. As a story for the first few months of the pandemic, this is pretty good. One issue is that it doesn't allow for reinfection: in this model, once you're recovered you're immune forever. For some diseases that works; for COVID, as we now unfortunately know, reinfection is possible. So it's not perfect, but early on, in terms of the dynamics, it's not bad.

So Kermack and McKendrick, who wrote this paper, wrote down a set of equations for how the numbers of infected, susceptible and recovered people change over time. The most interesting one, I mean they're all interesting, but the most interesting is probably the one in the middle, this equation here, which says how the number of infected people changes with time. That's the left-hand side.
dI/dt: how does the number of infected people change with time? And what you'll see is that there are two effects here, a positive term and a negative term. The negative term is here: minus gamma I. What that's saying is that people get better; a certain fixed fraction of the people who are infected recover. So there's a minus gamma I: people drop out of the infected class, and they turn up over here, in dR/dt, which is positive: people move from infected to recovered as the symptoms wear off after ten days or whatever it is.

The more interesting term is the first one. This is the term by which people get infected, and it represents a degree of interaction. The way to think about it: at a particular time there are I people who have the disease. Left to themselves, they would infect some number beta of people each, so you multiply by beta. But not everyone they meet can be infected: some of the people they meet have got it already, or have had it and recovered. So there's this factor S over N, which is the fraction of the people they meet who are capable of being infected. Left to themselves they would infect beta times I, but only a proportion S over N of the people they meet could catch it, so in total you get beta times I times S over N. That's the term that gives the rate at which people get infected. I don't know what beta is, and I don't know what gamma is; they're just numbers. But the point is that they should stay roughly the same over time.

Now, go back to this equation and think about it a little more. I've written it out here as an equation, doing my best with Microsoft Word's equation editor. On the right-hand side there's an I in each term, so we can divide the whole equation through by I. If we do that, the right-hand side becomes simpler: beta times S over N, minus gamma. And the point is that, by doing this, as people doing A-level calculus will know, the derivative of the log is dI/dt divided by I itself. (If you're not doing A-level, or did it a long time ago, I'll let you off; just trust me.) So dividing through, we get the derivative of the log. What this is telling us is that the log of the number of infected people is the right unit to think in: instead of the rate at which the number of infected people changes, think about the rate at which the log of the number of infected people changes. That's the more natural thing. And the point is that, roughly speaking, this right-hand side is constant. I mean, it's not, but for a long stretch it is, because the proportion of susceptible people doesn't change very much: the number of people susceptible today is probably similar to the number susceptible yesterday.
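Written out, with the notation reconstructed from the verbal description above, the middle equation and the divide-through-by-I step are:

```latex
\frac{dI}{dt} = \beta I\,\frac{S}{N} - \gamma I
\quad\Longrightarrow\quad
\frac{1}{I}\frac{dI}{dt} \;=\; \frac{d}{dt}\log I \;=\; \beta\,\frac{S}{N} - \gamma .
```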
Run it out, say, three months into the future and, no, it changes. But for a few weeks at a time, particularly early in the pandemic, S over N is roughly constant. So the right-hand side is roughly constant, so the derivative of the log is roughly constant, so this should be roughly a straight line. And that's what we saw. That's the argument from the Kermack and McKendrick paper: broadly speaking, what we expect is exponential growth or decay.

Just to illustrate this with real COVID data again: here are North West hospitals up to a similar date, the 19th of September. Doesn't look too bad, right? In the first wave we hit 3,000 people in hospital; we've come right down; okay, it's started growing a little, but plotted like this it doesn't look like something that's going to trouble us. I was going quietly spare about this, because I was on Twitter plotting this stuff on a log scale. So here are the same numbers, plotted on a log scale instead of a linear one. Again, the spacings have that multiplicative property, and, it's maybe not super visible, but there's a red line and a blue line showing what the real data was doing. And what I said was: the level we got to before was about 3,000, we're doubling every 12 days, and this second graph shows that at this rate we hit the previous peak at Halloween. That was a quantifiable prediction: if we keep going at this rate, we hit that level at Halloween. Now, I don't know who remembers, but Halloween 2020 was, coincidentally, the date on which the second lockdown was announced. I'm not claiming I got it to within a day, but roughly speaking these qualitative predictions came true, and were rather more accurate than you would have liked. The picture off the dashboard said: okay, it's gone down, it's going up a little. What we actually saw was precisely this exponential growth: slow, and then fast. A linear scale hides the fact that the thing is going to take off, that it's going to get bad in the future. On a log scale it's always the same degree of bad: the same slope, just following a straight line. So my contention is that if more people had plotted things on log scales in the early stages of the wave, we would have had a better appreciation of where this stuff was going.

Okay, that's not very cheerful, is it? Let's talk about something slightly more cheerful. My day job, what I research, is an area of maths called information theory. A few years ago I was very lucky to be invited to a meeting in San Jose, and the institute that hosts it is run by somebody who collects old maths things. He has a copy of Newton's Principia signed by Stephen Hawking, which is quite cool: two Lucasian Professors. He also has this: the Bell System Technical Journal of 1948.
Now, if you are me, this is almost as exciting as, possibly more exciting than, Newton's Principia. What am I looking at? I'm looking at Claude Shannon's paper from 1948, "A Mathematical Theory of Communication", which was published in here and which created the whole of information theory almost overnight. This is what I do, and it was all in there: those two articles in that journal created the whole field overnight. So what did he do? What he showed was that we can understand information as a physical resource: something we can measure, quantify, compress, and send over noisy channels. All of this comes back to Shannon. He gave the fundamental limits for a lot of what our life is now. When you buy a phone, when you buy a memory stick, you are talking Shannon's language: bits and bytes go back to Shannon. His contribution was to develop this language, to quantify these things.

The interesting thing is that this is related to what I've just been talking about, and the way in is the key idea of a source of randomness. For example, think about an Enigma machine: somebody in the desert in North Africa sending messages on an Enigma machine is typing things that we can think of as random. The person typing into the Enigma machine is a kind of source of randomness, generating something we'd like to understand. A more natural source of randomness: think about flipping a coin. Each time you toss a coin you get a head or a tail. That is a source of randomness, and of course we know it's random, because that's how you decide who bats first in the Test match; if you want to decide something, we know coins are random and flipping one is a good way to do it.

Now, implicitly we're building in the idea that the coin is fair, and by fair I mean it's equally likely to come up heads or tails. But imagine that weren't true. We might imagine coins that are very biased: very likely to come up heads, not at all likely to come up tails. My claim is that fair coins are unpredictable, and that's why we use them: you toss a coin and you can't tell which way it will come up. The biased coin you can't predict perfectly either, but if it's biased towards heads and you guess heads, you'll be right more often than not. Coins that are biased are more predictable. And that's a way into this.

What Shannon did was come up with a formula for this. He said: let's think about an event, it doesn't matter what it is, call it A, and suppose it happens with probability P(A). For example, tossing a fair coin, the chance that I get a head is a half. And the question Shannon asked is: if A happens, how much information do we gain from that?
His first insight was that it should be a function of the probability: there should be some function f such that the amount of information we gain is f of the probability. Then he thought about it some more, and he realized that unlikely events bring us more information. If you're a football fan: Man City won the league again; yawn, we all knew that was going to happen, and our view of the world hasn't changed by learning it. The year Leicester won the league, that was really interesting: all of a sudden our whole view of football, of who was good and who wasn't, changed massively. It's the rare events that are the interesting ones. So, Shannon said, unlikely events should carry more information, which suggests that f should be a decreasing function: the smaller p is, the bigger f(p) should be. But which one? If you weren't in a talk about logarithms, it might be hard to guess. You are in a talk about logarithms, so the answer is going to be the logarithm. But why?

Well, this is Shannon's other insight: think about independent events. For example, somebody on that side of the room tosses a coin, and somebody on that side throws a dice. There's no relationship between what we get out of them, so the probability of seeing a head and a five should be the product of the probability of seeing a head and the probability of seeing a five. For independent events, the probabilities multiply: the probability of A and B is the probability of A times the probability of B. Now, what Shannon realized is that the information from those things should add up. If I learn the head and I learn the five, it's like I've learned two things, so the information in A and B should be the sum of the information in A and the information in B. Put those two things together, a decreasing function satisfying this additive property, and essentially the logarithm is the only thing that works; there's a paper that studied this and showed essentially the logarithm is the only thing that works. Technically it's minus the log: we have to take the function to be minus the log. And if we take it to base two, so that our logarithms are based on powers of two, then the unit of information is the bit. The bit comes from using the logarithm to base two.

So it's pretty impressive: Shannon came up with all of this just by thinking about simple things, coin tosses, combining things together. And he went on from there; he did a lot more. For example, he defined something called entropy, which is the amount of information that we expect to gain.
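As a minimal sketch (my own illustration, not Shannon's notation), here are the two quantities just described, the information content minus-log-base-two-of-p and the entropy of a coin, in a few lines of Python:

```python
import math

def information_bits(p):
    """Shannon's information content of an event of probability p, in bits."""
    return -math.log2(p)

def coin_entropy(p_heads):
    """Expected information from one toss of a coin landing heads with
    probability p_heads: H = -p*log2(p) - (1-p)*log2(1-p)."""
    return sum(-x * math.log2(x) for x in (p_heads, 1 - p_heads) if x > 0)

print(information_bits(1 / 2))    # 1.0 bit: a fair coin toss
print(information_bits(1 / 100))  # ~6.64 bits: rare events carry more information

# Independence: probabilities multiply, information adds.
print(information_bits(1 / 2 * 1 / 6))                    # a head AND a five
print(information_bits(1 / 2) + information_bits(1 / 6))  # the same ~3.58 bits

print(coin_entropy(0.5))    # 1.0: the fair coin is the most unpredictable
print(coin_entropy(0.99))   # ~0.08: a heavily biased coin is highly predictable
```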
This is a slightly more complicated thing, but think of it like this: if I took all the air in this room and squashed it down and squashed it down, there would be a point beyond which I couldn't go, a point where the molecules would be bashing into each other, a minimum volume. What Shannon showed is that there's a similar thing for information. If you take a file and compress it, if you take a song and turn it into an MP3, there is a point beyond which you can't go; there is a limit to how far you can compress this stuff. And he showed not only that there's a limit, but that the limit is given by his entropy, the amount of information you expect to learn, which again comes from the log. It's remarkable, really, that through this kind of thinking he was able to come up with this quantity. The key slogan is: the more unpredictable the source, the more space you need to summarize it. You can't compress the outcome of a sequence of fair coin tosses, heads, tails, heads, tails; you have to give the whole sequence, there's no better way to represent it. But a sequence that comes up heads nearly all the time: we could say, it went seven tosses to the first tail, then eleven tosses to the next tail, then thirteen, and so on; you can just count the runs of heads. There are ways of representing these predictable sequences more efficiently than there are for the fair-coin sequences. So that's the log again: the logarithm is underlying this entropy, which is underlying this idea of compression.

Right, one more thing I want to talk about, and for some people in this room this is going to be very important: I want to talk about Instagram. When you talk about log scales, one reason people like them is for numbers that vary by huge amounts. For numbers that vary by massive factors, taking logs somehow tames them, brings them back down to earth. You may not even realize that some familiar scales are log scales: the Richter scale is a log scale, the pH scale is a log scale, decibels are a log scale. All of these are log scales because taking the log turns these quantities into numbers we can handle and understand. For example, the Richter scale is set up with logs to base 10: each step up the Richter scale represents ten times more shaking in the earthquake. Richter 9 sounds bigger than Richter 2, but maybe not that much bigger; the point is that going from Richter 2 to Richter 9 you go up seven steps, each a factor of 10, so you go up by 10 to the power of seven. A Richter 9 earthquake is 10^7, ten million, times bigger than a Richter 2. That's why people like the Richter scale: it turns the numbers into things that are understandable. Now, the obvious question is social media followers.
Cristiano Ronaldo has 600 million people following him. I wrote this talk a little while ago, so he's probably got more now, but he has 600 million followers on Instagram. Lots of people don't have 60 Instagram followers, so again there's a factor of ten million there. Why do these numbers vary on such a wide scale? Why is Cristiano Ronaldo ten million times more popular on Instagram than a lot of us? Can we understand how social networks behave, to understand how this happens? If you're a mathematician, you can try. The first thing a mathematician might do is draw a picture like this: a toy social network with six people in it. You can think of it as a class at school; it represents people being friends. The circles represent people, and a line between two circles represents the idea that those people are friends with each other. With six people you could map out the friendship relations within the group, and it might look something like this. If it's a social network, maybe you'd have arrows rather than lines, because somebody following somebody else doesn't mean they follow back, but I'm going to gloss over that. You can think about a graph that looks like this.

Now, the simplest model of this goes back to Erdős and Rényi, Hungarian mathematicians in the 1950s. They defined a very simple, very symmetric model. You have n people who could be in the network, and you fix a probability: a number p between nought and one. And all you do is, for each pair of people, you either connect them with a line or you don't, connecting them with probability p. So every potential pair of people has the same chance p of having a line between them. From a mathematical point of view this is quite fun; it turns out to have some quite interesting behaviour. For example, if n times p is less than one, the network isn't really a very good network: what you get is small isolated islands that don't really communicate with one another. If np is bigger than one, then a fixed fraction of the network is joined together, and all of a sudden the whole thing starts to work. So if you're Elon Musk and you own the network, you want to get over this threshold of connectivity, so that the whole network is joined together and things can pass around it.

But the problem is that this doesn't capture what we see. Here's a simulation: take this model with, say, a hundred thousand people on the network and probability 0.0005, and this is the kind of histogram you see. It turns out everybody has the same distribution; there's a lot of symmetry here. (The sketch below reproduces this.)
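A quick way to reproduce that histogram, as a rough sketch of mine rather than the lecture's own simulation: in an Erdős–Rényi network, each person's friend count is Binomial(n−1, p), so we can sample degrees directly instead of building the whole graph.

```python
import numpy as np

# Erdos-Renyi degree sketch: with n people, each pair connected independently
# with probability p, one person's number of friends is Binomial(n-1, p).
n, p = 100_000, 0.0005                  # mean degree n*p = 50, as in the talk
rng = np.random.default_rng(0)
degrees = rng.binomial(n - 1, p, size=n)

print(degrees.mean())                   # about 50
print(degrees.min(), degrees.max())     # roughly 20 to 85: tightly concentrated
print((degrees >= 200).sum())           # 0 -- nobody remotely Ronaldo-like
```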
And what we see is that the distribution of friends people have is centered somewhere around 50, but it's tightly constrained. Basically nobody has fewer than 20 friends; almost nobody has more than 80; certainly nobody has 600 million. It's fine as a model, but it's not a model that is anything like Instagram. If we're trying to understand Instagram-like dynamics, this is too simple.

So we can think about how social networks actually evolve, about the dynamics of friendship. For example, if Anna is friends with Becky and Charlotte, it's quite likely that Anna will invite Becky and Charlotte to her party. It's more likely than average that Becky and Charlotte will meet at the party, talk to each other, and become friends; they're not a random pair of people. So within the network, structure builds up like this. In particular, real social networks have something called a Matthew effect, named after Matthew 25:29 in the Bible: "For whoever has will be given more." People with lots of followers on social media get more followers. Once you're popular, more people follow you, more people retweet your stuff, you're more visible, so more people decide to follow you. There's a circle, virtuous or vicious depending on how you think about it, in how your followers grow: an existing audience helps build a further audience.

Now, the interesting thing is that there's a mathematical model that captures this: the Barabási–Albert model, developed in the nineties, when people were starting to think about the internet and things like that. It represents these dynamics, how these things grow; there isn't time to get into the details, but you can set up models where, for example, when I join the network, the chance that I follow somebody is proportional to the number of followers they already have. I'm more likely to follow the big hitters than I am to follow the people with few followers. If you do that, it turns out that what you end up with is something like this: the number of people with k followers behaves like a times k to the minus gamma. This isn't like the behaviour we saw before, that very fast drop-off; this is what's called power-law behaviour, and it drops off much, much more slowly.

Now, this is something people claim to see: there's a mathematical model that says we should see a power law, and people say there is a power law in real data. So the question is: how would we tell? And the answer, because this is a talk about logarithms, is: we take the log. If it were the case that p_k behaves like a times k to the minus gamma, then, taking logs of everything, log p_k would be log a minus gamma times log k. (There's a sketch of this below.)
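Here is a toy preferential-attachment simulation, my own rough sketch rather than anything from the lecture, showing the heavy tail and the crude straight-line-on-logged-axes fit:

```python
import numpy as np
from collections import Counter

# Preferential attachment, Barabasi-Albert flavour: each newcomer follows
# m existing accounts, chosen proportionally to their current follower
# counts -- the Matthew effect.
def preferential_attachment(n, m=2, seed=0):
    rng = np.random.default_rng(seed)
    repeated = list(range(m)) * 2        # node i appears once per link it has
    for new in range(m, n):
        # a uniform position in `repeated` lands on a node with probability
        # proportional to its degree
        picks = [repeated[i] for i in rng.integers(len(repeated), size=m)]
        for t in picks:
            repeated.extend([new, t])
    return Counter(repeated)             # node -> degree

degree_of = preferential_attachment(100_000)
counts = Counter(degree_of.values())     # k -> number of nodes with degree k

# If p_k ~ a * k^(-gamma), then log p_k = log a - gamma * log k, so the
# points should lie on a straight line when both axes are logged. A naive
# least-squares fit of that line (exactly the kind of fit the power-law
# sceptics grumble about) gives a slope of roughly -3 for this model.
ks = np.array(sorted(k for k in counts if counts[k] >= 10))
ps = np.array([counts[k] for k in ks]) / len(degree_of)
slope, _ = np.polyfit(np.log(ks), np.log(ps), 1)
print(slope)
```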
In other words, we'd have a straight-line relationship: plot the log of the probability on the y-axis and the log of k on the x-axis, and it's a y = c + mx type linear relationship. So if it's a power law, we should see a straight line, just as we saw a straight line with the COVID data. Okay. Have a look: who thinks this is a straight line? Okay, some people towards the back. This is Paul Krugman; Paul Krugman has a Nobel Prize, so this may alter your view. This is a blog post Paul Krugman wrote about Twitter followers, and he's saying that's a straight line. It sort of is, but not quite. But the point is that we at least know how to have this argument. He's saying it's a power law; the way we have the argument is that we plot it, and if it's a power law it will be a straight line. Knowing about the logarithm at least gives us a way of deciding the question.

It turns out this is pretty controversial; this is the kind of thing that people who do networks yell at each other about. So, health warning: sometimes it's not a power law. People will tell you it's a power law, and a lot of the time it's not. Cosma Shalizi and his coauthors, for example, are very keen that it's not a power law, and they have statistical tests and other ways of showing things. For example, here's a graph from Cosma's talk where you can see, no, this really isn't a straight line. And this has one of my favourite talk titles ever: "So you think you have a power law? Well, isn't that special?" So you can see that, at least within the levels of mathematical discourse, this is gloves-off stuff. People are fighting about: is this a straight line, should it be a straight line, what's the model by which it's a straight line? But the point is, it almost doesn't matter, because we at least know how to have the argument, and the way to have it is to work on log scales. What you see across the bottom here, sorry, this is the in-degree distribution of weblogs in 2003: the ranking of the most popular blogs against the proportion, and both of these are log scales. The theory says that if you want to decide this question, the way to do it is to go back to Gresham, to go back to Briggs in 1599 and say to him: I've got this data, what can I do with it?

And that's the key message. My claim is that logs are wonderful, powerful things, and the more you start looking for them, the more you'll see them; the more you use them and become aware of them, the more you'll see them cropping up in different places. What we'd like to do is understand why that is. With the epidemic data, we have the equations that say why this should be true. With Shannon's information, we have the argument based on independence that says the log is the right thing. With the Instagram data, we have the Barabási–Albert argument, which people argue about, that says why it should be true. But this is what we're looking for.
We don't want it to be just a coincidence; we want structural reasons why the log is the right thing to appear. And if we can find them, then my claim is that logs help us make sense of the world. And that's where I'd like to finish, except for one more slide: if you liked this, this is my book, Numbercrunch, out wherever books are sold, and I talk about some of these things and others as well. Thank you very much for listening.

An online question: is Shannon's coining of the term "bit" where the term "bit" in computer science comes from?

Yes, except I got picked up on this: somebody told me it was actually Tukey who coined it, and that Shannon pinched it off Tukey. But yes, it's those people; the bit in computer science is exactly Shannon's bit. Shannon, when he was 21, wrote a master's thesis that basically invented the whole of this, where he realized that everything could be represented as zeros and ones. So at the age of 21 he had already changed the world, and then he came along and did this stuff with entropy afterwards. But yes, it's exactly the same bit.

Okay, cool. And while we're talking about words, someone else has asked: where does the word "logarithm" come from? Do we know?

Oh, I ought to know that. You've all got phones, you've got Wikipedia: look it up <laugh>. The only thing I do know is that it's an anagram of "algorithm", but I think that's a coincidence <laugh>.

Yeah. Oh, that's a nice coincidence. So we've seen things based on powers of two and powers of ten. But what about this natural logarithm that some of us may remember from school? Where does that help us?

Yes, okay. I deliberately skirted over this: I said, oh yes, this is log to base two, and didn't really say what I meant by that. The reason, or a reason, that the natural logarithm is natural is to do with its slope: if you take e to the x, where e is this sort of magic number, 2.718 and so on, that's the function whose slope is equal to itself; take the derivative, and the slope is the same as the function. Other powers, a to the x for some other a, pick up a factor, a multiple. So that's the sense in which it's natural: it has that handy property. It's actually annoying when you write papers in information theory, because in information theory papers the convention is that log always means log to base two, and when you read anybody else's papers the convention may be that log means something else. It's one of those things where we're all broadly speaking the same language, but only up to a constant factor.

So, are logarithms as you do them at university much different from what you do at A-level?

I'm not sure they are. It's the same function; that's the nice thing. Briggs, or whoever, wrote down this function, and it's still the same function now: everybody who talks about the log is talking about the same thing. They're probably using it for different things.
The advert for doing a university maths degree is that you can specialize in particular directions. You can specialize in number theory, be interested in primes, count primes, and the log comes up there. If you're interested in statistics, you see it in things to do with data. In turbulence, in applied maths, logs probably crop up in things that are very much beyond my pay grade. But they come up all over the place: the same function, but the same function as a tool for understanding all kinds of different relationships in all kinds of different circumstances.

Okay, we're going to need to finish there. I'm sure Oliver will be willing to stay and answer any questions you may have after we stop, but I just want to finish by saying thank you to everyone for coming. I'm so delighted that the London Mathematical Society and Gresham College have this fantastic partnership that brings these lectures to you each year. Let's finish by thanking the speaker, Oliver Johnson, again for his wonderful talk.