
Gresham College Lectures
Error Control Coding
When was the last time you opened a file and noticed a computer glitch? “Never” is the usual answer. Yet the underlying hardware makes continual errors: disks make errors, the internet loses packets of data, wifi signals get corrupted and so on.
This lecture is about the secrets of the mysterious invulnerability that is Error Control Coding: a way of storing or transmitting information such that it knows it has been corrupted and can correct itself.
A lecture by Richard Harvey
The transcript and downloadable versions of the lecture are available from the Gresham College website:
https://www.gresham.ac.uk/lectures-and-events/error-control
Gresham College has been giving free public lectures since 1597. This tradition continues today with all of our five or so public lectures a week being made available for free download from our website. There are currently over 2,000 lectures free to access or download from the website.
Website: http://www.gresham.ac.uk
Twitter: http://twitter.com/GreshamCollege
Facebook: https://www.facebook.com/greshamcollege
Instagram: http://www.instagram.com/greshamcollege
- Hello. When I was writing this lecture, I thought about the data rates that were whizzing around my computer, and I realized that a modern computer juggles data from memory to disk at around 500 megabits a second. And if we start multiplying that up: 500 megabits per second times 60, that's 30 gigabits a minute; times 60, that's 1.8 terabits an hour. And then, well, I'm a computer scientist, so I work 10 hours a day of course, unlike ordinary mortals, so that's 18 terabits of information being shuffled around in a day. And if one of those bits is wrong, your computer will crash. And when was the last time your computer crashed? I don't just mean did something that was a bit irritating, I mean blue-screen-of-death type crash. This used to happen all the time, but now I can't remember the last time it happened to me, maybe last year some time, after I'd done something particularly stupid, no doubt. So if you think about this, it's a wildly improbable situation. We humans have designed a system with billions of electrical parts that's operating at an error rate of less than one in 10 to the 13. About 20 terabits of information, our daily figure. It's hard to visualize, isn't it? There are roughly eight billion people on the planet. So the way I thought of it is: imagine we asked them each to toss a coin 2,000 times in a day, and then they had to log in to a website, some website, and log all of those coin tosses. Could we be confident that they could log all of those tosses without any errors at all? If I put it like that, it seems absurd, yet this is the daily business of computing. What makes it even more surprising is that electronic devices and communication channels make errors all the time. Underneath this apparent error-free behavior is a massively errorful system. We don't notice those errors because of a brilliant bit of information theory known as error control coding, and that's what this lecture is about. Now, as far as I can tell, the early history of error control coding isn't really very well described or known. In a nutshell, nothing much happened in error control coding until 1948, when Claude Shannon published his brilliant paper; that was the start of information theory and the start of the subject. I'll talk a little bit more about Shannon's contributions later. There was some error detection earlier, and I suppose the earliest forms of communication, the ones that we're familiar with, provided a model for electronic communication. So "I'm sorry, could you say that again?" is a form of error control called ARQ, automatic repeat request, and computers do still use that. The internet uses ARQ. If your TCP/IP packet doesn't appear in the right order, gets lost on the way, or has errors in it, and I'll talk about how we find those errors in a moment, your receiver says, "Could you send that again, please?" That's ARQ, and there's some evidence that Morse transmissions used to do that. I asked the Royal Signals and Radar Museum about this, and they said it was quite frequent for operators to ask people to say again. But I'm really interested in talking about forward error correction, and forward error correction is the business of sending some side information alongside your original message that allows the receiver to know that an error has occurred.
And here's an example of that. This is an early telegram. For those of you who don't know, and I expect everyone does, the idea was that if you wanted to send a message rapidly, you would write it down on a telegram form, you would hand it to a telegraph clerk who would send it via Morse code electrically, at the other end it would be written down by a telegraph clerk, and then they would charge you. And it was quite common, apparently, to write down the number of words, a sort of word count. So if the receiver's word count didn't match the word count on the telegram, you knew there'd been an error. And I think there's an indication at the top of this one of how many words were paid for; generally you paid by the word. I think it says 129 characters there, and I did do a quick count and it seemed to be about right. Now, that's a form of error control known as a checksum: you just add something up. The TCP/IP packets used on the internet use a checksum. In the header of those packets they add up how many ones are in the lump of data that you're sending and write that down, and then on reception you add up the number of ones, and if it doesn't match, you say send again. Now, I'd really like to focus on forward error correction, and that doesn't really have a precise analog in human communication. There's a little bit of an analog, which is that humans use quite a bit of side knowledge, common knowledge, to work out what was said, for example in spoken communication. But here we're talking about something a bit more precise than that. And let's start with the earliest form that I could find, called parity. And if there's anyone listening, or anyone in the audience, who is a scholar, I think it would be quite fun to know when parity really dates from. I don't really know. I spent many a happy hour trying to work out who'd thought of parity first, and I'm still not entirely sure. But let's leave the history aside and I'll just explain how it works. The idea behind parity is really nice. Let's say we've got a message. This is my binary message and it says 1011101. It probably means something to somebody, but we don't have to worry about the meaning here. The idea is that we're going to add up how many ones are in a particular block, so parity works with a block of data. And if the number of ones is not even, then I'm going to make it even by adding another one. So the analysis says, well, there are five ones, an odd number, and that's not acceptable to me, 'cause I'm going to work here with even parity. Let's make it even, we'll add on a one, and that's the message we're going to send. So zoom, off it goes, down some channel or onto a disk, or something like that. Then the channel has noise, there are some errors, some corruptions, and that's what I'm showing here in pink. You can probably see that this digit one here has been converted into a zero. So that's an error. And on reception I say: five ones, that's not even, something bad must've happened, send it again. So that's the basis of parity. And you might think, well, what if there's an error in the parity digit itself? It still works. Now what it doesn't do, of course, is allow you to correct the errors; it just signals the errors.
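Just to make that concrete, here is a minimal sketch of the even-parity check in Python. It isn't taken from the lecture slides; the function names are mine, but the message 1011101 and the single flipped bit are the ones just described.

```python
# A minimal sketch of even parity, following the example above.
# The helper names are my own, purely for illustration.

def add_even_parity(bits):
    """Append one parity bit so the total number of ones is even."""
    parity = sum(bits) % 2          # 1 if the count of ones is odd
    return bits + [parity]

def parity_ok(bits):
    """True if the block (data + parity bit) has an even number of ones."""
    return sum(bits) % 2 == 0

message = [1, 0, 1, 1, 1, 0, 1]     # five ones, an odd number
sent = add_even_parity(message)      # -> [1, 0, 1, 1, 1, 0, 1, 1], six ones

received = sent.copy()
received[2] ^= 1                     # the channel flips one bit (a 1 becomes a 0)

print(parity_ok(sent))               # True  - nothing to complain about
print(parity_ok(received))           # False - "something bad must've happened, send it again"
```

A single flip (or any odd number of flips) makes the check fail; what it cannot do, as the lecture says, is tell you where the flip happened.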
And if you know something about hardware, you might be cringing at this moment, thinking: I've got to build some low-level hardware to add up a large number of things, and that's a bit complicated, isn't it? Well, nowadays you can do anything, can't you, but put like that it does seem a bit unpleasant. It turns out that you can do it a little more cheaply, using some very common hardware, and that common hardware is called the exclusive or. Digital logic has quite a few operations: there's an and operation, there's an or operation, and here we're going to talk about the exclusive or operation. The exclusive or is a bit like the one we use in everyday speech. You might say Richard is either a computer scientist or a poet, and the implication is that he's either a computer scientist or he's a poet, but he can't be both. That's the exclusive or, and it works like this. If we have two zeros, which logicians would refer to as two falses, the XOR gives a zero. It only gives a one if either B is one or A is one; if they're both one, it gives a zero. And if I write even parity here in this column, I think you can see immediately that even parity is the same as the binary exclusive or. The good news, and the reason I labor this point, is that you can go out and buy a binary exclusive or gate, and they're very cheap, so they fit onto silicon very easily. If you are really into this and you follow up some of the literature, what you'll see is that writing the XOR symbol everywhere is a bit of a faff, so we'll often use this symbol in the middle, just a good old-fashioned plus. And when I was learning digital logic, this caused me great confusion because, perhaps I shouldn't confess this, I missed the lecture where the lecturer defined plus to mean exclusive or, and for hours afterwards I couldn't work out what the hell he was doing; he didn't seem to be adding numbers at all. It is a bit like an addition, that's why we use the plus symbol, but it's an addition where you don't carry anything. So if you think of adding these numbers in binary: zero plus zero equals zero, yes. Zero plus one equals one, yes. One plus zero equals one. One plus one, well, that equals two, which is one zero; ignore the carry, so it's just zero. So we also think of exclusive or as an addition, and it's an addition in a particular type of mathematical object, actually called a Galois field, if you're interested. There's a wonderful set of theoretical underpinnings to coding based upon Galois field theory. A Galois field is a set of numbers and some operations, and on GF(2), which is the set of numbers we're dealing with here, you can define plus, minus, times and division. Minus, in our wonderful world here, subtraction, is exactly the same as plus. So that's brilliant too, 'cause it means even I, a thicko when it comes to doing maths, can do addition and subtraction in GF(2). Well, even this simple system finds some application, and the simplest one I could think of is something called a redundant array of independent drives, otherwise known as RAID. So when somebody says they have a RAID drive, what it means is they don't just have one disk drive; they've got a whole battalion of these things stacked up, in this case we have three.
And if we've got some data that we want to put onto those drives, then this is a relatively early form of RAID, designed to provide redundancy. I should say, RAID is sometimes used to provide redundancy, but it's often also used to provide speed; I'm not talking about the speed bit here, just redundancy. So what we might do is say we're going to put that block on there, and that block on there, and then we're going to XOR those together, which is that. Just to quickly check that, I hope I've got it right: wherever you see these the same, you should get a zero; wherever they're different, you should get a one. And we'll put that on the third drive. So if one of those drives went up in smoke, then we could take the two remaining drives, and instead of adding them together, we subtract them, which is the same thing in our wonderful world of binary arithmetic, and we can recreate the original. So there's a good example of how parity saves the day. Now, it's come at quite a cost, I have to admit: we're having to use more drives than we would really like in order to get redundancy. Now, what might not be evident at this point, but now is a good time to introduce it, is that the communication system I'm describing is generically described like this. This is called the Shannon-Weaver model, after two scientists: Claude Shannon, who wrote "A Mathematical Theory of Communication" in 1948. And Shannon said, I can't remember if he said every communication system or every electronic communication system; let's be kind and say he said every electronic communication system, though it has since been much more widely applied, much to the aggravation of social scientists, I should say, who frequently complain it is far too simple a model. It looks a bit like this. We've got something over here sending information on the left-hand side. That goes through some sort of transmission, you might call that modulation. Then noise, well, noise isn't necessarily added, but noise does something bad to the signal. Then it hits the receiver and then it hits the destination. That's the Shannon-Weaver model. It's a bit cruel really: Shannon thought of all of this, and then Warren Weaver wrote it up in a nice little booklet. And if I remember rightly, "A Mathematical Theory of Communication" became "The Mathematical Theory of Communication"; that was Weaver's contribution when his name got attached to the thing. So the Shannon model is usually what engineers call it. Now, if we were sending some data down this channel, there might be a whole load of gubbins to do with the transmitter and the receiver and all of that. I'm going to ignore all the gubbins for the time being, and just assume that we're going to send binary data down the channel, 'cause that helps us understand the error control mechanism. So if we had, say, a black and white image, which can be represented by zeros and ones, that's a nice little visual analog of what we might send down the channel. So here is my black and white image, in this case the Gresham College logo, and here is a part of that signal, all these zeros and ones. It goes into my thing which I've labeled binary symmetric channel, and I'll say what that is in a moment. Some noise does something, and then bits get flipped. So the model here is bit flipping: we are just going to randomly flip a one, and it'll become a zero, or flip a zero, and it'll become a one.
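Going back to the RAID example for a moment, the XOR arithmetic is easy to sketch. This is only an illustration of the parity idea described above, with made-up byte values, not the layout of any particular RAID product.

```python
# Illustrative sketch of XOR (GF(2)) parity across drives; the values are made up.
drive_a = bytes([0b10110100, 0b01100011])
drive_b = bytes([0b11001100, 0b00001111])

# Parity drive: bitwise XOR of the two data drives ("addition" in GF(2)).
parity = bytes(a ^ b for a, b in zip(drive_a, drive_b))

# Suppose drive_b goes up in smoke. In GF(2) subtraction is the same as
# addition, so XORing the survivors recreates the lost data.
recovered_b = bytes(a ^ p for a, p in zip(drive_a, parity))

print(recovered_b == drive_b)   # True
```

The same XOR both builds the parity drive and rebuilds the lost one, precisely because addition and subtraction are the same operation in GF(2).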
And you can imagine what happens, which is that it looks a bit like that: we're seeing some black pixels turn to white and some white pixels turn to black. Now, what about this binary symmetric channel model? Not all channels are binary, but this one is a binary channel: it has two symbols that can be transmitted and two symbols that can be received, a zero and a one. And we characterize it using an error probability, which I'm going to call PE. So there's a probability that a zero gets flipped to a one; that's what that arrow represents. And there must be a probability that a zero stays the same, and that's one minus PE. And I'm going to make it symmetric, so the probability that a one gets flipped to a zero is also PE, and the probability that a one remains a one is one minus PE. That's a binary symmetric channel. To be fair, I've missed off another caveat, which is that this is a binary symmetric memoryless channel: the channel doesn't remember what's been down it before, it just spews stuff down it. If you fancy doing a PhD in error control coding, then you might want to consider an M-ary channel, where instead of binary you're sending M symbols, and there are lots of fancy coding methods that use M-ary signalling, multi-phase shift keying and so on. And the symmetry isn't a given either in reality, but let's just stick with this model. Okay, so here's the model. As I increase the error, I hope you can see that the image on the right is getting more and more incomprehensible, until finally we get to what looks like random noise. So I've set up a way of characterizing all communication systems, well, all ones based upon binary symmetric memoryless channels, which will do for the time being. And the way we might do that is by plotting the error, and how that error gets affected by my choice of code. So let's pick a really simple code. The receiver and transmitter agree that instead of sending something once, they will send it three times. The transmitter says: I'm not just going to send one of those things, I'm going to make three copies and send it three times down the channel. The theory is that those errors are independent; they don't happen at the same time. Then we'll get three copies at the other end, and we'll stack them up and take a majority vote, and that's what we'll get at the output. And I hope you can see that the noisiness of these things up here has been considerably reduced. So this is an example of a repetition code. Now, the question you might ask, and there's a reason for asking this, is, well, how could we characterize that? And the way engineers might do it, they'd use something similar to this actually, though I'll talk about how it varies a little bit, is something like this. What I'm plotting on the bottom here is the rate that we're achieving over the channel. This dot here, which has rate one and an error of 0.1, that's the raw channel where I'm just spitting data out without any error control coding at all. And then the next dot, well, obviously I've reduced the rate to a third, but I've also reduced the error. So as I increase the number of repetitions, the rate goes down, but the error goes down too. So I'll just label those: these are the repetition codes going down there. And perhaps I'll just plot that for you in another way, because very often engineers like to work at very small error rates, and you just can't see the error rates down here.
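Here is a hedged little sketch of the binary symmetric channel and the repetition code just described: flip each bit independently with probability PE, send every bit an odd number of times, and take a majority vote at the other end. The function names and simulation sizes are mine rather than anything from the lecture.

```python
import random

def bsc(bits, pe, rng):
    """Binary symmetric channel: flip each bit independently with probability pe."""
    return [b ^ (rng.random() < pe) for b in bits]

def repeat_encode(bits, n):
    """Send every bit n times (n should be odd so the vote can't tie)."""
    return [b for b in bits for _ in range(n)]

def majority_decode(bits, n):
    """Majority vote over each group of n received copies."""
    return [int(sum(bits[i:i + n]) > n // 2) for i in range(0, len(bits), n)]

pe = 0.1
rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(100_000)]

for n in (1, 3, 5, 7):
    received = majority_decode(bsc(repeat_encode(data, n), pe, rng), n)
    errors = sum(d != r for d, r in zip(data, received))
    print(f"{n} repeat(s): rate 1/{n}, residual error rate about {errors / len(data):.4f}")
```

Working the binomial tail out analytically rather than by simulation, getting from PE = 0.1 down to an output error rate of around 10 to the minus 15 needs on the order of sixty repeats, which is where the figure in the agony-aunt exchange later on comes from.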
So we'll often plot this on a log scale. It's exactly the same data, just plotted a different way. Right, so you can imagine how this might work. Your customer might say, well, I want an error rate of 10 to the minus 15, and then you say, well, that will mean 30 repeats then. So it looks like we're already getting to something that engineers or optimizers might call a no-free-lunch theorem: of course I can give you what you want, but there will be a compromise. It's the sort of theorem where if I push the balloon in over here, it bulges out over there. If you want a lower error rate, you can have that, but the rate of your channel will decrease. Right, that was pretty much what people thought in the 1940s; it was pretty much the state of the art. And then along came a guy, well, along came Shannon, but along came a guy who was working with him called Richard Hamming. And Hamming, by all accounts, was doing quite a bit of computing, and had become intensely irked by the errors he was seeing in his machine, 'cause every time he got an error it stopped. So he started scratching his head, thinking, what can I do to change this? Now, the idea of parity was around, and as far as I can see, the first reference in Richard Hamming's paper is this one, which is a system built by RCA. It's a high-speed telegraphy idea, and the idea is that you have a message here, it's called a seven-unit printer code, these are binary digits, and this would be whizzed at high speed down one end. And because it's high speed now, we don't have two human operators who can say stop, stop, I didn't catch that one; we have to have some way of working out that there was an error. And I don't actually know quite how this worked, but Hamming's innovation was to work out a beautifully systematic way of doing not only error detection, but error correction. So let's have a look at the Hamming code model. This is the first code that he works out in his paper. His paper is a great read: it's perfectly understandable, even by human beings who don't have a PhD in Galois field algebra, far from it. Great read. Hamming said, right, I'm going to allow myself seven digits, that's what we would call a block size, and I'm going to make four of those data, and three of them are going to be check digits; I'll call those parity, just to provide continuity with the past. And if you are interested in the parlance, that's what we would call a 7,4 code: seven because the block size is seven, four because that's the amount of data in there. So the brackets, if you like, tell you the redundancy in the code. And then he reasoned like this, and maybe this is obvious, but he said an important thing. He said, well, the thing with parity is that parity tells me there's been an error. What if I knew where the error was? If I didn't just know there was an error, but knew where it had occurred in the block, ah, well, then I could correct it, couldn't I? So knowing where an error is is equivalent to correcting an error. So he said, right, what I need is my parity digits, or something related to my parity digits, of which I've got three, to point to the error. Now, 0, 0, 0, I'm going to insist that's used for no error, so I'm going to reserve that. And this is my natural binary count of three digits: 1, 2, 3, 4, 5 and so on, up to 7.
So clearly three parity digits could, in principle, address seven positions; they could point to seven positions. That's good, 'cause I've got seven digits in the code. So the overall architecture of this is laid out in Hamming's paper quite nicely, but this is the start of his reasoning: he says, well, I need to be able to find the positions of these errors, and once I've found the positions, then everything is oojah-cum-spiff. Right then, how does it work? What he did is he said, well, I'm going to label these bits from one to seven; that's why they're called b1 to b7 in this diagram here. And then I'm going to insist that I have even parity among these sets of digits. So this is just the mathematical way of saying even parity, and that one says I want positions one, three, five and seven to have even parity. And then he did a clever thing, which is that he assigned his parity digits to these positions, which happen to be at powers of two. There's a whole load of YouTube explanations as to why they have to be at powers of two; I don't think we need to bother ourselves with that right now. So let's just check how it works. Say we have a data word of four bits, 'cause it's a 7,4 code. The first thing we're going to do is stuff those into the code, so 1100 gets stuffed into the code there. A word of caution: if you go away and look this up, there are some differences, because some people put the bits in the other way round, 0011, and some people don't have the parity digits where Richard Hamming decided to have them, and I'll mention why that is later. But I'm following the Hamming methodology here. And then he says, right, the first parity has to cover b1, b3, b5 and b7. Well, p1 is the one I'm going to set, and that's at the first position; b3 is one, b5 is one, and b7 is zero. So that's already even, so I need to make p1 a zero in order to maintain the evenness. That's how we set the parity digits: we go through and check all of these things, and we've now created a seven-bit block of data that checks subsets of the incoming data. So if you're scratching your head a bit at this point, don't worry too much about the detail. The important innovation is that each one of these parities isn't checking all of the data as it used to; it's checking subsets of the data. And you can imagine there's quite a lot of human ingenuity in choosing those subsets, but let's just stick with Hamming for the time being. So now we transmit it, and let's imagine there's an error. There's an error in position five, which is precisely where Richard Hamming puts it in his example, if you want to follow along at home, as it were. So on the decode, obviously the thing we're going to have to do is recompute those parity digits with the data we've now received, and see if we can interpret that in some way. The way I've chosen to do it here is: I'm going to recompute those parities, and then I'm going to use a zero if the parities match, that is, the recomputed parity matches the transmitted parity, and a one if it doesn't. That's all quite easy to do in low-level hardware. So here are my checks. If we look at the first one, for example, positions b1, b3, b5 and b7 are all meant to be even.
Well, are they? b1 is zero, b3 is one, b5 is now zero, and b7 is zero; that's odd parity, that's a mistake, so I'll put a one in there. The next set, which is here, is correct. The next set is wrong, so I've got a one. 1, 0, 1 is five in decimal, and the error is in the fifth position. So the neatness of the Hamming scheme was to make all of this work in a nice, easy way, which came down to the way he placed those parity digits. You don't have to place the parity digits in the middle of the word like this; Hamming did it because in those days hardware was quite expensive, but a lot of people now just stuff them on at the end. The important thing is that we're checking subsets of the data, and then inferring where the mistakes are. Right, at this point you're longing to know whether the Hamming code is any better than any other sort of code, I suspect. So here we are, back with our diagram again. These are the codes that we discussed in the first section, which were the repetition codes, and I've just sketched in roughly where Hamming is: Hamming is in this square position here. Well, it's all sort of pointing in one direction really, isn't it, which is that you can have lower error rates, but in order to have lower probabilities of error at the output, you're going to have to reduce the rate at which you send data down the channel. And that was pretty much the conventional wisdom. Now, I've plotted here the idea of better codes, which I hope you can see immediately: as we move in this direction, we get lower error and higher rates. So if we had a code out here, for example, that would be a better code than that code. This diagram is, in a way, rather comforting. The only problem is that the inference I'm making for you, and I've made it twice now, is completely and utterly wrong. And surprisingly so. The person who showed this to be wrong was our man Shannon, again. So this is called Shannon's coding theorem, or sometimes Shannon's noisy coding theorem. Sometimes it's called the Shannon-Hartley theorem, after Ralph Hartley, who did some very early work in the '20s; to be perfectly honest, it really is Shannon's theorem, and Hartley's been given a bit of a free ride on this one, but there we go. What I've done here is plotted this line. What Shannon's theorem says, and it's easy to say this in a way that makes it sound trivial, is that this side of the line is forbidden: you cannot go this side of the line, it's impossible. What is perhaps not so obvious from the diagram on the left, but might be more obvious from the diagram on the right, is that this line here goes to zero. It hits zero. Oh. That means we can take any noisy channel and operate it with zero error. Ooh. Well, you can imagine how surprising that was at the time; it seems quite astonishing. However noisy your channel is, however errorful it is, we can operate that channel with zero error, and I can tell you the rate at which we could operate it. What a wonderful bit of reasoning and mathematics. And Shannon managed to do this without knowledge of specific codes, which is a stunning thing. So technically it looks a bit like this: the rate you can achieve is always less than or equal to the channel capacity, and the capacity is a number based upon a bit of entropy.
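For the binary symmetric channel we've been using, that capacity works out as C = 1 - H2(PE), where H2 is the binary entropy function. That formula is the standard textbook result rather than anything quoted in the lecture, and a few lines of Python evaluate it:

```python
from math import log2

def binary_entropy(p):
    """H2(p) = -p*log2(p) - (1-p)*log2(1-p), with H2(0) = H2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(pe):
    """Capacity of a binary symmetric channel with crossover probability pe."""
    return 1 - binary_entropy(pe)

print(bsc_capacity(0.1))   # roughly 0.53 bits per transmitted bit
```

So a channel that flips one bit in ten can still, in principle, carry just over half an error-free bit per transmitted bit, about 0.53, which is essentially the capacity figure behind the agony-aunt arithmetic coming up.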
And I don't want to talk about entropy; I have mentioned it in other lectures and you're welcome to look it up, it's not for today. But it really is quite an astonishing result, and it has some consequences, which I've paraphrased a bit here as a sort of Richard Harvey and Claude Shannon agony-aunt dialogue, a sort of Aristotelian dialogue with an engineer. So here's an example. The engineer says: my channel has an error rate of 0.1 and a data rate of six megabits a second. How many repeats would I require to get the error rate down to 10 to the minus 15, which is a number that is often used for reliable communication, and what's the cost? Okay, well, we know the answer to that; we can go back to my first diagram when we were talking about repeats. You consult your diagram and you count the number of repeats, and it's about 60 repeats. But there's a cost, right? The cost is that the bandwidth will reduce to a measly six divided by 60 megabits, a hundred kilobits a second, which is lousy. Okay, says the quizzical engineer, can't I do better than that? Well, the agony aunt says yes: the capacity is 0.54, so you multiply 0.54 by six megabits. You could communicate at zero error with a bandwidth of 3.24 megabits a second. Wow, fantastic! So you can imagine what the next question is from our engineer. They're a young engineer, so they say: amazeballs! How do I design such a code? Ah. That is the issue, ladies and gentlemen. That is the whole of coding theory in a nutshell. I could almost stop the lecture at this point. So let me draw for you the map of human ingenuity in trying to design codes. This is the 1940s, when Shannon did things, and we've got this early guy, Richard Hamming. Actually, to be perfectly honest, Hamming was slightly predated, I think, by Golay, and I'll show you Golay's stuff in a minute. There was parity knocking around here, but it wasn't in a significant framework. And then we've got all of these codes here, some of which you will have come across; people often mention Reed-Solomon coding, 'cause it's used in the compact disc standard, for example. A lot of these rather rarefied codes were used first in satellite systems. And then there was another burst of activity around here. Now, I started lecturing somewhere around here, I think, and I remember cringing when having to lecture this stuff, because you'd tell people about Hamming codes, because they're nice and beautiful and easy to explain, and then people would ask, well, yes, but what does everyone actually use? And you'd have to go through all these things here, which were complicated and difficult to explain, and didn't bear much relation to each other, and weren't mathematically beautiful. So it was all a bit disappointing. Fortunately, these two codes out here, LDPC and polar codes, and I'll talk about LDPC, low-density parity-check codes, pretty much have the same sort of handle-turning aspect as the Hamming codes I've already explained. So now, teaching error control coding is trivial: you can just ignore all of this stuff, which is what I'm going to do in this lecture. I'm going to rather sweepingly ignore whole ranges of interesting and useful codes, because everyone at the moment is rushing to get their hardware onto polar codes and low-density parity-check codes.
So if you have a 5G telephone, for example, then one of the principal differences between 5G and 4G is the use of these new error control coding standards. Right, a brief diversion though: Golay, Golay codes, among the first systematic codes. This is really a sort of mathematician's joke, but this is the complete paper from Golay in 1949. Elwyn Berlekamp, who was one of the great gurus of coding theory, and a friend of Martin Gardner, who died a few years ago, called this the single best published paper in coding theory. Now, the reason this is mildly entertaining to people like me is that nowadays you don't seem to be able to get away with fewer than 30 pages saying much less than this guy said. This is the ultimate compact bit of paper. It's so compact that many, many papers have been written on Golay codes subsequently, basically unpacking and trying to explain how they work. I'm not going to contribute to that literature, but I thought it might amuse you to see Marcel Golay's original paper. Now then, back to Hamming and his innovations. I'd like to talk a little bit about modern codes, and in order to do that I want to explain something called Hamming distance. So take your mind back to our 7,4 Hamming code, which has four data bits. What I've done here is enumerate all the possible combinations of four bits, so there are 16 combinations there. And as I said, we pack those into a seven-bit block or word. So let's do that. This time, though, what I'm going to do is pack them all on one end. I warned you I was going to do this: instead of interspersing the data with the parity as Hamming did, nowadays it's quite common to put all the parities at one end. It doesn't affect the mathematical properties of the code whatsoever; it just makes it easier to explain to undergraduates, so that's why people tend to do it. Now, what you might ask is, well, if I make errors on these codes, how many errors could I make before one of these transmitted words, on the right-hand side, becomes another one? That would be an interesting thing to know, right? So if I was interested in, say, the second word on the list, and I wanted to compare its distance to, say, I dunno, the ninth word, what I would do is pick out the second word and the ninth word, and then look at how different they are. So we can do that now: scanning across, some positions are identical and some are different, and my crosses aren't quite in the right place, but you get the idea, a cross means there's a difference. And so what we would say is that the Hamming distance between two code words, in this case two and nine, is four. It's a measure of how separate the code words are. You can see where I'm going with this, can't you: you want those code words to be as separate as possible, 'cause you want them to be very resilient to errors. Okay, well, let's do that with all the code words. That's what I've done here in this great big matrix of differences, and that's the intersection of two and nine. It's symmetrical, so let's just remove the top half and measure downwards. And you can see, I think fairly quickly, just by scanning your eye down these columns, that the smallest number for this code is three, and the largest is seven.
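To check those numbers, here's a short Python sketch that builds one common systematic version of the 7,4 code, with the four data bits first and the three parity bits stuck on the end, as just described. The particular parity equations and the little syndrome decoder are a standard textbook choice, not necessarily the exact arrangement on the slide.

```python
from itertools import product

def hamming74_encode(d):
    """Systematic Hamming 7,4: four data bits followed by three parity bits."""
    d1, d2, d3, d4 = d
    return [d1, d2, d3, d4, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]

def hamming74_correct(r):
    """Recompute the three parity checks; a non-zero syndrome points at the
    single flipped bit, which we flip back before returning the data bits."""
    r = list(r)
    syndrome = (r[0] ^ r[1] ^ r[3] ^ r[4],
                r[0] ^ r[2] ^ r[3] ^ r[5],
                r[1] ^ r[2] ^ r[3] ^ r[6])
    if syndrome != (0, 0, 0):
        # Each single-bit error position produces a distinct syndrome pattern.
        patterns = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
                    (1, 0, 0), (0, 1, 0), (0, 0, 1)]
        r[patterns.index(syndrome)] ^= 1
    return r[:4]

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

# All 16 codewords, and every pairwise Hamming distance between them.
codewords = [hamming74_encode(d) for d in product([0, 1], repeat=4)]
distances = [hamming_distance(u, v)
             for i, u in enumerate(codewords) for v in codewords[i + 1:]]
print(min(distances), max(distances))   # 3 7, matching the table in the talk

# A single flipped bit is always corrected, because the minimum distance is 3.
sent = hamming74_encode([1, 0, 1, 1])
sent[2] ^= 1                            # one error in transit
print(hamming74_correct(sent))          # [1, 0, 1, 1]
```

It prints a minimum distance of 3 and a maximum of 7; the 3 is what guarantees single-error correction, which is exactly the worst-case argument that follows.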
So for the Hamming code, if we want to take a worst-case analysis, and there might be very good reasons why we don't want to do that, but let's take the worst case: the worst case is that there are two words we could send that differ in only three bits. So can you see why a 7,4 Hamming code can guarantee to correct any single error? Because if you think of this in a sort of space, here's one code word, here's the other one, and I have to make three hops to get from one to the other. If the noise moves me one hop, I'm still close enough to the original to always guarantee that I can find my way back to it. If the noise moves me more than one hop, I might be able to detect that there's an error, but I can't guarantee to correct it. So one of the ways of speaking about this is to say this code has a minimum Hamming distance of three. Well, let me sketch that for you, and then I'll talk about some of the consequences. How to visualize this? Well, you'll see diagrams of people trying to plot seven-dimensional space and the distances in it, and it's a bit alarming, 'cause we can't visualize seven-dimensional space, so most people project it onto 2D. Here I've used a technique that I haven't seen used before for this purpose, called t-SNE, which is an adaptive visualization technique, but this is essentially me taking a seven-dimensional space and mapping it down to 2D. So if I took two words, and this is somewhat conceptual, let's just take two words: this is the line that separates them, and there's a hyper-sphere that sits around each one of them, and that represents the distance around a code word within which we can correct. And what you'd want to do is perfectly fill out the space with as many code words as you can, with the maximum amount of space between them. Codes that do that, in analogy with chemical physics, where we talk about perfect packing, are called perfect codes. There's a lot written about perfect codes, but most codes are not perfect codes, and they don't need to be. There's also a little bit of rubbish written about minimum distance. People tend to say: if the minimum distance of this code is, say, three, I won't bother to correct more than one error, because that's all I can guarantee. That's not a good strategy, as it turns out. Even though, in a real code, the distances between various words will differ, maybe quite substantially, you are better off working with what you've got rather than working to some sort of minimum. There's an entertaining guy called David MacKay who called this a terrible example of worst case-ism. Okay, that's a sort of toy code, although I don't really want to call it a toy code: there are devices out there that still use the Hamming 7,4 code, and without it we'd be getting errors all over the place. So it's not a toy code, but let's look at a big code. This is what happens when you have a bigger code. This is the code used, if you've got an up-to-date wifi system, in what's called 802.11n; you can go and look at table F-1, the rate-3/4 code, page 2304 is where I found it. Just to polish my halo, I'm pointing out that I scanned through 3,500 pages of standards looking for this, ladies and gentlemen. This is a block of 648 bits, of which 486 are data, and this is my little visualization of it on a single plane.
And maybe you can't look at this and intuit what's going on, but you can certainly get the impression that not all code words are equally spaced, and there are big holes in the code. And a lot of human ingenuity goes into finding these codes. This is an example of a low-density parity-check code. Very interesting history: the idea of these was first thought up by a guy called Gallager in the early 1960s, but he didn't have enough computing power to find them, and later authors figured out how best to do so. One of the most irritating things about Gallager codes, or LDPC codes, is describing them. Imagine you've got several hundred parity digits to describe, and you've got to write them in a document. Unless you can find a system for writing them down, you've got a licence for trouble, haven't you, 'cause you're going to make errors in them. So there's quite a lot of ingenuity in finding ways to decompose the apparently random collections of parity checks into something people can humanly understand, and a lot of work in LDPC codes has been in trying to find butterflies or bits of algebra that allow you to compute these matrices. And in fact, if you look at the wifi standard, that's what it does: it has a set of sub-matrices that you shift across a bigger matrix to create it. Polar codes are even better, because they have a more principled way of doing so. Now, I can see you're longing to look at the performance of these. It's a bit complicated to compute myself, so I didn't; I picked an example from a very nice book by a guy called David MacKay, who sadly died recently, a very enterprising and ingenious man who worked on LDPC codes, amongst other things, and was also one of the government's chief scientists. On the left-hand side we've got repeat codes, Hamming codes and all the conventional stuff, and then, slammed right in here, with error bars because he's doing a simulation, is a low-density parity-check code. So you can see what's happened; polar codes will be similar, down here. What's happened over the last five to ten years is that these new codes have appeared, and suddenly everyone is ditching everything in a desperate attempt to get to them; that's what's caused the excitement. What's beautiful about these codes, however, is that the mathematics and the understanding that you got from the early part of this lecture, the Hamming understanding, if you like, is the same. It's the same sort of handle-turning methodology, just on a bigger scale. And that's what's so nice; a lot of the competing codes don't do that. Now, those of you who are of an academic disposition may be puzzling over what this thing called GV is, down here. That is the Gilbert-Varshamov conjecture, and it goes back to the worst case-ism I was talking about earlier. If we assume that, because a code cannot guarantee to correct more than T errors, we shouldn't try to correct more than T, if you operate under that wisdom, then this is where GV would sit for this example. So there's a limit, and it doesn't get you to the Shannon limit. And it's a conjecture, 'cause no one can prove it. I mention it because I can see very intelligent people in the audience. So if you have nothing to do this evening, could you just go and prove for me that the Gilbert-Varshamov conjecture works?
If you can do that, just put my name on the paper please and submit it to IEEE Transactions on Information Theory, 'cause I've always wanted a paper in IEEE Transactions on Information Theory. So there's been this leap forward in coding, and a number of rather innovative codes that are not linear block codes of the type I was talking about have just been left for dust, as it were. All those convolutional codes and all those other codes have been left for dust. But it's the same ideas that were dreamt up in basically 1950 by Richard Hamming and Claude Shannon. Hamming was a very interesting man, not just because he invented these codes; he also wrote a book about what it was like to be creative, and if you've got a moment do read it, it's an interesting read: what it's like to invent something. I'm sure you've noticed that in these lectures I've been talking about inventions, and some of the inventions I've been talking about are old-fashioned professor-brainstorm inventions, artifacts. But very often they're inventions of algorithms, or ways of doing things, and this lecture has been about inventions of algorithms, and I suppose also theories of algorithms, which is a very attractive sort of idea. I would just briefly point you to future lectures in this series; the next one is also about an invention, but a different sort of invention: an invention of a system, and these aren't really understood very well in the modern world. People tend to tell their kids about inventions as vacuum cleaners and gadgets, but a lot of our modern world consists of inventions of agreements, of protocols. The MPEG standard, which we talked about in a previous lecture, is an example of that, and cellular telephony is one of the most interesting examples because it's extremely complicated. So I hope you will be able to join me then, and needless to say, cellular telephony would not happen without the beautiful and powerful mathematics that is error control coding. Thank you. (audience applauding)
- [Man] Thank you very much indeed, professor. I'm an 83-year-old aging git and I'm making more and more errors as I get older, with increasing frequency. I did do a course on cybernetics, I gave it up actually, because the mathematics in it wasn't explained. Ross Ashby's book, "Introduction to Cybernetics".
- Yeah.
- [Man] And also, I think I still have a copy of Shannon's Theory of Communication.
- That's a good read, although there are bits of it that are a bit difficult, I mean.
- [Man] But one of the things that occurs to me. I should explain, by the way, that I'm the world's most ignorant person as regards computers and information technology. But at two of the universities I've worked at, the very friendly computer assistance boys have said they've had more trouble with me than anybody else on my computer, because I've done things, I don't know what, and they said they'd never seen anything like it before. And another thing, with a previous word processor I had: when I was beginning to type words, the computer would complete the word. I'd put in the first two letters, but very often, and when I say often I mean 90% of the time, the completed word was not the word I was looking for. Does that have any relevance, in fact, to what you've been talking about? I do apologize for my ignorance,
- It does--
- [Man] which is unlimited.
- It does. I have two things to say to you, sir.
The first one is that one of the major systems we use for controlling source code, keeping things up to date, is called Git, which is what GitHub is built on. And people have often asked why it was called Git, and the inventor, Linus Torvalds, said he called it Git because he is a git. So don't be embarrassed about that. Now, what you're doing when you get autocorrect, or what I'm doing when we're having this conversation now and I'm listening to you and working it out, is using a model of what you're likely to be saying. And if you go outside that model, as you often do, because you're writing highfalutin prose, obviously you get corrected back to the model. And the spell checkers have got better and better at doing this. It's the same idea in coding. But for error control coding, which operates at a much lower level, our model is very strict indeed. We know exactly what messages could be sent, so we can correct back to the most probable one, and fancy receivers model the whole channel, they model the noise, and they do something called soft decoding. You do a kind of soft decoding too; in low-level hardware, soft decoding says: I'm not just going to pick the nearest word, I'm going to model the probability of various things happening, maybe thinking about the probability of the data in the first place, and pick the most probable word. And it does that by measuring voltages, really. So there is an analogy with what you're saying. The high-level sort of error control that you're used to, however, is too errorful for us to use at a low level. If Microsoft spell check was in charge of correcting the bits on your computer system, we would be in a shed load of trouble.
- [Man] You've spoken mainly, I think, about data transmission. What about programs actually inside a computer? Is there any error correction in those when they're transmitted over relatively short distances within a computer?
- Yes, that's a very good point. Almost every bit of your iPhone, your hardware, everything, will have error control coding in it, 'cause errors are objectionable. And the lower the level you are at in the hardware, on the one hand, the less likely it is that you're going to make errors, because it's a contained system, but the more damaging they are. So parity is everywhere. Error control is everywhere. Even your memory stick, or the RAM in your computer, will have some parity control on it. Your disk most certainly has error control coding on it. Anything that goes over the airwaves has error control coding. Everything has ECC; it's just that you don't notice. And the reason you don't notice is because of this beautiful idea, which is, I think, quite counterintuitive: whatever the error, we can correct it back to zero. I think that's a good time to stop. Thank you very much, have a good evening. (audience applauding)