Running keys, one-time pads and the Venona project


A running key cipher takes the basic idea behind the Caesar cipher to the next level of complexity. If you distribute copies of a book – for example, this one – to a group of people who you want to be able to encode and decode messages, they can use the individual letters in the book to derive separate Caesar cipher keys for the individual letters in each message. This book begins with the word HARDLY, so the first letter in each message would be encrypted using the letter H, which is the eighth letter of the alphabet. If a message began with the letter B, the B would be replaced by the letter eight places after it in the alphabet, which is J. The person decoding the message would know to use the letter H from the book to decode the letter J by going back eight letters to yield the original B.

The third letter in this book is R, which is the eighteenth letter of the alphabet. If the third letter in the message to be encrypted were U, we would not be able to advance eighteen positions from it without reaching the end of the alphabet first. We would then cycle back round to the beginning of the alphabet and take A as following Z. The encrypted version of the letter would be M. Again, the recipient of the encoded message would know from the R in the book to go back eighteen places from the encrypted M, cycling round in the opposite direction from A to Z, to retrieve the original U.
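The letter arithmetic described above can be sketched in a few lines of Python. This is only an illustration, assuming messages and key text consist solely of capital letters A to Z; the function names and the sample message BLUE are invented for the example, while the key HARDLY is the opening word mentioned above.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...XYZ"


def shift_of(key_letter: str) -> int:
    """Position of the key letter in the alphabet, counting A as 1."""
    return ALPHABET.index(key_letter) + 1


def encrypt(message: str, key_text: str) -> str:
    """Shift each message letter forward by the position of its key letter."""
    if len(key_text) < len(message):
        raise ValueError("key text must be at least as long as the message")
    return "".join(
        ALPHABET[(ALPHABET.index(m) + shift_of(k)) % 26]  # % 26 wraps Z round to A
        for m, k in zip(message, key_text)
    )


def decrypt(ciphertext: str, key_text: str) -> str:
    """Shift each ciphertext letter back by the position of its key letter."""
    if len(key_text) < len(ciphertext):
        raise ValueError("key text must be at least as long as the ciphertext")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - shift_of(k)) % 26]
        for c, k in zip(ciphertext, key_text)
    )


print(encrypt("BLUE", "HARDLY"))  # JMMI
print(decrypt("JMMI", "HARDLY"))  # BLUE
```

The first letter reproduces the B to J example, and the third letter reproduces the wrap-around from U to M.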

A running key cipher is much more challenging to crack than a Caesar cipher. Nevertheless, if a single book has been used repeatedly to encode different messages, frequency analysis can be applied to determine probable values for each letter within the book and then to decode the messages. For example, if Cybertwists were used as a key source over and over again, the relative frequencies with which the first letters in the encrypted messages occurred would parallel the relative frequencies with which letters occur at the beginning of English texts, but the values would all be shifted eight letters to the right, which would identify the first letter in the book as an H.

Such purely frequency-based analysis requires a very large number of encrypted messages to crack an entire running key cipher. However, running key ciphers can be broken on the basis of a surprisingly limited amount of input when frequency analysis is used in conjunction with other, more complicated analysis techniques. In any language, there are strong patterns governing which letters of the alphabet tend to follow which other letters. There are also specific words like THE that can reasonably be expected to occur often in any text. Such knowledge can be applied to several messages that are known to have been encrypted using the same key to work out probable letters at various points within that key.

Eventually, part of a longer word – say NFORMATI – will become visible either in one of the messages or in the key itself. Completing that word within the text that contains it will then allow more of each of the other texts to be decoded, hopefully exposing parts of other words there. The analysis will probably involve a good measure of trial and error, but a computer brings speed to the process as well as the ability to compile definitive lists of all words within a language that contain whatever strings of letters have already been found.
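The step of completing a word in one text and using it to open up the others can be sketched as follows. This is a simplified illustration rather than a description of any real analyst's tooling: the two messages, the key text beyond its first word HARDLY, and all function names are invented, and the key appears in the code only to manufacture the two sample ciphertexts.

```python
import string

ALPHABET = string.ascii_uppercase


def encrypt(message: str, key_text: str) -> str:
    """Running key encryption, counting A as a shift of 1."""
    return "".join(
        ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k) + 1) % 26]
        for m, k in zip(message, key_text)
    )


def key_from_crib(ciphertext: str, crib: str, pos: int) -> str:
    """Key letters implied if the guessed word `crib` starts at offset `pos`."""
    segment = ciphertext[pos:pos + len(crib)]
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p) - 1) % 26]
        for c, p in zip(segment, crib)
    )


def decrypt_fragment(ciphertext: str, key_fragment: str, pos: int) -> str:
    """Decode the stretch of another message covered by the recovered key letters."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k) - 1) % 26]
        for c, k in zip(ciphertext[pos:], key_fragment)
    )


# Hypothetical key text and messages, used only to build the demonstration.
KEY_TEXT = "HARDLYANYONEWOULDDISPUTE"
CIPHER_1 = encrypt("SENDINFORMATIONNOW", KEY_TEXT)
CIPHER_2 = encrypt("MEETATTHEBRIDGEATTEN", KEY_TEXT)

# The analyst guesses that INFORMATION begins at offset 4 of the first message.
fragment = key_from_crib(CIPHER_1, "INFORMATION", 4)
print(fragment)                                  # readable key text confirms the guess
print(decrypt_fragment(CIPHER_2, fragment, 4))   # the same stretch of the second message
```

A correct guess is self-confirming: both the recovered key fragment and the matching stretch of the second message come out as readable English, whereas a wrong guess or a wrong offset produces gibberish in both.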

And even if this book were only used as a key on a single occasion, a skilled analyst examining the encoded message would still have a good chance of deciphering at least parts of it. Ascertaining that the key is itself based on English text would not be particularly difficult. Because the relative frequencies of letters in both the book and the original message would be uneven, the relative frequencies of letters in the encrypted message would themselves form a telltale signature. For example, because the letter E (the fifth letter of the alphabet) occurs very frequently in English, the letter J (the tenth letter of the alphabet) would occur fairly frequently in the encrypted message. A J would result whenever an E in the original message happened to coincide with an E in the book that had been used as a key.
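The telltale signature can be estimated with a short calculation. The sketch below assumes that message letters and key letters are independent of each other and uses approximate, commonly quoted English letter frequencies; the exact percentages are an assumption and vary from one source text to another.

```python
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent (assumed, rounded values).
PERCENT = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.8, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.10, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 1.0, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.07,
}
TOTAL = sum(PERCENT.values())
FREQ = {letter: pct / TOTAL for letter, pct in PERCENT.items()}  # normalised to sum to 1


def ciphertext_distribution(freq: dict) -> dict:
    """Probability of each ciphertext letter when message and key are both English."""
    dist = {letter: 0.0 for letter in ALPHABET}
    for p in ALPHABET:          # message letter
        for k in ALPHABET:      # key letter, counted as a shift with A = 1
            c = ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k) + 1) % 26]
            dist[c] += freq[p] * freq[k]
    return dist


DIST = ciphertext_distribution(FREQ)

# Print the ciphertext letters from most to least likely.
for letter, prob in sorted(DIST.items(), key=lambda item: item[1], reverse=True):
    print(letter, round(prob, 4))
```

A uniformly random key would give every ciphertext letter a probability of 1/26, or about 0.0385. The uneven figures printed here are what distinguish a key drawn from English text.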

Once the analyst had worked out that the encrypted message had been generated from two parallel texts, he could make a tentative stab at deciphering it. The methods he would apply are similar to those that he would have been able to use with more confidence of success if he had been able to access several messages that he knew to have been encoded with the same key. Although a single text would give him little chance of working out the whole sequence, recovering snippets of the message might still turn out to be valuable. And, perhaps more significantly, recovering sections of the book could conceivably lead the sleuth to it via an online search. He would then have the entire key and would be in a position to decode the entire message.

This sort of analysis can be made impossible by deriving the key from a random sequence of letters rather than from a pre-existing text. When a stream of gobbledygook is put to work to encrypt a single message and then never used again, it is known as a one-time pad. A one-time pad has the distinction of being the only encryption method that is theoretically totally unbreakable. It may seem surprising that encoding secrets without using a computer retains any relevance in the twenty-first century, but this queen of encryption methods requires no complex mathematics and can be easily performed by hand.
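A one-time pad differs from the running key cipher only in where the key letters come from. The following sketch uses Python's standard `secrets` module as the source of randomness, which is an assumption made for the sake of the example; the function names and the sample message are likewise invented, and a pad produced this way would still have to be delivered securely and destroyed after a single use.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase


def generate_pad(length: int) -> str:
    """Draw `length` letters independently and uniformly at random."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def encrypt(message: str, pad: str) -> str:
    """Shift each message letter forward by its pad letter (A counts as 1)."""
    if len(pad) < len(message):
        raise ValueError("the pad must be at least as long as the message")
    return "".join(
        ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k) + 1) % 26]
        for m, k in zip(message, pad)
    )


def decrypt(ciphertext: str, pad: str) -> str:
    """Shift each ciphertext letter back by its pad letter."""
    if len(pad) < len(ciphertext):
        raise ValueError("the pad must be at least as long as the ciphertext")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k) - 1) % 26]
        for c, k in zip(ciphertext, pad)
    )


message = "ATTACKATDAWN"
pad = generate_pad(len(message))      # used for this one message and never again
ciphertext = encrypt(message, pad)
print(pad, ciphertext, decrypt(ciphertext, pad))
```

Because every pad letter is equally likely, every possible message of the same length is an equally plausible decryption of the ciphertext, which is why no amount of analysis can recover the original without the pad.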

However, the one-time pad does not represent a practical answer to most problems that encryption might realistically be required to solve. For every secret message that is to be sent, a new random sequence of letters has to be generated and distributed to its intended users without being intercepted by anybody else. The logistical hurdles these requirements entail are identical to those that motivate using encryption in the first place. If you can send the one-time pad safely without encryption, why not just send the message safely without encryption? In most situations, a one-time pad merely shifts the challenges of secure transmission from one document to another: from the secret message to the one-time pad.

On the other hand, this can be exactly what is called for if the conditions for secure and trusted communication exist at one point in time but cannot be guaranteed at some future point in time. In 1963, following the Cuban missile crisis, the United States and the Soviet Union set up an emergency hotline between their respective leaders that was based on one-time pads. Each country generated pads and passed them to the other via diplomatic channels. This allowed the superpowers to build up a basis for communication gradually, during a period of low tension, so that it would be ready for use if the situation heated up in the future.

As soon as a one-time pad is used more than once, it loses its status as an unbreakable encryption method. The Soviet Union based its confidential communications during World War II on one-time pads. However, because the unit generating the random pads was unable to meet the demand for them quickly enough, pads ended up being recycled. The authorities tried to reduce the resulting risk by making sure the same pad was never reused in similar places or situations, but this measure proved insufficient to stop American analysts, who had obtained a large quantity of encoded messages, from taking advantage of the situation. That they were able to crack messages that had been sent with reused keys is what one would expect. What seems genuinely remarkable, given the vast amount of data they were analysing, is that they were able to detect the reuse in the first place and to identify which groups of messages it applied to.

The project to analyse the Soviet messages was called Venona and work on it continued right up until October 1980: once an adversary is in possession of encrypted messages, he has all the time in the world to try and decipher them. The messages that were successfully decoded made up but a fraction of the total, but they still contained a considerable amount of pivotal intelligence information. Venona led to the definitive exposure of double agent Klaus Fuchs, who was responsible for passing the Soviets top-secret U.S. information about hydrogen bomb design.
