More recently, in January 2014, the Italian police managed to decode messages they had captured a year earlier that had been written by the Sicilian Mafia’s Calabrian cousins, the ’Ndrangheta. The San Luca code in which the messages were written dates from the late nineteenth century and was developed as a way of writing down secret information that had previously been passed on by word of mouth. The confiscated material documented quasi-religious initiation rites for budding Mafiosi as well as a myth outlining how the Mafia originally came into being.
The code consisted of replacing each letter of the alphabet with a secret symbol. Because the ’Ndrangheta wrote messages with a space after each word, the code could be broken much like a Sudoku puzzle. Like other languages, Italian has a limited number of words made up of one or two letters. This means that there is only a handful of letters that could correspond to a symbol that appears within an encoded one-letter or two-letter word. Once the codebreakers had cracked the first few symbols by analysing the shortest words in the text, they were able to make educated guesses as to what some of the longer words were because they then knew some of the letters that made them up. As they deciphered each word, they decoded more symbols. They continued with the same technique until they had revealed the whole alphabet.
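The first step of this approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists are small, hand-picked samples of short Italian words rather than complete lists, the function name candidates_from_short_words is made up for this example, and the symbols are ordinary keyboard characters standing in for the real San Luca symbols.

```python
# Illustrative, deliberately incomplete lists of short Italian words (assumption).
ONE_LETTER = {"a", "e", "i", "o"}
TWO_LETTER = {"di", "da", "in", "il", "la", "le", "lo", "un",
              "se", "si", "ma", "al", "ha", "ho"}


def candidates_from_short_words(ciphertext):
    """Map each symbol seen in a 1- or 2-symbol word to the letters it could stand for."""
    candidates = {}
    for word in ciphertext.split():
        if len(word) == 1:
            options = [set(ONE_LETTER)]
        elif len(word) == 2:
            # Letters that can open, and letters that can close, a two-letter word.
            options = [{w[0] for w in TWO_LETTER}, {w[1] for w in TWO_LETTER}]
        else:
            continue  # longer words are left for the later guessing stage
        for symbol, letters in zip(word, options):
            # Each new sighting can only narrow the set of possible letters.
            candidates[symbol] = candidates.get(symbol, letters) & letters
    return candidates


# Toy message: "la casa e di" with l=#, a=@, c=%, s=$, e=&, d=*, i=!
print(candidates_from_short_words("#@ %@$@ & *!"))
```

Symbols that only ever occur in longer words, such as % and $ here, get no entry at all. They would be filled in during the later stage, once enough of the surrounding letters are known to guess the whole word.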
Even if the San Luca messages had been written as contiguous text without spaces between the words, the police would still have been able to decode them. A computer could be programmed to try out all possible combinations of symbol-to-letter correspondences. The program would recognise a potential hit when the string of letters that resulted from applying a combination to an encoded message contained known Italian words as opposed to gobbledegook. If the computer had too little material to reach a definitive answer on its own, it could still narrow the possibilities down to a short list of candidate solutions that a native speaker of Italian would then have to work through to determine which one made sense.
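A toy version of such a program might look like the following sketch. It assumes a four-letter alphabet and a two-word dictionary so that every correspondence can be tried in an instant; the function name rank_candidates and the scoring rule (the total length of the known words found) are choices made for this example only. With the 21 letters of the Italian alphabet there are 21! (roughly 5 × 10^19) possible correspondences, so a real program would need to prune the search rather than enumerate it.

```python
from itertools import permutations


def rank_candidates(ciphertext, letters, known_words):
    """Try every symbol-to-letter assignment and return the best-scoring decodings."""
    symbols = sorted(set(ciphertext))
    scored = []
    for perm in permutations(letters, len(symbols)):
        key = dict(zip(symbols, perm))
        text = "".join(key[s] for s in ciphertext)
        # Score a decoding by the total length of the known words it contains.
        score = sum(len(word) for word in known_words if word in text)
        scored.append((score, text))
    best = max(score for score, _ in scored)
    return sorted(text for score, text in scored if score == best)


# "lacasa" written without spaces, with l=#, a=@, c=%, s=$
print(rank_candidates("#@%@$@", "acls", {"la", "casa"}))
# → ['casala', 'lacasa']
```

Two decodings tie for the top score because both contain "la" and "casa". This is the situation described above in which the computer has too little material to decide and a speaker of Italian has to pick the candidate that makes sense.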
A technique called statistical frequency analysis would give this process of trial and error a considerable head start. In any language, some letters occur much more often than others. For example, any reasonably long English text will contain many more instances of the letter E than of the letter X. How often a given symbol appeared in an encoded San Luca message compared to the other symbols could be used to determine whether the symbol in question was likely to represent a letter that occurs frequently or infrequently in Italian. With shorter messages, this knowledge would enable a computer trying out all possible correspondences to start with the ones that were most likely to be right. And with a sufficiently large amount of encoded text, it could be used to crack the code all on its own: if the statistical sample were large enough, the symbols ranked by relative frequency would fall into an order exactly matching that of the letters ranked by their relative frequency in the Italian language in general.
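The idea can be sketched as follows. The letter ordering in ITALIAN_BY_FREQUENCY is an approximate ranking included for illustration only, since published frequency tables differ in the lower ranks, and the function name guess_key_by_frequency is likewise made up for this example.

```python
from collections import Counter

# Approximate ranking of Italian letters from most to least frequent (assumption).
ITALIAN_BY_FREQUENCY = "eaionlrtscdpumvghfbqz"


def guess_key_by_frequency(ciphertext, letters_by_frequency=ITALIAN_BY_FREQUENCY):
    """Pair the most common symbol with the most common letter, and so on down."""
    counts = Counter(s for s in ciphertext if not s.isspace())
    ranked = [symbol for symbol, _ in counts.most_common()]
    return dict(zip(ranked, letters_by_frequency))


# A tiny made-up sample in which # is the most frequent symbol, then @, then %.
print(guess_key_by_frequency("###@@%"))
# → {'#': 'e', '@': 'a', '%': 'i'}
```

On a short message the resulting key is only a first guess: letters with similar frequencies are easily swapped, which is why it serves as a starting point for the trial and error rather than a replacement for it.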
Although the two Mafia groups have supplied us with examples of encodings that were particularly trivial to crack, codemakers underestimating the ingenuity, tenacity and intuition of codebreakers is a recurring theme in the history of encryption. Human intelligence seems to be better suited to the goal-oriented challenge of finding patterns than to the open-ended task of hiding them.