More recently, in January 2014, the Italian police managed to decode messages they had captured a year earlier that had been written by the Calabrian Mafia, or ’Ndrangheta. The code used, known as the San Luca code, dates from the late nineteenth century and was developed as a way of writing down secret information that had previously been passed on by word of mouth. It had been used to document quasi-religious initiation rites for budding Mafiosi as well as to encode a myth outlining how the Mafia originally came into being.
The code consisted of substituting each letter of the alphabet with a secret symbol. Because the ’Ndrangheta wrote messages with a space after each word, the code could be broken much like a Sudoku puzzle. Like other languages, Italian has only a limited number of one- and two-letter words, so a symbol that appears in an encoded one- or two-letter word can stand for only a restricted range of letters. Once the codebreakers had cracked the first few symbols by analysing the shortest words in the text, they were able to progress to slightly longer words containing mixtures of known and unknown symbols in order to reveal the meanings of the next few. They continued with the same technique until they had decoded the whole alphabet.
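The first step of this approach, narrowing down symbols by looking at the shortest words, can be sketched in a few lines of Python. This is only an illustration: the word list is a small sample of real Italian words rather than a complete inventory, and the names `SHORT_WORDS` and `candidate_letters` are invented for the example, as are the placeholder symbols.

```python
# Small illustrative sample of one- and two-letter Italian words (not complete).
SHORT_WORDS = ["a", "e", "i", "o", "di", "da", "in", "il", "la",
               "le", "lo", "un", "se", "si", "ma"]


def candidate_letters(ciphertext, short_words=SHORT_WORDS):
    """Map each symbol seen in a one- or two-symbol word to its possible letters."""
    candidates = {}
    for token in ciphertext.split():
        if len(token) > 2:
            continue  # only the shortest words are used in this first pass
        matches = [w for w in short_words if len(w) == len(token)]
        for position, symbol in enumerate(token):
            # Letters that occur at this position in some word of this length.
            letters = {w[position] for w in matches}
            # A symbol has to fit every short word it appears in, so intersect.
            candidates[symbol] = candidates.get(symbol, letters) & letters
    return candidates


# '#' appears alone and as the first symbol of a two-symbol word.
print(candidate_letters("# #@ %%%"))
```

With this sample list, the symbol `#` can only be `i`, because `i` is the only letter that is both a one-letter word and the first letter of a two-letter word. The check is deliberately loose (it looks at each position separately), so it narrows the options without guaranteeing a unique answer.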
Even if the San Luca messages had been written as contiguous text without spaces between the words, the police would still have been able to decode them. A computer could be programmed to try out all possible combinations of symbol-to-letter correspondences. The program would know when it had a potential hit because the string of letters that resulted from applying a promising combination to an encoded message would contain known Italian words as opposed to gobbledegook. A native speaker of Italian would then have to go through the candidate combinations found by the computer, ideally only a small number, to determine which one was correct.
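A toy version of such a program might look like the sketch below. It is not a reconstruction of anything the police used: the four-letter alphabet, the two-word dictionary and the function names are all assumptions chosen so that the search stays small enough to run instantly. A full alphabet produces vastly more permutations than this naive loop could enumerate.

```python
from itertools import permutations


def decode(ciphertext, key):
    """Apply a symbol-to-letter mapping to every symbol in the message."""
    return "".join(key[symbol] for symbol in ciphertext)


def score(text, words):
    """Count how many known words occur somewhere in the candidate text."""
    return sum(1 for word in words if word in text)


def brute_force(ciphertext, letters, words, keep=3):
    """Try every assignment of letters to symbols and return the best few."""
    symbols = sorted(set(ciphertext))
    candidates = []
    for perm in permutations(letters, len(symbols)):
        key = dict(zip(symbols, perm))
        text = decode(ciphertext, key)
        candidates.append((score(text, words), text))
    # Highest score first; ties are ordered alphabetically for repeatability.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:keep]


# Hypothetical message: digits stand in for the secret symbols, no spaces.
for hits, text in brute_force("12341434", "acos", ["cosa", "casa"]):
    print(hits, text)
```

The top-ranked candidate here is `cosacasa`, the only decoding that contains both dictionary words. The lower-ranked candidates are the kind of near misses a human reader would still need to rule out.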
A technique called statistical frequency analysis would give this process of trial and error a considerable head start. In any language, some letters occur much more often than others. For example, any reasonably long English text will contain many more instances of the letter E than of the letter X. How often a given symbol appeared in an encoded San Luca message compared to the other symbols could have been used to determine whether it was likely to represent a letter that occurs frequently or infrequently in Italian. With shorter messages, this knowledge is useful because it enables a computer trying out all possible correspondences to start with the ones that are most likely to be right. With a sufficiently large amount of encoded text, it could be used to crack the code all on its own. If the statistical sample were large enough, ranking the symbols by how often they occur would reproduce the ranking of letters by how often they occur in Italian in general.
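The ranking idea can be sketched as follows. The letter ordering in `ITALIAN_BY_FREQUENCY` is an approximation supplied for illustration (published frequency tables differ slightly in the lower ranks), and the function names are likewise made up for this example.

```python
from collections import Counter

# Approximate ordering of Italian letters, most frequent first.
# Assumption for illustration; published tables vary in the details.
ITALIAN_BY_FREQUENCY = "eaionlrtscdpumvghfbqz"


def rank_symbols(ciphertext):
    """Return the distinct symbols, most frequent first (whitespace ignored)."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    # Sort by descending count; ties are broken by symbol so the result is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [symbol for symbol, _ in ordered]


def guess_key(ciphertext, reference=ITALIAN_BY_FREQUENCY):
    """Pair the n-th most common symbol with the n-th most common letter."""
    return dict(zip(rank_symbols(ciphertext), reference))


# Hypothetical ciphertext using placeholder symbols.
print(guess_key("###@@!"))
```

On a message this short the guess is almost certainly wrong in places. It is best treated as a starting order for trial and error, and only becomes reliable on its own with a very large sample.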