The role of statistics in deciphering the Enigma machine

The Imitation Game, the historical drama now playing in Slovene cinemas, is based on Alan Turing: The Enigma, a biography by Andrew Hodges. The film revolves around the British mathematician Alan Turing, often described as the father of modern computer science and artificial intelligence. During World War II, he and his team of experts worked to decipher the messages of the German navy’s submarines, which were encrypted with a device called the Enigma machine. Some historians estimate that the discoveries made by Turing and his team shortened the war by at least two years and saved at least 14 million lives. Deciphering the Enigma, and deciding how to use the results strategically, involved a great deal of statistics. Before I present the role of statistics in this case, I will describe the problems that Turing and his team faced.

The Enigma is an encryption device that was first developed for commercial use; in 1926, a modified version was adopted by the German Navy. The following example shows how the device works: if we enter the word UNTERSEEBOOT (submarine) as plaintext (the original message), the Enigma will, with the right settings, produce the ciphertext YHBHHDQOWXNZ. If we change the settings, as the Germans did daily, the same plaintext turns into a different ciphertext, one of an enormous number of possible variants. However, there were certain procedural faults in the way the machine was used, and these worked to the advantage of Polish cryptologists. Between 1932 and 1939 they developed a number of sophisticated methods, based on the theory of permutation groups, for reconstructing the keys by exploiting the indicators (the encrypted message settings sent with each message). They also built a mechanical device called the bomba. However, the bomba lost its efficiency in 1938, when the Germans added two more rotors to the existing three: the three rotors placed in the machine were now chosen from a set of five, which increased the number of possible wheel orders from 6 (3! permutations) to 60 (5!/2! ordered selections of 3 rotors out of 5). The Poles did not have the means to continue developing their device, but fortunately they passed on their knowledge to the British, who continued the work at Bletchley Park.
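To make the mechanism a little more concrete, here is a minimal Python sketch of a simplified three-rotor Enigma. It assumes the standard published wirings of rotors I, II and III and reflector B, and it leaves out the plugboard and ring settings that the real machine also had. The names `encrypt` and `wheel_orders` are our own, purely illustrative. The sketch shows that the same operation both encrypts and decrypts, that no letter is ever encrypted to itself, and how the number of wheel orders grows from 6 to 60 when three rotors are chosen from five.

```python
from itertools import permutations
from string import ascii_uppercase as ALPHABET

# Assumed: published wirings of Enigma I rotors I-III and reflector B,
# each paired with the window letter at which the rotor turns its neighbour over.
ROTORS = {
    "I": ("EKMFLGDQVZNTOWYHXUSPAIBRCJ", "Q"),
    "II": ("AJDKSIRUXBLHWTMCQGZNPYFVOE", "E"),
    "III": ("BDFHJLCPRTXVZNYEIWGAKMUSQO", "V"),
}
REFLECTOR_B = "YRUHQSLDPXNGOKMIEBFZCWVJAT"


def encrypt(text, order=("I", "II", "III"), start="AAA"):
    """Simplified Enigma for capital letters A-Z: no plugboard, ring settings at A.

    `order` lists the rotors from left to right, `start` their window letters.
    Encryption and decryption are the same operation.
    """
    wirings = [ROTORS[name][0] for name in order]
    notches = [ALPHABET.index(ROTORS[name][1]) for name in order]
    pos = [ALPHABET.index(letter) for letter in start]
    result = []
    for ch in text:
        # Rotors step before each letter; the middle rotor "double steps"
        # (taking the left rotor with it) when it sits at its own notch.
        if pos[1] == notches[1]:
            pos[0] = (pos[0] + 1) % 26
            pos[1] = (pos[1] + 1) % 26
        elif pos[2] == notches[2]:
            pos[1] = (pos[1] + 1) % 26
        pos[2] = (pos[2] + 1) % 26

        c = ALPHABET.index(ch)
        for i in (2, 1, 0):  # right to left through the rotors
            c = (ALPHABET.index(wirings[i][(c + pos[i]) % 26]) - pos[i]) % 26
        c = ALPHABET.index(REFLECTOR_B[c])  # the reflector sends the signal back
        for i in (0, 1, 2):  # left to right on the return path
            c = (wirings[i].index(ALPHABET[(c + pos[i]) % 26]) - pos[i]) % 26
        result.append(ALPHABET[c])
    return "".join(result)


def wheel_orders(available, in_machine=3):
    """Number of ordered ways to place `in_machine` rotors chosen from `available`."""
    return sum(1 for _ in permutations(available, in_machine))


print(wheel_orders(["I", "II", "III"]))             # 6  = 3!
print(wheel_orders(["I", "II", "III", "IV", "V"]))  # 60 = 5!/2!
print(encrypt("AAAAA"))                             # BDZGO
```

With rotors I, II, III at AAA, `encrypt("AAAAA")` gives `BDZGO`, a commonly cited test vector, and `encrypt(encrypt("UNTERSEEBOOT"))` returns the original word. Since the article does not give the settings behind its example, this sketch is not expected to reproduce YHBHHDQOWXNZ.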

Alan Turing’s first contribution to cryptology, in 1940, was an improved version of the bomba, known in Britain as the bombe: an electro-mechanical device for finding the Enigma settings. The first one was called Victory (in the film, the name was changed to Christopher, after Turing’s first love). Unlike the Poles, whose methods depended on weaknesses in the way the indicators were sent, Turing and his colleagues developed the “crib-based method”, which relied on known, frequently recurring plaintexts. For example, many German messages began with a weather forecast, so the codebreakers looked for the encrypted form of the word WETTER (weather) in intercepted messages and used it to work out the settings. In the film, this role is played by the phrase “Heil Hitler”. It is important to mention that there was not just one bombe: a second was built five months after the first, and by the end of the war there were over two hundred of them. Operating the machines required a large workforce at Bletchley Park; to find out more about their work, I recommend watching the interview with Jean Valentine, one of the bombe operators. You can learn more about the lives of the “code crackers” on the Google Cultural Institute’s website dedicated to their stories.
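One property that made cribs such as WETTER practical to place is that the Enigma never encrypts a letter to itself (a consequence of the reflector). The sketch below uses a hypothetical helper, `possible_crib_positions`, and a made-up ciphertext to show how this property alone rules out alignments of a crib against an intercept. It illustrates the idea only; it is not a reconstruction of the bombe.

```python
def possible_crib_positions(ciphertext, crib):
    """Offsets at which `crib` could be the plaintext behind `ciphertext`.

    An Enigma never encrypts a letter to itself, so an alignment is ruled out
    as soon as any crib letter equals the ciphertext letter beneath it.
    """
    positions = []
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        if all(p != c for p, c in zip(crib, window)):
            positions.append(offset)
    return positions


# Made-up intercept, used only to illustrate the elimination.
print(possible_crib_positions("ABCDEFG", "WETTER"))  # [1]
```

Offset 0 is excluded because the second E of WETTER would sit above a ciphertext E, which the machine cannot produce; only offset 1 survives.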

The efficiency of the bomba depended largely on statistics: Turing used statistical tests to reduce the number of possible settings that had to be checked on the device. He also developed a cryptanalytic procedure called Banburismus, which estimates the probability of particular Enigma settings by sequentially accumulating conditional probabilities. Turing wrote two scientific papers about his methods, which were kept secret by the British government until 2012. In his paper On Statistics of Repetitions he devised a statistical test to determine whether two ciphertexts were encrypted with the same key, and in his paper The Applications of Probability to Cryptography he demonstrated how probability analysis could be applied to a wide range of cryptanalytic problems.
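Turing measured evidence in bans and decibans (tenths of a ban), that is, base-10 logarithms of odds, so that independent pieces of evidence could simply be added. The sketch below applies that idea to the question of whether two aligned ciphertexts share a key. It assumes repeat rates of about 1/17 for messages on the same key and 1/26 for unrelated ones, and it uses hypothetical thresholds for a sequential decision. Treat it as an illustration of sequential weight-of-evidence scoring, not as Turing’s actual procedure.

```python
import math

# Assumed repeat rates: about 1/17 for two messages enciphered on the same key
# ("in depth"), 1/26 for unrelated ciphertexts. Illustrative values only.
P_SAME = 1 / 17
P_RANDOM = 1 / 26


def letter_weights(p_same=P_SAME, p_random=P_RANDOM):
    """Decibans contributed by one coincidence and by one non-coincidence."""
    hit = 10 * math.log10(p_same / p_random)
    miss = 10 * math.log10((1 - p_same) / (1 - p_random))
    return hit, miss


def weight_of_evidence(a, b):
    """Total decibans in favour of 'same key' for two aligned ciphertexts."""
    hit, miss = letter_weights()
    return sum(hit if x == y else miss for x, y in zip(a, b))


def sequential_decision(a, b, accept=20.0, reject=-20.0):
    """Read letter pairs one by one and stop once the evidence crosses a
    (hypothetical) threshold; returns the verdict and the letters examined."""
    hit, miss = letter_weights()
    score = 0.0
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        score += hit if x == y else miss
        if score >= accept:
            return "same key", examined
        if score <= reject:
            return "different keys", examined
    return "undecided", examined


print(round(letter_weights()[0], 2))           # 1.85 decibans per repeat
print(sequential_decision("A" * 20, "A" * 20))  # ('same key', 11)
```

Each coincidence adds about 1.85 decibans and each non-coincidence subtracts about 0.09, so a run of repeats reaches the assumed 20-deciban threshold after 11 letters. Stopping as soon as the evidence is strong enough is what makes the procedure sequential.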

Once Turing and his colleagues had cracked the code, they clearly could not act on every piece of information they gathered, as the Germans would quickly notice and change their encryption procedures. Turing helped the military decide which deciphered information to use by applying his statistical methods. The principle is easy to explain. Imagine a German officer observing events in a naval theatre of war. The unit of observation is a submarine, and the variable is whether or not the submarine was sunk. He also knows when each submarine’s position was reported to headquarters. He can calculate the probability that a submarine would be sunk by chance and compare the expected number of sinkings with the observed data. If the values match, he can assume the losses were due to chance. However, if there is a large deviation from the expected values (for example, if several submarines were sunk shortly after their positions were sent to headquarters), the officer may well suspect that something is wrong. Turing’s task was to make the pattern of attacks look as random as possible to such an observer. This was done in one of two ways: either they waited a few days before attacking (hoping that the submarines would not move very far in the meantime), or they did not attack at all (the more sustainable strategy). You can learn more about this in Neal Stephenson’s novel Cryptonomicon.
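The officer’s reasoning can be framed as a one-sided binomial test. Suppose each submarine that reports its position has some baseline chance p of being sunk in the following days; how likely is it to see k or more sinkings among n submarines by chance alone? The framing and all numbers below are hypothetical, and the helper names are our own. The article does not say which test, if any, either side actually used.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    sinkings among n reported submarines if each is sunk with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def looks_suspicious(k, n, p, alpha=0.01):
    """Flag the observation if chance alone would explain it less than alpha of the time."""
    return prob_at_least(k, n, p) < alpha


# Hypothetical numbers: 20 submarines reported their positions, and each has
# a 5% baseline chance of being sunk within the next few days.
print(looks_suspicious(1, 20, 0.05))  # False: one loss is consistent with chance
print(looks_suspicious(6, 20, 0.05))  # True: six losses would be very unlikely by chance
```

Delaying attacks, or skipping some altogether, keeps the observed number of sinkings close to what chance would predict, so a test like this one would not raise the alarm.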

The story of Alan Turing and the cracking of the Enigma is certainly one of the most inspiring stories in the field of mathematics and statistics.

The British mathematician and public speaker James Grime drew on this story in his Enigma project, for which he visited British schools, presented Turing’s work and gave students demonstrations of how the device worked. As he says in an interview, he believes it is very important to get young people excited about science. Statistics is one of the most practical branches of mathematics, which is why it is important to explain it clearly and concisely. Grime emphasises that a knowledge of statistics can be useful to just about anyone, as it helps us better understand everyday phenomena.

Grime, too, watched The Imitation Game and loved it, though he spotted many inaccuracies even in the film’s trailer. But all is forgiven, as the film will encourage viewers to learn more about the subject. As an expert on Turing’s work, he wrote a humorous post after watching the film, in which he answers frequently asked questions about it. Despite the clichés and historical inaccuracies, I definitely recommend watching this film. To me, it is an example of statistics in modern pop culture, a topic we have addressed on our blog Udomačena statistika (Domesticated Statistics).


Author: Ana Slavec, sociologist, PhD student in Statistics at the University of Ljubljana, and researcher at the Faculty of Social Sciences in Ljubljana. She works on survey methodology, especially on improving the wording of survey questions. She writes for the blog Udomačena statistika (Domesticated Statistics). You can also find her on Twitter: @aslavec.


Title photo: BagoGames via Flickr

Note: The article was first published on the blog Udomačena statistika (Domesticated Statistics).


Translated by: Ernest Alilović.
