Relative Frequency of Letters and Words


ITB (Institut Teknologi Bandung) was the first university where I studied formally. I joined the campus in August 2006 and graduated in October 2010, studying in the Department of Mathematics. My bachelor's thesis was simply about counting relative frequencies. Its title is “Penentuan Frekuensi Relatif Huruf dan Kata dalam Bahasa Indonesia untuk Membantu Memecahkan Kalimat Tersandikan”, which translates as “The Relative Frequency of Letters and Words in Bahasa Indonesia to Assist in Breaking an Encrypted Sentence”. What follows is taken from the abstract; the original thesis was written in Bahasa Indonesia.


Everyone needs information today, and everyone wants a good information system. This is why the science of cryptography was invented. Cryptography is the study of how to hide and secure information. On the other hand, a separate field studies how to break hidden information. This branch of study is called cryptanalysis.

Classical cryptography works on text written in a particular language, which is one of the reasons we need statistical information about that language. For widely used international languages such as English, these statistics are easy to find. Unfortunately, that is not the case for Bahasa Indonesia. This thesis discusses finding the relative frequency of occurrence of monograms, digrams, trigrams, and letters of the alphabet in Bahasa Indonesia to help with classical cryptography.

The result of my thesis was statistical data on monograms, digrams, trigrams, and words in Bahasa Indonesia. Below are the top five of each.

  • Monogram: a, n, e, i, t
  • Digram: an, ng, er, ka, en
  • Trigram: ang, nya, men, kan, eng
  • Word: yang, dan, di, itu, dia
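A tally of this kind can be sketched in a few lines of Python. This is only an illustration, not the program used for the thesis: the function names are made up for this post, and it assumes that n-grams are counted inside words only (never across spaces) and that only the letters a-z matter. The thesis may have made different choices.

```python
from collections import Counter
import re


def ngram_frequencies(text, n):
    """Relative frequency of letter n-grams, counted inside words only (an assumption)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word[i:i + n]
        for word in words
        for i in range(len(word) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def word_frequencies(text):
    """Relative frequency of whole words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def top(freqs, k=5):
    """The k most frequent items; ties are broken alphabetically."""
    return sorted(freqs, key=lambda item: (-freqs[item], item))[:k]


# A tiny made-up sample; a real study needs a far larger corpus.
sample = "dia makan nasi dan dia minum air yang dingin"
print(top(ngram_frequencies(sample, 1)))
print(top(ngram_frequencies(sample, 2)))
print(top(word_frequencies(sample)))
```

With n set to 1, 2, or 3 the same function gives monogram, digram, and trigram frequencies.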


The data comes from public reading material, such as newspapers, novels, and other texts commonly read by Indonesian people. I assume that a random sample of about 4 million letters is enough to give a general picture of relative frequency in Bahasa Indonesia. The data can be used in other methods that rely on relative frequency in a specific language, such as classical cryptanalysis.
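To give a feel for how such statistics help in classical cryptanalysis, here is a minimal sketch that guesses the key of a Caesar (shift) cipher. The Caesar cipher is my choice for illustration and is not taken from the thesis. The sketch assumes that the most frequent ciphertext letter stands for "a", the top monogram in the list above. The function names and the sample sentence are made up, and on short texts the guess can easily be wrong.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def shift_text(text, shift):
    """Shift every lowercase letter by `shift` places; leave other characters alone."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


def guess_shift(ciphertext, most_common_plain="a"):
    """Guess the shift, assuming the top ciphertext letter stands for `most_common_plain`."""
    letters = [ch for ch in ciphertext if ch in ALPHABET]
    if not letters:
        return 0
    top_cipher = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(top_cipher) - ALPHABET.index(most_common_plain)) % 26


# Made-up sample sentence in which "a" is clearly the most frequent letter.
plain = "saya makan nasi dan kacang di rumah"
cipher = shift_text(plain, 3)
key = guess_shift(cipher)
recovered = shift_text(cipher, -key)
print(cipher)
print(key, recovered)
```

Digram, trigram, and word frequencies play the same role for harder ciphers, where a single letter count is not enough to pick the right key.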


Actually, I am not really good at mathematics. I am not a true mathematician, and I do not even deserve to be called one. But I am thankful for the chance to learn mathematics at this institute. It changed so many things. It shaped my mind to see that one of the most important things is reasoning. I have to be sure I know the reason behind everything I do, and that gives meaning to everything I do.

Mathematics is both art and science, at the core of everything and yet so beautiful school of imagination, tenacity, and rigor. Enjoy! -Cédric Villani

*Cédric Villani is a 2010 Fields Medalist. He came to ITB and shared this quote during his visit on 22-24 Oct 2011, a year after I graduated.
