Zipfian Distribution: Death and Diseases

By Anushka Bansal

Zipf’s law is a discrete counterpart of the continuous Pareto distribution, from which we get the Pareto principle (named after economist Vilfredo Pareto, who formulated it in 1906). The Pareto principle states that roughly 20% of the causes account for roughly 80% of the results.

When linguist George Zipf ranked words in order of how often they are used, a striking pattern emerged. The top-ranked word was used about twice as often as the second-ranked word, and about three times as often as the third. He called this a rank vs. frequency rule, and found that it could also be used to describe income distributions in a given country, with the richest person making roughly twice as much money as the next richest, and so forth. Later dubbed Zipf’s law, the rank vs. frequency rule also works if you apply it to the sizes of cities. The city with the largest population in a country is generally about twice as large as the next-biggest, and so on. Remarkably, Zipf’s law for cities has been reported to hold, at least approximately, in countries around the world for the past century.
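To make the rank vs. frequency rule concrete, here is a minimal Python sketch of what an idealized Zipfian series looks like. The function name `zipf_expected` and the starting value of 1200 are made up for illustration; real data only follow this pattern approximately.

```python
def zipf_expected(top_value, n_ranks):
    """Ideal Zipf series: the item at rank r gets top_value / r."""
    return [top_value / rank for rank in range(1, n_ranks + 1)]

# Hypothetical example: the top-ranked item occurs 1200 times.
for rank, value in enumerate(zipf_expected(1200, 4), start=1):
    print(rank, value)
# prints:
# 1 1200.0
# 2 600.0
# 3 400.0
# 4 300.0
```

The second item is half the first, the third is a third of the first, and so on, which is exactly the pattern described above.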

Presented below is a case study that explores the Zipfian distribution through the lens of diseases and death. It is fascinating that something as complex and as natural as death can follow a pattern and accord with a theory based on mere observation (data source: WHO).

Name of Disease [2]                      Deaths in 2015 (in millions)
Ischaemic heart disease                  8.76
Stroke                                   6.24
Lower respiratory infections             3.19
Chronic obstructive pulmonary disease    3.17
Lung cancer                              1.69
Diabetes mellitus                        1.59
Alzheimer’s disease                      1.54
Diarrhoeal diseases                      1.39
Tuberculosis                             1.37
Road injury                              1.34

The above data (source: WHO) can be plotted as a line graph of deaths against rank in order to look for a functional relationship which, in scientific terms, is known as a power law. Under a power law, a relative change in one quantity results in a proportional relative change in the other, so the points fall on an approximately straight line when both axes are logarithmic, and the slope of that line gives the exponent. There may, however, be noticeable deviations and irregularities in the graph.
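As a rough illustration of this power-law check, the Python sketch below fits a straight line to log(deaths) against log(rank) for the ten values in the table. It assumes the ranks are simply 1 to 10 in table order and uses an ordinary least-squares fit; the helper name `loglog_slope` is made up for this example, and ten data points are far too few for a rigorous test.

```python
import math

# Deaths in 2015 (millions), in rank order, copied from the table above.
deaths = [8.76, 6.24, 3.19, 3.17, 1.69, 1.59, 1.54, 1.39, 1.37, 1.34]

def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

slope = loglog_slope(deaths)
print(f"Estimated power-law exponent: {slope:.2f}")
```

An exponent close to -1 would be consistent with Zipf’s law in its classic form; a value further from -1 would still indicate a power law, but with a different rate of decay.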

The diseases mentioned above have many causes, which vary in different corners of the world, and since the data above has been recorded globally, considerable variation is natural. For instance, cancer is caused by an uncontrolled division of abnormal cells in a part of the body, and lung cancer is associated with smoking. The prevalence and intensity of smoking in one corner of the world may be very different from that in another, so people may die at a faster rate in more hazardous environments. A stroke, or cerebrovascular accident, is damage to the brain from interruption of its blood supply. It can result from several factors, which cannot all be pinned down in global data, and this helps explain the irregularity of the results.
