Yule–Simon distribution
Probability mass function
Plot of the Yule–Simon PMFYule–Simon PMF on a log-log scale. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.) |
|
Cumulative distribution function
Plot of the Yule–Simon CMFYule–Simon CMF. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.) |
|
Parameters | shape (real) |
---|---|
Support | |
pmf | |
CDF | |
Mean | for |
Mode | |
Variance | for |
Skewness | for |
Ex. kurtosis | for |
MGF | |
CF |
In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.[1]
The probability mass function (pmf) of the Yule–Simon (ρ) distribution is
- ,
for integer and real , where is the beta function. Equivalently the pmf can be written in terms of the falling factorial as
- ,
where is the gamma function. Thus, if is an integer,
- .
The parameter can be estimated using a fixed point algorithm.[2]
The probability mass function f has the property that for sufficiently large k we have
- .
This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: can be used to model, for example, the relative frequency of the th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of .
Occurrence
The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa.[3] Simon dubbed this process the "Yule process" but it is more commonly known today as a preferential attachment process.[citation needed] The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number the urn already contains.
The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of random variable having an exponential distribution.[citation needed] Specifically, assume that follows an exponential distribution with scale or rate :
- ,
with density
- .
Then a Yule–Simon distributed variable K has the following geometric distribution conditional on W:
The pmf of a geometric distribution is
for . The Yule–Simon pmf is then the following exponential-geometric compound distribution:
- .
The following recurrence relation holds:
Generalizations
The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as
with . For the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
Bibliography
- Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)