Discrete Statistical Distributions#
Discrete random variables take on only a countable number of values. The commonly used distributions are included in SciPy and described in this document. Each discrete distribution can take one extra integer parameter: L. The relationship between the general distribution p and the standard distribution p_{0} is
p\left(x\right)=p_{0}\left(x-L\right)
which allows for shifting of the input. When a distribution generator is initialized, the discrete distribution can either specify the beginning and ending (integer) values a and b, which must be such that
p_{0}\left(x\right)=0\quad x<a\textrm{ or }x>b
in which case it is assumed that the PMF is specified on the integers a+mk\leq b, where k is a non-negative integer (0,1,2,\ldots) and m is a positive integer multiplier. Alternatively, the two lists x_{k} and p\left(x_{k}\right) can be provided directly, in which case a dictionary is set up internally to evaluate probabilities and generate random variates.
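As a minimal sketch of the two ideas above, the following uses the `loc` keyword of `scipy.stats` distributions for the integer shift and `scipy.stats.rv_discrete` with `values=(xk, pk)` for a distribution given by two lists. The Poisson rate, the shift, and the support points and probabilities are illustrative values, not taken from this document.

```python
import numpy as np
from scipy import stats

# Shifting: the general PMF at x equals the standard PMF at x - L.
mu, L = 3.0, 2                      # illustrative Poisson rate and integer shift
x = np.arange(0, 12)
shifted = stats.poisson.pmf(x, mu, loc=L)
standard = stats.poisson.pmf(x - L, mu)

# A custom distribution built from the two lists x_k and p(x_k).
xk = [1, 2, 4, 7]                   # hypothetical support points
pk = [0.1, 0.2, 0.3, 0.4]           # hypothetical probabilities, summing to 1
custom = stats.rv_discrete(name="custom", values=(xk, pk))
custom_pmf = custom.pmf(xk)         # evaluates p(x_k) at the support points
sample = custom.rvs(size=5, random_state=0)  # random variates drawn from xk
```

Here `shifted` and `standard` agree elementwise, and every value in `sample` is one of the entries of `xk`.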
Probability Mass Function (PMF)#
The probability mass function of a random variable X is defined as the probability that the random variable takes on a particular value.
p\left(x_{k}\right)=P\left[X=x_{k}\right]
This is also sometimes called the probability density function, although technically
f\left(x\right)=\sum_{k}p\left(x_{k}\right)\delta\left(x-x_{k}\right)
is the probability density function for a discrete distribution 1 .
- 1
Note that we will be using p to represent both the probability mass function and a parameter (a probability). The usage should be obvious from context.
Cumulative Distribution Function (CDF)#
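The cumulative distribution function is
F\left(x\right)=P\left[X\leq x\right]=\sum_{x_{k}\leq x}p\left(x_{k}\right)
and is also useful to be able to compute. Note that
F\left(x_{k}\right)-F\left(x_{k-1}\right)=p\left(x_{k}\right)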
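The relationship between the CDF and the PMF can be checked numerically. This is a small sketch using `scipy.stats.binom` with illustrative parameters: the CDF is the running sum of the PMF over the support, and successive differences of the CDF recover the PMF.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3                      # illustrative binomial parameters
xk = np.arange(0, n + 1)            # support 0, 1, ..., n
pmf = stats.binom.pmf(xk, n, p)
cdf = stats.binom.cdf(xk, n, p)

# F(x_k) is the running sum of p(x_j) over all x_j <= x_k.
cdf_from_pmf = np.cumsum(pmf)

# Successive differences of the CDF recover the PMF
# (the value before the first support point is taken as 0).
pmf_from_cdf = np.diff(cdf, prepend=0.0)
```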
Survival Function#
The survival function is just
S\left(k\right)=1-F\left(k\right)=P\left[X>k\right]
the probability that the random variable is strictly larger than k.
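The strict inequality matters for discrete distributions, because P[X > k] and P[X >= k] differ by the mass at k. A short sketch with `scipy.stats.binom` and illustrative parameters:

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3                      # illustrative binomial parameters
k = 4
dist = stats.binom(n, p)

sf_k = dist.sf(k)                                   # P[X > k]
one_minus_cdf = 1 - dist.cdf(k)                     # 1 - F(k)
tail_sum = dist.pmf(np.arange(k + 1, n + 1)).sum()  # p(k+1) + ... + p(n)

# Adding the mass at k gives the non-strict tail probability P[X >= k].
p_at_least_k = sf_k + dist.pmf(k)
```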
Percent Point Function (Inverse CDF)#
The percent point function is the inverse of the cumulative distribution function and is
G\left(q\right)=F^{-1}\left(q\right)
For discrete distributions, this must be modified for cases where there is no x_{k} such that F\left(x_{k}\right)=q. In these cases we choose G\left(q\right) to be the smallest value x_{k}=G\left(q\right) for which F\left(x_{k}\right)\geq q. If q=0 then we define G\left(0\right)=a-1. This definition allows random variates to be generated in the same way as for continuous random variables, by applying the inverse CDF to samples from a uniform distribution.
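The "smallest x_{k} with F(x_{k}) >= q" rule can be sketched as a search over the tabulated CDF and compared with the `ppf` method. The helper `ppf_by_search` is a hypothetical name introduced only for this illustration, and the binomial parameters are illustrative.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3                      # illustrative binomial parameters
dist = stats.binom(n, p)
xk = np.arange(0, n + 1)
cdf = dist.cdf(xk)

def ppf_by_search(q):
    """Smallest support point x_k with F(x_k) >= q, for 0 < q < 1."""
    # side="left" gives the first index i with cdf[i] >= q.
    return xk[np.searchsorted(cdf, q, side="left")]

qs = [0.05, 0.5, 0.95]
by_search = [ppf_by_search(q) for q in qs]
by_scipy = dist.ppf(qs)

# The q = 0 convention: G(0) = a - 1, where a = 0 is the start of the support.
lower_edge = dist.ppf(0)

# Inverse-transform sampling: apply G to uniform draws on (0, 1).
rng = np.random.default_rng(0)
samples = dist.ppf(rng.uniform(size=1000))
```

Because G is a step function, many different values of q map to the same x_{k}; the share of uniform draws landing on each x_{k} approximates p(x_{k}).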
Inverse survival function#
The inverse survival function is the inverse of the survival function
Z\left(\alpha\right)=S^{-1}\left(\alpha\right)=G\left(1-\alpha\right)
and is thus the smallest non-negative integer k for which F\left(k\right)\geq1-\alpha or the smallest non-negative integer k for which S\left(k\right)\leq\alpha.
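A brief check of this characterization, using `scipy.stats.poisson` with an illustrative rate and tail probability: the value returned by `isf` matches `ppf(1 - alpha)`, and it is the first integer at which the survival function drops to alpha or below.

```python
from scipy import stats

dist = stats.poisson(4.0)           # illustrative Poisson rate
alpha = 0.1                         # illustrative tail probability

k_isf = dist.isf(alpha)
k_ppf = dist.ppf(1 - alpha)

# S(k) <= alpha at the returned k, but not one step earlier.
sf_at_k = dist.sf(k_isf)
sf_before_k = dist.sf(k_isf - 1)
```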
Hazard functions#
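If desired, the hazard function and the cumulative hazard function could be defined as
h\left(x_{k}\right)=\frac{p\left(x_{k}\right)}{1-F\left(x_{k}\right)}
and
H\left(x\right)=\sum_{x_{k}\leq x}h\left(x_{k}\right)=\sum_{x_{k}\leq x}\frac{F\left(x_{k}\right)-F\left(x_{k-1}\right)}{1-F\left(x_{k}\right)}.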
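As a sketch, assuming the hazard is taken as p(x_{k}) / (1 - F(x_{k})) and the cumulative hazard as its running sum, a geometric distribution on 1, 2, ... gives a constant hazard. The success probability `theta` is an illustrative value, and the PMF and CDF are written out by hand rather than taken from a library.

```python
import itertools
import math

theta = 0.25                        # illustrative success probability
ks = range(1, 21)                   # first 20 support points of a geometric law

pmf = [theta * (1 - theta) ** (k - 1) for k in ks]
cdf = [1 - (1 - theta) ** k for k in ks]

# Hazard at each support point: mass at x_k divided by the remaining tail.
hazard = [p_k / (1 - F_k) for p_k, F_k in zip(pmf, cdf)]

# Cumulative hazard: running sum of the hazard over the support.
cum_hazard = list(itertools.accumulate(hazard))
```

Under this definition each entry of `hazard` equals theta / (1 - theta), so `cum_hazard` grows linearly. For a distribution with a finite upper bound b, F(b) = 1 and the ratio is undefined at the last support point.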
Moments#
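Non-central moments are defined using the PMF
\mu_{m}^{\prime}=E\left[X^{m}\right]=\sum_{k}x_{k}^{m}p\left(x_{k}\right).
Central moments are computed similarly, with \mu=\mu_{1}^{\prime}:
\mu_{m}=E\left[\left(X-\mu\right)^{m}\right]=\sum_{k}\left(x_{k}-\mu\right)^{m}p\left(x_{k}\right)=\sum_{k=0}^{m}\left(-1\right)^{m-k}\left(\begin{array}{c} m\\ k\end{array}\right)\mu^{m-k}\mu_{k}^{\prime}
The mean is the first moment
\mu=\mu_{1}^{\prime}=E\left[X\right]=\sum_{k}x_{k}p\left(x_{k}\right)
the variance is the second central moment
\mu_{2}=E\left[\left(X-\mu\right)^{2}\right]=\sum_{x_{k}}x_{k}^{2}p\left(x_{k}\right)-\mu^{2}.
Skewness is defined as
\gamma_{1}=\frac{\mu_{3}}{\mu_{2}^{3/2}}
while (Fisher) kurtosis is
\gamma_{2}=\frac{\mu_{4}}{\mu_{2}^{2}}-3,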
so that a normal distribution has a kurtosis of zero.
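These moment definitions translate directly into sums over the support. The sketch below computes them for a binomial PMF written out with `math.comb`; the parameters are illustrative, and the helper names `raw_moment`, `central_moment`, and `central_from_raw` are hypothetical names used only here.

```python
import math

n, p = 12, 0.3                      # illustrative binomial parameters
q = 1 - p
xk = range(n + 1)
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in xk]

def raw_moment(m):
    """Non-central moment: sum of x_k**m * p(x_k)."""
    return sum(x**m * w for x, w in zip(xk, pmf))

def central_moment(m):
    """Central moment: sum of (x_k - mu)**m * p(x_k)."""
    mu = raw_moment(1)
    return sum((x - mu) ** m * w for x, w in zip(xk, pmf))

def central_from_raw(m):
    """Central moment via the binomial expansion in raw moments."""
    mu = raw_moment(1)
    return sum((-1) ** (m - k) * math.comb(m, k) * mu ** (m - k) * raw_moment(k)
               for k in range(m + 1))

mean = raw_moment(1)
variance = central_moment(2)
skewness = central_moment(3) / variance**1.5
kurtosis = central_moment(4) / variance**2 - 3   # Fisher (excess) kurtosis
```

For the binomial distribution these sums can be compared with the closed forms np, npq, (q - p) / sqrt(npq), and (1 - 6pq) / (npq).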
Moment generating function#
The moment generating function is defined as
M_{X}\left(t\right)=E\left[e^{Xt}\right]=\sum_{x_{k}}e^{x_{k}t}p\left(x_{k}\right)
Moments are found as the derivatives of the moment generating function evaluated at 0.
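A numerical sketch of this statement: evaluate the moment generating function as a sum over the support and approximate its derivatives at 0 with central finite differences. The binomial parameters and the step size `h` are illustrative, and finite differences stand in here for exact differentiation.

```python
import math

n, p = 12, 0.3                      # illustrative binomial parameters
q = 1 - p
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

def mgf(t):
    """M_X(t) as the sum of exp(x_k * t) * p(x_k) over the support."""
    return sum(math.exp(k * t) * w for k, w in enumerate(pmf))

h = 1e-3                            # finite-difference step (illustrative)
first_derivative = (mgf(h) - mgf(-h)) / (2 * h)                  # ~ E[X]
second_derivative = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2     # ~ E[X^2]

# Closed form of the binomial MGF, for comparison at an arbitrary t.
t0 = 0.2
closed_form = (q + p * math.exp(t0)) ** n
```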
Fitting data#
To fit data to a distribution, maximizing the likelihood function is common. Alternatively, some distributions have well-known minimum variance unbiased estimators. These will be chosen by default, but the likelihood function will always be available for minimizing.
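If f_{i}\left(k;\boldsymbol{\theta}\right) is the PMF of a random variable, where \boldsymbol{\theta} is a vector of parameters ( e.g. L and S ), then for a collection of N independent samples from this distribution, the joint distribution of the random vector \mathbf{k} is
f\left(\mathbf{k};\boldsymbol{\theta}\right)=\prod_{i=1}^{N}f_{i}\left(k_{i};\boldsymbol{\theta}\right).
The maximum likelihood estimates of the parameters \boldsymbol{\theta} are the parameters which maximize this function with \mathbf{k} fixed and given by the data:
\hat{\boldsymbol{\theta}}=\arg\max_{\boldsymbol{\theta}}f\left(\mathbf{k};\boldsymbol{\theta}\right)=\arg\min_{\boldsymbol{\theta}}l_{\mathbf{k}}\left(\boldsymbol{\theta}\right).
Where
l_{\mathbf{k}}\left(\boldsymbol{\theta}\right)=-\sum_{i=1}^{N}\log f\left(k_{i};\boldsymbol{\theta}\right)=-N\overline{\log f\left(k_{i};\boldsymbol{\theta}\right)}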
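To make the procedure concrete, here is a minimal sketch that minimizes the negative log-likelihood of a Poisson model over its single rate parameter. The data are made-up counts, the helper names are hypothetical, and a plain ternary search is used in place of a library optimizer; it is not meant to represent how SciPy fits distributions.

```python
import math

data = [2, 4, 3, 0, 5, 3, 1, 4, 2, 6]   # hypothetical observed counts
N = len(data)

def log_pmf(k, mu):
    """Log of the Poisson PMF: k*log(mu) - mu - log(k!)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def neg_log_likelihood(mu):
    """Negative sum of log f(k_i; mu) over the sample."""
    return -sum(log_pmf(k, mu) for k in data)

def ternary_search_min(f, lo, hi, iters=200):
    """Minimize a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

mu_hat = ternary_search_min(neg_log_likelihood, 1e-6, max(data) + 1.0)
sample_mean = sum(data) / N

# The same objective written as -N times the average log-PMF.
avg_log_f = sum(log_pmf(k, mu_hat) for k in data) / N
```

For the Poisson model the minimizer agrees with the sample mean, which is the known closed-form maximum likelihood estimate of the rate.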
Standard notation for mean#
We will use
\overline{y\left(\mathbf{x}\right)}=\frac{1}{N}\sum_{i=1}^{N}y\left(x_{i}\right)
where N should be clear from context.
Combinations#
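Note that
k!=k\cdot\left(k-1\right)\cdot\left(k-2\right)\cdot\cdots\cdot1=\Gamma\left(k+1\right)
and has special cases of
0!=\Gamma\left(1\right)=1
and
\left(\begin{array}{c} n\\ k\end{array}\right)=\frac{n!}{\left(n-k\right)!k!}.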
If n<0 or k<0 or k>n we define \left(\begin{array}{c} n\\ k\end{array}\right)=0
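The conventions above can be written as a small helper. `binom_coeff` is a hypothetical name for this illustration; it applies the zero convention explicitly and otherwise uses the factorial formula. Python's own `math.comb` already returns 0 for k > n but raises `ValueError` for negative arguments, so the explicit check is what extends it to the full convention.

```python
import math

def binom_coeff(n, k):
    """n choose k, defined as 0 when n < 0, k < 0, or k > n."""
    if n < 0 or k < 0 or k > n:
        return 0
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

five_choose_two = binom_coeff(5, 2)
out_of_range = [binom_coeff(3, 5), binom_coeff(-1, 0), binom_coeff(4, -1)]

# k! agrees with the gamma function evaluated at k + 1.
factorial_via_gamma = math.gamma(5 + 1)
```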
Discrete Distributions in scipy.stats#
- Bernoulli Distribution
- Beta-Binomial Distribution
- Binomial Distribution
- Boltzmann (truncated Planck) Distribution
- Planck (discrete exponential) Distribution
- Poisson Distribution
- Geometric Distribution
- Negative Binomial Distribution
- Hypergeometric Distribution
- Fisher’s Noncentral Hypergeometric Distribution
- Wallenius’ Noncentral Hypergeometric Distribution
- Negative Hypergeometric Distribution
- Zipf (Zeta) Distribution
- Zipfian Distribution
- Logarithmic (Log-Series, Series) Distribution
- Discrete Uniform (randint) Distribution
- Discrete Laplacian Distribution
- Yule-Simon Distribution