A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state, which includes a seed that may itself be truly random. Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom numbers are important in practice for their speed in number generation and their reproducibility.
PRNGs are central in applications such as simulations (e.g. of physical systems via the Monte Carlo method), in procedural generation, and in cryptography. Cryptographic applications require the output to also be unpredictable, and more elaborate algorithms, which do not inherit the linearity of simpler solutions, are needed.
Common classes of PRNG algorithms include linear congruential generators, lagged Fibonacci generators, and linear feedback shift registers. More recent PRNGs include the Mersenne Twister, which has strong statistical properties but is not cryptographically secure, and generators whose security rests on computational hardness assumptions or on cryptographic primitives, such as Blum Blum Shub and Fortuna.
Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently "random" to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, and joked that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." Robert R. Coveyou of Oak Ridge National Laboratory once titled an article "Random number generation is too important to be left to chance", which was cited in Ivars Peterson's book The Jungles of Randomness.
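Given a probability distribution $P$ on $(\mathbb{R}, \mathfrak{B})$, where $\mathfrak{B}$ is the collection of Borel sets on the real line, a non-empty collection of Borel sets $\mathfrak{F} \subseteq \mathfrak{B}$ (for example $\mathfrak{F} = \{(-\infty, t] : t \in \mathbb{R}\}$), and a non-empty set $A \subseteq \mathbb{R}$, we call a function $f : \mathbb{N}_1 \to \mathbb{R}$ (where $\mathbb{N}_1 = \{1, 2, 3, \dots\}$ is the set of positive integers) a pseudo-random number generator for $P$ given $\mathfrak{F}$ taking values in $A$ iff
$f(\mathbb{N}_1) \subseteq A$, and
$\forall E \in \mathfrak{F} \;\; \forall \varepsilon > 0 \;\; \exists N \in \mathbb{N}_1 \;\; \forall n \ge N : \left| \frac{\#\{ i \in \{1, \dots, n\} : f(i) \in E \}}{n} - P(E) \right| < \varepsilon$.
($\#S$ denotes the number of elements in the finite set $S$.)
It can be shown that if $f$ is a pseudo-random number generator for the uniform distribution on $(0, 1)$ and if $F$ is the CDF of some given probability distribution $P$, then $F^* \circ f$ is a pseudo-random number generator for $P$, where $F^* : (0, 1) \to \mathbb{R}$ is the percentile of $P$, i.e. $F^*(x) := \inf\{ t \in \mathbb{R} : x \le F(t) \}$. Intuitively, an arbitrary distribution can be simulated from a simulation of the standard uniform distribution.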
A PRNG can be started from an arbitrary starting state using a seed state. It will always produce the same sequence thereafter when initialized with that state. The period of a PRNG is defined as the maximum over all starting states of the length of the repetition-free prefix of the sequence. The period is bounded by the number of distinct states, which is determined by the size of the state measured in bits. However, since the maximum possible period doubles with each bit of state added, it is easy to build PRNGs with periods long enough for many practical applications.
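Both properties, reproducibility from a seed and a period bounded by the number of states, can be seen in a deliberately tiny generator. The sketch below assumes a toy linear congruential generator with hypothetical parameters `a = 5`, `c = 3`, `m = 16` (chosen for illustration, not taken from any real library), so the state fits in 4 bits and the period cannot exceed 16.

```python
from itertools import islice


def lcg(seed, a=5, c=3, m=16):
    """Yield successive states of a toy linear congruential generator.

    The parameters are illustrative assumptions; the whole state is one
    integer in the range 0..m-1.
    """
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def cycle_length(seed, m=16):
    """Count the steps until the toy generator returns to its starting state."""
    start = seed % m
    steps = 0
    for state in lcg(seed, m=m):
        steps += 1
        if state == start:
            return steps


# The same seed always reproduces the same sequence.
first_run = list(islice(lcg(7), 5))
second_run = list(islice(lcg(7), 5))
print(first_run == second_run)  # True

# With a 4-bit state there are only 16 states, so the cycle is at most 16 long.
print(cycle_length(0))  # 16
```

With these particular parameters the generator visits all 16 states before repeating; other parameter choices can give much shorter cycles.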
If a PRNG's internal state contains $n$ bits, its period can be no longer than $2^n$ results, and may be much shorter. For some PRNGs the period length can be calculated without walking through the whole period. Linear feedback shift registers (LFSRs) are usually chosen to have periods of exactly $2^n - 1$. Linear congruential generators have periods that can be calculated by factoring. Mixes (no restrictions) have periods of about $2^{n/2}$ on average, usually after walking through a nonrepeating starting sequence. Mixes that are reversible (permutations) have periods of about $2^{n-1}$ on average, and the period will always include the original internal state. Although PRNGs will repeat their results after they reach the end of their period, a repeated result does not imply that the end of the period has been reached, since its internal state may be larger than its output; this is particularly obvious with PRNGs with a 1-bit output.
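The $2^n - 1$ period of a well-chosen LFSR can be checked by brute force for a small register. The following is a minimal sketch assuming a 4-bit Fibonacci LFSR whose feedback is the XOR of its two lowest bits; the function names and the tap choice are illustrative assumptions, not a standard API.

```python
def lfsr4_step(state):
    """Advance a 4-bit Fibonacci LFSR by one step.

    The new top bit is the XOR of the two lowest bits of the old state.
    """
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 3)


def lfsr4_period(seed=0b0001):
    """Count the steps until the register returns to its starting state."""
    if not 0 < seed <= 0b1111:
        raise ValueError("seed must be a non-zero 4-bit value")
    state = lfsr4_step(seed)
    steps = 1
    while state != seed:
        state = lfsr4_step(state)
        steps += 1
    return steps


print(lfsr4_period())  # 15, i.e. 2**4 - 1

# The all-zero state maps to itself, which is why one of the 16 states is lost.
print(lfsr4_step(0))  # 0

# A 1-bit output repeats values long before the 4-bit state repeats.
state, bits = 0b0001, []
for _ in range(5):
    bits.append(state & 1)
    state = lfsr4_step(state)
print(bits)  # [1, 0, 0, 0, 1]
```

The last part illustrates the closing remark of the paragraph: the output bit 0 appears several times in a row even though the internal state has not yet cycled.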
Most pseudorandom generator algorithms produce sequences which are uniformly distributed according to any of several tests. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence without knowing the algorithm(s) used and the state with which it was initialized. The security of most cryptographic algorithms and protocols using PRNGs is based on the assumption that it is infeasible to distinguish use of a suitable PRNG from use of a truly random sequence. The simplest examples of this dependency are stream ciphers, which (most often) work by combining the plaintext of a message with the output of a PRNG using exclusive or (XOR), producing ciphertext. The design of cryptographically adequate PRNGs is extremely difficult, because they must meet additional criteria (see below). The size of its period is an important factor in the cryptographic suitability of a PRNG, but not the only one.
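The stream-cipher construction mentioned above can be sketched in a few lines. This example assumes Python's standard `random.Random` as the keystream source purely to show the XOR mechanics; that generator is not cryptographically secure, so the sketch must not be used to protect real data. The function name `xor_with_keystream` is a hypothetical helper.

```python
import random


def xor_with_keystream(data, seed):
    """XOR each byte of `data` with one keystream byte from a seeded PRNG.

    Illustration only: random.Random is NOT a cryptographically secure PRNG.
    """
    rng = random.Random(seed)
    return bytes(byte ^ rng.getrandbits(8) for byte in data)


message = b"attack at dawn"
ciphertext = xor_with_keystream(message, seed=2024)

# Applying the same keystream again undoes the XOR and recovers the plaintext.
recovered = xor_with_keystream(ciphertext, seed=2024)
print(recovered == message)  # True
```

Because encryption and decryption are the same operation, anyone who can predict the keystream can read the message, which is why the unpredictability of the generator matters more here than its statistical quality alone.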
In practice, the output from many common PRNGs exhibits artifacts which cause it to fail statistical pattern-detection tests. These include:
Defects exhibited by flawed PRNGs range from unnoticeable (and unknown) to very obvious. An example was the RANDU random number algorithm used for decades on mainframe computers. It was seriously flawed, but its inadequacy went undetected for a very long time.
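One way to see how a flaw of this kind can hide in plain sight is to test a linear relation between consecutive outputs. The sketch below assumes the commonly cited definition of RANDU, $x_{k+1} = 65539 \, x_k \bmod 2^{31}$ with an odd seed; that formula is not given in the text above and is stated here as an assumption. Since $65539 = 2^{16} + 3$, squaring the multiplier gives $x_{k+2} = 6 x_{k+1} - 9 x_k \pmod{2^{31}}$, so every output is fixed by the two before it.

```python
def randu(seed, count):
    """Generate `count` values with the RANDU recurrence (assumed definition)."""
    if seed % 2 == 0:
        raise ValueError("RANDU is normally seeded with an odd number")
    values = []
    state = seed
    for _ in range(count):
        state = (65539 * state) % 2**31
        values.append(state)
    return values


def count_relation_violations(values):
    """Count positions where x[k+2] != 6*x[k+1] - 9*x[k] (mod 2**31)."""
    return sum(
        (values[k + 2] - 6 * values[k + 1] + 9 * values[k]) % 2**31 != 0
        for k in range(len(values) - 2)
    )


values = randu(seed=1, count=1000)
print(values[:2])                         # [65539, 393225]
print(count_relation_violations(values))  # 0
```

A relation like this means that consecutive triples of outputs are strongly correlated, a defect that simple one-dimensional uniformity checks do not reveal.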
In many fields, much research work prior to the 21st century that relied on random selection or on Monte Carlo simulations, or in other ways relied on PRNGs, is much less reliable than it might have been as a result of using poor-quality PRNGs.
The first PRNG to avoid major problems and still run fairly fast was the Mersenne Twister (discussed below), which was published in 1997. Several other high-quality PRNGs have since been developed.
An early computer-based PRNG, suggested by John von Neumann in 1946, is known as the middle-square method. The algorithm is as follows: take any number, square it, extract the middle digits of the resulting number as the "random number", then use that number as the seed for the next iteration. For example, squaring the number "1111" yields "1234321", which can be written as "01234321", an 8-digit number being the square of a 4-digit number. This gives "2343" as the "random" number. Repeating this procedure gives "4896" as the next result, and so on. Von Neumann used 10-digit numbers, but the process was the same.
A problem with the "middle square" method is that all sequences eventually repeat themselves, some very quickly, such as "0000". Von Neumann was aware of this, but he found the approach sufficient for his purposes, and was worried that mathematical "fixes" would simply hide errors rather than remove them.
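The middle-square procedure is short enough to write out directly. This is a minimal sketch for 4-digit seeds, following the worked example in the text; the function names are illustrative, and the zero-padding step corresponds to writing "1234321" as "01234321".

```python
def middle_square(seed, digits=4):
    """Return the next middle-square value for a seed of `digits` digits."""
    if digits % 2 != 0:
        raise ValueError("digits must be even so that the middle is well defined")
    if not 0 <= seed < 10**digits:
        raise ValueError("seed must have at most `digits` digits")
    squared = str(seed * seed).zfill(2 * digits)  # pad the square to 2*digits characters
    start = digits // 2
    return int(squared[start:start + digits])


def middle_square_sequence(seed, count, digits=4):
    """Iterate the method, feeding each result back in as the next seed."""
    values = []
    for _ in range(count):
        seed = middle_square(seed, digits)
        values.append(seed)
    return values


print(middle_square_sequence(1111, 2))  # [2343, 4896]

# The degenerate seed 0000 maps to itself, so the sequence never leaves it.
print(middle_square_sequence(0, 3))     # [0, 0, 0]
```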
Von Neumann judged hardware random number generators unsuitable, for, if they did not record the output generated, they could not later be tested for errors. If they did record their output, they would exhaust the limited computer memories then available, and so the computer's ability to read and write numbers. If the numbers were written to cards, they would take very much longer to write and read. On the ENIAC computer he was using, the "middle square" method generated numbers at a rate some hundred times faster than reading numbers in from punched cards.
The middle-square method has since been supplanted by more elaborate generators.
The Mersenne Twister algorithm, invented in 1997, avoids many of the problems with earlier generators. It has a period of $2^{19937} - 1$ iterations (≈ $4.3 \times 10^{6001}$), is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and runs faster than other statistically reasonable generators. It is increasingly becoming the random number generator of choice for statistical simulations and generative modeling. SIMD-oriented Fast Mersenne Twister (SFMT), a variant of Mersenne Twister, is 2–4 times faster even if it is not compiled with SIMD support.
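The order of magnitude of the period can be checked with logarithms, without ever writing out the 6002-digit number. The sketch below also uses Python's standard `random` module, which is documented as being based on the Mersenne Twister, to show that two generators given the same seed agree; the seed value is an arbitrary choice for illustration.

```python
import math
import random

# log10(2**19937 - 1) is indistinguishable from 19937 * log10(2) at this precision.
log10_period = 19937 * math.log10(2)
exponent = math.floor(log10_period)
mantissa = 10 ** (log10_period - exponent)
print(exponent)            # 6001
print(round(mantissa, 1))  # 4.3

# Two Mersenne Twister instances with the same seed produce the same values.
rng_a = random.Random(12345)
rng_b = random.Random(12345)
same = [rng_a.random() for _ in range(3)] == [rng_b.random() for _ in range(3)]
print(same)  # True
```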
A PRNG suitable for cryptographic applications is called a cryptographically secure PRNG (CSPRNG). A requirement for a CSPRNG is that an adversary not knowing the seed has only negligible advantage in distinguishing the generator's output sequence from a random sequence. In other words, while a PRNG is only required to pass certain statistical tests, a CSPRNG must pass all statistical tests that are restricted to polynomial time in the size of the seed. Though such a property cannot be proven, strong evidence may be provided by reducing the CSPRNG to a problem that is assumed to be hard, such as integer factorization. In general, years of review may be required before an algorithm can be certified as a CSPRNG.
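As an illustration of a generator whose security is tied to integer factorization, here is a toy sketch of Blum Blum Shub, which repeatedly squares its state modulo a product of two primes and outputs the lowest bit. The primes `p = 11` and `q = 23` and the seed `3` are assumptions chosen only so the arithmetic can be followed by hand; a modulus this small offers no security whatsoever.

```python
def blum_blum_shub_bits(seed, count, p=11, q=23):
    """Return `count` output bits from a toy Blum Blum Shub generator.

    p and q should both be primes congruent to 3 modulo 4; the defaults
    are tiny illustrative values, far too small for real use.
    """
    modulus = p * q
    state = seed % modulus
    bits = []
    for _ in range(count):
        state = (state * state) % modulus  # x_{k+1} = x_k^2 mod (p*q)
        bits.append(state & 1)             # emit the least significant bit
    return bits


# States visited from seed 3: 9, 81, 236, 36, 31, 202
print(blum_blum_shub_bits(seed=3, count=6))  # [1, 1, 0, 0, 1, 0]
```

In application code one would normally rely on a vetted source such as the operating system's generator (for example through Python's `secrets` module) rather than implement a CSPRNG by hand.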
Some classes of CSPRNGs include the following:
The German Federal Office for Information Security (Bundesamt für Sicherheit in der Informationstechnik, BSI) has established four criteria for quality of deterministic random number generators. They are summarized here:
For cryptographic applications, only generators meeting the K3 or K4 standard are acceptable.
Numbers selected from a non-uniform probability distribution can be generated using a uniform distribution PRNG and a function that relates the two distributions.
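First, one needs the cumulative distribution function $F(b)$ of the target distribution $f(b)$:
$$F(b) = \int_{-\infty}^{b} f(b') \, db'$$
Note that $0 = F(-\infty) \le F(b) \le F(\infty) = 1$. Using a random number $c$ from a uniform distribution as the probability density to "pass by", we get
$$F(b) = c$$
so that
$$b = F^{-1}(c)$$
is a number randomly selected from distribution $f(b)$.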
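The recipe above can be sketched for a distribution whose cumulative distribution function has a closed-form inverse. This example assumes an exponential target with rate $\lambda$, for which $F(b) = 1 - e^{-\lambda b}$ and therefore $b = -\ln(1 - c) / \lambda$; the choice of distribution, the rate, and the function name are illustrative assumptions.

```python
import math
import random


def sample_exponential(rng, rate):
    """Draw one exponential sample by inverting the CDF at a uniform value."""
    c = rng.random()                   # uniform on [0, 1)
    return -math.log(1.0 - c) / rate   # b = F^{-1}(c) for F(b) = 1 - exp(-rate * b)


rng = random.Random(0)
rate = 2.0
samples = [sample_exponential(rng, rate) for _ in range(100_000)]

# The sample mean should be close to the theoretical mean 1 / rate = 0.5.
sample_mean = sum(samples) / len(samples)
print(abs(sample_mean - 1.0 / rate) < 0.02)  # True
```

Using `1.0 - c` rather than `c` keeps the argument of the logarithm strictly positive, since `random()` can return 0.0 but never 1.0.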
For example, the inverse of cumulative Gaussian distribution with an ideal uniform PRNG with range (0, 1) as input would produce a sequence of (positive only) values with a Gaussian distribution; however