From Wikipedia, the free encyclopedia  View original article

In numerical analysis, a branch of mathematics, there are several square root algorithms or methods for calculating the principal square root of a nonnegative real number. For the square roots of a negative or complex number, see below.
Finding is the same as solving the equation . Therefore, any general numerical rootfinding algorithm can be used. Newton's method, for example, reduces in this case to the socalled Babylonian method:
Generally, these methods yield approximate results. To get a higher precision for the root, a higher precision for the square is required and a larger number of steps must be calculated.
Many square root algorithms require an initial seed value. If the initial seed value is far away from the actual square root, the algorithm will be slowed down. It is therefore useful to have a rough estimate, which may be very inaccurate but easy to calculate. With expressed in scientific notation as where and n is an integer, the square root can be estimated as
The factors two and six are used because they approximate the geometric means of the lowest and highest possible values with the given number of digits: and .
For , the estimate is .
When working in the binary numeral system (as computers do internally), by expressing as where , the square root can be estimated as , since the geometric mean of the lowest and highest possible values is .
For the binary approximation gives
These approximations are useful to find better seeds for iterative algorithms, which results in faster convergence.
Perhaps the first algorithm used for approximating is known as the "Babylonian method", named after the Babylonians,^{[1]} or "Hero's method", named after the firstcentury Greek mathematician Hero of Alexandria who gave the first explicit description of the method.^{[2]} It can be derived from (but predates by 16 centuries) Newton's method. The basic idea is that if x is an overestimate to the square root of a nonnegative real number S then will be an underestimate and so the average of these two numbers may reasonably be expected to provide a better approximation (though the formal proof of that assertion depends on the inequality of arithmetic and geometric means that shows this average is always an overestimate of the square root, as noted in the article on square roots, thus assuring convergence).
More precisely, if is our initial guess of and is the error in our estimate such that , then we can expand the binomial and solve for
Therefore, we can compensate for the error and update our old estimate as
Since the computed error was not exact, this becomes our next best guess. The process of updating is iterated until desired accuracy is obtained. This is a quadratically convergent algorithm, which means that the number of correct digits of the approximation roughly doubles with each iteration. It proceeds as follows:
It can also be represented as:
This algorithm works equally well in the padic numbers, but cannot be used to identify real square roots with padic square roots; one can, for example, construct a sequence of rational numbers by this method that converges to +3 in the reals, but to −3 in the 2adics.
To calculate , where S = 125348, to 6 significant figures, use the rough estimation method above to get
Therefore,
Let the relative error in x_{n} be defined by
and thus
Then it can be shown that
and thus that
and consequently that convergence is assured provided that x_{0} and S are both positive.
If using the rough estimate above with the Babylonian method, then the worst cases are^{[clarification needed]}:
Thus in any case,
Remember that rounding errors will slow the convergence. It is recommended to keep at least one extra digit beyond the desired accuracy of the x_{n} being calculated to minimize round off error.
This is a method to find each digit of the square root in a sequence. It is slower than the Babylonian method (if you have a calculator that can divide in one operation), but it has several advantages:
Napier's bones include an aid for the execution of this algorithm. The shifting nth root algorithm is a generalization of this method.
Suppose we are to find the square root of N by expressing it as a sum of n positive numbers such that
By repeatedly applying the basic identity
the righthandside term can be expanded as
This expression allows us to find the square root by sequentially guessing the values of s. Suppose that the numbers have already been guessed, then the mth term of the righthandside of above summation is given by where is the approximate square root found so far. Now each new guess should satisfy the recursion
such that for all with initialization When the exact square root has been found; if not, then the sum of s gives a suitable approximation of the square root, with being the approximation error.
For example, in the decimal number system we have
where are place holders and the coefficients . At any mth stage of the square root calculation, the approximate root found so far, and the summation term are given by
Here since the place value of is an even power of 10, we only need to work with the pair of most significant digits of the remaining term at any mth stage. The section below codifies this procedure. It is obvious that a similar method can be used to compute the square root in number systems other than the decimal number system. For instance, finding the digitbydigit square root in the binary number system is quite efficient since the value of is searched from a smaller set of binary digits {0,1}. This makes the computation faster since at each stage the value of is either for or for . The fact that we have only two possible options for also makes the process of deciding the value of at mth stage of calculation easier. This is because we only need to check if for If this condition is satisfied, then we take ; if not then Also, the fact that multiplication by 2 is done by left bitshifts helps in the computation.
Write the original number in decimal form. The numbers are written similar to the long division algorithm, and, as in long division, the root will be written on the line above. Now separate the digits into pairs, starting from the decimal point and going both left and right. The decimal point of the root will be above the decimal point of the square. One digit of the root will appear above each pair of digits of the square.
Beginning with the leftmost pair of digits, do the following procedure for each pair:
Find the square root of 152.2756.
1 2. 3 4 / \/ 01 52.27 56 01 1*1 <= 1 < 2*2 x = 1 01 y = x*x = 1*1 = 1 00 52 22*2 <= 52 < 23*3 x = 2 00 44 y = (20+x)*x = 22*2 = 44 08 27 243*3 <= 827 < 244*4 x = 3 07 29 y = (240+x)*x = 243*3 = 729 98 56 2464*4 <= 9856 < 2465*5 x = 4 98 56 y = (2460+x)*x = 2464*4 = 9856 00 00 Algorithm terminates: Answer is 12.34
Find the square root of 2.
1. 4 1 4 2 / \/ 02.00 00 00 00 02 1*1 <= 2 < 2*2 x = 1 01 y = x*x = 1*1 = 1 01 00 24*4 <= 100 < 25*5 x = 4 00 96 y = (20+x)*x = 24*4 = 96 04 00 281*1 <= 400 < 282*2 x = 1 02 81 y = (280+x)*x = 281*1 = 281 01 19 00 2824*4 <= 11900 < 2825*5 x = 4 01 12 96 y = (2820+x)*x = 2824*4 = 11296 06 04 00 28282*2 <= 60400 < 28283*3 x = 2 The desired precision is achieved: The square root of 2 is about 1.4142
Inherent to digitbydigit algorithms is a search and test step: find a digit, , when added to the right of a current solution , such that , where is the value for which a root is desired. Expanding: . The current value of —or, usually, the remainder—can be incrementally updated efficiently when working in binary, as the value of will have a single bit set (a power of 2), and the operations needed to compute and can be replaced with faster bit shift operations.
Here we obtain the square root of 81, which when converted into binary gives 1010001. The numbers in the left column gives the option between that number or zero to be used for subtraction at that stage of computation. The final answer is 1001, which in decimal is 9.
1 0 0 1  √ 1010001 1 1 1  101 01 0  1001 100 0  10001 10001 10001  0
This gives rise to simple computer implementations:^{[3]}
short isqrt(short num) { short res = 0; short bit = 1 << 14; // The secondtotop bit is set: 1 << 30 for 32 bits // "bit" starts at the highest power of four <= the argument. while (bit > num) bit >>= 2; while (bit != 0) { if (num >= res + bit) { num = res + bit; res = (res >> 1) + bit; } else res >>= 1; bit >>= 2; } return res; }
Using the notation above, the variable "bit" corresponds to which is , the variable "res" is equal to , and the variable "num" is equal to the current which is the difference of the number we want the square root of and the square of our current approximation with all bits set up to . Thus in the first loop, we want to find the highest power of 4 in "bit" to find the highest power of 2 in . In the second loop, if num is greater than res + bit, then is greater than and we can subtract it. The next line, we want to add to which means we want to add to so we want res = res + bit<<1. Then update to inside res which involves dividing by 2 or another shift to the right. Combining these 2 into one line leads to res = res>>1 + bit. If isn't greater than then we just update to inside res and divide it by 2. Then we update to in bit by dividing it by 4. The final iteration of the 2nd loop has bit equal to 1 and will cause update of to run one extra time removing the factor of 2 from res making it our integer approximation of the root.
Faster algorithms, in binary and decimal or any other base, can be realized by using lookup tables—in effect trading more storage space for reduced run time.^{[4]}
Pocket calculators typically implement good routines to compute the exponential function and the natural logarithm, and then compute the square root of S using the identity found using the properties of logarithms () and exponentials ():
The denominator in the fraction corresponds to the n^{th} root. In the case above the denominator is 2, hence the equation specifies that the square root is to be found. The same identity is used when computing square roots with logarithm tables or slide rules.
This method for finding an approximation to a square root was described in an ancient Indian mathematical manuscript called the Bakhshali manuscript. It is equivalent to two iterations of the Babylonian method beginning with N. The original presentation goes as follows: To calculate , let N^{2} be the nearest perfect square to S. Then, calculate:
This can be also written as:
Find
The Vedic duplex method from the book 'Vedic Mathematics' is a variant of the digit by digit method for calculating the square root.^{[5]} The duplex is the square of the central digit plus double the crossproduct of digits equidistant from the center. The duplex is computed from the quotient digits (square root digits) computed thus far, but after the initial digits. The duplex is subtracted from the dividend digit prior to the second subtraction for the product of the quotient digit times the divisor digit. For perfect squares the duplex and the dividend will get smaller and reach zero after a few steps. For nonperfect squares the decimal value of the square root can be calculated to any precision desired. However, as the decimal places proliferate, the duplex adjustment gets larger and longer to calculate. The duplex method follows the Vedic ideal for an algorithm, oneline, mental calculation. It is flexible in choosing the first digit group and the divisor. Small divisors are to be avoided by starting with a larger initial group.
We proceed as with the digitbydigit calculation by assuming that we want to express a number N as a square of the sum of n positive numbers as
Define divisor as and the duplex for a sequence of m numbers as
Thus, we can reexpress the above identity in terms of the divisor and the duplexes as
Now the computation can proceed by recursively guessing the values of so that
such that for all , with initialization When the algorithm terminates and the sum of s give the square root. The method is more similar to long division where is the dividend and is the remainder.
For the case of decimal numbers, if
where , then the initiation and the divisor will be . Also the product at any mth stage will be and the duplexes will be , where are the duplexes of the sequence . At any mth stage, we see that the place value of the duplex is one less than the product . Thus, in actual calculations it is customary to subtract the duplex value of the mth stage at (m+1)th stage. Also, unlike the previous digitbydigit square root calculation, where at any given mth stage, the calculation is done by taking the most significant pair of digits of the remaining term , the duplex method uses only a single most significant digit of .
In other words, to calculate the duplex of a number, double the product of each pair of equidistant digits plus the square of the center digit (of the digits to the right of the colon).
Number => Calculation = Duplex 3 ==> 3^{2} = 9 14 ==>2(1·4) = 8 574 ==> 2(5·4) + 7^{2} = 89 1,421 ==> 2(1·1) + 2(4·2) = 2 + 16 = 18 10,523 ==> 2(1·3) + 2(0·2) + 5^{2} = 6+0+25 = 31 406,739 ==> 2(4·9)+ 2(0·3)+ 2(6·7) = 72+0+84 = 156
In a square root calculation the quotient digit set increases incrementally for each step.
Consider the perfect square 2809 = 53^{2}. Use the duplex method to find the square root of 2,809.
Find the square root of 2809. Set down the number in groups of two digits. The number of groups gives the number of whole digits in the root. Put a colon after the first group, 28, to separate it. From the first group, 28, obtain the divisor, 10, since 28>25=5^{2} and by doubling this first root, 2x5=10. Gross dividend: 28: 0 9. Using mental math: Divisor: 10) 3 0 Square: 10) 28: _{3}0 9 Duplex, Deduction: 25: xx 09 Square root: 5: 3. 0 Dividend: 30 00 Remainder: 3: 00 00 Square Root, Quotient: 5: 3. 0
This method is applicable for finding the square root of and converges best for . This, however, is no real limitation for a computer based calculation, as in base 2 floating point and fixed point representations, it is trivial to multiply by an integer power of 4, and therefore by the corresponding power of 2, by changing the exponent or by shifting, respectively. Therefore, can be moved to the range . Moreover, the following method does not employ general divisions, but only additions, subtractions, multiplications, and divisions by powers of two, which are again trivial to implement. A disadvantage of the method is that numerical errors accumulate, in contrast to single variable iterative methods such as the Babylonian one.
The initialization step of this method is
while the iterative steps read
Then, (while ).
Note that the convergence of , and therefore also of , is quadratic.
The proof of the method is rather easy. First, rewrite the iterative definition of as
Then it is straightforward to prove by induction that
and therefore the convergence of to the desired result is ensured by the convergence of to 0, which in turn follows from .
This method was developed around 1950 by M. V. Wilkes, D. J. Wheeler and S. Gill^{[6]} for use on EDSAC, one of the first electronic computers.^{[7]} The method was later generalized, allowing the computation of nonsquare roots.^{[8]}
The following are iterative methods for finding the reciprocal square root of S which is . Once it has been found, find by simple multiplication: . These iterations involve only multiplication, and not division. They are therefore faster than the Babylonian method. However, they are not stable. If the initial value is not close to the reciprocal square root, the iterations will diverge away from it rather than converge to it. It can therefore be advantageous to perform an iteration of the Babylonian method on a rough estimate before starting to apply these methods.
Some computers use Goldschmidt's algorithm to simultaneously calculate and . Goldschmidt's algorithm finds faster than NewtonRaphson iteration on a computer with a fused multiply–add instruction and either a pipelined floating point unit or two independent floatingpoint units. Two ways of writing Goldschmidt's algorithm are:^{[9]}
Each iteration:
until is sufficiently close to 1, or a fixed number of iterations.
which causes
Goldschmidt's equation can be rewritten as:
Each iteration: (All 3 operations in this loop are in the form of a fused multiply–add.)
until is sufficiently close to 0, or a fixed number of iterations.
which causes
If N is an approximation to , a better approximation can be found by using the Taylor series of the square root function:
As an iterative method, the order of convergence is equal to the number of terms used. With 2 terms, it is identical to the Babylonian method; With 3 terms, each iteration takes almost as many operations as the Bakhshali approximation, but converges more slowly. Therefore, this is not a particularly efficient way of calculation. To maximize the rate of convergence, choose N so that is as small as possible.
Other methods are less efficient than the ones presented above.
A completely different method for computing the square root is based on the CORDIC algorithm, which uses only very simple operations (addition, subtraction, bitshift and table lookup, but no multiplication). The square root of S may be obtained as the output using the hyperbolic coordinate system in vectoring mode, with the following initialization:^{[10]}
Quadratic irrationals (numbers of the form , where a, b and c are integers), and in particular, square roots of integers, have periodic continued fractions. Sometimes what is desired is finding not the numerical value of a square root, but rather its continued fraction expansion. The following iterative algorithm can be used for this purpose (S is any natural number that is not a perfect square):
Notice that m_{n}, d_{n}, and a_{n} are always integers. The algorithm terminates when this triplet is the same as one encountered before. The algorithm can also terminate on a_{i} when a_{i} = 2 a_{0},^{[11]} which is easier to implement.
The expansion will repeat from then on. The sequence [a_{0}; a_{1}, a_{2}, a_{3}, …] is the continued fraction expansion:
Begin with m_{0} = 0; d_{0} = 1; and a_{0} = 10 (10^{2} = 100 and 11^{2} = 121 > 114 so 10 chosen).
So, m_{1} = 10; d_{1} = 14; and a_{1} = 1.
Next, m_{2} = 4; d_{2} = 7; and a_{2} = 2.
Now, loop back to the second equation above.
Consequently, the simple continued fraction for the square root of 114 is
Its actual value is approximately 10.67707 82520 31311 21....
A more rapid method is to evaluate its generalized continued fraction. From the formula derived there:
and the fact that 114 is 2/3 of the way between 10^{2}=100 and 11^{2}=121 results in
which is simply the aforementioned [10;1,2, 10,2,1, 20,1,2, 10,2,1, 20,1,2, ...] evaluated at every third term. Combining pairs of fractions produces
which is now [10;1,2, 10,2,1,20,1,2, 10,2,1,20,1,2, ...] evaluated at the third term and every six terms thereafter.
Pell's equation (also known as Brahmagupta equation since he was the first to give a solution to this particular equation) and its variants yield a method for efficiently finding continued fraction convergents of square roots of integers. However, it can be complicated to execute, and usually not every convergent is generated. The ideas behind the method are as follows:
The method is as follows:
A number is represented in a floating point format as which is also called scientific notation. Its square root is and similar formulae would apply for cube roots and logarithms. On the face of it, this is no improvement in simplicity, but suppose that only an approximation is required: then just is good to an order of magnitude. Next, recognise that some powers, p, will be odd, thus for 3141.59 = 3.14159x10^{3} rather than deal with fractional powers of the base, multiply the mantissa by the base and subtract one from the power to make it even. The adjusted representation will become the equivalent of 31.4159x10^{2} so that the square root will be √31.4159 x 10.
If the integer part of the adjusted mantissa is taken, there can only be the values 1 to 99, and that could be used as an index into a table of 99 precomputed square roots to complete the estimate. A computer using base sixteen would require a larger table, but one using base two would require only three entries: the possible bits of the integer part of the adjusted mantissa are 01 (the power being even so there was no shift, remembering that a normalised floating point number always has a nonzero highorder digit) or if the power was odd, 10 or 11, these being the first two bits of the original mantissa. Thus, 6.25 = 110.01 in binary, normalised to 1.1001 x 2^{2} an even power so the paired bits of the mantissa are 01, while .625 = 0.101 in binary normalises to 1.01 x 2^{1} an odd power so the adjustment is to 10.1 x 2^{2} and the paired bits are 10. Notice that the low order bit of the power is echoed in the high order bit of the pairwise mantissa. An even power has its loworder bit zero and the adjusted mantissa will start with 0, whereas for an odd power that bit is one and the adjusted mantissa will start with 1. Thus, when the power is halved, it is as if its low order bit is shifted out to become the first bit of the pairwise mantissa.
A table with only three entries could be enlarged by incorporating additional bits of the mantissa. However, with computers, rather than calculate an interpolation into a table, it is often better to find some simpler calculation giving equivalent results. Everything now depends on the exact details of the format of the representation, plus what operations are available to access and manipulate the parts of the number. For example, Fortran offers an EXPONENT(x) function to obtain the power. Effort expended in devising a good initial approximation is to be recouped by thereby avoiding the additional iterations of the refinement process that would have been needed for a poor approximation. Since these are few (one iteration requires a divide, an add, and a halving) the constraint is severe.
Many computers follow the IEEE (or sufficiently similar) representation, and a very rapid approximation to the square root can be obtained for starting Newton's method. The technique that follows is based on the fact that the floating point format (in base two) approximates the base2 logarithm. That is
So for a 32bit single precision floating point number in IEEE format (where notably, the power has a bias of 127 added for the represented form) you can get the approximate logarithm by interpreting its binary representation as a 32bit integer, scaling it by , and removing a bias of 127, i.e.
For example, 1.0 is represented by a hexadecimal number 0x3F800000, which would represent if taken as an integer. Using the formula above you get , as expected from . In a similar fashion you get 0.5 from 1.5 (0x3FC00000).
To get the square root, divide the logarithm by 2 and convert the value back. The following program demonstrates the idea. Note that the exponent's lowest bit is intentionally allowed to propagate into the mantissa. One way to justify the steps in this program is to assume is the exponent bias and is the number of explicitly stored bits in the mantissa and then show that
/* Assumes that float is in the IEEE 754 single precision floating point format * and that int is 32 bits. */ float sqrt_approx(float z) { int val_int = *(int*)&z; /* Same bits, but as an int */ /* * To justify the following code, prove that * * ((((val_int / 2^m)  b) / 2) + b) * 2^m = ((val_int  2^m) / 2) + ((b + 1) / 2) * 2^m) * * where * * b = exponent bias * m = number of mantissa bits * * . */ val_int = 1 << 23; /* Subtract 2^m. */ val_int >>= 1; /* Divide by 2. */ val_int += 1 << 29; /* Add ((b + 1) / 2) * 2^m. */ return *(float*)&val_int; /* Interpret again as float */ }
The three mathematical operations forming the core of the above function can be expressed in a single line. An additional adjustment can be added to reduce the maximum relative error. So, the three operations, not including the cast, can be rewritten as
val_int = (1 << 29) + (val_int >> 1)  (1 << 22) + a;
where a is a bias for adjusting the approximation errors. For example, with a = 0 the results are accurate for even powers of 2 (e.g., 1.0), but for other numbers the results will be slightly too big (e.g.,1.5 for 2.0 instead of 1.414... with 6% error). With a = 0x4C000, the errors are between about 3.5% and 3.5%.
If the approximation is to be used for an initial guess for Newton's method to the equation , then the reciprocal form shown in the following section is preferred.
A variant of the above routine is included below, which can be used to compute the reciprocal of the square root, i.e., instead, was written by Greg Walsh, and implemented into SGI Indigo by Gary Tarolli.^{[12]}^{[13]} The integershift approximation produced a relative error of less than 4%, and the error dropped further to 0.15% with one iteration of Newton's method on the following line.^{[14]} In computer graphics it is a very efficient way to normalize a vector.
float invSqrt(float x) { float xhalf = 0.5f*x; union { float x; int i; } u; u.x = x; u.i = 0x5f3759df  (u.i >> 1); /* The next line can be repeated any number of times to increase accuracy */ u.x = u.x * (1.5f  xhalf * u.x * u.x); return u.x; }
Some VLSI hardware implements inverse square root using a second degree polynomial estimation followed by a Goldschmidt iteration.^{[15]}
If S < 0, then its principal square root is
If S = a+bi where a and b are real and b ≠ 0, then its principal square root is
This can be verified by squaring the root.^{[16]}^{[17]} Here
is the modulus of S. The principal square root of a complex number is defined to be the root with the nonnegative real part.