In computer programming, the term magic number has multiple meanings. It could refer to one or more of the following:
The format indicator type of magic number was initially found in early Seventh Edition source code of the Unix operating system and, although it has lost its original meaning, the term magic number has become part of computer industry lexicon.
When Unix was ported to one of the first DEC PDP-11/20s, the machine did not have memory protection and, therefore, early versions of Unix used the relocatable memory reference model. Thus, pre-Sixth Edition Unix versions read an executable file into memory and jumped to the first low memory address of the program, relative address zero. With the development of paged versions of Unix, a header was created to describe the executable image components. Also, a branch instruction was inserted as the first word of the header to skip the header and start the program. In this way a program could be run in the older relocatable memory reference (regular) mode or in paged mode. As more executable formats were developed, new constants were added by incrementing the branch offset.
In the Sixth Edition source code of the Unix program loader, the exec() function read the executable (binary) image from the file system. The first 8 bytes of the file were a header containing the sizes of the program (text) and initialized (global) data areas. Also, the first 16-bit word of the header was compared to two constants to determine if the executable image contained relocatable memory references (normal), the newly implemented paged read-only executable image, or the separated instruction and data paged image. There was no mention of the dual role of the header constant, but the high order byte of the constant was, in fact, the operation code for the PDP-11 branch instruction (octal 000407 or hex 0107). The low order byte held a branch offset of seven words, so if this constant was executed, it would branch over the executable image header and start the program.
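The arithmetic behind that dual role can be sketched as follows. This is only an illustration, assuming the usual PDP-11 encoding of the unconditional branch (opcode in the high order byte, a signed word offset in the low order byte, applied to a program counter that has already advanced past the instruction). The function name decode_pdp11_branch is hypothetical and does not come from the Unix sources.

```python
def decode_pdp11_branch(word, address=0):
    """Return the byte address an unconditional PDP-11 BR at `address` jumps to.

    Returns None when `word` is not an unconditional branch.
    """
    opcode = word & 0xFF00            # high order byte selects the instruction
    if opcode != 0o000400:            # octal 000400 is the unconditional BR
        return None
    offset = word & 0xFF              # low order byte: signed offset in words
    if offset >= 0x80:
        offset -= 0x100               # sign-extend negative offsets
    pc_after_fetch = address + 2      # the PC already points at the next word
    return pc_after_fetch + 2 * offset


MAGIC_NORMAL = 0o407                  # the header constant discussed above
target = decode_pdp11_branch(MAGIC_NORMAL)
print(oct(MAGIC_NORMAL), hex(MAGIC_NORMAL), target)
```

Under these assumptions, a constant of octal 0407 placed at relative address zero transfers control to a byte offset just past the header words that follow it.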
Since the Sixth and Seventh Editions of Unix employed paging code, the dual role of the header constant was hidden. That is, the exec() service read the executable file header (meta) data into a kernel space buffer, but read the executable image into user space, thereby not using the constant's branching feature. Magic number creation was implemented in the Unix linker and loader and magic number branching was probably still used in the suite of stand-alone diagnostic programs that came with the Sixth and Seventh Editions. Thus, the header constant did provide an illusion and met the criteria for magic.
In Version Seven Unix, the header constant was not tested directly, but assigned to a variable labeled ux_mag and subsequently referred to as the magic number. Probably because of its uniqueness, the term magic number came to mean executable format type, then expanded to mean file system type, and expanded again to mean any strongly typed file.
Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many file formats and can yield further run-time information.
CAFEBABE. When compressed with Pack200 the bytes are changed to
61) or "GIF87a" (
D8 and end with
D9. JPEG/JFIF files contain the ASCII code for "JFIF" (
46) as a null terminated string. JPEG/Exif files contain the ASCII code for "Exif" (
66) also as a null terminated string, followed by more metadata about the file.
0A). That signature contains various newline characters to permit detecting unwarranted automated newline conversions, such as transferring the file using FTP with the ASCII transfer mode instead of the binary mode.
64) followed by more metadata.
21) followed by the path to an interpreter, if the interpreter is likely to be different than the one from which the script was invoked.
5A), the initials of the designer of the file format, Mark Zbikowski. The definition allows "ZM" (
4D) as well, but this is quite uncommon.
54 depending on version; both represent the birthday of the author, Marshall Kirk McKusick.
55 as its last two bytes.
21) as a prefix.
MM followed by 42 as a two-byte integer in little or big endian byte ordering.
II is for Intel, which uses little endian byte ordering, so the magic number is
MM is for Motorola, which uses big endian byte ordering, so the magic number is
FF for big endian and
FE for little endian). UTF-8 text files often start with the UTF-8 encoding of the same character,
WAD2 (for Quake) and
E0, which is visually suggestive of the word "DOCFILE0".
4B), the initials of Phil Katz, author of DOS compression utility PKZIP.
The Unix utility program file can read and interpret magic numbers from files, and indeed, the file which is used to parse the information is called magic. The Windows utility TrID has a similar purpose.
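As a rough illustration of how such detection works, the sketch below compares the leading bytes of some data against a small table of well-known signatures. The table is deliberately partial, the names SIGNATURES, identify and identify_file are hypothetical, and the real file utility applies far richer rules than a prefix match.

```python
# Each entry is (leading bytes, description); entries are checked in order.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8", "JPEG image"),
    (b"II*\x00", "TIFF image (little endian)"),   # "II" then 42 as 2A 00
    (b"MM\x00*", "TIFF image (big endian)"),      # "MM" then 42 as 00 2A
    (b"PK", "ZIP archive"),
    (b"MZ", "DOS executable"),
    (b"#!", "script with interpreter line"),
    (bytes.fromhex("cafebabe"), "Java class file or universal Mach-O binary"),
]


def identify(data):
    """Return a description for the first signature that prefixes `data`."""
    for signature, description in SIGNATURES:
        if data.startswith(signature):
            return description
    return "unknown"


def identify_file(path):
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"GIF89a" + bytes(10)))
```

A prefix table like this is cheap to evaluate, but it cannot separate formats that share a signature, which is why the last entry names two formats.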
42, for "Remote Frame Buffer") followed by the client's protocol version number.
"\xFFSMB" at the start of the SMB request.
05 at the start of the request (representing Microsoft DCE/RPC Version 5), followed immediately by a
01 for the minor version. In UDP-based MSRPC requests the first byte is always
57). Debugging extensions (used for DCOM channel hooking) are prefaced with the byte sequence "MARB" (
19 representing the header length, followed immediately by the phrase "BitTorrent protocol" at byte position 1.
E3 represents an eDonkey client,
C5 represents eMule, and
D4 represents compressed eMule.
80 and an SSLv3 server response to a client hello begins with
16 (though this may vary).
0x63' at the start of the options section of the packet. This value is included in all DHCP packet types.
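Protocol magic numbers can be checked in the same way as file signatures. The sketch below covers only two of the cases listed above, the BitTorrent handshake and the SMB request prefix, and assumes the buffer passed in starts exactly at the message in question with any transport framing already removed. The function names are hypothetical.

```python
BITTORRENT_PSTR = b"BitTorrent protocol"   # the phrase that follows the length byte
SMB_MAGIC = b"\xffSMB"


def is_bittorrent_handshake(data):
    """True if `data` starts with the length byte 19 and the protocol phrase."""
    n = len(BITTORRENT_PSTR)               # 19
    return len(data) > n and data[0] == n and data[1:1 + n] == BITTORRENT_PSTR


def is_smb_request(data):
    """True if `data` starts with the SMB magic bytes."""
    return data.startswith(SMB_MAGIC)


handshake = bytes([19]) + BITTORRENT_PSTR + bytes(8)
print(is_bittorrent_handshake(handshake), is_smb_request(handshake))
```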
The term magic number or magic constant also refers to the programming practice of using numbers directly in source code. This has been referred to as breaking one of the oldest rules of programming, dating back to the COBOL, FORTRAN and PL/1 manuals of the 1960s. The use of unnamed magic numbers in code obscures the developers' intent in choosing that number, increases opportunities for subtle errors (e.g. is every digit correct in 3.14159265358979323846 and is this equal to 3.14159?) and makes it more difficult for the program to be adapted and extended in the future. Replacing all significant magic numbers with named constants makes programs easier to read, understand and maintain.
Names chosen should be meaningful in terms of the domain. It is easy to imagine nonsense like int EIGHT = 16 resulting when NUMBER_OF_BITS might have been a better choice of name in the first place.
The problems associated with magic 'numbers' described above are not limited to numerical types and the term is also applied to other data types where declaring a named constant would be more flexible and communicative. Thus, declaring const string testUserName = "John" is better than several occurrences of the 'magic number' "John" in a test suite.
for i from 1 to 52
    j := i + randomInt(53 - i) - 1
    a.swapEntries(i, j)
Here a is an array object, the function randomInt(x) chooses a random integer between 1 and x, inclusive, and swapEntries(i, j) swaps the ith and jth entries in the array. In the preceding example, 52 is a magic number. It is considered better programming style to write the following:
constant int deckSize := 52
for i from 1 to deckSize
    j := i + randomInt(deckSize + 1 - i) - 1
    a.swapEntries(i, j)
This is preferable for several reasons:
deckSize variable in the second example would be a simple, one-line change.
deckSize into a parameter of that procedure. The first example would require several changes, perhaps:
function shuffle (int deckSize)
    for i from 1 to deckSize
        j := i + randomInt(deckSize + 1 - i) - 1
        a.swapEntries(i, j)
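The parameterized pseudocode can be turned into a runnable sketch along the following lines. It is one possible translation rather than a reference implementation: indices are zero-based, the array is passed in explicitly instead of being an ambient object, and the rng parameter is an added assumption so that a seeded generator can make the result repeatable.

```python
import random

DECK_SIZE = 52  # named once, instead of repeating the literal 52


def shuffle(a, deck_size, rng=random):
    """Shuffle the first `deck_size` entries of list `a` in place."""
    for i in range(deck_size):
        # Pick j uniformly from i .. deck_size - 1, mirroring
        # j := i + randomInt(deckSize + 1 - i) - 1 in one-based pseudocode.
        j = i + rng.randrange(deck_size - i)
        a[i], a[j] = a[j], a[i]


deck = list(range(DECK_SIZE))
shuffle(deck, DECK_SIZE, random.Random(0))
print(deck[:5])
```

Changing the deck size now means editing one constant or passing a different argument, which is the point the comparison above is making.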
In some contexts the use of unnamed numerical constants is generally accepted (and arguably "not magic"). While such acceptance is subjective, and often depends on individual coding habits, the following are common examples:
for (int i = 0; i < max; i = i + 1) (assuming ++i is not supported)
isEven = (x % 2 == 0), where % is the modulo operator
circumference = 2 * Math.PI * radius, or for calculating the discriminant of a quadratic equation as d = b^2 − 4*a*c
The constants 1 and 0 are sometimes used to represent the boolean values True and False in programming languages without a boolean type such as older versions of C. Most modern programming languages provide a bool primitive type and so the use of 0 and 1 is ill-advised.
In C and C++, 0 is sometimes used to represent the null pointer. As with boolean values, the C standard library includes a macro definition NULL whose use is encouraged. Other languages provide a specific nil value and when this is the case no alternative should be used. Starting with C++11, the typed pointer constant nullptr has been introduced.
It is possible to create or alter globally unique identifiers (GUIDs) so that they are memorable, but this is highly discouraged as it compromises their strength as near-unique identifiers. The specifications for generating GUIDs and UUIDs are quite complex, which is what makes them unique in practice when properly implemented. They should only be generated by a reputable software tool.
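In practice this usually means calling a well-tested library rather than composing an identifier by hand. A minimal sketch, assuming Python's standard uuid module is an acceptable generator for the application at hand (the helper name new_identifier is hypothetical):

```python
import uuid


def new_identifier():
    """Return a freshly generated random (version 4) UUID."""
    return uuid.uuid4()


ident = new_identifier()
print(ident, ident.version)
```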
Java uses several GUIDs starting with
Magic debug values are specific values written to memory during allocation or deallocation, so that it will later be possible to tell whether or not they have become corrupted, and to make it obvious when values taken from uninitialized memory are being used. Memory is usually viewed in hexadecimal, so memorable repeating or hexspeak values are common. Numerically odd values may be preferred so that processors without byte addressing will fault when attempting to use them as pointers (which must fall at even addresses). Values should be chosen that are away from likely addresses (the program code, static data, heap data, or the stack). Similarly, they may be chosen so that they are not valid codes in the instruction set for the given architecture.
Since it is very unlikely, although possible, that a 32-bit integer would take this specific value, the appearance of such a number in a debugger or memory dump most likely indicates an error such as a buffer overflow or an uninitialized variable.
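The idea can be sketched with a toy allocator that fills memory with recognisable byte patterns. The three fill values below are illustrative choices made for this sketch, the DebugBlock class is hypothetical, and a real debugging heap works on raw memory rather than a Python bytearray.

```python
UNINITIALISED_FILL = 0xCD   # illustrative: allocated but never written
FREED_FILL = 0xDD           # illustrative: released memory
GUARD_FILL = 0xFD           # illustrative: guard bytes around the payload
GUARD_SIZE = 4


class DebugBlock:
    """A payload of `size` bytes with guard bytes before and after it."""

    def __init__(self, size):
        self.size = size
        guard = bytes([GUARD_FILL]) * GUARD_SIZE
        self.memory = bytearray(guard + bytes([UNINITIALISED_FILL]) * size + guard)

    def write(self, offset, data):
        # Deliberately unchecked against the payload size, like a raw pointer write.
        start = GUARD_SIZE + offset
        for k, byte in enumerate(data):
            self.memory[start + k] = byte

    def payload(self):
        return bytes(self.memory[GUARD_SIZE:GUARD_SIZE + self.size])

    def guards_intact(self):
        guard = bytes([GUARD_FILL]) * GUARD_SIZE
        return (self.memory[:GUARD_SIZE] == guard
                and self.memory[GUARD_SIZE + self.size:] == guard)

    def free(self):
        # Overwrite the payload so stale reads show an obvious pattern.
        self.memory[GUARD_SIZE:GUARD_SIZE + self.size] = bytes([FREED_FILL]) * self.size


block = DebugBlock(8)
fresh = block.payload()                  # still all UNINITIALISED_FILL bytes
block.write(6, b"ABCD")                  # runs two bytes past the payload
overrun_detected = not block.guards_intact()
block.free()
print(fresh.hex(), overrun_detected, block.payload().hex())
```

Reading a payload that is still entirely the fill pattern points to uninitialised or freed memory, and a damaged guard region points to a write past the end of the block.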
Famous and common examples include:
|Used by a number of RTOSes|
|Multiboot header magic number|
|Used by Apple as the exception code in iOS crash reports when an application has taken too long to launch or terminate.|
|Used in embedded development because the alternating bit pattern (10100101) creates an easily recognized pattern on oscilloscopes and logic analyzers.|
|Used in FreeBSD's PHK malloc(3) for debugging when /etc/malloc.conf is symlinked to "-J" to initialize all newly allocated memory as this value is not a NULL pointer or ASCII NUL character.|
|Used by Microsoft's HeapAlloc() to mark "no man's land" guard bytes after allocated heap memory|
|Used by Apple as the "Boot Zero Block" magic number|
|Used to initialize all unallocated memory (Mungwall, AmigaOS).|
|Required by Microsoft's Hyper-V hypervisor to be used by Linux guests as their "guest signature"|
|Used by Microsoft's LocalAlloc(LMEM_FIXED) to mark uninitialised allocated heap memory|
|Used by Apple as the exception code in iOS crash reports when a VoIP application has been terminated because it resumed too frequently|
|Burroughs large systems "uninitialized" memory (48-bit words)|
|Used on IBM RS/6000 64-bit systems to indicate uninitialized CPU registers|
|Error Code returned to the Microsoft eVC debugger when connection is severed to the debugger|
|On Sun Microsystems' Solaris, marks uninitialised kernel memory (KMEM_UNINITIALIZED_PATTERN)|
|Used in WebKit|
|Used by Microsoft .NET as a magic number in resource files|
|Used by both Universal Mach-O binaries and Java .class files|
|Used by Java for their pack200 compression|
|Used by Sun Microsystems' Solaris debugging kernel to mark kmemfree() memory|
|Used by Microsoft's C++ debugging runtime library to mark uninitialised stack memory|
|Used by Microsoft's C++ debugging runtime library to mark uninitialised heap memory|
|Seen in Intel Mach-O binaries on Apple Inc.'s Mac OS X platform (see |
|Used as a flag to indicate regular boot on the Nintendo GameCube and Wii consoles|
|Used by MicroQuill's SmartHeap and Microsoft's C++ debugging heap to mark freed heap memory|
|Used by Apple as the exception code in iOS crash reports when an application has been terminated because it held on to a system resource (like the address book database) while running in the background|
|Used at the start of Silicon Graphics' IRIX arena files|
|Famously used on IBM systems such as the RS/6000, also used in the original Mac OS operating systems, OPENSTEP Enterprise, and the Commodore Amiga. On Sun Microsystems' Solaris, marks freed kernel memory (KMEM_FREE_PATTERN)|
|Used as a marker in OpenWRT firmware to signify the beginning of the to-be created jffs2 file system at the end of the static firmware|
|Used by Android in the Dalvik virtual machine to indicate a VM abort|
|A Microsoft Windows STOP Error code used when the user manually initiates the crash.|
|Used by Mungwall on the Commodore Amiga to mark allocated but uninitialised memory |
|Used by Apple as the exception code in iOS crash reports when the user has force-quit the application.|
|Used for OpenSolaris core dumps|
|From MicroQuill's SmartHeap|
|Used by Alpha servers running Windows NT. The Alpha Hardware Abstraction Layer (HAL) generates this error when it encounters a hardware failure.|
|Comes at the end to identify every AppleScript script|
|Used by Microsoft's C++ debugging heap to mark "no man's land" guard bytes before and after allocated heap memory|
|Used by Linux reboot() syscall|
|Seen in PowerPC Mach-O binaries on Apple Inc.'s Mac OS X platform. On Sun Microsystems' Solaris, marks the red zone (KMEM_REDZONE_PATTERN)|
|Used by Microsoft's HeapFree() to mark freed heap memory|
The prevalence of these values in Microsoft technology is no coincidence; they are discussed in detail in Steve Maguire's book Writing Solid Code from Microsoft Press. He gives a variety of criteria for these values, such as:
Since they were often used to mark areas of memory that were essentially empty, some of these terms came to be used in phrases meaning "gone, aborted, flushed from memory"; e.g. "Your program is DEADBEEF".