From Wikipedia, the free encyclopedia
A database management system (DBMS) is a set of programs that enables storing, modifying, and extracting information from a database. It also provides users with tools to add, delete, access, modify, and analyze data stored in one location. Users can access the data with the query and reporting tools that are part of the DBMS, or with application programs written specifically to access the data. DBMSs also provide methods for maintaining the integrity of stored data, managing security and user access, and recovering information if the system fails. The information from a database can be presented in a variety of formats. Most DBMSs include a report writer program that enables you to output data in the form of a report, and many also include a graphics component that outputs information in the form of graphs and charts. Because databases and database management systems are essential to all areas of business, they must be carefully managed. DBMSs range from small systems that run on personal computers to huge systems that run on mainframes. Examples of database applications include computerized library systems, flight reservation systems, and computerized parts inventory systems. A DBMS typically supports query languages, which are in fact high-level programming languages: dedicated database languages that considerably simplify writing database application programs. Database languages also simplify organizing the database, as well as retrieving and presenting information from it. A DBMS provides facilities for controlling data access, enforcing data integrity, managing concurrency control, recovering the database after failures and restoring it from backup files, and maintaining database security.
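To make the add/modify/delete/query cycle concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a DBMS. The parts-inventory table and its columns are hypothetical, chosen to echo the parts inventory example above; the last step is a toy "report writer" that formats a query result as text.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS.
# The table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")

# Add, modify, and delete data through the query language.
conn.executemany("INSERT INTO parts (name, qty) VALUES (?, ?)",
                 [("bolt", 120), ("nut", 300), ("washer", 0)])
conn.execute("UPDATE parts SET qty = qty - 20 WHERE name = ?", ("bolt",))
conn.execute("DELETE FROM parts WHERE qty = 0")
conn.commit()

# A declarative query: the DBMS decides how to find the matching rows.
rows = conn.execute("SELECT name, qty FROM parts ORDER BY name").fetchall()

# A toy "report writer": format the result set as a fixed-width text report.
report = "\n".join(f"{name:<10}{qty:>5}" for name, qty in rows)
print(report)
```

The application never says how rows are located on disk; it only states what it wants, which is the simplification the query language provides.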
Database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.
Databases have been in use since the earliest days of electronic computing. Unlike modern systems, which can be applied to widely different databases and needs, the vast majority of older systems were tightly linked to the custom databases in order to gain speed at the expense of flexibility. Originally DBMSs were found only in large organizations with the computer hardware needed to support large data sets.
As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, the Integrated Data Store (IDS), founded the "Database Task Group" within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the "Codasyl approach", and soon a number of commercial products based on this approach were made available.
The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network. When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record, the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the program to walk the entire data set and collect the matching results one by one. There was, essentially, no concept of "find" or "search". This may sound like a serious limitation today, but in an era when most data was stored on magnetic tape, such operations were too expensive to contemplate anyway. Solutions were found to many of these problems. Prime Computer created a CODASYL-compliant DBMS based entirely on B-trees that circumvented the record-by-record problem by providing alternate access paths. They also added a very straightforward query language. Further, there is no reason that relational normalization concepts cannot be applied to CODASYL databases; however, in the final tally, CODASYL was very complex and required significant training and effort to produce useful applications.
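The cost of navigational access can be illustrated with a simplified in-memory model. This sketch is not CODASYL's actual data manipulation language; it assumes records linked by a single hypothetical `next` pointer, and shows that answering "find all the people in India" means visiting every record in the chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    name: str
    country: str
    next: Optional["Record"] = None  # pointer to the next record in the chain


def build_chain(people):
    """Link records into a chain and return the first record (the entry point)."""
    head = None
    for name, country in reversed(people):
        head = Record(name, country, head)
    return head


def find_people_in(head, country):
    """Navigational 'query': follow pointers one record at a time."""
    matches, visited = [], 0
    rec = head
    while rec is not None:
        visited += 1
        if rec.country == country:
            matches.append(rec.name)
        rec = rec.next
    return matches, visited


head = build_chain([("Asha", "India"), ("Bob", "USA"), ("Ravi", "India"), ("Eve", "UK")])
matches, visited = find_people_in(head, "India")
print(matches, visited)
```

Every record is visited regardless of how few match; alternate access paths such as the B-tree indexes Prime added avoid this full walk.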
IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award lecture was titled "The Programmer as Navigator". IMS is classified as a hierarchical database; IDMS and CINCOM's TOTAL database are classified as network databases.
Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers that outlined a new approach to database construction, culminating in the groundbreaking paper "A Relational Model of Data for Large Shared Data Banks".
In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables (or relations), with optional elements being moved out of the main table to where they would take up room only if needed.
For instance, a common use of a database system is to track information about users: their name, login information, various addresses and phone numbers. In the navigational approach, all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.
Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This simple "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
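The user example above can be sketched with sqlite3. The table names `users`, `addresses` and `phones` are hypothetical; the login name serves as the key, rows exist in the optional tables only when data was provided, and a single join statement re-links everything without an explicit loop in the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (login TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (login TEXT REFERENCES users(login), address TEXT);
    CREATE TABLE phones    (login TEXT REFERENCES users(login), number TEXT);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "Alice Smith"), ("bob", "Bob Jones")])
# Only alice provided an address and phone numbers; bob has no rows in the optional tables.
conn.execute("INSERT INTO addresses VALUES ('alice', '12 Main St')")
conn.executemany("INSERT INTO phones VALUES (?, ?)",
                 [("alice", "555-0100"), ("alice", "555-0199")])

# Re-link the data through the login key in one set-oriented statement.
# LEFT JOIN keeps users that have no optional rows (their columns come back as NULL/None).
rows = conn.execute("""
    SELECT u.login, u.name, a.address, p.number
    FROM users u
    LEFT JOIN addresses a ON a.login = u.login
    LEFT JOIN phones p    ON p.login = u.login
    ORDER BY u.login, p.number
""").fetchall()
for row in rows:
    print(row)
```

Note that alice's address appears once per phone number in the result: joining two independent optional tables multiplies their rows, which is why applications often query each optional table separately.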
Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
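As an illustration of the kind of set-oriented expression involved, the request "find the names of all users who have at least one phone number" can be written in tuple relational calculus roughly as follows. This assumes relations named User and Phone that both carry a login attribute; the names are hypothetical, modeled on the user and phone number tables in the example above.

$$\{\, t.\mathit{name} \mid t \in \mathit{User} \land \exists p\,(p \in \mathit{Phone} \land p.\mathit{login} = t.\mathit{login}) \,\}$$

The expression describes the whole result set at once; how the matching tuples are found is left to the system, which is the property later carried over into SQL.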
Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project and student programmers to produce code. Beginning in 1973, INGRES delivered its first test products which were generally ready for widespread use in 1979. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL. Over time, INGRES moved to the emerging SQL standard.
IBM itself did one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell wrote MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. Most other DBMS implementations usually called relational are actually SQL DBMSs.
In 1970, the University of Michigan began development of the MICRO Information Management System based on D.L. Childs' Set-Theoretic Data model. Micro was used to manage very large data sets by the US Department of Labor, the U.S. Environmental Protection Agency, and researchers from the University of Alberta, the University of Michigan, and Wayne State University. It ran on IBM mainframe computers using the Michigan Terminal System. The system remained in production until 1998.
IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. The first version was ready in 1974-75, and work then started on multi-table systems in which the data could be split so that all of the data for a record (some of which is optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).
Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work, but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots of the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a rebuilt version of Sybase, and thus of INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978.
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is often used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).
In Sweden, Codd's paper was also read, and Mimer SQL was developed from the mid-1970s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.
The 1980s, along with a rise in object-oriented programming, saw a change in how data in various databases was handled. Programmers and designers began to treat the data in their databases as objects. That is to say, if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows relations between data to be relations to objects and their attributes rather than to individual fields.
Another big change for databases in the 1980s was the focus on increasing reliability and access speeds. In 1989, two professors from the University of Wisconsin at Madison published an article at an ACM-associated conference outlining their methods for increasing database performance. The idea was to replicate specific important and frequently queried information and store it in a smaller temporary database that linked these key features back to the main database. This meant that a query could search the smaller database more quickly rather than search the entire dataset. This eventually led to the practice of indexing, which is used by almost every operating system, from Windows to the system that operates Apple iPod devices.
In the 21st century a new trend of NoSQL databases was started. Those non-relational databases are significantly different from the classic relational databases. They often do not require fixed table schemas, avoid join operations by storing denormalized data, and are designed to scale horizontally. Most of them can be classified as either key-value stores or document-oriented databases.
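Horizontal scaling in a key-value store usually means partitioning keys across machines. The sketch below is a toy, assuming a hypothetical `ShardedKV` class whose "nodes" are plain dictionaries; it routes each key by a stable hash and stores a denormalized document so that reading a user needs no join.

```python
import hashlib


class ShardedKV:
    """Toy key-value store partitioned across several 'nodes' by key hash."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Use a stable hash (Python's built-in hash() of str is salted per process)
        # so that a key always routes to the same node.
        digest = hashlib.sha256(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = ShardedKV(num_nodes=3)
# A denormalized document: addresses and phones are embedded, so no join is needed.
store.put("user:alice", {"name": "Alice Smith", "address": "12 Main St",
                         "phones": ["555-0100", "555-0199"]})
store.put("user:bob", {"name": "Bob Jones"})
print(store.get("user:alice")["phones"])
print([len(node) for node in store.nodes])
```

Simple modulo routing moves most keys when the number of nodes changes; production systems commonly use consistent hashing or range partitioning instead.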
In recent years there has been high demand for massively distributed databases with high partition tolerance, but according to the CAP theorem it is impossible for a distributed system to simultaneously provide consistency, availability and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason, many NoSQL databases use what is called eventual consistency to provide both availability and partition tolerance guarantees with a relaxed level of data consistency: replicas may temporarily disagree, but converge once updates stop arriving and have propagated.
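One common way to reach eventual consistency is last-writer-wins replication. This is a minimal sketch under stated assumptions: two hypothetical in-process replicas, caller-supplied logical timestamps, and a simple anti-entropy merge. Both replicas stay writable during a partition (availability) and converge after they exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (timestamp, value); on conflict the higher timestamp wins
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Anti-entropy: pull the other replica's entries and keep the newest."""
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)


a, b = Replica(), Replica()
# During a partition, each replica accepts writes independently.
a.write("seat_12A", "Alice", ts=1)
b.write("seat_12A", "Bob", ts=2)
print(a.read("seat_12A"), b.read("seat_12A"))  # the replicas disagree here
# After the partition heals, the replicas exchange state and converge.
a.sync_from(b)
b.sync_from(a)
print(a.read("seat_12A"), b.read("seat_12A"))
```

The price of convergence is visible: Alice's write is silently discarded. Equal timestamps would also need a tie-breaker (for example a replica identifier), which this sketch omits.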
A subset of NoSQL databases are XML databases. They all use the industry-standard XML data storage format. XML is an open, machine-readable, cross-platform data format widely used for interoperability among different IT systems.
All XML databases can be classified as document-oriented databases.
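Each record in an XML database is a self-describing document. The sketch below uses Python's `xml.etree.ElementTree`, which supports only a limited XPath subset, on a hypothetical library catalog; full XML databases are typically queried with XPath or XQuery instead.

```python
import xml.etree.ElementTree as ET

# A hypothetical library catalog stored as one XML document.
doc = """<library>
  <book id="b1"><title>Database Systems</title><year>1975</year></book>
  <book id="b2"><title>Data Banks</title><year>1970</year></book>
</library>"""

root = ET.fromstring(doc)

# Navigate child elements by tag name.
titles = [book.findtext("title") for book in root.findall("book")]

# Filter in Python on a child element's value.
before_1972 = [book.get("id") for book in root.findall("book")
               if int(book.findtext("year")) < 1972]

# ElementTree's XPath subset supports attribute predicates like [@id='...'].
found = root.find("book[@id='b2']").findtext("title")
print(titles, before_1972, found)
```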
A DBMS provides a modeling language, a data modeling language used to define the schema of each database hosted in the DBMS according to the DBMS's database model. Database management systems are typically built on one of five database structures to provide simple access to information stored in databases. The five database structures are the hierarchical model, the network model, the relational model, the multidimensional model, and the object model.
Inverted lists and other methods are also used. A given database management system may provide one or more of the five models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements, which include transaction rate (speed), reliability, maintainability, scalability, and cost.
The hierarchical structure was used in early mainframe DBMSs. Records’ relationships form a treelike model. This structure is simple but inflexible, because relationships are confined to one-to-many. IBM’s IMS and RDM Mobile are examples of hierarchical database systems with multiple hierarchies over the same data. RDM Mobile is a newly designed embedded database for mobile computer systems. Today the hierarchical structure is used primarily for storing geographic information and file systems.
A hierarchical database model is a data model in which the data is organized into a tree-like structure. The structure allows representing information using parent/child relationships: each parent can have many children, but each child has only one parent (also known as a 1-to-many relationship). All attributes of a specific record are listed under an entity type.
In a database an entity type is the equivalent of a table. Each individual record is represented as a row, and each attribute as a column. Entity types are related to each other using 1:N mappings, also known as one-to-many relationships. This model is recognized as the first database model created by IBM in the 1960s.
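The one-parent rule can be sketched as a tree of typed records. The `Segment` class below is hypothetical (the name borrows IMS's term for a record); attaching a child to a second parent is rejected, and access proceeds by navigating parent links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    entity_type: str                       # e.g. "Department" or "Employee"
    attrs: dict
    # repr=False avoids infinite recursion (parent -> children -> parent).
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, child: "Segment") -> "Segment":
        # Each child has exactly one parent: the 1:N rule of the hierarchical model.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent")
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(seg):
    """Navigate upward through parent links, then return the path root-first."""
    path = []
    while seg is not None:
        path.append(seg.attrs["name"])
        seg = seg.parent
    return list(reversed(path))


company = Segment("Company", {"name": "Acme"})
sales = company.add_child(Segment("Department", {"name": "Sales"}))
ann = sales.add_child(Segment("Employee", {"name": "Ann"}))
print(path_to_root(ann))
```

Relating Ann to a second department would require duplicating her record or adding a separate hierarchy, which is the inflexibility described above.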
The network structure supports more complex relationships. Unlike the hierarchical structure, it can relate a record to many other records and access them by following one of several paths. In other words, this structure allows for many-to-many relationships.
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
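A graph of records with named relationship arcs can be sketched as follows. `NetworkDB` is a hypothetical toy: it stores each relationship in both directions so that either record can serve as an entry point, giving many-to-many navigation. Real CODASYL systems express each set as one-to-many (owner and members) and model many-to-many through an intermediate link record type, which this sketch flattens away.

```python
from collections import defaultdict


class NetworkDB:
    """Toy network-model store: records are nodes, named relationship sets are arcs."""

    def __init__(self):
        self.links = defaultdict(set)  # (relationship, record) -> linked records

    def connect(self, rel, a, b):
        # Store both directions so navigation can start from either record.
        self.links[(rel, a)].add(b)
        self.links[(rel + "_inverse", b)].add(a)

    def follow(self, rel, record):
        # .get() does not create entries in a defaultdict.
        return sorted(self.links.get((rel, record), set()))


db = NetworkDB()
db.connect("enrolled_in", "Ann", "Databases")
db.connect("enrolled_in", "Ann", "Compilers")
db.connect("enrolled_in", "Raj", "Databases")
print(db.follow("enrolled_in", "Ann"))                # one student, many courses
print(db.follow("enrolled_in_inverse", "Databases"))  # one course, many students
```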
The relational structure is the most commonly used today. It is used by mainframe, midrange and microcomputer systems. It stores data in two-dimensional tables of rows and columns, and tables of records can be connected by common key values. E. F. Codd designed this structure while working for IBM; his paper describing it was published in 1970. The model is not easy for end users to run queries with, because a query may require a complex combination of many tables.
The multidimensional structure is similar to the relational model. Each cell of its cube-like model holds data relating to the elements along each of its dimensions. This structure gives a spreadsheet-like view of data. It is easy to maintain because records are stored as fundamental attributes, in the same way they are viewed, and the structure is easy to understand. Its high performance has made it the most popular database structure for online analytical processing (OLAP).
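A cube can be sketched as a mapping from one coordinate per dimension to a measure. The product/region/quarter dimensions and sales figures below are hypothetical; `roll_up` sums over every dimension not kept, a typical OLAP aggregation.

```python
from collections import defaultdict

DIMS = ("product", "region", "quarter")

# Each cell is addressed by one coordinate per dimension.
cube = {
    ("widget", "north", "Q1"): 100,
    ("widget", "south", "Q1"): 80,
    ("gadget", "north", "Q1"): 50,
    ("widget", "north", "Q2"): 120,
}


def roll_up(cube, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for coords, value in cube.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)


print(roll_up(cube, ["product"]))            # totals per product
print(roll_up(cube, ["region", "quarter"]))  # totals per region and quarter
```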
The object-oriented structure can handle graphics, pictures, voice and text data without difficulty, unlike the other database structures. This structure is popular for multimedia Web-based applications. It was designed to work with object-oriented programming languages such as Java.
The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.
Before the database management approach, organizations relied on file processing systems to organize, store, and process data files. End users criticized file processing because the data was stored in many different files, each organized in a different way, and each file was specialized for use with a specific application. File processing was bulky, costly and inflexible when it came to supplying needed data accurately and promptly. Data redundancy was an issue because the independent data files held duplicate data, so whenever data changed, each separate file had to be updated. Another issue was the lack of data integration: data was tied to the application that organized and stored it, so related data held in different files could not easily be combined. Lastly, file processing systems lacked consistency and standardization of the data, which made maintenance difficult. For these reasons, the database management approach was developed.
A DBMS uses data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device, which implies relatively slow access compared to volatile main memory.
A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to their privileges on the data. The DBMS also controls the security of the database: data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or to subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access only to work history and medical data.
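In SQL systems, subschemas are commonly expressed as views over a base table. This sketch uses sqlite3 with a hypothetical employee table and two views matching the payroll and HR example above. SQLite has no user accounts, so the step of granting each group access only to its view (for example with SQL `GRANT SELECT ON payroll_view TO ...` in a multi-user server DBMS) is assumed rather than shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT, salary INTEGER,
        work_history TEXT, medical_notes TEXT
    );
    INSERT INTO employee VALUES (1, 'Ann', 52000, 'Joined 2019', 'none');

    -- Subschema for payroll users: no work history or medical data.
    CREATE VIEW payroll_view AS SELECT id, name, salary FROM employee;

    -- Subschema for HR users: work history and medical data, no salary.
    CREATE VIEW hr_view AS SELECT id, name, work_history, medical_notes FROM employee;
""")


def columns(view):
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    return [d[0] for d in conn.execute(f"SELECT * FROM {view}").description]


payroll_cols = columns("payroll_view")
hr_cols = columns("hr_view")
print(payroll_cols)
print(hr_cols)
```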
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs is customized for each data entry and updating function. A data structure is a logical representation of the relationships between individual data elements.
A database transaction mechanism ideally guarantees ACID properties in order to ensure data integrity despite concurrent user access (concurrency control) and faults (fault tolerance). The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. It can also help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database. See ACID properties for more information.
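Atomicity and a key constraint can be seen working together in sqlite3. In this sketch (hypothetical `customer` and `account` tables), a transaction debits an account and then tries to insert a customer with an existing customer number; the constraint violation causes the whole transaction, including the debit, to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE account (customer_no INTEGER, balance INTEGER)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

try:
    # The connection's context manager commits on success and rolls back
    # if an exception escapes the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance - 40 WHERE customer_no = 1")
        # Violates the key constraint: customer number 1 already exists.
        conn.execute("INSERT INTO customer VALUES (1, 'Duplicate Ann')")
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

balance = conn.execute("SELECT balance FROM account WHERE customer_no = 1").fetchone()[0]
print(balance)  # the debit was undone together with the failed insert
```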
A database management system (DBMS) is a system that allows users to build and maintain databases, and to utilize their data and retrieve information from them. A DBMS defines the database type that it supports, as well as its functionality and operational capabilities. A DBMS provides the internal processes for external applications built on it. The end users of such an application are usually exposed only to that application and do not directly interact with the DBMS; they enjoy the effects of the underlying DBMS, but its internals are completely invisible to them. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications' databases, and thus need more knowledge and understanding of how DBMSs operate and of the DBMSs' external interfaces and tuning parameters.
A DBMS consists of software that operates databases, providing storage, access, security, backup and other facilities to meet needed requirements. DBMSs can be categorized according to the database model(s) they support, such as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the query language(s) that access the database, such as SQL or XQuery; and performance trade-offs, such as maximum scale or maximum speed. Some DBMSs cover more than one entry in these categories, e.g., supporting multiple query languages. Database software typically supports the Open Database Connectivity (ODBC) standard, which allows the database to integrate (to some extent) with other databases.
The development of a mature general-purpose DBMS typically takes several years and many man-years. Developers of DBMSs typically update their products to follow and take advantage of progress in computer and storage technologies. Several DBMS products have been in ongoing development since the 1970s and 1980s. Since DBMSs comprise a significant economic market, computer and storage vendors often take DBMS requirements into account in their own development plans.
DBMS architecture specifies its components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are major DBMS components:
Database storage is the container of the physical materialization of a database. It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the conceptual level and external level from the internal level when needed. It is not part of the DBMS but is rather manipulated by the DBMS (by its storage engine; see above) to manage the database that resides in it. Though it is typically accessed by a DBMS through the underlying operating system (often using the operating system's file systems as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and are therefore closely maintained by database administrators. A DBMS in operation always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in storage in structures that look completely different from the way the data look at the conceptual and external levels, but in ways that attempt to optimize, as far as possible, the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).
In principle, database storage can be viewed as a linear address space, where every bit of data has its own unique address. In practice, only a very small percentage of addresses is kept as initial reference points (which also requires storage), and most of the database data is accessed by indirection, using displacement calculations (distance in bits from the reference points) and data structures that define access paths (using pointers) to all needed data in an efficient manner, optimized for the needed data access operations.
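Displacement addressing can be sketched with fixed-length records packed into a byte array. The record layout (a 4-byte id plus a 12-byte name) and the 64-byte reference point are hypothetical, and displacements here are counted in bytes rather than bits; only the base address is kept, and each record's address is computed from it.

```python
import struct

# Hypothetical fixed-length layout: little-endian 4-byte int id + 12-byte name (16 bytes total).
RECORD = struct.Struct("<i12s")
PAGE_BASE = 64  # hypothetical reference point, e.g. just after a page header


def record_offset(base, slot, size=RECORD.size):
    """Address of a record = reference point + displacement (slot * record size)."""
    return base + slot * size


storage = bytearray(PAGE_BASE + 4 * RECORD.size)  # room for four records

for slot, (rid, name) in enumerate([(7, b"Ann"), (9, b"Raj"), (12, b"Li")]):
    # '12s' pads shorter byte strings with NUL bytes when packing.
    RECORD.pack_into(storage, record_offset(PAGE_BASE, slot), rid, name)

rid, raw = RECORD.unpack_from(storage, record_offset(PAGE_BASE, 2))
print(rid, raw.rstrip(b"\x00"))
```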
Data compression methods in many cases allow a string of bits to be represented by a shorter bit string ("compress") and the original string to be reconstructed ("decompress") when needed. This can substantially reduce storage use (often by tens of percent) for many types of data, at the cost of more computation to compress and decompress when needed. The trade-off between storage cost savings and the costs of the related computations, including possible delays in data availability, is analyzed before deciding whether to keep certain data in a database compressed.
Data compression is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
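The storage-versus-computation trade-off can be measured directly. This sketch compresses a hypothetical, highly repetitive column of status values with Python's `zlib` and reports the space saved and the time spent; real ratios and timings depend heavily on the data and hardware.

```python
import time
import zlib

# Hypothetical column data with many repeated values; such data compresses well.
column = ("ACTIVE," * 5000 + "CLOSED," * 5000).encode()

start = time.perf_counter()
packed = zlib.compress(column, level=6)
compress_s = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(packed)
decompress_s = time.perf_counter() - start

assert restored == column  # lossless: the original bits are reconstructed exactly
saving = 1 - len(packed) / len(column)
print(f"{len(column)} -> {len(packed)} bytes ({saving:.0%} saved), "
      f"compress {compress_s * 1e3:.2f} ms, decompress {decompress_s * 1e3:.2f} ms")
```

Random or already-compressed data (such as images) would show little or no saving for the same CPU cost, which is why the decision is made per data type.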
For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via unforeseen vulnerabilities in a DBMS, or more likely, by bypassing it).
Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
This collection of bits describes both the contained database data and its related metadata (i.e., data that describes the contained data and allows computer programs to manipulate the database data correctly). The size of a database can nowadays be tens of Terabytes, where a byte is eight bits. The physical materialization of a bit can employ various existing technologies, while new and improved technologies are constantly under development. Common examples are:
These two examples are respectively for two major storage types:
Sophisticated storage units, which can in fact be effective dedicated parallel computers supporting a large amount of nonvolatile storage, typically must also include components with volatile storage. Some such units employ batteries that can provide power for several hours in case of an external power interruption (e.g., see the EMC Symmetrix) and thus keep the content of the volatile storage parts intact. Just before such a device's batteries lose their power, the device typically backs up its volatile content (into nonvolatile storage) automatically and shuts off to protect its data.
Databases are usually too expensive (in terms of importance and of the investment in resources, e.g., time and money, needed to build them) to be lost in a power interruption. Thus at any point in time most of their content resides in nonvolatile storage. Even if, for operational reasons, very large portions of them reside in volatile storage (e.g., tens of gigabytes in volatile memory, for in-memory databases), most of this is backed up in nonvolatile storage. A relatively small portion, which temporarily may not have a nonvolatile backup, can be reconstructed by proper automatic database recovery procedures after volatile storage content loss.
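One such recovery procedure is replaying a write-ahead log. The sketch below is hypothetical and much simpler than production schemes (such as ARIES, which combine checkpoints with undo and redo logging): a Python list plays the nonvolatile log, a dict plays the volatile table, and recovery rebuilds the table by replaying the log from the start.

```python
import json


class TinyStore:
    """Volatile table plus an append-only redo log standing in for nonvolatile storage."""

    def __init__(self, log):
        self.log = log    # list of JSON strings; survives the simulated crash
        self.table = {}   # volatile state; lost on a crash

    def put(self, key, value):
        # Write-ahead rule: append the change to the log before applying it in memory.
        self.log.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.table[key] = value

    @classmethod
    def recover(cls, log):
        """Rebuild the volatile table by replaying every logged change in order."""
        store = cls(log)
        for line in log:
            entry = json.loads(line)
            if entry["op"] == "put":
                store.table[entry["key"]] = entry["value"]
        return store


durable_log = []
s = TinyStore(durable_log)
s.put("acct:1", 100)
s.put("acct:1", 60)
del s  # simulate losing volatile memory
recovered = TinyStore.recover(durable_log)
print(recovered.table)
```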
More examples of storage types:
Databases always use several types of storage when operational (and typically several when idle as well). Different types may differ significantly in their properties, and the optimal mix of storage types is determined by the types and quantities of operations that each storage type needs to perform, as well as considerations like physical space and energy consumption and dissipation (which may become critical for a large database). Storage types can be categorized by the following attributes:
While the malfunction of a group of bits may be resolved by error detection and correction mechanisms (see above), a storage device malfunction requires different solutions. The following solutions are commonly used and valid for most storage devices:
Device mirroring and typical RAID are designed to handle a single device failure in the RAID group of devices. However, if a second failure occurs before the RAID group is completely repaired from the first failure, then data can be lost. The probability of a single failure is typically small. Thus the probability of two failures in the same RAID group in close succession is much smaller (approximately the probability squared, i.e., multiplied by itself). If a database cannot tolerate even such a small probability of data loss, then the RAID group itself is replicated (mirrored). In many cases such mirroring is done at a geographically remote site, in a different storage array, to also handle recovery from disasters (see disaster recovery above).
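The "probability squared" estimate can be written out explicitly. This assumes device failures are independent and writes $p$ for the (hypothetical) probability that a failure occurs in the group during the repair window:

$$P(\text{second failure before repair}) \approx p \times p = p^{2}, \qquad \text{e.g. } p = 10^{-3} \;\Rightarrow\; p^{2} = 10^{-6}.$$

In practice failures are often correlated (drives from the same batch, a shared enclosure or power supply), so the real risk can be noticeably higher than $p^{2}$, which is one more reason to mirror the whole RAID group.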
Database bits are laid out in storage in data structures and groupings that can take advantage of both known effective algorithms to retrieve and manipulate them and the storage's own properties. Typically the storage itself is designed to meet the requirements of various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout methods.
A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers, most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) that are being processed or manipulated reside inside a processor, possibly in the processor's caches. These data are read from and written to memory, typically through a computer bus (so far typically volatile storage components). Computer memory exchanges data with external storage, typically through standard storage interfaces or networks (e.g., Fibre Channel, iSCSI). A storage array, a common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) DRAM, which is connected (again via standard interfaces) to drives, possibly with different speeds, like flash drives and magnetic disk drives (non-volatile). The drives may be connected to magnetic tapes, on which typically the least active parts of a large database, or database backup generations, may reside.
Storage speed and price are currently correlated (faster storage costs more), and faster storage is typically volatile.
A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated efficiently. The data manipulation may include data insertion, deletion, updating and retrieval in various modes. A certain data structure type may be very effective for certain operations and very ineffective for others. A data structure type is selected during DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In some DBMSs, database administrators have the flexibility to select among data structure options to contain user data for performance reasons. Sometimes the data structures have selectable parameters to tune database performance.
Databases may store data in many data structure types. Common examples are the following:
A typical DBMS does not store only the data of the application it serves. To handle the application data, the DBMS needs to store it in data structures that themselves contain data of their own. In addition, the DBMS needs its own data structures and many types of bookkeeping data, such as indexes and logs. This DBMS data is an integral part of the database and may comprise a substantial portion of it.
Indexing is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time and cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search, with each value paired with a reference to the location of its entry, analogous to the index at the back of a book. The same data can have multiple indexes (for example, an employee database could be indexed both by last name and by hire date).
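The simplest index described above can be sketched in a few lines: a sorted list of (value, row location) pairs searched with binary search. This is a minimal illustration under assumed names; the employee rows are hypothetical, and a row's position in the list stands in for "the location of the entry". Real DBMS indexes typically use disk-oriented structures such as B-trees instead of a flat list.

```python
import bisect

# Hypothetical employee table; the list position acts as the row id.
employees = [
    {"last_name": "Nguyen", "hire_date": "2015-03-01"},
    {"last_name": "Adams", "hire_date": "2019-07-15"},
    {"last_name": "Kim", "hire_date": "2012-11-30"},
    {"last_name": "Adams", "hire_date": "2021-01-04"},
]


def build_index(table, column):
    """Sorted list of (value, row_id) pairs, like the index at the back of a book."""
    return sorted((row[column], row_id) for row_id, row in enumerate(table))


def lookup(index, value):
    """Binary-search the index; return the row ids whose column equals value."""
    i = bisect.bisect_left(index, (value,))  # (value,) sorts before any (value, row_id)
    row_ids = []
    while i < len(index) and index[i][0] == value:
        row_ids.append(index[i][1])
        i += 1
    return row_ids


# The same data, indexed two different ways.
by_last_name = build_index(employees, "last_name")
by_hire_date = build_index(employees, "hire_date")
```

A call such as `lookup(by_last_name, "Adams")` jumps straight to the matching entries instead of examining all four rows.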
Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves.
Given a particular query, the DBMS's query optimizer is responsible for devising the most efficient strategy for finding matching data.
Indexes can speed up data access, but they consume space in the database and must be updated each time the indexed data is altered, which slows data maintenance. These two properties determine whether a given index is worth its cost.
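The maintenance side of that trade-off can be made concrete with a toy table that keeps one sorted index per indexed column: every insert into the table also has to update every index, and each index holds one extra entry per row. `IndexedTable` and its `index_writes` counter are hypothetical names used only for this sketch.

```python
import bisect


class IndexedTable:
    """A toy table keeping one sorted (value, row_id) index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # extra work caused purely by the indexes

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        # Every index must be updated on every insert: this is the maintenance cost.
        for col, index in self.indexes.items():
            bisect.insort(index, (row[col], row_id))
            self.index_writes += 1
        return row_id
```

With two indexed columns, each inserted row causes two index updates on top of the row write itself, so adding an index only pays off if the queries it speeds up matter more than this extra write and storage cost.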
In many cases, substantial performance improvement is gained if different types of database objects that are usually used together are placed near each other in storage, i.e., clustered. This usually allows related objects to be retrieved from storage with a minimum number of input operations (each of which can be substantially time-consuming). Even for in-memory databases, clustering provides a performance advantage, because memory access also commonly goes through large caches, with similar resulting behavior.
For example, it may be beneficial to cluster the record of an item in stock with all its respective order records. The decision whether to cluster certain objects depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc.
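The item-and-orders example can be sketched by counting how many storage pages must be read to fetch one item together with its orders, under two layouts. The page size of four records and the record names are assumptions chosen for illustration; the point is only that placing related records side by side reduces the number of distinct pages touched.

```python
PAGE_SIZE = 4  # hypothetical number of records per storage page


def pages_touched(layout, wanted):
    """Number of distinct pages that must be read to fetch the wanted records."""
    return len({pos // PAGE_SIZE for pos, rec in enumerate(layout) if rec in wanted})


items = ["item:A", "item:B"]
orders = {
    "item:A": ["order:A1", "order:A2", "order:A3"],
    "item:B": ["order:B1", "order:B2", "order:B3"],
}

# Unclustered: item records first, then order records in arrival order.
unclustered = items + ["order:A1", "order:B1", "order:A2", "order:B2", "order:A3", "order:B3"]

# Clustered: each item record is stored next to its own order records.
clustered = [rec for item in items for rec in [item] + orders[item]]

wanted = {"item:A", *orders["item:A"]}
```

Here fetching item A and its three orders touches two pages in the unclustered layout but only one page in the clustered layout; with more orders per item and realistic page sizes the gap typically grows.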
Storage redundancy is often employed to increase performance. A common example is storing materialized views, which are frequently needed external views or query results. Storing such views saves the expensive computation of them each time they are needed.
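A materialized view is redundant data kept consistent with its base data. The sketch below, with hypothetical class and method names, stores a per-customer order total and refreshes it incrementally on each insert, so reading a total no longer requires scanning all orders. Incremental refresh is only one possible strategy; some systems instead recompute the view periodically or on demand.

```python
class OrdersWithTotals:
    """Base table of orders plus a stored ("materialized") per-customer total."""

    def __init__(self):
        self.orders = []       # base data: (customer, amount)
        self.totals_view = {}  # redundant copy of SUM(amount) GROUP BY customer

    def add_order(self, customer, amount):
        self.orders.append((customer, amount))
        # Keep the stored view consistent incrementally instead of recomputing it.
        self.totals_view[customer] = self.totals_view.get(customer, 0) + amount

    def total_recomputed(self, customer):
        # What a non-materialized view does on every request: scan the base data.
        return sum(a for c, a in self.orders if c == customer)

    def total_from_view(self, customer):
        # Reading the materialized view: a single lookup.
        return self.totals_view.get(customer, 0)
```

Both methods return the same totals; the materialized version trades extra storage and extra work per insert for cheap reads.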
Sometimes a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability, both to improve the performance of simultaneous accesses by multiple end users to the same database object and to provide resiliency in case of a partial failure of a distributed database. Updates of a replicated object need to be synchronized across the object's copies. In many cases the entire database is replicated.
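The need to synchronize updates can be shown with a toy replicated object that applies each write to every available copy before returning (synchronous, or "eager", replication, chosen here for simplicity). `ReplicatedObject` and its methods are hypothetical. The sketch also shows why recovery needs care: a copy that was unavailable during a write is left stale and would have to be resynchronized before serving reads again.

```python
class ReplicatedObject:
    """A value kept in several copies, e.g., one per site (a toy sketch)."""

    def __init__(self, value, n_copies):
        self.copies = [value] * n_copies
        self.available = set(range(n_copies))

    def write(self, value):
        # Synchronous replication: every reachable copy is updated before the write completes.
        for i in self.available:
            self.copies[i] = value

    def read(self):
        # Any available copy can serve a read; here, the lowest-numbered one.
        if not self.available:
            raise RuntimeError("no copy available")
        return self.copies[min(self.available)]

    def fail(self, i):
        # Simulate a partial failure: copy i stops receiving reads and writes.
        # A real system would have to resynchronize it before bringing it back.
        self.available.discard(i)
```

After one copy fails, reads are still served by the remaining copies, which is the resiliency benefit; the failed copy keeps its old value, which is the synchronization problem.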
As with every software system, a DBMS that operates in a faulty computing environment is prone to failures of many kinds, and a failure can corrupt the respective database unless special measures are taken to prevent this. A DBMS achieves certain levels of fault tolerance by encapsulating operations within transactions. The concept of a database transaction (or atomic transaction) evolved to enable both well-understood database system behavior in a faulty environment, where crashes can happen at any time, and recovery from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock), an abstraction supported in databases and in other systems as well. Each transaction has well-defined boundaries in terms of which program or code executions are included in it, determined by the transaction's programmer via special transaction commands.
Every database transaction obeys the following rules:
Isolation provides the ability for multiple users to operate on the database at the same time without corrupting the data.
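Transaction boundaries and all-or-nothing behavior can be demonstrated with SQLite through Python's standard `sqlite3` module. In this sketch (table and function names are hypothetical) a transfer is two updates inside one transaction; when the second update violates a `CHECK` constraint, the connection's context manager rolls back the whole transaction, so the first update is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer of 30 from alice to bob succeeds; a later transfer of 500 fails and leaves both balances exactly as they were, rather than crediting bob without debiting alice.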
A query is a request for information from a database. It can be as simple as "find the address of the person with SS# 123-45-6789," or more complex, like "find the average salary of all employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing the relevant database data and manipulating it in a way that yields the requested information. Since database structures are in most cases complex, the data needed for a query, especially a non-trivial one, can be collected by accessing the database in different ways, through different data structures, and in different orders. Each way typically requires a different processing time, and processing times for the same query may vary widely, from a fraction of a second to hours, depending on the way selected. The purpose of query optimization, an automated process, is to find the way to process a given query in minimum time. This large possible variance justifies performing query optimization, though finding the exact optimal way to execute a query among all possibilities is typically very complex and time-consuming in itself, may be too costly, and is often practically impossible. Query optimization therefore typically approximates the optimum by comparing several common-sense alternatives, producing in reasonable time a "good enough" plan that typically does not deviate much from the best possible result.
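The "compare a few common-sense alternatives" approach can be sketched as a toy cost-based choice between a full table scan and an index scan for a single filter. The cost model is a hypothetical assumption made only for illustration (a random row access via an index is taken to cost four times a sequential row read, plus a small fixed probe cost); real optimizers use far richer statistics and models.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    estimated_cost: float


def candidate_plans(table_rows, selectivity, has_index):
    """Enumerate a few common-sense plans for a single-table filter (toy cost model)."""
    # Full scan: read every row once, sequentially.
    plans = [Plan("full table scan", float(table_rows))]
    if has_index:
        matches = table_rows * selectivity
        # Index scan: a few index probes, then one random access per matching row,
        # assumed here to cost 4x a sequential row read.
        plans.append(Plan("index scan", 3 + 4 * matches))
    return plans


def choose_plan(table_rows, selectivity, has_index):
    # Pick the cheapest candidate: "good enough", not a search over every possible plan.
    return min(candidate_plans(table_rows, selectivity, has_index),
               key=lambda p: p.estimated_cost)
```

Under this model a highly selective filter on a million-row table favors the index, while a filter matching half the rows favors the full scan, mirroring how the best plan for the same query shape depends on the data.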
A DBMS typically aims to provide a convenient environment for developing, and later maintaining, an application built around its respective database type. A DBMS either provides such tools or allows integration with external ones. Examples of such tools relate to database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and its database may span computers, networks, and storage units) and the related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.
A DBMS provides the ability for many different users to share data and processing resources. Because there can be many different users, there are many different database needs. The question is: how can a single, unified database meet the varying requirements of so many users?
A DBMS minimizes these problems by providing three views of the database data: an external view (or user view), a logical view (or conceptual view), and a physical (or internal) view. The user's view of a database represents data in a format that is meaningful to a user and to the software programs that process those data.
One strength of a DBMS is that while there is typically only one conceptual (or logical) view and one physical (or internal) view of the data, there can be any number of different external views. This feature allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. Thus the external views reflect the way users view the data, while the physical view refers to the way the data are physically stored and processed.
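In SQL systems, external views are commonly defined with `CREATE VIEW`. The sketch below uses Python's standard `sqlite3` module; the `emp` table, its technical column names, and the two views are hypothetical. One underlying table serves two user groups, each seeing the data in its own terms, and neither view requires changing how the data is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One stored table with technical column names and units (cents).
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, lname TEXT, dept_cd TEXT, sal_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO emp (lname, dept_cd, sal_cents) VALUES (?, ?, ?)",
    [("Nguyen", "ENG", 9_500_000), ("Adams", "HR", 6_000_000)],
)

# Two external views of the same data, tailored to different groups of users.
conn.execute(
    "CREATE VIEW staff_directory AS "
    "SELECT lname AS last_name, dept_cd AS department FROM emp"
)
conn.execute(
    "CREATE VIEW payroll AS "
    "SELECT lname AS last_name, sal_cents / 100.0 AS salary FROM emp"
)

directory = conn.execute("SELECT * FROM staff_directory ORDER BY last_name").fetchall()
payroll = conn.execute("SELECT * FROM payroll ORDER BY last_name").fetchall()
```

The directory users never see salaries, and the payroll users see salaries in currency units rather than the stored cents.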
Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.
A database management system is a system in which related data is stored in an efficient or compact manner. "Efficient" means that the data stored in the DBMS can be accessed quickly, and "compact" means that the data takes up as little space as possible in the computer's memory. The phrase "related data" means that the stored data pertains to a particular topic.
Specialized databases have existed for scientific, imaging, document storage, and similar uses. Functionality drawn from such applications has begun appearing in mainstream DBMSs as well. However, the main focus, at least in the commercial data processing market, is still on descriptive attributes of repetitive record structures.
Thus, today's DBMSs roll together frequently needed services and features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:
Metadata is data describing data. For example, a listing that describes which attributes are allowed in data sets, and the data type and size of each attribute, is called "meta-information".
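Most DBMSs let programs query such metadata directly. As a small example, SQLite's `PRAGMA table_info` returns one row per column of a table, describing its name, declared type, and constraints. The `part` table below is hypothetical; the sketch uses Python's standard `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part (part_no INTEGER PRIMARY KEY, descr VARCHAR(40) NOT NULL, qty INTEGER)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key position)
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk in conn.execute("PRAGMA table_info(part)")
]
```

The resulting list describes the table rather than its contents: which attributes exist, their declared types and sizes (such as `VARCHAR(40)`), and whether they may be left empty.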
An example of an advanced DBMS is the distributed database management system (DDBMS), which manages a distributed database: a collection of data that logically belongs to the same system but is spread out over the sites of a computer network. The two aspects of a distributed database are distribution and logical correlation:
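The two aspects of a distributed database can be illustrated with a toy table whose rows are spread across several sites while callers still use it as one logical table. This sketch assumes hash partitioning by key, which is only one of several possible distribution schemes; the site names and `DistributedTable` are hypothetical. `zlib.crc32` is used because Python's built-in `hash()` for strings varies between processes, which would scatter the same key to different sites on different runs.

```python
import zlib

SITES = ["site-a", "site-b", "site-c"]  # hypothetical network sites


def site_for(key):
    """Pick the site that stores a record, using a stable hash of its key."""
    return SITES[zlib.crc32(key.encode()) % len(SITES)]


class DistributedTable:
    """One logical table whose rows are physically spread across several sites."""

    def __init__(self):
        self.storage = {site: {} for site in SITES}  # each site's local fragment

    def put(self, key, row):
        self.storage[site_for(key)][key] = row

    def get(self, key):
        # Callers see a single table; the request is routed to the right site.
        return self.storage[site_for(key)].get(key)

    def scan(self):
        # A query over the whole logical table must combine every site's fragment.
        return sorted(item for fragment in self.storage.values() for item in fragment.items())
```

Distribution shows up in how rows land in different fragments; logical correlation shows up in `get` and `scan`, which present those fragments as a single table.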