Shapefile

From Wikipedia, the free encyclopedia - View original article

Shapefile
Filename extension: .shp, .shx, .dbf
Developed by: Esri
Type of format: GIS
Standard(s): Shapefile Technical Description
 

The Esri shapefile, or simply a shapefile, is a popular geospatial vector data format for geographic information system software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products.[1] Shapefiles spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.

Overview

A shapefile is a digital vector storage format for storing geometric location and associated attribute information. This format lacks the capacity to store topological information. The shapefile format was introduced with ArcView GIS version 2 in the early 1990s. It is now possible to read and write shapefiles using a variety of free and paid programs.

Shapefiles are simple because they store only the primitive geometric data types: points, lines, and polygons. These shapes are of limited use without attributes that specify what they represent, so a table of records stores the properties (attributes) of each primitive shape in the shapefile. Combined with their attributes, shapes (points, lines, polygons) can represent geographic data in many different ways, and these representations in turn support powerful and accurate computations.

While the term "shapefile" is quite common, a "shapefile" is actually a set of several files. Three individual files are mandatory to store the core data that comprise a shapefile: .shp, .shx, and .dbf. Strictly speaking, the term refers to the .shp file, but that file alone is incomplete for distribution because the other supporting files are required.

There are further optional files which store primarily index data to improve performance. Each individual file should conform to the DOS 8.3 filename convention (8 character filename prefix, period, 3 character filename suffix such as "shp") in order to be compatible with past applications that handle shapefiles, though many recent software applications accept files with longer names. For this same reason, all files should be located in the same folder.
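The naming and co-location advice above can be checked mechanically. The following is a minimal sketch, not a definitive validator: the function name `check_shapefile_set` is hypothetical, and the 8.3 pattern assumes a simplified character set (letters, digits, underscore, hyphen) and a prefix of one to eight characters.

```python
import re
from pathlib import Path

# The three mandatory component files named in the text.
MANDATORY_SUFFIXES = (".shp", ".shx", ".dbf")

# Assumed, simplified 8.3 pattern: 1-8 characters, a period, 3 characters.
DOS_8_3 = re.compile(r"^[A-Za-z0-9_\-]{1,8}\.[A-Za-z0-9]{3}$")


def check_shapefile_set(shp_path):
    """Return a list of human-readable problems for the given .shp path."""
    shp = Path(shp_path)
    problems = []
    for suffix in MANDATORY_SUFFIXES:
        # All component files are expected in the same folder, same prefix.
        sibling = shp.with_suffix(suffix)
        if not sibling.exists():
            problems.append(f"missing {sibling.name}")
        if not DOS_8_3.match(sibling.name):
            problems.append(f"{sibling.name} is not an 8.3 filename")
    return problems
```

An empty result means the three mandatory files sit next to each other and have 8.3-compatible names; it says nothing about whether their contents are valid.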

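Mandatory files:

.shp: shape format; the feature geometry itself
.shx: shape index format; a positional index of the feature geometry that allows seeking forwards and backwards
.dbf: attribute format; the attributes for each shape, in dBase format

Optional files include:

.prj: projection format; the coordinate system of the geometric data
.sbn: a binary spatial index of the features, used only by Esri software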

In each of the .shp, .shx, and .dbf files, the shapes in each file correspond to each other in sequence (i.e., the first record in the .shp file corresponds to the first record in the .shx and .dbf files, etc.). The .shp and .shx files have various fields with different endianness, so an implementor of the file formats must be very careful to respect the endianness of each field and treat it properly.

Shapefiles express coordinates in terms of X and Y, although these values often hold longitude and latitude.

Shapefile shape format (.shp)

The main file (.shp) contains the primary geographic reference data in the shapefile. The file consists of a single fixed length header followed by one or more variable length records. Each of the variable length records includes a record header component and a record contents component. A detailed description of the file format is given in the Esri Shapefile Technical Description.[1] This format should not be confused with the AutoCAD shape font source format, which shares the .shp extension.

The main file header is fixed at 100 bytes in length and contains 17 fields; nine 4-byte (32-bit signed integer or int32) integer fields followed by eight 8-byte (double) signed floating point fields:

Bytes   Type    Endianness  Usage
0–3     int32   big         File code (always hex value 0x0000270a)
4–23    int32   big         Unused; five int32 values
24–27   int32   big         File length (in 16-bit words, including the header)
28–31   int32   little      Version
32–35   int32   little      Shape type (see reference below)
36–67   double  little      Minimum bounding rectangle (MBR) of all shapes contained within the shapefile; four doubles in the following order: min X, min Y, max X, max Y
68–83   double  little      Range of Z; two doubles in the following order: min Z, max Z
84–99   double  little      Range of M; two doubles in the following order: min M, max M
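The mixed endianness of the header is easiest to see in code. The sketch below reads the 100-byte header with Python's standard `struct` module; the names `ShpHeader` and `parse_shp_header` are illustrative only and do not come from the specification or any library.

```python
import struct
from collections import namedtuple

# Illustrative container for the header fields described in the table.
ShpHeader = namedtuple(
    "ShpHeader",
    "file_code file_length_bytes version shape_type bbox z_range m_range",
)


def parse_shp_header(data):
    """Parse the fixed 100-byte header of a .shp (or .shx) file."""
    if len(data) < 100:
        raise ValueError("header must be at least 100 bytes")
    # Bytes 0-27: seven big-endian int32 values
    # (file code, five unused fields, file length in 16-bit words).
    file_code, *_unused, length_words = struct.unpack(">7i", data[:28])
    if file_code != 0x0000270A:
        raise ValueError("unexpected file code")
    # Bytes 28-35: two little-endian int32 values (version, shape type).
    version, shape_type = struct.unpack("<2i", data[28:36])
    # Bytes 36-99: eight little-endian doubles (MBR, Z range, M range).
    xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax = struct.unpack(
        "<8d", data[36:100]
    )
    return ShpHeader(
        file_code,
        length_words * 2,  # convert 16-bit words to bytes
        version,
        shape_type,
        (xmin, ymin, xmax, ymax),
        (zmin, zmax),
        (mmin, mmax),
    )
```

The `>` and `<` prefixes in the format strings select big-endian and little-endian decoding respectively, which is all that is needed to respect the per-field byte order.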

The file then contains any number of variable-length records. Each record is prefixed with a record-header of 8 bytes:

Bytes   Type    Endianness  Usage
0–3     int32   big         Record number (1-based)
4–7     int32   big         Record length (in 16-bit words)

Following the record header is the actual record:

Bytes   Type    Endianness  Usage
0–3     int32   little      Shape type (see reference below)
4–      -       -           Shape content
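Walking the records then amounts to alternating between the big-endian record header and the little-endian record contents. The following sketch assumes that the record length counts only the record contents (not the 8-byte record header), which is how the Esri technical description defines it; the function name `iter_shp_records` is hypothetical.

```python
import struct


def iter_shp_records(data):
    """Yield (record_number, shape_type, content) for each record in .shp bytes."""
    offset = 100  # records start right after the fixed-length file header
    while offset + 8 <= len(data):
        # Record header: record number and content length, big-endian int32.
        record_number, length_words = struct.unpack(">2i", data[offset:offset + 8])
        start = offset + 8
        end = start + length_words * 2  # length is counted in 16-bit words
        content = data[start:end]
        if len(content) < 4 or len(content) != length_words * 2:
            raise ValueError(f"record {record_number} is truncated or malformed")
        # Record contents begin with a little-endian int32 shape type.
        (shape_type,) = struct.unpack("<i", content[:4])
        yield record_number, shape_type, content
        offset = end
```

Yielding the raw contents keeps the sketch independent of any particular shape type; decoding them is a separate step that depends on the shape type value.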

The variable length record contents depend on the shape type. The following are the possible shape types:

Value  Shape type   Fields
0      Null shape   None
1      Point        X, Y
3      Polyline     MBR, Number of parts, Number of points, Parts, Points
5      Polygon      MBR, Number of parts, Number of points, Parts, Points
8      MultiPoint   MBR, Number of points, Points
11     PointZ       X, Y, Z, M
13     PolylineZ    Mandatory: MBR, Number of parts, Number of points, Parts, Points, Z range, Z array. Optional: M range, M array
15     PolygonZ     Mandatory: MBR, Number of parts, Number of points, Parts, Points, Z range, Z array. Optional: M range, M array
18     MultiPointZ  Mandatory: MBR, Number of points, Points, Z range, Z array. Optional: M range, M array
21     PointM       X, Y, M
23     PolylineM    Mandatory: MBR, Number of parts, Number of points, Parts, Points. Optional: M range, M array
25     PolygonM     Mandatory: MBR, Number of parts, Number of points, Parts, Points. Optional: M range, M array
28     MultiPointM  Mandatory: MBR, Number of points, Points. Optional: M range, M array
31     MultiPatch   Mandatory: MBR, Number of parts, Number of points, Parts, Part types, Points, Z range, Z array. Optional: M range, M array

The "Z" types are three-dimensional. The "M" types contain a user-defined measurement which coincides with the point being referenced. Three-dimensional shapefiles are rather uncommon, and the measurement functionality has been largely superseded by more robust databases used in conjunction with the shapefile data.
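As an illustration of how the field lists in the shape type table translate into bytes, the sketch below decodes the contents of a two-dimensional Polyline (3) or Polygon (5) record. It assumes, following the Esri technical description, that every field in the record contents is little-endian, that Parts holds int32 indices into the point array marking where each part starts, and that each point is a pair of doubles. The function name `parse_poly_content` is hypothetical.

```python
import struct


def parse_poly_content(content):
    """Decode Polyline/Polygon record contents into (shape_type, mbr, parts)."""
    (shape_type,) = struct.unpack("<i", content[:4])
    if shape_type not in (3, 5):
        raise ValueError("expected a Polyline (3) or Polygon (5) record")
    # MBR: min X, min Y, max X, max Y.
    mbr = struct.unpack("<4d", content[4:36])
    num_parts, num_points = struct.unpack("<2i", content[36:44])
    # Parts: index of the first point of each part.
    parts_end = 44 + 4 * num_parts
    part_starts = struct.unpack(f"<{num_parts}i", content[44:parts_end])
    # Points: num_points pairs of (X, Y) doubles.
    flat = struct.unpack(
        f"<{2 * num_points}d", content[parts_end:parts_end + 16 * num_points]
    )
    points = list(zip(flat[0::2], flat[1::2]))
    # Each part runs from its start index up to the next part's start index.
    bounds = list(part_starts) + [num_points]
    parts = [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
    return shape_type, mbr, parts
```

The Z and M variants append further arrays after the points, so a decoder for them would extend this layout rather than replace it.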

Shapefile shape index format (.shx)

The shapefile index contains the same 100-byte header as the .shp file, followed by any number of 8-byte fixed-length records which consist of the following two fields:

Bytes   Type    Endianness  Usage
0–3     int32   big         Record offset (in 16-bit words)
4–7     int32   big         Record length (in 16-bit words)

Using this index, it is possible to seek backwards in the shapefile by, first, seeking backwards in the shape index (which is possible because it uses fixed-length records), then reading the record offset, and using that offset to seek to the correct position in the .shp file. It is also possible to seek forwards an arbitrary number of records using the same method.
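The random access described above can be sketched as follows. The code assumes that the record offset is measured from the start of the .shp file to the record header, and that the record length excludes the 8-byte record header, as the Esri technical description states; the function name `read_record_via_index` is hypothetical.

```python
import struct


def read_record_via_index(shp_data, shx_data, index):
    """Return the raw bytes (record header plus contents) of the record at a
    zero-based position, located through the .shx index."""
    # Skip the 100-byte header; index entries are fixed at 8 bytes each.
    entry = 100 + 8 * index
    if index < 0 or entry + 8 > len(shx_data):
        raise IndexError("record index out of range")
    # Both fields are big-endian int32 values counted in 16-bit words.
    offset_words, length_words = struct.unpack(">2i", shx_data[entry:entry + 8])
    start = offset_words * 2
    end = start + 8 + length_words * 2  # 8-byte record header plus contents
    return shp_data[start:end]
```

Because every index entry has the same size, locating the entry is a single multiplication, which is what makes seeking in either direction cheap.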

Shapefile attribute format (.dbf)

Attributes for each shape are stored in dBase format. An alternative format that can also be used is the xBase format, which has an open specification, and is used in open source shapefile libraries, such as the Shapefile C library.[2]

Shapefile projection format (.prj)

The information contained in the .prj file specifies the geographic coordinate system of the geometric data in the .shp file. Although optional, it is usually provided, as it is not necessarily possible to guess the coordinate system of any given features. The file is created in well-known text (WKT) format when generated with ArcGIS Desktop versions 9 and later. Previous ArcGIS versions and some third-party software generate it in another format, shown here:

Older projection file format example:

Projection UTM
Zunits NO
Units METERS
Spheroid CLARKE1866
Xshift 0.0000000000
Yshift -4000000.0000000000
Parameters
-108 0 0.000 /* longitude
36 0 0.000 /* latitude

New WKT format example:

GEOGCS["GCS_North_American_1927",DATUM["D_North_American_1927",SPHEROID["Clarke_1866",6378206.4,294.9786982],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]]]

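As the examples show, the information contained in the .prj file specifies the name of the geographic coordinate system or map projection, the datum, the spheroid, the prime meridian, the units used, and any parameters needed to define the projection.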
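Software that reads .prj files may need to tell the two formats apart. The sketch below is only a heuristic built from the two examples in this section: it treats text starting with `GEOGCS[` or `PROJCS[` as WKT, and otherwise assumes the older format consists of "keyword value" lines with `/*` starting a comment. That grammar is inferred from the single older example and is an assumption, as is the function name `describe_prj`.

```python
import re

# WKT coordinate system definitions start with a keyword and a quoted name.
WKT_HEAD = re.compile(r'^\s*(GEOGCS|PROJCS)\["([^"]*)"')


def describe_prj(text):
    """Classify .prj text as 'wkt' or 'legacy' and extract basic fields."""
    match = WKT_HEAD.match(text)
    if match:
        return {"format": "wkt", "kind": match.group(1), "name": match.group(2)}
    fields = {}
    for line in text.splitlines():
        # Drop trailing comments, then keep simple "keyword value" pairs only.
        tokens = line.split("/*")[0].split()
        if len(tokens) == 2 and tokens[0][0].isalpha():
            fields[tokens[0].lower()] = tokens[1]
    return {"format": "legacy", "fields": fields}
```

The numeric lines under "Parameters" are deliberately skipped here, since their meaning depends on the projection and cannot be inferred from one example.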

Shapefile spatial index format (.sbn)

This is a binary spatial index file, which is used only by Esri software. The format is not documented by Esri; however, it has been reverse-engineered and documented[3] by the open source community. Other vendors do not currently implement it. The .sbn file is not strictly necessary, since the .shp file contains all of the information needed to parse the spatial data.

Limitations

Topology and shapefiles

Shapefiles do not have the ability to store topological information. ArcInfo coverages and personal/file/enterprise geodatabases do have the ability to store feature topology.

Spatial representation

The edges of a polyline or polygon are composed of points. The spacing of the points implicitly determines the scale at which the feature is useful visually. Exceeding that scale results in jagged representation. Additional points would be required to achieve smooth shapes at greater scales. For features better represented by smooth curves, the polygon representation requires much more data storage than, for example, splines, which can capture smoothly varying shapes efficiently. None of the shapefile types supports splines.

Data storage

Neither the .shp nor the .dbf component file can exceed 2 GB (2^31 bytes), which allows around 70 million point features at best.[4] The maximum number of features for other geometry types varies depending on the number of vertices used.
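The order of magnitude of that figure can be checked with a back-of-the-envelope calculation from the .shp layout alone: a 100-byte file header, then for each point an 8-byte record header, a 4-byte shape type, and two 8-byte coordinates. This is only a rough upper bound under those assumptions; it ignores the .dbf file, whose size depends on the attribute schema, and the function name is hypothetical.

```python
MAX_FILE_BYTES = 2 ** 31       # the 2 GB limit
FILE_HEADER_BYTES = 100        # fixed-length .shp header
RECORD_HEADER_BYTES = 8        # record number + record length
POINT_CONTENT_BYTES = 4 + 8 + 8  # shape type + X + Y


def max_point_records(max_file_bytes=MAX_FILE_BYTES):
    """Upper bound on the number of Point records that fit in one .shp file."""
    per_record = RECORD_HEADER_BYTES + POINT_CONTENT_BYTES  # 28 bytes
    return (max_file_bytes - FILE_HEADER_BYTES) // per_record


print(max_point_records())  # 76695841
```

The result, roughly 77 million, is in line with the "around 70 million" quoted above.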

The attribute database format for the .dbf component file is based on an older dBase standard. This database format inherently has a number of limitations:[4]

Mixing shape types

Because the shape type precedes each record, a shapefile is physically capable of storing a mixture of different shape types. However, the specification states, "All the non-Null shapes in a shapefile are required to be of the same shape type." The ability to mix shape types is therefore limited to interspersing null shapes with the single shape type declared in the file's header. A shapefile must not contain both polyline and polygon data, for example, and the descriptions of a well (point), a river (polyline), and a lake (polygon) would be stored in three separate files.
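A reader can enforce this rule with a simple check once the header's shape type and the per-record shape types are known. The sketch below is a minimal illustration; the function name `find_mixed_shape_types` is hypothetical, and the inputs are assumed to have been extracted from the file already.

```python
NULL_SHAPE = 0  # shape type value of the Null shape


def find_mixed_shape_types(header_shape_type, record_shape_types):
    """Return the 1-based numbers of records that break the single-type rule.

    A record is acceptable if it is a Null shape or matches the shape type
    declared in the file header.
    """
    return [
        number
        for number, shape_type in enumerate(record_shape_types, start=1)
        if shape_type not in (NULL_SHAPE, header_shape_type)
    ]
```

For example, a file whose header declares Polygon (5) and whose records have types 5, 0, 3, 5 would have record 3 flagged, because a Polyline (3) may not appear alongside polygons.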

References

  1. Esri (July 1998). Esri Shapefile Technical Description. Retrieved 2007-07-04.
  2. "Shapefile C Library V1.2".
  3. http://pyshp.googlecode.com/files/sbn_format.pdf
  4. "ArcGIS Desktop 9.3 Help – Geoprocessing considerations for shapefile output". Esri. April 24, 2009.