Techniques used in GIS :: Ontario College of Technology, Toronto

Data creation
Modern GIS technologies use digital information, for which various data creation methods are used. The most common method of data creation is digitization, where a hard copy map or survey plan is transferred into a digital medium through the use of a computer-aided design (CAD) program with geo-referencing capabilities. With the wide availability of ortho-rectified imagery (from both satellite and aerial sources), heads-up digitizing is becoming the main avenue through which geographic data is extracted. Heads-up digitizing involves tracing geographic data directly on top of the aerial imagery, instead of the traditional method of tracing the geographic form on a separate digitizing tablet.

Relating information from different sources

If you could relate information about the rainfall of your state to aerial photographs of your county, you might be able to tell which wetlands dry up at certain times of the year. A GIS, which can use information from many different sources in many different forms, can help with such analyses. The primary requirement for the source data is knowing the locations of the variables. Location may be annotated by x, y, and z coordinates of longitude, latitude, and elevation, or by other geocode systems such as ZIP Codes or highway mile markers. Any variable that can be located spatially can be fed into a GIS. Several computer databases that can be directly entered into a GIS are being produced by government agencies and non-government organizations[citation needed]. Different kinds of data in map form can be entered into a GIS.

A GIS can also convert existing digital information, which may not yet be in map form, into forms it can recognize and use. For example, digital satellite images generated through remote sensing can be analyzed to produce a map-like layer of digital information about vegetative cover. A fairly developed resource for naming GIS objects is the Getty Thesaurus of Geographic Names (GTGN), a structured vocabulary containing around 1,000,000 names and other information about places.

Likewise, census or hydrologic tabular data can be converted to map-like form, serving as layers of thematic information in a GIS.

Data representation
GIS data represents real-world objects (roads, land use, elevation) with digital data. Real-world objects can be divided into two abstractions: discrete objects (a house) and continuous fields (rainfall amount or elevation). There are two broad methods used to store data in a GIS for both abstractions: raster and vector.

Raster
A raster data type is, in essence, any type of digital image. Anyone who is familiar with digital photography will recognize the pixel as the smallest individual unit of an image. A combination of these pixels creates an image, distinct from the commonly used scalable vector graphics which are the basis of the vector model. While a digital image is concerned with the output as a representation of reality, as in a photograph or art transferred to a computer, the raster data type reflects an abstraction of reality. Aerial photos are one commonly used form of raster data, with one primary purpose: to display a detailed image on a map or to serve as a base for digitization. Other raster data sets contain information such as elevation (a digital elevation model, or DEM) or the reflectance of a particular wavelength of light (as in Landsat imagery).

[Figure: Digital elevation model, map (image), and vector data]
The raster data type consists of rows and columns of cells, with each cell storing a single value. Raster data can be images (raster images), with each pixel (or cell) containing a color value. Additional values recorded for each cell may be a discrete value, such as land use, a continuous value, such as temperature, or a null value if no data is available. While a raster cell stores a single value, it can be extended by using raster bands to represent RGB (red, green, blue) colors, colormaps (a mapping between a thematic code and an RGB value), or an extended attribute table with one row for each unique cell value. The resolution of the raster data set is its cell width in ground units.
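The cell-and-resolution idea above can be made concrete with a minimal sketch. It assumes a north-up raster whose origin is the upper-left corner and whose rows count downward (a common convention, though not the only one). The `Raster` class and its method names are hypothetical, not any particular library's API, and `None` stands in for the null (no-data) value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Raster:
    """A north-up grid of cells, each storing a single value (None = no data)."""
    origin_x: float                      # ground x of the upper-left corner
    origin_y: float                      # ground y of the upper-left corner
    cell_size: float                     # resolution: cell width in ground units
    cells: List[List[Optional[float]]]   # row-major: cells[row][col]

    def cell_index(self, x: float, y: float) -> Tuple[int, int]:
        """Return the (row, col) of the cell containing ground point (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)  # rows increase southward
        return row, col

    def value_at(self, x: float, y: float) -> Optional[float]:
        """Cell value at (x, y), or None if the cell is empty or off the grid."""
        row, col = self.cell_index(x, y)
        if 0 <= row < len(self.cells) and 0 <= col < len(self.cells[0]):
            return self.cells[row][col]
        return None


# Hypothetical 2 x 3 elevation raster with 30 m cells; one cell has no data.
dem = Raster(500000.0, 4650000.0, 30.0,
             [[101.0, 103.5, None],
              [99.2, 100.0, 102.7]])
print(dem.value_at(500045.0, 4649985.0))  # second column, first row: 103.5
```

Looking a value up is just integer division by the cell size, which is why halving the cell width quadruples the number of cells a raster must store.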

Raster data is stored in various formats: from a standard file-based structure such as TIFF or JPEG to binary large object (BLOB) data stored directly in a relational database management system (RDBMS), similar to other vector-based feature classes. Database storage, when properly indexed, typically allows for quicker retrieval of the raster data but can require storage of millions of significantly sized records.

Vector
[Figure: A simple vector map, using each of the vector elements: points for wells, lines for rivers, and a polygon for the lake.]
In a GIS, geographical features are often expressed as vectors, by considering those features as geometrical shapes. In the popular ESRI Arc series of programs, these are explicitly called shapefiles. Different geographical features are best expressed by different types of geometry:

Points
Zero-dimensional points are used for geographical features that can best be expressed by a single grid reference; in other words, simple location. For example, the locations of wells, peak elevations, features of interest, or trailheads. Points convey the least amount of information of these geometry types.

Lines or polylines
One-dimensional lines or polylines are used for linear features such as rivers, roads, railroads, trails, and topographic lines.

Polygons
Two-dimensional polygons are used for geographical features that cover a particular area of the earth's surface. Such features may include lakes, park boundaries, buildings, city boundaries, or land uses. Polygons convey the most information of these geometry types.

Each of these geometries is linked to a row in a database that describes its attributes. For example, a database that describes lakes may contain a lake's depth, water quality, and pollution level. This information can be used to make a map that describes a particular attribute of the data set; for example, lakes could be coloured depending on their level of pollution. Different geometries can also be compared. For example, the GIS could be used to identify all wells (point geometry) that are within 1 mile (1.6 km) of a lake (polygon geometry) that has a high level of pollution.
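The well-and-lake query above combines an attribute selection with a spatial test. The following is a minimal pure-Python sketch, assuming a projected coordinate system in metres and simple polygons without holes. The lake names, attribute fields, and coordinates are made up for illustration. A production GIS would use a spatial index and a geometry library rather than this brute-force loop.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, ring: List[Point]) -> bool:
    """Ray casting: count how often a ray from p toward +x crosses the ring."""
    x, y = p
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest planar distance from p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(p: Point, ring: List[Point]) -> float:
    """0 inside the polygon, otherwise the distance to its nearest edge."""
    if point_in_polygon(p, ring):
        return 0.0
    n = len(ring)
    return min(segment_distance(p, ring[i], ring[(i + 1) % n]) for i in range(n))


ONE_MILE_M = 1609.344  # coordinates below are assumed to be in metres

# Hypothetical attribute rows: each lake geometry is linked to its attributes.
lakes = [
    {"name": "Clear Lake", "pollution": "low",
     "ring": [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]},
    {"name": "Mill Pond", "pollution": "high",
     "ring": [(5000, 0), (6000, 0), (6000, 800), (5000, 800)]},
]
wells = {"W1": (6500, 400), "W2": (9000, 400), "W3": (1500, 500)}

polluted = [lake for lake in lakes if lake["pollution"] == "high"]  # attribute query
near_polluted = sorted(
    name for name, pt in wells.items()
    if any(distance_to_polygon(pt, lake["ring"]) <= ONE_MILE_M for lake in polluted)
)
print(near_polluted)  # ['W1']
```

Well W3 lies close to a lake too, but that lake's attributes mark it as lightly polluted, so the attribute filter excludes it before any distance is measured.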

Vector features can be made to respect spatial integrity through the application of topology rules such as 'polygons must not overlap'. Vector data can also be used to represent continuously varying phenomena. Contour lines and triangulated irregular networks (TIN) are used to represent elevation or other continuously changing values. TINs record values at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent the terrain surface.
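Because each TIN face is a flat triangle, the elevation at any location inside it can be read off the plane through its three vertices. The following is a minimal sketch using barycentric weights. It assumes the containing triangle has already been found; locating that triangle in a large mesh is a separate search problem and is not shown. The function name and sample face are illustrative only.

```python
from typing import Tuple

Vertex = Tuple[float, float, float]  # (x, y, elevation)


def tin_elevation(x: float, y: float, tri: Tuple[Vertex, Vertex, Vertex]) -> float:
    """Interpolate elevation at (x, y) on one planar TIN face via barycentric weights."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside this triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical face with vertex elevations 100, 110 and 120 m.
face = ((0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0))
print(tin_elevation(2.0, 3.0, face))  # approximately 108.0
```

At a vertex the weights reduce to 1, 0, 0, so the TIN reproduces its recorded point values exactly and varies linearly across each face.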

Advantages and disadvantages
There are advantages and disadvantages to using a raster or vector data model to represent reality. Raster data sets record a value for all points in the area covered, which may require more storage space than representing data in a vector format that can store data only where needed. Raster data also allows easy implementation of overlay operations, which are more difficult with vector data. Vector data can be displayed as vector graphics used on traditional maps, whereas raster data will appear as an image that may have a blocky appearance for object boundaries. Vector data can be easier to register, scale, and re-project, which can simplify combining vector layers from different sources. Vector data are more compatible with relational database environments, where they can be part of a relational table as a normal column and processed using a multitude of operators.

The file size for vector data is usually much smaller for storage and sharing than that of raster data. Image or raster data can be 10 to 100 times larger than vector data, depending on the resolution. Another advantage of vector data is that it can be easily updated and maintained. For example, if a new highway is added, the raster image will have to be completely reproduced, but the vector data set "roads" can be easily updated by adding the missing road segment. In addition, vector data allow much more analysis capability, especially for networks such as roads, power, rail, and telecommunications. For example, vector data attributed with the characteristics of roads, ports, and airfields allow the analyst to query for the best route or method of transportation: the analyst can query the data for the largest port with an airfield within 60 miles and a connecting road that is at least a two-lane highway. Raster data will not have all the characteristics of the features it displays.

Voxel
Some GIS additionally support the voxel data model. A voxel (a portmanteau of the words volumetric and pixel) is a volume element, representing a value on a regular grid in three-dimensional space. This is analogous to a pixel, which represents 2D image data. Voxels can be interpolated from 3D point clouds (3D point vector data) or merged from 2D raster slices.
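The simplest way to turn a 3D point cloud into voxels is to bin the points into a regular 3D grid and aggregate them per cell. The sketch below averages the points that fall in each voxel. This is plain binning, not one of the interpolation methods a GIS might also offer for filling empty voxels. The function name and sample values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

VoxelKey = Tuple[int, int, int]


def voxelize(points: Iterable[Tuple[float, float, float, float]],
             size: float) -> Dict[VoxelKey, float]:
    """Bin (x, y, z, value) points into cubic voxels and average the values per voxel."""
    sums: Dict[VoxelKey, float] = defaultdict(float)
    counts: Dict[VoxelKey, int] = defaultdict(int)
    for x, y, z, value in points:
        key = (math.floor(x / size), math.floor(y / size), math.floor(z / size))
        sums[key] += value
        counts[key] += 1
    # Sparse result: voxels that received no points are simply absent.
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical point cloud with a measured value per point, 1 m voxels.
cloud = [(0.2, 0.3, 0.1, 10.0), (0.8, 0.9, 0.4, 20.0), (1.5, 0.2, 0.3, 5.0)]
print(voxelize(cloud, 1.0))  # {(0, 0, 0): 15.0, (1, 0, 0): 5.0}
```

Using `math.floor` rather than `int` keeps negative coordinates in the correct voxel, because `int` truncates toward zero.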

Non-spatial data
Additional non-spatial data can also be stored besides the spatial data represented by the coordinates of a vector geometry or the position of a raster cell. In vector data, the additional data are attributes of the object. For example, a forest inventory polygon may also have an identifier value and information about tree species. In raster data the cell value can store attribute information, but it can also be used as an identifier that can relate to records in another table.

Data capture
Data capture—entering information into the system—consumes much of the time of GIS practitioners. There are a variety of methods used to enter data into a GIS where it is stored in a digital format.

Existing data printed on paper or PET film maps can be digitized or scanned to produce digital data. A digitizer produces vector data as an operator traces points, lines, and polygon boundaries from a map. Scanning a map results in raster data that could be further processed to produce vector data.

Survey data can be directly entered into a GIS from digital data collection systems on survey instruments. Positions from a Global Positioning System (GPS), another survey tool, can also be directly entered into a GIS.

Remotely sensed data also plays an important role in data collection; it is gathered by sensors attached to a platform. Sensors include cameras, digital scanners, and LIDAR, while platforms usually consist of aircraft and satellites.

The majority of digital data currently comes from photo interpretation of aerial photographs. Soft copy workstations are used to digitize features directly from stereo pairs of digital photographs. These systems allow data to be captured in two and three dimensions, with elevations measured directly from a stereo pair using principles of photogrammetry. Currently, analog aerial photos are scanned before being entered into a soft copy system, but as high-quality digital cameras become cheaper this step will be skipped.

Satellite remote sensing provides another important source of spatial data. Here satellites use different sensor packages either to passively measure the reflectance from parts of the electromagnetic spectrum or to measure radio waves that were sent out from an active sensor such as radar. Remote sensing collects raster data that can be further processed to identify objects and classes of interest, such as land cover.

When data is captured, the user should consider whether the data should be captured with relative or absolute accuracy, since this could influence not only how information will be interpreted but also the cost of data capture.

In addition to collecting and entering spatial data, attribute data is also entered into a GIS. For vector data, this includes additional information about the objects represented in the system.

After entering data into a GIS, the data usually requires editing, to remove errors, or further processing. For vector data it must be made "topologically correct" before it can be used for some advanced analysis. For example, in a road network, lines must connect with nodes at an intersection. Errors such as undershoots and overshoots must also be removed. For scanned maps, blemishes on the source map may need to be removed from the resulting raster. For example, a fleck of dirt might connect two lines that should not be connected.
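One common cleanup step for undershoots and overshoots is snapping: any line endpoint that falls within a small tolerance of a network node is moved onto that node. The following is a minimal sketch, assuming planar coordinates and a known list of nodes. The function name and tolerance are hypothetical. Real topology tools also split lines at crossings and trim overshoots that run well past an intersection, which this sketch does not do.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def snap_endpoints(lines: List[List[Point]], nodes: List[Point],
                   tolerance: float) -> List[List[Point]]:
    """Move each line's first and last vertex onto the nearest node within tolerance."""
    def snap(p: Point) -> Point:
        nearest = min(nodes, key=lambda n: math.dist(p, n), default=None)
        if nearest is not None and math.dist(p, nearest) <= tolerance:
            return nearest
        return p  # nothing close enough: leave the endpoint where it is

    cleaned = []
    for line in lines:
        new_line = list(line)  # copy so the input is not modified
        new_line[0] = snap(new_line[0])
        new_line[-1] = snap(new_line[-1])
        cleaned.append(new_line)
    return cleaned


# Hypothetical intersection node and three digitized road lines (metres).
intersection = [(100.0, 100.0)]
roads = [
    [(0.0, 100.0), (99.2, 100.0)],     # undershoot: stops 0.8 m short
    [(100.0, 0.0), (100.0, 101.5)],    # overshoot: runs 1.5 m past the node
    [(300.0, 300.0), (400.0, 400.0)],  # far from any node: unchanged
]
print(snap_endpoints(roads, intersection, tolerance=2.0))
```

The tolerance is the key design choice: too small and gaps survive, too large and genuinely separate features get joined.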

Raster-to-vector translation
Data restructuring can be performed by a GIS to convert data into different formats. For example, a GIS may be used to convert a satellite image map to a vector structure by generating lines around all cells with the same classification, while determining the cell spatial relationships, such as adjacency or inclusion.
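The first half of that conversion, deciding which cells belong together, can be sketched as connected-component labelling. This minimal sketch assumes 4-connectivity, so cells touching only at a corner are treated as separate. Tracing each region's outer boundary into a polygon, the second half, is omitted. The function name and class codes are illustrative.

```python
from collections import deque
from typing import List


def label_regions(grid: List[List[int]]) -> List[List[int]]:
    """Assign a region id to every 4-connected patch of cells sharing one class."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 = not yet labelled
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over adjacent cells
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc] and grid[nr][nc] == grid[r][c]):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
    return labels


# Hypothetical classified image: 1 = forest, 2 = water.
classes = [[1, 1, 2],
           [2, 2, 1],
           [1, 1, 1]]
print(label_regions(classes))  # [[1, 1, 2], [3, 3, 4], [4, 4, 4]]
```

In the example, the two forest patches share class 1 but are not adjacent, so they become separate regions (ids 1 and 4) and would become separate polygons.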

More advanced data processing can occur with image processing, a technique developed in the late 1960s by NASA and the private sector to provide contrast enhancement, false colour rendering and a variety of other techniques including use of two dimensional Fourier transforms.

Since digital data are collected and stored in various ways, different data sources may not be entirely compatible, so a GIS must be able to convert geographic data from one structure to another.

Projections, coordinate systems and registration
A property ownership map and a soils map might show data at different scales. Map information in a GIS must be manipulated so that it registers, or fits, with information gathered from other maps. Before the digital data can be analyzed, they may have to undergo other manipulations—projection and coordinate conversions, for example—that integrate them into a GIS.

The earth can be represented by various models, each of which may provide a different set of coordinates (e.g., latitude, longitude, elevation) for any given point on the earth's surface. The simplest model is to assume the earth is a perfect sphere. As more measurements of the earth have accumulated, the models of the earth have become more sophisticated and more accurate. In fact, there are models that apply to different areas of the earth to provide increased accuracy (e.g., the North American Datum of 1927, NAD27, works well in North America but not in Europe). See Datum for more information.

Projection is a fundamental component of map making. A projection is a mathematical means of transferring information from a model of the Earth, which represents a three-dimensional curved surface, to a two-dimensional medium—paper or a computer screen. Different projections are used for different types of maps because each projection particularly suits certain uses. For example, a projection that accurately represents the shapes of the continents will distort their relative sizes. See Map projection for more information.

Since much of the information in a GIS comes from existing maps, a GIS uses the processing power of the computer to transform digital information, gathered from sources with different projections and/or different coordinate systems, to a common projection and coordinate system. For images, this process is called rectification.
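As an illustration of a transformation to a common projection, the sketch below converts geographic coordinates to spherical Mercator (the projection behind many web maps) and back. It assumes the input longitude and latitude are already referenced to the same datum and ignores datum shifts entirely. In practice a GIS would delegate this to a projection library such as PROJ rather than to hand-written formulas.

```python
import math
from typing import Tuple

R = 6378137.0  # sphere radius in metres used by spherical ("Web") Mercator


def to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Project longitude/latitude in degrees to spherical Mercator x, y in metres."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("Mercator cannot represent the poles")
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def from_web_mercator(x: float, y: float) -> Tuple[float, float]:
    """Inverse projection: metres back to longitude/latitude in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


x, y = to_web_mercator(-79.38, 43.65)  # approximately downtown Toronto
print(round(x), round(y))
```

This projection preserves local shapes but inflates areas toward the poles, an instance of the shape-versus-size trade-off described above.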

Spatial analysis with GIS

Data modeling
It is difficult to relate wetlands maps to rainfall amounts recorded at different points such as airports, television stations, and high schools. A GIS, however, can be used to depict two- and three-dimensional characteristics of the Earth's surface, subsurface, and atmosphere from information points. For example, a GIS can quickly generate a map with isopleth or contour lines that indicate differing amounts of rainfall.

Such a map can be thought of as a rainfall contour map. Many sophisticated methods can estimate the characteristics of surfaces from a limited number of point measurements. A two-dimensional contour map created from the surface modeling of rainfall point measurements may be overlaid and analyzed with any other map in a GIS covering the same area.

Additionally, from a series of three-dimensional points, or a digital elevation model, isopleth lines representing elevation contours can be generated, along with slope analysis, shaded relief, and other elevation products. Watersheds can be easily defined for any given reach by computing all of the areas contiguous and uphill from any given point of interest. Similarly, an expected thalweg, the path that surface water would follow in intermittent and permanent streams, can be computed from elevation data in the GIS.
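The watershed and flow-path calculations described above are commonly built on a flow-direction grid. The sketch uses the D8 method, in which each cell drains to whichever of its eight neighbours gives the steepest descent. D8 is one common choice among several. The sketch assumes a depression-free DEM, lets flat cells and pits drain nowhere, and uses hypothetical function names.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def d8_flow(dem: List[List[float]], cell_size: float) -> Dict[Cell, Optional[Cell]]:
    """Point each cell at its steepest downhill neighbour (None if no neighbour is lower)."""
    rows, cols = len(dem), len(dem[0])
    flow: Dict[Cell, Optional[Cell]] = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    # Drop per unit distance; diagonal neighbours are farther away.
                    drop = (dem[r][c] - dem[nr][nc]) / (cell_size * math.hypot(dr, dc))
                    if drop > best_drop:
                        best, best_drop = (nr, nc), drop
            flow[(r, c)] = best
    return flow


def watershed(flow: Dict[Cell, Optional[Cell]], outlet: Cell) -> Set[Cell]:
    """Every cell whose flow path passes through the outlet, outlet included."""
    donors: Dict[Cell, List[Cell]] = {}
    for cell, target in flow.items():
        if target is not None:
            donors.setdefault(target, []).append(cell)
    basin, stack = {outlet}, [outlet]
    while stack:  # walk upstream from the outlet
        for donor in donors.get(stack.pop(), []):
            if donor not in basin:
                basin.add(donor)
                stack.append(donor)
    return basin


# Hypothetical 3 x 3 DEM (metres) of a small valley draining south; 10 m cells.
dem = [[9.0, 8.0, 9.0],
       [7.0, 5.0, 7.0],
       [6.0, 3.0, 6.0]]
flow = d8_flow(dem, 10.0)
print(sorted(watershed(flow, (1, 1))))  # [(0, 0), (0, 1), (0, 2), (1, 1)]
```

Following the `flow` pointers downhill from any cell traces its expected flow path; collecting every cell that reaches a chosen point gives that point's watershed.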

Topological modeling
In past years, were there any gas stations or factories operating next to the swamp? Any within two miles (3 km) and uphill from the swamp? A GIS can recognize and analyze the spatial relationships that exist within digitally stored spatial data. These topological relationships allow complex spatial modelling and analysis to be performed. Topological relationships between geometric entities traditionally include adjacency (what adjoins what), containment (what encloses what), and proximity (how close something is to something else).

Networks
If all the factories near a wetland were accidentally to release chemicals into the river at the same time, how long would it take for a damaging amount of pollutant to enter the wetland reserve? A GIS can simulate the routing of materials along a linear network. Values such as slope, speed limit, or pipe diameter can be incorporated into network modeling in order to represent the flow of the phenomenon more accurately. Network modelling is commonly employed in transportation planning, hydrology modeling, and infrastructure modeling.
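In its simplest form, the routing question above reduces to a shortest-path search in which each river reach is weighted by its travel time (length divided by flow speed). The following is a minimal sketch using Dijkstra's algorithm, assuming a constant flow speed per reach. It gives only the earliest arrival time and ignores dilution, so deciding when the pollutant reaches a damaging concentration would need a transport model on top. Node names, lengths, and speeds are invented.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Directed river reaches: node -> [(downstream node, length in km, flow speed in km/h)]
Network = Dict[str, List[Tuple[str, float, float]]]


def travel_times(network: Network, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: earliest arrival time (hours) at each reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, math.inf):
            continue  # stale queue entry; a faster route was already found
        for nxt, length_km, speed_kmh in network.get(node, []):
            arrival = t + length_km / speed_kmh
            if arrival < best.get(nxt, math.inf):
                best[nxt] = arrival
                heapq.heappush(heap, (arrival, nxt))
    return best


# Hypothetical river network with two factories upstream of a wetland.
river: Network = {
    "factory_a": [("junction", 6.0, 3.0), ("wetland", 20.0, 2.0)],
    "factory_b": [("junction", 4.0, 2.0)],
    "junction": [("wetland", 9.0, 1.5)],
}
print(travel_times(river, "factory_a")["wetland"])  # 8.0
```

The slower direct channel from `factory_a` would take 10 hours, so the search keeps the 8-hour route through the junction. Attributes such as slope or pipe diameter would enter the same way, by changing each edge's weight.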

Cartographic modeling
[Figure: An example of the use of layers in a GIS application. The forest cover layer (light green) is at the bottom, with the topographic layer over it. Next up is the stream layer, then the boundary layer, then the road layer. The order is important for properly displaying the final result. Note that the pond layer was located just below the stream layer, so that a stream line can be seen overlying one of the ponds.]
The term "cartographic modeling" was probably coined by Dana Tomlin in his PhD dissertation and later in his book, which has the term in the title. Cartographic modeling refers to a process where several thematic layers of the same area are produced, processed, and analyzed. Tomlin used raster layers, but the overlay method (see below) can be used more generally. Operations on map layers can be combined into algorithms, and eventually into simulation or optimization models.

Map overlay
Map overlay is the combination of two separate spatial data sets (points, lines, or polygons) to create a new output vector data set. These overlays are similar to mathematical Venn diagram overlays. A union overlay combines the geographic features and attribute tables of both inputs into a single new output. An intersect overlay defines the area where both inputs overlap and retains a set of attribute fields for each. A symmetric difference overlay defines an output area that includes the total area of both inputs except for the overlapping area.

Data extraction is a GIS process similar to vector overlay, though it can be used in either vector or raster data analysis. Rather than combining the properties and features of both data sets, data extraction involves using a "clip" or "mask" to extract the features of one data set that fall within the spatial extent of another data set.

In raster data analysis, the overlay of data sets is accomplished through a process known as "local operation on multiple rasters" or "map algebra," through a function that combines the values of each raster's matrix. This function may weigh some inputs more than others through use of an "index model" that reflects the influence of various factors upon a geographic phenomenon.
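A weighted index model of the kind described can be sketched as a local operation that multiplies each layer by its weight and sums the results cell by cell. This assumes the rasters are already co-registered on the same grid and scored on comparable scales. The layer names and weights are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def weighted_overlay(layers: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Local map-algebra operation: cell-by-cell weighted sum of co-registered rasters."""
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")
    rows, cols = len(layers[0]), len(layers[0][0])
    for grid in layers:
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError("all layers must share the same rows and columns")
    return [[sum(w * grid[r][c] for grid, w in zip(layers, weights))
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical suitability scores (0-3) on the same 2 x 2 grid.
slope_score = [[1.0, 3.0], [2.0, 0.0]]
soil_score = [[2.0, 2.0], [1.0, 3.0]]
index = weighted_overlay([slope_score, soil_score], weights=[0.7, 0.3])
print([[round(v, 2) for v in row] for row in index])  # [[1.3, 2.7], [1.7, 0.9]]
```

Giving slope a weight of 0.7 and soil 0.3 encodes the judgement that slope matters more to the phenomenon being modelled; changing the weights changes the index without touching the input layers.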

Automated cartography
Digital cartography and GIS both encode spatial relationships in structured formal representations. GIS is used in digital cartography modeling as a (semi)automated process of making maps, so-called automated cartography. In practice, it can be a subset of a GIS, within which it is equivalent to the stage of visualization, since in most cases not all of the GIS functionality is used. Cartographic products can be either in a digital or in a hardcopy format. Powerful analysis techniques with different data representations can produce high-quality maps within a short time period. The main problem in automated cartography is to use a single set of data to produce multiple products at a variety of scales, a technique known as generalization.

Geostatistics
Geostatistics is a point-pattern analysis that produces field predictions from data points. It is a way of looking at the statistical properties of those spatial data. It differs from general applications of statistics because it employs graph theory and matrix algebra to reduce the number of parameters in the data. Only the second-order properties of the GIS data are analyzed.

When phenomena are measured, the observation methods dictate the accuracy of any subsequent analysis. Due to the nature of the data (e.g., traffic patterns in an urban environment; weather patterns over the Pacific Ocean), a constant or dynamic degree of precision is always lost in the measurement. This loss of precision is determined by the scale and distribution of the data collection.

To determine the statistical relevance of the analysis, an average is determined so that points (gradients) outside of any immediate measurement can be included to determine their predicted behavior. This is due to the limitations of the applied statistic and data collection methods, and interpolation is required in order to predict the behavior of particles, points, and locations that are not directly measurable.

[Figure: Hillshade model derived from a Digital Elevation Model (DEM) of the Valestra area in the northern Apennines (Italy)]
Interpolation is the process by which a surface, usually a raster data set, is created from data collected at a number of sample points. There are several forms of interpolation, each of which treats the data differently, depending on the properties of the data set. In comparing interpolation methods, the first consideration should be whether or not the source data will change (exact or approximate). Next is whether the method is subjective (a human interpretation) or objective. Then there is the nature of transitions between points: are they abrupt or gradual? Finally, there is whether a method is global (it uses the entire data set to form the model) or local, where an algorithm is repeated for a small section of terrain.

Interpolation is justified by the principle of spatial autocorrelation, which recognizes that data collected at any position will be highly similar to, or influenced by, data at locations within its immediate vicinity.

Digital elevation models (DEM), triangulated irregular networks (TIN), edge-finding algorithms, Thiessen polygons, Fourier analysis, weighted moving averages, inverse distance weighting (IDW), moving averages, kriging, splines, and trend surface analysis are all mathematical methods used to produce interpolated data.
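Inverse distance weighting, one of the methods listed, illustrates the spatial autocorrelation principle directly: each estimate is a weighted average of the samples, with weights that fall off with distance. The following is a minimal sketch of the global, exact form, in which every sample contributes and sample locations return their measured values. The power parameter of 2 is a common default rather than a rule, and the gauge data are invented.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, measured value)


def idw(x: float, y: float, samples: Sequence[Sample], power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y) from all sample points."""
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exact method: a sample location returns its own value
        weight = 1.0 / d ** power  # nearer samples get larger weights
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical rain gauges: (x km, y km, rainfall mm).
gauges = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
print(idw(5.0, 0.0, gauges))  # approximately 15.0
```

A higher `power` makes the surface hug the nearest gauge more tightly; restricting `samples` to neighbours within a search radius would turn this global method into a local one.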

Address Geocoding
Geocoding is calculating spatial locations (X,Y coordinates) from street addresses. A reference theme, such as a road centerline file with address ranges, is required to geocode individual addresses. The individual address locations are interpolated, or estimated, by examining address ranges along a road segment; these are usually provided in the form of a table or database. The GIS will then place a dot approximately where that address belongs along the segment of centerline. For example, an address point of 500 will be at approximately the midpoint of a line segment that starts with address 1 and ends with address 1000. Geocoding can also be applied against actual parcel data, typically from municipal tax maps. In this case, the result of the geocoding will be an actual positioned parcel, as opposed to an interpolated point.
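The range interpolation described above amounts to a linear interpolation along the segment. The following is a minimal sketch, assuming a straight two-point segment and a single address range. Real geocoders typically keep separate odd and even ranges for each side of the street and offset the point from the centerline, which is omitted here. The function name and coordinates are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def geocode_on_segment(house: int, from_addr: int, to_addr: int,
                       start: Point, end: Point) -> Point:
    """Place a house number along a centreline segment by linear interpolation."""
    low, high = min(from_addr, to_addr), max(from_addr, to_addr)
    if not low <= house <= high:
        raise ValueError("house number outside this segment's address range")
    if from_addr == to_addr:
        return start
    t = (house - from_addr) / (to_addr - from_addr)  # fraction of the way along
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))


# Hypothetical segment 1000 m long carrying addresses 1 to 1000.
print(geocode_on_segment(500, 1, 1000, (0.0, 0.0), (1000.0, 0.0)))  # about (499.5, 0.0)
```

The point lands at 499/999 of the way along, which is why the text describes address 500 as only approximately at the midpoint.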

It should be noted that there are several (potentially dangerous) caveats that are often overlooked when using interpolation. See the full entry for Geocoding for more information.

Various algorithms are used to help with address matching when the spellings of addresses differ. The address information held by a particular entity or organization, such as the post office, may not entirely match the reference theme: there could be variations in street name spelling, community name, and so on. Consequently, the user generally has the ability to make matching criteria more stringent, or to relax those parameters so that more addresses will be mapped. Care must be taken to review the results so as not to map addresses incorrectly due to overzealous matching parameters.
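A simple way to implement the stricter or looser matching criteria described above is to standardize street names and then score string similarity against a threshold. The sketch uses the ratio from Python's standard `difflib.SequenceMatcher` as the similarity measure. That choice, the abbreviation table, and the 0.85 default threshold are assumptions for illustration, not what any particular GIS uses.

```python
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical abbreviation table; real systems use much larger standardization rules.
ABBREVIATIONS = {"street": "st", "st.": "st", "avenue": "ave", "ave.": "ave",
                 "road": "rd", "rd.": "rd"}


def normalize(name: str) -> str:
    """Lower-case, drop commas, and standardize common street-type words."""
    words = name.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def best_match(query: str, reference: List[str], min_score: float = 0.85) -> Optional[str]:
    """Return the most similar reference name, or None if below min_score."""
    q = normalize(query)
    scored = [(SequenceMatcher(None, q, normalize(ref)).ratio(), ref) for ref in reference]
    score, ref = max(scored, default=(0.0, None))
    return ref if score >= min_score else None


streets = ["Main Street", "Maine Avenue", "Queen Street West"]
print(best_match("main st.", streets))  # Main Street
print(best_match("Elm Road", streets))  # None
```

Raising `min_score` makes matching more stringent. Lowering it maps more addresses at a higher risk of false matches, which is why the results still need review.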

Reverse geocoding
Reverse geocoding is the process of returning an estimated street address number as it relates to a given coordinate. For example, a user can click on a road centerline theme (thus providing a coordinate) and have information returned that reflects the estimated house number. This house number is interpolated from a range assigned to that road segment. If the user clicks at the midpoint of a segment that starts with address 1 and ends with 100, the returned value will be somewhere near 50. Note that reverse geocoding does not return actual addresses, only estimates of what should be there based on the predetermined range.
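Reverse geocoding runs the range interpolation backwards: the clicked coordinate is projected onto the segment to find how far along it lies, and that fraction is mapped into the address range. The following is a minimal sketch assuming one straight segment with a single address range and ignoring odd and even sides. The names are hypothetical. Like the text says, the result is an estimate, not an actual address.

```python
from typing import Tuple

Point = Tuple[float, float]


def reverse_geocode(click: Point, start: Point, end: Point,
                    from_addr: int, to_addr: int) -> int:
    """Estimate a house number from a clicked point and the segment's address range."""
    (px, py), (ax, ay), (bx, by) = click, start, end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return from_addr
    # Project the click onto the segment and clamp to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    # round() uses round-half-to-even, so 50.5 becomes 50.
    return round(from_addr + t * (to_addr - from_addr))


# Click 5 m off the middle of a hypothetical 200 m segment numbered 1 to 100.
print(reverse_geocode((100.0, 5.0), (0.0, 0.0), (200.0, 0.0), 1, 100))  # 50
```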

Data output and cartography
Cartography is the design and production of maps, or visual representations of spatial data. The vast majority of modern cartography is done with the help of computers, usually using a GIS. Most GIS software gives the user substantial control over the appearance of the data.

Cartographic work serves two major functions
First, it produces graphics on the screen or on paper that convey the results of analysis to the people who make decisions about resources. Wall maps and other graphics can be generated, allowing the viewer to visualize and thereby understand the results of analyses or simulations of potential events. Web Map Servers facilitate distribution of generated maps through web browsers using various implementations of web-based application programming interfaces (AJAX, Java, Flash, etc.).

Second, other database information can be generated for further analysis or use. An example would be a list of all addresses within one mile (1.6 km) of a toxic spill.

Graphic display techniques
Traditional maps are abstractions of the real world, a sampling of important elements portrayed on a sheet of paper with symbols to represent physical objects. People who use maps must interpret these symbols. Topographic maps show the shape of land surface with contour lines; the actual shape of the land can be seen only in the mind's eye.

Today, graphic display techniques such as shading based on altitude in a GIS can make relationships among map elements visible, heightening one's ability to extract and analyze information. For example, two types of data were combined in a GIS to produce a perspective view of a portion of San Mateo County, California.

The digital elevation model, consisting of surface elevations recorded on a 30-meter horizontal grid, shows high elevations as white and low elevations as black.

The accompanying Landsat Thematic Mapper image shows a false-color infrared image looking down at the same area in 30-meter pixels, or picture elements, for the same coordinate points, pixel by pixel, as the elevation information.

A GIS was used to register and combine the two images to render the three-dimensional perspective view looking down the San Andreas Fault, using the Thematic Mapper image pixels but shaded using the elevation of the landforms. The GIS display depends on the viewing point of the observer and the time of day of the display, so that it properly renders the shadows created by the sun's rays at that latitude, longitude, and time of day.

Spatial ETL
Spatial ETL tools provide the data processing functionality of traditional Extract, Transform, Load (ETL) software, but with a primary focus on the ability to manage spatial data. They provide GIS users with the ability to translate data between different standards and proprietary formats, whilst geometrically transforming the data en-route.