Map Arlington 2: Importing Shapefile Data into PostGIS

The Problem

Arlington County, Virginia makes GIS data about the county available through its Arlington County, Va. GIS Data website.

I want to import this data into my map_arlington database. Specifically, I would like to add tables for voter precincts, civic associations, county parks, and zip codes.

Moving the Database

Before attempting to add the new tables, I wanted to make a copy of the database and move it to a development machine, so on the host machine I ran:

$ pg_dump -U [user] map_arlington > map_arlington_export.sql

I then copied map_arlington_export.sql to the new server and ran:

$ psql map_arlington < map_arlington_export.sql

With the database copied, it was time to figure out how to add the new data.

Identifying the Spatial Reference System

One of the essential problems with using geospatial data from different sources it getting the data into a common spatial reference system. PostGIS stores spatial reference system information using a Spatial Reference System Identifier (SRID). For my task of adding the Arlington GIS data to my database derived from OSM data, I need to do the following:

  1. Find the SRID of the OSM data in my database.
  2. Find the SRID of the Arlington County data.
  3. Convert the Arlington County data into the spatial reference system of the OSM data and load it into the database.


To accomplish these three tasks, I will use shp2pgsql, ogrinfo, and the web application Prj2EPSG.

shp2pgsql comes with the postgis package and ogrinfo with the gdal-bin package. I installed both of these with:

$ sudo aptitude install postgis gdal-bin


Before proceeding to add new tables to my database from the Arlington County data, I decided to remove the gen0 and gen1 tables and rename each table to remove the osm_new_ that started each table name.

Here is a sample part of my psql session to illustrate this:

map_arlington=# \dt *landusages*
                List of relations
Schema  |          Name           | Type  |  Owner
public  | osm_new_landusages      | table | [username]
public  | osm_new_landusages_gen0 | table | [username]
public  | osm_new_landusages_gen1 | table | [username]
(3 rows)

map_arlington=# DROP TABLE osm_new_landusages_gen0;
map_arlington=# DROP TABLE osm_new_landusages_gen1;
map_arlington=# ALTER TABLE osm_new_landusages RENAME TO landusages;
map_arlington=# \dt *landusages*
                List of relations
Schema  |    Name    | Type  |  Owner
public  | landusages | table | [username]
(1 row)

When I finished, the complete list of tables looked like this:

map_arlington=# \dt
              List of relations
 Schema |       Name       | Type  |  Owner
 public | aeroways         | table | [username]
 public | amenities        | table | [username]
 public | boundary         | table | [username]
 public | buildings        | table | [username]
 public | landusages       | table | [username]
 public | mainroads        | table | [username]
 public | minorroads       | table | [username]
 public | motorways        | table | [username]
 public | places           | table | [username]
 public | railways         | table | [username]
 public | spatial_ref_sys  | table | [username]
 public | transport_areas  | table | [username]
 public | transport_points | table | [username]
 public | waterareas       | table | [username]
 public | waterways        | table | [username]
(15 rows)

To find the spatial referencing system in use by OSM, I just listed the boundary table with:

map_arlington=# \d boundary
 id          | integer                   | not null default ...
 osm_id      | bigint                    |
 name        | character varying(255)    |
 type        | character varying(255)    |
 admin_level | smallint                  |
 geometry    | geometry(Geometry,900913) |

Geometry types in PostGIS are listed with geometry followed by parentheses enclosing a pair of values: the geometry (possible values include Point, Polyline, Polygon, MultiPoint, MultiPolyline, MultiPolygon, and Geometry), and the SRID.

In this case the SRID is 900913, the Google Projection.

To find the spatial referencing system used by the Arlington County data, I ran the following command inside the directory with the unzipped voter precint shapefile:

$ ogrinfo -al -so Voter_Precinct.shp

The -al switch means “all layers” and the -so means “summary only”. The output of this command was:

INFO: Open of `Voter_Precinct.shp'
      using driver `ESRI Shapefile' successful.

Layer name: Voter_Precinct
Geometry: Polygon
Feature Count: 52
Extent: (11860792.110333, 6987491.520335) - (11900924.090333, 7026256.980335)
Layer SRS WKT:
OBJECTID: Integer (10.0)
PRECINCT: Integer (10.0)
HOUSE: Integer (10.0)
SENATE: Integer (10.0)
PREC_NAME: String (80.0)
POLLING_PL: String (80.0)
ADDRESS: String (80.0)
LABEL: String (80.0)
Shapearea: Real (24.15)
Shapelen: Real (24.15)

What I am interested in here is the PROJCS[...]. I want to turn that information into an SRID. For that I used the web application Prj2EPSG. The following screen shot shows the result I was looking for:

Prj2EPSG results

The SRID is 2283 (NAD83_Virginia_North_ftUS).

Knowing both the source and target SRIDs, I was ready to load the shapefile data into my database:

$ shp2psql -s 2283:900913 Voter_Precinct.shp precincts | psql -d map_arlington

I then changed directories to each of the shapefile directories in turn, and ran the same command for each shapefile. When I finished, my database tables consisted of:

               List of relations
 Schema |        Name        | Type  |  Owner
 public | aeroways           | table | jelkner
 public | amenities          | table | jelkner
 public | boundary           | table | jelkner
 public | buildings          | table | jelkner
 public | civic_associations | table | jelkner
 public | county_parks       | table | jelkner
 public | landusages         | table | jelkner
 public | mainroads          | table | jelkner
 public | minorroads         | table | jelkner
 public | motorways          | table | jelkner
 public | places             | table | jelkner
 public | precincts          | table | jelkner
 public | railways           | table | jelkner
 public | spatial_ref_sys    | table | jelkner
 public | transport_areas    | table | jelkner
 public | transport_points   | table | jelkner
 public | waterareas         | table | jelkner
 public | waterways          | table | jelkner
 public | zip_codes          | table | jelkner
(19 rows)

To demonstrate visually that everything worked, I connected to the database from QGIS and loaded the civic associations from the Arlington data on top of the county boundary from the OSM data:

Civic Associations

Finally, I ran pg_dump again to export the database and move it back to the production server.