Vote Arlington 1: Setting Up the Postgis Database

The Problem

I want to explore the statistical relationship between voting participation rates and income levels in Arlington County, Virginia. I am beginning with the assumption that I will find a direct correlation between the two - higher income levels will correlate with higher voter participation rates.

Creating the Database

I’m working on an Ubuntu 18.04 server with PostgreSQL 10 and PostGIS 2.4 installed. These tools can be installed on a fresh server with the following command:

$ sudo apt install postgresql-10-postgis-2.4

To allow external connections to the database, I edited /etc/postgresql/10/main/pg_hba.conf and changed:

host    all             all             127.0.0.1/32            md5

to:

host    all             all             0.0.0.0/0               md5

and then edited /etc/postgresql/10/main/postgresql.conf and changed:

#listen_addresses = 'localhost'

to:

listen_addresses = '*'

and then finally ran the following command to restart the database server and have these changes take effect:

$ sudo service postgresql restart

Next I gave my user database superuser privileges and set its postgress password with:

$ sudo su - postgres
$ createuser --superuser [user]
$ psql -c "ALTER ROLE [user] PASSWORD '[password]'"
$ exit

so I could create a database with:

$ createdb our_arlington
$ psql -d our_arlington -c 'CREATE EXTENSION postgis'

I installed this postgres server on a KVM using virt-manager. On the 192.168.122.x network that virt-manager creates, this server had address 192.168.122.41. To verify I could connect to it I ran:

$ psql -h 192.168.122.41 -p 5432 -U jelkner -d our_arlington

When greeted with:

our_arlington=#

I knew the database was ready.

With a spatial database created, it’s time to get the data.

Getting the Boundary Data

Since I will be comparing voting participation rates and average income to test for correlation, I will need to get data for each of these values for the same population. Therein lies a challenge. Voting participation rates are only available by voting precinct, while demographic data on average income is not generally reported that way.

This project is proving to be a great way to introduce myself to political geography. The Standard Hierarchy of Census Geographic Entities provides a window into the problem and suggests the first question to explore. Are voting districts comprised of census blocks or do voting district boundaries cut across census blocks? Figure 2-1 from chapter 2 of the Geographic Areas Reference Manual, titled “Geographic Hierarchy for the 1990 Decennial Census” suggests that at least in 1990 the voting districts were built from census blocks.

I’ll begin by loading boundaries of each into the database and see what I find. For census block boundaries for Arlington County, Virginia, I went to 2010 TIGER/Line® Shapefiles: Blocks and selected Virginia and then Arlington County. This gave me a directory named tl_2010_51013_tabblock10 that contained the Arlington census block boundaries from the 2010 census.

To find the spatial referencing system used by the Census Bureau data, I ran the following command inside the directory with the unzipped census block shapefile:

$ ogrinfo -al -so tl_2010_51013_tabblock10.shp

The -al switch means “all layers” and the -so means “summary only”. The output of this command was:

INFO: Open of `tl_2010_51013_tabblock10.shp'
      using driver `ESRI Shapefile' successful.

Layer name: tl_2010_51013_tabblock10
Metadata:
  DBF_DATE_LAST_UPDATE=2012-03-09
Geometry: Polygon
Feature Count: 2188
Extent: (-77.172169, 38.827290) - (-77.032143, 38.934343)
Layer SRS WKT:
GEOGCS["GCS_North_American_1983",
    DATUM["North_American_Datum_1983",
        SPHEROID["GRS_1980",6378137,298.257222101]],

        PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295],
    AUTHORITY["EPSG","4269"]]
STATEFP10: String (2.0)
COUNTYFP10: String (3.0)
TRACTCE10: String (6.0)
BLOCKCE10: String (4.0)
GEOID10: String (15.0)
NAME10: String (10.0)
MTFCC10: String (5.0)
UR10: String (1.0)

UACE10: String (5.0)
UATYP10: String (1.0)
FUNCSTAT10: String (1.0)
ALAND10: Integer64 (14.0)
AWATER10: Integer64 (14.0)
INTPTLAT10: String (11.0)
INTPTLON10: String (12.0)

What I am interested in here is the GEOCS[...]. I want to turn that information into an SRID. For that I used the web application Prj2EPSG. The following screen shot shows the result I was looking for:

Prj2EPSG results

The SRID is 4269 - GCS_North_American_1983. Since this is the first geographic data I’ll be loading into the database, I’ll just use this projection.

Now to convert the shapefile into sql:

$ shp2pgsql -s 4269 tl_2010_51013_tabblock10.shp raw_census_blocks | psql -d our_arlington

I named the table raw_census_blocks because I plan to use SQL to change it into a census_blocks table later.

After running this command, I connected to the our_arlington database with:

$ psql our_arlington

and viewed the tables:

our_arlington=# \dt
              List of relations
 Schema |       Name        | Type  |  Owner
--------+-------------------+-------+---------
 public | raw_census_blocks | table | jelkner
 public | spatial_ref_sys   | table | jelkner
(2 rows)

It appears the raw_census_blocks have successfully loaded. I’ll look at them in QGIS to see.

Connecting QGIS to database

The connection worked. Time to load the census block layer.

QGIS showing census blocks

Now for the voter precinct layer. Returning to the Census Bureau website at 2010 TIGER/Line® Shapefiles: Voting Districts and again selecting Virginia and then Arlington County gave me a directory named tl_2010_51013_vtd10 that contained the Arlington voting district boundaries from 2010.

Note

I first tried to use the voter precinct data from the Arlington County, VA - Official GIS Open Data Portal. This data had SRID 4326 - GCS_WGS_1984 - a different projection than the census tracks, so I needed to convert it when loading it:

$ shp2pgsql -s 4269:4326 Voter_Precinct.shp voter_precincts | psql -d our_arlington

Since the voting district boundaries didn’t line up perfectly with census block boundaries and since the voting districts were from a different year, I decided to return to the Census Bureau for precinct data.

I ran the following command with the following result from inside the directory containing tl_2010_51013_vtd10.shp:

$ shp2pgsql -s 4269 tl_2010_51013_vtd10.shp raw_voting_precincts | psql -d our_arlington

and again viewed the tables:

our_arlington=# \dt
                List of relations
 Schema |         Name         | Type  |  Owner
--------+----------------------+-------+---------
 public | raw_census_blocks    | table | jelkner
 public | raw_voting_precincts | table | jelkner
 public | spatial_ref_sys      | table | jelkner
(3 rows)

I loaded the two layers and the boundaries lined up, so it appears that voting precincts are made up of census blocks.