Vote Arlington 3: Processing the Data

I began this investigation with the hypothesis that I will find a direct correlation between average income and voting participation rate in the November 2010 general election in Arlington, Virginia.

By correlation I am referring to a statistical association between the 51 bivariate data points of (avg_income, vote_rate) for each of the voting precincts in Arlington County, Virginia.

Before the statistical analysis can be done, I need to calculate the values of the bivariate data. I’ll do voting participation rate first, since that will be by far the easier of the two to calculate.

Voting Participation Rate

I will define voting participation rate for this investigation as the voter turnout divided by the voting age population within each voting precinct expressed as a percentage.

There are other possible ways to define this value that would have other possible implications. I could have used number of registered voters, for example, instead of voter age population. I chose voting age population because I am interested in testing the hypothesis that communities with lower average income have less “voice” in the political process rather than investigating the personal behavior of individual voters.

To compute the voting participation rate with my given definition, I ran the following script:

SELECT precinct_num, precinct_name,
((voter_turnout * 100)::numeric / vote_age_pop)::numeric(4, 2) AS vote_rate
FROM voting_precincts ORDER BY precinct_num;

which produced the following table:

 precinct_num |   precinct_name    | vote_rate
--------------+--------------------+-----------
 01           | Arlington          |     18.69
 02           | Ashton Heights     |     37.61
 03           | Aurora Hills       |     50.38
 04           | Ballston           |     47.02
 05           | Barcroft           |     33.83
 06           | Crystal City       |     23.91
 07           | Cherrydale         |     45.03
 08           | Hume               |     26.66
 09           | Columbia           |     32.45
 10           | Wilson             |     22.69
 11           | East Falls Church  |     53.39
 12           | Fairlington        |     51.70
 13           | Glen Carlyn        |     34.86
 14           | Clarendon          |     34.75
 15           | Lyon Park          |     32.33
 16           | Lyon Village       |     40.89
 17           | Overlee Knolls     |     49.44
 18           | Park Lane          |     21.96
 19           | Rosslyn            |     22.58
 20           | Thrifton           |     57.51
 21           | Virginia Highlands |     28.15
 22           | Abingdon           |     43.29
 23           | Westover           |     40.19
 24           | Woodlawn           |     38.39
 25           | Arlington Forest   |     38.25
 26           | Fillmore           |     29.68
 27           | Jefferson          |     38.27
 28           | Claremont          |     27.21
 29           | Dominion Hills     |     53.85
 30           | Glebe              |     25.58
 31           | Lexington          |     52.80
 32           | Oakridge           |     26.09
 33           | Rock Spring        |     58.23
 34           | Yorktown           |     52.29
 35           | Madison            |     59.91
 36           | Marshall           |     40.57
 37           | Nottingham         |     53.52
 38           | Arlington View     |     22.36
 39           | Ashlawn            |     38.54
 40           | Virginia Square    |     28.29
 41           | Woodbury           |     21.32
 42           | Shirlington        |     29.62
 43           | Arlington Mill     |     12.87
 44           | Dawson             |     28.64
 45           | Buckingham         |     23.39
 46           | Central            |     24.27
 47           | Four Mile Run      |     33.47
 48           | Courtlands         |     30.18
 49           | Monroe             |     28.15
 50           | Crystal Plaza      |     18.50
 51           | Taylor             |     37.22
(51 rows)

Average Income

Finding average income per voting precinct is a greater challenge, since I will have to derive the data from three different tables. Here is the plan:

  1. Use a spatial query to determine which census blocks are in which voting precincts.
  2. Determine the total income for each census block by multiplying census block population by the income percapita for the census block taken from the block group table.
  3. Calculate total income and total population for each voting precinct with total\_income = \sum_{b=1}^n total\_pop_{b} \cdot income\_percap_{b} and total\_pop = \sum_{b=1}^n total\_pop_{b} where b is each census block in the precinct, and divide these two values to obtain percapita income for the voting precinct.

The first task is accomplished by:

SELECT
  vp.precinct_num,
  vp.precinct_name,
  cb.tract,
  cb.block,
  cb.total_pop
FROM
  voting_precincts AS vp
LEFT JOIN
  census_blocks AS cb
ON
  ST_Within(cb.geom, vp.geom)
ORDER BY
  vp.precinct_num;

with the following top of the 2188 row table showing it works as desired:

 precinct_num |   precinct_name    | tract  | block | total_pop
--------------+--------------------+--------+-------+-----------
 01           | Arlington          | 102701 | 1007  |       194
 01           | Arlington          | 102600 | 2002  |        98
 01           | Arlington          | 102600 | 1006  |        70
 01           | Arlington          | 102701 | 1005  |       189
 01           | Arlington          | 102600 | 1004  |       135
 01           | Arlington          | 102701 | 2001  |       335
 01           | Arlington          | 102600 | 2003  |        83
 01           | Arlington          | 102600 | 1007  |        53
 01           | Arlington          | 102600 | 1002  |      1375
 01           | Arlington          | 102600 | 1003  |       150
 01           | Arlington          | 102701 | 1000  |       567
 01           | Arlington          | 102600 | 2004  |        86
 01           | Arlington          | 102702 | 2000  |       100
 01           | Arlington          | 102701 | 1006  |       550
 01           | Arlington          | 102701 | 1002  |       170
 01           | Arlington          | 102600 | 2001  |        92
 01           | Arlington          | 102600 | 1005  |        68
 01           | Arlington          | 102600 | 1008  |        59
 01           | Arlington          | 102600 | 1009  |        40

The second task is calculated with:

SELECT
  cb.tract, cb.block, bg.income_percap * cb.total_pop AS total_income
FROM
  census_blocks AS cb
LEFT JOIN
  block_groups AS bg
ON
  cb.tract || substring(cb.block from 1 for 1) = bg.tract || bg.block_group
ORDER BY
  cb.tract

the top rows of whose result are:

 tract  | block | total_income
--------+-------+--------------
 100100 | 4000  |      8726958
 100100 | 3006  |      3953691
 100100 | 3004  |      5202225
 100100 | 1003  |      4829517
 100100 | 4008  |      3464136
 100100 | 3020  |      3814965
 100100 | 4007  |      3131046
 100100 | 1004  |      2069793
 100100 | 4024  |      3264282
 100100 | 1006  |      3073329
 100100 | 1008  |      3198771
 100100 | 2007  |      9349784
 100100 | 1009  |            0
 100100 | 4019  |      5595912
 100100 | 1000  |     11540664
 100100 | 1001  |      1317141
 100100 | 1011  |      8341893
 100100 | 1012  |      3512376

The final result is obtained with:

CREATE TABLE precinct_data
AS (SELECT
      vp.precinct_num,
      vp.precinct_name,
      SUM(bg.income_percap * cb.total_pop) / SUM(cb.total_pop)
      AS income_per_cap,
      ((vp.voter_turnout * 100)::numeric / vp.vote_age_pop)::numeric(4, 2)
      AS vote_rate
    FROM
      census_blocks AS cb
    LEFT JOIN
      block_groups AS bg
    ON
      cb.tract || substring(cb.block from 1 for 1) = bg.tract || bg.block_group
    LEFT JOIN
      voting_precincts as vp
    ON
      ST_Within(cb.geom,vp.geom)
    GROUP BY
      vp.precinct_num,
      vp.precinct_name,
      vote_rate
    ORDER BY
      vp.precinct_num);

which creates a new table named precinct_data:

our_arlington=# \dt
              List of relations
 Schema |       Name       | Type  |  Owner
--------+------------------+-------+---------
 public | block_groups     | table | jelkner
 public | census_blocks    | table | jelkner
 public | precinct_data    | table | jelkner
 public | spatial_ref_sys  | table | jelkner
 public | voting_precincts | table | jelkner
(5 rows)

Ordering the precinct_data table by income_per_cap produces the following:

 precinct_num |   precinct_name    | income_per_cap | vote_rate
--------------+--------------------+----------------+-----------
 43           | Arlington Mill     |          18226 |     12.87
 01           | Arlington          |          26398 |     18.69
 38           | Arlington View     |          35605 |     22.36
 13           | Glen Carlyn        |          36116 |     34.86
 30           | Glebe              |          36627 |     25.58
 28           | Claremont          |          38218 |     27.21
 47           | Four Mile Run      |          40988 |     33.47
 05           | Barcroft           |          41136 |     33.83
 09           | Columbia           |          42684 |     32.45
 26           | Fillmore           |          43755 |     29.68
 45           | Buckingham         |          44886 |     23.39
 23           | Westover           |          45908 |     40.19
 27           | Jefferson          |          49073 |     38.27
 32           | Oakridge           |          50625 |     26.09
 04           | Ballston           |          51939 |     47.02
 25           | Arlington Forest   |          54068 |     38.25
 39           | Ashlawn            |          54438 |     38.54
 44           | Dawson             |          55198 |     28.64
 18           | Park Lane          |          56868 |     21.96
 24           | Woodlawn           |          58522 |     38.39
 49           | Monroe             |          60482 |     28.15
 21           | Virginia Highlands |          60756 |     28.15
 19           | Rosslyn            |          60990 |     22.58
 29           | Dominion Hills     |          61231 |     53.85
 10           | Wilson             |          61583 |     22.69
 07           | Cherrydale         |          61699 |     45.03
 12           | Fairlington        |          61883 |     51.70
 11           | East Falls Church  |          61939 |     53.39
 42           | Shirlington        |          62106 |     29.62
 15           | Lyon Park          |          62315 |     32.33
 50           | Crystal Plaza      |          63895 |     18.50
 17           | Overlee Knolls     |          64294 |     49.44
 31           | Lexington          |          65480 |     52.80
 36           | Marshall           |          66046 |     40.57
 08           | Hume               |          68363 |     26.66
 22           | Abingdon           |          69594 |     43.29
 16           | Lyon Village       |          69839 |     40.89
 02           | Ashton Heights     |          70247 |     37.61
 03           | Aurora Hills       |          70811 |     50.38
 40           | Virginia Square    |          71906 |     28.29
 46           | Central            |          72095 |     24.27
 34           | Yorktown           |          72890 |     52.29
 41           | Woodbury           |          73800 |     21.32
 14           | Clarendon          |          76054 |     34.75
 37           | Nottingham         |          76403 |     53.52
 06           | Crystal City       |          77172 |     23.91
 51           | Taylor             |          84365 |     37.22
 33           | Rock Spring        |          85660 |     58.23
 48           | Courtlands         |          86792 |     30.18
 20           | Thrifton           |          98391 |     57.51
 35           | Madison            |         108370 |     59.91
(51 rows)

Before moving to the final statistical analysis, a few maps to help visualize this table would be helpful.

Time again for QGIS. Using the DB Manager, I used a natural join to add the data from the table above to the one with the precinct polygons.

QGIS DB Manager

Loading this layer and viewing its attribute table shows the needed information:

New QGIS Layer

Which I used to make the following two choropleth maps:

Choropleth Income Map Choropleth Voting Rate Map

All that is left now is to analyze the data, but a cursory look at the table and maps makes confirmation of the hypothesis look promising.