This tutorial explains how to unzip .kmz files and import the .kml files inside them, using the readOGR() function from the rgdal package, to add Chicago Transit Authority (CTA) ‘L’ train lines and bus routes to a map of the City of Chicago.
To follow the tutorial, you’ll need the following packages installed:
sp: classes and methods for spatial data.
rgdal: primarily used to create spatial data frames, using the Geospatial Data Abstraction Library.
install.packages( c( "sp", "rgdal" ) )
The City of Chicago Open Data Portal contains a list of publicly available datasets from the Chicago Transit Authority (CTA).
We’ll import the following five datasets:
CSV file of CTA System Information - List of ‘L’ Stops: This list of ‘L’ stops provides location and basic service availability information for each place on the CTA system where a train stops, along with formal station names and stop descriptions.
KML file of CTA ‘L’ (Rail) Lines: Lines representing approximately where the CTA rail lines are.
KML file of CTA ‘L’ (Rail) Stations: Point data representing approximate location of Station head house. (Not necessarily where an entrance to station would be.)
KML file of CTA Bus Stops: Point data representing over 11,000 CTA bus stops. The Stop ID is used to get Bus Tracker information.
KML file of CTA Bus Routes: Line data representing CTA Bus Routes. Source data are NAVTEQ street centerlines.
To import a CSV file into R from the City of Chicago Open Data Portal, do the following:
Click the Download button on the top-right of the webpage.
Hover your mouse over CSV, right-click, and click the Copy Link Address button.
Image courtesy of City of Chicago
Paste the copied link as the file argument inside the read.csv() function, while setting header equal to TRUE and stringsAsFactors equal to FALSE.
# List of L stops
ctaLinfo <- read.csv( file = "https://data.cityofchicago.org/api/views/8pix-ypme/rows.csv?accessType=DOWNLOAD"
, header = TRUE
, stringsAsFactors = FALSE
)
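As a quick check of what header and stringsAsFactors do, the sketch below reads a tiny in-memory CSV instead of the portal URL. The two rows and the column names STOP_ID and STATION_NAME are made up for illustration and are not taken from the real dataset.

```r
# A two-row stand-in for the downloaded file; rows and columns are hypothetical
sample_csv <- "STOP_ID,STATION_NAME
1,Example Station A
2,Example Station B"

sample_stops <- read.csv( text = sample_csv
                          , header = TRUE
                          , stringsAsFactors = FALSE
                          )

# header = TRUE turns the first row into column names
names( sample_stops )

# stringsAsFactors = FALSE keeps text columns as character vectors
class( sample_stops$STATION_NAME )
```

In R 4.0.0 and later, stringsAsFactors already defaults to FALSE, so setting it explicitly mainly matters on older versions such as the one listed in the session information at the end of this tutorial.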
KMZ files are zipped KML (Keyhole Markup Language) files with a .kmz extension. A KMZ file contains a single root KML document, which is commonly named “doc.kml”. The line and point data for CTA ‘L’ trains and buses are stored as KMZ files, so we need to unzip each file before using the KML file inside.
To import a KMZ file into R from the City of Chicago Open Data Portal, do the following:
Hover your mouse over Download, right-click, and click the Copy Link Address button.
Image courtesy of City of Chicago
Paste the copied link as the url argument inside the download.file() function. Be sure to name the zip file with the destfile argument. You’ll reuse that exact character string when unzipping the file. However, you’ll always be extracting the “doc.kml” file that resides inside the unzipped KMZ file.
I’ll be downloading the zip file into my current working directory. If you’d like to place it elsewhere, use the setwd() function to declare your preferred working directory for the downloaded files.
# optional
# setwd( dir = "/your/preferred/wd/filepath")
library( sp )
library( rgdal )
###############################
# download CTA 'L' Rail Lines #
###############################
download.file( url = "https://data.cityofchicago.org/download/sgbp-qafc/application%2Fzip"
, destfile = "CTA_RailLines.zip"
, mode = "wb" # write the zip as binary so it is not corrupted on Windows
)
# unzip file
unzip( "CTA_RailLines.zip")
# read data
ctaLines <- readOGR( dsn = paste( getwd()
, "doc.kml"
, sep = "/"
)
, stringsAsFactors = FALSE
)
##################################
# download CTA 'L' Rail Stations #
##################################
download.file( url = "https://data.cityofchicago.org/download/4qtv-9w43/application%2Fzip"
, destfile = "CTA_RailStations.zip"
, mode = "wb" # write the zip as binary so it is not corrupted on Windows
)
# unzip file
unzip( "CTA_RailStations.zip")
# read data
ctaLineStations <- readOGR( dsn = paste( getwd()
, "doc.kml"
, sep = "/"
)
, stringsAsFactors = FALSE
)
###############################
# download CTA Bus Routes #####
###############################
download.file( url = "https://data.cityofchicago.org/download/rytz-fq6y/application%2Fzip"
, destfile = "CTA_ROUTES.zip"
, mode = "wb" # write the zip as binary so it is not corrupted on Windows
)
# unzip file
unzip( "CTA_ROUTES.zip")
# read data
ctaBusRoutes <- readOGR( dsn = paste( getwd()
, "doc.kml"
, sep = "/"
)
, stringsAsFactors = FALSE
)
###############################
# download CTA Bus Stops ######
###############################
download.file( url = "https://data.cityofchicago.org/download/84eu-buny/application%2Fzip"
, destfile = "CTA_BusStops.zip"
, mode = "wb" # write the zip as binary so it is not corrupted on Windows
)
# unzip file
unzip( "CTA_BusStops.zip" )
# read data
ctaBusStations <- readOGR( dsn = paste( getwd()
, "doc.kml"
, sep = "/"
)
, stringsAsFactors = FALSE
)
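The four blocks above repeat the same download, unzip, and read steps, and each unzip() call overwrites the previous “doc.kml” in the working directory. One possible way to keep the files apart is sketched below. The helper names kmz_paths() and fetch_kml() are hypothetical, not part of any package, and the sketch assumes each download contains a “doc.kml” at its root, as described earlier.

```r
# Build the file paths for one dataset (no network access needed)
kmz_paths <- function( name, dir = getwd() ) {
  list( zip = file.path( dir, paste0( name, ".zip" ) )
        , exdir = file.path( dir, name )
        , kml = file.path( dir, name, "doc.kml" )
        )
}

# Download one zipped KMZ and extract its doc.kml into a folder of its own
fetch_kml <- function( url, name, dir = getwd() ) {
  p <- kmz_paths( name = name, dir = dir )
  download.file( url = url
                 , destfile = p$zip
                 , mode = "wb" # write the zip as binary
                 )
  unzip( zipfile = p$zip
         , files = "doc.kml" # extract only the root KML document
         , exdir = p$exdir   # one folder per dataset
         )
  p$kml # return the path to hand to readOGR()
}

# Paths that would be used for the rail lines; nothing is downloaded here
kmz_paths( name = "CTA_RailLines", dir = tempdir() )
```

With rgdal loaded, the value returned by fetch_kml() could be passed as the dsn argument of readOGR(), so each dataset keeps its own copy of “doc.kml” on disk.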
########################################
# Import Community Areas as Background #
########################################
# store Chicago current community area
# GeoJSON URL as a character vector
geojson_comarea_url <- "https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON"
# transform URL character vector into spatial dataframe
comarea606 <- readOGR( dsn = geojson_comarea_url
, layer = "OGRGeoJSON"
, stringsAsFactors = FALSE
, verbose = FALSE # to hide progress message after object is created
)
Plotting the CTA requires six functions, in order of appearance:
pdf: starts the graphics device driver for producing PDF graphics.
par: used to minimize the white space around the plot and to declare a dark background color.
plot: used to display the spatial polygons data frame. This script “hides” the borders of the City of Chicago Community Areas by filling the polygons with the same color as the borders.
lines: used to display the spatial lines data frame of the CTA ‘L’ rail lines.
points: used to display the spatial points data frame of the CTA ‘L’ rail line stations.
dev.off: shuts down the specified (by default the current) device.
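The order of these calls matters: the device is opened first, every layer is drawn onto it, and the file is only complete once the device is closed. The sketch below walks through that sequence with base graphics and made-up coordinates, so it runs without the spatial objects. Here polygon() stands in for the plot method that sp provides for community areas, and the output goes to a temporary file.

```r
# Made-up coordinates standing in for a polygon, a line, and two points
outline <- cbind( x = c( 0, 4, 4, 0 ), y = c( 0, 0, 6, 6 ) )
route   <- cbind( x = c( 1, 2, 3 ), y = c( 1, 3, 5 ) )
stops   <- route[ c( 1, 3 ), ]

# write to a temporary file rather than the working directory
out_file <- tempfile( fileext = ".pdf" )

# 1. open the PDF device
pdf( file = out_file, width = 8, height = 11 )
# 2. set margins and a dark background
par( mar = c( 0, 0, 4, 0 ), bg = "#000000" )
# 3. set up an empty canvas, then fill the background shape
plot( outline, type = "n", axes = FALSE, xlab = "", ylab = "", asp = 1 )
polygon( outline, col = "#B3DDF2", border = "#B3DDF2" ) # fill matches border
# 4. add the line layer
lines( route, col = "#00A1DE", lwd = 10 )
# 5. add the point layer on top
points( stops, col = "#FFFFFF", pch = 20, cex = 1 )
# 6. close the device so the file is written
dev.off()

file.exists( out_file )
```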
# save as pdf
pdf( file = "CTA_L_RailLines_Stations_2017-08-19.pdf"
, width = 8
, height = 11
)
# clear margin white space
par( mar = c(0, 0, 4, 0 )
, bg = "#000000"
)
# plot community areas
plot( comarea606
#, main = "City of Chicago 77 Community Areas"
, col = "#B3DDF2"
, border = "#B3DDF2"
)
# add Blue Line
lines( ctaLines[ ctaLines$Name == "Blue Line (Forest Park)" |
ctaLines$Name == "Blue Line (O'Hare)"
, ]
, col = "#00A1DE"
, lwd = 10 # make line thicker
)
# plot Red Line
lines( ctaLines[ ctaLines$Name == "Red, Purple Line" |
ctaLines$Name == "Brown, Purple (Express), Red" |
ctaLines$Name == "Red Line"
, ]
, col = "#C60C30"
, lwd = 10
)
# plot Green Line
lines( ctaLines[ ctaLines$Name == "Green, Pink" |
ctaLines$Name == "Green Line" |
ctaLines$Name == "Green, Orange" |
ctaLines$Name == "Brown, Green, Orange, Pink, Purple (Exp)"
, ]
, col = "#009B3A"
, lwd = 8
)
# plot Yellow Line
lines( ctaLines[ ctaLines$Name == "Yellow Line"
, ]
, col = "#F9E300"
, lwd = 10
)
# plot Purple Line
lines( ctaLines[ ctaLines$Name == "Red, Purple Line" |
ctaLines$Name == "Brown, Purple (Express), Red" |
ctaLines$Name == "Brown, Green, Orange, Pink, Purple (Exp)" |
ctaLines$Name == "Purple Line" |
ctaLines$Name == "Brown, Purple" |
ctaLines$Name == "Brown, Orange, Pink, Purple (Express)"
, ]
, col = "#522398"
, lwd = 10
)
# plot Orange Line
lines( ctaLines[ ctaLines$Name == "Brown, Green, Orange, Pink, Purple (Exp)" |
ctaLines$Name == "Brown, Orange, Pink, Purple (Express)" |
ctaLines$Name == "Green, Orange" |
ctaLines$Name == "Orange Line"
, ]
, col = "#F9461C"
, lwd = 10
)
# plot Brown Line
lines( ctaLines[ ctaLines$Name == "Brown, Purple (Express), Red" |
ctaLines$Name == "Brown, Green, Orange, Pink, Purple (Exp)" |
ctaLines$Name == "Brown, Purple" |
ctaLines$Name == "Brown, Orange, Pink, Purple (Express)" |
ctaLines$Name == "Brown Line"
, ]
, col = "#62361B"
, lwd = 10
)
# plot Pink Line
lines( ctaLines[ ctaLines$Name == "Brown, Green, Orange, Pink, Purple (Exp)" |
ctaLines$Name == "Brown, Orange, Pink, Purple (Express)" |
ctaLines$Name == "Green, Pink" |
ctaLines$Name == "Pink Line"
, ]
, col = "#E27EA6"
, lwd = 10
)
# add CTA 'L' Station points
points( ctaLineStations
, col = "#FFFFFF"
, pch = 20
, cex = 1
)
# shut down graphing device
dev.off()
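The chained == comparisons used to pick out each rail line can also be written with %in%, which tests each Name against a vector of accepted values. The sketch below compares the two forms on a plain data frame that stands in for ctaLines. The stand-in object segments is an assumption for illustration, while the names themselves are copied from the Red Line block above.

```r
# Stand-in for ctaLines: a plain data frame with a Name column
segments <- data.frame( Name = c( "Red Line"
                                  , "Red, Purple Line"
                                  , "Brown, Purple (Express), Red"
                                  , "Yellow Line"
                                  , "Green, Pink"
                                  )
                        , stringsAsFactors = FALSE
                        )

# every segment name that carries Red Line trains
red_names <- c( "Red, Purple Line"
                , "Brown, Purple (Express), Red"
                , "Red Line"
                )

# chained comparisons, as in the plotting script
chained <- segments$Name == "Red, Purple Line" |
  segments$Name == "Brown, Purple (Express), Red" |
  segments$Name == "Red Line"

# the same test written with %in%
compact <- segments$Name %in% red_names

# both forms select the same rows
identical( chained, compact )
segments[ compact, , drop = FALSE ]
```

On the real object, the same logical vector could be used as the row index, for example ctaLines[ ctaLines$Name %in% red_names, ], before passing the result to lines().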
# save as PDF
pdf( file = "CTA_Bus_Routes_Stops_2017-08-19.pdf"
, width = 9
, height = 12
)
# clear margin white space
par( mar = c(0, 0, 4, 0 )
, bg = "#000000"
)
# plot community areas
plot( comarea606
#, main = "City of Chicago 77 Community Areas"
, col = "#B3DDF2"
, border = "#B3DDF2"
)
# add CTA Bus Routes
lines( ctaBusRoutes
, col = "#FFFFFF"
, lwd = 4
)
# add CTA Bus Stops
points( ctaBusStations
, col = "#FF0000"
, pch = 20
, cex = 0.6
)
# turn graphic device off
dev.off()
Thank you for reading this tutorial. My name is Cristian E. Nuno and I am an aspiring data scientist. To see more of my work, please visit my professional portfolio Urban Data Science.
# Print version information about R, the OS and attached or loaded packages.
sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] bindrcpp_0.2 splancs_2.01-40 png_0.1-7 pander_0.6.1
## [5] magrittr_1.5 dplyr_0.7.4 geosphere_1.5-7 rgdal_1.2-18
## [9] raster_2.6-7 maptools_0.9-2 sp_1.2-7
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 bindr_0.1.1 knitr_1.20 lattice_0.20-35
## [5] R6_2.2.2 rlang_0.2.0 stringr_1.3.0 tools_3.4.4
## [9] htmltools_0.3.6 yaml_2.1.18 rprojroot_1.3-2 digest_0.6.15
## [13] assertthat_0.2.0 tibble_1.4.2 glue_1.2.0 evaluate_0.10.1
## [17] rmarkdown_1.9 stringi_1.1.7 pillar_1.2.1 compiler_3.4.4
## [21] backports_1.1.2 foreign_0.8-69 pkgconfig_2.0.1