ggplot2 - ggplot in R with fortify takes too long to process small geospatial data
I am trying to use ggplot to draw a map of Canada and colour-code each region based on total sales. The geospatial file is from GADM and contains 12 provinces (level 1). When I fortify the data, the resulting data.frame has over 4 million rows. When I then try to draw the map, ggplot seems to hang. I left it for 30 minutes and had to give up.
Is the problem the size of the result of fortify? I don't know how to reduce the size. I've tried playing with the 'region' argument in fortify, but that causes fortify itself to appear to hang.
I have included my code and the URLs to download the data I am working with.
    require(dplyr)

    # loaded from: https://raw.githubusercontent.com/technology-hatchery/rcode/master/data/sample%20-%20superstore%20sales%20(excel).csv
    orders <- read.csv(file='data/orders.csv', sep=',', header=TRUE, na.strings = '')
    orders$order.date <- as.Date(orders$order.date, '%m/%d/%y')
    orders$order.priority <- as.factor(orders$order.priority)
    orders$customer.name <- as.character(orders$customer.name)
    orders$ship.date <- as.Date(orders$ship.date, '%m/%d/%y')
    orders$order.total <- orders$unit.price * orders$order.quantity
    orders <- tbl_df(orders)

    require(raster)
    require(ggplot2)
    require(RColorBrewer)
    require(rgdal)
    require(rgeos)

    # map from GADM: http://biogeo.ucdavis.edu/data/gadm2.7/rds/can_adm1.rds
    canada <- readRDS('../../geo/gadm/canada/can_adm1.rds')
    canada <- spTransform(canada, CRS("+proj=longlat +datum=WGS84"))

    # convert the spatial polygons to a data.frame
    canada.df <- fortify(canada)
    summary(canada.df)
    nrow(canada.df)
    # [1] 4005898

    # build the region list to add to the spatial df
    provinces <- canada@data %>%
      dplyr::select(OBJECTID, NAME_1) %>%
      dplyr::rename(id = OBJECTID, province = NAME_1)
    head(provinces)

    # add total sales to the spatial df
    provinceorders <- orders %>%
      mutate(province = as.character(province)) %>%
      left_join(., provinces, by='province') %>%
      group_by(id, province) %>%
      dplyr::summarise(total = sum(order.total)) %>%
      dplyr::select(id, total)
    head(provinceorders)

    canada.df <- merge(canada.df, provinceorders, by='id', all.x=TRUE)
    canada.df <- arrange(canada.df, order, group)
    head(canada.df)

    ggplot() +
      geom_polygon(data=canada.df,
                   aes(x=long, y=lat, group=group, fill=total),
                   color='white') +
      scale_fill_gradient(high='red', low='blue')
      #geom_text(aes(label=province, x=long, y=lat))
Try a shapefile from NOAA instead. It has the provinces but doesn't have super-precise coastline polygons (which aren't needed here):
    library(rgdal)
    library(ggplot2)
    library(ggthemes)

    # download and unzip the shapefile once
    url <- "http://www.nws.noaa.gov/geodata/catalog/national/data/province.zip"
    fil <- basename(url)
    if (!file.exists(fil)) download.file(url, fil)
    fils <- grep("shp", unzip(fil), ignore.case=TRUE, value=TRUE)

    # read the first layer and fortify it by region name
    ca <- readOGR(fils, ogrListLayers(fils)[1])
    ca_map <- fortify(ca, region="name")

    gg <- ggplot()
    gg <- gg + geom_map(data=ca_map, map=ca_map,
                        aes(x=long, y=lat, map_id=id),
                        color="black", fill="white", size=0.15)
    gg <- gg + coord_map("lambert", 44, 85)
    gg <- gg + theme_map()
    gg
Running

    system.time(ca_map <- fortify(ca, region="name"))

shows:

    ##    user  system elapsed
    ##   0.517   0.005   0.523

pretty consistently for me.
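If you have to stay with the GADM file, another option (not part of the answer above) is to thin out the polygon vertices before calling fortify, since fortify produces one row per vertex. Packages such as rgeos (gSimplify) or sf (st_simplify) offer this for real spatial objects. The sketch below only illustrates the idea in base R with a Douglas-Peucker simplification on a made-up "coastline"; the function name simplify_dp, the synthetic coordinates and the tolerance value are all assumptions for illustration, not anything taken from the question or the answer.

```r
# Douglas-Peucker line simplification in base R (illustrative sketch only).
# `xy` is a two-column matrix of vertices; `tol` is the largest deviation
# from the original line that we accept, in the units of the coordinates.
simplify_dp <- function(xy, tol) {
  n <- nrow(xy)
  if (n < 3) return(xy)
  a <- xy[1, ]
  b <- xy[n, ]
  ab <- b - a
  len <- sqrt(sum(ab^2))
  inner <- xy[2:(n - 1), , drop = FALSE]
  if (len == 0) {
    # start and end coincide (closed ring): use distance from the start point
    d <- sqrt((inner[, 1] - a[1])^2 + (inner[, 2] - a[2])^2)
  } else {
    # perpendicular distance of every inner vertex from the chord a-b
    d <- abs(ab[1] * (inner[, 2] - a[2]) - ab[2] * (inner[, 1] - a[1])) / len
  }
  # every inner vertex is within tolerance: keep only the two end points
  if (max(d) <= tol) return(xy[c(1, n), , drop = FALSE])
  # otherwise keep the farthest vertex and simplify both halves recursively
  k <- which.max(d) + 1
  left <- simplify_dp(xy[1:k, , drop = FALSE], tol)
  right <- simplify_dp(xy[k:n, , drop = FALSE], tol)
  rbind(left[-nrow(left), , drop = FALSE], right)
}

# Hypothetical high-resolution coastline: a smooth curve plus fine wiggles
x <- seq(0, 2 * pi, length.out = 2000)
coast <- cbind(long = x, lat = sin(x) + 0.01 * sin(50 * x))

simple <- simplify_dp(coast, tol = 0.05)

nrow(coast)   # vertices (rows after fortify) before simplification
nrow(simple)  # vertices after simplification
```

The tolerance is in the units of the coordinates (degrees for a longlat map), so it needs to be chosen for the scale at which the map is drawn. For a country-level choropleth, a fairly coarse tolerance is usually acceptable because the fine coastline detail is invisible at that zoom anyway.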