In-class Exercise 11
pacman::p_load(tidyverse, sf, tmap, httr, performance)
httr lets you send HTTP requests from R, so you can retrieve data from web APIs.
folder_path <- "data/aspatial"
file_list <- list.files(path = folder_path, pattern = "^realis.*\\.csv$", full.names = TRUE)
realis_data <- file_list %>% map_dfr(read_csv)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 10000 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Project Name, Sale Date, Address, Type of Sale, Type of Area, Nett...
dbl (2): Area (SQM), Number of Units
num (4): Transacted Price ($), Area (SQFT), Unit Price ($ PSF), Unit Price ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 6643 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Project Name, Sale Date, Address, Type of Sale, Type of Area, Nett...
dbl (1): Number of Units
num (5): Transacted Price ($), Area (SQFT), Unit Price ($ PSF), Area (SQM),...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
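The two column specifications above disagree on `Area (SQM)` (dbl in one file, num in the other), most likely because readr guesses a number column when values contain thousands separators. A minimal sketch of declaring the types up front so every file is parsed the same way and the messages are silenced. The temporary files, their values, and the names `demo_files`, `demo_spec` and `demo_data` are made up for illustration and are not REALIS data.

```r
library(readr)
library(purrr)

# Two tiny stand-in files in a temporary folder (toy values only)
tmp_dir <- tempfile("realis_demo")
dir.create(tmp_dir)
writeLines(c("Project Name,Area (SQM),Number of Units",
             "A,120.5,1",
             "B,98,1"),
           file.path(tmp_dir, "realis_1.csv"))
writeLines(c("Project Name,Area (SQM),Number of Units",
             "C,\"1,200\",1"),
           file.path(tmp_dir, "realis_2.csv"))

demo_files <- list.files(tmp_dir, pattern = "^realis.*\\.csv$", full.names = TRUE)

# Declare the column types once; col_number() also accepts values
# written with thousands separators such as "1,200".
demo_spec <- cols(
  `Project Name` = col_character(),
  `Area (SQM)` = col_number(),
  `Number of Units` = col_double()
)

# Extra arguments to map_dfr() are passed on to read_csv()
demo_data <- map_dfr(demo_files, read_csv, col_types = demo_spec)
```

Supplying `col_types` also removes the column specification message, since nothing has to be guessed.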
sort(unique(realis_data$`Property Type`))
[1] "Apartment" "Condominium" "Detached House"
[4] "Executive Condominium" "Semi-Detached House" "Terrace House"
condo_resale <- realis_data %>%
  mutate(`Sale Date` = dmy(`Sale Date`)) %>%
  filter(`Type of Sale` == "Resale" & `Property Type` == "Condominium")
Geocoding lets you pass a postal code or address to a service and get XY coordinates back. (Reverse geocoding is the opposite direction: coordinates in, address out.)
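The geocoding loop in this exercise relies on only a few fields of each parsed response: `found`, and `POSTAL`, `X` and `Y` inside `results`. A minimal offline sketch of turning one parsed response into one row, assuming that shape. The mock responses, their values, and the helper `parse_geocode()` are hypothetical, and the coordinates are assumed to arrive as strings.

```r
# Mock parsed responses limited to the fields used in this exercise.
# The values are invented for illustration only.
mock_hit <- list(
  found = 1,
  results = list(list(POSTAL = "123456", X = "30000.5", Y = "35000.25"))
)
mock_miss <- list(found = 0, results = list())

# Hypothetical helper: one parsed response in, a one-row data frame out,
# or NULL when the service found nothing.
parse_geocode <- function(parsed) {
  if (parsed$found == 0) return(NULL)
  first <- parsed$results[[1]]
  data.frame(
    POSTAL = first$POSTAL,
    XCOORD = as.numeric(first$X),
    YCOORD = as.numeric(first$Y)
  )
}

rows <- lapply(list(mock_hit, mock_miss), parse_geocode)
found_demo <- do.call(rbind, rows)  # NULL entries are dropped by rbind()
```

Collecting the rows in a list and binding once at the end avoids growing a data frame inside a loop, which gets slow with many postal codes.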
postcodes <- unique(condo_resale$`Postal Code`)
url <- "https://onemap.gov.sg/api/common/elastic/search"
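found <- data.frame()
not_found <- data.frame()
for (postcode in postcodes) {
  # One request per unique postal code
  query <- list('searchVal'=postcode, 'returnGeom'='Y', 'getAddrDetails'='Y', 'pageNum'='1')
  res <- GET(url, query=query)
  if ((content(res)$found) != 0) {
    # Keep the result columns for matched postal codes
    found <- rbind(found, data.frame(content(res))[4:13])
  } else {
    # Append rather than overwrite, so every unmatched postal code is kept
    not_found <- rbind(not_found, data.frame(postcode))
  }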
}
found <- found %>%
  select(c(6:8)) %>%
  rename(POSTAL = `results.POSTAL`, XCOORD = `results.X`, YCOORD = `results.Y`)
condo_resale_geocoded <- left_join(condo_resale, found, by = c('Postal Code' = 'POSTAL'))
condo_resale_sf <- st_as_sf(condo_resale_geocoded, coords = c("XCOORD", "YCOORD"), crs = 3414)
If you need to do weighted regression, avoid overlapping points. Transactions that share a postal code are geocoded to identical coordinates, so several points can sit on exactly the same location.
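To see how duplicate locations can be detected, here is a small sketch on three toy points, two of which share coordinates. The object `toy_pts` and its coordinate values are made up for illustration.

```r
library(sf)

# Three toy points in SVY21 (EPSG:3414); the first two share coordinates,
# as transactions at the same postal code would.
toy_pts <- st_as_sf(
  data.frame(id = 1:3,
             x = c(30000, 30000, 31000),
             y = c(35000, 35000, 36000)),
  coords = c("x", "y"), crs = 3414
)

# st_equals() lists, for each point, the points with identical geometry
# (including the point itself), so a length above 1 marks a duplicate.
toy_pts$overlap <- lengths(st_equals(toy_pts, toy_pts)) > 1
```

Here `toy_pts$overlap` is `TRUE` for the first two points and `FALSE` for the third.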
overlapping_points <- condo_resale_sf %>%
  mutate(overlap = lengths(st_equals(., .)) > 1)
If any points overlap, apply spatial jittering, which shifts each point randomly by a small amount, here 2 metres. Do not set the amount too low, or the shift may be lost to rounding.
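A minimal sketch of what jittering does to two points at the same location, assuming `st_jitter()` draws an independent shift of at most `amount` along each axis. The objects `toy_dup`, `jittered` and `shift_m` are hypothetical, and the seed is set only to make the toy run repeatable.

```r
library(sf)

set.seed(1234)

# Two toy points at exactly the same SVY21 location
toy_dup <- st_as_sf(
  data.frame(x = c(30000, 30000), y = c(35000, 35000)),
  coords = c("x", "y"), crs = 3414
)

jittered <- st_jitter(toy_dup, amount = 2)

# After jittering, no two points should share identical geometry
still_overlapping <- any(lengths(st_equals(jittered, jittered)) > 1)

# Straight-line displacement of each point, in metres
delta <- st_coordinates(jittered) - st_coordinates(toy_dup)
shift_m <- sqrt(rowSums(delta^2))
```

With a shift of up to 2 metres per axis, the straight-line displacement stays below about 2.83 metres, which is negligible at the scale of a city-wide analysis.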
condo_resale_sf <- condo_resale_sf %>% st_jitter(amount = 2)
In take-home exercise 2:
If a province includes islands, its geometry is a multipolygon, and the centroid of the whole multipolygon can drift away from the mainland, for example into the sea. To avoid this, cast each multipolygon into separate polygons. This produces several rows for the same province, so keep only the largest polygon for each province.
sf_polygon <- prov_sf %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.))
prov_cleaned <- sf_polygon %>%
  group_by(ADM1_EN) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  select(ADM1_EN)
This method does not remove Phuket, because the island is the largest polygon of its own province. The smaller islands that belong to other provinces are removed.
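The effect on centroids can be checked on a toy province made of one large square and one small, distant island. This is only a sketch: `toy_prov`, its coordinates and the derived objects are invented for illustration, and no CRS is set, so areas are in plain planar units.

```r
library(sf)
library(dplyr)

# A 10 x 10 "mainland" and a 2 x 2 "island" far to the east
mainland <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))
island   <- rbind(c(30, 0), c(32, 0), c(32, 2), c(30, 2), c(30, 0))

toy_prov <- st_sf(
  ADM1_EN = "Toy Province",
  geometry = st_sfc(st_multipolygon(list(list(mainland), list(island))))
)

# Centroid of the whole multipolygon is pulled toward the island
centroid_all <- st_coordinates(st_centroid(st_geometry(toy_prov)))

# Same steps as in the text: split into polygons, keep the largest per province
toy_largest <- toy_prov %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.)) %>%
  group_by(ADM1_EN) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  select(ADM1_EN)

# Centroid of the largest polygon sits at the centre of the mainland
centroid_main <- st_coordinates(st_centroid(st_geometry(toy_largest)))
```

`st_cast()` warns that attributes are repeated for every sub-geometry, which is expected here since the province name applies to each part.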