In-class Exercise 11

Author

Eugene Toh

Published

November 4, 2024

pacman::p_load(tidyverse, sf, tmap, httr, performance)

httr provides functions for making HTTP requests, which is how this exercise retrieves data from a web API.

folder_path <- "data/aspatial"
file_list <- list.files(path = folder_path, pattern = "^realis.*\\.csv$", full.names = TRUE)
realis_data <- file_list %>% map_dfr(read_csv)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 10000 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Project Name, Sale Date, Address, Type of Sale, Type of Area, Nett...
dbl  (2): Area (SQM), Number of Units
num  (4): Transacted Price ($), Area (SQFT), Unit Price ($ PSF), Unit Price ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 6643 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Project Name, Sale Date, Address, Type of Sale, Type of Area, Nett...
dbl  (1): Number of Units
num  (5): Transacted Price ($), Area (SQFT), Unit Price ($ PSF), Area (SQM),...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sort(unique(realis_data$`Property Type`))
[1] "Apartment"             "Condominium"           "Detached House"       
[4] "Executive Condominium" "Semi-Detached House"   "Terrace House"        
condo_resale <- realis_data %>% mutate(`Sale Date` = dmy(`Sale Date`)) %>% filter(`Type of Sale` == "Resale" & `Property Type` == "Condominium")

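Geocoding takes a postal code or address and returns its XY coordinates. (Reverse geocoding goes the other way, from coordinates to an address.) The loop below sends each unique postal code to the OneMap search API and collects the results.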

postcodes <- unique(condo_resale$`Postal Code`)
url <- "https://onemap.gov.sg/api/common/elastic/search"
found <- data.frame()
not_found <- data.frame()

for (postcode in postcodes) {
  query <- list('searchVal'=postcode, 'returnGeom'='Y', 'getAddrDetails'='Y', 'pageNum'='1')
  res <- GET(url, query=query)
  if ((content(res)$found) != 0) {
    found <- rbind(found, data.frame(content(res))[4:13])
  } else {
    # Append rather than overwrite, so every unmatched postal code is kept
    not_found <- rbind(not_found, data.frame(postcode))
  }
}
found <- found %>% select(c(6:8)) %>% rename(POSTAL = `results.POSTAL`, XCOORD = `results.X`, YCOORD = `results.Y`)
condo_resale_geocoded <- left_join(condo_resale, found, by = c('Postal Code' = 'POSTAL'))
condo_resale_sf <- st_as_sf(condo_resale_geocoded, coords = c("XCOORD", "YCOORD"), crs = 3414)
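The join and conversion above can be sketched on toy data to show what happens to postal codes that were not geocoded. This is only an illustration: the `sales` and `lookup` tables and all their values are made up, and the step that drops unmatched rows is an assumption about how you might want to handle them, not part of the original workflow.

```r
library(dplyr)
library(sf)

# Toy stand-ins for condo_resale and found (all values are hypothetical)
sales <- tibble(
  `Postal Code` = c("111111", "222222", "333333"),
  price = c(1.2e6, 9.5e5, 2.1e6)
)
lookup <- tibble(
  POSTAL = c("111111", "222222"),
  XCOORD = c("30000.5", "31000.25"),  # stored as text here, as values parsed from JSON often are
  YCOORD = c("35000.5", "36000.75")
)

joined <- left_join(sales, lookup, by = c("Postal Code" = "POSTAL"))

# Rows whose postal code has no match carry NA coordinates
unmatched <- filter(joined, is.na(XCOORD) | is.na(YCOORD))

# Keep only geocoded rows and convert the text to numbers before building points
sales_sf <- joined %>%
  filter(!is.na(XCOORD), !is.na(YCOORD)) %>%
  mutate(across(c(XCOORD, YCOORD), as.numeric)) %>%
  st_as_sf(coords = c("XCOORD", "YCOORD"), crs = 3414)
```

Inspecting `unmatched` before converting makes it clear how many transactions would be lost, which is worth knowing before any modelling.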

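If you plan to fit a geographically weighted regression, you need to avoid overlapping points. Transactions that share a postal code are geocoded to exactly the same coordinates, so the check below flags every point that coincides with at least one other point.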

overlapping_points <- condo_resale_sf %>% mutate(overlap = lengths(st_equals(., .)) > 1)

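If there are overlapping points, apply spatial jittering, which shifts each point by a small random offset (here up to about 2 metres, since the CRS is in metres). Do not use too small a value: a very small offset can be lost to rounding and leave the points still coincident.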

condo_resale_sf <- condo_resale_sf %>% st_jitter(amount = 2)
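A quick way to confirm that jittering removed the overlaps is to repeat the `st_equals()` check before and after. The sketch below does this on three made-up points, two of which coincide; the coordinates and the seed are assumptions chosen purely for illustration.

```r
library(sf)

set.seed(1234)  # jittering is random; a seed makes the sketch reproducible

# Three toy points in EPSG:3414; the first two share the same location
pts <- st_as_sf(
  data.frame(id = 1:3, x = c(30000, 30000, 31000), y = c(35000, 35000, 36000)),
  coords = c("x", "y"), crs = 3414
)

# Number of points that coincide with at least one other point
n_overlap_before <- sum(lengths(st_equals(pts, pts)) > 1)

pts_jittered <- st_jitter(pts, amount = 2)

n_overlap_after <- sum(lengths(st_equals(pts_jittered, pts_jittered)) > 1)

# Distance each point moved, in metres
shift <- st_distance(pts, pts_jittered, by_element = TRUE)
```

If `n_overlap_after` is still above zero on real data, the jitter amount is probably too small for the precision of the coordinates.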

In take-home exercise 2:

If a province includes offshore islands, its centroid is computed over the mainland and the islands together, so it can drift away from the mainland and even fall in the sea. To avoid this, split each multipolygon into its individual polygons. This produces several rows for the same province, so keep only the largest polygon for each province.

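# Split each MULTIPOLYGON into one row per polygon and record its area
sf_polygon <- prov_sf %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.))
# Keep only the largest polygon of each province
prov_cleaned <- sf_polygon %>%
  group_by(ADM1_EN) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  select(ADM1_EN)  # the geometry column is kept automatically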

This method does not remove Phuket, because the largest polygon of that province is Phuket island itself. The smaller islands belonging to other provinces are removed.
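The effect on centroids can be seen with a toy example. The sketch below builds one hypothetical province made of a large "mainland" square and a smaller "island" square, then compares the centroid before and after keeping only the largest polygon. The shapes, the `sq()` helper, and the province name are invented for illustration, and no CRS is set, so areas are in plain coordinate units.

```r
library(sf)
library(dplyr)

# Helper (hypothetical): closed ring of a square with lower-left corner (x0, y0)
sq <- function(x0, y0, side) {
  rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
        c(x0, y0 + side), c(x0, y0))
}

# One province: mainland spans x = 0..10, island spans x = 40..45
toy_sf <- st_sf(
  ADM1_EN = "Toy Province",
  geometry = st_sfc(st_multipolygon(list(list(sq(0, 0, 10)), list(sq(40, 0, 5)))))
)

# Centroid of the whole multipolygon is pulled towards the island
centroid_before <- st_coordinates(st_centroid(st_geometry(toy_sf)))

# Same steps as above: split, measure, keep the largest polygon per province
largest <- toy_sf %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.)) %>%
  group_by(ADM1_EN) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  select(ADM1_EN)

# Centroid now sits in the middle of the mainland
centroid_after <- st_coordinates(st_centroid(st_geometry(largest)))
```

In this toy case the original centroid has x = 12.5, which lies outside the mainland, while the centroid after cleaning is at (5, 5), inside it.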