Take Home Exercise 3

Author

Daniel Chng

Published

March 11, 2023

Modified

March 22, 2023

Predicting HDB Public Housing Resale Pricies using Geographically Weighted Methods

Assignment Doc: Here

1 Setting Scene

Housing is an essential component of household wealth worldwide. Buying a housing has always been a major investment for most people. The price of housing is affected by many factors. Some of them are global in nature such as the general economy of a country or inflation rate. Others can be more specific to the properties themselves. These factors can be further divided to structural and locational factors. Structural factors are variables related to the property themselves such as the size, fitting, and tenure of the property. Locational factors are variables related to the neighbourhood of the properties such as proximity to childcare centre, public transport service and shopping centre.

Conventional, housing resale prices predictive models were built by using Ordinary Least Square (OLS) method. However, this method failed to take into consideration that spatial autocorrelation and spatial heterogeneity exist in geographic data sets such as housing transactions. With the existence of spatial autocorrelation, the OLS estimation of predictive housing resale pricing models could lead to biased, inconsistent, or inefficient results (Anselin 1998). In view of this limitation, Geographical Weighted Models were introduced for calibrating predictive model for housing resale prices.

1.1 Task

In this take-home exercise, you are tasked to predict HDB resale prices at the sub-market level (i.e. HDB 3-room, HDB 4-room and HDB 5-room) for the month of January and February 2023 in Singapore. The predictive models must be built by using by using conventional OLS method and GWR methods. You are also required to compare the performance of the conventional OLS method versus the geographical weighted methods.

1.2 Data

For the purpose of this take-home exercise, HDB Resale Flat Prices provided by Data.gov.sg should be used as the core data set. The study should focus on either three-room, four-room or five-room flat and transaction period should be from 1st January 2021 to 31st December 2022. The test data should be January and February 2023 resale prices.

Below is a list of recommended predictors to consider. However, students are free to include other appropriate independent variables.

  • Structural factors

    • Area of the unit

    • Floor level

    • Remaining lease

    • Age of the unit

    • Main Upgrading Program (MUP) completed (optional)

  • Locational factors

    • Proxomity to CBD

    • Proximity to eldercare

    • Proximity to foodcourt/hawker centres

    • Proximity to MRT

    • Proximity to park

    • Proximity to good primary school

    • Proximity to shopping mall

    • Proximity to supermarket

    • Numbers of kindergartens within 350m

    • Numbers of childcare centres within 350m

    • Numbers of bus stop within 350m

    • Numbers of primary school within 1km

1.3 Our Data

1.3.1 Aspatial Data

HDB Resale Data - Here

1.3.2 Geospatial Data

Master Plan 2019 Sub Zone Boundary - From Prof Kam

Shopping Malls - Referenced from Github and crossed checked with Wikipedia

MRT Stations and Bus Stops - Here

Other locations - Here

1.4 References

Prof Kam - Here and Here

Senior Megan’s - Here

2 Getting Started

2.1 Importing of Packages

packages <- c('sf', 'tidyverse', 'tmap', 'httr', 'jsonlite', 'rvest', 'sp', 'ggpubr', 'corrplot', 'broom',  'olsrr', 'spdep', 'GWmodel', 'devtools', 'rgeos', 'lwgeom', 'maptools', 'matrixStats', 'units', 'gtsummary', 'Metrics', 'rsample', 'SpatialML')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p, repos = "http://cran.us.r-project.org")
  }
  library(p, character.only = T)
}

2.2 Importing of Aspatial Data

resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")

Looking at the first few values of the dataset

head(resale)
# A tibble: 6 × 11
  month   town     flat_…¹ block stree…² store…³ floor…⁴ flat_…⁵ lease…⁶ remai…⁷
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>     <dbl> <chr>     <dbl> <chr>  
1 2017-01 ANG MO … 2 ROOM  406   ANG MO… 10 TO …      44 Improv…    1979 61 yea…
2 2017-01 ANG MO … 3 ROOM  108   ANG MO… 01 TO …      67 New Ge…    1978 60 yea…
3 2017-01 ANG MO … 3 ROOM  602   ANG MO… 01 TO …      67 New Ge…    1980 62 yea…
4 2017-01 ANG MO … 3 ROOM  465   ANG MO… 04 TO …      68 New Ge…    1980 62 yea…
5 2017-01 ANG MO … 3 ROOM  601   ANG MO… 01 TO …      67 New Ge…    1980 62 yea…
6 2017-01 ANG MO … 3 ROOM  150   ANG MO… 01 TO …      68 New Ge…    1981 63 yea…
# … with 1 more variable: resale_price <dbl>, and abbreviated variable names
#   ¹​flat_type, ²​street_name, ³​storey_range, ⁴​floor_area_sqm, ⁵​flat_model,
#   ⁶​lease_commence_date, ⁷​remaining_lease

2.2.1 Filtering Resale Data

Filtering Resales data to only include data of 4 room and months between Jan 2021 and Feb 2023

resale_filtered <- filter(resale, flat_type == "4 ROOM") %>%
  filter(month >= "2021-01" & month <= "2023-02")

Double Checking the Time Period

unique(resale_filtered$month)
 [1] "2021-01" "2021-02" "2021-03" "2021-04" "2021-05" "2021-06" "2021-07"
 [8] "2021-08" "2021-09" "2021-10" "2021-11" "2021-12" "2022-01" "2022-02"
[15] "2022-03" "2022-04" "2022-05" "2022-06" "2022-07" "2022-08" "2022-09"
[22] "2022-10" "2022-11" "2022-12" "2023-01" "2023-02"

Double Checking Flat Type

unique(resale_filtered$flat_type)
[1] "4 ROOM"

Taking a look at the filtered results

glimpse(resale_filtered)
Rows: 25,502
Columns: 11
$ month               <chr> "2021-01", "2021-01", "2021-01", "2021-01", "2021-…
$ town                <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO …
$ flat_type           <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", …
$ block               <chr> "547", "414", "509", "467", "571", "134", "204", "…
$ street_name         <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", "ANG MO …
$ storey_range        <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09", "0…
$ floor_area_sqm      <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 109, 9…
$ flat_model          <chr> "New Generation", "New Generation", "New Generatio…
$ lease_commence_date <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, 1978, 19…
$ remaining_lease     <chr> "59 years", "57 years 09 months", "58 years 06 mon…
$ resale_price        <dbl> 370000, 375000, 380000, 385000, 410000, 410000, 41…

2.2.2 Adding More Data to Resale Data

2.2.2.1 Adding New Data to Dataset

  • Address

  • Remaining Lease year and months

resale_transformed <- resale_filtered %>%
  mutate(resale_filtered, address = paste(block,street_name)) %>%
  mutate(resale_filtered, remaining_lease_yr = as.integer(str_sub(remaining_lease, 0, 2))) %>%
  mutate(resale_filtered, remaining_lease_mth = as.integer(str_sub(remaining_lease, 9, 11)))

Replacing NA values to 0

resale_transformed$remaining_lease_mth[is.na(resale_transformed$remaining_lease_mth)] <- 0

Converting lease years to months for easier calculation later

resale_transformed$remaining_lease_yr <- resale_transformed$remaining_lease_yr * 12
resale_transformed <- resale_transformed %>%
  mutate(resale_transformed, remaining_lease_mths = rowSums(resale_transformed[, c("remaining_lease_yr", "remaining_lease_mth")])) %>%
  select(month, town, address, block, street_name, flat_type, storey_range, floor_area_sqm, flat_model, lease_commence_date, remaining_lease_mths, resale_price)

2.2.2.2 Getting of LAT & LONG from OneMap.sg API

For us to visualize the data on the map, we will have to first get the LAT & LONG data from the address. Its a good thing that OneMap.sg has an API for us to easily grab the LAT & LONG based off the address.

More details on how to use OneMap.sg API here

First we create a variable of all unique address

address <- sort(unique(resale_transformed$address))

Viewing the first few addresses

head(address)
[1] "1 CHAI CHEE RD"    "1 PINE CL"         "1 ST. GEORGE'S RD"
[4] "1 TECK WHYE AVE"   "1 TOH YI DR"       "10 CHAI CHEE RD"  

We will then use a function to loop through and retrieve the LAT and LONG

get_coords <- function(add_list){
  
  # Create a data frame to store all retrieved coordinates
  postal_coords <- data.frame()
    
  for (i in add_list){
    #print(i)

    r <- GET('https://developers.onemap.sg/commonapi/search?',
           query=list(searchVal=i,
                     returnGeom='Y',
                     getAddrDetails='Y'))
    data <- fromJSON(rawToChar(r$content))
    found <- data$found
    res <- data$results
    
    # Create a new data frame for each address
    new_row <- data.frame()
    
    # If single result, append 
    if (found == 1){
      postal <- res$POSTAL 
      lat <- res$LATITUDE
      lng <- res$LONGITUDE
      new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
    }
    
    # If multiple results, drop NIL and append top 1
    else if (found > 1){
      # Remove those with NIL as postal
      res_sub <- res[res$POSTAL != "NIL", ]
      
      # Set as NA first if no Postal
      if (nrow(res_sub) == 0) {
          new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
      }
      
      else{
        top1 <- head(res_sub, n = 1)
        postal <- top1$POSTAL 
        lat <- top1$LATITUDE
        lng <- top1$LONGITUDE
        new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
      }
    }

    else {
      new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
    }
    
    # Add the row
    postal_coords <- rbind(postal_coords, new_row)
  }
  return(postal_coords)
}

Running the Function

latlong <- get_coords(address)

Checking if there is any missing values

latlong[(is.na(latlong))]
character(0)

2.2.2.3 Combining the Data back

resale_latlong <- left_join(resale_transformed, latlong, by = c('address' = 'address'))

Viewing the results

head(resale_latlong)
# A tibble: 6 × 15
  month   town     address block stree…¹ flat_…² store…³ floor…⁴ flat_…⁵ lease…⁶
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>   <chr>     <dbl> <chr>     <dbl>
1 2021-01 ANG MO … 547 AN… 547   ANG MO… 4 ROOM  04 TO …      92 New Ge…    1981
2 2021-01 ANG MO … 414 AN… 414   ANG MO… 4 ROOM  01 TO …      92 New Ge…    1979
3 2021-01 ANG MO … 509 AN… 509   ANG MO… 4 ROOM  01 TO …      91 New Ge…    1980
4 2021-01 ANG MO … 467 AN… 467   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
5 2021-01 ANG MO … 571 AN… 571   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
6 2021-01 ANG MO … 134 AN… 134   ANG MO… 4 ROOM  07 TO …      98 New Ge…    1978
# … with 5 more variables: remaining_lease_mths <dbl>, resale_price <dbl>,
#   postal <chr>, latitude <chr>, longitude <chr>, and abbreviated variable
#   names ¹​street_name, ²​flat_type, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date

Checking if there is any missing values

resale_latlong[(is.na(resale_latlong))]
<unspecified> [0]

2.2.2.4 Saving the result into a file

Running this allows us to save the file for us to save time and from re-running the steps to retrieve the same resale file with LAT and LONG values.

resale_latlong.rds <- write_rds(resale_latlong, "data/model/resale_latlong.rds")

2.2.3 Reading of Resale file with LAT and LONG

Start from this step if you already have your own resale file with the LAT and LONG data.

Remember to either change the file name to match the code or code to match the file.

resale_main <- read_rds("data/model/resale_latlong.rds")

Previewing the content

head(resale_main)
# A tibble: 6 × 15
  month   town     address block stree…¹ flat_…² store…³ floor…⁴ flat_…⁵ lease…⁶
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>   <chr>     <dbl> <chr>     <dbl>
1 2021-01 ANG MO … 547 AN… 547   ANG MO… 4 ROOM  04 TO …      92 New Ge…    1981
2 2021-01 ANG MO … 414 AN… 414   ANG MO… 4 ROOM  01 TO …      92 New Ge…    1979
3 2021-01 ANG MO … 509 AN… 509   ANG MO… 4 ROOM  01 TO …      91 New Ge…    1980
4 2021-01 ANG MO … 467 AN… 467   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
5 2021-01 ANG MO … 571 AN… 571   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
6 2021-01 ANG MO … 134 AN… 134   ANG MO… 4 ROOM  07 TO …      98 New Ge…    1978
# … with 5 more variables: remaining_lease_mths <dbl>, resale_price <dbl>,
#   postal <chr>, latitude <chr>, longitude <chr>, and abbreviated variable
#   names ¹​street_name, ²​flat_type, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date

2.2.4 Transforming to sf and assigning CRS

resale_main_sf <- st_as_sf(resale_main,
                           coords = c("longitude", "latitude"),
                           crs=4326) %>%
  st_transform(crs = 3414)

Let’s check if the coordinate value is correct

st_crs(resale_main_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Lastly, we will check if there is any invalid Geometries

length(which(st_is_valid(resale_main_sf) == FALSE))
[1] 0

2.3 Importing Geospatial Data

mpsz <- st_read(dsn = "data/geospatial", layer = "MPSZ-2019")
Reading layer `MPSZ-2019' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Checking if there is any invalid Geometries

length(which(st_is_valid(mpsz) == FALSE))
[1] 6

Since there is invalid Geometries, we will make it valid

Holes in polygons are okay, but they can cause problems if they go the wrong way round or if the hole is caused by the polygon looping itself.

mpsz <- st_make_valid(mpsz)
length(which(st_is_valid(mpsz) == FALSE))
[1] 0

Lastly, like the previous resale LATLONG data, we will change the CRS code

mpsz <- st_transform(mpsz, 3414)
st_crs(mpsz)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

2.3.1 Import Geospatial Data without LATLONG in data

CBD Area

For CBD data, we can easily find the location of CBD area according to here

From the LATLONG of 1°17′30″N 103°51′00″E according to the Wiki source, we can get the number coordinates from Google Map by placing the coordinate above, then right-clicking on the pin to get the values of 1.291667, 103.850000. It seems like the location pointed to is 3 Coleman St, Singapore 179804.

name <- c('CBD')
latitude = c(1.291667)
longitude = c(103.850000)
cbd <- data.frame(name, latitude, longitude)

As usual, we will need to change the CRS value to match.

cbd_sf <- st_as_sf(cbd,
                   coords = c("longitude",
                              "latitude"),
                   crs = 4326) %>%
  st_transform(crs = 3414)

st_crs(cbd_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Primary Schools

As the data given for the primary schools do not have LATLONG details, we will need to use the function created above to give us the LATLONG values again.

primary_raw <- read.csv("data/geospatial/general-information-of-schools.csv")

primary_data <- primary_raw %>%
  filter(mainlevel_code == "PRIMARY") %>%
  select(school_name, address, postal_code, mainlevel_code)

glimpse(primary_data)
Rows: 183
Columns: 4
$ school_name    <chr> "ADMIRALTY PRIMARY SCHOOL", "AHMAD IBRAHIM PRIMARY SCHO…
$ address        <chr> "11   WOODLANDS CIRCLE", "10   YISHUN STREET 11", "100 …
$ postal_code    <int> 738907, 768643, 579646, 159016, 544969, 569785, 569920,…
$ mainlevel_code <chr> "PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", …

Calling the LATLONG conversion steps and function similar to above

primary_postal <- unique(primary_data$postal_code)
primary_latlong <- get_coords(primary_postal)

Checking for NAs

primary_latlong[(is.na(primary_latlong))]
[1] NA NA NA NA NA NA NA NA NA

After looking at the issue, we can see that the issue is due to postal that starts with 0, thus we will remove the 0

primary_data$postal_code[primary_data$postal_code == '88256'] <- '088256'
primary_data$postal_code[primary_data$postal_code == '99757'] <- '099757'
primary_data$postal_code[primary_data$postal_code == '99840'] <- '099840'

Re-running the functions

primary_postal <- unique(primary_data$postal_code)
primary_latlong <- get_coords(primary_postal)

primary_latlong[(is.na(primary_latlong))]
character(0)

Now we will combine the data and convert the CRS value

primary_school <- left_join(primary_data, primary_latlong, by = c('postal_code' = 'postal'))

primary_school_sf <- st_as_sf(primary_school,
                              coords = c("longitude",
                                         "latitude"),
                              crs = 4326) %>%
  st_transform(crs = 3414)

Good Primary Schools

To define good primary schools, we get the data from here. However your preferences and sources might vary, so do change as you wish

good_primary_school <- primary_school %>%
  filter(school_name %in%
           c("PEI HWA PRESBYTERIAN PRIMARY SCHOOL",
             "GONGSHANG PRIMARY SCHOOL",
             "RIVERSIDE PRIMARY SCHOOL",
             "RED SWASTIKA SCHOOL",
             "PUNGGOL GREEN PRIMARY SCHOOL",
             "PRINCESS ELIZABETH PRIMARY SCHOOL",
             "WESTWOOD PRIMARY SCHOOL",
             "AI TONG SCHOOL",
             "FRONTIER PRIMARY SCHOOL",
             "OASIS PRIMARY SCHOOL"))

As usual, converting CRS value

good_primary_school_sf <- st_as_sf(good_primary_school,
                                   coords = c("longitude",
                                              "latitude"),
                                   crs = 4326) %>%
  st_transform(crs = 3414)

Shopping Malls

As there is source of shopping mall data from gov.sg, the source was taken from Wiki and from a person who created a scrapper to get the data with LATLONG.

shopping <- read.csv("data/geospatial/mall_coordinates.csv")

shopping <- shopping %>%
  select(name, latitude, longitude)

glimpse(shopping)
Rows: 199
Columns: 3
$ name      <chr> "100 AM", "i12 Katong", "313@SOMERSET", "321 CLEMENTI", "600…
$ latitude  <dbl> 1.274588, 1.305087, 1.301385, 1.312025, 1.334042, 1.437131, …
$ longitude <dbl> 103.8435, 103.9051, 103.8377, 103.7650, 103.8510, 103.7953, …

Converting CRS

shopping_sf <- st_as_sf(shopping,
                        coords = c("longitude",
                                   "latitude"),
                        crs = 4326) %>%
  st_transform(crs = 3414)

2.3.2 Importing Geospatial Data with LATLONG in data

Since all the other data sets already have LATLONG included, we will import them and ensure the CRS code is correct

Elderly Care

elder_sf <- st_read(dsn = "data/geospatial", layer = "ELDERCARE")
Reading layer `ELDERCARE' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 133 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21
elder_sf <- st_transform(elder_sf, 3414)

Hawker Centre

hawker_sf <- st_read(dsn = "data/geospatial", layer = "HAWKERCENTRE")
Reading layer `HAWKERCENTRE' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 125 features and 21 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 12874.19 ymin: 28355.97 xmax: 45241.4 ymax: 47872.53
Projected CRS: SVY21
hawker_sf <- st_transform(hawker_sf, 3414)

MRT Stations

mrt <- read.csv("data/geospatial/mrtsg.csv")

mrt_sf <- st_as_sf(mrt,
                   coords = c("Longitude",
                              "Latitude"),
                              crs = 4326) %>%
  st_transform(crs = 3414)

Parks

parks_sf <- st_read(dsn = "data/geospatial", layer = "NATIONALPARKS")
Reading layer `NATIONALPARKS' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 352 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 12373.11 ymin: 21869.93 xmax: 46735.95 ymax: 49231.09
Projected CRS: SVY21
parks_sf <- st_transform(parks_sf, 3414)

Supermarkets

supermarket_sf <- st_read(dsn = "data/geospatial", layer = "SUPERMARKETS")
Reading layer `SUPERMARKETS' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 526 features and 8 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4901.188 ymin: 25529.08 xmax: 46948.22 ymax: 49233.6
Projected CRS: SVY21
supermarket_sf <- st_transform(supermarket_sf, 3414)

Kindergartens

kindergarten_sf <- st_read(dsn = "data/geospatial", layer = "KINDERGARTENS")
Reading layer `KINDERGARTENS' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 448 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11909.7 ymin: 25596.33 xmax: 43395.47 ymax: 48562.06
Projected CRS: SVY21
kindergarten_sf <- st_transform(kindergarten_sf, 3414)

Childcare

childcare_sf <- st_read(dsn = "data/geospatial", layer = "CHILDCARE")
Reading layer `CHILDCARE' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1545 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11203.01 ymin: 25667.6 xmax: 45404.24 ymax: 49300.88
Projected CRS: WGS_1984_Transverse_Mercator
# Assign EPSG Code
childcare_sf <- st_transform(childcare_sf, 3414)

Bus Stops

BusStop_sf <- st_read(dsn = "data/geospatial", layer = "BusStop")
Reading layer `BusStop' from data source 
  `C:\Users\Daniel\Desktop\Github\IS419\IS415-GAA\Take-Home_Ex\Take-Home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
BusStop_sf <- st_transform(BusStop_sf, 3414)

3 Proximity Calculation

3.1 Function for calculation

3.1.1 Normal Calculation

Currently, the distance is measured in metre because SVY21 projected coordinate system is used. The code chunk below will be used to convert the unit f measurement from metre to km.

prox_cal <- function(df1, df2, col_name) {
  dist_matrix <- st_distance(df1, df2)
  df1[,col_name] <- rowMins(dist_matrix) / 1000
  return(df1)
}

3.1.2 With Radius Calculation

prox_cal_radius <- function(df1, df2, col_name, radius) {
  dist_matrix <- st_distance(df1, df2) %>%
    drop_units() %>%
    as.data.frame()
  df1[,col_name] <- rowSums(dist_matrix <= radius)
  return(df1)
}

3.2 Calculating Location Factors

Normal

resale_main_sf <- prox_cal(resale_main_sf, cbd_sf, "PROX_CBD")
resale_main_sf <- prox_cal(resale_main_sf, elder_sf, "PROX_ELDERCARE")
resale_main_sf <- prox_cal(resale_main_sf, hawker_sf, "PROX_HAWKER")
resale_main_sf <- prox_cal(resale_main_sf, mrt_sf, "PROX_MRT")
resale_main_sf <- prox_cal(resale_main_sf, parks_sf, "PROX_PARK")
resale_main_sf <- prox_cal(resale_main_sf, good_primary_school_sf, "PROX_GOODPRIMARY")
resale_main_sf <- prox_cal(resale_main_sf, shopping_sf, "PROX_SHOPPING")
resale_main_sf <- prox_cal(resale_main_sf, BusStop_sf, "PROX_BUS")
resale_main_sf <- prox_cal(resale_main_sf, childcare_sf, "PROX_CHILDCARE")
resale_main_sf <- prox_cal(resale_main_sf, supermarket_sf, "PROX_SUPERMARKET")

Radius

resale_main_sf <- prox_cal_radius(resale_main_sf, kindergarten_sf, "WITHIN_350M_KINDERGARTEN", 350)
resale_main_sf <- prox_cal_radius(resale_main_sf, childcare_sf, "WITHIN_350M_CHILDCARE", 350)
resale_main_sf <- prox_cal_radius(resale_main_sf, BusStop_sf, "WITHIN_350M_BUS", 350)
resale_main_sf <- prox_cal_radius(resale_main_sf, primary_school_sf, "WITHIN_1KM_PRIMARY", 1000)

3.3 Saving of dataset

resale_main_sf <- resale_main_sf %>%
  mutate() %>%
  rename("AREA_SQM" = "floor_area_sqm",
         "LEASE_MTHS" = "remaining_lease_mths",
         "PRICE" = "resale_price",
         "STOREY" = "storey_range")

4 EDA

resale_main_sf <- read_rds("data/model/resale_main.rds")
glimpse(resale_main_sf)
Rows: 25,502
Columns: 28
$ month                    <chr> "2021-01", "2021-01", "2021-01", "2021-01", "…
$ town                     <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "AN…
$ address                  <chr> "547 ANG MO KIO AVE 10", "414 ANG MO KIO AVE …
$ block                    <chr> "547", "414", "509", "467", "571", "134", "20…
$ street_name              <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", "AN…
$ flat_type                <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4 RO…
$ STOREY                   <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09…
$ AREA_SQM                 <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 1…
$ flat_model               <chr> "New Generation", "New Generation", "New Gene…
$ lease_commence_date      <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, 197…
$ LEASE_MTHS               <dbl> 708, 693, 702, 695, 689, 681, 661, 682, 692, …
$ PRICE                    <dbl> 370000, 375000, 380000, 385000, 410000, 41000…
$ postal                   <chr> "560547", "560414", "560509", "560467", "5605…
$ geometry                 <POINT [m]> POINT (30770.07 39578.64), POINT (30292…
$ PROX_CBD                 <dbl> 9.564575, 8.401690, 9.516492, 8.580908, 9.084…
$ PROX_BUS                 <dbl> 0.16157609, 0.16740841, 0.07424143, 0.0887911…
$ PROX_CHILDCARE           <dbl> 2.493662e-01, 6.715056e-02, 1.385583e-01, 1.4…
$ PROX_ELDERCARE           <dbl> 1.08567795, 0.15039052, 0.72242472, 0.0981628…
$ PROX_HAWKER              <dbl> 0.4442515, 0.2050009, 0.4495734, 0.3190679, 0…
$ PROX_GOODPRIMARY         <dbl> 3.182527, 2.354024, 2.414729, 2.699653, 2.648…
$ PROX_PARK                <dbl> 0.8291527, 0.7847864, 0.3796058, 0.9242129, 0…
$ PROX_SUPERMARKET         <dbl> 0.4184204, 0.1946009, 0.4435109, 0.4269715, 0…
$ PROX_SHOPPING            <dbl> 1.1817959, 0.8444986, 0.2966736, 0.9304149, 0…
$ PROX_MRT                 <dbl> 1.0731215, 0.8245176, 0.4544926, 0.9503956, 0…
$ WITHIN_350M_KINDERGARTEN <dbl> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, …
$ WITHIN_350M_CHILDCARE    <dbl> 2, 3, 3, 3, 3, 2, 6, 3, 3, 3, 3, 3, 5, 2, 3, …
$ WITHIN_350M_BUS          <dbl> 4, 7, 10, 4, 8, 2, 8, 7, 6, 7, 7, 7, 8, 8, 11…
$ WITHIN_1KM_PRIMARY       <dbl> 1, 3, 2, 3, 2, 2, 3, 2, 3, 3, 1, 2, 3, 2, 2, …

4.1 Statistical Graphs

4.1.1 Distribution of 4-Room Resale Flat Prices

ggplot(data = resale_main_sf, aes(x = `PRICE`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue") +
  labs(title = "Distribution of 4-Room Resale Prices",
       x = "Resale Prices",
       y = "Frequency")

Note

From the histogram, we can see that it is skewed towards the right. This means that the transacted price were at a relative lower price.

resale_main_sf <- resale_main_sf %>%
  mutate(`LOG_SELLING_PRICE` = log(PRICE))
ggplot(data=resale_main_sf, aes(x=`LOG_SELLING_PRICE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

Note

We can see that after we log the values, it becomes slightly less skewed.

4.1.2 Viewing other category

AREA_SQM <- ggplot(data = resale_main_sf, aes(x = `AREA_SQM`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

LEASE_MTHS <- ggplot(data = resale_main_sf, aes(x = `LEASE_MTHS`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_CBD <- ggplot(data = resale_main_sf, aes(x = `PROX_CBD`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_BUS <- ggplot(data = resale_main_sf, aes(x = `PROX_BUS`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_CHILDCARE <- ggplot(data = resale_main_sf, aes(x = `PROX_CHILDCARE`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_ELDERCARE <- ggplot(data = resale_main_sf, aes(x = `PROX_ELDERCARE`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_HAWKER <- ggplot(data = resale_main_sf, aes(x = `PROX_HAWKER`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_GOODPRIMARY <- ggplot(data = resale_main_sf, aes(x = `PROX_GOODPRIMARY`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_PARK <- ggplot(data = resale_main_sf, aes(x = `PROX_PARK`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_SUPERMARKET <- ggplot(data = resale_main_sf, aes(x = `PROX_SUPERMARKET`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_SHOPPING <- ggplot(data = resale_main_sf, aes(x = `PROX_SHOPPING`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

PROX_MRT <- ggplot(data = resale_main_sf, aes(x = `PROX_MRT`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

WITHIN_350M_KINDERGARTEN <- ggplot(data = resale_main_sf, aes(x = `WITHIN_350M_KINDERGARTEN`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

WITHIN_350M_CHILDCARE <- ggplot(data = resale_main_sf, aes(x = `WITHIN_350M_CHILDCARE`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

WITHIN_350M_BUS <- ggplot(data = resale_main_sf, aes(x = `WITHIN_350M_BUS`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

WITHIN_1KM_PRIMARY <- ggplot(data = resale_main_sf, aes(x = `WITHIN_1KM_PRIMARY`)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

ggarrange(AREA_SQM, LEASE_MTHS, PROX_CBD, PROX_BUS, PROX_CHILDCARE, PROX_ELDERCARE, PROX_HAWKER, PROX_GOODPRIMARY, PROX_PARK, PROX_SUPERMARKET, PROX_SHOPPING, PROX_MRT, WITHIN_350M_KINDERGARTEN, WITHIN_350M_CHILDCARE, WITHIN_350M_BUS, WITHIN_1KM_PRIMARY, ncol = 3, nrow = 6)

4.1.3 Viewing the Distribution using Boxplot

ggplot(data = resale_main_sf, aes(x = '', y = PRICE)) +
  geom_boxplot() +
  labs(x = '', y = 'Resale Prices')

summary(resale_main_sf$PRICE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 250000  445000  495000  529142  570000 1370000 
Note

We can see that there are quite a few outliers like the min and max selling at extremes. But mainly the price ranges from 554k to 570k

4.1.4 Point Map

tmap_mode("view")
tmap_options(check.and.fix = TRUE)
tm_shape(resale_main_sf)+
  tm_dots(col = "PRICE",
          alpha = 0.6,
          style = "quantile",
          popup.vars=c("block"="block", "street_name"="street_name", "flat_model" = "flat_model", "town" = "town", "PRICE" = "PRICE", "LEASE_MTHS", "LEASE_MTHS")) +
  tm_view(set.zoom.limits = c(11, 14))
Note

We can see that the higher resales prices are generally located at the bottom right and bottom half of Singapore.

town_mean <- aggregate(resale_main_sf[, "PRICE"], list(resale_main_sf$town), mean)
town_top10 = top_n(town_mean, 10, `PRICE`) %>%
  arrange(desc(`PRICE`))
town_top10
Simple feature collection with 10 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 19536.43 ymin: 28217.39 xmax: 35958.92 ymax: 41493.47
Projected CRS: SVY21 / Singapore TM
           Group.1    PRICE                       geometry
1     CENTRAL AREA 857591.2 MULTIPOINT ((28640.73 29932...
2       QUEENSTOWN 783287.3 MULTIPOINT ((22133.07 32910...
3      BUKIT MERAH 717229.0 MULTIPOINT ((25024.14 28462...
4  KALLANG/WHAMPOA 703112.0 MULTIPOINT ((19536.43 41493...
5        TOA PAYOH 655306.9 MULTIPOINT ((29021.66 34720...
6         CLEMENTI 646121.4 MULTIPOINT ((19863.73 32474...
7      BUKIT TIMAH 638823.0 MULTIPOINT ((21224.71 35657...
8           BISHAN 609267.0 MULTIPOINT ((27638.13 37842...
9          GEYLANG 590617.2 MULTIPOINT ((32749.43 33252...
10      ANG MO KIO 558339.2 MULTIPOINT ((28070.74 38987...
Important

From the map data above, we can infer based off referring to the map. But with the result above, it shows that the places with higher resales prices were the Central Area with Queenstown not far behind.

5 Hedonic Pricing Modelling

5.1 Simple Linear Regression Method

resales_slr <- lm(formula=PRICE ~ AREA_SQM, data = resale_main_sf)
summary(resales_slr)

Call:
lm(formula = PRICE ~ AREA_SQM, data = resale_main_sf)

Residuals:
    Min      1Q  Median      3Q     Max 
-286935  -83442  -32277   42723  838888 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 639415.3    11240.0  56.888   <2e-16 ***
AREA_SQM     -1164.5      118.4  -9.837   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 129800 on 25500 degrees of freedom
Multiple R-squared:  0.00378,   Adjusted R-squared:  0.003741 
F-statistic: 96.76 on 1 and 25500 DF,  p-value: < 2.2e-16
Note

Since p-value is much smaller than 0.0001, we will reject the null hypothesis that mean is a good estimator of SELLING_PRICE. This will allow us to infer that simple linear regression model above is a good estimator of SELLING_PRICE.

ggplot(data=resale_main_sf,  
       aes(x=`AREA_SQM`, y=`PRICE`)) +
  geom_point() +
  geom_smooth(method = lm)

Note

Seems like the variability of AREA_SQ can’t really fit the straight line. So after plotting the best fit curve, we can see that AREA_SQM is not a good indicator of predicting the PRICE.

5.2 Multiple Linear Regression Method

resale_main_nogeo <- resale_main_sf %>%
  st_drop_geometry() %>%
  dplyr::select(c(7, 8, 11, 12, 14:27)) %>%
  mutate(STOREY = as.character(STOREY))

Checking dataset

glimpse(resale_main_nogeo)
Rows: 25,502
Columns: 18
$ STOREY                   <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09…
$ AREA_SQM                 <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 1…
$ LEASE_MTHS               <dbl> 708, 693, 702, 695, 689, 681, 661, 682, 692, …
$ PRICE                    <dbl> 370000, 375000, 380000, 385000, 410000, 41000…
$ PROX_CBD                 <dbl> 9.564575, 8.401690, 9.516492, 8.580908, 9.084…
$ PROX_BUS                 <dbl> 0.16157609, 0.16740841, 0.07424143, 0.0887911…
$ PROX_CHILDCARE           <dbl> 2.493662e-01, 6.715056e-02, 1.385583e-01, 1.4…
$ PROX_ELDERCARE           <dbl> 1.08567795, 0.15039052, 0.72242472, 0.0981628…
$ PROX_HAWKER              <dbl> 0.4442515, 0.2050009, 0.4495734, 0.3190679, 0…
$ PROX_GOODPRIMARY         <dbl> 3.182527, 2.354024, 2.414729, 2.699653, 2.648…
$ PROX_PARK                <dbl> 0.8291527, 0.7847864, 0.3796058, 0.9242129, 0…
$ PROX_SUPERMARKET         <dbl> 0.4184204, 0.1946009, 0.4435109, 0.4269715, 0…
$ PROX_SHOPPING            <dbl> 1.1817959, 0.8444986, 0.2966736, 0.9304149, 0…
$ PROX_MRT                 <dbl> 1.0731215, 0.8245176, 0.4544926, 0.9503956, 0…
$ WITHIN_350M_KINDERGARTEN <dbl> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, …
$ WITHIN_350M_CHILDCARE    <dbl> 2, 3, 3, 3, 3, 2, 6, 3, 3, 3, 3, 3, 5, 2, 3, …
$ WITHIN_350M_BUS          <dbl> 4, 7, 10, 4, 8, 2, 8, 7, 6, 7, 7, 7, 8, 8, 11…
$ WITHIN_1KM_PRIMARY       <dbl> 1, 3, 2, 3, 2, 2, 3, 2, 3, 3, 1, 2, 3, 2, 2, …
corrplot(cor(resale_main_nogeo[, 2:18]), diag = FALSE, order = "AOE",
         t1.pos = "td",
         t1.cex = 0.5,
         method = "number",
         type = "upper")

Note

We can see that there are only a few correlated variables and they are within the acceptable range. So, can can accept all the variables in the regression.

5.2.1 Building Hedonic pricing model using multiple linear regression method

resale_mlr <- lm(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                   PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                   PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                   PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                   WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
                 data = resale_main_nogeo)
summary(resale_mlr)

Call:
lm(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY + PROX_HAWKER + 
    PROX_MRT + PROX_PARK + PROX_SHOPPING + PROX_SUPERMARKET + 
    WITHIN_1KM_PRIMARY + WITHIN_350M_BUS + WITHIN_350M_CHILDCARE + 
    WITHIN_350M_KINDERGARTEN + LEASE_MTHS, data = resale_main_nogeo)

Residuals:
    Min      1Q  Median      3Q     Max 
-366081  -45763   -1688   44765  366746 

Coefficients:
                           Estimate Std. Error  t value Pr(>|t|)    
(Intercept)              120014.669   8258.128   14.533  < 2e-16 ***
STOREY04 TO 06            18174.045   1415.338   12.841  < 2e-16 ***
STOREY07 TO 09            33293.452   1433.951   23.218  < 2e-16 ***
STOREY10 TO 12            42205.423   1465.693   28.796  < 2e-16 ***
STOREY13 TO 15            46506.923   1732.282   26.847  < 2e-16 ***
STOREY16 TO 18            57642.699   2237.856   25.758  < 2e-16 ***
STOREY19 TO 21            81000.888   3229.399   25.082  < 2e-16 ***
STOREY22 TO 24            93152.163   3729.839   24.975  < 2e-16 ***
STOREY25 TO 27           129017.648   4162.388   30.996  < 2e-16 ***
STOREY28 TO 30           191972.634   5073.291   37.840  < 2e-16 ***
STOREY31 TO 33           200784.072   6147.604   32.661  < 2e-16 ***
STOREY34 TO 36           229694.831   6128.617   37.479  < 2e-16 ***
STOREY37 TO 39           237916.273   7247.254   32.828  < 2e-16 ***
STOREY40 TO 42           254342.952  10061.970   25.278  < 2e-16 ***
STOREY43 TO 45           306318.117  13356.989   22.933  < 2e-16 ***
STOREY46 TO 48           363573.767  22972.861   15.826  < 2e-16 ***
STOREY49 TO 51           486732.032  39689.742   12.263  < 2e-16 ***
AREA_SQM                   3337.357     66.666   50.061  < 2e-16 ***
PROX_BUS                    560.743   8367.314    0.067   0.9466    
PROX_CBD                 -16334.883    157.010 -104.037  < 2e-16 ***
PROX_CHILDCARE           -28447.588   5842.370   -4.869 1.13e-06 ***
PROX_ELDERCARE            -7618.521    766.757   -9.936  < 2e-16 ***
PROX_GOODPRIMARY          -1106.698    276.047   -4.009 6.11e-05 ***
PROX_HAWKER              -22254.694    912.079  -24.400  < 2e-16 ***
PROX_MRT                 -23445.396   1273.357  -18.412  < 2e-16 ***
PROX_PARK                  2133.165   1051.654    2.028   0.0425 *  
PROX_SHOPPING            -24191.278   1462.249  -16.544  < 2e-16 ***
PROX_SUPERMARKET           8280.425   2994.998    2.765   0.0057 ** 
WITHIN_1KM_PRIMARY        -9076.876    341.628  -26.569  < 2e-16 ***
WITHIN_350M_BUS            1086.401    169.483    6.410 1.48e-10 ***
WITHIN_350M_CHILDCARE     -3914.981    256.907  -15.239  < 2e-16 ***
WITHIN_350M_KINDERGARTEN   6513.794    468.411   13.906  < 2e-16 ***
LEASE_MTHS                  359.103      3.123  114.989  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 68620 on 25469 degrees of freedom
Multiple R-squared:  0.722, Adjusted R-squared:  0.7216 
F-statistic:  2067 on 32 and 25469 DF,  p-value: < 2.2e-16

5.2.2 Preparing Publication Quality Table: olsrr method

resale_mlr <- lm(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                   PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                   PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                   PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                   WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
                 data = resale_main_nogeo)
ols_regress(resale_mlr)
                            Model Summary                              
----------------------------------------------------------------------
R                       0.850       RMSE                    68623.970 
R-Squared               0.722       Coef. Var                  12.969 
Adj. R-Squared          0.722       MSE                4709249303.857 
Pred R-Squared          0.721       MAE                     53943.596 
----------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                    Sum of                                                      
                   Squares           DF       Mean Square       F          Sig. 
--------------------------------------------------------------------------------
Regression    3.114971e+14           32      9.734283e+12    2067.056    0.0000 
Residual      1.199399e+14        25469    4709249303.857                       
Total         4.314369e+14        25501                                         
--------------------------------------------------------------------------------

                                                Parameter Estimates                                                 
-------------------------------------------------------------------------------------------------------------------
                   model          Beta    Std. Error    Std. Beta       t         Sig          lower         upper 
-------------------------------------------------------------------------------------------------------------------
             (Intercept)    120014.669      8258.128                   14.533    0.000    103828.267    136201.072 
          STOREY04 TO 06     18174.045      1415.338        0.057      12.841    0.000     15399.902     20948.188 
          STOREY07 TO 09     33293.452      1433.951        0.103      23.218    0.000     30482.826     36104.078 
          STOREY10 TO 12     42205.423      1465.693        0.126      28.796    0.000     39332.581     45078.266 
          STOREY13 TO 15     46506.923      1732.282        0.111      26.847    0.000     43111.552     49902.294 
          STOREY16 TO 18     57642.699      2237.856        0.098      25.758    0.000     53256.374     62029.024 
          STOREY19 TO 21     81000.888      3229.399        0.089      25.082    0.000     74671.081     87330.694 
          STOREY22 TO 24     93152.163      3729.839        0.088      24.975    0.000     85841.465    100462.861 
          STOREY25 TO 27    129017.648      4162.388        0.108      30.996    0.000    120859.129    137176.166 
          STOREY28 TO 30    191972.634      5073.291        0.130      37.840    0.000    182028.693    201916.575 
          STOREY31 TO 33    200784.072      6147.604        0.111      32.661    0.000    188734.416    212833.727 
          STOREY34 TO 36    229694.831      6128.617        0.128      37.479    0.000    217682.391    241707.271 
          STOREY37 TO 39    237916.273      7247.254        0.111      32.828    0.000    223711.240    252121.305 
          STOREY40 TO 42    254342.952     10061.970        0.085      25.278    0.000    234620.916    274064.988 
          STOREY43 TO 45    306318.117     13356.989        0.077      22.933    0.000    280137.656    332498.579 
          STOREY46 TO 48    363573.767     22972.861        0.053      15.826    0.000    318545.648    408601.887 
          STOREY49 TO 51    486732.032     39689.742        0.041      12.263    0.000    408937.869    564526.194 
                AREA_SQM      3337.357        66.666        0.176      50.061    0.000      3206.688      3468.027 
                PROX_BUS       560.743      8367.314        0.000       0.067    0.947    -15839.670     16961.157 
                PROX_CBD    -16334.883       157.010       -0.528    -104.037    0.000    -16642.633    -16027.134 
          PROX_CHILDCARE    -28447.588      5842.370       -0.017      -4.869    0.000    -39898.967    -16996.208 
          PROX_ELDERCARE     -7618.521       766.757       -0.036      -9.936    0.000     -9121.409     -6115.634 
        PROX_GOODPRIMARY     -1106.698       276.047       -0.017      -4.009    0.000     -1647.766      -565.631 
             PROX_HAWKER    -22254.694       912.079       -0.090     -24.400    0.000    -24042.421    -20466.967 
                PROX_MRT    -23445.396      1273.357       -0.068     -18.412    0.000    -25941.248    -20949.544 
               PROX_PARK      2133.165      1051.654        0.007       2.028    0.043        71.864      4194.466 
           PROX_SHOPPING    -24191.278      1462.249       -0.066     -16.544    0.000    -27057.370    -21325.187 
        PROX_SUPERMARKET      8280.425      2994.998        0.010       2.765    0.006      2410.059     14150.791 
      WITHIN_1KM_PRIMARY     -9076.876       341.628       -0.111     -26.569    0.000     -9746.487     -8407.265 
         WITHIN_350M_BUS      1086.401       169.483        0.024       6.410    0.000       754.205      1418.598 
   WITHIN_350M_CHILDCARE     -3914.981       256.907       -0.060     -15.239    0.000     -4418.533     -3411.428 
WITHIN_350M_KINDERGARTEN      6513.794       468.411        0.049      13.906    0.000      5595.682      7431.906 
              LEASE_MTHS       359.103         3.123        0.455     114.989    0.000       352.982       365.224 
-------------------------------------------------------------------------------------------------------------------

5.2.3 Preparing Publication Quality Table: gtsummary method

tbl_regression(resale_mlr, intercept = TRUE)
Characteristic Beta 95% CI1 p-value
(Intercept) 120,015 103,828, 136,201 <0.001
STOREY
    01 TO 03
    04 TO 06 18,174 15,400, 20,948 <0.001
    07 TO 09 33,293 30,483, 36,104 <0.001
    10 TO 12 42,205 39,333, 45,078 <0.001
    13 TO 15 46,507 43,112, 49,902 <0.001
    16 TO 18 57,643 53,256, 62,029 <0.001
    19 TO 21 81,001 74,671, 87,331 <0.001
    22 TO 24 93,152 85,841, 100,463 <0.001
    25 TO 27 129,018 120,859, 137,176 <0.001
    28 TO 30 191,973 182,029, 201,917 <0.001
    31 TO 33 200,784 188,734, 212,834 <0.001
    34 TO 36 229,695 217,682, 241,707 <0.001
    37 TO 39 237,916 223,711, 252,121 <0.001
    40 TO 42 254,343 234,621, 274,065 <0.001
    43 TO 45 306,318 280,138, 332,499 <0.001
    46 TO 48 363,574 318,546, 408,602 <0.001
    49 TO 51 486,732 408,938, 564,526 <0.001
AREA_SQM 3,337 3,207, 3,468 <0.001
PROX_BUS 561 -15,840, 16,961 >0.9
PROX_CBD -16,335 -16,643, -16,027 <0.001
PROX_CHILDCARE -28,448 -39,899, -16,996 <0.001
PROX_ELDERCARE -7,619 -9,121, -6,116 <0.001
PROX_GOODPRIMARY -1,107 -1,648, -566 <0.001
PROX_HAWKER -22,255 -24,042, -20,467 <0.001
PROX_MRT -23,445 -25,941, -20,950 <0.001
PROX_PARK 2,133 72, 4,194 0.043
PROX_SHOPPING -24,191 -27,057, -21,325 <0.001
PROX_SUPERMARKET 8,280 2,410, 14,151 0.006
WITHIN_1KM_PRIMARY -9,077 -9,746, -8,407 <0.001
WITHIN_350M_BUS 1,086 754, 1,419 <0.001
WITHIN_350M_CHILDCARE -3,915 -4,419, -3,411 <0.001
WITHIN_350M_KINDERGARTEN 6,514 5,596, 7,432 <0.001
LEASE_MTHS 359 353, 365 <0.001
1 CI = Confidence Interval

5.2.4 Checking for Multicolinearity

ols_vif_tol(resale_mlr)
                  Variables Tolerance      VIF
1            STOREY04 TO 06 0.5453259 1.833766
2            STOREY07 TO 09 0.5526886 1.809337
3            STOREY10 TO 12 0.5661245 1.766396
4            STOREY13 TO 15 0.6433468 1.554372
5            STOREY16 TO 18 0.7594335 1.316771
6            STOREY19 TO 21 0.8589673 1.164189
7            STOREY22 TO 24 0.8859272 1.128761
8            STOREY25 TO 27 0.9019700 1.108684
9            STOREY28 TO 30 0.9220648 1.084522
10           STOREY31 TO 33 0.9418010 1.061795
11           STOREY34 TO 36 0.9406107 1.063139
12           STOREY37 TO 39 0.9573717 1.044526
13           STOREY40 TO 42 0.9708734 1.030000
14           STOREY43 TO 45 0.9786577 1.021808
15           STOREY46 TO 48 0.9918187 1.008249
16           STOREY49 TO 51 0.9966094 1.003402
17                 AREA_SQM 0.8811246 1.134913
18                 PROX_BUS 0.8620575 1.160015
19                 PROX_CBD 0.4244658 2.355902
20           PROX_CHILDCARE 0.8721396 1.146605
21           PROX_ELDERCARE 0.8365748 1.195350
22         PROX_GOODPRIMARY 0.5928120 1.686875
23              PROX_HAWKER 0.8029681 1.245380
24                 PROX_MRT 0.8000420 1.249934
25                PROX_PARK 0.8580656 1.165412
26            PROX_SHOPPING 0.6879710 1.453550
27         PROX_SUPERMARKET 0.8865753 1.127936
28       WITHIN_1KM_PRIMARY 0.6199215 1.613107
29          WITHIN_350M_BUS 0.7671970 1.303446
30    WITHIN_350M_CHILDCARE 0.7114620 1.405556
31 WITHIN_350M_KINDERGARTEN 0.8882120 1.125857
32               LEASE_MTHS 0.6974027 1.433892
Note

Since the VIF of the independent variables is less than 10, we can conclude that there are no signs of multicollinearity among independent variables.

5.2.5 Testing for Non-Linearity

ols_plot_resid_fit(resale_mlr)

Note

We can see that most of the data points are around the 0 line with a few outliers. But it is still within the range of tolerance and can conclude that the relationship between dependent and independent variables are linear.

5.2.6 Testing for Normality Assumption

ols_plot_resid_hist(resale_mlr)

Note

We can see that this graph above shows that the residual of the multiple linear regression model is normally distributed.

5.2.7 Testing for Spatial Autocorrelation

Exporting of data and save it as data frame

mlr_output <- as.data.frame(resale_mlr$residuals)

Join newly created data frame with resale_main_sf

resale_res_sf <- cbind(resale_main_sf,
                       resale_mlr$residuals) %>%
  rename(`MLR_RES` = resale_mlr.residuals)

Converting from simple feature into SpatialPointDataFrame

resale_sp <- as_Spatial(resale_res_sf)
resale_sp
class       : SpatialPointsDataFrame 
features    : 25502 
extent      : 11519.79, 42645.18, 28217.39, 48741.06  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 29
names       :   month,       town,          address, block,   street_name, flat_type,   STOREY, AREA_SQM,    flat_model, lease_commence_date, LEASE_MTHS,   PRICE, postal,          PROX_CBD,           PROX_BUS, ... 
min values  : 2021-01, ANG MO KIO,   1 CHAI CHEE RD,     1,  ADMIRALTY DR,    4 ROOM, 01 TO 03,       70, Adjoined flat,                1967,        517,  250000, 052335, 0.999393538715878, 0.0158480216214286, ... 
max values  : 2023-02,     YISHUN, 9B BOON TIONG RD,    9B, YUNG SHENG RD,    4 ROOM, 49 TO 51,      145,       Type S1,                2019,       1168, 1370000,    NIL,  19.6500691667807,  0.391470769487539, ... 

5.2.8 Visualising Distribution of Residuals

tm_shape(mpsz)+
  tmap_options(check.and.fix = TRUE) +
  tm_polygons(alpha = 0.4) +
tm_shape(resale_res_sf) +  
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(11,14))
Note

From the visualisation, we can see that there are signs of spatial autocorrelation. We will use Moran’s I test to test it.

5.2.9 Moren’s I Test

nb <- dnearneigh(coordinates(resale_sp), 0, 1500, longlat = FALSE)
summary(nb)
Neighbour list object:
Number of regions: 25502 
Number of nonzero links: 26961048 
Percentage nonzero weights: 4.145609 
Average number of links: 1057.213 
Link number distribution:

   5   28   33   40   85   88   91   93  111  112  135  150  153  155  156  162 
   6   29   25    9    3   84   11    5    1    3    3  129   19    3    1    1 
 164  167  168  172  179  181  184  185  193  195  197  198  199  200  201  203 
   1    2    5    3    5    5    1    1   10    3   10    6   20    2   48    5 
 204  205  206  208  210  211  212  214  215  217  218  219  220  221  222  223 
   5   91   15   21    1    3    1   13   13    8    5    2   29    8    3   14 
 224  226  227  228  230  232  234  235  236  238  239  240  241  242  243  244 
   6    5   41   15    2   18   19    1   17    6    6    6   17   18    8    4 
 245  246  247  248  249  250  251  252  254  256  257  258  259  260  262  263 
  39   10   26   35    7    2    9   14   26    7   25    3    2    2   28   24 
 264  265  266  267  268  269  270  271  272  273  274  275  276  277  278  279 
  36   13   16   10   17   22    2   12   21   17    9    5   22   14    9   24 
 280  281  282  283  284  285  286  287  289  290  291  292  293  294  295  296 
  17    6   36   13   50    3   31   21   16   10    9   33   11   11    8   12 
 297  298  299  300  301  302  306  307  308  310  311  312  313  314  315  316 
  40   57   29    6   26   11   14    5   18    4    9    9    8    7   32   10 
 317  318  319  320  321  322  323  324  325  326  327  328  329  330  331  332 
  46   14    6   15   13    9   15   19   13    7   12   50    5    6   32    7 
 333  334  335  336  337  338  339  340  341  342  343  344  345  346  347  348 
   8   43    7   15    7   14    6   12   11   20   15    6    5   25   13    9 
 349  350  351  352  353  354  355  356  357  358  359  360  361  362  363  364 
  13   14   16    3   19    8    6   16    6   14    7   17    7   27    9    2 
 365  366  367  368  369  370  371  372  373  374  375  376  377  378  379  380 
   2   18   31   13    9   39   23   13    8   23   26   10   16    5    6   16 
 382  383  384  385  386  387  388  389  390  391  392  393  394  395  396  397 
   8    9    2   13   14   11   18   32   32    7   18   12    8   17   33    4 
 398  399  400  401  402  403  404  405  406  407  408  409  410  411  412  413 
  16   12    4   39   19    6   25   11   19   33   19   58   16   27    4    7 
 414  415  416  417  418  419  420  421  422  423  424  425  426  427  428  429 
   4    5   30   19   34   13   10    9   29   16   14   10   13   23    6   30 
 430  431  432  433  434  435  436  437  438  439  440  441  442  443  444  445 
  18   18   20    9   13   15   18   10   31    8    9    6   16   25   17   29 
 446  447  448  449  450  451  452  453  454  455  456  457  458  459  460  461 
  20   17   28   16    7   28    9   43   20   40   27   20   22    7   45   24 
 462  463  464  465  466  467  468  469  470  471  472  473  474  475  476  477 
  27   20   51   13   19    3    8   16   10   16   11   17   52   20   65   14 
 478  479  480  481  482  483  484  485  486  487  488  489  490  491  492  493 
  14   17   20   21   70   31  107   22   24   15   30   14   11   16   30    9 
 494  495  496  497  498  499  500  501  502  503  504  505  506  507  508  509 
  27   12    4   20   24   18   18   23   19   22   10   15   16    3    9   15 
 510  511  512  513  514  515  516  517  518  519  520  521  522  523  524  525 
   3   10   15   29    8    6   26   12    6    7   54   21   15    7   16   16 
 526  527  528  529  530  531  532  533  534  535  536  537  538  539  540  541 
  10    9   23   39   15   29   47    8   14   20   16   22    9   19   35   48 
 542  543  544  545  546  547  548  549  550  551  552  553  554  555  556  557 
  45   27    6   21   14   32   24   15   15    8   80   13    8   40   13    8 
 558  559  560  561  562  563  564  565  566  567  568  569  570  571  572  573 
  32   27   24   18   16    5   25   54   11   27    8   48    6   34   18   10 
 574  575  576  577  578  579  580  581  582  583  584  585  586  587  588  589 
  36   14   31   10   16   40   22   14   48   39   67   20   24   24   25   18 
 590  591  592  593  594  595  596  597  598  599  600  601  602  603  604  605 
  29   22   22   18   23    5   32   13   18   22   27   21    7   38   12   22 
 606  607  608  609  610  611  612  613  614  615  616  617  618  619  620  621 
  22   23   49   35   27   10   13   22   41    7   20   18   12   18   31   80 
 622  623  624  625  626  627  628  629  630  631  632  633  634  635  636  637 
  53   27   67    7   47   18   22    1   22   27   15    8   25   47   47   13 
 638  639  640  641  642  643  644  645  646  647  648  649  650  651  652  653 
  36   18   25   11   14   22   40   25   35   35   66   88    2   73   19   14 
 654  655  656  657  658  659  660  661  662  663  664  665  666  667  668  669 
   6   16   30   22   20   22    5   17   13   43   16   10   17   25   20   33 
 670  671  672  673  674  675  676  677  678  679  680  681  682  683  684  685 
  62   19    9   30   31   13   17    8   19   28   17   10   43   17   13   15 
 686  687  688  689  690  691  692  693  694  695  696  697  698  699  700  701 
   7    8   24   18   19   25   40   59   42   30   13   31   25    7   18   20 
 702  703  704  705  706  707  708  709  710  711  712  713  714  715  716  717 
  56   34   39   32   64   31   19   23   33   19   35   42   12   15   21    7 
 718  719  720  721  722  723  724  725  726  727  728  729  730  731  732  733 
  11   11    5   16   11    9    7   29   32   35    8   30    4   10   18    1 
 734  735  736  737  738  739  740  741  742  743  744  745  746  747  748  749 
   5   17   17    8   22   39   15    5   12   28   44    8   21   43    7   17 
 750  751  752  753  754  755  756  757  758  759  760  761  762  763  764  765 
   5   24   46   42   10   47    6   19   41   10    8    7   14   21   27    7 
 766  767  768  769  770  771  772  773  774  775  776  777  779  780  781  782 
  14    4    6   33    6    4   27   15   23   26   23   22   34    6   10    5 
 783  784  785  786  787  788  789  790  791  792  793  794  795  796  797  798 
   6    7   22   41   15   27   34   15    2    2    9   20   41   20    7   20 
 799  800  801  802  803  804  805  806  807  808  809  810  812  813  814  815 
  10   12    6   43    7   17   12   11    9   13   22    4    8   14   11   16 
 816  817  818  820  821  822  823  824  825  826  827  829  830  831  832  833 
   8   24    5    2    3    8    4    2    3   35   18   14   12    8    5    1 
 834  835  836  837  839  840  841  842  843  844  845  846  847  848  849  850 
  24    8   49    6   20    2   15    9   16   13    3   12   17    4   12    8 
 851  852  854  855  857  858  859  860  861  863  864  865  866  868  869  870 
   5   21    8    7    4    2    8   10   14   22    9    9   15   21    7   13 
 872  873  874  875  876  877  878  879  880  881  882  883  884  885  886  887 
   6   12    1   10    4   24   14   31   51   21    8   51   15    1    3   12 
 888  889  890  891  892  893  894  895  896  898  899  900  901  902  903  904 
  14    4   21   33   14    6   32    4   22    7   20   29    7    7   33   12 
 905  907  908  909  910  911  912  913  914  915  916  917  918  919  920  921 
  13   10    3    3   15   14    8   10    7   28    2    6   14    5   29   29 
 922  923  924  925  926  927  928  929  930  931  932  933  934  935  936  937 
  16   29    2    3   22   15    3    3    7   24    9   19   26   13   12   17 
 938  939  940  941  942  943  944  945  946  947  948  949  950  951  952  953 
  12    6   37   39   10   10   10   13   21   15   21   10   23   27    6   21 
 955  956  958  959  960  961  962  963  964  965  966  967  968  969  970  971 
   5   16   28   59   15    5   23   26    7   18    4    2    8    1   24   23 
 972  973  974  975  976  977  978  979  980  981  982  983  984  985  986  987 
   8   21   21    8   18   13   28   10   14    8    9    9    2   18    2    7 
 988  989  990  991  992  993  994  995  996  997  999 1000 1001 1002 1004 1005 
  10   15   11   12    2   11   79    8    6    6    9    1   36    7   15    6 
1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 
  13    3   15    6   33   16   14   17   16    9   15   17   16    1   10    1 
1022 1023 1024 1025 1026 1027 1028 1030 1031 1032 1033 1034 1035 1036 1037 1038 
  19   54   17    7   13    4   23   16   25   18   15   26    1   18   19   19 
1039 1040 1041 1042 1043 1044 1045 1048 1051 1052 1053 1054 1055 1056 1057 1058 
  23    7    4    4   15    9   14   10   14   19   14    5   34   11    5   15 
1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 
  18   10   12    8    9    6    7   24   10   19   16    6   10    8   11    3 
1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 
  21    8   20    9   12   17    5   12   27   13   12   15   12   19   18    7 
1091 1092 1094 1095 1096 1097 1098 1099 1100 1101 1102 1104 1105 1106 1107 1108 
  45    6   11   18   38    2   10   10   13    7    8   12   17    6   16    9 
1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1123 1124 1125 1126 
   2    8   26   21   19   14   12   33    2   32    9   26    1   15    8   13 
1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1143 
  27   33   19   10   16   29   16   19   26   19   35   22    2    1   12    9 
1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 
   5    2   11   28   15   12    9   18    6   34    6   13   21   13    7    5 
1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 
   7   14    9   23   17   21    9   19   11   19   24    1   25   37    7   14 
1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 
   7   10   11    8    8   11    1   24    9    7   30   20   11    1    3   15 
1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 
  29   28    2   23   10    7   15   11    8    4   14    5    3   21    3    7 
1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1221 1222 1223 1224 
  14    5    3    7   10    2   13    7    6   30   18   14   28   24   10    7 
1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1238 1239 1240 1241 
  19    2   34    3   14    9   12   11   10   24    2   24    3   15   20    6 
1242 1243 1244 1245 1246 1247 1248 1249 1251 1252 1253 1254 1255 1256 1257 1258 
  15   14    7    9    2    8   10    9   16    5   16   22   15   30   25   19 
1259 1260 1261 1262 1263 1264 1265 1266 1267 1269 1270 1271 1273 1274 1275 1276 
   3   17   16   29    6   12   17    1   25   35   12    1    8   19    2   10 
1277 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 
  38    5    4   37    3   16    2   15   16   10   25    4   20   11   11    1 
1295 1296 1297 1298 1299 1300 1302 1303 1304 1306 1307 1308 1309 1310 1311 1313 
  17   56    5   16    5    9    4   14   22    3   29   22    7   11   12    8 
1314 1315 1316 1317 1318 1319 1320 1321 1323 1324 1325 1326 1327 1328 1329 1330 
  77   24    3   31   10   12   27    8   29   16    2   12    8   16   22    8 
1332 1333 1334 1336 1337 1338 1339 1340 1342 1343 1344 1345 1346 1347 1348 1349 
   9    1    6   28    7   40    1    6   14    7   31    9    2   56   30   24 
1350 1351 1352 1353 1354 1355 1356 1358 1359 1360 1361 1362 1363 1364 1365 1368 
   7    5   15   10    1    5   18    7   19   26   12   48   38   26   13   10 
1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1381 1382 1383 1385 1386 
  13    9   35    4    4    9   15    9   13   14   11   13    1   16   20   13 
1387 1388 1389 1391 1392 1395 1396 1397 1398 1400 1401 1402 1403 1404 1405 1406 
   9    5    4   17    9   19   20    1    4   16    4    1    1   21   22   15 
1407 1408 1409 1410 1411 1412 1413 1416 1417 1418 1421 1424 1425 1426 1427 1428 
  17   14    3   14    1    3   10    7    9    3    5    3    2   13    8   21 
1429 1430 1431 1433 1434 1435 1436 1437 1438 1439 1441 1443 1444 1445 1446 1447 
  13   23    4    6    4    2   11    2    3    1    5    1    2    2   12    4 
1448 1450 1451 1452 1453 1454 1455 1456 1458 1459 1460 1462 1463 1465 1466 1467 
   8    4    2    4   11   13    3    2    6    5    2    1   10   17   15   24 
1468 1469 1470 1471 1473 1474 1475 1477 1478 1479 1480 1481 1482 1483 1484 1485 
   3    5    2    7    1   10   14    3    6    2    6    8    1    2    2    2 
1487 1489 1490 1492 1493 1494 1495 1496 1497 1499 1501 1502 1504 1505 1507 1510 
   4   18    1    2    7    1    4    1    9   14   12    1    7    7   23    4 
1513 1514 1515 1516 1517 1518 1523 1524 1525 1526 1528 1531 1532 1535 1536 1538 
  11   13    8    5   14    2    1    3    1   13    4    9   18    1    3    1 
1539 1540 1543 1544 1545 1548 1549 1551 1552 1554 1555 1556 1558 1559 1560 1565 
   3   19    2    3    9    7    6   16    7   15   24   28    3    1   14    3 
1566 1567 1568 1571 1576 1580 1582 1583 1585 1586 1587 1589 1590 1591 1593 1595 
  10    6    3    7    5    5   13    4    9    1    5    3    5   12    9    2 
1597 1599 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1612 1615 1617 1618 
   6    3    3    6    3   13    4   11    3    3   11   14   21   13   15    2 
1619 1620 1624 1626 1630 1632 1634 1636 1637 1643 1647 1648 1649 1650 1652 1656 
   2   20    8   11   16    3   18   20   20    1   13    4   10    8    5    8 
1658 1659 1660 1662 1663 1667 1673 1674 1675 1677 1681 1684 1686 1687 1690 1697 
   1   12    1    2    8   18    4    1    1    5    4   10    1    2    8    4 
1702 1703 1705 1706 1707 1710 1715 1717 1718 1720 1721 1722 1723 1725 1727 1728 
  13    6   27    8    1   11    4    6    9    8    1    2   10    2    2    5 
1729 1731 1732 1734 1735 1737 1743 1744 1745 1746 1747 1748 1751 1754 1757 1758 
   4    2    1    3    1    4    4    6   13    1    2    3    7    8    3   10 
1759 1761 1762 1763 1764 1765 1766 1767 1768 1769 1771 1772 1773 1778 1779 1781 
   9    5    1   11    8   21    2    3    6    3    9    2    6   24    1   11 
1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1799 1800 1801 1803 1804 
   3    5    6    7    4    4    1    3   16    1   10    4    8    7    2    3 
1805 1809 1811 1812 1817 1819 1820 1824 1826 1827 1829 1830 1831 1834 1836 1838 
   5    2   16    3    1    8   11    4    1    1   10   18    1   15    9    3 
1839 1840 1841 1842 1843 1845 1846 1847 1848 1849 1850 1851 1855 1856 1857 1858 
   9    5    3    6    9    6   13    8   15    1    5   11    6    9    2   11 
1859 1860 1862 1863 1864 1865 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 
   2    8   28   17    2    5    3    6    5    5    4    3    3    3    4    5 
1878 1881 1887 1889 1890 1892 1894 1896 1897 1899 1900 1905 1907 1908 1911 1912 
  23   10    4    1   18    1    4    7   22    3    4   19    1   10    7    6 
1913 1914 1915 1916 1917 1919 1922 1923 1924 1926 1929 1930 1932 1934 1935 1936 
  12    7   14    1    2   10   13    7    7    2    2    9    2   10   66    9 
1940 1941 1945 1953 1955 1958 1959 1960 1962 1963 1965 1972 1973 1976 1979 1980 
   6   12   11    4   18    4    5    5    7   12    6   11   18    7    5    6 
1982 1984 1986 1988 1989 1994 1995 2002 2003 2006 2007 2017 2025 2026 2029 2031 
  13    7    6    6    2   11    2    3   37   11    8   17    2    6   20   10 
2032 2033 2034 2035 2037 2039 2040 2042 2044 2045 2046 2047 2048 2049 2051 2054 
   3   11    5    6    4   14   26    1    2    4    3   17   12    1    7    1 
2055 2057 2063 2065 2066 2067 2068 2070 2071 2074 2075 2076 2077 2078 2080 2083 
   2   14    5   14    4    6   38   11    9    4    4   13    1   20    4   23 
2084 2085 2088 2089 2091 2092 2093 2094 2098 2100 2102 2103 2106 2108 2109 2110 
  11   34    2   13    6   17    8   10    4    7    6   19    3    1    7   18 
2111 2112 2115 2116 2118 2120 2121 2122 2123 2125 2126 2128 2131 2132 2133 2135 
   1    9   11   44    8   14   28   27    3   17   11   21   12    3   11    5 
2137 2139 2140 2141 2145 2146 2147 2148 2149 2150 2152 2153 2155 2157 2158 2160 
  19   14    3   19   17    1    1   21    3    3    8    8    7   36    4   15 
2162 2163 2164 2165 2167 2171 2172 2175 2176 2179 2180 2181 2186 2189 2191 2193 
  17    4    8    4   14    9   12   18    6    7    2    3   14    3    6   10 
2194 2195 2197 2198 2199 2203 2205 2206 2209 2212 2213 2215 2217 2220 2222 2224 
  48   18    6    1    3    5    6    8   28   11    8    5   17    6    2    3 
2227 2228 2229 2230 2234 2238 2242 2247 2249 2251 2252 2253 2254 2256 2258 2262 
   7    4   12   11    3    6    3    2   21    3    1   13    2   28    5    2 
2264 2266 2267 2271 2272 2273 2275 2276 2277 2279 2283 2287 2292 2298 2299 2301 
   2    9    1    3    3    6    2    4   16    2    6   16    2    1   50    2 
2302 2303 2305 2306 2307 2310 2315 2317 2318 2324 2328 2330 2332 2335 2337 2338 
   8    5    6    6    4    3    3    1    2    6    1    3    3    3   11    2 
2346 2349 2350 2351 2352 2354 2355 2362 2363 2366 2369 2370 2373 2374 2379 2381 
  10    4    5    4    7   11   10    2   10   22    3    1    2    7    2    2 
2385 2389 2391 2393 2395 2396 2406 2416 2417 2421 2422 2423 2424 2426 2427 2429 
   1   20    6    8   11    7    3    1   11    2    1   11    6    5    2    1 
2430 2434 2436 2438 2440 2444 2446 2447 2455 2456 2458 2460 2462 2463 2468 2473 
   3   10    3   16    5    8    3    2    6    5   14    4   28    2    4    8 
2474 2483 2491 2495 2496 2501 2510 2515 2521 2526 2531 2534 2538 2542 2547 2548 
  13   14    8    3    4   20    7    5   15    3    1    1    9   10    2    4 
2552 2561 2562 2567 2569 2576 2585 2587 2593 2597 2598 2601 2606 2609 2611 2612 
   8    2    1   15    7    7    1    9    1    9   20    7    5   11   12    6 
2617 2632 2647 2651 2653 2654 2655 2659 2665 2672 2674 2675 2678 2682 2688 2689 
  26    5    2    3    4    1    7   11    8   18    2    7    2   15   15    3 
2694 2700 2703 2710 2717 2722 2727 2730 2733 2734 2739 2744 2750 2751 2754 2755 
  16   20    4   13    6   25    3    3    2   20    2    8   14    4    6    9 
2762 2764 2765 2774 2782 2783 2788 2789 2790 2791 2792 2793 2800 2801 2803 2804 
   4    1   14   10   15    5    8   14    9   13   13    2    4    8   10    3 
2805 2811 2822 2827 2829 2830 2833 2834 2840 2841 2842 2845 2847 2852 2856 2859 
  11    4    9   10   10    4    3    1    4   10    3    1    4   18   10    5 
2864 2866 2870 2872 2877 2881 2886 2898 2900 2914 2918 2925 2926 2927 2930 2932 
   5    2   13    2    6   19    6    7    7    1    6   12    3   13    2    5 
2936 2942 2943 2950 2953 2955 2959 2967 2968 2974 2975 2976 2982 2986 2987 2994 
   5   24    1    4    2    8    9    5    5    5    6    2    3   17    4   10 
2996 3002 3007 3008 3012 3017 3026 3031 3037 3040 3054 3058 3067 3068 3069 3071 
   2   18   15    8    2    3    7    4    2    2    4    3    4    5    5    3 
3074 3076 3086 3095 3105 3111 3115 3135 3138 3139 3142 3144 3146 3155 3156 3168 
   3    2    2    5    3    9    1    3    7    2    8    6    4    2    2    6 
3175 3176 3178 3182 3185 3187 3189 3194 3196 3203 3212 3221 3229 3241 3244 3245 
   3    2    1    8    2    3    9    4    3    3    1    4   10    4   12    3 
3252 3255 3264 3265 3267 3278 3285 3291 3307 3310 3313 3317 3322 3328 3331 3333 
   3   10   10   13    4    3    6    2    4    7    2    6    1    2    1    2 
3338 3343 3357 3405 3424 3441 3452 3473 3486 
   6    2    1    1    6    4    2    1    1 
6 least connected regions:
5233 5270 5273 14652 22145 22146 with 5 links
1 most connected region:
6060 with 3486 links

Converting Output neighbour list into Spatial Weights

nb_lw <- nb2listw(nb, style = 'W', zero.policy = TRUE)
summary(nb_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 25502 
Number of nonzero links: 26961048 
Percentage nonzero weights: 4.145609 
Average number of links: 1057.213 
Link number distribution:

   5   28   33   40   85   88   91   93  111  112  135  150  153  155  156  162 
   6   29   25    9    3   84   11    5    1    3    3  129   19    3    1    1 
 164  167  168  172  179  181  184  185  193  195  197  198  199  200  201  203 
   1    2    5    3    5    5    1    1   10    3   10    6   20    2   48    5 
 204  205  206  208  210  211  212  214  215  217  218  219  220  221  222  223 
   5   91   15   21    1    3    1   13   13    8    5    2   29    8    3   14 
 224  226  227  228  230  232  234  235  236  238  239  240  241  242  243  244 
   6    5   41   15    2   18   19    1   17    6    6    6   17   18    8    4 
 245  246  247  248  249  250  251  252  254  256  257  258  259  260  262  263 
  39   10   26   35    7    2    9   14   26    7   25    3    2    2   28   24 
 264  265  266  267  268  269  270  271  272  273  274  275  276  277  278  279 
  36   13   16   10   17   22    2   12   21   17    9    5   22   14    9   24 
 280  281  282  283  284  285  286  287  289  290  291  292  293  294  295  296 
  17    6   36   13   50    3   31   21   16   10    9   33   11   11    8   12 
 297  298  299  300  301  302  306  307  308  310  311  312  313  314  315  316 
  40   57   29    6   26   11   14    5   18    4    9    9    8    7   32   10 
 317  318  319  320  321  322  323  324  325  326  327  328  329  330  331  332 
  46   14    6   15   13    9   15   19   13    7   12   50    5    6   32    7 
 333  334  335  336  337  338  339  340  341  342  343  344  345  346  347  348 
   8   43    7   15    7   14    6   12   11   20   15    6    5   25   13    9 
 349  350  351  352  353  354  355  356  357  358  359  360  361  362  363  364 
  13   14   16    3   19    8    6   16    6   14    7   17    7   27    9    2 
 365  366  367  368  369  370  371  372  373  374  375  376  377  378  379  380 
   2   18   31   13    9   39   23   13    8   23   26   10   16    5    6   16 
 382  383  384  385  386  387  388  389  390  391  392  393  394  395  396  397 
   8    9    2   13   14   11   18   32   32    7   18   12    8   17   33    4 
 398  399  400  401  402  403  404  405  406  407  408  409  410  411  412  413 
  16   12    4   39   19    6   25   11   19   33   19   58   16   27    4    7 
 414  415  416  417  418  419  420  421  422  423  424  425  426  427  428  429 
   4    5   30   19   34   13   10    9   29   16   14   10   13   23    6   30 
 430  431  432  433  434  435  436  437  438  439  440  441  442  443  444  445 
  18   18   20    9   13   15   18   10   31    8    9    6   16   25   17   29 
 446  447  448  449  450  451  452  453  454  455  456  457  458  459  460  461 
  20   17   28   16    7   28    9   43   20   40   27   20   22    7   45   24 
 462  463  464  465  466  467  468  469  470  471  472  473  474  475  476  477 
  27   20   51   13   19    3    8   16   10   16   11   17   52   20   65   14 
 478  479  480  481  482  483  484  485  486  487  488  489  490  491  492  493 
  14   17   20   21   70   31  107   22   24   15   30   14   11   16   30    9 
 494  495  496  497  498  499  500  501  502  503  504  505  506  507  508  509 
  27   12    4   20   24   18   18   23   19   22   10   15   16    3    9   15 
 510  511  512  513  514  515  516  517  518  519  520  521  522  523  524  525 
   3   10   15   29    8    6   26   12    6    7   54   21   15    7   16   16 
 526  527  528  529  530  531  532  533  534  535  536  537  538  539  540  541 
  10    9   23   39   15   29   47    8   14   20   16   22    9   19   35   48 
 542  543  544  545  546  547  548  549  550  551  552  553  554  555  556  557 
  45   27    6   21   14   32   24   15   15    8   80   13    8   40   13    8 
 558  559  560  561  562  563  564  565  566  567  568  569  570  571  572  573 
  32   27   24   18   16    5   25   54   11   27    8   48    6   34   18   10 
 574  575  576  577  578  579  580  581  582  583  584  585  586  587  588  589 
  36   14   31   10   16   40   22   14   48   39   67   20   24   24   25   18 
 590  591  592  593  594  595  596  597  598  599  600  601  602  603  604  605 
  29   22   22   18   23    5   32   13   18   22   27   21    7   38   12   22 
 606  607  608  609  610  611  612  613  614  615  616  617  618  619  620  621 
  22   23   49   35   27   10   13   22   41    7   20   18   12   18   31   80 
 622  623  624  625  626  627  628  629  630  631  632  633  634  635  636  637 
  53   27   67    7   47   18   22    1   22   27   15    8   25   47   47   13 
 638  639  640  641  642  643  644  645  646  647  648  649  650  651  652  653 
  36   18   25   11   14   22   40   25   35   35   66   88    2   73   19   14 
 654  655  656  657  658  659  660  661  662  663  664  665  666  667  668  669 
   6   16   30   22   20   22    5   17   13   43   16   10   17   25   20   33 
 670  671  672  673  674  675  676  677  678  679  680  681  682  683  684  685 
  62   19    9   30   31   13   17    8   19   28   17   10   43   17   13   15 
 686  687  688  689  690  691  692  693  694  695  696  697  698  699  700  701 
   7    8   24   18   19   25   40   59   42   30   13   31   25    7   18   20 
 702  703  704  705  706  707  708  709  710  711  712  713  714  715  716  717 
  56   34   39   32   64   31   19   23   33   19   35   42   12   15   21    7 
 718  719  720  721  722  723  724  725  726  727  728  729  730  731  732  733 
  11   11    5   16   11    9    7   29   32   35    8   30    4   10   18    1 
 734  735  736  737  738  739  740  741  742  743  744  745  746  747  748  749 
   5   17   17    8   22   39   15    5   12   28   44    8   21   43    7   17 
 750  751  752  753  754  755  756  757  758  759  760  761  762  763  764  765 
   5   24   46   42   10   47    6   19   41   10    8    7   14   21   27    7 
 766  767  768  769  770  771  772  773  774  775  776  777  779  780  781  782 
  14    4    6   33    6    4   27   15   23   26   23   22   34    6   10    5 
 783  784  785  786  787  788  789  790  791  792  793  794  795  796  797  798 
   6    7   22   41   15   27   34   15    2    2    9   20   41   20    7   20 
 799  800  801  802  803  804  805  806  807  808  809  810  812  813  814  815 
  10   12    6   43    7   17   12   11    9   13   22    4    8   14   11   16 
 816  817  818  820  821  822  823  824  825  826  827  829  830  831  832  833 
   8   24    5    2    3    8    4    2    3   35   18   14   12    8    5    1 
 834  835  836  837  839  840  841  842  843  844  845  846  847  848  849  850 
  24    8   49    6   20    2   15    9   16   13    3   12   17    4   12    8 
 851  852  854  855  857  858  859  860  861  863  864  865  866  868  869  870 
   5   21    8    7    4    2    8   10   14   22    9    9   15   21    7   13 
 872  873  874  875  876  877  878  879  880  881  882  883  884  885  886  887 
   6   12    1   10    4   24   14   31   51   21    8   51   15    1    3   12 
 888  889  890  891  892  893  894  895  896  898  899  900  901  902  903  904 
  14    4   21   33   14    6   32    4   22    7   20   29    7    7   33   12 
 905  907  908  909  910  911  912  913  914  915  916  917  918  919  920  921 
  13   10    3    3   15   14    8   10    7   28    2    6   14    5   29   29 
 922  923  924  925  926  927  928  929  930  931  932  933  934  935  936  937 
  16   29    2    3   22   15    3    3    7   24    9   19   26   13   12   17 
 938  939  940  941  942  943  944  945  946  947  948  949  950  951  952  953 
  12    6   37   39   10   10   10   13   21   15   21   10   23   27    6   21 
 955  956  958  959  960  961  962  963  964  965  966  967  968  969  970  971 
   5   16   28   59   15    5   23   26    7   18    4    2    8    1   24   23 
 972  973  974  975  976  977  978  979  980  981  982  983  984  985  986  987 
   8   21   21    8   18   13   28   10   14    8    9    9    2   18    2    7 
 988  989  990  991  992  993  994  995  996  997  999 1000 1001 1002 1004 1005 
  10   15   11   12    2   11   79    8    6    6    9    1   36    7   15    6 
1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 
  13    3   15    6   33   16   14   17   16    9   15   17   16    1   10    1 
1022 1023 1024 1025 1026 1027 1028 1030 1031 1032 1033 1034 1035 1036 1037 1038 
  19   54   17    7   13    4   23   16   25   18   15   26    1   18   19   19 
1039 1040 1041 1042 1043 1044 1045 1048 1051 1052 1053 1054 1055 1056 1057 1058 
  23    7    4    4   15    9   14   10   14   19   14    5   34   11    5   15 
1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 
  18   10   12    8    9    6    7   24   10   19   16    6   10    8   11    3 
1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 
  21    8   20    9   12   17    5   12   27   13   12   15   12   19   18    7 
1091 1092 1094 1095 1096 1097 1098 1099 1100 1101 1102 1104 1105 1106 1107 1108 
  45    6   11   18   38    2   10   10   13    7    8   12   17    6   16    9 
1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1123 1124 1125 1126 
   2    8   26   21   19   14   12   33    2   32    9   26    1   15    8   13 
1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1143 
  27   33   19   10   16   29   16   19   26   19   35   22    2    1   12    9 
1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 
   5    2   11   28   15   12    9   18    6   34    6   13   21   13    7    5 
1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 
   7   14    9   23   17   21    9   19   11   19   24    1   25   37    7   14 
1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 
   7   10   11    8    8   11    1   24    9    7   30   20   11    1    3   15 
1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 
  29   28    2   23   10    7   15   11    8    4   14    5    3   21    3    7 
1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1221 1222 1223 1224 
  14    5    3    7   10    2   13    7    6   30   18   14   28   24   10    7 
1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1238 1239 1240 1241 
  19    2   34    3   14    9   12   11   10   24    2   24    3   15   20    6 
1242 1243 1244 1245 1246 1247 1248 1249 1251 1252 1253 1254 1255 1256 1257 1258 
  15   14    7    9    2    8   10    9   16    5   16   22   15   30   25   19 
1259 1260 1261 1262 1263 1264 1265 1266 1267 1269 1270 1271 1273 1274 1275 1276 
   3   17   16   29    6   12   17    1   25   35   12    1    8   19    2   10 
1277 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 
  38    5    4   37    3   16    2   15   16   10   25    4   20   11   11    1 
1295 1296 1297 1298 1299 1300 1302 1303 1304 1306 1307 1308 1309 1310 1311 1313 
  17   56    5   16    5    9    4   14   22    3   29   22    7   11   12    8 
1314 1315 1316 1317 1318 1319 1320 1321 1323 1324 1325 1326 1327 1328 1329 1330 
  77   24    3   31   10   12   27    8   29   16    2   12    8   16   22    8 
1332 1333 1334 1336 1337 1338 1339 1340 1342 1343 1344 1345 1346 1347 1348 1349 
   9    1    6   28    7   40    1    6   14    7   31    9    2   56   30   24 
1350 1351 1352 1353 1354 1355 1356 1358 1359 1360 1361 1362 1363 1364 1365 1368 
   7    5   15   10    1    5   18    7   19   26   12   48   38   26   13   10 
1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1381 1382 1383 1385 1386 
  13    9   35    4    4    9   15    9   13   14   11   13    1   16   20   13 
1387 1388 1389 1391 1392 1395 1396 1397 1398 1400 1401 1402 1403 1404 1405 1406 
   9    5    4   17    9   19   20    1    4   16    4    1    1   21   22   15 
1407 1408 1409 1410 1411 1412 1413 1416 1417 1418 1421 1424 1425 1426 1427 1428 
  17   14    3   14    1    3   10    7    9    3    5    3    2   13    8   21 
1429 1430 1431 1433 1434 1435 1436 1437 1438 1439 1441 1443 1444 1445 1446 1447 
  13   23    4    6    4    2   11    2    3    1    5    1    2    2   12    4 
1448 1450 1451 1452 1453 1454 1455 1456 1458 1459 1460 1462 1463 1465 1466 1467 
   8    4    2    4   11   13    3    2    6    5    2    1   10   17   15   24 
1468 1469 1470 1471 1473 1474 1475 1477 1478 1479 1480 1481 1482 1483 1484 1485 
   3    5    2    7    1   10   14    3    6    2    6    8    1    2    2    2 
1487 1489 1490 1492 1493 1494 1495 1496 1497 1499 1501 1502 1504 1505 1507 1510 
   4   18    1    2    7    1    4    1    9   14   12    1    7    7   23    4 
1513 1514 1515 1516 1517 1518 1523 1524 1525 1526 1528 1531 1532 1535 1536 1538 
  11   13    8    5   14    2    1    3    1   13    4    9   18    1    3    1 
1539 1540 1543 1544 1545 1548 1549 1551 1552 1554 1555 1556 1558 1559 1560 1565 
   3   19    2    3    9    7    6   16    7   15   24   28    3    1   14    3 
1566 1567 1568 1571 1576 1580 1582 1583 1585 1586 1587 1589 1590 1591 1593 1595 
  10    6    3    7    5    5   13    4    9    1    5    3    5   12    9    2 
1597 1599 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1612 1615 1617 1618 
   6    3    3    6    3   13    4   11    3    3   11   14   21   13   15    2 
1619 1620 1624 1626 1630 1632 1634 1636 1637 1643 1647 1648 1649 1650 1652 1656 
   2   20    8   11   16    3   18   20   20    1   13    4   10    8    5    8 
1658 1659 1660 1662 1663 1667 1673 1674 1675 1677 1681 1684 1686 1687 1690 1697 
   1   12    1    2    8   18    4    1    1    5    4   10    1    2    8    4 
1702 1703 1705 1706 1707 1710 1715 1717 1718 1720 1721 1722 1723 1725 1727 1728 
  13    6   27    8    1   11    4    6    9    8    1    2   10    2    2    5 
1729 1731 1732 1734 1735 1737 1743 1744 1745 1746 1747 1748 1751 1754 1757 1758 
   4    2    1    3    1    4    4    6   13    1    2    3    7    8    3   10 
1759 1761 1762 1763 1764 1765 1766 1767 1768 1769 1771 1772 1773 1778 1779 1781 
   9    5    1   11    8   21    2    3    6    3    9    2    6   24    1   11 
1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1799 1800 1801 1803 1804 
   3    5    6    7    4    4    1    3   16    1   10    4    8    7    2    3 
1805 1809 1811 1812 1817 1819 1820 1824 1826 1827 1829 1830 1831 1834 1836 1838 
   5    2   16    3    1    8   11    4    1    1   10   18    1   15    9    3 
1839 1840 1841 1842 1843 1845 1846 1847 1848 1849 1850 1851 1855 1856 1857 1858 
   9    5    3    6    9    6   13    8   15    1    5   11    6    9    2   11 
1859 1860 1862 1863 1864 1865 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 
   2    8   28   17    2    5    3    6    5    5    4    3    3    3    4    5 
1878 1881 1887 1889 1890 1892 1894 1896 1897 1899 1900 1905 1907 1908 1911 1912 
  23   10    4    1   18    1    4    7   22    3    4   19    1   10    7    6 
1913 1914 1915 1916 1917 1919 1922 1923 1924 1926 1929 1930 1932 1934 1935 1936 
  12    7   14    1    2   10   13    7    7    2    2    9    2   10   66    9 
1940 1941 1945 1953 1955 1958 1959 1960 1962 1963 1965 1972 1973 1976 1979 1980 
   6   12   11    4   18    4    5    5    7   12    6   11   18    7    5    6 
1982 1984 1986 1988 1989 1994 1995 2002 2003 2006 2007 2017 2025 2026 2029 2031 
  13    7    6    6    2   11    2    3   37   11    8   17    2    6   20   10 
2032 2033 2034 2035 2037 2039 2040 2042 2044 2045 2046 2047 2048 2049 2051 2054 
   3   11    5    6    4   14   26    1    2    4    3   17   12    1    7    1 
2055 2057 2063 2065 2066 2067 2068 2070 2071 2074 2075 2076 2077 2078 2080 2083 
   2   14    5   14    4    6   38   11    9    4    4   13    1   20    4   23 
2084 2085 2088 2089 2091 2092 2093 2094 2098 2100 2102 2103 2106 2108 2109 2110 
  11   34    2   13    6   17    8   10    4    7    6   19    3    1    7   18 
2111 2112 2115 2116 2118 2120 2121 2122 2123 2125 2126 2128 2131 2132 2133 2135 
   1    9   11   44    8   14   28   27    3   17   11   21   12    3   11    5 
2137 2139 2140 2141 2145 2146 2147 2148 2149 2150 2152 2153 2155 2157 2158 2160 
  19   14    3   19   17    1    1   21    3    3    8    8    7   36    4   15 
2162 2163 2164 2165 2167 2171 2172 2175 2176 2179 2180 2181 2186 2189 2191 2193 
  17    4    8    4   14    9   12   18    6    7    2    3   14    3    6   10 
2194 2195 2197 2198 2199 2203 2205 2206 2209 2212 2213 2215 2217 2220 2222 2224 
  48   18    6    1    3    5    6    8   28   11    8    5   17    6    2    3 
2227 2228 2229 2230 2234 2238 2242 2247 2249 2251 2252 2253 2254 2256 2258 2262 
   7    4   12   11    3    6    3    2   21    3    1   13    2   28    5    2 
2264 2266 2267 2271 2272 2273 2275 2276 2277 2279 2283 2287 2292 2298 2299 2301 
   2    9    1    3    3    6    2    4   16    2    6   16    2    1   50    2 
2302 2303 2305 2306 2307 2310 2315 2317 2318 2324 2328 2330 2332 2335 2337 2338 
   8    5    6    6    4    3    3    1    2    6    1    3    3    3   11    2 
2346 2349 2350 2351 2352 2354 2355 2362 2363 2366 2369 2370 2373 2374 2379 2381 
  10    4    5    4    7   11   10    2   10   22    3    1    2    7    2    2 
2385 2389 2391 2393 2395 2396 2406 2416 2417 2421 2422 2423 2424 2426 2427 2429 
   1   20    6    8   11    7    3    1   11    2    1   11    6    5    2    1 
2430 2434 2436 2438 2440 2444 2446 2447 2455 2456 2458 2460 2462 2463 2468 2473 
   3   10    3   16    5    8    3    2    6    5   14    4   28    2    4    8 
2474 2483 2491 2495 2496 2501 2510 2515 2521 2526 2531 2534 2538 2542 2547 2548 
  13   14    8    3    4   20    7    5   15    3    1    1    9   10    2    4 
2552 2561 2562 2567 2569 2576 2585 2587 2593 2597 2598 2601 2606 2609 2611 2612 
   8    2    1   15    7    7    1    9    1    9   20    7    5   11   12    6 
2617 2632 2647 2651 2653 2654 2655 2659 2665 2672 2674 2675 2678 2682 2688 2689 
  26    5    2    3    4    1    7   11    8   18    2    7    2   15   15    3 
2694 2700 2703 2710 2717 2722 2727 2730 2733 2734 2739 2744 2750 2751 2754 2755 
  16   20    4   13    6   25    3    3    2   20    2    8   14    4    6    9 
2762 2764 2765 2774 2782 2783 2788 2789 2790 2791 2792 2793 2800 2801 2803 2804 
   4    1   14   10   15    5    8   14    9   13   13    2    4    8   10    3 
2805 2811 2822 2827 2829 2830 2833 2834 2840 2841 2842 2845 2847 2852 2856 2859 
  11    4    9   10   10    4    3    1    4   10    3    1    4   18   10    5 
2864 2866 2870 2872 2877 2881 2886 2898 2900 2914 2918 2925 2926 2927 2930 2932 
   5    2   13    2    6   19    6    7    7    1    6   12    3   13    2    5 
2936 2942 2943 2950 2953 2955 2959 2967 2968 2974 2975 2976 2982 2986 2987 2994 
   5   24    1    4    2    8    9    5    5    5    6    2    3   17    4   10 
2996 3002 3007 3008 3012 3017 3026 3031 3037 3040 3054 3058 3067 3068 3069 3071 
   2   18   15    8    2    3    7    4    2    2    4    3    4    5    5    3 
3074 3076 3086 3095 3105 3111 3115 3135 3138 3139 3142 3144 3146 3155 3156 3168 
   3    2    2    5    3    9    1    3    7    2    8    6    4    2    2    6 
3175 3176 3178 3182 3185 3187 3189 3194 3196 3203 3212 3221 3229 3241 3244 3245 
   3    2    1    8    2    3    9    4    3    3    1    4   10    4   12    3 
3252 3255 3264 3265 3267 3278 3285 3291 3307 3310 3313 3317 3322 3328 3331 3333 
   3   10   10   13    4    3    6    2    4    7    2    6    1    2    1    2 
3338 3343 3357 3405 3424 3441 3452 3473 3486 
   6    2    1    1    6    4    2    1    1 
6 least connected regions:
5233 5270 5273 14652 22145 22146 with 5 links
1 most connected region:
6060 with 3486 links

Weights style: W 
Weights constants summary:
      n        nn    S0       S1       S2
W 25502 650352004 25502 79.06603 102796.2

Perform Moran’s I test for residual spatial autocorrelation

lm.morantest(resale_mlr, nb_lw)

    Global Moran I for regression residuals

data:  
model: lm(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY + PROX_HAWKER +
PROX_MRT + PROX_PARK + PROX_SHOPPING + PROX_SUPERMARKET +
WITHIN_1KM_PRIMARY + WITHIN_350M_BUS + WITHIN_350M_CHILDCARE +
WITHIN_350M_KINDERGARTEN + LEASE_MTHS, data = resale_main_nogeo)
weights: nb_lw

Moran I statistic standard deviate = 895.35, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    2.778070e-01    -2.755883e-04     9.646277e-08 
Note

Since the P value is less than the Alpha value of 0.05, we reject the null hypothesis that residuals are randomly distributed.

Also, since the Observed Moran I is greater than 0, we can infer that the residuals are more of a cluster distribution.

6 Predictive Models

6.1 Data Prep

mdata <- read_rds("data/model/resale_main.rds")

Viewing the data

glimpse(mdata)
Rows: 25,502
Columns: 28
$ month                    <chr> "2021-01", "2021-01", "2021-01", "2021-01", "…
$ town                     <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "AN…
$ address                  <chr> "547 ANG MO KIO AVE 10", "414 ANG MO KIO AVE …
$ block                    <chr> "547", "414", "509", "467", "571", "134", "20…
$ street_name              <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10", "AN…
$ flat_type                <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", "4 RO…
$ STOREY                   <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09…
$ AREA_SQM                 <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 1…
$ flat_model               <chr> "New Generation", "New Generation", "New Gene…
$ lease_commence_date      <dbl> 1981, 1979, 1980, 1979, 1979, 1978, 1977, 197…
$ LEASE_MTHS               <dbl> 708, 693, 702, 695, 689, 681, 661, 682, 692, …
$ PRICE                    <dbl> 370000, 375000, 380000, 385000, 410000, 41000…
$ postal                   <chr> "560547", "560414", "560509", "560467", "5605…
$ geometry                 <POINT [m]> POINT (30770.07 39578.64), POINT (30292…
$ PROX_CBD                 <dbl> 9.564575, 8.401690, 9.516492, 8.580908, 9.084…
$ PROX_BUS                 <dbl> 0.16157609, 0.16740841, 0.07424143, 0.0887911…
$ PROX_CHILDCARE           <dbl> 2.493662e-01, 6.715056e-02, 1.385583e-01, 1.4…
$ PROX_ELDERCARE           <dbl> 1.08567795, 0.15039052, 0.72242472, 0.0981628…
$ PROX_HAWKER              <dbl> 0.4442515, 0.2050009, 0.4495734, 0.3190679, 0…
$ PROX_GOODPRIMARY         <dbl> 3.182527, 2.354024, 2.414729, 2.699653, 2.648…
$ PROX_PARK                <dbl> 0.8291527, 0.7847864, 0.3796058, 0.9242129, 0…
$ PROX_SUPERMARKET         <dbl> 0.4184204, 0.1946009, 0.4435109, 0.4269715, 0…
$ PROX_SHOPPING            <dbl> 1.1817959, 0.8444986, 0.2966736, 0.9304149, 0…
$ PROX_MRT                 <dbl> 1.0731215, 0.8245176, 0.4544926, 0.9503956, 0…
$ WITHIN_350M_KINDERGARTEN <dbl> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, …
$ WITHIN_350M_CHILDCARE    <dbl> 2, 3, 3, 3, 3, 2, 6, 3, 3, 3, 3, 3, 5, 2, 3, …
$ WITHIN_350M_BUS          <dbl> 4, 7, 10, 4, 8, 2, 8, 7, 6, 7, 7, 7, 8, 8, 11…
$ WITHIN_1KM_PRIMARY       <dbl> 1, 3, 2, 3, 2, 2, 3, 2, 3, 3, 1, 2, 3, 2, 2, …

6.1.1 Filtering Test and Training Data

test_data <- filter(mdata) %>%
  filter(month >= "2023-01" & month <= "2023-02")

train_data <- filter(mdata) %>%
  filter(month >= "2021-01" & month <= "2022-12")

Checking if Time Period is right

unique(test_data$month)
[1] "2023-01" "2023-02"
unique(train_data$month)
 [1] "2021-01" "2021-02" "2021-03" "2021-04" "2021-05" "2021-06" "2021-07"
 [8] "2021-08" "2021-09" "2021-10" "2021-11" "2021-12" "2022-01" "2022-02"
[15] "2022-03" "2022-04" "2022-05" "2022-06" "2022-07" "2022-08" "2022-09"
[22] "2022-10" "2022-11" "2022-12"

6.1.2 Removing columns that are not required

test_data <- test_data %>%
  dplyr::select(c(7, 8, 11, 12, 14:28))

train_data <- train_data %>%
  dplyr::select(c(7, 8, 11, 12, 14:28))

Checking Data

glimpse(test_data)
Rows: 1,846
Columns: 19
$ STOREY                   <chr> "07 TO 09", "07 TO 09", "07 TO 09", "10 TO 12…
$ AREA_SQM                 <dbl> 88, 91, 92, 92, 92, 92, 92, 92, 98, 92, 92, 9…
$ LEASE_MTHS               <dbl> 1065, 698, 671, 668, 660, 660, 659, 657, 651,…
$ PRICE                    <dbl> 735000, 480888, 470000, 486500, 505000, 50500…
$ geometry                 <POINT [m]> POINT (29258.92 38653.1), POINT (30036.…
$ PROX_CBD                 <dbl> 8.648990, 8.319957, 8.580908, 8.461580, 8.740…
$ PROX_BUS                 <dbl> 0.06370299, 0.11817809, 0.08879118, 0.1286942…
$ PROX_CHILDCARE           <dbl> 1.424212e-01, 1.526755e-01, 1.448160e-01, 8.1…
$ PROX_ELDERCARE           <dbl> 0.21761827, 0.25976843, 0.09816285, 0.0854535…
$ PROX_HAWKER              <dbl> 0.4465136, 0.3874107, 0.3190679, 0.2666457, 0…
$ PROX_GOODPRIMARY         <dbl> 1.4167972, 2.0895381, 2.6996535, 2.3359500, 2…
$ PROX_PARK                <dbl> 0.5681842, 0.5179204, 0.9242129, 0.7734256, 0…
$ PROX_SUPERMARKET         <dbl> 0.34392765, 0.28490909, 0.42697146, 0.1283025…
$ PROX_SHOPPING            <dbl> 0.5677569, 0.7529773, 0.9304149, 0.7819466, 0…
$ PROX_MRT                 <dbl> 0.7106577, 0.7795685, 0.9503956, 0.7586304, 0…
$ WITHIN_350M_KINDERGARTEN <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 0, …
$ WITHIN_350M_CHILDCARE    <dbl> 5, 3, 3, 3, 3, 4, 3, 4, 7, 3, 8, 7, 3, 3, 4, …
$ WITHIN_350M_BUS          <dbl> 4, 7, 4, 7, 6, 6, 6, 6, 6, 6, 6, 6, 10, 7, 9,…
$ WITHIN_1KM_PRIMARY       <dbl> 2, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 2, 2, …
glimpse(train_data)
Rows: 23,656
Columns: 19
$ STOREY                   <chr> "04 TO 06", "01 TO 03", "01 TO 03", "07 TO 09…
$ AREA_SQM                 <dbl> 92, 92, 91, 92, 92, 98, 92, 92, 92, 92, 92, 1…
$ LEASE_MTHS               <dbl> 708, 693, 702, 695, 689, 681, 661, 682, 692, …
$ PRICE                    <dbl> 370000, 375000, 380000, 385000, 410000, 41000…
$ geometry                 <POINT [m]> POINT (30770.07 39578.64), POINT (30292…
$ PROX_CBD                 <dbl> 9.564575, 8.401690, 9.516492, 8.580908, 9.084…
$ PROX_BUS                 <dbl> 0.16157609, 0.16740841, 0.07424143, 0.0887911…
$ PROX_CHILDCARE           <dbl> 2.493662e-01, 6.715056e-02, 1.385583e-01, 1.4…
$ PROX_ELDERCARE           <dbl> 1.08567795, 0.15039052, 0.72242472, 0.0981628…
$ PROX_HAWKER              <dbl> 0.4442515, 0.2050009, 0.4495734, 0.3190679, 0…
$ PROX_GOODPRIMARY         <dbl> 3.182527, 2.354024, 2.414729, 2.699653, 2.648…
$ PROX_PARK                <dbl> 0.8291527, 0.7847864, 0.3796058, 0.9242129, 0…
$ PROX_SUPERMARKET         <dbl> 0.4184204, 0.1946009, 0.4435109, 0.4269715, 0…
$ PROX_SHOPPING            <dbl> 1.1817959, 0.8444986, 0.2966736, 0.9304149, 0…
$ PROX_MRT                 <dbl> 1.0731215, 0.8245176, 0.4544926, 0.9503956, 0…
$ WITHIN_350M_KINDERGARTEN <dbl> 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, …
$ WITHIN_350M_CHILDCARE    <dbl> 2, 3, 3, 3, 3, 2, 6, 3, 3, 3, 3, 3, 5, 2, 3, …
$ WITHIN_350M_BUS          <dbl> 4, 7, 10, 4, 8, 2, 8, 7, 6, 7, 7, 7, 8, 8, 11…
$ WITHIN_1KM_PRIMARY       <dbl> 1, 3, 2, 3, 2, 2, 3, 2, 3, 3, 1, 2, 3, 2, 2, …

6.1.3 Saving of dataset

write_rds(test_data, "data/model/test_data.rds")
write_rds(train_data, "data/model/train_data.rds")

6.2 Computing Correlation Matrix

mdata_nogeo <- mdata %>%
  dplyr::select(c(7, 8, 11, 12, 14:28)) %>%
  st_drop_geometry() %>%
  mutate(STOREY = as.character(STOREY))
Note

The correlation matrix above shows that all the correlation values are within the margin of error. Hence, there is no sign of multicolinearity and all variables can be included in the regression

6.3 Retrieving Stored Data

test_data <- read_rds("data/model/test_data.rds")
train_data <- read_rds("data/model/train_data.rds")

6.3.1 Building a non-spatial multiple linear regression

price_mlr <- lm(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                   PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                   PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                   PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                   WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
                 data = train_data)
summary(price_mlr)

Call:
lm(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY + PROX_HAWKER + 
    PROX_MRT + PROX_PARK + PROX_SHOPPING + PROX_SUPERMARKET + 
    WITHIN_1KM_PRIMARY + WITHIN_350M_BUS + WITHIN_350M_CHILDCARE + 
    WITHIN_350M_KINDERGARTEN + LEASE_MTHS, data = train_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-358722  -44740   -1805   43766  367404 

Coefficients:
                           Estimate Std. Error  t value Pr(>|t|)    
(Intercept)              117534.323   8447.235   13.914  < 2e-16 ***
STOREY04 TO 06            18819.414   1448.679   12.991  < 2e-16 ***
STOREY07 TO 09            32913.451   1471.928   22.361  < 2e-16 ***
STOREY10 TO 12            42124.788   1503.354   28.021  < 2e-16 ***
STOREY13 TO 15            46539.487   1770.176   26.291  < 2e-16 ***
STOREY16 TO 18            57792.485   2277.495   25.375  < 2e-16 ***
STOREY19 TO 21            83110.456   3285.789   25.294  < 2e-16 ***
STOREY22 TO 24            90026.901   3817.710   23.581  < 2e-16 ***
STOREY25 TO 27           127578.933   4286.717   29.761  < 2e-16 ***
STOREY28 TO 30           190839.103   5211.319   36.620  < 2e-16 ***
STOREY31 TO 33           202066.715   6183.606   32.678  < 2e-16 ***
STOREY34 TO 36           227966.101   6251.494   36.466  < 2e-16 ***
STOREY37 TO 39           236073.194   7303.119   32.325  < 2e-16 ***
STOREY40 TO 42           248879.827  10132.768   24.562  < 2e-16 ***
STOREY43 TO 45           291774.008  13950.940   20.914  < 2e-16 ***
STOREY46 TO 48           372521.834  24006.879   15.517  < 2e-16 ***
STOREY49 TO 51           489215.496  39105.531   12.510  < 2e-16 ***
AREA_SQM                   3321.643     68.241   48.675  < 2e-16 ***
PROX_BUS                   -151.012   8556.345   -0.018 0.985919    
PROX_CBD                 -16449.767    160.874 -102.252  < 2e-16 ***
PROX_CHILDCARE           -30020.250   5966.182   -5.032 4.90e-07 ***
PROX_ELDERCARE            -7304.721    782.959   -9.330  < 2e-16 ***
PROX_GOODPRIMARY           -989.700    282.399   -3.505 0.000458 ***
PROX_HAWKER              -22093.698    933.964  -23.656  < 2e-16 ***
PROX_MRT                 -23007.446   1299.264  -17.708  < 2e-16 ***
PROX_PARK                  2929.195   1081.974    2.707 0.006789 ** 
PROX_SHOPPING            -25009.965   1500.391  -16.669  < 2e-16 ***
PROX_SUPERMARKET          10272.325   3062.601    3.354 0.000797 ***
WITHIN_1KM_PRIMARY        -9045.547    350.381  -25.816  < 2e-16 ***
WITHIN_350M_BUS            1100.533    173.527    6.342 2.31e-10 ***
WITHIN_350M_CHILDCARE     -3951.416    262.563  -15.049  < 2e-16 ***
WITHIN_350M_KINDERGARTEN   6691.904    478.697   13.979  < 2e-16 ***
LEASE_MTHS                  358.830      3.201  112.100  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 67600 on 23623 degrees of freedom
Multiple R-squared:  0.7279,    Adjusted R-squared:  0.7275 
F-statistic:  1974 on 32 and 23623 DF,  p-value: < 2.2e-16

6.3.2 Saving of file

write_rds(price_mlr, "data/model/price_mlr.rds")

6.4 GWR Predictive method

6.4.1 Converting sf data.frame to SpatialPointDataFrame

train_data_sp <- as_Spatial(train_data)
train_data_sp
class       : SpatialPointsDataFrame 
features    : 23656 
extent      : 11519.79, 42645.18, 28217.39, 48741.06  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 18
names       :   STOREY, AREA_SQM, LEASE_MTHS,   PRICE,          PROX_CBD,           PROX_BUS,       PROX_CHILDCARE,       PROX_ELDERCARE,       PROX_HAWKER,   PROX_GOODPRIMARY,          PROX_PARK,     PROX_SUPERMARKET,      PROX_SHOPPING,           PROX_MRT, WITHIN_350M_KINDERGARTEN, ... 
min values  : 01 TO 03,       70,        534,  250000, 0.999393538715878, 0.0158480216214286, 1.40841148620523e-08, 1.98943787433087e-08, 0.030603180648446, 0.0790226000235527, 0.0441643213745594, 6.15083813444168e-09, 0.0421144997725278, 0.0222985128432497,                        0, ... 
max values  : 49 TO 51,      145,       1168, 1370000,  19.6500691667807,  0.391470769487539,      0.5864949302319,     3.30163731686804,  2.86763031231414,   9.62214922623055,   2.44018373848866,     1.57131703659268,   2.28478087425324,   2.12939260942018,                        7, ... 

6.4.2 Computing Adaptive Bandwidth

bw_adaptive <- bw.gwr(PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                   PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                   PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                   PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                   WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
                 data = train_data_sp,
                 approach = "CV",
                 kernel = "gaussian",
                 adaptive = TRUE,
                 longlat = FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
Adaptive bandwidth: 14627 CV score: 9.931984e+13 
Adaptive bandwidth: 9048 CV score: 9.230587e+13 
Adaptive bandwidth: 5598 CV score: 8.283281e+13 
Adaptive bandwidth: 3468 CV score: 7.375926e+13 
Adaptive bandwidth: 2149 CV score: 6.276294e+13 
Adaptive bandwidth: 1336 CV score: 5.426306e+13 
Adaptive bandwidth: 831 CV score: 4.820518e+13 
Adaptive bandwidth: 521 CV score: Inf 
Adaptive bandwidth: 1024 CV score: 5.051152e+13 
Adaptive bandwidth: 713 CV score: 4.663173e+13 
Adaptive bandwidth: 638 CV score: NaN 
Adaptive bandwidth: 757 CV score: 4.71682e+13 
Adaptive bandwidth: 683 CV score: 4.622535e+13 
Adaptive bandwidth: 667 CV score: 4.601451e+13 
Adaptive bandwidth: 654 CV score: NaN 
Adaptive bandwidth: 671 CV score: 4.60621e+13 
Adaptive bandwidth: 660 CV score: 4.591026e+13 
Adaptive bandwidth: 660 CV score: 4.591026e+13 

6.4.3 Saving of file

write_rds(bw_adaptive, "data/model/bw_adaptive.rds")

6.5 Constructing Adaptive bandwidth GWR Model

bw_adaptive <- read_rds("data/model/bw_adaptive.rds")

Now, we can go ahead to calibrate the gwr-based hedonic pricing model by using adaptive bandwidth and Gaussian kernel

gwr_adaptive <- gwr.basic(PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                   PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                   PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                   PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                   WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
                   data = train_data_sp,
                   bw = bw_adaptive,
                   kernel = "gaussian",
                   adaptive = TRUE,
                   longlat = FALSE)

6.5.2 Saving for future use

write_rds(gwr_adaptive, "data/model/gwr_adaptive.rds")

6.5.3 Displaying Modal Output

gwr_adaptive <- read_rds("data/model/gwr_adaptive.rds")
gwr_adaptive
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2023-03-22 07:44:21 
   Call:
   gwr.basic(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY + PROX_HAWKER + 
    PROX_MRT + PROX_PARK + PROX_SHOPPING + PROX_SUPERMARKET + 
    WITHIN_1KM_PRIMARY + WITHIN_350M_BUS + WITHIN_350M_CHILDCARE + 
    WITHIN_350M_KINDERGARTEN + LEASE_MTHS, data = train_data_sp, 
    bw = bw_adaptive, kernel = "gaussian", adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  PRICE
   Independent variables:  STOREY AREA_SQM PROX_BUS PROX_CBD PROX_CHILDCARE PROX_ELDERCARE PROX_GOODPRIMARY PROX_HAWKER PROX_MRT PROX_PARK PROX_SHOPPING PROX_SUPERMARKET WITHIN_1KM_PRIMARY WITHIN_350M_BUS WITHIN_350M_CHILDCARE WITHIN_350M_KINDERGARTEN LEASE_MTHS
   Number of data points: 23656
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
    Min      1Q  Median      3Q     Max 
-358722  -44740   -1805   43766  367404 

   Coefficients:
                              Estimate Std. Error  t value Pr(>|t|)    
   (Intercept)              117534.323   8447.235   13.914  < 2e-16 ***
   STOREY04 TO 06            18819.414   1448.679   12.991  < 2e-16 ***
   STOREY07 TO 09            32913.451   1471.928   22.361  < 2e-16 ***
   STOREY10 TO 12            42124.788   1503.354   28.021  < 2e-16 ***
   STOREY13 TO 15            46539.487   1770.176   26.291  < 2e-16 ***
   STOREY16 TO 18            57792.485   2277.495   25.375  < 2e-16 ***
   STOREY19 TO 21            83110.456   3285.789   25.294  < 2e-16 ***
   STOREY22 TO 24            90026.901   3817.710   23.581  < 2e-16 ***
   STOREY25 TO 27           127578.933   4286.717   29.761  < 2e-16 ***
   STOREY28 TO 30           190839.103   5211.319   36.620  < 2e-16 ***
   STOREY31 TO 33           202066.715   6183.606   32.678  < 2e-16 ***
   STOREY34 TO 36           227966.101   6251.494   36.466  < 2e-16 ***
   STOREY37 TO 39           236073.194   7303.119   32.325  < 2e-16 ***
   STOREY40 TO 42           248879.827  10132.768   24.562  < 2e-16 ***
   STOREY43 TO 45           291774.008  13950.940   20.914  < 2e-16 ***
   STOREY46 TO 48           372521.834  24006.879   15.517  < 2e-16 ***
   STOREY49 TO 51           489215.496  39105.531   12.510  < 2e-16 ***
   AREA_SQM                   3321.643     68.241   48.675  < 2e-16 ***
   PROX_BUS                   -151.012   8556.345   -0.018 0.985919    
   PROX_CBD                 -16449.767    160.874 -102.252  < 2e-16 ***
   PROX_CHILDCARE           -30020.250   5966.182   -5.032 4.90e-07 ***
   PROX_ELDERCARE            -7304.721    782.959   -9.330  < 2e-16 ***
   PROX_GOODPRIMARY           -989.700    282.399   -3.505 0.000458 ***
   PROX_HAWKER              -22093.698    933.964  -23.656  < 2e-16 ***
   PROX_MRT                 -23007.446   1299.264  -17.708  < 2e-16 ***
   PROX_PARK                  2929.195   1081.974    2.707 0.006789 ** 
   PROX_SHOPPING            -25009.965   1500.391  -16.669  < 2e-16 ***
   PROX_SUPERMARKET          10272.325   3062.601    3.354 0.000797 ***
   WITHIN_1KM_PRIMARY        -9045.547    350.381  -25.816  < 2e-16 ***
   WITHIN_350M_BUS            1100.533    173.527    6.342 2.31e-10 ***
   WITHIN_350M_CHILDCARE     -3951.416    262.563  -15.049  < 2e-16 ***
   WITHIN_350M_KINDERGARTEN   6691.904    478.697   13.979  < 2e-16 ***
   LEASE_MTHS                  358.830      3.201  112.100  < 2e-16 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 67600 on 23623 degrees of freedom
   Multiple R-squared: 0.7279
   Adjusted R-squared: 0.7275 
   F-statistic:  1974 on 32 and 23623 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 1.079661e+14
   Sigma(hat): 67560.28
   AIC:  593344.9
   AICc:  593345
   BIC:  570305.8
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 660 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                                   Min.     1st Qu.      Median     3rd Qu.
   Intercept                -1.1183e+06 -1.6480e+05  9.7373e+04  1.9643e+05
   STOREY04.TO.06            7.3023e+03  1.3747e+04  1.6285e+04  2.3718e+04
   STOREY07.TO.09            1.7927e+04  2.5179e+04  3.5446e+04  4.3125e+04
   STOREY10.TO.12            2.3989e+04  3.3714e+04  4.4800e+04  5.3537e+04
   STOREY13.TO.15            2.9791e+04  4.3869e+04  5.2539e+04  6.0581e+04
   STOREY16.TO.18           -1.7291e+05  5.0694e+04  6.0136e+04  7.7027e+04
   STOREY19.TO.21            7.9094e+03  5.5462e+04  7.8953e+04  1.0348e+05
   STOREY22.TO.24           -1.1288e+05  3.9263e+04  7.2268e+04  9.8897e+04
   STOREY25.TO.27           -1.2484e+05  3.6756e+04  8.2800e+04  1.1769e+05
   STOREY28.TO.30           -1.0258e+04  9.4638e+04  1.5795e+05  2.8606e+05
   STOREY31.TO.33           -8.0947e+03  1.1493e+05  1.8650e+05  3.0076e+05
   STOREY34.TO.36            2.8995e+04  1.5065e+05  1.9605e+05  3.4434e+05
   STOREY37.TO.39           -5.8788e+05  1.5649e+05  2.2636e+05  3.1997e+05
   STOREY40.TO.42           -3.6570e+05  1.8819e+05  2.5783e+05  4.7685e+05
   STOREY43.TO.45           -1.3811e+05  2.3936e+05  3.6953e+05  5.2688e+05
   STOREY46.TO.48           -1.7869e+05  2.1867e+05  3.6884e+05  4.9424e+05
   STOREY49.TO.51           -6.0484e+04  4.6170e+05  5.3945e+05  7.0890e+05
   AREA_SQM                  1.0048e+03  1.9578e+03  2.6370e+03  4.4500e+03
   PROX_BUS                 -1.1409e+05 -3.2160e+04  6.0175e+02  3.7830e+04
   PROX_CBD                 -1.6789e+05 -2.6338e+04 -1.0992e+04 -1.9277e+03
   PROX_CHILDCARE           -1.1167e+05 -4.6607e+04 -1.8262e+04  3.2867e+03
   PROX_ELDERCARE           -6.1617e+04 -1.1421e+04  8.4106e+02  1.4051e+04
   PROX_GOODPRIMARY         -1.3622e+05 -1.9292e+04 -5.0061e+03  3.4082e+03
   PROX_HAWKER              -1.1497e+05 -3.0383e+04 -1.5337e+04  5.0468e+03
   PROX_MRT                 -1.4965e+05 -6.6886e+04 -3.9145e+04 -2.3683e+04
   PROX_PARK                -8.9552e+04 -2.6112e+04 -1.2344e+04  6.2669e+01
   PROX_SHOPPING            -7.7650e+04 -2.2379e+04  4.0035e+02  1.7550e+04
   PROX_SUPERMARKET         -1.1272e+05 -3.2530e+04  6.1451e+03  2.9493e+04
   WITHIN_1KM_PRIMARY       -2.7753e+04 -5.0426e+03  1.1244e+03  6.0239e+03
   WITHIN_350M_BUS          -4.6978e+03 -5.3193e+02  2.5353e+02  1.6309e+03
   WITHIN_350M_CHILDCARE    -8.8468e+03 -2.3716e+03 -4.9878e+02  1.0532e+03
   WITHIN_350M_KINDERGARTEN -1.2675e+04 -5.9119e+03 -3.2290e+03  1.6067e+03
   LEASE_MTHS               -2.3604e+02  2.7674e+02  3.7027e+02  4.8226e+02
                                  Max.
   Intercept                2062147.72
   STOREY04.TO.06             45193.24
   STOREY07.TO.09             70967.81
   STOREY10.TO.12             73659.14
   STOREY13.TO.15             94513.29
   STOREY16.TO.18            195075.84
   STOREY19.TO.21            295940.48
   STOREY22.TO.24            297305.08
   STOREY25.TO.27            336471.72
   STOREY28.TO.30            827653.60
   STOREY31.TO.33            925714.20
   STOREY34.TO.36            988033.81
   STOREY37.TO.39           1167635.17
   STOREY40.TO.42           1331308.13
   STOREY43.TO.45           1324128.63
   STOREY46.TO.48           1291557.95
   STOREY49.TO.51           1404189.66
   AREA_SQM                   13578.44
   PROX_BUS                  142295.70
   PROX_CBD                   49461.77
   PROX_CHILDCARE             79463.63
   PROX_ELDERCARE             88918.59
   PROX_GOODPRIMARY           91069.10
   PROX_HAWKER                68643.23
   PROX_MRT                   71934.01
   PROX_PARK                 113089.60
   PROX_SHOPPING             120670.57
   PROX_SUPERMARKET          119353.94
   WITHIN_1KM_PRIMARY         14684.02
   WITHIN_350M_BUS             8170.27
   WITHIN_350M_CHILDCARE      11899.36
   WITHIN_350M_KINDERGARTEN   16371.39
   LEASE_MTHS                   658.18
   ************************Diagnostic information*************************
   Number of data points: 23656 
   Effective number of parameters (2trace(S) - trace(S'S)): 617.6324 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 23038.37 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 572931.6 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 572432.1 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 553109.6 
   Residual sum of squares: 4.383611e+13 
   R-square value:  0.8895057 
   Adjusted R-square value:  0.8865433 

   ***********************************************************************
   Program stops at: 2023-03-22 08:01:12 

6.6 Preparing Coordinates Data

6.6.1 Extracting Coordinates Data

coords <- st_coordinates(mdata)
coords_train <- st_coordinates(train_data)
coords_test <- st_coordinates(test_data)

6.6.2 Writing output into RDS

coords <- write_rds(coords, "data/model/coords.rds" )
coords_train <- write_rds(coords_train, "data/model/coords_train.rds" )
coords_test <- write_rds(coords_test, "data/model/coords_test.rds" )

6.6.3 Dropping Geometry Field

train_data <- train_data %>%
  st_drop_geometry()

6.7 Calibrating Random Forest Model

Using Ranger

set.seed(1234)

rf <- ranger(PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
               PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
               PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
               PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
               WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
             data = train_data)
print(rf)
Ranger result

Call:
 ranger(PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD + PROX_CHILDCARE +      PROX_ELDERCARE + PROX_GOODPRIMARY + PROX_HAWKER + PROX_MRT +      PROX_PARK + PROX_SHOPPING + PROX_SUPERMARKET + WITHIN_1KM_PRIMARY +      WITHIN_350M_BUS + WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN +      LEASE_MTHS, data = train_data) 

Type:                             Regression 
Number of trees:                  500 
Sample size:                      23656 
Number of independent variables:  17 
Mtry:                             4 
Target node size:                 5 
Variable importance mode:         none 
Splitrule:                        variance 
OOB prediction error (MSE):       1077227530 
R squared (OOB):                  0.9357699 

6.8 Calibrating Geographical Random Forest Model

6.8.1 Calculating Bandwidth

gwRF_bw <- grf.bw(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                   PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                   PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                   PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                   WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN + LEASE_MTHS,
                 data = train_data,
                 kernel = "adaptive",
                 coords = coords_train)
Note

Since this took quite a few hours to run and I did not manage to get a screenshot, I took the value of 1184 as the bw

6.8.2 Calibrating Using Training Data

set.seed(1234)
gwRF_adaptive <- grf(formula = PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD +
                       PROX_CHILDCARE + PROX_ELDERCARE + PROX_GOODPRIMARY +
                       PROX_HAWKER + PROX_MRT + PROX_PARK + PROX_SHOPPING +
                       PROX_SUPERMARKET + WITHIN_1KM_PRIMARY + WITHIN_350M_BUS +
                       WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN +
                       LEASE_MTHS,
                     dframe = train_data,
                     ntree = 30,
                     bw = 1184,
                     kernel = "adaptive",
                     coords = coords_train)
Ranger result

Call:
 ranger(PRICE ~ STOREY + AREA_SQM + PROX_BUS + PROX_CBD + PROX_CHILDCARE +      PROX_ELDERCARE + PROX_GOODPRIMARY + PROX_HAWKER + PROX_MRT +      PROX_PARK + PROX_SHOPPING + PROX_SUPERMARKET + WITHIN_1KM_PRIMARY +      WITHIN_350M_BUS + WITHIN_350M_CHILDCARE + WITHIN_350M_KINDERGARTEN +      LEASE_MTHS, data = train_data, num.trees = 30, mtry = 5,      importance = "impurity", num.threads = NULL) 

Type:                             Regression 
Number of trees:                  30 
Sample size:                      23656 
Number of independent variables:  17 
Mtry:                             5 
Target node size:                 5 
Variable importance mode:         impurity 
Splitrule:                        variance 
OOB prediction error (MSE):       1143371358 
R squared (OOB):                  0.9318261 
                  STOREY                 AREA_SQM                 PROX_BUS 
            4.882740e+13             1.802607e+13             2.782766e+12 
                PROX_CBD           PROX_CHILDCARE           PROX_ELDERCARE 
            1.218228e+14             3.609986e+12             1.075990e+13 
        PROX_GOODPRIMARY              PROX_HAWKER                 PROX_MRT 
            4.035575e+13             9.870892e+12             1.046059e+13 
               PROX_PARK            PROX_SHOPPING         PROX_SUPERMARKET 
            8.409842e+12             9.944490e+12             6.336210e+12 
      WITHIN_1KM_PRIMARY          WITHIN_350M_BUS    WITHIN_350M_CHILDCARE 
            8.470948e+12             2.993314e+12             3.766528e+12 
WITHIN_350M_KINDERGARTEN               LEASE_MTHS 
            2.587680e+12             8.243288e+13 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-486667  -17333    5804    9901   37421  358911 
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-150306.7    -333.9   15378.7   24600.3   41276.4  453796.2 
                                  Min          Max         Mean          StD
STOREY                   108287315357 8.995044e+12 1.100987e+12 1.479233e+12
AREA_SQM                  40222929945 8.594432e+12 8.945332e+11 1.085839e+12
PROX_BUS                  26251768000 1.853974e+12 1.433912e+11 1.287721e+11
PROX_CBD                  51534830746 7.961513e+12 6.363806e+11 9.199543e+11
PROX_CHILDCARE            29235883707 3.491135e+12 2.196936e+11 3.175219e+11
PROX_ELDERCARE            27500057293 5.139419e+12 3.864121e+11 4.236617e+11
PROX_GOODPRIMARY          49424745796 4.808371e+12 4.487640e+11 5.211130e+11
PROX_HAWKER               45378946209 6.812408e+12 4.161554e+11 4.322252e+11
PROX_MRT                  40158696021 6.297802e+12 5.165779e+11 7.718710e+11
PROX_PARK                 41060200305 2.087646e+12 3.468526e+11 2.872485e+11
PROX_SHOPPING             30568298989 8.260459e+12 4.106731e+11 5.593409e+11
PROX_SUPERMARKET          27745114270 6.467449e+12 3.064361e+11 4.367737e+11
WITHIN_1KM_PRIMARY         5684248036 5.380428e+12 1.325541e+11 2.202085e+11
WITHIN_350M_BUS           18308261388 1.817433e+12 1.184396e+11 1.206658e+11
WITHIN_350M_CHILDCARE      8787221586 7.132997e+12 1.698429e+11 4.151069e+11
WITHIN_350M_KINDERGARTEN   2638285422 1.168817e+12 6.858959e+10 9.415190e+10
LEASE_MTHS               361999888182 1.990119e+13 3.786828e+12 4.169286e+12

6.8.3 Saving for future use

write_rds(gwRF_adaptive, "data/model/gwRF_adaptive.rds")
gwRF_adaptive <- read_rds("data/model/gwRF_adaptive.rds")

6.9 Predicting Using Test Data

6.9.1 Preparing test data

test_data <- cbind(test_data, coords_test) %>%
  st_drop_geometry()

6.9.2 Predicting with test data

gwRF_pred <- predict.grf(gwRF_adaptive, 
                         test_data, 
                         x.var.name="X",
                         y.var.name="Y", 
                         local.w=1,
                         global.w=0)

6.9.3 Saving for future use

GRF_pred <- write_rds(gwRF_pred, "data/model/GRF_pred.rds")

6.9.4 Converting Predicting Output into data frame

GRF_pred <- read_rds("data/model/GRF_pred.rds")
GRF_pred_df <- as.data.frame(GRF_pred)

6.9.5 Append Predicted Values into test_data

test_data_p <- cbind(test_data, GRF_pred_df)

6.9.6 Saving for future use

write_rds(test_data_p, "data/model/test_data_p.rds")

6.10 Calculate Root Mean Square Error

rmse(test_data_p$PRICE, 
     test_data_p$GRF_pred)
[1] 71684.46

6.11 Visualizing Predicted Values

ggplot(data = test_data_p,
       aes(x = GRF_pred,
           y = PRICE)) +
  geom_point()

Note

A better predictive model should have the scatter point close to the diagonal line. The scatter plot can be also used to detect if any outliers in the model.

7 Conclusion

Based off the analysis done about, we can see that the resale flat for 4-room flat for Jan and Feb were used as the test data shows that most of the models are significant with the P value being < 0.05 except for PROX_BUS which is not significant for the prediction.

Also, we can see from the Moran I Test that the residuals resemble more of a cluster distribution than it being randomly distributed.

As for multicolinearity, we can see that all the values are below 0.8 and are within the acceptable range. Thus showing no signs of multicolinearity.

Lastly, the resale values are concentrated around the 550k range with outliers as expected and the importance of the resale value from highest to lowest

  1. PROX_SHOPPING

  2. PROX_HAWKER

  3. WITHIN_1KM_PRIMARY

  4. PROX_PARK

  5. LEASE_MTHS

  6. PROX_SUPERMARKET

  7. STOREY

  8. PROX_GOODPRIMARY

  9. WITHIN_350M_CHILDCARE

  10. PROX_CHILDCARE

  11. WITHIN_350M_BUS

  12. PROX_MRT

  13. WITHIN_350M_KINDERGARTEN

  14. AREA_SQM

  15. PROX_CBD

  16. PROX_ELDERCARE

  17. PROX_MRT

Its quite surprising that MRT is ranked quite low here compared to bus and transport on the whole is less important. But this might have multiple other factors that we might have not catered. We can also see that the RMSE value between the predicted and observed value is quite far apart. Thus the model can be better trained to fit the dataset. However, due to time constraint and equipment limitations, this might be the best we can do for now.

But taking the error out of the way, this exercise shows how we can use public data which are available to us, to do EDA and even prediction that might come in handy for us next time. For those who are looking for a resale flat, maybe this exercise can help you better see which potential areas might fit your needs and the pricing which goes along with it.

8 Acknowledgement

The exercise is mainly based off Prof Kam’s hands-on and in-class exercise. Special thanks to Megan for her exercise which also closely resembles ours, and a random Githubthat scrapped Singapore’s mall through Wiki. Lastly, some of our classmates who helped each other in sourcing and validating the data together.

:’)