Exploration of Relationship between HDB Resale Prices and Proximity to Expressway


Background: Some home sellers believe that the value of their HDB flats suffer because they are near expressways, which are very noisy. Others say that proximity to expressway is good, due to convenience and unblocked view (at least for higher floors). This project seeks to explore whether proximity of the HDB to the expressway has a relationship with HDB resale prices.

Data Sources:
1. HDB Resale Prices (Jan 2017 to Nov 2022) - https://data.gov.sg/dataset/resale-flat-prices?resource_id=f1765b54-a209-4718-8d38-a39237f502b3
2. National Map Line (for expressway location info) - https://data.gov.sg/dataset/national-map-line
3. OneMap API (for HDB location info) - https://www.onemap.gov.sg/docs/

Author

Fong Bao Xian

Published

December 27, 2022

1 Import Libraries

import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from shapely.geometry import Point
import re
import random
from statsmodels.stats.weightstats import ztest

%matplotlib inline

2 Import Data

2.1 Import National Map Line Data

road_map = gpd.read_file("data/national-map-line-geojson.geojson")
road_map.head()
Name Description geometry
0 kml_1 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85833 1.35595 0.00000, 103.8...
1 kml_2 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85759 1.35611 0.00000, 103.8...
2 kml_3 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.86042 1.36817 0.00000, 103.8...
3 kml_4 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85978 1.37228 0.00000, 103.8...
4 kml_5 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85937 1.36913 0.00000, 103.8...
road_map.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 5135 entries, 0 to 5134
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   Name         5135 non-null   object  
 1   Description  5135 non-null   object  
 2   geometry     5135 non-null   geometry
dtypes: geometry(1), object(2)
memory usage: 120.5+ KB

2.1.1 Extract Expressway Data

The National Line Map data comprises of all the roads in Singapore and we will only extract the expressway data for this project.

# Define a function to extract the road name info
def extract_road_name(x):
    return re.findall(r"td\>(.*?)\<\/td", x)[0]

# Extracting the road name from Description and adding it back into map GeoDataFrame
road_map["road_name"] = road_map["Description"].apply(extract_road_name)
road_map.head()
Name Description geometry road_name
0 kml_1 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85833 1.35595 0.00000, 103.8... CENTRAL EXPRESSWAY
1 kml_2 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85759 1.35611 0.00000, 103.8... CENTRAL EXPRESSWAY
2 kml_3 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.86042 1.36817 0.00000, 103.8... CENTRAL EXPRESSWAY
3 kml_4 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85978 1.37228 0.00000, 103.8... CENTRAL EXPRESSWAY
4 kml_5 <center><table><tr><th colspan='2' align='cent... LINESTRING Z (103.85937 1.36913 0.00000, 103.8... CENTRAL EXPRESSWAY
# Doing a check on the road_names extracted
road_map["road_name"].unique()
array(['CENTRAL EXPRESSWAY', 'CHANGI COAST ROAD', 'CHANGI NORTH CRESCENT',
       'CHANGI SOUTH AVENUE 3', 'CHIN SWEE ROAD', 'CHOA CHU KANG DRIVE',
       'LORNIE HIGHWAY', 'TAMPINES AVENUE 10', 'XILIN AVENUE',
       'CHOA CHU KANG WAY', 'CLEMENTI AVENUE 2', 'CLEMENTI AVENUE 6',
       'CLEMENTI ROAD', 'DAIRY FARM ROAD', 'EAST COAST PARK SERVICE ROAD',
       'AYER RAJAH EXPRESSWAY', 'BUKIT TIMAH EXPRESSWAY',
       'BUKIT TIMAH ROAD', 'CAIRNHILL CIRCLE', 'BALESTIER ROAD',
       'BAYSHORE ROAD', 'BEDOK NORTH AVENUE 3', 'BEDOK NORTH ROAD',
       'BEDOK SOUTH AVENUE 1', 'BENDEMEER ROAD', 'BRADDELL ROAD',
       'BRICKLAND ROAD', 'BUANGKOK EAST DRIVE',
       'BUKIT BATOK EAST AVENUE 3', 'BUKIT BATOK ROAD',
       'BUKIT PANJANG ROAD', 'EAST COAST PARKWAY',
       'KALLANG PAYA LEBAR EXPRESSWAY', 'KRANJI EXPRESSWAY',
       'MARINA COASTAL EXPRESSWAY', 'NICOLL UNDERPASS',
       'PAN ISLAND EXPRESSWAY', 'SELETAR EXPRESSWAY', 'SIME UNDERPASS',
       'TAMPINES EXPRESSWAY', 'ADAM ROAD', 'ADMIRALTY ROAD WEST',
       'AIRPORT ROAD', 'ALEXANDRA ROAD', 'ANG MO KIO AVENUE 1',
       'ANG MO KIO AVENUE 3', 'ANG MO KIO AVENUE 5', '', 'ADAM UNDERPASS',
       'LOWER DELTA ROAD', 'LOYANG AVENUE', 'MANDAI AVENUE',
       'MANDAI ROAD', 'MARINE VISTA', 'MERCHANT ROAD', 'MOULMEIN ROAD',
       'MOUNT PLEASANT ROAD', 'NICOLL HIGHWAY', 'NORMANTON PARK',
       'NORTH BUONA VISTA ROAD', 'OPHIR ROAD', 'OUTRAM ROAD',
       'KEPPEL ROAD', 'KIM KEAT LINK', 'KRAMAT ROAD', 'LENTOR AVENUE',
       'LORONG 2 TOA PAYOH', 'ELIAS ROAD', 'ENG NEO AVENUE', 'FORT ROAD',
       'HOUGANG AVENUE 10', 'JALAN AHMAD IBRAHIM', 'JALAN ANAK BUKIT',
       'JALAN BAHAR', 'JALAN BUKIT MERAH', 'JALAN BUROH', 'JALAN EUNOS',
       'JALAN KAYU', 'JURONG CANAL DRIVE', 'JURONG PIER ROAD',
       'JURONG TOWN HALL ROAD', 'JURONG WEST AVENUE 2',
       'SELETAR WEST LINK', 'SENGKANG EAST ROAD',
       'SOUTH BUONA VISTA ROAD', 'SERANGOON ROAD', 'SIGLAP LINK',
       'SIMEI AVENUE', 'SIMEI ROAD', 'SIMS WAY', 'STILL ROAD SOUTH',
       'SUNGEI TENGAH ROAD', 'T4 FLYOVER', 'TAMPINES AVENUE 12',
       'PASIR LABA ROAD', 'PASIR RIS DRIVE 12', 'PASIR RIS DRIVE 8',
       'PAYA LEBAR ROAD', 'PENJURU ROAD', 'PIONEER ROAD NORTH',
       'PUNGGOL ROAD', 'PUNGGOL WAY', 'SELETAR LINK', 'ALJUNIED ROAD',
       'ANG MO KIO AVENUE 6', 'ANSON ROAD', 'BARTLEY ROAD',
       'BARTLEY ROAD EAST', 'BEDOK ROAD', 'BENCOOLEN STREET',
       'BENOI ROAD', 'UPPER CHANGI ROAD EAST', 'UPPER CHANGI ROAD NORTH',
       'UPPER CROSS STREET', 'UPPER JURONG ROAD', 'UPPER SERANGOON ROAD',
       'WOODLANDS AVENUE 12', 'WOODLANDS AVENUE 2', 'WOODLANDS AVENUE 3',
       'WOODLANDS CENTRE ROAD', 'WOODLANDS ROAD', 'YIO CHU KANG ROAD',
       'ADMIRALTY ROAD', 'ADMIRALTY ROAD EAST', 'AIRPORT BOULEVARD',
       'TAMPINES AVENUE 5', 'TAMPINES AVENUE 7', 'TAMPINES LINK',
       'TAMPINES ROAD', 'TAMPINES STREET 31', 'TANAH MERAH COAST ROAD',
       'TANJONG KATONG ROAD SOUTH', 'THOMSON ROAD', 'TOH GUAN ROAD',
       'TOH TUCK AVENUE', 'TUAS ROAD', 'TUAS VIADUCT', 'UNNAMED ROAD',
       'UPPER THOMSON ROAD', 'HOLLAND ROAD', 'HOOT KIAM ROAD',
       'HOUGANG AVENUE 2', 'HOUGANG AVENUE 3', 'INTERNATIONAL ROAD',
       'BRAS BASAH ROAD', 'BUANGKOK DRIVE', 'BUANGKOK GREEN',
       'BUKIT BATOK AVENUE 1', 'BUKIT BATOK CENTRAL',
       'BUKIT BATOK EAST AVENUE 2', 'BUKIT BATOK EAST AVENUE 6',
       'BUKIT BATOK WEST AVENUE 2', 'CANBERRA LINK', 'CANBERRA ROAD',
       'CANTONMENT ROAD', 'CHANGI ROAD', 'CHOA CHU KANG AVENUE 1',
       'CHOA CHU KANG AVENUE 3', 'WOODSVILLE TUNNEL', 'ZION ROAD',
       'COLLYER QUAY', 'COMMONWEALTH AVENUE', 'COMMONWEALTH AVENUE WEST',
       'CORPORATION ROAD', 'DUNEARN ROAD', 'EAST COAST ROAD',
       'ESPLANADE DRIVE', 'EU TONG SEN STREET', 'EUNOS LINK',
       'FARRER ROAD', 'CHOA CHU KANG ROAD', 'DELTA ROAD', 'GANGES AVENUE',
       'LORNIE ROAD', 'LORNIE UNDERPASS', 'LORONG 1 TOA PAYOH',
       'TAMPINES AVENUE 1', 'BOON LAY WAY', 'BOUNDARY ROAD',
       'FULLERTON ROAD', 'GAMBAS AVENUE', 'GEYLANG ROAD',
       'GUILLEMARD ROAD', 'HANDY ROAD', 'HILL STREET', 'HILLVIEW AVENUE',
       'HILLVIEW ROAD', 'ORCHARD ROAD', 'PASIR PANJANG ROAD',
       'PASIR RIS DRIVE 1', 'PASIR RIS DRIVE 3', 'PATERSON HILL',
       'PATERSON ROAD', 'PIONEER CIRCUS', 'PIONEER ROAD',
       'PORTSDOWN AVENUE', 'PRINSEP STREET', 'PUNGGOL CENTRAL',
       'PUNGGOL EAST', 'PUNGGOL FIELD', 'QUEENSWAY', 'MACPHERSON ROAD',
       'MARINE PARADE ROAD', 'MARYMOUNT ROAD', 'MOUNTBATTEN ROAD',
       'NAPIER ROAD', 'NEIL ROAD', 'NEO TIEW ROAD', 'NEW BRIDGE ROAD',
       'NEW PUNGGOL ROAD', 'NEW UPPER CHANGI ROAD', 'NEWTON CIRCUS',
       'NEWTON ROAD', 'NICOLL DRIVE', 'OLD CHOA CHU KANG ROAD',
       'OLD JURONG ROAD', 'JALAN BOON LAY', 'JALAN JURONG KECHIL',
       'JALAN KEBUN LIMAU', 'JURONG EAST CENTRAL',
       'JURONG ISLAND HIGHWAY', 'JURONG PORT ROAD',
       'JURONG WEST AVENUE 4', 'KALLANG BAHRU', 'KALLANG ROAD',
       'KAMPONG BAHRU ROAD', 'KRANJI LOOP', 'KRANJI ROAD', 'KRANJI WAY',
       'LAVENDER STREET', 'LIM CHU KANG ROAD', 'LORONG 6 TOA PAYOH',
       'LORONG CHUAN', 'WEST COAST HIGHWAY', 'WEST COAST ROAD',
       'WHITLEY ROAD', 'WOODLANDS AVENUE 5', 'WOODLANDS AVENUE 7',
       'WOODLANDS AVENUE 8', 'WOODLANDS AVENUE 9', 'SENGKANG EAST WAY',
       'SENGKANG WEST AVENUE', 'SENGKANG WEST ROAD', 'SENGKANG WEST WAY',
       'SERANGOON CENTRAL', 'SHEARES AVENUE', 'SHENTON WAY',
       'SIGLAP ROAD', 'SIMS AVENUE', 'SIMS AVENUE EAST', 'STAMFORD ROAD',
       'STILL ROAD', 'SUNGEI ROAD', 'T1 BOULEVARD', 'TAMPINES AVENUE 2',
       'TAMPINES AVENUE 6', 'TAMPINES AVENUE 9', 'TANJONG KATONG ROAD',
       'TUAS SOUTH AVENUE 3', 'TUAS SOUTH AVENUE 9',
       'TUAS SOUTH BOULEVARD', 'TUAS WEST ROAD', 'TURF CLUB AVENUE',
       'ULU PANDAN ROAD', 'UPPER BUKIT TIMAH ROAD', 'UPPER CHANGI ROAD',
       '0', 'TANJONG KLING ROAD', 'TELOK BLANGAH ROAD', 'TELOK PAKU ROAD',
       'TIONG BAHRU ROAD', 'TOA PAYOH NORTH FLYOVER', 'TUAS CRESCENT',
       'RAFFLES QUAY', 'RIVER VALLEY ROAD', 'ROBINSON ROAD',
       'ROCHOR CANAL ROAD', 'ROCHOR ROAD', 'SCOTTS ROAD', 'SELEGIE ROAD',
       'SELETAR AEROSPACE WAY', 'SELETAR NORTH LINK', 'SEMBAWANG ROAD',
       'SEMBAWANG WAY', 'SENGKANG EAST AVENUE', 'SENGKANG EAST DRIVE',
       'UPPER EAST COAST ROAD', 'UPPER PAYA LEBAR ROAD',
       'VICTORIA STREET', '20', 'YIO CHU KANG LINK', 'YISHUN AVENUE 1',
       'YISHUN AVENUE 2', 'YISHUN AVENUE 8', '40', '80', '60', '100',
       '120', '160', '140'], dtype=object)

According to LTA website, there are currently 10 expressways in Singapore. We will take a subset of the road_map dataset comprising of the expressways.
(Ref: https://www.lta.gov.sg/content/ltagov/en/getting_around/driving_in_singapore/expressways.html)
1. Pan Island Expressway (PIE)
2. Ayer Rajah Expressway (AYE)
3. East Coast Parkway (ECP)
4. Central Expressway (CTE)
5. Tampines Expressway (TPE)
6. Kallang-Paya Lebar Expressway (KPE)
7. Seletar Expressway (SLE)
8. Bukit Timah Expressway (BKE)
9. Kranji Expressway (KJE)
10. Marina Coastal Expressway (MCE)

# Subset only the expressways from road_map
expressway_map = road_map[road_map["road_name"].str.lower().str.contains("expressway|parkway")].copy()
expressway_map["road_name"].value_counts()
PAN ISLAND EXPRESSWAY            155
AYER RAJAH EXPRESSWAY             65
CENTRAL EXPRESSWAY                64
TAMPINES EXPRESSWAY               48
SELETAR EXPRESSWAY                38
KALLANG PAYA LEBAR EXPRESSWAY     34
EAST COAST PARKWAY                30
BUKIT TIMAH EXPRESSWAY            24
KRANJI EXPRESSWAY                 21
MARINA COASTAL EXPRESSWAY          8
Name: road_name, dtype: int64
# Combine all geometry data relating to a single road
expressway_map = expressway_map.dissolve(by="road_name", as_index=False)
expressway_map
road_name geometry Name Description
0 AYER RAJAH EXPRESSWAY MULTILINESTRING Z ((103.77133 1.30711 0.00000,... kml_58 <center><table><tr><th colspan='2' align='cent...
1 BUKIT TIMAH EXPRESSWAY MULTILINESTRING Z ((103.77154 1.42324 0.00000,... kml_88 <center><table><tr><th colspan='2' align='cent...
2 CENTRAL EXPRESSWAY MULTILINESTRING Z ((103.85833 1.35595 0.00000,... kml_1 <center><table><tr><th colspan='2' align='cent...
3 EAST COAST PARKWAY MULTILINESTRING Z ((103.98068 1.33908 0.00000,... kml_204 <center><table><tr><th colspan='2' align='cent...
4 KALLANG PAYA LEBAR EXPRESSWAY MULTILINESTRING Z ((103.87639 1.29647 0.00000,... kml_206 <center><table><tr><th colspan='2' align='cent...
5 KRANJI EXPRESSWAY MULTILINESTRING Z ((103.77574 1.39085 0.00000,... kml_213 <center><table><tr><th colspan='2' align='cent...
6 MARINA COASTAL EXPRESSWAY MULTILINESTRING Z ((103.87600 1.29520 0.00000,... kml_215 <center><table><tr><th colspan='2' align='cent...
7 PAN ISLAND EXPRESSWAY MULTILINESTRING Z ((103.66634 1.32705 0.00000,... kml_218 <center><table><tr><th colspan='2' align='cent...
8 SELETAR EXPRESSWAY MULTILINESTRING Z ((103.77023 1.42053 0.00000,... kml_225 <center><table><tr><th colspan='2' align='cent...
9 TAMPINES EXPRESSWAY MULTILINESTRING Z ((103.84840 1.39641 0.00000,... kml_230 <center><table><tr><th colspan='2' align='cent...

2.2 Import HDB Resale Data

resale = pd.read_csv("data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv", parse_dates=["month"])
resale.head()
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price
0 2017-01-01 ANG MO KIO 2 ROOM 406 ANG MO KIO AVE 10 10 TO 12 44.0 Improved 1979 61 years 04 months 232000.0
1 2017-01-01 ANG MO KIO 3 ROOM 108 ANG MO KIO AVE 4 01 TO 03 67.0 New Generation 1978 60 years 07 months 250000.0
2 2017-01-01 ANG MO KIO 3 ROOM 602 ANG MO KIO AVE 5 01 TO 03 67.0 New Generation 1980 62 years 05 months 262000.0
3 2017-01-01 ANG MO KIO 3 ROOM 465 ANG MO KIO AVE 10 04 TO 06 68.0 New Generation 1980 62 years 01 month 265000.0
4 2017-01-01 ANG MO KIO 3 ROOM 601 ANG MO KIO AVE 5 01 TO 03 67.0 New Generation 1980 62 years 05 months 265000.0
resale.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 140642 entries, 0 to 140641
Data columns (total 11 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   month                140642 non-null  datetime64[ns]
 1   town                 140642 non-null  object        
 2   flat_type            140642 non-null  object        
 3   block                140642 non-null  object        
 4   street_name          140642 non-null  object        
 5   storey_range         140642 non-null  object        
 6   floor_area_sqm       140642 non-null  float64       
 7   flat_model           140642 non-null  object        
 8   lease_commence_date  140642 non-null  int64         
 9   remaining_lease      140642 non-null  object        
 10  resale_price         140642 non-null  float64       
dtypes: datetime64[ns](1), float64(2), int64(1), object(7)
memory usage: 11.8+ MB
resale.describe()
floor_area_sqm lease_commence_date resale_price
count 140642.000000 140642.000000 1.406420e+05
mean 97.717510 1995.433384 4.749322e+05
std 24.092743 13.666275 1.645732e+05
min 31.000000 1966.000000 1.400000e+05
25% 82.000000 1985.000000 3.550000e+05
50% 94.000000 1996.000000 4.450000e+05
75% 113.000000 2006.000000 5.600000e+05
max 249.000000 2019.000000 1.418000e+06

2.2.1 Enrich HDB Resale Data with Location Coordinates

Using the address of the HDB, we want to obtain their location coordinates from OneMap API

# Create new address feature as search value for retreival from OneMap API
resale["address"] = resale["block"] + " " + resale["street_name"]
resale.head()
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price address
0 2017-01-01 ANG MO KIO 2 ROOM 406 ANG MO KIO AVE 10 10 TO 12 44.0 Improved 1979 61 years 04 months 232000.0 406 ANG MO KIO AVE 10
1 2017-01-01 ANG MO KIO 3 ROOM 108 ANG MO KIO AVE 4 01 TO 03 67.0 New Generation 1978 60 years 07 months 250000.0 108 ANG MO KIO AVE 4
2 2017-01-01 ANG MO KIO 3 ROOM 602 ANG MO KIO AVE 5 01 TO 03 67.0 New Generation 1980 62 years 05 months 262000.0 602 ANG MO KIO AVE 5
3 2017-01-01 ANG MO KIO 3 ROOM 465 ANG MO KIO AVE 10 04 TO 06 68.0 New Generation 1980 62 years 01 month 265000.0 465 ANG MO KIO AVE 10
4 2017-01-01 ANG MO KIO 3 ROOM 601 ANG MO KIO AVE 5 01 TO 03 67.0 New Generation 1980 62 years 05 months 265000.0 601 ANG MO KIO AVE 5
# Function to retreive coordinates
def get_coordinates(add):
    '''
    function to retreive the x, y, latitude and longitude of the HDB flat addresses from OneMap API
    '''
    response = requests.get(url="https://developers.onemap.sg/commonapi/search",
                           params={"searchVal":add, "returnGeom":"Y", "getAddrDetails":"N"})
    results = response.json()
    return results["results"][0]["X"], results["results"][0]["Y"], results["results"][0]["LATITUDE"], results["results"][0]["LONGITUDE"]
# # Retrieve x and y coordiates, longitude and latitude information from OneMap API (run cell only once for retrieval of API info)
# address_list = list(resale["address"])
# coord_list = []
# count = 0
# failed_count = 0

# for add in address_list:
#     try:
#         coord = get_coordinates(add)
#         count += 1
#         print(f"Extracting {count} out of {len(address_list)} addresses")
#         coord_list.append(coord)
#     except:
#         count += 1
#         failed_count += 1
#         print(f"Failed to extract {count} out of {len(address_list)} addresses")
#         coord_list.append(None)

# print(f"Total Number of addresses with no coordinates: {failed_count}")

# # Append the x, y, lat, long info to resale dataframe
# x, y, lat, long = zip(*coord_list)
# resale["x"] = pd.DataFrame(x).astype("float")
# resale["y"] = pd.DataFrame(y).astype("float")
# resale["latitude"] = pd.DataFrame(lat).astype("float")
# resale["longitude"] = pd.DataFrame(long).astype("float")

# # Saving enriched dataframe to csv file
# resale.to_csv("data/enriched_resale_flat_data.csv", index=False, encoding="utf-8")

2.2.2 Convert HDB Resale Location Coordinates to Geometry object

We want to further convert the location coordinates to a geometry object to facilitate subsequent analysis using GeoPandas.

# Import the enriched dataframe
resale = pd.read_csv("data/enriched_resale_flat_data.csv", parse_dates=["month"])
resale_map = gpd.GeoDataFrame(resale, geometry=gpd.points_from_xy(resale["longitude"], resale["latitude"]), crs=expressway_map.crs)
resale_map.head()
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price address x y latitude longitude geometry
0 2017-01-01 ANG MO KIO 2 ROOM 406 ANG MO KIO AVE 10 10 TO 12 44.0 Improved 1979 61 years 04 months 232000.0 406 ANG MO KIO AVE 10 30288.234663 38229.067463 1.362005 103.853880 POINT (103.85388 1.36200)
1 2017-01-01 ANG MO KIO 3 ROOM 108 ANG MO KIO AVE 4 01 TO 03 67.0 New Generation 1978 60 years 07 months 250000.0 108 ANG MO KIO AVE 4 28543.458747 39220.009892 1.370966 103.838202 POINT (103.83820 1.37097)
2 2017-01-01 ANG MO KIO 3 ROOM 602 ANG MO KIO AVE 5 01 TO 03 67.0 New Generation 1980 62 years 05 months 262000.0 602 ANG MO KIO AVE 5 28228.099954 40297.283149 1.380709 103.835368 POINT (103.83537 1.38071)
3 2017-01-01 ANG MO KIO 3 ROOM 465 ANG MO KIO AVE 10 04 TO 06 68.0 New Generation 1980 62 years 01 month 265000.0 465 ANG MO KIO AVE 10 30657.824693 38693.098657 1.366201 103.857201 POINT (103.85720 1.36620)
4 2017-01-01 ANG MO KIO 3 ROOM 601 ANG MO KIO AVE 5 01 TO 03 67.0 New Generation 1980 62 years 05 months 265000.0 601 ANG MO KIO AVE 5 28201.782487 40334.051212 1.381041 103.835132 POINT (103.83513 1.38104)
resale_map.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 140642 entries, 0 to 140641
Data columns (total 17 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   month                140642 non-null  datetime64[ns]
 1   town                 140642 non-null  object        
 2   flat_type            140642 non-null  object        
 3   block                140642 non-null  object        
 4   street_name          140642 non-null  object        
 5   storey_range         140642 non-null  object        
 6   floor_area_sqm       140642 non-null  float64       
 7   flat_model           140642 non-null  object        
 8   lease_commence_date  140642 non-null  int64         
 9   remaining_lease      140642 non-null  object        
 10  resale_price         140642 non-null  float64       
 11  address              140642 non-null  object        
 12  x                    140642 non-null  float64       
 13  y                    140642 non-null  float64       
 14  latitude             140642 non-null  float64       
 15  longitude            140642 non-null  float64       
 16  geometry             140642 non-null  geometry      
dtypes: datetime64[ns](1), float64(6), geometry(1), int64(1), object(8)
memory usage: 18.2+ MB
# Confirm that the crs of both datasets are the same
print(expressway_map.crs)
print(resale_map.crs)
epsg:4326
epsg:4326

2.3 Visualisation of Expressway Map and HDB Map

fig, ax = plt.subplots(figsize=(20,14))

expressway_map.plot(ax=ax, cmap="jet", column="road_name", legend=True)
resale_map.plot(ax=ax, color="black");

3 Finding HDB Flats near the expressway

To retreive the HDB flats that are near the expressway, we set a buffer of about 150m around the expressways. This buffer is an estimate as literature review has shown that it is best to keep a distance of about 500 feet (equivalent to about 150m) from the highway.
(Ref: https://www.latimes.com/local/california/la-me-freeway-pollution-what-you-can-do-20171230-htmlstory.html)

3.1 Setting a buffer of 150m around the expressway

Note that the current crs of expressway_map is EPSG: 4326. The units of this crs is in degrees.

expressway_map.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

To add a buffer of 150m, we will need to convert expressway_map to projected crs (epsg=3414 for Singapore). This is required so that we can apply distance by meters.

# Convert expressway_map to projected crs (epsg=3414 for Singapore)
expressway_map_projected = expressway_map.to_crs(epsg=3414)
expressway_map_projected.crs
<Derived Projected CRS: EPSG:3414>
Name: SVY21 / Singapore TM
Axis Info [cartesian]:
- N[north]: Northing (metre)
- E[east]: Easting (metre)
Area of Use:
- name: Singapore - onshore and offshore.
- bounds: (103.59, 1.13, 104.07, 1.47)
Coordinate Operation:
- name: Singapore Transverse Mercator
- method: Transverse Mercator
Datum: SVY21
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# Add buffer of 150m to expressway
buffer_dist = 150
expressway_map_projected_buffer150 = expressway_map_projected["geometry"].buffer(distance=buffer_dist)

# Convert back to epsg=4326 and add back to expressway_map
expressway_map_projected_buffer150 = expressway_map_projected_buffer150.to_crs(epsg=4326)
expressway_map_buffered = gpd.GeoDataFrame(expressway_map["road_name"], geometry=expressway_map_projected_buffer150)
expressway_map_buffered
road_name geometry
0 AYER RAJAH EXPRESSWAY POLYGON ((103.63591 1.34841, 103.63594 1.34848...
1 BUKIT TIMAH EXPRESSWAY POLYGON ((103.76708 1.43664, 103.76702 1.43705...
2 CENTRAL EXPRESSWAY POLYGON ((103.82388 1.27717, 103.82347 1.27737...
3 EAST COAST PARKWAY POLYGON ((103.86101 1.28493, 103.86093 1.28496...
4 KALLANG PAYA LEBAR EXPRESSWAY POLYGON ((103.87293 1.29379, 103.87209 1.29391...
5 KRANJI EXPRESSWAY POLYGON ((103.70194 1.36065, 103.70193 1.36067...
6 MARINA COASTAL EXPRESSWAY POLYGON ((103.84718 1.27359, 103.84734 1.27357...
7 PAN ISLAND EXPRESSWAY POLYGON ((103.66209 1.32283, 103.66208 1.32285...
8 SELETAR EXPRESSWAY POLYGON ((103.76564 1.41738, 103.76564 1.41739...
9 TAMPINES EXPRESSWAY POLYGON ((103.84974 1.39764, 103.85064 1.39763...

3.2 Visualisation of Buffered Expressway Map and HDB map

fig, ax = plt.subplots(figsize=(20,14))

expressway_map_buffered.plot(ax=ax, cmap="jet", column="road_name", legend=True)
resale_map.plot(ax=ax, color="black");

3.3 Finding HDB flats within 150m of expressway

# Finding HDB flats within the 150m buffer of the expressway
intersection = gpd.overlay(resale_map, expressway_map_buffered, how="intersection")

The below code chunk shows that there are 12215 HDB flats within 150m of the expressway.

intersection.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 12215 entries, 0 to 12214
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   month                12215 non-null  datetime64[ns]
 1   town                 12215 non-null  object        
 2   flat_type            12215 non-null  object        
 3   block                12215 non-null  object        
 4   street_name          12215 non-null  object        
 5   storey_range         12215 non-null  object        
 6   floor_area_sqm       12215 non-null  float64       
 7   flat_model           12215 non-null  object        
 8   lease_commence_date  12215 non-null  int64         
 9   remaining_lease      12215 non-null  object        
 10  resale_price         12215 non-null  float64       
 11  address              12215 non-null  object        
 12  x                    12215 non-null  float64       
 13  y                    12215 non-null  float64       
 14  latitude             12215 non-null  float64       
 15  longitude            12215 non-null  float64       
 16  road_name            12215 non-null  object        
 17  geometry             12215 non-null  geometry      
dtypes: datetime64[ns](1), float64(6), geometry(1), int64(1), object(9)
memory usage: 1.7+ MB

3.4 Visualisation of the HDB flats within 150m of expressway

fig, ax = plt.subplots(figsize=(20,14))
expressway_map_buffered.plot(ax=ax, cmap="jet", column="road_name", legend=True)
intersection.plot(ax=ax, color="black");

3.5 Adding column of whether near expressway to resale data

# Obtain a list of addresses that are near expressway
addresses_near_expressway = intersection["address"].unique()

# Add a column near_expressway to resale data
resale["near_expressway"] = resale["address"].apply(lambda x: True if x in addresses_near_expressway else False)
resale["near_expressway"].value_counts()
False    129114
True      11528
Name: near_expressway, dtype: int64

From the above, we note that there is a difference between the number of flats in intersection data (12215) and the number of flats marked True in our final resale dataset. We investigate the difference using the below code chunks and find that the discrepency is because certain HDB flats fall within 150m of more than one expressway. Hence, each of these HDB flats are featured twice in the intersection dataset.

# Create a list of all addresses where there is a difference
addresses_to_check = []
for add in addresses_near_expressway:
    num_in_resale = len(resale[resale["address"] == add])
    num_in_intersection = len(intersection[intersection["address"] == add])
    if num_in_resale != num_in_intersection:
        addresses_to_check.append(add)
        
# Picking one address to check
sample = random.choice(addresses_to_check)

# Check on resale dataset
resale[resale["address"] == sample]
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price address x y latitude longitude geometry near_expressway
2679 2017-03-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 111.0 Premium Apartment 2002 84 years 06 months 420000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
11931 2017-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 110.0 Premium Apartment 2002 84 years 01 month 400000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
26846 2018-05-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 111.0 Premium Apartment 2002 83 years 05 months 390000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
54948 2019-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 110.0 Premium Apartment 2002 82 years 02 months 388000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
54949 2019-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 10 TO 12 111.0 Premium Apartment 2002 82 years 01 month 398000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
58734 2019-10-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 91.0 Premium Apartment 2002 81 years 11 months 345000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
64662 2020-01-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 81 years 08 months 320000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
64676 2020-01-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 81 years 08 months 345000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
73522 2020-07-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 111.0 Premium Apartment 2002 81 years 03 months 395000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
85608 2020-12-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 91.0 Premium Apartment 2002 80 years 10 months 390000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
88138 2021-01-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 110.0 Premium Apartment 2002 80 years 08 months 418000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
104463 2021-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 111.0 Premium Apartment 2002 80 years 02 months 465000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
109721 2021-10-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 01 TO 03 110.0 Premium Apartment 2002 80 years 455000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
119481 2022-02-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 94.0 Premium Apartment 2002 79 years 08 months 435000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
123733 2022-04-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 79 years 06 months 430000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
128112 2022-06-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 01 TO 03 92.0 Premium Apartment 2002 79 years 03 months 435000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
134984 2022-09-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 01 TO 03 94.0 Premium Apartment 2002 79 years 01 month 450000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
135015 2022-09-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 110.0 Premium Apartment 2002 79 years 01 month 505000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 POINT (103.77223 1.38946) True
# Check on intersection dataset
intersection[intersection["address"] == sample]
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price address x y latitude longitude road_name geometry
6712 2017-03-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 111.0 Premium Apartment 2002 84 years 06 months 420000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6730 2017-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 110.0 Premium Apartment 2002 84 years 01 month 400000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6761 2018-05-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 111.0 Premium Apartment 2002 83 years 05 months 390000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6877 2019-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 110.0 Premium Apartment 2002 82 years 02 months 388000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6878 2019-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 10 TO 12 111.0 Premium Apartment 2002 82 years 01 month 398000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6892 2019-10-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 91.0 Premium Apartment 2002 81 years 11 months 345000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6907 2020-01-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 81 years 08 months 320000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6910 2020-01-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 81 years 08 months 345000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6939 2020-07-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 111.0 Premium Apartment 2002 81 years 03 months 395000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6970 2020-12-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 91.0 Premium Apartment 2002 80 years 10 months 390000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
6980 2021-01-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 110.0 Premium Apartment 2002 80 years 08 months 418000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7015 2021-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 111.0 Premium Apartment 2002 80 years 02 months 465000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7026 2021-10-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 01 TO 03 110.0 Premium Apartment 2002 80 years 455000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7049 2022-02-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 94.0 Premium Apartment 2002 79 years 08 months 435000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7063 2022-04-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 79 years 06 months 430000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7079 2022-06-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 01 TO 03 92.0 Premium Apartment 2002 79 years 03 months 435000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7101 2022-09-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 01 TO 03 94.0 Premium Apartment 2002 79 years 01 month 450000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7104 2022-09-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 110.0 Premium Apartment 2002 79 years 01 month 505000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 BUKIT TIMAH EXPRESSWAY POINT (103.77223 1.38946)
7129 2017-03-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 111.0 Premium Apartment 2002 84 years 06 months 420000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7151 2017-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 110.0 Premium Apartment 2002 84 years 01 month 400000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7199 2018-05-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 111.0 Premium Apartment 2002 83 years 05 months 390000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7348 2019-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 110.0 Premium Apartment 2002 82 years 02 months 388000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7349 2019-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 10 TO 12 111.0 Premium Apartment 2002 82 years 01 month 398000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7371 2019-10-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 91.0 Premium Apartment 2002 81 years 11 months 345000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7439 2020-01-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 81 years 08 months 320000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7443 2020-01-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 81 years 08 months 345000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7596 2020-07-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 111.0 Premium Apartment 2002 81 years 03 months 395000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7715 2020-12-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 91.0 Premium Apartment 2002 80 years 10 months 390000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7752 2021-01-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 110.0 Premium Apartment 2002 80 years 08 months 418000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7866 2021-08-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 13 TO 15 111.0 Premium Apartment 2002 80 years 02 months 465000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7898 2021-10-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 01 TO 03 110.0 Premium Apartment 2002 80 years 455000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7949 2022-02-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 94.0 Premium Apartment 2002 79 years 08 months 435000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
7975 2022-04-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 07 TO 09 92.0 Premium Apartment 2002 79 years 06 months 430000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
8009 2022-06-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 01 TO 03 92.0 Premium Apartment 2002 79 years 03 months 435000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
8058 2022-09-01 BUKIT PANJANG 4 ROOM 481 SEGAR RD 01 TO 03 94.0 Premium Apartment 2002 79 years 01 month 450000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)
8062 2022-09-01 BUKIT PANJANG 5 ROOM 481 SEGAR RD 07 TO 09 110.0 Premium Apartment 2002 79 years 01 month 505000.0 481 SEGAR RD 21201.178987 41264.641734 1.389456 103.772226 KRANJI EXPRESSWAY POINT (103.77223 1.38946)

4 Exporting updated dataset to csv

resale.to_csv("data/final_resale_flat_dataset.csv", index=False)

5 Analysis of proximity to expressway and resale prices

5.1 Descriptive statistics

# Plot a boxplot of resale price by whether they are near expressway
fig, ax = plt.subplots(figsize=(10,10))
sns.boxplot(data=resale, x="near_expressway", y="resale_price", palette="Set3");

# Plot a boxplot of resale price by whether they are near expressway, differentiated by flat_type
sns.catplot(data=resale, x="near_expressway", y="resale_price", palette="Set3", kind="box", col="flat_type", col_wrap=2);

# Checking the count of HDBs for each category
resale.groupby("flat_type")["near_expressway"].value_counts()
flat_type         near_expressway
1 ROOM            True                  57
2 ROOM            False               1783
                  True                 334
3 ROOM            False              29612
                  True                3287
4 ROOM            False              54551
                  True                4114
5 ROOM            False              32811
                  True                2996
EXECUTIVE         False              10287
                  True                 740
MULTI-GENERATION  False                 70
Name: near_expressway, dtype: int64

As there is insufficient data for flat types 1-room and multi-generation, we will exclude them from our analysis. From the plots, it seems that the mean resale price of HDBs near expressway is lower than that for HDBs not near expressway. We will confirm whether the difference is significant using a statistical test.

5.2 Hypothesis Testing

Defining Hypothesis:
H₀: The mean of resale price is equal for HDBs near expressway and not near expressway
H₁: The mean of resale price is lower for HDBs near expressway compared to HDBs not near expressway

As sample size is large, we will use the z-test.

flat_types_to_check = ["2 ROOM", "3 ROOM", "4 ROOM", "5 ROOM", "EXECUTIVE"]

for t in flat_types_to_check:
    near_hdb = resale[(resale["near_expressway"] == True) & (resale["flat_type"] == t)]["resale_price"]
    notnear_hdb = resale[(resale["near_expressway"] == False) & (resale["flat_type"] == t)]["resale_price"]
    
    print(f"Hypothesis Testing for {t.lower()} HDBs")
    print(f"Mean of resale price for {t.lower()} HDBs near expressway: {near_hdb.mean():.2f}")
    print(f"Mean of resale price for {t.lower()} HDBs not near expressway: {notnear_hdb.mean():.2f}")
    print(f"Approximate difference: {notnear_hdb.mean() - near_hdb.mean():.0f}")
    
    results = ztest(near_hdb, notnear_hdb, alternative="smaller")
    print(f"Test Statistic: {results[0]:.3f}")
    print(f"P-Value: {results[1]}\n")
Hypothesis Testing for 2 room HDBs
Mean of resale price for 2 room HDBs near expressway: 246284.60
Mean of resale price for 2 room HDBs not near expressway: 252597.56
Approximate difference: 6313
Test Statistic: -2.330
P-Value: 0.009911139297098021

Hypothesis Testing for 3 room HDBs
Mean of resale price for 3 room HDBs near expressway: 323473.42
Mean of resale price for 3 room HDBs not near expressway: 331850.87
Approximate difference: 8377
Test Statistic: -5.383
P-Value: 3.6673150279096e-08

Hypothesis Testing for 4 room HDBs
Mean of resale price for 4 room HDBs near expressway: 459668.32
Mean of resale price for 4 room HDBs not near expressway: 471070.14
Approximate difference: 11402
Test Statistic: -5.389
P-Value: 3.5441927127976104e-08

Hypothesis Testing for 5 room HDBs
Mean of resale price for 5 room HDBs near expressway: 554445.95
Mean of resale price for 5 room HDBs not near expressway: 569696.28
Approximate difference: 15250
Test Statistic: -5.190
P-Value: 1.0498689309696564e-07

Hypothesis Testing for executive HDBs
Mean of resale price for executive HDBs near expressway: 610140.01
Mean of resale price for executive HDBs not near expressway: 671823.95
Approximate difference: 61684
Test Statistic: -12.428
P-Value: 9.218846815177831e-36

As the p-values for all HDB types are less than 0.05, we have sufficient evidence to reject the null hypothesis for all HDB types. This means that at significance level of 0.05, the mean resale price for 2-room, 3-room, 4-room, 5-room and executive HDBs near expressway is significantly lower than that of HDBs not near expressways.

6 Limitations

This project recognises that HDB prices are affected by many different factors and comparing resale prices only by its proximity to expressway could undermine the effects of other factors, e.g., size of HDB and remaining lease period. Future iterations of the project will explore the effect of other factors that could affect HDB resale prices and generate predictions for the prices.

7 References and Acknowledgements

[1] Geocoding Singapore Coordinates: OneMap API
[2] Z Test Statistics Formula & Python Implementation