import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from shapely.geometry import Point
import re
import random
from statsmodels.stats.weightstats import ztest
%matplotlib inline
Exploration of Relationship between HDB Resale Prices and Proximity to Expressway
Background: Some home sellers believe that the value of their HDB flats suffer because they are near expressways, which are very noisy. Others say that proximity to expressway is good, due to convenience and unblocked view (at least for higher floors). This project seeks to explore whether proximity of the HDB to the expressway has a relationship with HDB resale prices.
Data Sources:
1. HDB Resale Prices (Jan 2017 to Nov 2022) - https://data.gov.sg/dataset/resale-flat-prices?resource_id=f1765b54-a209-4718-8d38-a39237f502b3
2. National Map Line (for expressway location info) - https://data.gov.sg/dataset/national-map-line
3. OneMap API (for HDB location info) - https://www.onemap.gov.sg/docs/
1 Import Libraries
2 Import Data
2.1 Import National Map Line Data
= gpd.read_file("data/national-map-line-geojson.geojson")
road_map road_map.head()
Name | Description | geometry | |
---|---|---|---|
0 | kml_1 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85833 1.35595 0.00000, 103.8... |
1 | kml_2 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85759 1.35611 0.00000, 103.8... |
2 | kml_3 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.86042 1.36817 0.00000, 103.8... |
3 | kml_4 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85978 1.37228 0.00000, 103.8... |
4 | kml_5 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85937 1.36913 0.00000, 103.8... |
road_map.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 5135 entries, 0 to 5134
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 5135 non-null object
1 Description 5135 non-null object
2 geometry 5135 non-null geometry
dtypes: geometry(1), object(2)
memory usage: 120.5+ KB
2.1.1 Extract Expressway Data
The National Line Map data comprises of all the roads in Singapore and we will only extract the expressway data for this project.
# Define a function to extract the road name info
def extract_road_name(x):
return re.findall(r"td\>(.*?)\<\/td", x)[0]
# Extracting the road name from Description and adding it back into map GeoDataFrame
"road_name"] = road_map["Description"].apply(extract_road_name)
road_map[ road_map.head()
Name | Description | geometry | road_name | |
---|---|---|---|---|
0 | kml_1 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85833 1.35595 0.00000, 103.8... | CENTRAL EXPRESSWAY |
1 | kml_2 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85759 1.35611 0.00000, 103.8... | CENTRAL EXPRESSWAY |
2 | kml_3 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.86042 1.36817 0.00000, 103.8... | CENTRAL EXPRESSWAY |
3 | kml_4 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85978 1.37228 0.00000, 103.8... | CENTRAL EXPRESSWAY |
4 | kml_5 | <center><table><tr><th colspan='2' align='cent... | LINESTRING Z (103.85937 1.36913 0.00000, 103.8... | CENTRAL EXPRESSWAY |
# Doing a check on the road_names extracted
"road_name"].unique() road_map[
array(['CENTRAL EXPRESSWAY', 'CHANGI COAST ROAD', 'CHANGI NORTH CRESCENT',
'CHANGI SOUTH AVENUE 3', 'CHIN SWEE ROAD', 'CHOA CHU KANG DRIVE',
'LORNIE HIGHWAY', 'TAMPINES AVENUE 10', 'XILIN AVENUE',
'CHOA CHU KANG WAY', 'CLEMENTI AVENUE 2', 'CLEMENTI AVENUE 6',
'CLEMENTI ROAD', 'DAIRY FARM ROAD', 'EAST COAST PARK SERVICE ROAD',
'AYER RAJAH EXPRESSWAY', 'BUKIT TIMAH EXPRESSWAY',
'BUKIT TIMAH ROAD', 'CAIRNHILL CIRCLE', 'BALESTIER ROAD',
'BAYSHORE ROAD', 'BEDOK NORTH AVENUE 3', 'BEDOK NORTH ROAD',
'BEDOK SOUTH AVENUE 1', 'BENDEMEER ROAD', 'BRADDELL ROAD',
'BRICKLAND ROAD', 'BUANGKOK EAST DRIVE',
'BUKIT BATOK EAST AVENUE 3', 'BUKIT BATOK ROAD',
'BUKIT PANJANG ROAD', 'EAST COAST PARKWAY',
'KALLANG PAYA LEBAR EXPRESSWAY', 'KRANJI EXPRESSWAY',
'MARINA COASTAL EXPRESSWAY', 'NICOLL UNDERPASS',
'PAN ISLAND EXPRESSWAY', 'SELETAR EXPRESSWAY', 'SIME UNDERPASS',
'TAMPINES EXPRESSWAY', 'ADAM ROAD', 'ADMIRALTY ROAD WEST',
'AIRPORT ROAD', 'ALEXANDRA ROAD', 'ANG MO KIO AVENUE 1',
'ANG MO KIO AVENUE 3', 'ANG MO KIO AVENUE 5', '', 'ADAM UNDERPASS',
'LOWER DELTA ROAD', 'LOYANG AVENUE', 'MANDAI AVENUE',
'MANDAI ROAD', 'MARINE VISTA', 'MERCHANT ROAD', 'MOULMEIN ROAD',
'MOUNT PLEASANT ROAD', 'NICOLL HIGHWAY', 'NORMANTON PARK',
'NORTH BUONA VISTA ROAD', 'OPHIR ROAD', 'OUTRAM ROAD',
'KEPPEL ROAD', 'KIM KEAT LINK', 'KRAMAT ROAD', 'LENTOR AVENUE',
'LORONG 2 TOA PAYOH', 'ELIAS ROAD', 'ENG NEO AVENUE', 'FORT ROAD',
'HOUGANG AVENUE 10', 'JALAN AHMAD IBRAHIM', 'JALAN ANAK BUKIT',
'JALAN BAHAR', 'JALAN BUKIT MERAH', 'JALAN BUROH', 'JALAN EUNOS',
'JALAN KAYU', 'JURONG CANAL DRIVE', 'JURONG PIER ROAD',
'JURONG TOWN HALL ROAD', 'JURONG WEST AVENUE 2',
'SELETAR WEST LINK', 'SENGKANG EAST ROAD',
'SOUTH BUONA VISTA ROAD', 'SERANGOON ROAD', 'SIGLAP LINK',
'SIMEI AVENUE', 'SIMEI ROAD', 'SIMS WAY', 'STILL ROAD SOUTH',
'SUNGEI TENGAH ROAD', 'T4 FLYOVER', 'TAMPINES AVENUE 12',
'PASIR LABA ROAD', 'PASIR RIS DRIVE 12', 'PASIR RIS DRIVE 8',
'PAYA LEBAR ROAD', 'PENJURU ROAD', 'PIONEER ROAD NORTH',
'PUNGGOL ROAD', 'PUNGGOL WAY', 'SELETAR LINK', 'ALJUNIED ROAD',
'ANG MO KIO AVENUE 6', 'ANSON ROAD', 'BARTLEY ROAD',
'BARTLEY ROAD EAST', 'BEDOK ROAD', 'BENCOOLEN STREET',
'BENOI ROAD', 'UPPER CHANGI ROAD EAST', 'UPPER CHANGI ROAD NORTH',
'UPPER CROSS STREET', 'UPPER JURONG ROAD', 'UPPER SERANGOON ROAD',
'WOODLANDS AVENUE 12', 'WOODLANDS AVENUE 2', 'WOODLANDS AVENUE 3',
'WOODLANDS CENTRE ROAD', 'WOODLANDS ROAD', 'YIO CHU KANG ROAD',
'ADMIRALTY ROAD', 'ADMIRALTY ROAD EAST', 'AIRPORT BOULEVARD',
'TAMPINES AVENUE 5', 'TAMPINES AVENUE 7', 'TAMPINES LINK',
'TAMPINES ROAD', 'TAMPINES STREET 31', 'TANAH MERAH COAST ROAD',
'TANJONG KATONG ROAD SOUTH', 'THOMSON ROAD', 'TOH GUAN ROAD',
'TOH TUCK AVENUE', 'TUAS ROAD', 'TUAS VIADUCT', 'UNNAMED ROAD',
'UPPER THOMSON ROAD', 'HOLLAND ROAD', 'HOOT KIAM ROAD',
'HOUGANG AVENUE 2', 'HOUGANG AVENUE 3', 'INTERNATIONAL ROAD',
'BRAS BASAH ROAD', 'BUANGKOK DRIVE', 'BUANGKOK GREEN',
'BUKIT BATOK AVENUE 1', 'BUKIT BATOK CENTRAL',
'BUKIT BATOK EAST AVENUE 2', 'BUKIT BATOK EAST AVENUE 6',
'BUKIT BATOK WEST AVENUE 2', 'CANBERRA LINK', 'CANBERRA ROAD',
'CANTONMENT ROAD', 'CHANGI ROAD', 'CHOA CHU KANG AVENUE 1',
'CHOA CHU KANG AVENUE 3', 'WOODSVILLE TUNNEL', 'ZION ROAD',
'COLLYER QUAY', 'COMMONWEALTH AVENUE', 'COMMONWEALTH AVENUE WEST',
'CORPORATION ROAD', 'DUNEARN ROAD', 'EAST COAST ROAD',
'ESPLANADE DRIVE', 'EU TONG SEN STREET', 'EUNOS LINK',
'FARRER ROAD', 'CHOA CHU KANG ROAD', 'DELTA ROAD', 'GANGES AVENUE',
'LORNIE ROAD', 'LORNIE UNDERPASS', 'LORONG 1 TOA PAYOH',
'TAMPINES AVENUE 1', 'BOON LAY WAY', 'BOUNDARY ROAD',
'FULLERTON ROAD', 'GAMBAS AVENUE', 'GEYLANG ROAD',
'GUILLEMARD ROAD', 'HANDY ROAD', 'HILL STREET', 'HILLVIEW AVENUE',
'HILLVIEW ROAD', 'ORCHARD ROAD', 'PASIR PANJANG ROAD',
'PASIR RIS DRIVE 1', 'PASIR RIS DRIVE 3', 'PATERSON HILL',
'PATERSON ROAD', 'PIONEER CIRCUS', 'PIONEER ROAD',
'PORTSDOWN AVENUE', 'PRINSEP STREET', 'PUNGGOL CENTRAL',
'PUNGGOL EAST', 'PUNGGOL FIELD', 'QUEENSWAY', 'MACPHERSON ROAD',
'MARINE PARADE ROAD', 'MARYMOUNT ROAD', 'MOUNTBATTEN ROAD',
'NAPIER ROAD', 'NEIL ROAD', 'NEO TIEW ROAD', 'NEW BRIDGE ROAD',
'NEW PUNGGOL ROAD', 'NEW UPPER CHANGI ROAD', 'NEWTON CIRCUS',
'NEWTON ROAD', 'NICOLL DRIVE', 'OLD CHOA CHU KANG ROAD',
'OLD JURONG ROAD', 'JALAN BOON LAY', 'JALAN JURONG KECHIL',
'JALAN KEBUN LIMAU', 'JURONG EAST CENTRAL',
'JURONG ISLAND HIGHWAY', 'JURONG PORT ROAD',
'JURONG WEST AVENUE 4', 'KALLANG BAHRU', 'KALLANG ROAD',
'KAMPONG BAHRU ROAD', 'KRANJI LOOP', 'KRANJI ROAD', 'KRANJI WAY',
'LAVENDER STREET', 'LIM CHU KANG ROAD', 'LORONG 6 TOA PAYOH',
'LORONG CHUAN', 'WEST COAST HIGHWAY', 'WEST COAST ROAD',
'WHITLEY ROAD', 'WOODLANDS AVENUE 5', 'WOODLANDS AVENUE 7',
'WOODLANDS AVENUE 8', 'WOODLANDS AVENUE 9', 'SENGKANG EAST WAY',
'SENGKANG WEST AVENUE', 'SENGKANG WEST ROAD', 'SENGKANG WEST WAY',
'SERANGOON CENTRAL', 'SHEARES AVENUE', 'SHENTON WAY',
'SIGLAP ROAD', 'SIMS AVENUE', 'SIMS AVENUE EAST', 'STAMFORD ROAD',
'STILL ROAD', 'SUNGEI ROAD', 'T1 BOULEVARD', 'TAMPINES AVENUE 2',
'TAMPINES AVENUE 6', 'TAMPINES AVENUE 9', 'TANJONG KATONG ROAD',
'TUAS SOUTH AVENUE 3', 'TUAS SOUTH AVENUE 9',
'TUAS SOUTH BOULEVARD', 'TUAS WEST ROAD', 'TURF CLUB AVENUE',
'ULU PANDAN ROAD', 'UPPER BUKIT TIMAH ROAD', 'UPPER CHANGI ROAD',
'0', 'TANJONG KLING ROAD', 'TELOK BLANGAH ROAD', 'TELOK PAKU ROAD',
'TIONG BAHRU ROAD', 'TOA PAYOH NORTH FLYOVER', 'TUAS CRESCENT',
'RAFFLES QUAY', 'RIVER VALLEY ROAD', 'ROBINSON ROAD',
'ROCHOR CANAL ROAD', 'ROCHOR ROAD', 'SCOTTS ROAD', 'SELEGIE ROAD',
'SELETAR AEROSPACE WAY', 'SELETAR NORTH LINK', 'SEMBAWANG ROAD',
'SEMBAWANG WAY', 'SENGKANG EAST AVENUE', 'SENGKANG EAST DRIVE',
'UPPER EAST COAST ROAD', 'UPPER PAYA LEBAR ROAD',
'VICTORIA STREET', '20', 'YIO CHU KANG LINK', 'YISHUN AVENUE 1',
'YISHUN AVENUE 2', 'YISHUN AVENUE 8', '40', '80', '60', '100',
'120', '160', '140'], dtype=object)
According to LTA website, there are currently 10 expressways in Singapore. We will take a subset of the road_map dataset comprising of the expressways.
(Ref: https://www.lta.gov.sg/content/ltagov/en/getting_around/driving_in_singapore/expressways.html)
1. Pan Island Expressway (PIE)
2. Ayer Rajah Expressway (AYE)
3. East Coast Parkway (ECP)
4. Central Expressway (CTE)
5. Tampines Expressway (TPE)
6. Kallang-Paya Lebar Expressway (KPE)
7. Seletar Expressway (SLE)
8. Bukit Timah Expressway (BKE)
9. Kranji Expressway (KJE)
10. Marina Coastal Expressway (MCE)
# Subset only the expressways from road_map
= road_map[road_map["road_name"].str.lower().str.contains("expressway|parkway")].copy()
expressway_map "road_name"].value_counts() expressway_map[
PAN ISLAND EXPRESSWAY 155
AYER RAJAH EXPRESSWAY 65
CENTRAL EXPRESSWAY 64
TAMPINES EXPRESSWAY 48
SELETAR EXPRESSWAY 38
KALLANG PAYA LEBAR EXPRESSWAY 34
EAST COAST PARKWAY 30
BUKIT TIMAH EXPRESSWAY 24
KRANJI EXPRESSWAY 21
MARINA COASTAL EXPRESSWAY 8
Name: road_name, dtype: int64
# Combine all geometry data relating to a single road
= expressway_map.dissolve(by="road_name", as_index=False)
expressway_map expressway_map
road_name | geometry | Name | Description | |
---|---|---|---|---|
0 | AYER RAJAH EXPRESSWAY | MULTILINESTRING Z ((103.77133 1.30711 0.00000,... | kml_58 | <center><table><tr><th colspan='2' align='cent... |
1 | BUKIT TIMAH EXPRESSWAY | MULTILINESTRING Z ((103.77154 1.42324 0.00000,... | kml_88 | <center><table><tr><th colspan='2' align='cent... |
2 | CENTRAL EXPRESSWAY | MULTILINESTRING Z ((103.85833 1.35595 0.00000,... | kml_1 | <center><table><tr><th colspan='2' align='cent... |
3 | EAST COAST PARKWAY | MULTILINESTRING Z ((103.98068 1.33908 0.00000,... | kml_204 | <center><table><tr><th colspan='2' align='cent... |
4 | KALLANG PAYA LEBAR EXPRESSWAY | MULTILINESTRING Z ((103.87639 1.29647 0.00000,... | kml_206 | <center><table><tr><th colspan='2' align='cent... |
5 | KRANJI EXPRESSWAY | MULTILINESTRING Z ((103.77574 1.39085 0.00000,... | kml_213 | <center><table><tr><th colspan='2' align='cent... |
6 | MARINA COASTAL EXPRESSWAY | MULTILINESTRING Z ((103.87600 1.29520 0.00000,... | kml_215 | <center><table><tr><th colspan='2' align='cent... |
7 | PAN ISLAND EXPRESSWAY | MULTILINESTRING Z ((103.66634 1.32705 0.00000,... | kml_218 | <center><table><tr><th colspan='2' align='cent... |
8 | SELETAR EXPRESSWAY | MULTILINESTRING Z ((103.77023 1.42053 0.00000,... | kml_225 | <center><table><tr><th colspan='2' align='cent... |
9 | TAMPINES EXPRESSWAY | MULTILINESTRING Z ((103.84840 1.39641 0.00000,... | kml_230 | <center><table><tr><th colspan='2' align='cent... |
2.2 Import HDB Resale Data
= pd.read_csv("data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv", parse_dates=["month"])
resale resale.head()
month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2017-01-01 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVE 10 | 10 TO 12 | 44.0 | Improved | 1979 | 61 years 04 months | 232000.0 |
1 | 2017-01-01 | ANG MO KIO | 3 ROOM | 108 | ANG MO KIO AVE 4 | 01 TO 03 | 67.0 | New Generation | 1978 | 60 years 07 months | 250000.0 |
2 | 2017-01-01 | ANG MO KIO | 3 ROOM | 602 | ANG MO KIO AVE 5 | 01 TO 03 | 67.0 | New Generation | 1980 | 62 years 05 months | 262000.0 |
3 | 2017-01-01 | ANG MO KIO | 3 ROOM | 465 | ANG MO KIO AVE 10 | 04 TO 06 | 68.0 | New Generation | 1980 | 62 years 01 month | 265000.0 |
4 | 2017-01-01 | ANG MO KIO | 3 ROOM | 601 | ANG MO KIO AVE 5 | 01 TO 03 | 67.0 | New Generation | 1980 | 62 years 05 months | 265000.0 |
resale.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 140642 entries, 0 to 140641
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 month 140642 non-null datetime64[ns]
1 town 140642 non-null object
2 flat_type 140642 non-null object
3 block 140642 non-null object
4 street_name 140642 non-null object
5 storey_range 140642 non-null object
6 floor_area_sqm 140642 non-null float64
7 flat_model 140642 non-null object
8 lease_commence_date 140642 non-null int64
9 remaining_lease 140642 non-null object
10 resale_price 140642 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(1), object(7)
memory usage: 11.8+ MB
resale.describe()
floor_area_sqm | lease_commence_date | resale_price | |
---|---|---|---|
count | 140642.000000 | 140642.000000 | 1.406420e+05 |
mean | 97.717510 | 1995.433384 | 4.749322e+05 |
std | 24.092743 | 13.666275 | 1.645732e+05 |
min | 31.000000 | 1966.000000 | 1.400000e+05 |
25% | 82.000000 | 1985.000000 | 3.550000e+05 |
50% | 94.000000 | 1996.000000 | 4.450000e+05 |
75% | 113.000000 | 2006.000000 | 5.600000e+05 |
max | 249.000000 | 2019.000000 | 1.418000e+06 |
2.2.1 Enrich HDB Resale Data with Location Coordinates
Using the address of the HDB, we want to obtain their location coordinates from OneMap API
# Create new address feature as search value for retreival from OneMap API
"address"] = resale["block"] + " " + resale["street_name"]
resale[ resale.head()
month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | address | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2017-01-01 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVE 10 | 10 TO 12 | 44.0 | Improved | 1979 | 61 years 04 months | 232000.0 | 406 ANG MO KIO AVE 10 |
1 | 2017-01-01 | ANG MO KIO | 3 ROOM | 108 | ANG MO KIO AVE 4 | 01 TO 03 | 67.0 | New Generation | 1978 | 60 years 07 months | 250000.0 | 108 ANG MO KIO AVE 4 |
2 | 2017-01-01 | ANG MO KIO | 3 ROOM | 602 | ANG MO KIO AVE 5 | 01 TO 03 | 67.0 | New Generation | 1980 | 62 years 05 months | 262000.0 | 602 ANG MO KIO AVE 5 |
3 | 2017-01-01 | ANG MO KIO | 3 ROOM | 465 | ANG MO KIO AVE 10 | 04 TO 06 | 68.0 | New Generation | 1980 | 62 years 01 month | 265000.0 | 465 ANG MO KIO AVE 10 |
4 | 2017-01-01 | ANG MO KIO | 3 ROOM | 601 | ANG MO KIO AVE 5 | 01 TO 03 | 67.0 | New Generation | 1980 | 62 years 05 months | 265000.0 | 601 ANG MO KIO AVE 5 |
# Function to retreive coordinates
def get_coordinates(add):
'''
function to retreive the x, y, latitude and longitude of the HDB flat addresses from OneMap API
'''
= requests.get(url="https://developers.onemap.sg/commonapi/search",
response ={"searchVal":add, "returnGeom":"Y", "getAddrDetails":"N"})
params= response.json()
results return results["results"][0]["X"], results["results"][0]["Y"], results["results"][0]["LATITUDE"], results["results"][0]["LONGITUDE"]
# # Retrieve x and y coordiates, longitude and latitude information from OneMap API (run cell only once for retrieval of API info)
# address_list = list(resale["address"])
# coord_list = []
# count = 0
# failed_count = 0
# for add in address_list:
# try:
# coord = get_coordinates(add)
# count += 1
# print(f"Extracting {count} out of {len(address_list)} addresses")
# coord_list.append(coord)
# except:
# count += 1
# failed_count += 1
# print(f"Failed to extract {count} out of {len(address_list)} addresses")
# coord_list.append(None)
# print(f"Total Number of addresses with no coordinates: {failed_count}")
# # Append the x, y, lat, long info to resale dataframe
# x, y, lat, long = zip(*coord_list)
# resale["x"] = pd.DataFrame(x).astype("float")
# resale["y"] = pd.DataFrame(y).astype("float")
# resale["latitude"] = pd.DataFrame(lat).astype("float")
# resale["longitude"] = pd.DataFrame(long).astype("float")
# # Saving enriched dataframe to csv file
# resale.to_csv("data/enriched_resale_flat_data.csv", index=False, encoding="utf-8")
2.2.2 Convert HDB Resale Location Coordinates to Geometry object
We want to further convert the location coordinates to a geometry object to facilitate subsequent analysis using GeoPandas.
# Import the enriched dataframe
= pd.read_csv("data/enriched_resale_flat_data.csv", parse_dates=["month"])
resale = gpd.GeoDataFrame(resale, geometry=gpd.points_from_xy(resale["longitude"], resale["latitude"]), crs=expressway_map.crs)
resale_map resale_map.head()
month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | address | x | y | latitude | longitude | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2017-01-01 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVE 10 | 10 TO 12 | 44.0 | Improved | 1979 | 61 years 04 months | 232000.0 | 406 ANG MO KIO AVE 10 | 30288.234663 | 38229.067463 | 1.362005 | 103.853880 | POINT (103.85388 1.36200) |
1 | 2017-01-01 | ANG MO KIO | 3 ROOM | 108 | ANG MO KIO AVE 4 | 01 TO 03 | 67.0 | New Generation | 1978 | 60 years 07 months | 250000.0 | 108 ANG MO KIO AVE 4 | 28543.458747 | 39220.009892 | 1.370966 | 103.838202 | POINT (103.83820 1.37097) |
2 | 2017-01-01 | ANG MO KIO | 3 ROOM | 602 | ANG MO KIO AVE 5 | 01 TO 03 | 67.0 | New Generation | 1980 | 62 years 05 months | 262000.0 | 602 ANG MO KIO AVE 5 | 28228.099954 | 40297.283149 | 1.380709 | 103.835368 | POINT (103.83537 1.38071) |
3 | 2017-01-01 | ANG MO KIO | 3 ROOM | 465 | ANG MO KIO AVE 10 | 04 TO 06 | 68.0 | New Generation | 1980 | 62 years 01 month | 265000.0 | 465 ANG MO KIO AVE 10 | 30657.824693 | 38693.098657 | 1.366201 | 103.857201 | POINT (103.85720 1.36620) |
4 | 2017-01-01 | ANG MO KIO | 3 ROOM | 601 | ANG MO KIO AVE 5 | 01 TO 03 | 67.0 | New Generation | 1980 | 62 years 05 months | 265000.0 | 601 ANG MO KIO AVE 5 | 28201.782487 | 40334.051212 | 1.381041 | 103.835132 | POINT (103.83513 1.38104) |
resale_map.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 140642 entries, 0 to 140641
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 month 140642 non-null datetime64[ns]
1 town 140642 non-null object
2 flat_type 140642 non-null object
3 block 140642 non-null object
4 street_name 140642 non-null object
5 storey_range 140642 non-null object
6 floor_area_sqm 140642 non-null float64
7 flat_model 140642 non-null object
8 lease_commence_date 140642 non-null int64
9 remaining_lease 140642 non-null object
10 resale_price 140642 non-null float64
11 address 140642 non-null object
12 x 140642 non-null float64
13 y 140642 non-null float64
14 latitude 140642 non-null float64
15 longitude 140642 non-null float64
16 geometry 140642 non-null geometry
dtypes: datetime64[ns](1), float64(6), geometry(1), int64(1), object(8)
memory usage: 18.2+ MB
# Confirm that the crs of both datasets are the same
print(expressway_map.crs)
print(resale_map.crs)
epsg:4326
epsg:4326
2.3 Visualisation of Expressway Map and HDB Map
= plt.subplots(figsize=(20,14))
fig, ax
=ax, cmap="jet", column="road_name", legend=True)
expressway_map.plot(ax=ax, color="black"); resale_map.plot(ax
3 Finding HDB Flats near the expressway
To retreive the HDB flats that are near the expressway, we set a buffer of about 150m around the expressways. This buffer is an estimate as literature review has shown that it is best to keep a distance of about 500 feet (equivalent to about 150m) from the highway.
(Ref: https://www.latimes.com/local/california/la-me-freeway-pollution-what-you-can-do-20171230-htmlstory.html)
3.1 Setting a buffer of 150m around the expressway
Note that the current crs of expressway_map
is EPSG: 4326. The units of this crs is in degrees.
expressway_map.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
To add a buffer of 150m, we will need to convert expressway_map
to projected crs (epsg=3414 for Singapore). This is required so that we can apply distance by meters.
# Convert expressway_map to projected crs (epsg=3414 for Singapore)
= expressway_map.to_crs(epsg=3414)
expressway_map_projected expressway_map_projected.crs
<Derived Projected CRS: EPSG:3414>
Name: SVY21 / Singapore TM
Axis Info [cartesian]:
- N[north]: Northing (metre)
- E[east]: Easting (metre)
Area of Use:
- name: Singapore - onshore and offshore.
- bounds: (103.59, 1.13, 104.07, 1.47)
Coordinate Operation:
- name: Singapore Transverse Mercator
- method: Transverse Mercator
Datum: SVY21
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# Add buffer of 150m to expressway
= 150
buffer_dist = expressway_map_projected["geometry"].buffer(distance=buffer_dist)
expressway_map_projected_buffer150
# Convert back to epsg=4326 and add back to expressway_map
= expressway_map_projected_buffer150.to_crs(epsg=4326)
expressway_map_projected_buffer150 = gpd.GeoDataFrame(expressway_map["road_name"], geometry=expressway_map_projected_buffer150)
expressway_map_buffered expressway_map_buffered
road_name | geometry | |
---|---|---|
0 | AYER RAJAH EXPRESSWAY | POLYGON ((103.63591 1.34841, 103.63594 1.34848... |
1 | BUKIT TIMAH EXPRESSWAY | POLYGON ((103.76708 1.43664, 103.76702 1.43705... |
2 | CENTRAL EXPRESSWAY | POLYGON ((103.82388 1.27717, 103.82347 1.27737... |
3 | EAST COAST PARKWAY | POLYGON ((103.86101 1.28493, 103.86093 1.28496... |
4 | KALLANG PAYA LEBAR EXPRESSWAY | POLYGON ((103.87293 1.29379, 103.87209 1.29391... |
5 | KRANJI EXPRESSWAY | POLYGON ((103.70194 1.36065, 103.70193 1.36067... |
6 | MARINA COASTAL EXPRESSWAY | POLYGON ((103.84718 1.27359, 103.84734 1.27357... |
7 | PAN ISLAND EXPRESSWAY | POLYGON ((103.66209 1.32283, 103.66208 1.32285... |
8 | SELETAR EXPRESSWAY | POLYGON ((103.76564 1.41738, 103.76564 1.41739... |
9 | TAMPINES EXPRESSWAY | POLYGON ((103.84974 1.39764, 103.85064 1.39763... |
3.2 Visualisation of Buffered Expressway Map and HDB map
= plt.subplots(figsize=(20,14))
fig, ax
=ax, cmap="jet", column="road_name", legend=True)
expressway_map_buffered.plot(ax=ax, color="black"); resale_map.plot(ax
3.3 Finding HDB flats within 150m of expressway
# Finding HDB flats within the 150m buffer of the expressway
= gpd.overlay(resale_map, expressway_map_buffered, how="intersection") intersection
The below code chunk shows that there are 12215 HDB flats within 150m of the expressway.
intersection.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 12215 entries, 0 to 12214
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 month 12215 non-null datetime64[ns]
1 town 12215 non-null object
2 flat_type 12215 non-null object
3 block 12215 non-null object
4 street_name 12215 non-null object
5 storey_range 12215 non-null object
6 floor_area_sqm 12215 non-null float64
7 flat_model 12215 non-null object
8 lease_commence_date 12215 non-null int64
9 remaining_lease 12215 non-null object
10 resale_price 12215 non-null float64
11 address 12215 non-null object
12 x 12215 non-null float64
13 y 12215 non-null float64
14 latitude 12215 non-null float64
15 longitude 12215 non-null float64
16 road_name 12215 non-null object
17 geometry 12215 non-null geometry
dtypes: datetime64[ns](1), float64(6), geometry(1), int64(1), object(9)
memory usage: 1.7+ MB
3.4 Visualisation of the HDB flats within 150m of expressway
= plt.subplots(figsize=(20,14))
fig, ax =ax, cmap="jet", column="road_name", legend=True)
expressway_map_buffered.plot(ax=ax, color="black"); intersection.plot(ax
3.5 Adding column of whether near expressway to resale data
# Obtain a list of addresses that are near expressway
= intersection["address"].unique()
addresses_near_expressway
# Add a column near_expressway to resale data
"near_expressway"] = resale["address"].apply(lambda x: True if x in addresses_near_expressway else False)
resale["near_expressway"].value_counts() resale[
False 129114
True 11528
Name: near_expressway, dtype: int64
From the above, we note that there is a difference between the number of flats in intersection data (12215) and the number of flats marked True in our final resale dataset. We investigate the difference using the below code chunks and find that the discrepency is because certain HDB flats fall within 150m of more than one expressway. Hence, each of these HDB flats are featured twice in the intersection dataset.
# Create a list of all addresses where there is a difference
= []
addresses_to_check for add in addresses_near_expressway:
= len(resale[resale["address"] == add])
num_in_resale = len(intersection[intersection["address"] == add])
num_in_intersection if num_in_resale != num_in_intersection:
addresses_to_check.append(add)
# Picking one address to check
= random.choice(addresses_to_check)
sample
# Check on resale dataset
"address"] == sample] resale[resale[
month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | address | x | y | latitude | longitude | geometry | near_expressway | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2679 | 2017-03-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 111.0 | Premium Apartment | 2002 | 84 years 06 months | 420000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
11931 | 2017-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 110.0 | Premium Apartment | 2002 | 84 years 01 month | 400000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
26846 | 2018-05-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 111.0 | Premium Apartment | 2002 | 83 years 05 months | 390000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
54948 | 2019-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 110.0 | Premium Apartment | 2002 | 82 years 02 months | 388000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
54949 | 2019-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 10 TO 12 | 111.0 | Premium Apartment | 2002 | 82 years 01 month | 398000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
58734 | 2019-10-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 91.0 | Premium Apartment | 2002 | 81 years 11 months | 345000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
64662 | 2020-01-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 81 years 08 months | 320000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
64676 | 2020-01-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 81 years 08 months | 345000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
73522 | 2020-07-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 111.0 | Premium Apartment | 2002 | 81 years 03 months | 395000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
85608 | 2020-12-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 91.0 | Premium Apartment | 2002 | 80 years 10 months | 390000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
88138 | 2021-01-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 110.0 | Premium Apartment | 2002 | 80 years 08 months | 418000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
104463 | 2021-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 111.0 | Premium Apartment | 2002 | 80 years 02 months | 465000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
109721 | 2021-10-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 01 TO 03 | 110.0 | Premium Apartment | 2002 | 80 years | 455000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
119481 | 2022-02-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 94.0 | Premium Apartment | 2002 | 79 years 08 months | 435000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
123733 | 2022-04-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 79 years 06 months | 430000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
128112 | 2022-06-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 01 TO 03 | 92.0 | Premium Apartment | 2002 | 79 years 03 months | 435000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
134984 | 2022-09-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 01 TO 03 | 94.0 | Premium Apartment | 2002 | 79 years 01 month | 450000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
135015 | 2022-09-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 110.0 | Premium Apartment | 2002 | 79 years 01 month | 505000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | POINT (103.77223 1.38946) | True |
# Check on intersection dataset
"address"] == sample] intersection[intersection[
month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | address | x | y | latitude | longitude | road_name | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6712 | 2017-03-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 111.0 | Premium Apartment | 2002 | 84 years 06 months | 420000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6730 | 2017-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 110.0 | Premium Apartment | 2002 | 84 years 01 month | 400000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6761 | 2018-05-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 111.0 | Premium Apartment | 2002 | 83 years 05 months | 390000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6877 | 2019-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 110.0 | Premium Apartment | 2002 | 82 years 02 months | 388000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6878 | 2019-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 10 TO 12 | 111.0 | Premium Apartment | 2002 | 82 years 01 month | 398000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6892 | 2019-10-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 91.0 | Premium Apartment | 2002 | 81 years 11 months | 345000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6907 | 2020-01-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 81 years 08 months | 320000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6910 | 2020-01-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 81 years 08 months | 345000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6939 | 2020-07-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 111.0 | Premium Apartment | 2002 | 81 years 03 months | 395000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6970 | 2020-12-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 91.0 | Premium Apartment | 2002 | 80 years 10 months | 390000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
6980 | 2021-01-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 110.0 | Premium Apartment | 2002 | 80 years 08 months | 418000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7015 | 2021-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 111.0 | Premium Apartment | 2002 | 80 years 02 months | 465000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7026 | 2021-10-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 01 TO 03 | 110.0 | Premium Apartment | 2002 | 80 years | 455000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7049 | 2022-02-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 94.0 | Premium Apartment | 2002 | 79 years 08 months | 435000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7063 | 2022-04-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 79 years 06 months | 430000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7079 | 2022-06-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 01 TO 03 | 92.0 | Premium Apartment | 2002 | 79 years 03 months | 435000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7101 | 2022-09-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 01 TO 03 | 94.0 | Premium Apartment | 2002 | 79 years 01 month | 450000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7104 | 2022-09-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 110.0 | Premium Apartment | 2002 | 79 years 01 month | 505000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | BUKIT TIMAH EXPRESSWAY | POINT (103.77223 1.38946) |
7129 | 2017-03-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 111.0 | Premium Apartment | 2002 | 84 years 06 months | 420000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7151 | 2017-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 110.0 | Premium Apartment | 2002 | 84 years 01 month | 400000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7199 | 2018-05-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 111.0 | Premium Apartment | 2002 | 83 years 05 months | 390000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7348 | 2019-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 110.0 | Premium Apartment | 2002 | 82 years 02 months | 388000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7349 | 2019-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 10 TO 12 | 111.0 | Premium Apartment | 2002 | 82 years 01 month | 398000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7371 | 2019-10-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 91.0 | Premium Apartment | 2002 | 81 years 11 months | 345000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7439 | 2020-01-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 81 years 08 months | 320000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7443 | 2020-01-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 81 years 08 months | 345000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7596 | 2020-07-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 111.0 | Premium Apartment | 2002 | 81 years 03 months | 395000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7715 | 2020-12-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 91.0 | Premium Apartment | 2002 | 80 years 10 months | 390000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7752 | 2021-01-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 110.0 | Premium Apartment | 2002 | 80 years 08 months | 418000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7866 | 2021-08-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 13 TO 15 | 111.0 | Premium Apartment | 2002 | 80 years 02 months | 465000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7898 | 2021-10-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 01 TO 03 | 110.0 | Premium Apartment | 2002 | 80 years | 455000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7949 | 2022-02-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 94.0 | Premium Apartment | 2002 | 79 years 08 months | 435000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
7975 | 2022-04-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 07 TO 09 | 92.0 | Premium Apartment | 2002 | 79 years 06 months | 430000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
8009 | 2022-06-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 01 TO 03 | 92.0 | Premium Apartment | 2002 | 79 years 03 months | 435000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
8058 | 2022-09-01 | BUKIT PANJANG | 4 ROOM | 481 | SEGAR RD | 01 TO 03 | 94.0 | Premium Apartment | 2002 | 79 years 01 month | 450000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
8062 | 2022-09-01 | BUKIT PANJANG | 5 ROOM | 481 | SEGAR RD | 07 TO 09 | 110.0 | Premium Apartment | 2002 | 79 years 01 month | 505000.0 | 481 SEGAR RD | 21201.178987 | 41264.641734 | 1.389456 | 103.772226 | KRANJI EXPRESSWAY | POINT (103.77223 1.38946) |
4 Exporting updated dataset to csv
"data/final_resale_flat_dataset.csv", index=False) resale.to_csv(
5 Analysis of proximity to expressway and resale prices
5.1 Descriptive statistics
# Plot a boxplot of resale price by whether they are near expressway
= plt.subplots(figsize=(10,10))
fig, ax =resale, x="near_expressway", y="resale_price", palette="Set3"); sns.boxplot(data
# Plot a boxplot of resale price by whether they are near expressway, differentiated by flat_type
=resale, x="near_expressway", y="resale_price", palette="Set3", kind="box", col="flat_type", col_wrap=2); sns.catplot(data
# Checking the count of HDBs for each category
"flat_type")["near_expressway"].value_counts() resale.groupby(
flat_type near_expressway
1 ROOM True 57
2 ROOM False 1783
True 334
3 ROOM False 29612
True 3287
4 ROOM False 54551
True 4114
5 ROOM False 32811
True 2996
EXECUTIVE False 10287
True 740
MULTI-GENERATION False 70
Name: near_expressway, dtype: int64
As there is insufficient data for flat types 1-room and multi-generation, we will exclude them from our analysis. From the plots, it seems that the mean resale price of HDBs near expressway is lower than that for HDBs not near expressway. We will confirm whether the difference is significant using a statistical test.
5.2 Hypothesis Testing
Defining Hypothesis:
H₀: The mean of resale price is equal for HDBs near expressway and not near expressway
H₁: The mean of resale price is lower for HDBs near expressway compared to HDBs not near expressway
As sample size is large, we will use the z-test.
= ["2 ROOM", "3 ROOM", "4 ROOM", "5 ROOM", "EXECUTIVE"]
flat_types_to_check
for t in flat_types_to_check:
= resale[(resale["near_expressway"] == True) & (resale["flat_type"] == t)]["resale_price"]
near_hdb = resale[(resale["near_expressway"] == False) & (resale["flat_type"] == t)]["resale_price"]
notnear_hdb
print(f"Hypothesis Testing for {t.lower()} HDBs")
print(f"Mean of resale price for {t.lower()} HDBs near expressway: {near_hdb.mean():.2f}")
print(f"Mean of resale price for {t.lower()} HDBs not near expressway: {notnear_hdb.mean():.2f}")
print(f"Approximate difference: {notnear_hdb.mean() - near_hdb.mean():.0f}")
= ztest(near_hdb, notnear_hdb, alternative="smaller")
results print(f"Test Statistic: {results[0]:.3f}")
print(f"P-Value: {results[1]}\n")
Hypothesis Testing for 2 room HDBs
Mean of resale price for 2 room HDBs near expressway: 246284.60
Mean of resale price for 2 room HDBs not near expressway: 252597.56
Approximate difference: 6313
Test Statistic: -2.330
P-Value: 0.009911139297098021
Hypothesis Testing for 3 room HDBs
Mean of resale price for 3 room HDBs near expressway: 323473.42
Mean of resale price for 3 room HDBs not near expressway: 331850.87
Approximate difference: 8377
Test Statistic: -5.383
P-Value: 3.6673150279096e-08
Hypothesis Testing for 4 room HDBs
Mean of resale price for 4 room HDBs near expressway: 459668.32
Mean of resale price for 4 room HDBs not near expressway: 471070.14
Approximate difference: 11402
Test Statistic: -5.389
P-Value: 3.5441927127976104e-08
Hypothesis Testing for 5 room HDBs
Mean of resale price for 5 room HDBs near expressway: 554445.95
Mean of resale price for 5 room HDBs not near expressway: 569696.28
Approximate difference: 15250
Test Statistic: -5.190
P-Value: 1.0498689309696564e-07
Hypothesis Testing for executive HDBs
Mean of resale price for executive HDBs near expressway: 610140.01
Mean of resale price for executive HDBs not near expressway: 671823.95
Approximate difference: 61684
Test Statistic: -12.428
P-Value: 9.218846815177831e-36
As the p-values for all HDB types are less than 0.05, we have sufficient evidence to reject the null hypothesis for all HDB types. This means that at significance level of 0.05, the mean resale price for 2-room, 3-room, 4-room, 5-room and executive HDBs near expressway is significantly lower than that of HDBs not near expressways.
6 Limitations
This project recognises that HDB prices are affected by many different factors and comparing resale prices only by its proximity to expressway could undermine the effects of other factors, e.g., size of HDB and remaining lease period. Future iterations of the project will explore the effect of other factors that could affect HDB resale prices and generate predictions for the prices.
7 References and Acknowledgements
[1] Geocoding Singapore Coordinates: OneMap API
[2] Z Test Statistics Formula & Python Implementation