Toronto Apartment Building Age

Using legal data to chart apartment construction trends and building age
Author

Simon Wallace

Published

October 21, 2022

Charting apartment construction booms and busts

When were Toronto apartment buildings constructed, and how have trends in density changed over time? In this notebook I present visualizations answering these questions. My main goal is to show how Python and computational approaches to research can be used simply, quickly, and effectively to answer important questions.

The problem

Almost 50% of Torontonians live in rental units. The age of those units matters: older units may require more maintenance, and if new units are not being built, the housing stock might age out. Can we programmatically develop an understanding of the age, nature, and geographic distribution of Toronto’s apartment buildings?

The dataset

In 2017, the City of Toronto launched the RentSafeTO program. Its objective is to ensure that tenants live in “safe, well-maintained buildings” by subjecting properties to regular inspections. Owners of properties that fail to meet City of Toronto standards face financial penalties for non-compliance.

The City reports on its enforcement and inspection efforts through its open data portal. The dataset includes, inter alia, information about:

  • the size of buildings,
  • outcomes of inspections,
  • the scores of each inspection,
  • the location of each building, and
  • the height of each building.

Using the data

To explore the data, we are going to use two popular libraries: pandas and seaborn.

import pandas as pd
import seaborn as sns
import seaborn.objects as so

We begin by loading the information from the City of Toronto.

df = pd.read_csv('https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/4ef82789-e038-44ef-a478-a8f3590c3eb1/resource/979fb513-5186-41e9-bb23-7b5cc6b89915/download/Apartment%20Building%20Evaluation.csv')

This gives us a dataframe (a ‘df’), which is basically a spreadsheet. Let’s take a look at the first three rows:

df.head(3)
_id RSN YEAR_REGISTERED YEAR_EVALUATED YEAR_BUILT PROPERTY_TYPE WARD WARDNAME SITE_ADDRESS CONFIRMED_STOREYS ... EXTERIOR_WALKWAYS BALCONY_GUARDS WATER_PEN_EXT_BLDG_ELEMENTS PARKING_AREA OTHER_FACILITIES GRID LATITUDE LONGITUDE X Y
0 1577658 4155099 2017.0 NaN 1951.0 PRIVATE 5 York South-Weston 60 CLEARVIEW HTS 4 ... 4.0 4.0 5.0 3.0 NaN W0532 43.692977 -79.481347 306287.792 4839003.552
1 1577659 4154772 2017.0 NaN 1989.0 SOCIAL HOUSING 16 Don Valley East 7 THE DONWAY E 4 ... 5.0 5.0 4.0 4.0 5.0 N1627 43.733247 -79.339561 317708.903 4843488.607
2 1577660 4153788 2019.0 NaN 1962.0 PRIVATE 15 Don Valley West 365 EGLINTON AVE E 7 ... 4.0 4.0 5.0 4.0 NaN N1530 43.709092 -79.385527 314009.544 4840800.059

3 rows × 40 columns

This is a big dataset:

print(f'The spreadsheet is {df.shape[0]} rows by {df.shape[1]} columns.')
print(f'Here is a list of all the columns:')
print(df.columns)
The spreadsheet is 10371 rows by 40 columns.
Here is a list of all the columns:
Index(['_id', 'RSN', 'YEAR_REGISTERED', 'YEAR_EVALUATED', 'YEAR_BUILT',
       'PROPERTY_TYPE', 'WARD', 'WARDNAME', 'SITE_ADDRESS',
       'CONFIRMED_STOREYS', 'CONFIRMED_UNITS', 'EVALUATION_COMPLETED_ON',
       'SCORE', 'RESULTS_OF_SCORE', 'NO_OF_AREAS_EVALUATED', 'ENTRANCE_LOBBY',
       'ENTRANCE_DOORS_WINDOWS', 'SECURITY', 'STAIRWELLS', 'LAUNDRY_ROOMS',
       'INTERNAL_GUARDS_HANDRAILS', 'GARBAGE_CHUTE_ROOMS',
       'GARBAGE_BIN_STORAGE_AREA', 'ELEVATORS', 'STORAGE_AREAS_LOCKERS',
       'INTERIOR_WALL_CEILING_FLOOR', 'INTERIOR_LIGHTING_LEVELS', 'GRAFFITI',
       'EXTERIOR_CLADDING', 'EXTERIOR_GROUNDS', 'EXTERIOR_WALKWAYS',
       'BALCONY_GUARDS', 'WATER_PEN_EXT_BLDG_ELEMENTS', 'PARKING_AREA',
       'OTHER_FACILITIES', 'GRID', 'LATITUDE', 'LONGITUDE', 'X', 'Y'],
      dtype='object')

I’ve poked around and noticed that there are some duplicate entries because some buildings were inspected multiple times. To get a cleaner dataset, we will eliminate duplicate entries. The dataset’s data dictionary says that the ‘RSN’ is a unique number assigned to each building, so we will drop rows with a duplicate RSN and then keep only a few columns.

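# Sort so each building's most recent evaluation comes last (sort_values returns a new dataframe)
df = df.sort_values(by=['YEAR_EVALUATED'], na_position='first')
# Keep only the last (most recent) row for each building
df = df.drop_duplicates(subset=['RSN'], keep='last')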
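
One pandas detail matters for this step: sort_values returns a new dataframe rather than sorting in place, so its result has to be assigned before the sort can influence which duplicate is kept. A minimal sketch on a toy dataframe (the rows and scores are invented for illustration and are not taken from the City's data):

```python
import pandas as pd

# Toy data: building 101 was evaluated twice, building 202 once.
toy = pd.DataFrame({
    "RSN": [101, 202, 101],
    "YEAR_EVALUATED": [2019, 2018, 2017],
    "SCORE": [80, 75, 70],
})

# sort_values returns a new dataframe, so assign the result.
toy_sorted = toy.sort_values(by="YEAR_EVALUATED")

# With the rows in chronological order, keep='last' retains the
# most recent evaluation of each RSN.
latest = toy_sorted.drop_duplicates(subset=["RSN"], keep="last")
print(latest)
```

Without the assignment, drop_duplicates would run on the original row order and keep whichever row for each building happens to appear last in the file.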

I am only interested in a few columns, so I will drop everything that isn’t directly relevant.

df = df[['YEAR_BUILT','CONFIRMED_STOREYS','CONFIRMED_UNITS','X', 'Y']]
df.shape
(3482, 5)

Wow! That eliminated a lot of information. Now our dataset describes 3482 apartment buildings. We can get some summary information:

df['YEAR_BUILT'].describe()
count    3463.000000
mean     1961.985850
std        19.281208
min      1805.000000
25%      1955.000000
50%      1962.000000
75%      1970.000000
max      2021.000000
Name: YEAR_BUILT, dtype: float64

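This is interesting! The average building is about 60 years old and the oldest was built in 1805. Half of the buildings in the dataset were constructed between 1955 and 1970 (the 25% and 75% quartiles). If construction years were normally distributed, about 68% would fall within one standard deviation of the mean, that is, between 1943 and 1981.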
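
The share of buildings constructed in any given window can also be measured directly, rather than inferred from the mean and standard deviation. A small sketch, in which share_built_between is a hypothetical helper (not part of pandas) and the sample years are made up:

```python
import pandas as pd

def share_built_between(years: pd.Series, start: int, end: int) -> float:
    """Fraction of non-missing years that fall in [start, end], inclusive."""
    known = years.dropna()
    return known.between(start, end).mean()

# Made-up sample, including one missing construction year.
sample = pd.Series([1920, 1950, 1958, 1962, 1965, 1971, 2005, None])
print(share_built_between(sample, 1943, 1981))
```

On the real data this would be called as share_built_between(df['YEAR_BUILT'], 1943, 1981). Missing years are dropped first, which matters here because describe() reports fewer YEAR_BUILT values (3463) than there are rows (3482).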

Visualize the data

Let’s get a better feel for what’s in the dataset by visualizing it. We will begin by plotting the coordinates of each building onto a scatter plot. I’ve coded the program to output larger points for buildings with more units.

(
    so.Plot(df, x="X", y="Y", pointsize='CONFIRMED_UNITS')
    .add(so.Dot())
    .label(
        x='X coordinates',
        y='Y coordinates',
        title='Toronto Apartment Buildings')
)

This appears to show (and this passes the gut check) that the largest apartment buildings are clustered downtown and that apartment buildings are spread out along major arteries.

Now let’s visualize the age of the buildings.

(
    so.Plot(df, x="YEAR_BUILT")
    .add(so.Bars(), so.Hist())
)

This shows a huge building boom in the 50s and 60s that suddenly dropped off in the 70s. But do fewer new buildings mean fewer new units? Let’s check by manipulating the data a bit. This next graph will visualize the total number of units built each year:

units_per_year = df.groupby('YEAR_BUILT')["CONFIRMED_UNITS"].sum().to_frame()
so.Plot(units_per_year, x='YEAR_BUILT', y='CONFIRMED_UNITS').add(so.Line())

This more or less correlates with the total number of buildings built. But have the buildings changed? Let’s look at the average number of units per building for each construction year:

average_units_per_building_per_year = df.groupby('YEAR_BUILT')["CONFIRMED_UNITS"].mean().to_frame()
so.Plot(average_units_per_building_per_year, x='YEAR_BUILT', y='CONFIRMED_UNITS').add(so.Line())

This is really interesting! From 1960 to 2000, the average number of units in a new build steadily decreased, only to suddenly rebound and later climb.
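
Yearly averages can swing sharply in years when only a handful of buildings were completed, so it may help to view the building count, total units, and average units side by side, grouped by decade. A rough sketch on invented numbers (the toy dataframe and the DECADE column are assumptions for illustration, not part of the City's dataset):

```python
import pandas as pd

# Invented sample standing in for the deduplicated dataframe.
toy = pd.DataFrame({
    "YEAR_BUILT": [1961, 1964, 1968, 1985, 2004, 2009],
    "CONFIRMED_UNITS": [100, 200, 150, 60, 300, 340],
})

# Floor each year to the start of its decade, e.g. 1964 -> 1960.
toy["DECADE"] = (toy["YEAR_BUILT"] // 10) * 10

# Named aggregation: one row per decade with three summary columns.
by_decade = toy.groupby("DECADE")["CONFIRMED_UNITS"].agg(
    buildings="count",
    total_units="sum",
    mean_units="mean",
)
print(by_decade)
```

A decade with a high mean_units but a low buildings count deserves a more cautious reading than one backed by hundreds of buildings. On the real data, YEAR_BUILT is stored as a float with some missing values; groupby skips rows with a missing key by default.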

Questions for future research

These visualizations leave me with more questions than answers. What happened in 2000 that encouraged developers to build apartment buildings with more units? Do taller buildings portend a meaningful increase in available rental housing stock in Toronto or, as the 1960s would suggest, is the only way to increase the number of rental units to build more buildings?