Real estate developers and financial firms both benefit from a data-driven picture of where people live, shop, and work. At no other time in recent history has the United States seen as many people moving (temporarily or permanently) as during the COVID-19 pandemic. Whether the cause is the new opportunity to work from anywhere, a search for lower taxes, or a desire to “slow down,” the news keeps telling us that the fabric of our cities is changing.
In 2020, we developed and publicly released a series of Mobility Insights dashboards to help people understand these types of trends. Our Home Switcher Trend Analysis dashboard looks at the rate of relocation and the most popular destinations for recent home switchers across the US. Available at the national, state, and county level, this analysis shows how relocations over time compare to last year, and how the pandemic and resulting economic shifts have impacted residency across the nation.
While the Home Switcher dashboard has been extremely valuable for many people, we also understand that some customers may need to approach the same data in slightly different ways than we do. In fact, this is exactly why the Spectus Data Clean Room was developed – to enable customers to gain access to the same underlying data Spectus uses, with the ability to customize the analysis to suit their needs.
In this post, we will walk through the steps that one can take within the Spectus data clean room to tailor their analysis of location data with maximum flexibility. We will focus on two real world scenarios related to real estate development during the first half of 2021:
- Detecting potentially gentrifying areas in California and Los Angeles County
- Finding devices which switched homes from New York City to Florida
Disclaimer: This post goes into technical detail and is therefore intended for a more technical audience such as data analysts and data scientists. We assume readers have, at a minimum, knowledge of SQL, Python and Jupyter Notebooks.
Setting the stage
Before we dive in, it’s important to explain a few concepts specific to Spectus and the definition of Home Switchers.
Spectus users have access to various precomputed datasets that are used in the analysis we describe below. Additionally, we provide the tools users need to perform analysis on the datasets, such as a Jupyter Notebook environment and a managed infrastructure. Using these tools, one can write SQL queries or Python code to analyze the data inside of our data clean room and create their own custom analyses.
The way we have set up the Spectus infrastructure has a few significant benefits that are worth sharing before diving into the Home Switchers analysis. The precomputed datasets we give clients access to are standardized across all clients, both in the data itself and in its format. Likewise, the default tools to analyze that data, such as Jupyter Notebooks, are the same across clients. This gives us the opportunity to package up and share our institutional knowledge about how to combine the tools and data to glean insight from location data, in the form of what we call notebook tutorials. These tutorials are sample notebooks, part documentation and part code samples and visualizations, that clients can run in Spectus out of the box to shorten the learning curve. Clients can even copy these tutorials and modify them to suit their needs. The scope of a given tutorial can range from education on a particular precomputed dataset, to best practices for a tool available in our data clean room, to location intelligence use cases.
In the remainder of this post, we are going to take a deep dive on one of these notebook tutorials. The tutorial guides users on how to use standard datasets available on Spectus to identify home switchers and perform customized analysis on these users.
Defining a Home Switcher
Now, let’s define who counts as a home switcher. Spectus defines a home switcher as a user who lived in one place for a sustained period of time and then moved to a new location. Spectus has developed and refined algorithms over the last several years to infer home areas, and exposes this information at the Census Block Group level: a level of granularity fine enough to remain useful, because it can be joined with Census data, yet broad enough to protect the privacy of users. Spectus users have this precomputed data at their fingertips and, as we’ll see below, it is very handy for performing this analysis.
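To make the definition concrete, here is a minimal sketch of how a switch could be detected from a device's weekly inferred home block groups. This is not the Spectus algorithm: the function name `find_switch`, the `min_weeks` threshold, and the requirement that the destination is also a sustained home are all assumptions made for illustration.

```python
from itertools import groupby


def find_switch(weekly_homes, min_weeks=4):
    """Return (origin, destination) for the first sustained home change, else None.

    weekly_homes: block-group IDs, one per week, in chronological order.
    min_weeks: hypothetical threshold for what counts as a "sustained" stay.
    """
    # Collapse consecutive identical weeks into (block_group, run_length) pairs.
    runs = [(bg, len(list(group))) for bg, group in groupby(weekly_homes)]
    # Keep only the stays long enough to count as a sustained home.
    stable = [bg for bg, length in runs if length >= min_weeks]
    # A switch is the first pair of consecutive sustained homes that differ.
    for origin, destination in zip(stable, stable[1:]):
        if origin != destination:
            return origin, destination
    return None


# Five weeks in one block group, then six weeks in another (made-up IDs).
history = ["060371234001"] * 5 + ["060372345002"] * 6
switch = find_switch(history)
```

Short interruptions, such as a single week away from home, are ignored in this sketch because they never reach the `min_weeks` threshold.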
Defining the Switchers class
To provide users with flexibility to customize the analysis to suit their needs, our Data Science team defined a Python class for “Switchers” in the notebook tutorial to enable users to easily change parameters of the analysis:
- Select the date range you want to analyze
- Specify the state you want to focus on – users can pick a single state to evaluate intra-state relocations at the census block group or county level, or analyze cross-state moves with a state-level analysis.
As you can see from the partial snippet below, we share the definition of the class with all Spectus users. This gives users the ability to understand exactly how it works and, if desired, modify the definition themselves.
All you need to do to use the class for your analysis is to specify values for the minimal parameters in cell 6 and run the subsequent cells as shown here:
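As a rough illustration of what such a parameter object might look like, the sketch below bundles the date range, state, and resolution into one validated class. It is a hypothetical stand-in, not the tutorial's actual Switchers class: the name `SwitcherParams` and all of its fields are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

VALID_RESOLUTIONS = {"state", "county", "blockgroup"}


@dataclass(frozen=True)
class SwitcherParams:
    """Hypothetical parameters for a home switcher analysis."""

    start_date: date
    end_date: date
    state_fips: Optional[str] = None  # None means all states (cross-state moves)
    resolution: str = "state"

    def __post_init__(self):
        if self.start_date > self.end_date:
            raise ValueError("start_date must not be after end_date")
        if self.resolution not in VALID_RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.resolution!r}")
        # County and block-group analyses are intra-state, so they need one state.
        if self.resolution != "state" and self.state_fips is None:
            raise ValueError("county/blockgroup resolution requires state_fips")


# First half of 2021, California only, at block-group resolution.
params = SwitcherParams(
    start_date=date(2021, 1, 1),
    end_date=date(2021, 6, 30),
    state_fips="06",
    resolution="blockgroup",
)
```

Validating the combination up front means later queries can rely on the parameters without re-checking them.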
Now when we run our subsequent queries, results are limited to the values we set in our parameters without any additional input. The next few queries in the notebook prepare a dataset that will be the starting point for multiple aggregations and analyses later on.
This dataset contains devices that meet the criteria defined in the Switchers class along with:
- The FIPS code of the state they moved from.
- The first week of analysis specified by the user.
- The FIPS code of the state they moved to.
- The last week of analysis specified by the user.
With virtually no effort from the user, we can now begin to use this dataset to dig into interesting analyses such as the use cases described below.
Use Case: Which Regions of Los Angeles are Gentrifying?
Now that you’ve seen how the Spectus notebooks are set up and can be customized, let’s use them to address the two use cases introduced at the beginning.
A practical use case for real estate developers could be to locate areas of a city that are seeing a recent influx of people from wealthier areas. Such an influx could reveal demographic trends that precede the sharp rise in real estate value of an urban region.
In this section we answer “Which areas in Los Angeles, CA have seen home switchers moving in from higher income areas to lower income areas, exclusively within that state?”
We use a methodology similar to the one used for the table generated above, but set the resolution to blockgroup and restrict the dataset to only those devices within California. We’ll use the same dates as before, January 1 to June 30, 2021, so there is no need to create a new time frame. We do, however, need to tailor the query a little to serve our purpose:
The resulting structure of the data table is the same as outlined for the previous one. The difference is that both the ‘from’ blockgroup and the ‘to’ blockgroup (bg) are now limited to those in California (state FIPS code 06).
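The restriction to California can be sketched as a prefix filter, since the first two digits of a block group ID are the state FIPS code. The rows, device IDs, and the helper name `within_state` below are made up for illustration; in the notebook this filtering happens in the query itself.

```python
CA_FIPS = "06"

# Hypothetical rows: (device_id, bg_from, bg_to), using 12-digit block group IDs.
moves = [
    ("dev-1", "060371234001", "060372345002"),  # California -> California
    ("dev-2", "360610001001", "060372345002"),  # New York -> California
    ("dev-3", "060750101001", "060371234001"),  # California -> California
]


def within_state(rows, state_fips):
    """Keep moves whose origin and destination are both in the given state."""
    return [
        row for row in rows
        if row[1].startswith(state_fips) and row[2].startswith(state_fips)
    ]


ca_moves = within_state(moves, CA_FIPS)
```

Keeping the IDs as strings matters here: converting them to integers would drop California's leading zero and break the prefix match.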
Spectus’ public data tables: Census blockgroup income
To address the Income element of the hypothesis, we must overlay US Census data. Spectus provides out-of-the-box census data and polygons within the platform to make this type of data enrichment very straightforward. To that end, we will append median income data from the Census and make a simple classification of potentially interesting blockgroups that have had an influx from people that used to live in richer areas. The resulting data will have a column for Yearly Median Household Income (med_hh_income).
Blockgroups with influx from higher income areas
Next, we need to look at the difference in income between the origin and the destination, using the median household income of each geography. Then, we classify the destination blockgroup as higher or lower income with respect to the origin blockgroup.
The steps of processing end up looking like this:
- Join income from origin blockgroup
- Join income from destination blockgroup
- Difference in incomes between origin and destination (positive means destination is lower income)
- Determine whether the income at the destination blockgroup is lower
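The four steps above can be sketched in plain Python as two lookups, a subtraction, and a flag. The income figures and block group IDs are invented for illustration, and the helper `add_income_gap` is hypothetical; only the `med_hh_income` name comes from the text above.

```python
# Hypothetical median household income per block group (med_hh_income).
med_hh_income = {
    "060371234001": 95_000,
    "060372345002": 52_000,
    "060750101001": 120_000,
}

# Hypothetical rows: (device_id, bg_from, bg_to).
moves = [
    ("dev-1", "060371234001", "060372345002"),
    ("dev-3", "060750101001", "060371234001"),
    ("dev-4", "060372345002", "060750101001"),
    ("dev-5", "060379999001", "060372345002"),  # origin has no income data
]


def add_income_gap(rows, income):
    """Attach origin/destination income, their difference, and a lower-income flag."""
    enriched = []
    for device_id, bg_from, bg_to in rows:
        income_from = income.get(bg_from)  # step 1: income of origin
        income_to = income.get(bg_to)      # step 2: income of destination
        if income_from is None or income_to is None:
            continue  # skip moves that cannot be compared
        diff = income_from - income_to     # step 3: positive = destination is lower income
        enriched.append({
            "device_id": device_id,
            "bg_from": bg_from,
            "bg_to": bg_to,
            "income_diff": diff,
            "dest_lower_income": diff > 0,  # step 4
        })
    return enriched


enriched = add_income_gap(moves, med_hh_income)
```

Dropping rows with missing income is one possible choice; an analysis could also keep them and report them separately.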
Next, we will calculate which blockgroups are seeing an influx from higher income areas by leveraging the geography data readily available within the platform. Then you’re left with a table that shows how many devices moved to which blockgroups with lower income than their origin.
To go one step further, you can add visualizations to make results even easier to interpret.
Here we leverage commonly used open source Python visualization packages to show which blockgroups in California saw an influx of people from higher income areas, a potential indicator of gentrification.
Los Angeles County
Now, what if we want to dive deeper and do a more granular analysis of just LA? From the resulting data above we can zoom in or out as needed.
In this case we want to look only at Los Angeles county, so we specify its FIPS code and look at the subset of devices which moved to blockgroups with lower median income.
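Because a block group ID starts with the two-digit state code followed by the three-digit county code, Los Angeles County (FIPS 06037) can be selected with a five-character prefix. The sketch below counts incoming devices per destination blockgroup; the rows and the helper `influx_by_blockgroup` are hypothetical.

```python
from collections import Counter

LA_COUNTY_FIPS = "06037"

# Hypothetical rows: destination block group and the lower-income flag.
enriched = [
    {"bg_to": "060372345002", "dest_lower_income": True},
    {"bg_to": "060372345002", "dest_lower_income": True},
    {"bg_to": "060371234001", "dest_lower_income": True},
    {"bg_to": "060750101001", "dest_lower_income": True},   # not in LA County
    {"bg_to": "060371234001", "dest_lower_income": False},  # destination not lower income
]


def influx_by_blockgroup(rows, county_fips):
    """Count devices that moved into lower-income block groups of one county."""
    return Counter(
        row["bg_to"]
        for row in rows
        if row["dest_lower_income"] and row["bg_to"].startswith(county_fips)
    )


la_influx = influx_by_blockgroup(enriched, LA_COUNTY_FIPS)
top_blockgroups = la_influx.most_common(3)  # block groups with the largest influx first
```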
The previous maps and tables display which blockgroups show the biggest influx of home-switchers from areas with higher median income, for both the entire state of California and Los Angeles county. An influx of population from higher income blockgroups could mean that prices here will start rising in a few years, making it an interesting place to keep an eye on for real estate opportunities.
Use Case: How Many New Yorkers Moved to Florida during H1 2021?
The Home Switcher analysis can also be deployed to observe more macro trends. For example, we can start by asking the question: Which devices moved from New York City to Florida during the first part of 2021? And which NYC boroughs did they come from?
As in the examples above, we start by defining the from/to regions, but this time we specify two different geographical levels: county (borough) and state. We specify the 5 NYC boroughs and the Florida FIPS.
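As a sketch of this mixed-level selection, the snippet below maps each move to a county-level origin and a state-level destination, then keeps only moves from the five boroughs to Florida. The county FIPS codes are the standard ones for the boroughs; the sample rows and the function `nyc_to_florida` are made up for illustration.

```python
# County FIPS codes of the five NYC boroughs.
NYC_BOROUGH_FIPS = {
    "36005": "Bronx",
    "36047": "Brooklyn",       # Kings County
    "36061": "Manhattan",      # New York County
    "36081": "Queens",
    "36085": "Staten Island",  # Richmond County
}
FLORIDA_FIPS = "12"


def nyc_to_florida(rows):
    """Return (device_id, geo_from, geo_to) for moves from an NYC borough to Florida.

    rows: (device_id, bg_from, bg_to) with 12-digit block group IDs.
    """
    result = []
    for device_id, bg_from, bg_to in rows:
        geo_from = bg_from[:5]  # county level: state (2 digits) + county (3 digits)
        geo_to = bg_to[:2]      # state level
        if geo_from in NYC_BOROUGH_FIPS and geo_to == FLORIDA_FIPS:
            result.append((device_id, geo_from, geo_to))
    return result


# Hypothetical moves.
sample = [
    ("dev-1", "360810001001", "120860001001"),  # Queens -> Florida
    ("dev-2", "360470002001", "060371234001"),  # Brooklyn -> California
    ("dev-3", "360590003001", "120860001001"),  # Nassau County, NY -> Florida
]
ny_fl = nyc_to_florida(sample)
```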
The FIPS code assigned to each borough is listed in the geo_from column, while Florida is represented by “12” in the geo_to column.
We use this table to understand the proportion of devices moving from each borough. The data shows that Queens and Brooklyn account for the highest proportions of origin devices, while Staten Island and the Bronx account for the lowest.
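The proportion per borough is a count divided by the total. In the minimal sketch below, the `geo_from` values are invented placeholders rather than real results, and `borough_shares` is a hypothetical helper.

```python
from collections import Counter


def borough_shares(geo_from_values):
    """Return each origin's share of all devices, as a fraction between 0 and 1."""
    counts = Counter(geo_from_values)
    total = sum(counts.values())
    return {geo: count / total for geo, count in counts.items()}


# Placeholder origins for four devices (not real data).
geo_from = ["36081", "36081", "36047", "36061"]
shares = borough_shares(geo_from)
```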
The use cases presented here show multiple ways of generating custom analysis for home switchers at multiple levels of geography. They are just a few examples of how these insights can support a wide array of applications in real estate, finance, academia, and beyond. The examples also showcase the power and flexibility of Spectus’ home inference tables, and the code and examples we have provided will allow analysts and data scientists to further refine their analyses. If you would like to gain access to the Spectus Data Clean Room to go deeper, we encourage you to reach out to a Spectus representative.