Introduction and Problem
To follow Kroc’s lead, one might leverage location and geographic data to try to identify new real estate opportunities that could serve as growth areas for new markets. This is a ripe area for data science to explore.
The Foursquare database contains extensive data that maps food industry locations. It is an ideal resource for determining the density and concentration of specific types of food industry locations.
With a powerful scripting and data science language like Python, large location databases like Foursquare’s can be leveraged to help identify possible new opportunities in key growth areas. However, data scientists would do well to also use this data to identify whether specific locations are already saturated with existing competitors. If one neighborhood already has three hamburger joints, it might be more difficult to attract customers to another.
This final project will use geographic data for the city of Boston to identify food establishments of a particular type (in this case what Foursquare refers to as “Pizza Places”) and determine whether certain neighborhoods may already be showing some saturation. In addition, we will use this same information to understand whether the existing neighborhood boundaries are adequate categories for grouping these pizza places. Are pizza places within a neighborhood such as the Financial District similar to one another, and is there more geographic variation between neighborhoods than within them?
These questions are difficult to answer completely because of the limited nature of the data that we will be pulling from Foursquare for this assignment. However, many of them can be explored in detail even with the data that we already have access to. This is an exciting area of data science research, and it could be extremely useful to the corporate headquarters of any number of brick-and-mortar service industries, including food service, retail, fitness centers, and grocery stores.
Data
The city of Boston posts its location data publicly on its website to give real estate developers, researchers, and city planners access to this information. In this lab, we will download one of the city’s more popular neighborhood datasets, simply called “Boston Neighborhoods”, available [here](https://data.boston.gov/dataset/boston-neighborhoods/resource/13ee2b65-6547-4168-b112-83995f138602).
In this project we will also be downloading and accessing data from Foursquare. Foursquare is a company that provides location data and intelligence to its customers, primarily web developers, for use in their applications. We will be making a limited number of calls to Foursquare’s API to pull data about pizza places in areas of the city of particular interest. We will be using the Foursquare [explore endpoint](https://developer.foursquare.com/docs/places-api/endpoints/) to get venue recommendations in the “Pizza Place” category.
To begin, we will pull the top 50 “Pizza Place” results returned for the geographic center of each of the neighborhoods in question. We will identify these geographic centers through Google.
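The request-and-parse step described above might look roughly like the following. This is a minimal sketch, not the project's actual code: it assumes the legacy Foursquare v2 explore URL and response layout (`response` → `groups` → `items` → `venue`), and the helper names, the `query="pizza"` filter, the 500 m radius, and the version date are all illustrative choices. Credentials are placeholders, and no network call is made here.

```python
from urllib.parse import urlencode

# Assumed legacy v2 explore endpoint; check Foursquare's current docs.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      query="pizza", limit=50, radius=500,
                      version="20200601"):
    """Build a request URL for venues near one neighborhood center.

    radius (meters) and version are illustrative defaults, not values
    taken from the project.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "query": query,
        "radius": radius,
        "limit": limit,  # the "top 50" results per neighborhood center
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload, neighborhood):
    """Flatten an explore-style response into one dict per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "neighborhood": neighborhood,
                "name": venue["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
                "category": categories[0]["name"] if categories else None,
            })
    return rows
```

In use, one would call `build_explore_url` once per neighborhood center, fetch the URL with an HTTP client, and pass the decoded JSON to `parse_venues`; the resulting rows can be loaded directly into a pandas DataFrame.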
To make this assignment more practical, we will be excluding several neighborhoods from our analysis. These include most of the larger, outlying neighborhoods outside of the city center (including Brighton, Allston, Dorchester, Roxbury, Mattapan, the Harbor Islands, Roslindale, West Roxbury, and others).
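As a rough illustration of this filtering step, the sketch below reads neighborhood names from GeoJSON text and drops the excluded ones. It assumes each feature stores its name under a `Name` property, which should be checked against the downloaded file; the exclusion set covers only the neighborhoods named above, and the sample features are a tiny stand-in rather than real city data.

```python
import json

# Neighborhoods named in the text as excluded. The text also says
# "and others"; those are not listed, so they are not included here.
EXCLUDED = {
    "Brighton", "Allston", "Dorchester", "Roxbury", "Mattapan",
    "Harbor Islands", "Roslindale", "West Roxbury",
}


def load_neighborhood_names(geojson_text, name_key="Name"):
    """Return the neighborhood names kept for analysis, sorted.

    name_key is an assumption about the dataset's property name.
    """
    data = json.loads(geojson_text)
    names = (f["properties"][name_key] for f in data["features"])
    return sorted(n for n in names if n not in EXCLUDED)


# Tiny stand-in for the downloaded file, not real city data.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Name": "Back Bay"}, "geometry": None},
        {"type": "Feature", "properties": {"Name": "Mattapan"}, "geometry": None},
        {"type": "Feature", "properties": {"Name": "North End"}, "geometry": None},
    ],
})
print(load_neighborhood_names(sample))  # ['Back Bay', 'North End']
```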
Dictionary and Libraries: To complete this project, we will also be using a number of Python libraries. These include the pandas library to work with tabular data, the NumPy library to work with vectorized data, the json library to work with the JSON data that we will be downloading from the City of Boston, the geopy library to find location data for Boston, the Matplotlib library to plot our data, Folium to render our data on a map, and scikit-learn to run a k-means clustering algorithm that groups the restaurant data into clusters.
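To make the clustering step concrete, here is a dependency-free sketch of k-means (Lloyd's algorithm) applied to latitude/longitude pairs. It is a stand-in for illustration only: in the project itself, scikit-learn's `KMeans` would be the natural choice. The function name, the fixed seeding from the first `k` points, and the use of plain Euclidean distance on coordinates (a reasonable approximation only at city scale) are assumptions of this sketch.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on (lat, lng) tuples.

    Centroids are seeded with the first k points so that the result
    is deterministic; real use would want a better initialization.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(v) / len(members) for v in zip(*members)
                )
    return labels, centroids
```

Comparing the resulting cluster labels with each venue's neighborhood label is one way to see whether neighborhood boundaries match the natural geographic groupings of the pizza places.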
Neighborhood Centers
Pizza Places from Foursquare
Pizza Places Categorized by Geography
Results and Discussion
One large takeaway from this analysis is that the South Boston Waterfront (otherwise known as the Seaport District) has a dearth of pizza restaurants, even though this section of the city has been under extensive development recently. Anyone in this area who wanted pizza would have to cross the Fort Point Channel to get downtown.
Another key takeaway is that the neighborhoods are not a particularly useful starting point for our analysis. To get a fuller picture of the most promising development opportunities for pizza restaurants around the city, more research would be necessary to understand zoning restrictions and the general socioeconomic status of any development area in question. Like McDonald’s, a pizza chain might fit best along highly trafficked corridors in sprawling suburban areas.