Problem statement
Lyber Inc. is the newest ride-sharing competitor to Uber and Lyft in the transportation industry. They are looking for a recommendation on the optimal hotspots to place their vehicles, based on inefficiencies found in the current vehicle dispersion of their competitor Uber. The specific target area is the MTA subway stations in the NYC area.
Strategy
We target subway riders exiting stations. We assume that a potential Lyber user takes the MTA system for the long haul and then switches to a ride-sharing service for the remainder of the trip after exiting a station.
We need to identify the areas (subway stations) where there is a significant lack of supply (competitor drivers) compared to demand (potential riders) and encourage Lyber drivers to prioritize these areas.
To calculate the supply from a competitor (Uber), we use the publicly available data from Uber with the time and location for each pickup.
To calculate the demand from the potential riders, we use the MTA turnstile data which is also publicly available. In addition, we found station entrance coordinates for the stations of interest.
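As a rough illustration of turning turnstile readings into an exit count per interval, here is a minimal pandas sketch. It assumes the readings are running (cumulative) counters per turnstile, so the number of exits in an interval is the difference between consecutive readings. The column names (`turnstile`, `timestamp`, `exits_counter`) and the sample values are hypothetical and are not taken from the project's data.

```python
import pandas as pd

# Hypothetical readings from a single turnstile (illustrative values only).
readings = pd.DataFrame({
    "turnstile": ["T1", "T1", "T1", "T1"],
    "timestamp": pd.to_datetime([
        "2014-04-01 00:00", "2014-04-01 04:00",
        "2014-04-01 08:00", "2014-04-01 12:00",
    ]),
    "exits_counter": [1000, 1040, 1100, 1130],  # assumed cumulative counter
})


def exits_per_interval(df):
    """Convert cumulative exit counters into exits per reading interval."""
    df = df.sort_values(["turnstile", "timestamp"]).copy()
    # Difference between consecutive readings of the same turnstile.
    df["exits"] = df.groupby("turnstile")["exits_counter"].diff()
    # Drop the first reading of each turnstile (no previous value, NaN)
    # and any negative differences, which would indicate a counter reset.
    return df[df["exits"] >= 0]


interval_exits = exits_per_interval(readings)
```

Summing `exits` over all turnstiles of a station and over a chosen time bucket would then give one demand figure per station and bucket.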
By comparing the two for a particular station within a given time frame (for example, 6 months), we can get some idea of how well Uber is serving the potential riders from that station. The opportunity arises when we identify a station where Uber is clearly not sending** enough cars.
** Uber does not technically send cars anywhere. Instead, it increases the fare in some areas as an incentive to its drivers.
Hypotheses
We have two hypotheses/assumptions as the basis for our data project.
(1) The potential number of ride requests is proportional to the number of people exiting the station at any given time.
(2) The above effect disappears after a certain distance R from the exit of the station.
Mapping pickups
We look at all Uber pickups within the same time frame (6 months) and register only the pickup events that are close enough to the station (within a search radius of 200 meters, an estimate of the average city block size). The radius is included because the concentration effect fades farther away from the station and has little influence on the chance of a pickup.
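The radius filter described above can be sketched as follows. This is a minimal example that assumes the distance is measured as a great-circle (haversine) distance between the pickup and the station entrance; the function names and the coordinates are hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pickups_near_station(pickups, station, radius_m=200.0):
    """Keep only the (lat, lon) pickups within radius_m of the station."""
    s_lat, s_lon = station
    return [p for p in pickups
            if haversine_m(p[0], p[1], s_lat, s_lon) <= radius_m]


# Hypothetical coordinates, for illustration only.
station = (40.7259, -73.9946)
pickups = [
    (40.7260, -73.9945),  # a few meters from the station
    (40.7400, -73.9946),  # well over a kilometer to the north
]
nearby = pickups_near_station(pickups, station)
```

Only the first pickup falls inside the 200-meter radius, so only that one would be counted for this station.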
Example correlations
The following charts illustrate the correlation between the number of station exits and the number of Uber pickups for two stations.
The high correlation suggests that when a large number of people exit the station (Bleecker St in this case), the number of Uber pickups increases as well. This indicates that Uber serves the station efficiently, so there might be less opportunity for Lyber drivers.
The low correlation, on the other hand, indicates a gap between supply and demand. Uber appears to be out of sync with the number of people exiting the station (Morris Park in this case) and is potentially losing business. This station is therefore one where Lyber could benefit by sending more drivers.
Calculate correlation for all stations
We then apply the same concept to all stations of interest and calculate the correlation between the number of people exiting the station and Uber pickups. These results have been aggregated for the 6 months of available Uber data.
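A per-station correlation of this kind could be computed with pandas roughly as shown here. The sketch assumes exits and pickups have already been aggregated into matching time buckets per station, and it uses the Pearson correlation (the pandas default), which is an assumption about the method. The station labels, column names, and counts are made up for illustration.

```python
import pandas as pd

# Hypothetical counts per time bucket for two made-up stations.
counts = pd.DataFrame({
    "station": ["A"] * 5 + ["B"] * 5,
    "exits": [100, 200, 300, 400, 500, 100, 200, 300, 400, 500],
    "pickups": [10, 20, 30, 40, 50, 30, 10, 40, 20, 30],
})


def station_correlations(df):
    """Pearson correlation between exits and pickups, one value per station."""
    return (
        df.groupby("station")[["exits", "pickups"]]
        .apply(lambda g: g["exits"].corr(g["pickups"]))
        .rename("correlation")
    )


correlations = station_correlations(counts)
```

In this toy data, station A's pickups track its exits exactly, while station B's pickups barely follow its exits, so B ends up with a much lower correlation.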
Recommended target stations
Our preliminary findings identify the 10 stations where passenger exits are least correlated with Uber pickups. The following stations have the lowest correlations between Uber pickups and turnstile exits, making them ideal locations for Lyber drivers.
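Given one correlation value per station, picking the least correlated stations is a simple ranking step. The sketch below assumes the correlations are held in a pandas Series indexed by station name; the station labels and values are placeholders, not the project's results.

```python
import pandas as pd

# Placeholder correlations, not actual results.
correlations = pd.Series(
    {"Station A": 0.91, "Station B": 0.12, "Station C": 0.55, "Station D": -0.05},
    name="correlation",
)


def least_correlated(corr, n=10):
    """Return the n stations with the lowest correlation, lowest first."""
    # Stations without a defined correlation (NaN) are left out of the ranking.
    return corr.dropna().nsmallest(n)


targets = least_correlated(correlations, n=2)
```

With `n=10` on the full set of stations, the same call would return a list of the kind recommended here.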
Summary
The project made substantial use of the following skills:
pandas data wrangling, time series analysis, data visualization
The following could be pursued in future work:
- Analyze other competitors
- Extend the research over a longer time frame
- Examine more trends, including projections into the future
- Join with other data for more accurate predictions
- Financial projection