Episode 1: Ride to Glory

Introduction and Exploratory Data Analysis

New York City's Citi Bike bike-share program is looking to add two more bike-share locations. The new sites should be chosen based on two goals: raising overall traffic (from both subscribers and non-subscribers) and increasing female ridership.

We began by using classification trees to identify the variable that best predicts whether a rider is a Customer (non-subscriber) or a Subscriber. The trees showed that trip duration was the most important classification variable. Using this information, we plotted the trip-duration histogram below. The red line on the graph marks 30 minutes, after which the price increases. The histogram also revealed a set of outliers, which we defined as rides longer than 3,600 seconds (1 hour). As the table below shows, Customers tended to ride close to the 30-minute limit. Subscribers, however, tended to take shorter rides, mostly under 20 minutes.
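The write-up does not say which software or tree settings were used, so the sketch below is only an illustration, not the authors' model. It hand-rolls a one-level classification tree (a decision stump) that scores each variable by how much its best single split lowers Gini impurity, the default split criterion in CART-style trees; a full tree library would grow deeper and accumulate importance across many splits. It also summarizes ride durations using the 30-minute price threshold, the 1-hour outlier cutoff, and the 20-minute mark from the text. The field names `tripduration` and `usertype` are assumed to mirror the public Citi Bike trip-data columns, `hour` is a hypothetical extra feature, and the rows are made up.

```python
from collections import Counter

# Toy, made-up rows for illustration only (not Citi Bike results).
# "tripduration" (seconds) and "usertype" are assumed to mirror the public
# Citi Bike trip-data columns; "hour" (start hour) is hypothetical.
ROWS = [
    {"tripduration": 300, "hour": 8, "usertype": "Subscriber"},
    {"tripduration": 420, "hour": 17, "usertype": "Subscriber"},
    {"tripduration": 600, "hour": 9, "usertype": "Subscriber"},
    {"tripduration": 1500, "hour": 13, "usertype": "Customer"},
    {"tripduration": 1700, "hour": 8, "usertype": "Customer"},
    {"tripduration": 1750, "hour": 18, "usertype": "Customer"},
]

PRICE_THRESHOLD_S = 30 * 60  # 30 minutes: the price increases after this
OUTLIER_CUTOFF_S = 60 * 60   # rides above 1 hour are treated as outliers


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, feature, target="usertype"):
    """Best 'feature <= t' split: returns (t, weighted child impurity)."""
    values = sorted({r[feature] for r in rows})
    best_t, best_w = None, float("inf")
    for t in values[:-1]:  # splitting at the max value leaves one side empty
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if w < best_w:
            best_t, best_w = t, w
    return best_t, best_w


def rank_features(rows, features, target="usertype"):
    """Rank features by the impurity decrease of their best single split."""
    parent = gini([r[target] for r in rows])
    scores = {}
    for f in features:
        t, w = best_split(rows, f, target)
        scores[f] = 0.0 if t is None else parent - w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def duration_profile(rows, usertype):
    """Share of one user type's non-outlier rides under 20 and 30 minutes."""
    durations = [r["tripduration"] for r in rows
                 if r["usertype"] == usertype
                 and r["tripduration"] <= OUTLIER_CUTOFF_S]
    n = len(durations)
    if n == 0:
        return {"n": 0, "under_20_min": 0.0, "under_30_min": 0.0}
    return {
        "n": n,
        "under_20_min": sum(d < 20 * 60 for d in durations) / n,
        "under_30_min": sum(d < PRICE_THRESHOLD_S for d in durations) / n,
    }


print(rank_features(ROWS, ["tripduration", "hour"]))
print(duration_profile(ROWS, "Customer"))
```

On these toy rows, splitting on `tripduration` at 600 seconds separates the two user types perfectly, so it ranks above `hour`; real data would of course be noisier.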

We then hypothesized that riding patterns differ significantly across periods of the workday, with morning commuters, evening commuters, and leisure riders in the middle of the day. We plotted the top starting points and destinations on a geographical map of New York City, shown below; red and yellow dots represent Customers, and blue-to-green dots represent Subscribers. Using this classification and the map, we identified how Subscribers, who make up the majority of our data, used the bikes: short trips, often from bus and subway stations and other major traffic hubs, to destinations less than 20 minutes away. With this information, we designed our model.
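A minimal sketch of the grouping behind the map, assuming trips are tagged with a period of the workday and then counted by user type and station. The hour boundaries in `PERIODS` are hypothetical, since the write-up does not state the hours it used. The field names `starttime`, `usertype`, `start station name`, and `end station name` are assumed to mirror the public Citi Bike trip-data columns, and the trips and station names are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical period boundaries (start hour inclusive, end hour exclusive);
# the write-up does not state the hours it used.
PERIODS = [
    (6, 10, "morning_commute"),
    (10, 16, "midday_leisure"),
    (16, 20, "evening_commute"),
]


def day_period(start):
    """Map a trip's start datetime to a period of the workday."""
    for lo, hi, name in PERIODS:
        if lo <= start.hour < hi:
            return name
    return "other"


def top_stations(trips, usertype, period, field="start station name", n=5):
    """Most common stations in `field` for one user type and one period."""
    counts = Counter(
        t[field]
        for t in trips
        if t["usertype"] == usertype and day_period(t["starttime"]) == period
    )
    return counts.most_common(n)


# Toy, made-up trips with hypothetical station names.
TRIPS = [
    {"starttime": datetime(2016, 6, 1, 8, 5), "usertype": "Subscriber",
     "start station name": "Station A", "end station name": "Station C"},
    {"starttime": datetime(2016, 6, 1, 8, 40), "usertype": "Subscriber",
     "start station name": "Station A", "end station name": "Station D"},
    {"starttime": datetime(2016, 6, 1, 9, 10), "usertype": "Subscriber",
     "start station name": "Station B", "end station name": "Station C"},
    {"starttime": datetime(2016, 6, 1, 13, 0), "usertype": "Customer",
     "start station name": "Station E", "end station name": "Station E"},
]

print(top_stations(TRIPS, "Subscriber", "morning_commute"))
print(top_stations(TRIPS, "Subscriber", "morning_commute",
                   field="end station name"))
```

Plotting each resulting station list at its coordinates, colored by user type, would give a map of the kind described; the plotting step is left out here.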