This document illustrates the results of a parameterizable, reproducible research process. It pulls train delay data from WMATA, then runs a few simple analyses on that data.

5466 delays were loaded.

The data was filtered to include only the 1675 events on the Red line. The first delay on that line was at 2012-04-30 08:11:00; the last was at 2013-07-07 17:20:00. The three most common causes of delays on that line were a brake problem, a door problem, and an equipment problem.
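The filtering step above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the code that produced this report. The record layout (`line`, `time`, `cause`), the sample rows, and the helper name `summarize_line` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; the real data source and schema are not shown
# in this report, so these field names are assumptions.
delays = [
    {"line": "Red", "time": "2012-04-30 08:11:00", "cause": "a brake problem"},
    {"line": "Blue", "time": "2012-05-01 09:00:00", "cause": "a door problem"},
    {"line": "Red", "time": "2012-06-15 17:45:00", "cause": "a door problem"},
    {"line": "Red", "time": "2013-07-07 17:20:00", "cause": "a brake problem"},
]


def summarize_line(records, line):
    """Filter records to one line and report count, time span, and top causes."""
    selected = [r for r in records if r["line"] == line]
    times = [datetime.strptime(r["time"], "%Y-%m-%d %H:%M:%S") for r in selected]
    cause_counts = Counter(r["cause"] for r in selected)
    return {
        "n": len(selected),
        # default=None avoids an error when no records match the line
        "first": min(times, default=None),
        "last": max(times, default=None),
        "top_causes": [cause for cause, _ in cause_counts.most_common(3)],
    }


summary = summarize_line(delays, "Red")
```

Passing a different line name to `summarize_line` is the kind of parameter change a parameterizable report would expose.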

This table shows the mean delay and the event count (n) for the ten most frequent causes:

Cause                                                    mean_delay    n
a brake problem                                               8.337  499
a door problem                                                7.291  255
an equipment problem                                           6.94  215
a signal problem                                              9.991  112
expressed for schedule adherence/improved train spacing         NaN  102
an operational problem                                        6.337  101
a sick customer                                                7.25   66
did not operate                                               6.929   60
police activity                                               6.822   46
(blank)                                                       11.64   37
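A table like the one above can be built by grouping events by cause, then computing a mean and a count per group. The sketch below is a Python illustration under stated assumptions: the field names `cause` and `delay`, the sample values, and the helper `cause_table` are hypothetical. It also assumes that a group with no recorded delay values yields NaN, which is one plausible explanation for the NaN row but is not confirmed by the report.

```python
import math
from collections import defaultdict

# Hypothetical events; "delay" is None where no delay value was recorded.
events = [
    {"cause": "a brake problem", "delay": 10.0},
    {"cause": "a brake problem", "delay": 6.0},
    {"cause": "a door problem", "delay": 7.0},
    {"cause": "expressed for schedule adherence/improved train spacing", "delay": None},
]


def cause_table(records):
    """Return (cause, mean_delay, n) rows sorted by descending count."""
    groups = defaultdict(list)
    for r in records:
        groups[r["cause"]].append(r["delay"])

    rows = []
    for cause, values in groups.items():
        known = [v for v in values if v is not None]
        # Assumption: a group with no known delays gets NaN rather than an error.
        mean = sum(known) / len(known) if known else math.nan
        rows.append((cause, mean, len(values)))

    # sort() is stable, so causes with equal counts keep their first-seen order
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


rows = cause_table(events)
```

Note that `n` counts every event in the group, including those without a delay value, so a cause can have a large count and still show NaN as its mean.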

And this graph shows when delays happened by date and hour.

[Figure: plot of chunk Plot_Data]
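The date-by-hour breakdown behind such a graph can be approximated by counting events per (date, hour) bin. This is only a sketch in Python with the standard library. The timestamp format is taken from the dates quoted earlier in this report, while the sample timestamps and the helper `count_by_date_and_hour` are assumptions, and no plotting is attempted here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps in the same format as the dates quoted in this report.
timestamps = [
    "2012-04-30 08:11:00",
    "2012-04-30 08:45:00",
    "2012-04-30 17:05:00",
    "2013-07-07 17:20:00",
]


def count_by_date_and_hour(stamps):
    """Count events in each (ISO date, hour of day) bin."""
    counts = Counter()
    for stamp in stamps:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        counts[(t.date().isoformat(), t.hour)] += 1
    return counts


bins = count_by_date_and_hour(timestamps)
```

Each key in `bins` corresponds to one point on a date-versus-hour plot, and the count could drive its size or color.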