Train predictions are hard to get right; any number of things can make your train late, early or on time at any given moment. Some might say it’s impossible if you factor in delays. (We’ve spoken to some WMATA employees who have told us as much.) It very well might be, but maybe we’re getting closer?
Today, MetroHero rolled out a new kind of system for generating train ETAs at Metrorail stations: one that observes each isolated trip between every two neighboring stations in the system and adjusts its predictions accordingly, all in real time. This system doesn’t use WMATA’s PIDS (Passenger Information Display System) at all, but rather an in-house algorithm that relies exclusively on Metro’s new real-time train position APIs.
So, how exactly does it work?
Here it is at a very basic level:
For any given train trip between two neighboring stations, the elapsed time between starting and finishing the trip (i.e. pulling away from the first station and pulling into the next station, respectively) is compared against the fastest time a train has ever made the same trip. We assume that the fastest trip a train has ever made through a given stretch of track is the “expected” or ideal trip time for that stretch, since physical limitations and safety protocols put a ceiling on how fast a train can go. In a perfect scenario, this is about the speed that every train should achieve when traversing that same stretch of track. If the current train’s trip time is faster than the fastest recorded time in the system, we use this new value as the baseline for comparison for all subsequent trains.
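To make the bookkeeping above a bit more concrete, here is a minimal Python sketch. It is not MetroHero’s actual code: the class name, method names and station codes are made up for illustration, and it simply assumes we are handed a departure-to-arrival duration in seconds for each pair of neighboring stations.

```python
from typing import Dict, Optional, Tuple

# A stretch is an ordered pair of neighboring stations.
# The station codes used with it are hypothetical placeholders.
Stretch = Tuple[str, str]


class FastestTripTracker:
    """Remembers the fastest trip time observed so far for each stretch."""

    def __init__(self) -> None:
        self._fastest: Dict[Stretch, float] = {}

    def record_trip(self, stretch: Stretch, trip_seconds: float) -> float:
        """Record one observed trip; return the expected time for the stretch."""
        if trip_seconds <= 0:
            raise ValueError("trip time must be positive")
        current = self._fastest.get(stretch)
        # A new record becomes the baseline for every subsequent train.
        if current is None or trip_seconds < current:
            self._fastest[stretch] = trip_seconds
        return self._fastest[stretch]

    def expected_seconds(self, stretch: Stretch) -> Optional[float]:
        """Expected (fastest-ever) time, or None if no trip has been seen yet."""
        return self._fastest.get(stretch)
```

A real system would presumably also need to guard against bad position data producing an impossibly fast “record”; this sketch does not attempt that.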
Most of the time, however, trains traveling through a stretch of track are going slower than the ideal speed. When a train takes longer than the expected time for a given trip, this trip’s stretch incurs a penalty (equal to the difference between the expected trip time and this most recent train’s trip time). This penalty affects the predictions for all the trains further up the line that have yet to pass through the same stretch of track, at least until the next train comes through and either reduces this penalty (by traveling faster than the train before it) or makes it worse (by going slower). If you look at MetroHero’s line maps today, some of this is visualized by the heatmaps flanking either side of the colored line: yellow/orange is used to designate penalties of 30 to 60 seconds, while red indicates penalties of 60 seconds or more.
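Here is a small Python sketch of the penalty and the heatmap thresholds just described. The function names are ours, and two details are assumptions rather than anything stated above: penalties under 30 seconds get no shading, and a penalty is never negative (a faster-than-expected trip would reset the expected time instead).

```python
def stretch_penalty(expected_seconds: float, latest_trip_seconds: float) -> float:
    """Penalty for a stretch: how much slower the most recent train was
    than the expected (fastest-ever) trip time. Clamped at zero (assumption)."""
    return max(0.0, latest_trip_seconds - expected_seconds)


def heatmap_color(penalty_seconds: float) -> str:
    """Map a penalty to the heatmap color shown next to the line."""
    if penalty_seconds >= 60:
        return "red"
    if penalty_seconds >= 30:
        return "yellow/orange"
    return "none"  # assumption: small penalties are not shaded
```

For example, with an expected time of 110 seconds, a train that takes 150 seconds leaves a 40-second penalty (yellow/orange); if the next train through takes 115 seconds, the penalty drops to 5 seconds and the shading goes away.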
Yes, we equate any discrepancies with delays; even the smallest amount of deviation can be an indicator of something more severe, whether it be trains holding ahead on the next stretch of track or a minor speed restriction in the immediate area. These deviations can quickly add up, especially when trying to accurately predict the arrival of trains many, many minutes away, which is a challenge both MetroHero and the PIDS face.
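To see how small deviations add up for a train many stations away, here is one way an ETA could be assembled from per-stretch numbers. This is a simplified sketch under our own assumptions, not the exact formula MetroHero uses: the function name and station codes are hypothetical, and it ignores time spent stopped at stations entirely.

```python
from typing import Dict, List, Tuple

Stretch = Tuple[str, str]  # ordered pair of neighboring stations (hypothetical codes)


def estimate_eta_seconds(
    stretches_ahead: List[Stretch],
    expected: Dict[Stretch, float],
    penalties: Dict[Stretch, float],
) -> float:
    """Sum expected trip time plus current penalty over every stretch
    the train still has to cross before reaching the station."""
    return sum(expected[s] + penalties.get(s, 0.0) for s in stretches_ahead)


# Three stretches ahead, each carrying a modest penalty.
expected = {("A", "B"): 100.0, ("B", "C"): 90.0, ("C", "D"): 120.0}
penalties = {("A", "B"): 20.0, ("B", "C"): 45.0, ("C", "D"): 15.0}
route = [("A", "B"), ("B", "C"), ("C", "D")]
eta = estimate_eta_seconds(route, expected, penalties)  # 310s ideal + 80s of penalties
```

No single stretch here would show up as red on the heatmap, yet together they push the estimate out by well over a minute.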
Hold on there, Jethro! Why not just use WMATA’s train predictions and the PIDS? Why make your own system that provides you many of the same things instead?
Frankly, because the PIDS alone doesn’t do everything we’re doing (yet). Sure, Metro’s train predictions APIs are an easily accessible way to get started. After all, almost every app that offers true real-time data uses this train predictions feed from WMATA. (Google, for example, does not; it uses static train schedule data like this, which we think is the furthest thing from real-time.) But the data they offer today doesn’t meet our needs, nor do we believe it meets the needs of riders. There are only three train predictions per platform per station, train predictions can clear out or disappear entirely, and there’s no way to tie a train prediction at one station to another at a different station. All of this makes it very difficult for other app developers to do the types of things MetroHero does: uniquely identify and track trains, show if a given train is slow/holding, show how many delays a train has accumulated during its trip so far, allow users to tag individual trains for good/bad traits, etc. The new train positions API can be used to solve these problems, but only with a lot of hard work; it is not an out-of-the-box solution like the train predictions API is, and while Metro has plans to improve the PIDS and ultimately the train predictions API, it will probably be a while longer until third-party developers can get their hands on this kind of convenience.
Rather than wait for things to get better, we knew that if we spent the time building our own PIDS-like system, we would have more control over the types of things we can do versus relying on WMATA to make timely changes just for us. After all, almost none of this is really Metro’s fault; they’re a large, siloed organization with a lot of stakeholders and red tape, while MetroHero is just two software engineers working in their spare time on something they use as riders every day, so the less we can ask of them, the better it is for everyone. Plus, we can get away with a hell of a lot more and turn feature requests around faster by not being as dependent on our primary data source.
Neat. (I’m clearly interested because I kept reading through walls of text.) So, what now? What are some next steps?
Ultimately, we don’t yet use trip time deviations between stations when calculating train ETAs for individual stations. This is an area we’re exploring carefully rather than jumping the gun and just doing it, as there are still many unanswered questions. For example, if a train is holding for many, many minutes longer than expected, does that mean the trains waiting behind it will as well when they get to the same stretch of track (e.g. track problem, single-tracking), or is this just a one-off delay specific to the train that might be cleared up by the time it starts to move again? This is when data about incidents that might explain why said train is holding could be critical, like if WMATA or folks on Twitter say there’s a mechanical problem vs. single-tracking in the area, as those could influence how long we should expect the train to continue to hold as well as how bad the residual delays might be for any trains behind it.
Anyway, that’s all for now. We’ll be sure to report back with any more progress we make. In the meantime, feel free to contact us directly via email at email@example.com with any comments, questions, suggestions or complaints. We read and respond to them all.