Machine learning is an extremely important topic in computer science. We’ve come to the point where there are some problems that are impractical to solve with hand-written algorithms and code, and machine learning is often the answer.
I haven’t had a ton of experience with machine learning beyond Andrew Ng’s amazing Machine Learning course, and I recently set out to change that.
Walt Disney World
Walt Disney World parks attract tens of millions of visitors per year. In fact, Magic Kingdom, the flagship park, hosts about 20 million visitors per year by itself. Given the fixed number of attractions, each with a ride capacity in the thousands of visitors per hour, this can translate to extremely long wait times during peak seasons.
In general, you can predict what sorts of wait times to expect based on the day of the week and the season. September, for example, tends to have fairly low attendance due to kids going back to school, while Thanksgiving is one of the most popular days of the year and often brings hours-long wait times for some attractions.
Using this information, you can make a general decision on when to take your vacation. Most people don’t visit Walt Disney World and stay on a strict schedule. After all, it’s supposed to be a vacation, not a drill.
But there are a dedicated few who are committed to visiting as many attractions as possible, and for them a general sense of how crowded the park is isn’t enough.
Predicting Wait Times with Neural Networks
With my mission to learn how to use machine learning techniques in a real setting, I decided that predicting theme park wait times would be a good start. The data is available, constantly flowing, and generally follows a pattern.
I had a couple of choices: use a conventional regression model, which was the obvious choice for a job like this (continuous inputs used to generate a continuous output), or use a neural network as the regression model. Ultimately, for a variety of reasons, I opted to use neural networks.
Many people may believe neural networks are overkill for a simple regression problem, but my experiments with several machine learning libraries suggested that a neural network would be the easiest solution to implement and get results from, given that I was using Node.js.
With this starting point, Park Genius was born.
Before we can glean any insights from the raw wait time data, we need to train the neural network model. Every couple of minutes, the official posted wait times are sourced from Disney directly. These numbers are artificially inflated, but they give a general sense of how long the wait will be for an attraction. User-submitted wait times are also supported and weighted more heavily when training the model, but with a user base of exactly 1¹, there wasn’t a lot of user-submitted data to use.
Fortunately, the data is both plentiful and fairly high quality. There’s no real bogus data that needs to be removed, and some of the noise, where a wait time will go from 60 to 90 and back to 60 minutes in the course of a couple of minutes, is smoothed by the prediction model.
Neural networks are great at finding patterns in data by themselves. However, wait time trends actually have more nuanced patterns than simply fluctuating over the course of the day. In fact, there are many “cycles” that affect wait times at theme parks:
- Time of the day. The simple one.
- Day of the year. Holidays are extremely busy compared to a normal day in the same month.
- Day of the week. Weekdays are generally calmer than weekends.
- Month of the year. During peak seasons, the crowds can be much larger than during the off season.
Additionally, there are special events (such as Disney’s Food and Wine Festival at Epcot’s World Showcase) that draw a significant number of visitors, but don’t necessarily occur on the same set of days every year.
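The cycles listed above have to reach a network as numbers somehow. This post doesn’t cover how Park Genius encodes them, so the following is only a sketch of one common approach: each cycle becomes a sine/cosine pair, so that 23:59 and 00:00 end up close together instead of at opposite ends of a scale. The function names (`cyclePair`, `dayOfYear`, `cycleFeatures`) are hypothetical, and UTC is used only to keep the example deterministic; real code would want park-local time.

```javascript
// Hypothetical helper (not from Park Genius): map a position within a
// repeating cycle onto the unit circle so the two ends of the cycle meet.
function cyclePair(value, period) {
  const angle = (2 * Math.PI * value) / period;
  return [Math.sin(angle), Math.cos(angle)];
}

// Zero-based day of the year (0 = January 1st), computed in UTC.
function dayOfYear(date) {
  const year = date.getUTCFullYear();
  const startOfYear = Date.UTC(year, 0, 1);
  const startOfDay = Date.UTC(year, date.getUTCMonth(), date.getUTCDate());
  return Math.round((startOfDay - startOfYear) / 86400000);
}

// One sin/cos pair per cycle from the list above.
function cycleFeatures(date) {
  const minuteOfDay = date.getUTCHours() * 60 + date.getUTCMinutes();
  return {
    timeOfDay: cyclePair(minuteOfDay, 24 * 60),
    dayOfWeek: cyclePair(date.getUTCDay(), 7),
    // 366 keeps leap years in range; other years leave a one-day gap.
    dayOfYear: cyclePair(dayOfYear(date), 366),
    monthOfYear: cyclePair(date.getUTCMonth(), 12),
  };
}

// Usage: two timestamps one minute apart, on either side of midnight.
const lateNight = cycleFeatures(new Date(Date.UTC(2016, 1, 24, 23, 59)));
const justAfterMidnight = cycleFeatures(new Date(Date.UTC(2016, 1, 25, 0, 0)));
console.log(lateNight.timeOfDay, justAfterMidnight.timeOfDay);
```

Special events don’t fit this scheme because they move around the calendar; they would need something like an explicit flag per date, which is left out here.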
Because of these overlapping cycles, a single neural network didn’t suffice. In my first experiments, a single network produced somewhat reasonable wait time predictions on average, but could be wildly off between weekdays and weekends.
The breadth of these “cycles” meant that the small amount of data I had collected was going to be an issue. Though Park Genius operates on about 65k data points across all of the attractions in Magic Kingdom (as of Feb. 24th, 2016), this is actually not enough. As previously mentioned, there are trends that go beyond the wait time fluctuations in a single day. This means that, at the moment, if you were to look at the predictions for Thanksgiving 2016 or some other holiday, Park Genius would likely underestimate the wait times. Unless a historical data archive can be used for training, the prediction model will not be accurate for special days, like holidays, until after they’ve already occurred in 2016.
I chose to build a system of neural networks trained around the factors above. This means that each attraction has multiple networks associated with it, which are then consulted at prediction time and combined in a weighted average.
For the current day, the model is retrained and the predictions are regenerated every hour. This means that as the current day goes on, the predictions for that day are updated and become more accurate. Watching this process is actually quite fascinating because you can see the prediction lines change over the course of the day.
Prediction data is available on the site as a simple line graph that shows both the official wait times and the predictions that Park Genius has come up with.
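As a rough illustration of that combining step, here is a minimal sketch of a weighted average over several per-network predictions. The network labels, the minute values, and the weights are all made up for the example; the actual weights Park Genius uses aren’t given in this post.

```javascript
// Hypothetical sketch: combine several networks' predictions (in minutes)
// into one number using a weighted average.
function combinePredictions(predictions) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { minutes, weight } of predictions) {
    weightedSum += minutes * weight;
    totalWeight += weight;
  }
  if (totalWeight === 0) {
    throw new Error("at least one prediction needs a non-zero weight");
  }
  return weightedSum / totalWeight;
}

// Illustrative values only: three networks for one attraction at one time.
const combined = combinePredictions([
  { network: "timeOfDay", minutes: 40, weight: 1 },
  { network: "dayOfWeek", minutes: 55, weight: 2 },
  { network: "monthOfYear", minutes: 30, weight: 1 },
]);
console.log(combined); // (40*1 + 55*2 + 30*1) / 4 = 45
```

Dividing by the total weight means the weights don’t have to sum to 1, which makes it easier to nudge a single network up or down while tuning.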
Using the Data
Wait time predictions are fascinating, but by themselves not extremely useful. After all, a visitor to Walt Disney World will probably not be able to guess exactly when they will visit which attraction, especially if it’s their first trip.
However, this data can be used in some interesting ways, such as to build a touring plan for the theme parks. Park Genius actually tracks several different attributes of the theme parks and their attractions: wait times and predictions, the length and intensity of the attraction, and the physical location of the attraction, for example.
Using this data together, I can build a touring plan for someone who wants to visit Magic Kingdom, prefers roller coasters, but wishes to skip kid-focused attractions like “Stitch’s Great Escape!”.
The way to solve for this “optimal”, yet customized, touring plan is essentially a dynamically changing travelling salesman problem: the weights change over the course of the day, but you still want to try to find the quickest route (in terms of travel time and waiting for attractions) between every attraction.
A preliminary version of the planning software already generates plans that look somewhat reasonable, and it should only get better with some extra tuning.
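To make the “weights change over the course of the day” part concrete, here is a small sketch of a greedy heuristic: at each step, go to whichever remaining attraction costs the least in walking plus predicted waiting at the moment you would arrive. This is not the Park Genius planner and it doesn’t guarantee an optimal tour; it only shows how a time-dependent wait changes the decision. The attraction names, the walking times, and the `predictWait` lookup are all invented for the example.

```javascript
// Hypothetical greedy planner (a heuristic, not an optimal solver).
// walkMinutes(from, to) and predictWait(name, minuteOfDay) are supplied
// by the caller; ride duration is ignored to keep the sketch short.
function planTour(attractions, start, startMinute, walkMinutes, predictWait) {
  const remaining = new Set(attractions);
  const plan = [];
  let here = start;
  let clock = startMinute;

  while (remaining.size > 0) {
    let best = null;
    for (const name of remaining) {
      const walk = walkMinutes(here, name);
      const wait = predictWait(name, clock + walk); // wait at arrival time
      const cost = walk + wait;
      if (best === null || cost < best.cost) {
        best = { name, arrive: clock + walk, wait, cost };
      }
    }
    plan.push({ name: best.name, arrive: best.arrive, wait: best.wait });
    clock = best.arrive + best.wait;
    here = best.name;
    remaining.delete(best.name);
  }
  return { plan, finish: clock };
}

// Made-up inputs: every walk takes 5 minutes, and attraction "A" has a
// long wait until minute 590 of the day (9:50am) and a short one after.
const walk = () => 5;
const waits = (name, minute) => {
  if (name === "A") return minute < 590 ? 60 : 20;
  return name === "B" ? 10 : 30; // "B" and "C" stay constant all day
};
const tour = planTour(["A", "B", "C"], "Entrance", 540, walk, waits);
console.log(tour.plan.map((stop) => stop.name).join(" -> "), tour.finish);
```

With these inputs the planner leaves “A” for last, by which time its wait has dropped. A real planner would also have to handle ride length, personal preferences, and the fact that a greedy choice early in the day can force a worse one later.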
Though the wait time prediction aspect of Park Genius was fun to implement and a great way for me to use neural networks in a practical application, I’m even more excited to actually use the data in interesting ways. Though the planner is close to completion, I have several new ideas to build on the data set I’ve collected and can’t wait to bring them to fruition.
Be sure to check out Park Genius and the predictions that it generates. If you’re planning on visiting Magic Kingdom anytime soon, it might even be helpful for you.
1. That one user, me, doesn’t even live anywhere near a Disney park.