Wednesday, February 26, 2025

Modelling Daily Takings at the Pub



Our pub, the Talbot Taphouse, is not our pub any more - we quit nearly ten years ago. However, we still have the data, so I thought I'd give some modelling ideas a go. What factors might influence how much a pub takes on a daily basis? Day of the week. Weather. Beers on tap and their quality. Ambience and cleanliness. Staff... efficiency? Attitude? The Talbot was wet-led, so food doesn't come into the equation here.

I had a sliver of data for Q3 2015 so day of the week we can do. Using the Weather Underground web site I can get daily weather data for the period, sourced from a weather station at East Midlands Airport: close enough? For some reason rainfall wasn't recorded so humidity will have to do, along with temperature, dew point, wind speed and pressure. 

First we check a few things in the data to see whether multiple linear regression is a permissible modelling approach. This involves questions about distributions (normal?), outliers (none) and variances (similar), each of which has a specific statistical test that I won't go into right now. Suffice to say the data pretty much passed the tests, although I had some misgivings about outliers.
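For anyone curious what such checks can look like, here is a rough sketch. The post doesn't name the tests, so the choices below (Shapiro-Wilk for normality, Levene's test for similar variances across days, Tukey's fences for outliers) are assumptions, picked because they are common. The takings are synthetic stand-ins, not the Talbot figures, and the variable names are made up for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data: 13 weeks of daily takings (NOT the real Talbot data).
rng = np.random.default_rng(3)
day_index = np.arange(91) % 7          # 0=Mon ... 6=Sun (assumed encoding)
takings = rng.normal(loc=800, scale=150, size=91)

# Normality: Shapiro-Wilk. Strictly, the regression assumption concerns the
# model residuals; raw takings are tested here only to keep the sketch short.
shapiro_stat, shapiro_p = stats.shapiro(takings)

# Similar variances: Levene's test across the seven day-of-week groups.
groups = [takings[day_index == d] for d in range(7)]
levene_stat, levene_p = stats.levene(*groups)

# Outliers: Tukey's fences, i.e. points more than 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(takings, [25, 75])
iqr = q3 - q1
outliers = takings[(takings < q1 - 1.5 * iqr) | (takings > q3 + 1.5 * iqr)]

print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"Levene p       = {levene_p:.3f}")
print(f"Points outside Tukey's fences: {len(outliers)}")
```

A large p-value in the first two tests means no evidence against the assumption, which is weaker than proof that it holds, so a judgement call on the outliers is still needed.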

I was going to take a layered approach, adding various data sets to a model to see which ones were effective at describing the takings. First up, I tried a 'weather-only' model, using all weather fields. This fared poorly, with an R-squared value of 0.00585. In other words, the model explains just 0.6% of the variation in daily takings.
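To make the R-squared figure concrete, here is a minimal sketch of fitting a linear model by ordinary least squares and computing R-squared as 1 minus the ratio of residual to total sum of squares. It uses plain NumPy and synthetic numbers, so it is an illustration rather than the code behind the figure above; `r_squared`, `weather` and `takings` are names invented for the example, and the takings are generated independently of the weather on purpose.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return in-sample R-squared."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # least-squares coefficients
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)                # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())          # total variation
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT the real figures). A full Q3 would be 92 days;
# the actual sample size is not stated in the post.
rng = np.random.default_rng(0)
n_days = 92
# Five columns standing in for temperature, dew point, humidity, wind speed, pressure.
weather = rng.normal(size=(n_days, 5))
# Takings drawn independently of the weather, so the fit should be poor.
takings = rng.normal(loc=1000, scale=300, size=n_days)

print(f"weather-only R-squared: {r_squared(weather, takings):.4f}")
```

Even pure noise gives an R-squared slightly above zero in-sample, because each extra predictor soaks up a little variation by chance.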

Add in the day of the week and the R-squared shoots up to 0.7747, or 77.5%! In fact, interrogating this model showed that only Thursday, Friday, Saturday and Sunday were significant in predicting takings. 
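Day of the week is a category rather than a number, so it goes into a regression as a set of indicator (dummy) columns, with one day left out as the baseline. The sketch below shows one way to do that with NumPy. The choice of Monday as baseline, the `day_dummies` and `fit_ols` helpers, and the made-up "typical" takings per day are all assumptions for illustration, not details from the original analysis.

```python
import numpy as np

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def day_dummies(day_index, baseline=0):
    """Indicator columns for every day except the baseline (0=Mon ... 6=Sun)."""
    day_index = np.asarray(day_index)
    kept = [d for d in range(7) if d != baseline]
    return np.column_stack([(day_index == d).astype(float) for d in kept])

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    r2 = 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coef, r2

# Synthetic stand-in data: 13 full weeks with an invented weekly pattern.
rng = np.random.default_rng(1)
day_index = np.arange(91) % 7
typical = np.array([400, 450, 500, 700, 1200, 1500, 900], dtype=float)  # made up
takings = typical[day_index] + rng.normal(scale=150, size=91)

coef, r2 = fit_ols(day_dummies(day_index), takings)
print(f"R-squared: {r2:.3f}")
print(f"{DAYS[0]} (baseline) mean: {coef[0]:.0f}")
for d in range(1, 7):
    print(f"{DAYS[d]} vs {DAYS[0]}: {coef[d]:+.0f}")
```

With this coding the intercept is the baseline day's average and each other coefficient is that day's difference from the baseline. Which days show up as significant can therefore depend partly on which day is the baseline.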

Next I removed the weather data and just used day of the week. Here, the R-squared value held up at 0.7517, and all the days except Friday were significant predictors of takings. Going back to the data, I was intrigued to see how well individual points might be predicted, so I set aside 20% of the points as a test set and retrained the model on the remaining 80%. R-squared was 0.7548, so comparable with previous iterations. Running the held-out, previously unseen data through the model, I got the values it predicted. These are shown in the graph, above. Blue crosses are the training (actual) values and red circles are those predicted by the model. Looks pretty good, eh?
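The hold-out step described above can be sketched roughly as follows: shuffle the days, set 20% aside, fit on the rest, then predict the held-out days. This is an illustration on synthetic data with a Monday baseline assumed; `design_matrix`, the seed and the invented weekly pattern are not from the original analysis.

```python
import numpy as np

def design_matrix(day_index):
    """Intercept plus six day-of-week indicators (Monday assumed as baseline)."""
    day_index = np.asarray(day_index)
    dummies = [(day_index == d).astype(float) for d in range(1, 7)]
    return np.column_stack([np.ones(len(day_index))] + dummies)

# Synthetic stand-in data: 13 weeks with an invented weekly pattern.
rng = np.random.default_rng(2)
n = 91
day_index = np.arange(n) % 7
typical = np.array([400, 450, 500, 700, 1200, 1500, 900], dtype=float)  # made up
takings = typical[day_index] + rng.normal(scale=150, size=n)

# Shuffle the row order, then hold out 20% of the days as a test set.
order = rng.permutation(n)
n_test = int(round(0.2 * n))
test_idx, train_idx = order[:n_test], order[n_test:]

# Fit on the training 80% only.
coef, *_ = np.linalg.lstsq(
    design_matrix(day_index[train_idx]), takings[train_idx], rcond=None
)

# Predict the held-out days and score them against the actual values.
predicted = design_matrix(day_index[test_idx]) @ coef
actual = takings[test_idx]
test_r2 = 1.0 - ((actual - predicted) ** 2).sum() / ((actual - actual.mean()) ** 2).sum()

print(f"held-out days: {n_test}, out-of-sample R-squared: {test_r2:.3f}")
```

The R-squared quoted above reads as the fit on the training 80%. Computing the same statistic on the held-out points, as in the last step here, gives a number to go alongside the visual check in the graph.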

As for the other data, well, back in 2015 we didn't really do Untappd (for beer scores) or TripAdvisor (for ambience and staff), so I'll have to leave things there for this pub. I'm working on getting data for other pubs, so watch this space...

Evaluating Embeddings for NLP and Document Clustering

Embeddings are an alternative to traditional methods of vectorisation first proposed by Bengio et al in 2003, who developed the first langua...