Leveraging Machine Learning and Diverse Data Sources for Accurate Footfall Classification in Small Hospitality Venues
This repository contains the code and resources for the dissertation titled "Leveraging Machine Learning and Diverse Data Sources for Accurate Footfall Classification in Small Hospitality Venues".
The project aims to address the challenge faced by small hospitality businesses in accurately predicting customer footfall. It explores the use of machine learning (ML) models combined with diverse data sources, including urban sensor counts (York, Leeds), coffee shop transaction data (Nottingham), historical weather records, and event schedules to classify daily footfall into categories ('Low' to 'High').
Key methods involve data preprocessing, feature engineering (using K-Means clustering to define footfall categories), Principal Component Analysis (PCA), and training/evaluating four ML classifiers (Logistic Regression, Random Forest, XGBoost, Neural Network) across different scenarios (city-specific, cross-city, combined multi-city).
The primary finding is that while direct model transfer between cities is challenging, a unified model trained on combined multi-city data achieves strong general performance (up to 81% accuracy with Random Forest) and significantly improves predictions for the specific coffee shop case study.
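As a rough illustration of the category-definition step, daily footfall counts can be clustered with K-Means and the clusters relabelled by the order of their centres. This is only a sketch with synthetic counts; the real feature columns, number of categories, and labels come from the dissertation's own preprocessing:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic daily footfall counts (a stand-in for the real sensor/transaction data)
rng = np.random.default_rng(42)
counts = np.concatenate([
    rng.normal(200, 30, 100),   # quiet days
    rng.normal(800, 60, 100),   # moderate days
    rng.normal(1500, 90, 100),  # busy days
]).reshape(-1, 1)

# Cluster into 3 groups, then sort clusters by centre so labels are ordered
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(counts)
order = np.argsort(km.cluster_centers_.ravel())
names = {cluster: label for cluster, label in zip(order, ["Low", "Medium", "High"])}
categories = [names[c] for c in km.labels_]

print(categories[:3])  # the first, quiet days fall into the 'Low' category
```

Ordering clusters by their centres is what makes the labels meaningful: raw K-Means cluster indices are arbitrary, so 'Low' must be tied to the smallest centre explicitly.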
Python Environment
Make sure you have Python 3.7+ installed. I recommend setting up and activating a virtual environment (venv or conda) before installing the necessary dependencies.
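For example, with the built-in venv module (the directory name `env` is just a placeholder):

```shell
# Create and activate a virtual environment (Linux/macOS)
python3 -m venv env
source env/bin/activate

# On Windows, activate with:
# env\Scripts\activate
```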
Dependencies
Install the necessary libraries (os, json, and sqlite3 are part of the Python standard library and do not need to be installed separately):
pip install pandas numpy scikit-learn holidays matplotlib scipy xgboost seaborn
Database
Ensure that the database file (footfall_weather.db) is placed in the database folder.
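A quick way to confirm the database is in place and readable is to list its tables with the standard-library sqlite3 module. The table names printed will be whatever the project's schema defines; this snippet only checks that the file exists and can be queried:

```python
import sqlite3
from pathlib import Path

db_path = Path("database") / "footfall_weather.db"

if db_path.exists():
    with sqlite3.connect(db_path) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
    print("Tables found:", tables)
else:
    print(f"Database not found at {db_path} - place footfall_weather.db there first.")
```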
A simplified look at the relevant directories and files:
.
├── database
│ └── footfall_weather.db
├── scripts
│ ├── PreProcessing
│ │ ├── Leeds
│ │ │ └── Leeds_data_preprocessing.py
│ │ ├── Nottingham
│ │ │ └── Notts_data_preprocessing.py
│ │ ├── York
│ │ │ └── York_data_preprocessing.py
│ └── models
│ └── scenarios
│ └── model_1.py
├── .gitignore
├── README.md
└── ...
Preprocessing Scripts
Run the following scripts before running the main model script. These preprocessing scripts connect to the SQLite database (footfall_weather.db) and generate one CSV file per city:
- Leeds_data_preprocessing.py
- Notts_data_preprocessing.py
- York_data_preprocessing.py
After running these scripts, you should see three new CSV files (leeds_classification_data.csv, nottingham_classification_data.csv, and york_classification_data.csv) in the repository root.
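A short sanity check that all three per-city CSVs were produced and load cleanly (file names as listed above; run it from the repository root):

```python
import pandas as pd
from pathlib import Path

expected = [
    "leeds_classification_data.csv",
    "nottingham_classification_data.csv",
    "york_classification_data.csv",
]

for name in expected:
    path = Path(name)
    if path.exists():
        df = pd.read_csv(path)
        print(f"{name}: {len(df)} rows, {len(df.columns)} columns")
    else:
        print(f"Missing: {name} - re-run the matching preprocessing script.")
```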
Model Script
With all three CSVs generated, run the main model script (scripts/models/scenarios/model_1.py)!
This script will process the combined data from the three CSV files and output a JSON file with all the results (for example, model_comparison_results.json).
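Once the run finishes, the results file can be inspected like this. Note that the key layout shown in the comment is an assumption for illustration; adjust it to whatever model_comparison_results.json actually contains:

```python
import json
from pathlib import Path

results_path = Path("model_comparison_results.json")

if results_path.exists():
    results = json.loads(results_path.read_text())
    # Hypothetical layout: {"Random Forest": {"accuracy": 0.81}, ...}
    for model, metrics in results.items():
        print(model, metrics)
else:
    print(f"No results yet at {results_path} - run the model script first.")
```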
Reviewing the Output
- CSV Files: Verify that the three CSV files were properly generated in the repository root.
- JSON Output: Check that the JSON results file has been created. It contains the details of each model's performance and the cross-model comparisons.
