Leveraging Machine Learning and Diverse Data Sources for Accurate Footfall Classification in Small Hospitality Venues
This repository contains the code and resources for the dissertation titled "Leveraging Machine Learning and Diverse Data Sources for Accurate Footfall Classification in Small Hospitality Venues".
The project aims to address the challenge faced by small hospitality businesses in accurately predicting customer footfall. It explores the use of machine learning (ML) models combined with diverse data sources, including urban sensor counts (York, Leeds), coffee shop transaction data (Nottingham), historical weather records, and event schedules to classify daily footfall into categories ('Low' to 'High').
Key methods involve data preprocessing, feature engineering (using K-Means clustering to define footfall categories), Principal Component Analysis (PCA), and training/evaluating four ML classifiers (Logistic Regression, Random Forest, XGBoost, Neural Network) across different scenarios (city-specific, cross-city, combined multi-city).
The primary finding is that while direct model transfer between cities is challenging, a unified model trained on combined multi-city data achieves strong general performance (up to 81% accuracy with Random Forest) and significantly improves predictions for the specific coffee shop case study.
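As a rough illustration of the category-definition step, daily footfall counts can be clustered with K-Means and the clusters relabelled by the order of their centres. This is only a sketch with synthetic counts; the real feature columns, number of categories, and labels come from the dissertation's own preprocessing:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic daily footfall counts (a stand-in for the real sensor/transaction data)
rng = np.random.default_rng(42)
counts = np.concatenate([
    rng.normal(200, 30, 100),   # quiet days
    rng.normal(800, 60, 100),   # moderate days
    rng.normal(1500, 90, 100),  # busy days
]).reshape(-1, 1)

# Cluster into 3 groups, then sort clusters by centre so labels are ordered
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(counts)
order = np.argsort(km.cluster_centers_.ravel())
names = {cluster: label for cluster, label in zip(order, ["Low", "Medium", "High"])}
categories = [names[c] for c in km.labels_]

print(categories[:3])  # the first, quiet days fall into the 'Low' category
```

Ordering clusters by their centres is what makes the labels meaningful: raw K-Means cluster indices are arbitrary, so 'Low' must be tied to the smallest centre explicitly.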
Python Environment
Make sure you have Python 3.7+ installed. I recommend setting up and activating a virtual environment (venv or conda) before installing the necessary dependencies.
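For example, with the built-in venv module (the directory name `env` is just a placeholder):

```shell
# Create and activate a virtual environment (Linux/macOS)
python3 -m venv env
source env/bin/activate

# On Windows, activate with:
# env\Scripts\activate
```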
Dependencies
Install the necessary libraries (os, json, and sqlite3 are part of the Python standard library and do not need to be installed separately):
pip install pandas numpy scikit-learn holidays matplotlib scipy xgboost seaborn
Database
Ensure that the database file (footfall_weather.db) is placed in the database folder.
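A quick way to confirm the database is in place and readable is to list its tables with the standard-library sqlite3 module. The table names printed will be whatever the project's schema defines; this snippet only checks that the file exists and can be queried:

```python
import sqlite3
from pathlib import Path

db_path = Path("database") / "footfall_weather.db"

if db_path.exists():
    with sqlite3.connect(db_path) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
    print("Tables found:", tables)
else:
    print(f"Database not found at {db_path} - place footfall_weather.db there first.")
```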
A simplified look at the relevant directories and files:
.
├── database
│ └── footfall_weather.db
├── scripts
│ ├── PreProcessing
│ │ ├── Leeds
│ │ │ └── Leeds_data_preprocessing.py
│ │ ├── Nottingham
│ │ │ └── Notts_data_preprocessing.py
│ │ ├── York
│ │ │ └── York_data_preprocessing.py
│ └── models
│ └── scenarios
│ └── model_1.py
├── .gitignore
├── README.md
└── ...
Preprocessing Scripts
Run the following scripts before running the main model script. These preprocessing scripts connect to the SQLite database (footfall_weather.db) and generate one CSV file per city:
- Leeds_data_preprocessing.py
- Notts_data_preprocessing.py
- York_data_preprocessing.py
After running these scripts, you should see three new CSV files (leeds_classification_data.csv, nottingham_classification_data.csv, and york_classification_data.csv) in the repository root.
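A short sanity check that all three per-city CSVs were produced and load cleanly (file names as listed above; run it from the repository root):

```python
import pandas as pd
from pathlib import Path

expected = [
    "leeds_classification_data.csv",
    "nottingham_classification_data.csv",
    "york_classification_data.csv",
]

for name in expected:
    path = Path(name)
    if path.exists():
        df = pd.read_csv(path)
        print(f"{name}: {len(df)} rows, {len(df.columns)} columns")
    else:
        print(f"Missing: {name} - re-run the matching preprocessing script.")
```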
Model Script
With all three CSVs generated, run the main model script (scripts/models/scenarios/model_1.py)!
This script will process the combined data from the three CSV files and output a JSON file with all the results (for example, model_comparison_results.json).
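Once the run finishes, the results file can be inspected like this. Note that the key layout shown in the comment is an assumption for illustration; adjust it to whatever model_comparison_results.json actually contains:

```python
import json
from pathlib import Path

results_path = Path("model_comparison_results.json")

if results_path.exists():
    results = json.loads(results_path.read_text())
    # Hypothetical layout: {"Random Forest": {"accuracy": 0.81}, ...}
    for model, metrics in results.items():
        print(model, metrics)
else:
    print(f"No results yet at {results_path} - run the model script first.")
```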
Reviewing the Output
- CSV Files: Verify that the three CSV files were properly generated in the repository root.
- JSON Output: Check that the JSON results file has been created. It contains the details of each model's performance and the cross-model comparisons.
