Leveraging Machine Learning and Diverse Data Sources for Accurate Footfall Classification in Small Hospitality Venues

Project Overview

This repository contains the code and resources for the dissertation titled "Leveraging Machine Learning and Diverse Data Sources for Accurate Footfall Classification in Small Hospitality Venues".

The project aims to address the challenge faced by small hospitality businesses in accurately predicting customer footfall. It explores the use of machine learning (ML) models combined with diverse data sources, including urban sensor counts (York, Leeds), coffee shop transaction data (Nottingham), historical weather records, and event schedules to classify daily footfall into categories ('Low' to 'High').

Key methods involve data preprocessing, feature engineering (using K-Means clustering to define footfall categories), Principal Component Analysis (PCA), and training/evaluating four ML classifiers (Logistic Regression, Random Forest, XGBoost, Neural Network) across different scenarios (city-specific, cross-city, combined multi-city).
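The category-definition step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the dissertation's actual pipeline: the feature columns, cluster count, and model parameters here are assumptions made for the example.

```python
# Sketch: K-Means on daily footfall counts defines the class labels,
# which a classifier then learns to predict from contextual features.
# Synthetic data stands in for the real sensor/transaction tables.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
daily_footfall = rng.gamma(shape=2.0, scale=500.0, size=365)  # synthetic counts

# 1. Derive categories ('Low' ... 'High') by clustering the 1-D counts.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
raw_labels = kmeans.fit_predict(daily_footfall.reshape(-1, 1))
# K-Means cluster ids are arbitrary, so re-rank them by cluster centre
# so that 0 = lowest-footfall category and 3 = highest.
order = np.argsort(kmeans.cluster_centers_.ravel())
categories = np.argsort(order)[raw_labels]

# 2. Train a classifier on illustrative features (day of week, temperature,
# event flag). These features are random here, so the accuracy printed
# below is not meaningful -- the point is only the pipeline shape.
X = np.column_stack([
    np.arange(365) % 7,          # day of week
    rng.normal(12.0, 6.0, 365),  # temperature
    rng.integers(0, 2, 365),     # event on/off
])
X_train, X_test, y_train, y_test = train_test_split(
    X, categories, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print(f"accuracy: {clf.score(X_test, y_test):.2f}")
```

The re-ranking step matters because K-Means assigns cluster ids in an arbitrary order; sorting clusters by their centres turns them into an ordered 'Low' to 'High' scale.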

The primary finding is that while direct model transfer between cities is challenging, a unified model trained on combined multi-city data achieves strong general performance (up to 81% accuracy with Random Forest) and significantly improves predictions for the specific coffee shop case study.

Prerequisites

  1. Python Environment
    Make sure you have Python 3.7+ installed. I recommend setting up and activating a virtual environment (venv or conda) before installing the necessary dependencies.

  2. Dependencies
    Install the necessary libraries:

    pip install pandas numpy scikit-learn holidays matplotlib scipy xgboost seaborn

    (os, json, and sqlite3 are part of the Python standard library and do not need to be installed.)
  3. Database
    Ensure that the database file (footfall_weather.db) is placed in the database folder.
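A quick sanity check that the database is in place and readable. The path below assumes you run it from the repository root:

```python
import os
import sqlite3

# Expected location of the database file (assumption: run from the repo root).
DB_PATH = "database/footfall_weather.db"

db_present = os.path.exists(DB_PATH)
if db_present:
    # List the tables so you can confirm the file is a valid SQLite database.
    with sqlite3.connect(DB_PATH) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
    print(f"Found {len(tables)} table(s): {tables}")
else:
    print(f"{DB_PATH} not found -- place footfall_weather.db in the database folder first.")
```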

Project Structure

A simplified look at the relevant directories and files:

.
├── database
│   └── footfall_weather.db
├── scripts
│   ├── PreProcessing
│   │   ├── Leeds
│   │   │   └── Leeds_data_preprocessing.py
│   │   ├── Nottingham
│   │   │   └── Notts_data_preprocessing.py
│   │   └── York
│   │       └── York_data_preprocessing.py
│   └── models
│       └── scenarios
│           └── model_1.py
├── .gitignore
├── README.md
└── ...

Running the Project

  1. Preprocessing Scripts

    Run the following scripts before running the main model script. These preprocessing scripts will connect to the SQLite database (footfall_weather.db) and generate three CSV files (one for each city).

    • Leeds_data_preprocessing.py
    • Notts_data_preprocessing.py
    • York_data_preprocessing.py

    After running these scripts, you should see three new CSV files (leeds_classification_data.csv, nottingham_classification_data.csv, and york_classification_data.csv) in the project root.

  2. Model Script

    With all three CSVs generated, run the main model script (scripts/models/scenarios/model_1.py).

    This script will process the combined data from the three CSV files and output a JSON file with all the results (for example, model_comparison_results.json).

  3. Reviewing the Output

    • CSV Files: Verify that the CSV files are properly generated in the root directory.
    • JSON Output: Check that the JSON file with results has been created. This file will contain the details of the model's performance and comparisons.
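The results JSON can be inspected with a few lines of Python. The key names and structure below are hypothetical (the real ones come from the model script); the accuracy figure for Random Forest is taken from the reported results for illustration:

```python
import json

# Hypothetical shape of model_comparison_results.json -- the real key
# names may differ. Written here so the snippet is self-contained.
example = {
    "random_forest": {"accuracy": 0.81, "f1_macro": 0.79},
    "xgboost": {"accuracy": 0.78, "f1_macro": 0.76},
}
with open("model_comparison_results.json", "w") as f:
    json.dump(example, f, indent=2)

# Load the results back and report the best-performing model.
with open("model_comparison_results.json") as f:
    results = json.load(f)
best = max(results, key=lambda m: results[m]["accuracy"])
print(best, results[best]["accuracy"])
```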

Here is an overview of the project's pipeline:

(Pipeline diagram)
