<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>My Own Room</title>
  <link href="/feed.xml" rel="self" />
  <link href="" />
  <updated>2026-03-26T18:30:18+00:00</updated>
  <author>
    <name>Neirth</name>
  </author>
  
    
  
    
    <entry>
      <title>NeuralAnalytics: Real-Time Brain Signal Analysis for my Bachelor&apos;s Thesis</title>
      <link href="/2025/12/01/neural-analytics-brain-computer-interface-thesis.html" />
      <id>/2025/12/01/neural-analytics-brain-computer-interface-thesis.html</id>
      <updated>2025-12-01T00:00:00+00:00</updated>
      <content type="html">&lt;p&gt;After months of hard work, I’m thrilled to finally share the culmination of my Bachelor’s Thesis at the &lt;em&gt;Universitat Politècnica de València&lt;/em&gt;. What started as a curious exploration into the intersection of neuroscience and software engineering has evolved into a fully functional brain-computer interface system capable of analyzing EEG signals in real-time using deep learning techniques.&lt;/p&gt;

&lt;p&gt;This project, which I’ve named &lt;strong&gt;NeuralAnalytics&lt;/strong&gt;, represents not just the end of my academic journey, but also the beginning of something that I believe has genuine potential to improve lives. The core idea is straightforward but ambitious: capture brain signals, process them in real-time, and translate specific neural patterns into actionable commands—like turning on a light bulb using only the power of thought.&lt;/p&gt;

&lt;h3 id=&quot;the-vision-behind-the-project&quot;&gt;The Vision Behind the Project&lt;/h3&gt;

&lt;p&gt;When I first started thinking about what my Bachelor’s Thesis should be, I knew I wanted something challenging—something that would push me beyond the comfortable boundaries of conventional software development. My passion for informatics began in adolescence, but I wanted to apply it to a domain that could make a tangible difference in people’s lives. The idea of creating a system that could interpret human brain activity felt like the perfect intersection of my interests in embedded systems, deep learning, and real-time computing.&lt;/p&gt;

&lt;p&gt;What particularly motivated me was the potential to help individuals with motor disabilities—when I read about Stephen Hawking’s communication challenges, I realized this technology could genuinely improve quality of life for people who struggle with traditional interfaces. Additionally, I was driven by the desire to contribute to the scientific community. Rather than pursuing commercial gain, I chose to publish all code and models openly, hoping other researchers could build upon this work.&lt;/p&gt;

&lt;p&gt;The project represents not just an academic requirement, but a meaningful step toward making brain-computer interface technology more accessible and practical for everyday use.&lt;/p&gt;

&lt;p&gt;The project follows the regulatory framework of &lt;strong&gt;UNE-EN 62304&lt;/strong&gt; for medical device software, which added an extra layer of complexity but also gave me invaluable experience in developing software for critical applications. This wasn’t just about writing code that worked; it was about writing code that could be trusted.&lt;/p&gt;

&lt;h3 id=&quot;understanding-the-technical-challenge&quot;&gt;Understanding the Technical Challenge&lt;/h3&gt;

&lt;p&gt;The fundamental challenge of analyzing EEG signals in real-time lies in the nature of the data itself. Electroencephalographic signals are notoriously noisy, with artifacts from eye movements, muscle contractions, and electrical interference constantly threatening to overwhelm the actual neural patterns we’re trying to detect. Additionally, the brain regions we’re interested in—specifically the occipital and temporal lobes—produce signals that vary significantly between individuals and even between sessions for the same person.&lt;/p&gt;

&lt;p&gt;To tackle this, I designed a system architecture that separates concerns cleanly. The signal acquisition layer uses the &lt;strong&gt;BrainBit&lt;/strong&gt; device, a consumer-grade EEG headband that captures data from four channels (T3, T4, O1, O2). This data flows through a preprocessing pipeline that normalizes and segments the signals before feeding them to the deep learning model for classification.&lt;/p&gt;

&lt;p&gt;The classification task itself is framed around three states:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;RED&lt;/strong&gt;: A specific mental task indicating one command&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;GREEN&lt;/strong&gt;: A different mental task indicating another command&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;TRASH&lt;/strong&gt;: Everything else—noise, artifacts, or ambiguous signals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This three-class approach allows the system to be conservative, defaulting to “TRASH” when the model isn’t confident, which is crucial for a real-time control system where false positives could be problematic.&lt;/p&gt;
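&lt;p&gt;As a minimal sketch of what such a conservative decision rule could look like: the class names match the ones above, but the 0.8 threshold and the exact gating logic here are illustrative assumptions, not the values used in NeuralAnalytics.&lt;/p&gt;

```python
# Hypothetical confidence-gated decision rule. The class names match the
# post, but the 0.8 threshold is an assumed value for illustration only.
CLASSES = ["RED", "GREEN", "TRASH"]
CONFIDENCE_THRESHOLD = 0.8  # assumed, not the thesis value

def decide(probabilities):
    """Map softmax probabilities to a command, defaulting to TRASH."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if CLASSES[best] != "TRASH" and probabilities[best] >= CONFIDENCE_THRESHOLD:
        return CLASSES[best]
    return "TRASH"  # not confident enough: emit no command

print(decide([0.4, 0.35, 0.25]))  # → TRASH: top class is below threshold
```

&lt;p&gt;The point of the rule is that an uncertain prediction costs nothing, while a confident misclassification flips a light (or, in a more serious deployment, moves a wheelchair).&lt;/p&gt;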

&lt;h3 id=&quot;deep-learning-architecture-cnn-lstm-hybrid&quot;&gt;Deep Learning Architecture: CNN-LSTM Hybrid&lt;/h3&gt;

&lt;p&gt;After extensive experimentation with different model architectures, I settled on a hybrid approach combining &lt;strong&gt;Convolutional Neural Networks (CNN)&lt;/strong&gt; for spatial feature extraction with &lt;strong&gt;Long Short-Term Memory (LSTM)&lt;/strong&gt; networks for temporal pattern recognition.&lt;/p&gt;

&lt;p&gt;The rationale behind this design is grounded in the nature of EEG data. The CNN layers excel at extracting local patterns—frequency components, amplitude variations, and cross-channel relationships—while the LSTM layers capture the temporal dynamics that distinguish one mental state from another.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;NeuralAnalyticsModel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NeuralAnalyticsModel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# CNN Feature Extractor
&lt;/span&gt;        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv1d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bn1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BatchNorm1d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pool1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MaxPool1d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Conv1d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bn2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BatchNorm1d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pool2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MaxPool1d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kernel_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stride&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        
        &lt;span class=&quot;c1&quot;&gt;# LSTM Temporal Encoder
&lt;/span&gt;        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lstm&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LSTM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hidden_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_layers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                            &lt;span class=&quot;n&quot;&gt;batch_first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bidirectional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        
        &lt;span class=&quot;c1&quot;&gt;# Classifier
&lt;/span&gt;        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;classifier&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Sequential&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReLU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Softmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
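&lt;p&gt;The snippet above omits the &lt;code&gt;forward&lt;/code&gt; method, but the tensor shapes can be checked by hand from the layer parameters shown: the same-padding convolutions preserve the temporal length and each stride-2 max pool halves it, so a 62-sample window reaches the LSTM as 15 timesteps of 32 features, and the bidirectional hidden size of 32 yields the 64 inputs expected by the first &lt;code&gt;Linear&lt;/code&gt; layer. The sanity check below is pure arithmetic, no framework needed. One detail worth flagging in passing: if training uses &lt;code&gt;nn.CrossEntropyLoss&lt;/code&gt; (I cannot tell from the excerpt), the final &lt;code&gt;Softmax&lt;/code&gt; is normally omitted during training, since that loss applies log-softmax internally.&lt;/p&gt;

```python
# Sanity check of the temporal dimension through the CNN front-end,
# following the layer parameters shown above.
def conv1d_out(length, kernel, padding, stride=1):
    """Standard Conv1d output-length formula."""
    return (length + 2 * padding - kernel) // stride + 1

def pool_out(length, kernel=2, stride=2):
    """MaxPool1d output-length formula."""
    return (length - kernel) // stride + 1

t = 62                       # input window length
t = conv1d_out(t, 5, 2)      # conv1: padding=2 keeps length at 62
t = pool_out(t)              # pool1: 62 -> 31
t = conv1d_out(t, 3, 1)      # conv2: padding=1 keeps length at 31
t = pool_out(t)              # pool2: 31 -> 15
print(t)                     # → 15 timesteps of 32 features enter the LSTM
```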

&lt;p&gt;The input data is processed in windows of 62 samples with 50% overlap, providing sufficient context for pattern detection while maintaining real-time responsiveness. Each window passes through z-score normalization per channel, ensuring consistent feature scales regardless of signal amplitude variations.&lt;/p&gt;
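&lt;p&gt;The windowing itself is simple to sketch: 62 samples with 50% overlap means a step of 31 samples between window starts. The helper below works on a plain list for clarity; in the real system this runs on the live EEG stream.&lt;/p&gt;

```python
# Illustrative sliding-window segmentation: 62-sample windows with 50%
# overlap (step of 31), as described in the post.
WINDOW = 62
STEP = WINDOW // 2  # 50% overlap

def segment(samples, window=WINDOW, step=STEP):
    """Return successive overlapping windows; the ragged tail is dropped."""
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, step)]

windows = segment(list(range(124)))
print(len(windows))  # → 3 windows, starting at samples 0, 31, and 62
```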

&lt;h3 id=&quot;the-importance-of-data-normalization&quot;&gt;The Importance of Data Normalization&lt;/h3&gt;

&lt;p&gt;One aspect that significantly impacted model performance was the normalization strategy. Initially, I experimented with various approaches, but z-score normalization per window proved to be the most robust:&lt;/p&gt;

&lt;div style=&quot;margin: 0 auto; display: grid;&quot;&gt;
    &lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;
        &lt;mrow&gt;
            &lt;msub&gt;
                &lt;mi&gt;X&lt;/mi&gt;
                &lt;mi&gt;norm&lt;/mi&gt;
            &lt;/msub&gt;
            &lt;mo&gt;=&lt;/mo&gt;
            &lt;mfrac&gt;
                &lt;mrow&gt;
                    &lt;mi&gt;X&lt;/mi&gt;
                    &lt;mo&gt;-&lt;/mo&gt;
                    &lt;mi&gt;μ&lt;/mi&gt;
                &lt;/mrow&gt;
                &lt;mi&gt;σ&lt;/mi&gt;
            &lt;/mfrac&gt;
        &lt;/mrow&gt;
    &lt;/math&gt;
&lt;/div&gt;

&lt;p&gt;Where μ is the mean and σ is the standard deviation of each channel within the window. This approach accounts for the natural drift in EEG baseline values and ensures that the model focuses on relative patterns rather than absolute amplitudes.&lt;/p&gt;
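&lt;p&gt;In code, the per-channel z-score is a few lines. The sketch below uses plain Python for clarity (a real pipeline would use NumPy); the epsilon guard against a flat, zero-variance channel is my addition, not necessarily how the thesis handles that edge case.&lt;/p&gt;

```python
import math

def zscore_channel(samples, eps=1e-8):
    """Z-score one channel within a window: (x - mean) / std."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    sigma = math.sqrt(var)
    return [(x - mu) / (sigma + eps) for x in samples]  # eps: guard for flat channels

def normalize_window(window):
    """window: list of channels (e.g. T3, T4, O1, O2), each a list of samples."""
    return [zscore_channel(channel) for channel in window]
```

&lt;p&gt;After this step every channel in every window has zero mean and unit variance, so the model sees relative waveform shape rather than the session-dependent baseline amplitude.&lt;/p&gt;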

&lt;h3 id=&quot;rust-based-inference-engine&quot;&gt;Rust-Based Inference Engine&lt;/h3&gt;

&lt;p&gt;For the inference side, I chose &lt;strong&gt;Rust&lt;/strong&gt; as the implementation language. This decision was driven by several factors: deterministic memory management for real-time constraints, excellent performance characteristics, and the availability of the &lt;strong&gt;Tract&lt;/strong&gt; library for ONNX model inference.&lt;/p&gt;

&lt;p&gt;The model trained in PyTorch is exported to ONNX format, allowing clean separation between the training environment (Python with GPU acceleration) and the inference environment (Rust on a Raspberry Pi 4). This architecture mirrors what I explored in my previous post about &lt;a href=&quot;/2024/10/25/creating-a-predictive-system-in-rust-and-pytorch/&quot;&gt;creating a predictive system in Rust and PyTorch&lt;/a&gt;, though the complexity here is significantly higher due to real-time constraints.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Default&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NeuralAnalyticsService&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Self&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tract_onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.model_for_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;assets/neural_analytics.onnx&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Failed to load model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.with_input_fact&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;InferenceFact&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;dt_shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;datum_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;tvec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;62&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Failed to set input shape&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.into_optimized&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Failed to optimize model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.into_runnable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Failed to create runnable model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;NeuralAnalyticsService&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The state machine architecture handles the continuous signal flow, managing the buffering, preprocessing, and inference pipeline while maintaining strict timing constraints. The system runs on a &lt;strong&gt;Raspberry Pi 4 Model B (8GB)&lt;/strong&gt; with a real-time operating system configuration to guarantee response times.&lt;/p&gt;
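&lt;p&gt;The buffering stage of such a pipeline can be sketched briefly. The actual service is written in Rust; the Python below is an illustrative stand-in showing one way to accumulate streaming samples and emit 62-sample windows while retaining the overlapping half, with the class name being my invention.&lt;/p&gt;

```python
from collections import deque

class SignalBuffer:
    """Hypothetical sketch: accumulate streamed samples and emit
    62-sample windows with 50% overlap (the real engine is in Rust)."""

    def __init__(self, window=62):
        self.window = window
        self.step = window // 2
        self.buf = deque()

    def push(self, sample):
        """Add one multi-channel sample; return a full window when ready."""
        self.buf.append(sample)
        if len(self.buf) == self.window:
            out = list(self.buf)
            for _ in range(self.step):  # discard the older half,
                self.buf.popleft()      # keep the overlapping half
            return out
        return None
```

&lt;p&gt;Each emitted window would then flow through normalization and into the Tract model, with the deque keeping memory usage constant regardless of session length.&lt;/p&gt;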

&lt;h3 id=&quot;hardware-integration-and-smart-home-control&quot;&gt;Hardware Integration and Smart Home Control&lt;/h3&gt;

&lt;p&gt;One of the most satisfying aspects of this project was the tangible output: controlling a &lt;strong&gt;Tapo Smart Bulb&lt;/strong&gt; using brain signals. When the model detects a valid “GREEN” pattern with sufficient confidence, it triggers a state change in the smart bulb. The feedback loop is immediate and visceral—you think, and the light responds.&lt;/p&gt;

&lt;p&gt;The first time the system actually worked as intended was both surreal and incredibly rewarding. After weeks of debugging signal processing issues and model accuracy problems, I had one late-night session where everything finally clicked. I was wearing the BrainBit headband, focusing on associating the color green with a specific mental visualization (imagining a bright, energizing light), and when the Tapo bulb switched on reliably in response to that thought pattern, I literally jumped out of my chair.&lt;/p&gt;

&lt;p&gt;There were definitely funny moments along the way—like the time I kept getting false positives whenever I laughed, because the facial muscle movements were being misinterpreted as brain signals. Or when my cat walked across the keyboard during a recording session and somehow triggered a series of commands that made the light flicker erratically. These mishaps taught me valuable lessons about signal isolation and the importance of proper grounding.&lt;/p&gt;

&lt;p&gt;The most memorable moment came during a dry run for my thesis defense presentation. I had invited a few friends to watch, and when I successfully turned the lamp on and off three times in a row using only thought commands, the room erupted in cheers. It was validation not just of the technical work, but of the months of persistent troubleshooting and refinement that had led to that point.&lt;/p&gt;

&lt;p&gt;The BrainFlow SDK handles the Bluetooth communication with the BrainBit device, abstracting away the low-level protocol details and providing a clean streaming interface. This allowed me to focus on the signal processing and machine learning aspects without getting bogged down in hardware-specific implementation details.&lt;/p&gt;

&lt;h3 id=&quot;project-structure-and-code-organization&quot;&gt;Project Structure and Code Organization&lt;/h3&gt;

&lt;p&gt;The project follows a modular structure that separates concerns across different packages:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NeuralAnalytics/
├── packages/
│   ├── neural_analytics_core/     # Core Rust implementation
│   ├── neural_analytics_data/     # Data capture utilities
│   ├── neural_analytics_gui/      # GUI for signal visualization
│   └── neural_analytics_model/    # PyTorch model training
├── docs/                          # LaTeX thesis documentation
└── dataset/                       # Training data organized by class
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each package has clear responsibilities, and the boundaries between them are well-defined. This modular approach made iterative development much easier—I could refine the model training pipeline without touching the Rust inference code, and vice versa.&lt;/p&gt;

&lt;h3 id=&quot;challenges-and-lessons-learned&quot;&gt;Challenges and Lessons Learned&lt;/h3&gt;

&lt;p&gt;The technical journey presented several significant challenges that forced me to deepen my understanding across multiple domains.&lt;/p&gt;

&lt;p&gt;The hardest problem to solve was dealing with inter-session variability in EEG signals. Early in development, I noticed that a model trained on one day’s data would perform poorly the next day, even with the same subject and similar mental states. This wasn’t just about signal noise—it appeared to be fundamental shifts in the baseline neural patterns, possibly due to factors like fatigue, hydration levels, or even subtle changes in electrode positioning. I addressed this by implementing domain adaptation techniques and creating more robust normalization strategies that focused on relative patterns rather than absolute values.&lt;/p&gt;

&lt;p&gt;Another major challenge was meeting real-time constraints on the Raspberry Pi 4. The initial Python prototype had unacceptable latency (over 200ms), which made the system feel unresponsive. Migrating the inference engine to Rust with the Tract library reduced this to under 50ms, but required completely rethinking my approach to memory management and data buffering.&lt;/p&gt;

&lt;p&gt;If I were to start over, I would invest more time upfront in designing a subject-independent feature extraction pipeline. Rather than trying to normalize away individual differences after the fact, I’d explore techniques like Riemannian geometry for covariance matrices or transfer learning approaches that could leverage population data while adapting quickly to new users. I would also implement a more comprehensive signal quality monitoring system from the beginning, rather than adding it as an afterthought when poor signal quality ruined entire recording sessions.&lt;/p&gt;

&lt;p&gt;That same inter-session variability shaped the training process too: it led me to implement more robust augmentation strategies and to be more careful about the stratification of training and validation sets, so that a model’s reported accuracy reflected genuinely unseen sessions rather than leakage from the same recording day.&lt;/p&gt;

&lt;p&gt;Another lesson was the importance of end-to-end testing. It’s one thing to achieve high accuracy on pre-recorded datasets, but real-time performance with a live signal stream is a different beast entirely. Latency, jitter, and the psychological pressure of a live demonstration all introduced factors that weren’t present in offline evaluation.&lt;/p&gt;

&lt;h3 id=&quot;media-coverage-and-public-recognition&quot;&gt;Media Coverage and Public Recognition&lt;/h3&gt;

&lt;p&gt;I was fortunate that this project caught the attention of major Spanish media outlets. &lt;a href=&quot;https://www.elespanol.com/reportajes/20251030/sergio-ingeniero-anos-creado-sistema-activar-bombillas-mente-puede-guiar-silla-hawking/1003743966876_0.html&quot;&gt;El Español published an article&lt;/a&gt; about the project, and I was invited to demonstrate the system live on &lt;a href=&quot;https://www.antena3.com/programas/y-ahora-sonsoles/sergio-ingeniero-que-enciende-bombillas-mente-nos-hace-demostracion-directo_2025112769288f0b55584d48fb600efa.html&quot;&gt;Antena 3’s “Y Ahora Sonsoles” program&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The media coverage was both unexpected and deeply humbling. When El Español reached out to feature the project, I was initially surprised that a technical thesis project would generate such interest, but it quickly became clear that the story resonated because it represented something tangible—technology that people could see and understand immediately.&lt;/p&gt;

&lt;p&gt;Appearing on Antena 3’s “Y Ahora Sonsoles” program was an entirely different experience. The studio environment with its bright lights, multiple cameras, and live audience created pressure I hadn’t anticipated. Unlike my controlled lab environment, I couldn’t retake failed attempts or adjust parameters on the fly. There was a moment during the live demo where the signal quality dropped due to movement artifacts, and for several tense seconds, the system wasn’t responding as expected. I had to calmly guide the host through the recalibration process while millions watched—a situation that tested both my technical knowledge and my ability to communicate under pressure.&lt;/p&gt;

&lt;p&gt;What struck me most was the genuine curiosity and enthusiasm from the audience and hosts. Rather than treating it as a magic trick, they asked thoughtful questions about how the technology actually worked, its limitations, and its potential applications. This reinforced my belief that when complex technology is explained accessibly, it can inspire meaningful conversations about innovation and its role in society.&lt;/p&gt;

&lt;p&gt;The public interest in this project has been overwhelming and humbling. It reinforced my belief that technology, when applied thoughtfully, has the potential to genuinely improve people’s lives—particularly for those with motor disabilities who could benefit from brain-computer interfaces for communication and control.&lt;/p&gt;

&lt;h3 id=&quot;technical-specifications-and-results&quot;&gt;Technical Specifications and Results&lt;/h3&gt;

&lt;p&gt;For those interested in the technical details, here’s a summary of the system specifications:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Component&lt;/th&gt;
      &lt;th&gt;Specification&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;EEG Device&lt;/td&gt;
      &lt;td&gt;BrainBit (4 channels: T3, T4, O1, O2)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Processing Platform&lt;/td&gt;
      &lt;td&gt;Raspberry Pi 4 Model B (8GB)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Model Architecture&lt;/td&gt;
      &lt;td&gt;CNN-LSTM Hybrid (Conv1d 16→32 filters, bidirectional LSTM with hidden size 32)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Window Size&lt;/td&gt;
      &lt;td&gt;62 samples with 50% overlap&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Input Normalization&lt;/td&gt;
      &lt;td&gt;Z-score per channel per window&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Output Classes&lt;/td&gt;
      &lt;td&gt;3 (RED, GREEN, TRASH)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Inference Library&lt;/td&gt;
      &lt;td&gt;Tract (ONNX runtime for Rust)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Smart Device&lt;/td&gt;
      &lt;td&gt;Tapo Smart Bulb&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The model achieves reliable classification performance in controlled conditions, with the system maintaining real-time responsiveness on the constrained hardware platform.&lt;/p&gt;
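&lt;p&gt;To make the preprocessing concrete, here is a small sketch of the windowing and per-channel z-score normalization described in the table above. The window size, 50% overlap, and channel count come from the specifications; everything else (the function name, the fake signal, the implied sampling rate) is purely illustrative and not the actual implementation:&lt;/p&gt;

```python
import numpy as np

WINDOW_SIZE = 62   # samples per window, as in the specs table
OVERLAP = 0.5      # 50% overlap between consecutive windows
N_CHANNELS = 4     # T3, T4, O1, O2

def make_windows(signal: np.ndarray) -> np.ndarray:
    """Split a (n_samples, n_channels) EEG recording into
    overlapping windows, z-scoring each channel per window."""
    step = int(WINDOW_SIZE * (1 - OVERLAP))
    windows = []
    for start in range(0, signal.shape[0] - WINDOW_SIZE + 1, step):
        win = signal[start:start + WINDOW_SIZE].astype(np.float64)
        # Z-score each channel independently within the window
        mean = win.mean(axis=0)
        std = win.std(axis=0) + 1e-8  # avoid division by zero
        windows.append((win - mean) / std)
    return np.stack(windows)

# Two seconds of fake 4-channel EEG as a stand-in recording
fake_eeg = np.random.randn(500, N_CHANNELS)
batch = make_windows(fake_eeg)
print(batch.shape)  # (n_windows, 62, 4)
```

&lt;p&gt;Because each window is normalized independently, slow drifts in electrode baseline between windows do not leak into the features the classifier sees.&lt;/p&gt;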

&lt;h3 id=&quot;future-directions&quot;&gt;Future Directions&lt;/h3&gt;

&lt;p&gt;Looking ahead, I see brain-computer interface technology evolving along several exciting trajectories. The field is moving beyond simple binary control toward more nuanced, multi-dimensional interaction paradigms. I’m particularly excited about the potential for adaptive systems that can learn from users in real-time, reducing the calibration burden and accommodating natural variations in brain signals.&lt;/p&gt;

&lt;p&gt;My personal plans involve continuing to contribute to open-source BCI tools and frameworks. I believe the key to widespread adoption lies in making these technologies more accessible—not just from a cost perspective, but also in terms of usability and setup simplicity. I’m exploring ways to simplify the signal processing pipeline while maintaining robustness, potentially leveraging edge AI accelerators for more complex feature extraction.&lt;/p&gt;

&lt;p&gt;Specific applications I’m eager to explore include:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Communication aids&lt;/strong&gt;: Developing more sophisticated text-entry systems for individuals with severe motor impairments&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Neurofeedback applications&lt;/strong&gt;: Creating tools for mental wellness, attention training, and stress management&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Augmented reality interfaces&lt;/strong&gt;: Combining BCI with AR/VR for immersive, hands-free interaction&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Passive BCI&lt;/strong&gt;: Using brain signals not for explicit commands, but for implicit feedback to adapt interfaces based on cognitive load or emotional state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The convergence of BCI with other emerging technologies—like advanced materials for more comfortable electrodes, improved signal processing algorithms, and more intuitive machine learning approaches—promises to unlock applications we can barely imagine today.&lt;/p&gt;

&lt;p&gt;While this project represents the completion of my Bachelor’s Thesis, I don’t consider it finished. There are numerous avenues for improvement and extension:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Expanded command vocabulary&lt;/strong&gt;: Moving beyond binary control to multiple distinct commands&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Personalization pipelines&lt;/strong&gt;: Real-time adaptation to individual users without extensive retraining&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Alternative output modalities&lt;/strong&gt;: Integration with wheelchair controls, computer cursors, or speech synthesis&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Edge deployment optimization&lt;/strong&gt;: Quantization and pruning for even lower latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The field of brain-computer interfaces is evolving rapidly, and I’m excited to continue contributing to it.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This project has been one of the most challenging and rewarding experiences of my academic career. It pushed me to learn about domains I had never explored before—neuroscience, real-time systems, regulatory compliance for medical devices—while also deepening my expertise in areas I was already passionate about, like deep learning and Rust development.&lt;/p&gt;

&lt;p&gt;Reflecting on this project, I realize it has fundamentally transformed how I view the intersection of technology and human potential. Before NeuralAnalytics, I saw software engineering primarily as a tool for building efficient systems and solving logical puzzles. This thesis revealed to me that technology’s true power lies in its ability to extend human capabilities—especially for those whose abilities are limited by circumstance.&lt;/p&gt;

&lt;p&gt;The journey taught me that meaningful innovation requires more than technical skill; it demands empathy, interdisciplinary curiosity, and the courage to venture into unfamiliar territories. Working at the boundary of neuroscience and engineering forced me to learn new languages (both literal and figurative), to respect the complexity of biological systems, and to appreciate that sometimes the most elegant solutions come from embracing rather than fighting variability.&lt;/p&gt;

&lt;p&gt;Professionally, this experience has solidified my commitment to developing technology that serves people first. Whether I’m working on biometric systems at Facephi or exploring other domains, I now constantly ask: “Who does this serve? How does it improve lives? Is it accessible and ethical?” The project also gave me confidence to tackle ambitious, ill-defined problems—knowing that persistence, systematic experimentation, and learning from failure can lead to breakthroughs even in seemingly opaque domains like brain signal interpretation.&lt;/p&gt;

&lt;p&gt;Most importantly, NeuralAnalytics reminded me that engineering excellence isn’t just about writing correct code—it’s about creating systems that dignify and empower human experience. When I see someone smile as they turn on a light with their thoughts, I’m reminded why I fell in love with engineering in the first place: to build things that matter.&lt;/p&gt;

&lt;p&gt;The complete source code is available on &lt;a href=&quot;https://github.com/Neirth/NeuralAnalytics&quot;&gt;GitHub&lt;/a&gt; under the GPL-3.0 license. The repository includes the training code, the Rust inference engine, documentation, and everything needed to replicate or extend this work. I hope it serves as a useful reference for anyone interested in exploring the fascinating intersection of neuroscience and software engineering.&lt;/p&gt;

&lt;h3 id=&quot;credits&quot;&gt;Credits&lt;/h3&gt;

&lt;p&gt;The header image of this post is made using &lt;a href=&quot;https://www.midjourney.com/&quot;&gt;Midjourney AI&lt;/a&gt;.&lt;/p&gt;
</content>
    </entry>
    
  
    
    <entry>
      <title>Creating a Predictive System using PyTorch and Rust</title>
      <link href="/2024/10/25/creating-a-predictive-system-in-rust-and-pytorch.html" />
      <id>/2024/10/25/creating-a-predictive-system-in-rust-and-pytorch.html</id>
      <updated>2024-10-25T00:00:00+00:00</updated>
      <content type="html">&lt;p&gt;I hadn’t published anything here for a while. Many adventures that I will surely have to document soon.&lt;/p&gt;

&lt;p&gt;This time, I wanted to practice with a project I had been meaning to explore for some time: taking my first AI model beyond the classroom and integrating it with an application (which could have been written in Rust, Golang, Flutter, or any other technology, to be honest).&lt;/p&gt;

&lt;p&gt;As a curious fact, this project is inspired by the same technological foundations as the Bachelor’s Thesis that I will present in a few months at the &lt;em&gt;Universitat Politècnica de València&lt;/em&gt;, thus ending a stage that has been a true rollercoaster of emotions.&lt;/p&gt;

&lt;h3 id=&quot;understanding-the-underlying-problem&quot;&gt;Understanding the Underlying Problem&lt;/h3&gt;

&lt;p&gt;Previously, at the &lt;em&gt;Universitat Politècnica de València&lt;/em&gt;, I had worked with &lt;em&gt;TensorFlow&lt;/em&gt; and &lt;em&gt;scikit-learn&lt;/em&gt;, creating many small classification and regression models. However, I was worried about depending so heavily on &lt;em&gt;TensorFlow&lt;/em&gt; without exploring other frameworks like &lt;em&gt;PyTorch&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Additionally, I didn’t quite understand how to design a Deep Learning model that, given a context of previous values and their respective timestamps, could predict the next value of a series. I’m not talking about a set of independent features fed in as separate inputs to produce an output; I’m referring to a matrix of features whose values are time-dependent.&lt;/p&gt;

&lt;p&gt;So I decided to take a chance and research the problem. In this case, I was inspired by electrical load forecasting models, which are quite relevant for understanding the demand an electrical grid might receive and, consequently, for adapting to it.&lt;/p&gt;

&lt;p&gt;Honestly, before diving deeper, I should mention that I particularly like the concept of decoupling components. Decoupling generally avoids introducing unnecessary runtime dependencies, which minimizes the overall size of the application and gives the project a good degree of modularity.&lt;/p&gt;

&lt;h3 id=&quot;what-is-onnx-and-why-is-it-important-for-portability&quot;&gt;What is ONNX, and Why is it Important for Portability?&lt;/h3&gt;

&lt;p&gt;To understand ONNX thoroughly, it’s essential to clarify that Machine Learning and Deep Learning models are developed in two fundamental stages: the training stage and the inference stage. These phases do not necessarily have to run on the same device or under the same frameworks.&lt;/p&gt;

&lt;p&gt;Based on this, it becomes crucial to have a method for serializing the model after training, so it can then be used in the inference phase. For embedded systems, for example, TensorFlow developed the .tflite format, which I have previously tested with very good results on such devices. Additionally, the inference library from Sonos, Tract, provides excellent support for this format in Rust. However, I intended to explore options beyond the TensorFlow ecosystem.&lt;/p&gt;

&lt;p&gt;With that explained, this is where ONNX (Open Neural Network Exchange) plays a fundamental role: it is a neutral format, independent of any specific framework, that allows models to be serialized and then used across different tools and environments. In fact, ONNX has established itself as the de facto standard for flexible model deployment, even in cases where TensorFlow Lite presents limitations.&lt;/p&gt;

&lt;p&gt;To further appreciate the power of this format, consider that when we define a model in PyTorch, for example, its computation graph can ultimately be represented within the ONNX model.&lt;/p&gt;

&lt;p&gt;Let’s illustrate this with an example. Here is the model we will use in this project, whose inner workings we will explain later on:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;torch&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;torch.nn&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;INPUT_SIZE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Number of features in the input
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HIDDEN_SIZE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Size of hidden units in the LSTM
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;GridLSTMModel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GridLSTMModel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# LSTM with 4 layers
&lt;/span&gt;        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lstm&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LSTM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INPUT_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HIDDEN_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch_first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bidirectional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Using mean pooling to reduce the sequence
&lt;/span&gt;        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HIDDEN_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Final fully-connected layer
&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;forward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# x shape: (batch_size, seq_length, input_size)
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;lstm_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lstm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# lstm_out: (batch_size, seq_length, hidden_size)
&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Compute the mean across the sequence dimension
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;mean_out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lstm_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# (batch_size, hidden_size)
&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Pass the reduced context through the fully-connected layer
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# (batch_size, 1)
&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And this is how the graph would look when visualized with tools like &lt;a href=&quot;https://netron.app&quot;&gt;Netron&lt;/a&gt; or similar:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2024-10-25-creating-a-predictive-system-in-rust-and-pytorch/grid_predictor.onnx.svg&quot; alt=&quot;Recurrent Neural Network Model in ONNX&quot; style=&quot;width: 50%; margin: 0 auto; display: block;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The ability to perform an efficient conversion of models without relying on specific frameworks like &lt;em&gt;TensorFlow&lt;/em&gt; or &lt;em&gt;PyTorch&lt;/em&gt; for inference is what gives &lt;em&gt;ONNX&lt;/em&gt; its great utility. This format allows for easy and effective model exporting, facilitating implementation in production environments. This is exactly what we will need to quickly debug this project, as it means we can make changes to the model without impacting the code we write in &lt;em&gt;Rust&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;how-do-we-collect-and-preprocess-training-data&quot;&gt;How Do We Collect (and Preprocess) Training Data?&lt;/h3&gt;

&lt;p&gt;I have to be honest here; the &lt;a href=&quot;https://transparency.entsoe.eu&quot;&gt;ENTSOE Transparency Portal&lt;/a&gt; has been very helpful for retrieving the CSV dataset I used for this project. Additionally, its public API has allowed me to collect all the data related to real-time electricity demand.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;START_TIME&lt;/th&gt;
      &lt;th&gt;STOP_TIME&lt;/th&gt;
      &lt;th&gt;FORECAST_LOAD&lt;/th&gt;
      &lt;th&gt;REAL_LOAD&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 0:00&lt;/td&gt;
      &lt;td&gt;01/01/15 0:15&lt;/td&gt;
      &lt;td&gt;6794&lt;/td&gt;
      &lt;td&gt;6168&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 0:15&lt;/td&gt;
      &lt;td&gt;01/01/15 0:30&lt;/td&gt;
      &lt;td&gt;6757&lt;/td&gt;
      &lt;td&gt;6088&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 0:30&lt;/td&gt;
      &lt;td&gt;01/01/15 0:45&lt;/td&gt;
      &lt;td&gt;6791&lt;/td&gt;
      &lt;td&gt;6060&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 0:45&lt;/td&gt;
      &lt;td&gt;01/01/15 1:00&lt;/td&gt;
      &lt;td&gt;6750&lt;/td&gt;
      &lt;td&gt;5958&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 1:00&lt;/td&gt;
      &lt;td&gt;01/01/15 1:15&lt;/td&gt;
      &lt;td&gt;6737&lt;/td&gt;
      &lt;td&gt;6017&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 1:15&lt;/td&gt;
      &lt;td&gt;01/01/15 1:30&lt;/td&gt;
      &lt;td&gt;6692&lt;/td&gt;
      &lt;td&gt;5967&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 1:30&lt;/td&gt;
      &lt;td&gt;01/01/15 1:45&lt;/td&gt;
      &lt;td&gt;6722&lt;/td&gt;
      &lt;td&gt;5936&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 1:45&lt;/td&gt;
      &lt;td&gt;01/01/15 2:00&lt;/td&gt;
      &lt;td&gt;6690&lt;/td&gt;
      &lt;td&gt;5934&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 2:00&lt;/td&gt;
      &lt;td&gt;01/01/15 2:15&lt;/td&gt;
      &lt;td&gt;6633&lt;/td&gt;
      &lt;td&gt;5751&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 2:15&lt;/td&gt;
      &lt;td&gt;01/01/15 2:30&lt;/td&gt;
      &lt;td&gt;6573&lt;/td&gt;
      &lt;td&gt;5778&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;01/01/15 2:30&lt;/td&gt;
      &lt;td&gt;01/01/15 2:45&lt;/td&gt;
      &lt;td&gt;6602&lt;/td&gt;
      &lt;td&gt;5746&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This is roughly what the dataset I’ve been using looks like, which is essentially a near-pure extract of what I received from the historical series. I say “near-pure extract” because I’ve renamed the columns for convenience. If we take a look, we can see that we have a column describing the expected load according to the model that the folks at &lt;a href=&quot;https://entsoe.eu&quot;&gt;ENTSOE&lt;/a&gt; have in production. With this, we could calculate the deviation of our model compared to their production model.&lt;/p&gt;

&lt;p&gt;If we want the model to learn based on this data, we will need to use the concept of sliding time windows so that it can predict the next value of the series conditioned by the hour and the day.&lt;/p&gt;

&lt;h3 id=&quot;how-do-we-design-and-export-the-prediction-model&quot;&gt;How Do We Design (and Export) the Prediction Model?&lt;/h3&gt;

&lt;p&gt;The model I have decided to use for this project is a Recurrent Neural Network (RNN), specifically an LSTM model. This type of model is very useful for working with time series, as it can remember information from previous steps and use it to predict the next value.&lt;/p&gt;

&lt;p&gt;In this case, the model I designed is an LSTM model that takes a sequence of values and returns a single value. To incorporate time information into the model, I decided to use a sinusoidal representation, which involves representing values such as the day of the year or the hour of the day as a sine wave.&lt;/p&gt;

&lt;p&gt;In simple terms, the LSTM operator maintains an internal cell state, governed by input, forget, and output gates, that lets the network selectively retain or discard information across time steps. This is what makes it well suited to this kind of sequence-to-one prediction, where a whole window of past values condenses into a single forecast.&lt;/p&gt;

&lt;p&gt;To do this, I had to tackle the challenge of building sliding time windows from the entire DataFrame. I created a function that takes the DataFrame and the window size as input and returns a sequence of tensors with the time information encoded in sinusoidal form. Below is the code for the function:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_daily_sliding_windows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;normalize_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Convert TIMESTAMP to datetime for easier grouping
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;ns&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Create the sinusoidal components
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;day_of_year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dayofyear&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;minutes_of_day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minute&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Sinusoidal component of the day in the year
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;day_sin&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pi&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;day_of_year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;365&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Sinusoidal component of the minutes in the day
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;minute_sin&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pi&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;minutes_of_day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1440&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# 1440 = 60 * 24 (minutes in a day)
&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Group by day
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;daily_groups&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;windows&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;daily_groups&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Check if the group has enough data to create windows
&lt;/span&gt;        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# Create sliding windows for each daily group
&lt;/span&gt;            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;window_values&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;GLOBAL_LOAD_TOTAL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;next_value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;GLOBAL_LOAD_TOTAL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

                &lt;span class=&quot;c1&quot;&gt;# Add the sinusoidal components to the window
&lt;/span&gt;                &lt;span class=&quot;n&quot;&gt;window_day_sin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;day_sin&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;window_minute_sin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;minute_sin&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;

                &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;windows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_day_sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_minute_sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;next_value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[!] The group for &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; has fewer records than the window size (&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; &amp;lt; &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window_size&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Check if any windows were generated
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[!] No sliding windows have been generated, check the size of your groups and the window.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Convert the list of windows to a DataFrame
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;window_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;window_values&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;day_sin&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;minute_sin&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;next_value&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[*] Data generated from the raw dataset:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[*] Preprocessed dataset to a two-column matrix...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
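&lt;p&gt;Stripped of the DataFrame bookkeeping, the sliding-window idea above can be sketched in a few lines of plain Python (the &lt;code&gt;window_size&lt;/code&gt; of 3 and the toy series are illustrative, not the values used in the project):&lt;/p&gt;

```python
def make_windows(series, window_size):
    """Build (window, next_value) pairs from a 1-D sequence."""
    windows = []
    # Stop early enough that the target index i + window_size stays in range
    for i in range(len(series) - window_size):
        window = series[i:i + window_size]    # model input
        next_value = series[i + window_size]  # prediction target
        windows.append((window, next_value))
    return windows

# Toy load series: each window of 3 readings predicts the 4th
pairs = make_windows([0.1, 0.2, 0.4, 0.3, 0.5], window_size=3)
print(pairs)  # [([0.1, 0.2, 0.4], 0.3), ([0.2, 0.4, 0.3], 0.5)]
```

&lt;p&gt;The real function additionally groups by day first, so that no window straddles a day boundary, and attaches the sinusoidal time features to each window.&lt;/p&gt;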

&lt;p&gt;There are likely other, more effective ways to represent time in the model, but I chose this representation for its simplicity and effectiveness. It’s important to note that, to incorporate the time information into the model, I had to scale the network’s point load values between 0 and 1 so that the model could learn more efficiently; I return to this point in more detail below. To do this, I normalized the network load values using the following function:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;normalize_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Extract the year from the TIMESTAMP for grouping
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;START_TIME&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Initialize the scaler
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;scaler&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MinMaxScaler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Create a new DataFrame with the normalized values
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;normalized_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;copy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Group by year and normalize GLOBAL_LOAD_TOTAL
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Normalize the values for each year
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;GLOBAL_LOAD_TOTAL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reshape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;normalized_values&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scaler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fit_transform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ravel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# flatten (n, 1) to (n,) for column assignment&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Replace the values in the original DataFrame
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;normalized_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;GLOBAL_LOAD_TOTAL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;normalized_values&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Convert the TIMESTAMP to an integer for easier processing
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;normalized_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;TIMESTAMP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Drop the year column as it&apos;s no longer needed
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;normalized_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;drop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;START_TIME&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inplace&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[*] Normalized GLOBAL_LOAD_TOTAL by year.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;normalized_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;normalized_df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this way, the model can take all of this information into account to predict the next value of the time series. One not-so-obvious consideration, but one that is very important for our AI models to learn efficiently, is data normalization: once the data is normalized, the model can learn effectively and make accurate predictions. For this normalization, I used the MinMaxScaler class from the scikit-learn library, whose formula is as follows:&lt;/p&gt;

&lt;div style=&quot;margin: 0 auto; display: grid;&quot;&gt;
    &lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;
        &lt;mrow&gt;
            &lt;mi&gt;X&lt;/mi&gt;
            &lt;mo&gt;_&lt;/mo&gt;
            &lt;mi&gt;norm&lt;/mi&gt;
            &lt;mo&gt;=&lt;/mo&gt;
            &lt;mfrac&gt;
                &lt;mrow&gt;
                    &lt;mi&gt;X&lt;/mi&gt;
                    &lt;mo&gt;-&lt;/mo&gt;
                    &lt;mi&gt;X&lt;/mi&gt;
                    &lt;mo&gt;_&lt;/mo&gt;
                    &lt;mi&gt;min&lt;/mi&gt;
                &lt;/mrow&gt;
                &lt;mrow&gt;
                    &lt;mi&gt;X&lt;/mi&gt;
                    &lt;mo&gt;_&lt;/mo&gt;
                    &lt;mi&gt;max&lt;/mi&gt;
                    &lt;mo&gt;-&lt;/mo&gt;
                    &lt;mi&gt;X&lt;/mi&gt;
                    &lt;mo&gt;_&lt;/mo&gt;
                    &lt;mi&gt;min&lt;/mi&gt;
                &lt;/mrow&gt;
            &lt;/mfrac&gt;
        &lt;/mrow&gt;
    &lt;/math&gt;
&lt;/div&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;(&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;mi&gt;norm&lt;/mi&gt;&lt;/math&gt;) is the normalized value.&lt;/li&gt;
  &lt;li&gt;(&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;/math&gt;) is the original value.&lt;/li&gt;
  &lt;li&gt;(&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;mi&gt;min&lt;/mi&gt;&lt;/math&gt;) is the minimum value in the dataset, in this case, the minimum in the year.&lt;/li&gt;
  &lt;li&gt;(&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;mi&gt;max&lt;/mi&gt;&lt;/math&gt;) is the maximum value in the dataset, in this case, the maximum in the year.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have noticed that without this normalization the model is unable to learn efficiently and, consequently, cannot make accurate predictions. Normalizing the data before feeding it to the model is therefore essential.&lt;/p&gt;
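&lt;p&gt;As a sanity check, the per-year min-max scaling can be reproduced in a few lines of plain Python (the load values here are invented for illustration):&lt;/p&gt;

```python
def min_max_normalize(values):
    """Scale a list of values to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Degenerate case: a constant series maps to all zeros
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]

# Hypothetical grid load readings (MW) for one year
loads = [21000.0, 24500.0, 28000.0, 24500.0]
print(min_max_normalize(loads))  # [0.0, 0.5, 1.0, 0.5]
```

&lt;p&gt;Applying this per year, as &lt;code&gt;normalize_by_year&lt;/code&gt; does, means each year’s minimum maps to 0 and its maximum to 1, which keeps slow year-over-year growth in demand from dominating the scale.&lt;/p&gt;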

&lt;p&gt;To assess the model’s performance, I decided to use the R² statistic (coefficient of determination), which measures how well the model predicts the electric grid load. I also use this statistic to quantify how far the model deviates from the production model from ENTSOE.&lt;/p&gt;

&lt;p&gt;The formula for calculating the R² statistic is as follows:&lt;/p&gt;

&lt;div style=&quot;margin: 0 auto; display: grid;&quot;&gt;
    &lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;
        &lt;mrow&gt;
            &lt;msup&gt;
                &lt;mi&gt;R&lt;/mi&gt;
                &lt;mn&gt;2&lt;/mn&gt;
            &lt;/msup&gt;
            &lt;mo&gt;=&lt;/mo&gt;
            &lt;mn&gt;1&lt;/mn&gt;
            &lt;mo&gt;-&lt;/mo&gt;
            &lt;mfrac&gt;
                &lt;mrow&gt;
                    &lt;mi&gt;SS&lt;/mi&gt;
                    &lt;mo&gt;_&lt;/mo&gt;
                    &lt;mi&gt;res&lt;/mi&gt;
                &lt;/mrow&gt;
                &lt;mrow&gt;
                    &lt;mi&gt;SS&lt;/mi&gt;
                    &lt;mo&gt;_&lt;/mo&gt;
                    &lt;mi&gt;tot&lt;/mi&gt;
                &lt;/mrow&gt;
            &lt;/mfrac&gt;
        &lt;/mrow&gt;
    &lt;/math&gt;
&lt;/div&gt;

&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;(&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;SS&lt;/mi&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;mi&gt;res&lt;/mi&gt;&lt;/math&gt;) is the residual sum of squares.&lt;/li&gt;
  &lt;li&gt;(&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;SS&lt;/mi&gt;&lt;mo&gt;_&lt;/mo&gt;&lt;mi&gt;tot&lt;/mi&gt;&lt;/math&gt;) is the total sum of squares.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this, we can calculate the R² statistic, evaluate the model’s performance in predicting the electric grid load, and quantify how much it deviates from the production model from ENTSOE.&lt;/p&gt;
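&lt;p&gt;Concretely, the R² computation boils down to a few lines of plain Python (the observed and predicted values below are invented for illustration; a library implementation such as scikit-learn’s &lt;code&gt;r2_score&lt;/code&gt; computes the same quantity):&lt;/p&gt;

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

# Toy example: predictions close to the observed loads give R^2 near 1
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_true, y_pred), 3))  # 0.98
```

&lt;p&gt;An R² of 1 means perfect predictions, 0 means the model is no better than always predicting the mean load, and negative values mean it is worse than that baseline.&lt;/p&gt;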

&lt;p&gt;Finally, to export the model to ONNX, I used the ONNX library from PyTorch, which allows for easy export of PyTorch models to the ONNX format. To do this, I used the following function:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;export_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;device&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
    Exports a PyTorch model to ONNX format and validates the exported file.

    :param model: PyTorch model to be exported.
    :param device: PyTorch device being used for training.
    :param input_size: Input size of the model (e.g., (batch_size, channels, height, width)).
    :param output_path: Path where the ONNX model will be saved.
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Set the model to evaluation mode
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Create a dummy input tensor
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;dummy_input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;device&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Check and create the export directory if it doesn&apos;t exist
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;output_dir&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dirname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exists&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;makedirs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;[*] Directory created: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_dir&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Export the model to ONNX format
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;export&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dummy_input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;export_params&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;opset_version&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;do_constant_folding&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Constant optimization
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;input_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;input&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;output_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;output&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dynamic_axes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&apos;input&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;batch_size&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Dynamic axis for batch size
&lt;/span&gt;            &lt;span class=&quot;s&quot;&gt;&apos;output&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;batch_size&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Load the exported ONNX model and check that it is well formed
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;model_onnx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;checker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;check_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Re-save the validated model
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;save&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;[*] Model exported to: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this, we now have the model exported to ONNX and ready to be loaded in Rust for making predictions. Next, it’s time to deploy the model in the application and enable it to make real-time predictions.&lt;/p&gt;

&lt;h3 id=&quot;deploying-the-model-in-the-application&quot;&gt;Deploying the Model in the Application&lt;/h3&gt;

&lt;p&gt;We already understand how the model works, and we have clarified how the data was preprocessed to make learning effective. Now it’s time to deploy the model in an application that can be consumed in production.&lt;/p&gt;

&lt;p&gt;This project, much like what I envision for my Bachelor’s Thesis, operates with a core in Rust that handles loading the ONNX model and making predictions. For this, I used the Tract library, which, as I mentioned earlier, is a neural network inference library written in Rust that supports the ONNX format.&lt;/p&gt;

&lt;p&gt;The great thing about Tract is that it is written in Rust, and its low-level operations are implemented in Assembly, making it very efficient and fast. Additionally, Tract is very easy to use and has a straightforward API that allows you to load an ONNX model and make predictions with it in just a few lines of code.&lt;/p&gt;

&lt;p&gt;To load the ONNX model in Tract, I used the following function:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Default&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PredictLoadService&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PredictLoadService&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;PathBuf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;assets/grid_predictor.onnx&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tract_onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;onnx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.model_for_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;OS: Failed to read model file&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.with_input_fact&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;InferenceFact&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;dt_shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;datum_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;tvec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;OS: Failed to set input shape&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.into_optimized&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;OS: Failed to optimize model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.into_runnable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;OS: Failed to create runnable model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;PredictLoadService&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And to effectively utilize the model, I created a function that takes a time window and returns the model’s prediction:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;impl&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PredictLoadService&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;predict_load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;i64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;BUG: Input data must have 19 elements&quot;&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.to_string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.flat_map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;|(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;day_sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minute_sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;generate_sin_components&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                    &lt;span class=&quot;nf&quot;&gt;.map_err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;format!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;BUG: Error generating sin components -&amp;gt; {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.unwrap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

                &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;day_sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minute_sin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.collect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// Convert the values to a Tensor with the appropriate shape [1, 19, 3]&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input_tensor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from_shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;.map_err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;format!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;BUG: Error creating input tensor -&amp;gt; {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// Run the model&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;.model&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;tvec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_tensor&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.map_err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;format!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;BUG: Error running model -&amp;gt; {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// Extract the output value&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;.to_scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.map_err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;format!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;BUG: Error extracting output -&amp;gt; {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As we can see, the code stays clean and easy to follow, and Tract loads the model efficiently and runs predictions in just a few lines. This is what I enjoyed about working with Tract, and it made integrating the model into the application straightforward.&lt;/p&gt;
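&lt;p&gt;To make the input layout concrete, the window flattening performed inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;predict_load&lt;/code&gt; can be sketched in isolation. The periods I use in the stand-in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate_sin_components&lt;/code&gt; below are an assumption for illustration, not necessarily the ones the real project uses:&lt;/p&gt;

```rust
use std::f32::consts::PI;

// Hypothetical stand-in for the project's `generate_sin_components`:
// it encodes a Unix timestamp as two cyclic features. The periods
// (one day and one hour) are an assumption for this sketch.
fn generate_sin_components(timestamp: i64) -> (f32, f32) {
    let day_pos = timestamp.rem_euclid(86_400) as f32 / 86_400.0; // position within the day
    let minute_pos = timestamp.rem_euclid(3_600) as f32 / 3_600.0; // position within the hour
    ((2.0 * PI * day_pos).sin(), (2.0 * PI * minute_pos).sin())
}

// Flatten 19 (timestamp, load) samples into the row-major [1, 19, 3]
// layout the model expects: one [day_sin, minute_sin, load] triple per step.
fn build_window(input_data: &[(i64, f32)]) -> Vec<f32> {
    input_data
        .iter()
        .flat_map(|(timestamp, load)| {
            let (day_sin, minute_sin) = generate_sin_components(*timestamp);
            [day_sin, minute_sin, *load]
        })
        .collect()
}

fn main() {
    // 19 samples spaced 15 minutes apart, all with a dummy load of 42.0
    let window: Vec<(i64, f32)> = (0..19)
        .map(|i| (1_700_000_000_i64 + i as i64 * 900, 42.0))
        .collect();
    let flat = build_window(&window);
    assert_eq!(flat.len(), 19 * 3); // matches the declared [1, 19, 3] input fact
    println!("first step: {:?}", &flat[..3]);
}
```

&lt;p&gt;Each sample contributes one triple, which is exactly the shape declared with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;with_input_fact&lt;/code&gt; when the model is loaded.&lt;/p&gt;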

&lt;p&gt;To retrieve production data, I used the &lt;a href=&quot;https://transparency.entsoe.eu/content/static_content/Static%20content/web%20api/Guide.html&quot;&gt;ENTSOE API&lt;/a&gt;, which allowed me to obtain real-time production data and make predictions with the model. In the end, it all comes down to making API calls, and I think explaining how it’s done would be a bit redundant. So, dear reader, I invite you to explore the project’s source code to see how I retrieved the production data and made predictions with the model.&lt;/p&gt;

&lt;p&gt;Out of curiosity, I wanted to compute the R² statistic to compare my model’s deviation against the ENTSOE production forecast. Using the scikit-learn library, I calculated it after finishing the model training and obtained the following values:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[neirth@beast-dragon electrical_grid_model] $ python3 src/main.py
[*] Training module for the &quot;Electrical_Grid&quot; model
[*] The device to be used will be &quot;mps&quot;
...
[*] Training completed in 1026.80 seconds.
[*] R^2 of the model with the evaluation set: 0.9973
[*] R^2 of the production forecast: 0.8754
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As we can see, my model achieved an R² of 0.9973 on the evaluation set, meaning it explains 99.73% of the variance in the electrical grid load. The ENTSOE production forecast, in contrast, obtained an R² of 0.8754, explaining 87.54% of the variance.&lt;/p&gt;

&lt;p&gt;In other words, the model I developed fits the observed grid load considerably better than the ENTSOE production forecast. This is a very positive result, as it far exceeds the objective I had set at the beginning of the project.&lt;/p&gt;
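&lt;p&gt;For context on what those numbers mean, R² compares the residual error of the predictions against the variance of the observed data; it is a goodness-of-fit measure rather than a percentage of correct predictions. Here is a minimal sketch of what scikit-learn’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r2_score&lt;/code&gt; computes, written in Rust for consistency with the rest of the project:&lt;/p&gt;

```rust
// R² (coefficient of determination): 1 - SS_res / SS_tot, where SS_res is
// the sum of squared residuals and SS_tot the total variance of the data.
fn r_squared(observed: &[f32], predicted: &[f32]) -> f32 {
    assert_eq!(observed.len(), predicted.len());
    let mean = observed.iter().sum::<f32>() / observed.len() as f32;
    let ss_res: f32 = observed.iter().zip(predicted).map(|(y, p)| (y - p).powi(2)).sum();
    let ss_tot: f32 = observed.iter().map(|y| (y - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

fn main() {
    let observed = [3.0_f32, -0.5, 2.0, 7.0];
    let predicted = [2.5_f32, 0.0, 2.0, 8.0];
    println!("R² = {:.4}", r_squared(&observed, &predicted)); // → R² = 0.9486
}
```

&lt;p&gt;An R² of 1.0 means the predictions match the observations perfectly, while 0.0 means the model does no better than always predicting the mean.&lt;/p&gt;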

&lt;h3 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h3&gt;

&lt;p&gt;It has been a very interesting and enriching project, allowing me to explore new technologies and learn a lot about developing AI models and deploying them in production applications. Additionally, it has enabled me to explore the use of ONNX and Tract, which are fascinating technologies with great potential for the development of AI applications in production.&lt;/p&gt;

&lt;p&gt;This project is published in the &lt;a href=&quot;https://github.com/Neirth/ElectricalGridPredictor&quot;&gt;GitHub repository&lt;/a&gt; if you want to take a look at the source code and see how I developed the model and deployed it in the application. In the repository, you will also find everything I used to calculate the results and make predictions with the model, making it all replicable.&lt;/p&gt;

&lt;h3 id=&quot;credits&quot;&gt;Credits&lt;/h3&gt;

&lt;p&gt;The header image of this post is made using &lt;a href=&quot;https://www.midjourney.com/&quot;&gt;Midjourney AI&lt;/a&gt;.&lt;/p&gt;
</content>
    </entry>
    
  
    
    <entry>
      <title>Exploring OpenCL to Accelerate Processes on the Backend Side</title>
      <link href="/2023/04/28/exploring-opencl-for-accelerate-processes.html" />
      <id>/2023/04/28/exploring-opencl-for-accelerate-processes.html</id>
      <updated>2023-04-28T00:00:00+00:00</updated>
      <content type="html">&lt;p&gt;I was interested in learning about the effective parallelism of certain heavy operations, and how these could be better leveraged in a fairly demanding environment, such as a data center. The result, therefore, was to learn more about OpenCL and how it could be exploited beyond servers to speed up processes.&lt;/p&gt;

&lt;h3 id=&quot;undestanding-the-problem-behind&quot;&gt;Understanding the Problem Behind It&lt;/h3&gt;

&lt;p&gt;The main issue we are going to address is how to speed up certain processes that would otherwise be quite costly when run on the wrong device.&lt;/p&gt;

&lt;p&gt;We must understand that on most servers, all the computing work ends up on the processor. The CPU is typically shared out by a Type 1 hypervisor among the cores of several guest operating systems, which in turn run the auxiliary services that let our application interact with the real world.&lt;/p&gt;

&lt;p&gt;Put plainly, the processor is busy with a thousand tasks, and we must manage them very well for our application to serve requests quickly. If our service also has to query external services, we end up wasting much of the machine’s capacity.&lt;/p&gt;

&lt;p&gt;This problem has been studied in academia for years, with ongoing work on better algorithms to improve effective CPU utilization. A little more than a decade ago, however, experimentation began on introducing an additional player into this segment of computation: the GPGPU.&lt;/p&gt;

&lt;p&gt;In the academic world there was a lot of interest in harnessing the untapped potential of these devices, not without criticism of course.&lt;/p&gt;

&lt;p&gt;It was realized that GPUs are devices highly specialized in one type of computation: vector and matrix operations. That specialization, originally aimed at rendering frames at an acceptable real-time rate, could be exploited for scientific applications and other workloads requiring substantial computational capacity.&lt;/p&gt;

&lt;p&gt;Since these devices are relieved of the responsibility of running an operating system, or in the worst case a Type 1 hypervisor hosting several operating systems, they became a very interesting option for launching workloads and getting an answer back quickly.&lt;/p&gt;

&lt;h3 id=&quot;how-works-opencl-and-how-coordinates-with-the-cpu&quot;&gt;How OpenCL Works and How It Coordinates with the CPU&lt;/h3&gt;

&lt;p&gt;Before introducing OpenCL, it is worth mentioning that it can be used with GPUs, FPGAs, NPUs and even CPUs [4]. This is known as a heterogeneous computing framework, and although it is mostly exploited on servers, it is not formally tied to the server world. (Your mobile device probably has OpenCL drivers and you didn’t even realize it.)&lt;/p&gt;

&lt;p&gt;OpenCL works with a queuing system allowing the CPU to effectively delegate the workload to the intensive processing unit.&lt;/p&gt;

&lt;p&gt;This has the very clear advantage of freeing the CPU to take care of other tasks, without having to worry about processing our request full time. Bye bye CPU context switches (for now).&lt;/p&gt;

&lt;p&gt;It also lets us manage, from our application, the memory regions the processing unit will have available. Complete memory buffers can be transferred to the device, or it can even use the host’s own memory. The latter has numerous advantages, such as random access to the data, which avoids copying large blocks of memory between transactions. Whether this is appropriate depends, of course, on the kind of application we want to develop.&lt;/p&gt;

&lt;p&gt;Another point that makes OpenCL interesting is that its programming language is based on one already known by many: it is syntactically based on C (I would say on the C99 standard, judging by some aspects I have seen). Although it is based on C, keep in mind that we will not have access to headers such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#include &amp;lt;string.h&amp;gt;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#include &amp;lt;stdio.h&amp;gt;&lt;/code&gt;, nor to external libraries such as Boost, because from the start we only have access to the functions defined within the OpenCL standard itself.&lt;/p&gt;

&lt;p&gt;Keep in mind that this limitation exists to keep our code compatible with the widest range of hardware that has certified drivers.&lt;/p&gt;

&lt;p&gt;In OpenCL driver implementations there is a trend toward the SPIR-V binary format, which is also used by Vulkan shaders. This trend aims to simplify graphics drivers so vendors can focus on efficient Vulkan driver development. For example, Portable Computing Language (PoCL) allows OpenCL to be used on devices that only have Vulkan drivers, such as the Raspberry Pi 4 [5], and Intel uses SPIR-V as the binary format for its OpenCL drivers [6]. NVIDIA, however, uses its own PTX format [3], which is not portable across platforms, an important aspect to consider.&lt;/p&gt;

&lt;h3 id=&quot;implementing-matrix-transpose-as-a-hello-world-example&quot;&gt;Implementing Matrix Transpose as a Hello World Example&lt;/h3&gt;

&lt;p&gt;In this example we are going to see how to speed up the transposition of a matrix. Outside a parallel paradigm, this would be implemented as a succession of iterations that walk the entire matrix and copy each element into a result matrix.&lt;/p&gt;
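&lt;p&gt;As a reference point, the sequential version just described is a simple nested loop over a row-major matrix. This sketch in Rust shows the exact work that the OpenCL solution distributes across work-items:&lt;/p&gt;

```rust
// Sequential baseline: transpose a row-major `width` x `height` matrix by
// visiting every element once and copying it to its transposed position.
fn transpose_seq(width: usize, height: usize, input: &[f32]) -> Vec<f32> {
    assert_eq!(input.len(), width * height);
    let mut output = vec![0.0_f32; width * height];
    for y in 0..height {
        for x in 0..width {
            // element (x, y) of the input becomes element (y, x) of the output
            output[x * height + y] = input[y * width + x];
        }
    }
    output
}

fn main() {
    // 2 rows x 3 columns, stored row-major
    let m = [1.0_f32, 2.0, 3.0,
             4.0, 5.0, 6.0];
    assert_eq!(transpose_seq(3, 2, &m), vec![1.0, 4.0, 2.0, 5.0, 3.0, 6.0]);
}
```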

&lt;p&gt;In OpenCL we have to think about how to divide the problem into smaller subproblems that can be solved in parallel. In this case, we divide the matrix into rows and columns and assign each (row, column) pair to a work-item, which gives us a fairly simple solution to the problem.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;__kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;transpose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;odata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Calculate the global index&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_global_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_global_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    
    &lt;span class=&quot;c1&quot;&gt;// Calculate the coordinates in the matrix&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;c1&quot;&gt;// Calculate the index in the original matrix and the index in the transpose matrix&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index_in&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index_out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;c1&quot;&gt;// Copy the value from the original matrix to the transpose matrix&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;odata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;OpenCL instantiates its workers over a multi-dimensional index space, assigning each work-item an ID for every dimension it spans. This will be very useful to further speed up certain types of matrix operations.&lt;/p&gt;

&lt;p&gt;Finally, if we come from other C-like programming languages, we will have noticed that the function is declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__kernel&lt;/code&gt; with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;void&lt;/code&gt; return type. OpenCL requires this structure, in the same way that it needs to know which parameters we expect to pass to the program we are going to call.&lt;/p&gt;

&lt;p&gt;An OpenCL kernel is the combination of a program, its respective arguments, and a queue definition through which requests to the device can be enqueued.&lt;/p&gt;

&lt;p&gt;The result is sent back to the host program through a result buffer, which is common practice in OpenCL programs.&lt;/p&gt;

&lt;p&gt;For the host program, we will write a Rust program that launches the kernel and collects the result from the OpenCL program. In this case, we will use the &lt;a href=&quot;https://crates.io/crates/ocl&quot;&gt;ocl&lt;/a&gt; crate.&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cd&quot;&gt;/// Transpose a matrix using OpenCL&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;///&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;/// # Arguments&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;///&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;/// * `width` - Width of the matrix&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;/// * `height` - Height of the matrix&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;/// * `matrix` - Matrix to transpose&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;///&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;/// # Returns&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;///&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;/// * `matrix_output` - Transpose matrix&lt;/span&gt;
&lt;span class=&quot;cd&quot;&gt;///&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;transpose_matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;ocl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;Vec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Build the program into device driver&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;program&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;ProQue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accel_src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.dims&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;ocl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;SpatialDims&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Two&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.build&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Create memory buffer between hardware accelerator and main ram&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix_buff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;program&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;.buffer_builder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.copy_host_slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.build&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Create memory buffer between hardware accelerator and main ram&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_buff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;program&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;.create_buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;f32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Prepare program with arguments to build kernel&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;program&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.kernel_builder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;transpose&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                        &lt;span class=&quot;nf&quot;&gt;.arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result_buff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                        &lt;span class=&quot;nf&quot;&gt;.arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix_buff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                        &lt;span class=&quot;nf&quot;&gt;.arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                        &lt;span class=&quot;nf&quot;&gt;.arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;height&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                        &lt;span class=&quot;nf&quot;&gt;.build&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    
    &lt;span class=&quot;c1&quot;&gt;// Run the kernel inside the device and wait for the result.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.enq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Prepare output matrix&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix_output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nd&quot;&gt;vec!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0f32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()];&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Transfer matrix into the main memory&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result_buff&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix_output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.enq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Return the transposed matrix&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;Ok&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix_output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this, we have a wrapper that hides the OpenCL calls behind a single function: whoever uses it can treat it as a black box that simply returns the transposed matrix.&lt;/p&gt;
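As a quick sanity check for a wrapper like this, a plain CPU transpose can be compared against the OpenCL result. The function below is written purely for illustration and is not part of the original project; `transpose_matrix_cpu` is a name I am assuming:

```rust
// CPU reference transpose for verifying the OpenCL wrapper's output.
// `matrix` is the input in row-major order: `height` rows of `width` elements.
fn transpose_matrix_cpu(width: usize, height: usize, matrix: &[f32]) -> Vec<f32> {
    let mut output = vec![0.0f32; matrix.len()];
    for y in 0..height {
        for x in 0..width {
            // Element (y, x) of the input becomes element (x, y) of the output.
            output[x * height + y] = matrix[y * width + x];
        }
    }
    output
}
```

Running both versions over the same matrix and asserting equality is a cheap way to catch indexing mistakes in the kernel.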

&lt;h3 id=&quot;implementing-shortest-path-algorithm-as-a-complete-computation-example&quot;&gt;Implementing Shortest Path Algorithm as a Complete Computation Example&lt;/h3&gt;

&lt;p&gt;In this case I was working on a small project that needed shortest-path queries. I took Dijkstra’s algorithm [2] as the starting point, and from there I had to rework it so that it could be parallelized.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;__kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;initialize_algorithm_buffers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertex_temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Get the global id; one work-item is assigned per vertex&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_global_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Initialize the buffers in parallel&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLT_MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vertex_temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;__kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;shortest_path_algorithm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertex_temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vertex_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Get the global id; one work-item is assigned per vertex&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_global_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Validate if the vertex is not visited&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Mark the vertex as visited&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// Get the start edge&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vertex_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;// Get the edge weight from the adjacency matrix&lt;/span&gt;
            &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;weight&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vertex_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// Validate if the edge is valid&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;weight&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;weight&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLT_MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;// Get the distance&lt;/span&gt;
                &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dist&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;weight&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

                &lt;span class=&quot;c1&quot;&gt;// Get the result&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;vertex_temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;__kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;merge_sortest_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__global&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertex_temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Get the global id; one work-item is assigned per vertex&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_global_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Get the result&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vertex_temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Reset the visited flag&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;visited&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case I had to divide the algorithm into three different kernels, which is where it departs from the original formulation. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize_algorithm_buffers&lt;/code&gt; kernel exists so that we can use the device itself to initialize the memory buffers in parallel.&lt;/p&gt;

&lt;p&gt;The next step is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shortest_path_algorithm&lt;/code&gt;, which contains all the comparison logic: each work-item relaxes the edges of its vertex, following the algorithm as it was originally formulated.&lt;/p&gt;

&lt;p&gt;Finally, we have the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;merge_sortest_path&lt;/code&gt; kernel, which evaluates the changes between the temporary buffers and the result buffers and applies them safely. This matters because an OpenCL program does not know in advance where it will run. On a device such as an NVIDIA GPU [3], concurrent access between work-items in a dispatch is rarely a problem, but when the same program runs on a CPU, context switches can prevent a thread from executing in step with the rest of the workers, which can leave the result in an inconsistent state. These cases must be handled to avoid disasters.&lt;/p&gt;
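To make the interplay of the three kernels concrete, here is a single-threaded Rust sketch of the iteration the host drives; this simulation and its names are my own, not the project's host code. Each loop round stands for one enqueue of `shortest_path_algorithm` followed by `merge_sortest_path`, repeated until a round produces no improvement:

```rust
// Single-threaded simulation of the three-kernel pipeline, for illustration.
// `matrix` is a row-major adjacency matrix where 0.0 means "no edge";
// vertex 0 is the source, as in initialize_algorithm_buffers.
fn shortest_paths(vertex_count: usize, matrix: &[f32]) -> Vec<f32> {
    // initialize_algorithm_buffers: source gets 0, everything else "infinity".
    let mut result = vec![f32::MAX; vertex_count];
    result[0] = 0.0;

    loop {
        // shortest_path_algorithm: every vertex relaxes its edges into a
        // temporary buffer, reading only the previous round's results.
        let mut distance = result.clone();
        for gid in 0..vertex_count {
            for edge in 0..vertex_count {
                let weight = matrix[gid * vertex_count + edge];
                if weight != 0.0 && result[edge] != f32::MAX {
                    let dist = result[edge] + weight;
                    if dist < distance[gid] {
                        distance[gid] = dist;
                    }
                }
            }
        }
        // merge_sortest_path: commit the improvements; stop once a full
        // round changes nothing.
        if distance == result {
            return result;
        }
        result = distance;
    }
}
```

Separating the relaxation from the merge is exactly what makes the GPU version safe: each work-item writes only its own slot of the temporary buffer.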

&lt;p&gt;For this time I will omit some of the technical details of how the host program was structured, but for those who are interested, you can take a look at the repo where I published everything I was learning about the capabilities of this technology. Link to repo: &lt;a href=&quot;https://github.com/Neirth/PathWalker&quot;&gt;Path Walker - GitHub Repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Through this article we have been able to evaluate the real capabilities of this technology. Besides being based on a language familiar to most developers, it lets us squeeze the most out of whatever hardware we equip a server with.&lt;/p&gt;

&lt;p&gt;From the outside it may not be obvious where this technology shines. But in the real world, matrices and graphs are used for everything from finding the shortest route for your car to understanding what is in front of you through computer vision. Together with CUDA, this technology is also enjoying a second life in Artificial Intelligence through frameworks such as TensorFlow.&lt;/p&gt;

&lt;p&gt;It is also reassuring that OpenCL is maintained by the Khronos Group, the stewards of OpenGL and Vulkan, and that it remains well supported by the major driver vendors.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;p&gt;[1] Midjourney. “Header image of a man walking on a trail”. https://midjourney.com.&lt;/p&gt;

&lt;p&gt;[2] T. G. Mattson, D. Ginsburg, B. Gaster, and A. Munshi, “OpenCL Programming Guide,” Pearson Education, 2011.&lt;/p&gt;

&lt;p&gt;[3] NVIDIA, “Parallel Thread Execution ISA Version 8.1”, https://docs.nvidia.com/cuda/parallel-thread-execution/&lt;/p&gt;

&lt;p&gt;[4] Khronos Group, “OpenCL Overview”, https://www.khronos.org/opencl/&lt;/p&gt;

&lt;p&gt;[5] PoCL Developers, “PoCL - Portable Computing Language”, https://github.com/pocl/pocl&lt;/p&gt;

&lt;p&gt;[6] Intel Corporation, “SPIR-V*: Default Interface to Intel® Graphics Compiler for OpenCL™ Workloads”, https://www.intel.com/content/www/us/en/developer/articles/case-study/spir-v-default-interface-to-intel-graphics-compiler-for-opencl-workloads.html&lt;/p&gt;
</content>
    </entry>
    
  
    
    <entry>
      <title>Making my own IoT backend-less vending machine</title>
      <link href="/2021/03/12/making-my.own-iot-backend-less-vending-machine.html" />
      <id>/2021/03/12/making-my.own-iot-backend-less-vending-machine.html</id>
      <updated>2021-03-12T00:00:00+00:00</updated>
      <content type="html">&lt;p&gt;I am finishing my specialty in Multi-Platform Application Development before starting my university studies. Honestly, I’ve been invited to create a project to showcase my new skills. After thinking about it and weighing a few ideas, I decided to develop a project that would let me dive into the world of IoT and serverless programming.&lt;/p&gt;

&lt;p&gt;In this case, I decided to create a ticket vending machine for events that doesn’t need a central server to operate. The idea is that the machine can issue and validate event tickets autonomously, using IoT principles and making Android the core of the user interface. Firebase became my go-to choice for managing the data layer. With it, I was able to build the database structure for tickets and advertisements without maintaining a traditional backend, and use Firestore to handle tickets, advertisements, and the entire ticket lifecycle.&lt;/p&gt;

&lt;h3 id=&quot;understanding-the-concept&quot;&gt;Understanding the Concept&lt;/h3&gt;

&lt;p&gt;The main challenge was to design a vending setup that could issue and validate event tickets autonomously. That meant building a system that could authenticate, store, and process everything on the edge, with no round-trips to a central server beyond the managed data layer described above.&lt;/p&gt;

&lt;p&gt;There’s an architectural angle here that makes this project interesting. With hexagonal architecture, the business logic can stay independent from the implementation details of specific hardware or even database choice, making it relatively easy to adapt the design for future hardware or requirements. Hexagonal architecture also allowed me to modularize the ticket validation and user interfaces without one depending directly on the other.&lt;/p&gt;
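The port-and-adapter split can be sketched like this (in Rust for consistency with the rest of this blog, even though the project itself targets the JVM; every name here is illustrative, not the project's actual code):

```rust
use std::collections::HashMap;

// Port: the domain talks to tickets only through this trait.
trait TicketRepository {
    fn find_status(&self, ticket_id: &str) -> Option<String>;
    fn mark_used(&mut self, ticket_id: &str);
}

// Use case: validation depends on the port, never on Firestore directly.
fn validate_ticket(repo: &mut dyn TicketRepository, ticket_id: &str) -> bool {
    match repo.find_status(ticket_id) {
        Some(status) if status == "issued" => {
            repo.mark_used(ticket_id);
            true
        }
        _ => false, // unknown ticket, or already used
    }
}

// Adapter: an in-memory implementation, handy for tests; a Firestore-backed
// adapter would implement the same trait without touching the use case.
struct InMemoryRepo {
    tickets: HashMap<String, String>,
}

impl TicketRepository for InMemoryRepo {
    fn find_status(&self, ticket_id: &str) -> Option<String> {
        self.tickets.get(ticket_id).cloned()
    }
    fn mark_used(&mut self, ticket_id: &str) {
        self.tickets.insert(ticket_id.to_string(), "used".to_string());
    }
}
```

Swapping the data source then means writing a new adapter, while the validation flow stays untouched.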

&lt;h3 id=&quot;building-out-the-core-system&quot;&gt;Building Out the Core System&lt;/h3&gt;

&lt;p&gt;With the architecture in place, the next step was diving into the setup for Firestore collections and Firebase Authentication, which would help keep things secure. I structured the data collections to capture concerts, tickets, and advertisements separately. With this, the vending machine could dynamically serve up ads in between purchases or during idle times. For tickets, I used Apple’s Passbook format to issue each one with a downloadable QR code, which the machine could validate on-site. That saved me a lot of trouble, as it meant the tickets were compatible with existing mobile wallets and minimized the data exchange needed between devices.&lt;/p&gt;

&lt;p&gt;For the UI and validation interfaces, I turned to JavaFX and Android SDK. The Android device displayed the main interface, while a second device handled QR validation and triggered the access permissions. The ticketing and validation systems both use a few simple interfaces to make calls to Firestore, and I kept it flexible so I could change the data source without affecting the overall flow.&lt;/p&gt;

&lt;h3 id=&quot;handling-real-world-constraints&quot;&gt;Handling Real-World Constraints&lt;/h3&gt;

&lt;p&gt;One of the biggest constraints was making sure the vending machine could work autonomously, with minimal need for maintenance. I had to ensure that user sessions and data like QR codes and tickets expired after a given period to avoid bloating memory and losing performance over time. Each ticket has a status field that updates once it’s used, so validation is straightforward and efficient. The whole system feels robust enough to handle event-level traffic without the kind of congestion that would come with multiple round-trips to a central server.&lt;/p&gt;
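The status-and-expiry rule boils down to a couple of predicates. The sketch below uses illustrative names, not the project's actual data model:

```rust
// Sketch of the retention rule: a ticket is purged once it has been
// validated, or once its expiry timestamp has passed.
struct Ticket {
    status: String,  // "issued" or "used"
    expires_at: u64, // Unix timestamp, in seconds
}

fn should_purge(ticket: &Ticket, now: u64) -> bool {
    ticket.status == "used" || ticket.expires_at <= now
}

// One cleanup pass over the stored tickets, keeping only live ones.
fn cleanup(tickets: Vec<Ticket>, now: u64) -> Vec<Ticket> {
    tickets.into_iter().filter(|t| !should_purge(t, now)).collect()
}
```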

&lt;p&gt;Another core piece is the emulator I used to simulate coin insertion for testing payments, which allowed me to validate the sequence end-to-end without a physical coin interface. I kept it simple: a timer increments the “inserted” count, and it’s easy enough to switch this to a real-world currency handler later on if needed.&lt;/p&gt;
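&lt;p&gt;Conceptually, the emulator amounts to something like the following. In the real machine a timer drives the &lt;code&gt;tick()&lt;/code&gt; calls; here it’s invoked directly so the flow stays deterministic. The class name and coin values are illustrative, and a real currency handler could later feed the same credit counter:&lt;/p&gt;

```java
// Hypothetical coin-insertion emulator: each timer tick pretends one coin
// of a fixed denomination was inserted, accumulating credit for the sale.
class CoinEmulator {
    private int insertedCents = 0;
    private final int coinValueCents;

    CoinEmulator(int coinValueCents) {
        this.coinValueCents = coinValueCents;
    }

    // Called by a timer in the real machine; invoked directly in tests.
    void tick() {
        insertedCents += coinValueCents;
    }

    boolean canPay(int priceCents) {
        return insertedCents >= priceCents;
    }

    // Completes the sale: returns change and resets the credit counter.
    int changeFor(int priceCents) {
        if (!canPay(priceCents)) {
            throw new IllegalStateException("insufficient credit");
        }
        int change = insertedCents - priceCents;
        insertedCents = 0;
        return change;
    }
}
```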

&lt;h3 id=&quot;diving-into-firebase-and-data-handling&quot;&gt;Diving into Firebase and Data Handling&lt;/h3&gt;

&lt;p&gt;Since there was no backend server, I leveraged Firebase for both user data and ticket management. Using Firestore, I created three main collections: one for tickets, another for events, and a third for advertisements. This structure allows the machine to handle ticket sales by storing a new entry in the tickets collection with all the necessary metadata, including event details, ticket status, and QR code information.&lt;/p&gt;
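&lt;p&gt;Since Firestore stores documents as key/value maps, the entry written to the tickets collection on each sale looks roughly like this. The field names here are illustrative, not the project’s actual schema:&lt;/p&gt;

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the document written to a hypothetical "tickets" collection
// on each sale, mirroring the metadata described in the post.
class TicketDocument {
    static Map<String, Object> forSale(String eventId, String qrPayload, long issuedAtEpochMs) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("eventId", eventId);     // links the ticket to the events collection
        doc.put("status", "ISSUED");     // flipped to "USED" once validated
        doc.put("qrPayload", qrPayload); // the string encoded into the QR code
        doc.put("issuedAt", issuedAtEpochMs);
        return doc;
    }
}
```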

&lt;p&gt;For ticket generation, I went with Apple’s Passbook format since it’s widely compatible with mobile wallets and offers easy QR code support. Each ticket purchase generates a QR code, which is stored in Firebase and pulled up when scanned for validation. I configured Firebase Authentication to manage user access, allowing administrators to log in and update event or ad details directly from the device.&lt;/p&gt;
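&lt;p&gt;For context, a Passbook event ticket is driven by a &lt;code&gt;pass.json&lt;/code&gt; file along these lines. The identifiers and values below are placeholders, not the project’s real ones; the &lt;code&gt;PKBarcodeFormatQR&lt;/code&gt; barcode entry is what gives each pass its scannable QR code:&lt;/p&gt;

```json
{
  "formatVersion": 1,
  "passTypeIdentifier": "pass.com.example.exposeller",
  "serialNumber": "T-0001",
  "teamIdentifier": "ABCDE12345",
  "organizationName": "ExpoSeller",
  "description": "Concert ticket",
  "barcode": {
    "format": "PKBarcodeFormatQR",
    "message": "T-0001",
    "messageEncoding": "iso-8859-1"
  },
  "eventTicket": {
    "primaryFields": [
      { "key": "event", "label": "EVENT", "value": "Sample Concert" }
    ]
  }
}
```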

&lt;p&gt;One challenge here was data persistence and cleanup. Since the machine could end up storing a large number of tickets over time, I implemented an automated cleanup process that clears expired or validated tickets after a set period. This keeps memory usage low and ensures the system doesn’t slow down over time. With Firestore, it’s fairly straightforward to set a retention policy, so I can keep data management automatic and avoid any need for manual intervention.&lt;/p&gt;
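&lt;p&gt;The cleanup pass itself is conceptually simple. Here’s a sketch of the idea against a local in-memory cache, under my own naming; the real system delegates the equivalent work to Firestore’s retention handling:&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the automated cleanup pass: the local ticket cache drops
// entries that are already validated or older than the retention window.
class TicketCache {
    static final class Entry {
        final String status; // "ISSUED" or "USED"
        final long issuedAtMs;

        Entry(String status, long issuedAtMs) {
            this.status = status;
            this.issuedAtMs = issuedAtMs;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String id, String status, long issuedAtMs) {
        entries.put(id, new Entry(status, issuedAtMs));
    }

    int size() {
        return entries.size();
    }

    // Returns the number of entries removed by this sweep.
    int sweep(long nowMs, long retentionMs) {
        int before = entries.size();
        entries.values().removeIf(e ->
                "USED".equals(e.status) || nowMs - e.issuedAtMs > retentionMs);
        return before - entries.size();
    }
}
```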

&lt;h3 id=&quot;implementing-the-user-interface-and-device-setup&quot;&gt;Implementing the User Interface and Device Setup&lt;/h3&gt;

&lt;p&gt;The main interface runs on an Android device, while a second device handles the validation side. I used JavaFX to build the ticket validation UI, while the Android SDK powers the main sales interface. The idea was to create a seamless user experience where buyers could quickly interact with the machine, and validation staff could easily check tickets without needing direct access to the main UI.&lt;/p&gt;

&lt;p&gt;For the UI, I wanted a balance between simplicity and reliability. Using MVVM (Model-View-ViewModel) helped separate the presentation logic from the core business logic. This pattern lets each activity have its own ViewModel, making it easy to manage state changes without causing unexpected behavior. For instance, the sales UI could show a different screen during idle times, displaying ads from the Firebase ads collection, while the ticketing functions remain ready to jump back into action when a new user approaches.&lt;/p&gt;
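&lt;p&gt;The idle-to-ads behavior gives a feel for how the ViewModel carries screen state while the activity only renders it. This is a plain-Java sketch with names of my own invention, not the project’s actual Android classes:&lt;/p&gt;

```java
// MVVM sketch: the ViewModel owns the screen state; the view layer only
// reads it and renders. Timeouts use epoch milliseconds for simplicity.
class SalesViewModel {
    enum Screen { SALES, ADVERTISEMENT }

    private Screen screen = Screen.SALES;
    private long lastInteractionMs = 0;

    // Any user touch brings the sales UI back and resets the idle clock.
    void onUserInteraction(long nowMs) {
        lastInteractionMs = nowMs;
        screen = Screen.SALES;
    }

    // Polled by the view layer; after the idle timeout, ads take over.
    Screen screenAt(long nowMs, long idleTimeoutMs) {
        if (nowMs - lastInteractionMs > idleTimeoutMs) {
            screen = Screen.ADVERTISEMENT;
        }
        return screen;
    }
}
```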

&lt;p&gt;The validation device runs a JavaFX-based program that continuously listens for QR codes using a camera module. Once a QR code is detected, it triggers the Firebase API to check ticket validity. The check completes almost immediately, and a simple color-coded response (green for valid, red for invalid) provides feedback without delay. The idea was to minimize user wait time and make sure the entire flow felt frictionless.&lt;/p&gt;

&lt;h3 id=&quot;real-world-testing-and-performance&quot;&gt;Real-World Testing and Performance&lt;/h3&gt;

&lt;p&gt;In testing, one of my priorities was ensuring that everything would work without server latency impacting the user experience. I performed several end-to-end tests using different devices, such as a Google Pixel XL, a Xiaomi Mi A2, and a Raspberry Pi 3 running Android Things. These tests were especially useful for checking the system’s stability in low-connectivity situations and evaluating how it handled high traffic.&lt;/p&gt;

&lt;p&gt;The Raspberry Pi setup demonstrated that the machine could run autonomously with minimal intervention. The coin-insertion emulator I used for testing was another interesting feature. By simulating coin insertion, I was able to test the entire payment and ticketing flow without a physical payment module. It worked well for prototyping, and if the machine ever gets deployed with real payment functionality, this emulator could easily be swapped out.&lt;/p&gt;

&lt;p&gt;All tests focused on maintaining performance during idle and active states, ensuring there were no memory leaks or slowdowns. Firebase’s caching proved reliable, and having offline persistence in Firestore meant that even if connectivity dropped momentarily, the machine could still function as expected.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Building an IoT vending machine without a backend server was a fun and challenging project that pushed the limits of what’s possible with edge computing. By leveraging Firebase for data management and Android for the UI, I was able to create a self-sustaining system that could handle ticket sales and validation without needing a central server. The hexagonal architecture made it easy to adapt the design for different hardware setups, and the modular approach to ticketing and validation interfaces kept the system flexible and scalable.&lt;/p&gt;

&lt;p&gt;The project is hosted on GitHub, and you can check it out &lt;a href=&quot;https://github.com/Neirth/ExpoSeller&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;credits&quot;&gt;Credits&lt;/h3&gt;

&lt;p&gt;The header image of this post is made using &lt;a href=&quot;https://www.midjourney.com/&quot;&gt;Midjourney AI&lt;/a&gt;.&lt;/p&gt;
</content>
    </entry>
    
  
</feed>