Meet the NORCICS PhDs and PostDocs – Touseef Sadiq

Multimodal Machine Learning: Learning multimodal intermediate video and language representations in deep networks for descriptive object identification and tracking in urban environments.

Meet Touseef Sadiq, a PhD researcher at the Centre for Artificial Intelligence Research (CAIR) at the University of Agder, Norway. Touseef's current research explores deep multimodal learning for descriptive object identification and tracking in urban environments. His work falls under NORCICS Task 3.4, "Humanized Deep Learning & Big Data Analytics".

Human learning encompasses various modalities; we read, watch, and listen, processing diverse sensory inputs. Computers, too, can learn from multiple data types, termed multimodal data, to address intricate challenges. In smart city contexts, the integration of visual data and textual data is essential for unleashing the complete potential of Multimodal Machine Learning (MML). MML's mission is to bridge data barriers, allowing visual information and human language to coexist harmoniously.

Today's smart cities are brimming with diverse data sources, including surveillance video. Our research delves into integrating features from these sources to enhance real-world applications, with a specific focus on combining video and text data to advance intelligent transportation systems. We work with the CityFlow-NL dataset, which pairs tracked vehicles with natural language descriptions and serves as a benchmark for evaluating natural language-based tracked-vehicle retrieval systems in intelligent traffic contexts.
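To make the retrieval task concrete, the sketch below ranks candidate vehicle tracks against a single language query by cosine similarity in a shared embedding space. It is a minimal illustration, not the actual CityFlow-NL evaluation protocol; the embeddings are random placeholders standing in for real encoder outputs.

```python
# Minimal sketch of natural language-based tracked-vehicle retrieval: given an
# embedding of a text query and embeddings of candidate vehicle tracks (both assumed
# to live in a shared space), rank the tracks by cosine similarity.
import torch
import torch.nn.functional as F

query_embedding = F.normalize(torch.randn(1, 256), dim=-1)     # one text description
track_embeddings = F.normalize(torch.randn(100, 256), dim=-1)  # 100 candidate tracks

scores = (query_embedding @ track_embeddings.T).squeeze(0)     # cosine similarities
ranked_track_ids = scores.argsort(descending=True)             # best match first
print(ranked_track_ids[:5])
```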

In Multimodal Machine Learning (MML), deep learning models are the engines that drive the fusion of different data types. MML relies on neural networks, loosely inspired by the human brain, to process inputs spanning visual processing and text understanding. To extract meaningful features from videos, Convolutional Neural Networks (CNNs) such as ReXNET50 and EfficientNetB0 are employed, capturing visual details through convolutional layers. These architectures autonomously learn hierarchical representations from raw pixels and excel at tasks like object detection. On the textual side, Bidirectional Encoder Representations from Transformers (BERT) and its variants, such as RoBERTa base and RoBERTa large, are used to encode text. BERT's ability to capture complex interactions between words through pre-training and fine-tuning makes it a powerful tool for MML.
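As a rough illustration of this two-encoder setup, the sketch below extracts pooled frame features with a torchvision EfficientNet-B0 backbone and a sentence embedding with a pre-trained BERT model from Hugging Face. The specific checkpoints, the temporal average pooling, and the feature dimensions are assumptions for illustration, not the project's actual configuration.

```python
# Minimal sketch (not the project's pipeline): video features from EfficientNet-B0,
# text features from BERT.
import torch
import torchvision.models as models
from transformers import BertTokenizer, BertModel

# Visual encoder: EfficientNet-B0 with the classification head removed.
cnn = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
cnn.classifier = torch.nn.Identity()   # keep the 1280-d pooled feature
cnn.eval()

# Text encoder: pre-trained BERT; the [CLS] token embedding summarises the sentence.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

with torch.no_grad():
    # A batch of 8 frames, e.g. sampled from one tracked vehicle (random placeholders).
    frames = torch.randn(8, 3, 224, 224)
    frame_features = cnn(frames)                   # (8, 1280)
    video_feature = frame_features.mean(dim=0)     # simple temporal average pooling

    # A natural-language description of the target vehicle (illustrative).
    text = "A blue SUV turns left at the intersection."
    tokens = tokenizer(text, return_tensors="pt")
    text_feature = bert(**tokens).last_hidden_state[:, 0, :]  # (1, 768) [CLS] vector

print(video_feature.shape, text_feature.shape)
```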

The alignment challenge revolves around mapping features from diverse modalities into a common representation space so they can be learned jointly. This step ensures that data from different sources and formats can be integrated and processed together. The focus is on identifying common features and framing them in a shared embedding space, which is where Multimodal Machine Learning excels. Two approaches are explored for bridging the gap between visual and textual input features: similarity learning, which uses Siamese Neural Networks for feature alignment, and contrastive approaches, which employ objectives such as InfoNCE loss and circle loss to measure similarity between multimodal features and strengthen the alignment model. This highlights the pivotal role of feature extraction from the vision and language modalities and of model selection, including CNNs for visual encoding and transformer-based models like BERT for text processing. A minimal sketch of the contrastive idea follows below.
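The following sketch shows the contrastive alignment idea with a symmetric InfoNCE objective: modality-specific features are projected into a shared space, and matching video-text pairs are pulled together while mismatched pairs are pushed apart. The projection dimensions and temperature are illustrative assumptions, not the project's actual settings.

```python
# Minimal sketch of contrastive video-text alignment with a symmetric InfoNCE loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceProjector(nn.Module):
    """Maps modality-specific features into a common embedding space."""
    def __init__(self, video_dim=1280, text_dim=768, embed_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feats, text_feats):
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return v, t

def info_nce_loss(v, t, temperature=0.07):
    """Symmetric InfoNCE: each video matches its paired description and vice versa."""
    logits = v @ t.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(v.size(0))       # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage with random features standing in for CNN / BERT outputs.
projector = SharedSpaceProjector()
video_feats, text_feats = torch.randn(16, 1280), torch.randn(16, 768)
v, t = projector(video_feats, text_feats)
loss = info_nce_loss(v, t)
print(loss.item())
```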

In pursuit of feature alignment within a common latent space, we have explored similarity-based and contrastive methods to narrow the "modality gap" between features. Despite these efforts, performance losses in downstream tasks persist and are often attributed to the next module in the pipeline. Within the modality-specific latent space, we have assessed how modality-specific feature representations influence downstream performance, yet an "information gap" remains even with these advanced models. To address this challenge, we consider regularization techniques such as deep feature loss and inter- and intra-modality losses. Moreover, progress is hindered by the pervasive problem of data scarcity in machine learning, prompting our exploration of data synthesis methods including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and data augmentation.
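As one hedged illustration of combining such regularizers, the sketch below adds a simple intra-modality decorrelation term to an inter-modality alignment term. The specific loss forms and weighting are assumptions chosen for clarity, not the losses used in the project.

```python
# Illustrative combination of inter- and intra-modality regularization terms.
import torch
import torch.nn.functional as F

def intra_modality_loss(feats):
    """Discourage feature collapse within one modality by pushing normalised
    off-diagonal pairwise similarities towards zero."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.T
    off_diag = sim - torch.diag(torch.diag(sim))
    return (off_diag ** 2).mean()

def inter_modality_loss(v, t):
    """Pull paired video and text embeddings together (simple alignment term)."""
    return (1 - F.cosine_similarity(v, t, dim=-1)).mean()

v = torch.randn(16, 256, requires_grad=True)   # video embeddings (placeholder)
t = torch.randn(16, 256, requires_grad=True)   # text embeddings (placeholder)

# Weighting of 0.1 is an arbitrary illustrative choice.
total_loss = inter_modality_loss(v, t) + 0.1 * (intra_modality_loss(v) + intra_modality_loss(t))
total_loss.backward()
print(total_loss.item())
```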
