Abstract
The rise of social media has led to an increase in hate speech. Hate speech is generally described as a deliberate act of aggression aimed at a particular group, intended to harm or marginalize its members based on specific attributes of their identity. While positive interactions in diverse communities can greatly enhance confidence, negative remarks such as hate speech can weaken community unity and significantly harm people’s well-being. This highlights the need for improved monitoring and guidelines on social media platforms to protect individuals from discriminatory and harmful actions. Despite extensive research on resource-rich languages, such as English and German, the detection and analysis of hate speech in less-resourced languages, such as Norwegian, remain underexplored. Addressing this gap, our study leverages a metalinguistic approach that uses advanced meta-learning techniques to enhance detection capabilities across bilingual texts, directly linking technical advancements to the pressing social issue of hate speech. In this study, we introduce techniques that adapt models to hate speech detection within the same language (intra-lingual), across different languages (cross-lingual), and to new languages with minimal extra training, independent of the model type (cross-lingual model-agnostic meta-learning-based approaches), for bilingual text analysis in Norwegian and English. Our methodology incorporates attention mechanisms (components that help the model focus on relevant parts of the text) and adaptive learning rate schedulers (tools that adjust the learning speed based on performance). We conducted various experiments using language-specific and multilingual transformers. Among these, the combination of Nor-BERT and LSTM with zero-shot and few-shot model-agnostic meta-learning achieved remarkable F1 scores of 79% and 90%, respectively, highlighting the effectiveness of our proposed framework.
Introduction
In the era of digital transformation, social computing has revolutionized the way we communicate, fundamentally altering interpersonal interactions. This shift is most apparent in the pervasive adoption of social media platforms and online forums [52, 57]. These digital arenas have not only expanded the scope and speed of communication but have also introduced new dynamics in the way individuals connect, share information, and collaborate. The continuous advancement of digital technologies further amplifies these interactions, shaping a new environment of digital communication that is more immediate, inclusive, and interactive than ever before [20, 25]. However, this transformation also presents significant challenges, especially in managing Hate Speech (HS), which varies widely across regions and cultures, making its identification and regulation complex [21, 24].
The definition of HS often encompasses a range of negative behaviors including cyberbullying, flaming, profanity, abusive language, expressions of toxicity, and acts of discrimination [8, 31, 32]. Each of these forms can lead to highly controversial discussions and escalate tensions, potentially resulting in serious social consequences such as violent crimes or physical attacks [17, 41].
The rapid growth of social media usage highlights the critical demand for sophisticated HS detection systems that can effectively filter and eliminate harmful content. This escalating requirement underscores the significance of innovative strategies for managing online discourse. Advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) have further highlighted the importance of techniques for identifying HS [26, 28]. Because such content proliferates rapidly and continually evolves, prompt and precise methods for identifying and addressing it demand urgent and careful attention [45]. Many studies have primarily focused on well-resourced languages such as English [19]. This emphasis on resource-rich languages has resulted in a significant gap in HS research for languages with fewer resources, like Norwegian, Finnish, Irish, and Portuguese [43]. This study is dedicated to exploring HS detection within and across Norwegian and English using intra-lingual, cross-lingual, and meta-learning-based methods, particularly in low-resource scenarios. The focus is on effectively leveraging smaller datasets to improve detection accuracy, which is crucial for languages that lack extensive annotated resources. This approach aims to enhance the adaptability and effectiveness of HS detection systems in both languages, addressing the challenges of linguistic diversity and data scarcity.
This study addresses critical challenges in HS detection within digital communication platforms, particularly focusing on the following research questions:
-
RQ1: How effective are intra-lingual and cross-lingual learning strategies in detecting HS across bilingual texts?
-
RQ2: Can a meta-linguistic framework leveraging meta-learning techniques rapidly adapt to new languages and dialects for HS detection?
-
RQ3: Can cross-lingual meta-learning techniques enhance the performance of HS detection in Norwegian, a language characterized by both limited data availability and fewer linguistic resources?
Work contributions
The contributions of this work are as follows:
-
1.
We collected and annotated a diverse dataset from Twitter, encompassing a broad range of HS expressions in both English and Norwegian. For annotation, we employed the Llama 3 Large Language Model (LLM), optimized to accurately classify text into specific categories.
-
2.
By incorporating both intra-lingual and cross-lingual learning strategies, our methodology effectively handles the nuances of language-specific and bilingual contexts. This dual approach allows for more accurate detection of HS by understanding contextual nuances in both Norwegian and English.
-
3.
We implemented a meta-linguistic framework that leverages meta-learning techniques to understand and detect HS across bilingual texts. This approach enables the model to rapidly adapt to new languages and dialects, a crucial advancement for addressing the scarcity of resources in underrepresented languages.
-
4.
The application of zero-shot and few-shot learning techniques in our model highlights its ability to perform with limited training data. This is particularly beneficial for low-resource languages with limited annotated datasets.
Structure of the paper
The remaining part of the paper is structured as follows: Sect. “Related works” discusses the existing research work on HS. Section “Work methodology” explains the proposed work methodology. Section “Results and discussion” focuses on the results and discussions. Section “Work limitation” highlights the work limitations. Section “Conclusion and future work” presents the conclusion and future work.
Related works
This section provides an overview of the current research landscape and academic contributions in AI, specifically focusing on HS detection using Machine Learning (ML), Deep Learning (DL), transformers, cross-lingual learning, and meta-learning across different languages.
Numerous studies, as summarized in Table 1, have extensively explored the complexities of bilingual or multilingual datasets in the context of HS detection. This body of work highlights the advancements in methodology and the increasing sophistication of models that can navigate the nuances of multiple languages simultaneously. Mazari et al. [37] utilize a BERT-based ensemble learning approach for multi-aspect HS detection on social media. By integrating pre-trained Bidirectional Encoder Representations from Transformers (BERT) [12] with DL-based models like Bi-directional Long Short Term Memory (BiLSTM) [58] and Bi-directional Gated Recurrent Unit (BiGRU) [13], the approach effectively classifies various forms of HS. This novel combination achieves a significant improvement in detection performance, evidenced by a high ROC_AUC score of 98.63%, demonstrating its potential as a robust tool for identifying and mitigating HS online. Maity et al. [35] created a benchmark dataset for HS identification in Thai, enriched with sentiment annotations to strengthen detection algorithms. Their study uses a multitask learning framework that merges Sentiment Analysis (SA) and HS identification to improve performance, with a dual-channel DL-based model that combines FastText [23] and BERT embeddings. The results show a significant improvement in both HS and sentiment detection tasks, implying that SA greatly helps in the identification of HS, which is critical for the integrity of digital communication platforms. Saeed et al. [50] focused on the Kurdish language, highlighting a significant gap in HS detection for under-researched languages. Their work, ‘Hate Speech Detection in Social Media for the Kurdish Language,’ presents a dataset derived from Facebook comments, categorized into ‘hate’ and ‘not hate.’ Using algorithms such as Support Vector Machines (SVM), Decision Trees (DT), and Naïve Bayes (NB), the study demonstrates SVM’s superior performance with an F1-score of 0.687.
Hashmi et al. [18] explored multi-class HS detection in Norwegian by merging DL-based models with transformers. The authors demonstrated enhanced detection accuracy using both language-specific and multilingual transformers, fine-tuned with advanced generative configurations. The introduction of a new FAST-RNN model and the application of FastText embeddings contributed significantly to the detection accuracy. The use of Local Interpretable Model-Agnostic Explanations (LIME) [23] to explain the classifier’s decisions further improves the transparency of the predictions made in HS detection. Awal et al. [6] presented HateMAML, a model-agnostic meta-learning framework targeting HS detection in languages with limited resources. Utilizing self-supervision, the framework effectively addresses the challenge of limited data, preparing Language Models (LMs) for quick adaptation to new linguistic and contextual domains. In tests across five datasets and eight under-resourced languages, HateMAML achieved performance enhancements exceeding 3% over established baselines in scenarios involving multiple languages and varying domains. Lu et al. [34] developed a method to detect HS in English using a Dual Contrastive Learning (DCL) [27] framework across two benchmark datasets. This approach successfully addresses the difficulties of complex interpretations and uneven data distribution in HS identification. The system uses both self-supervised and supervised learning techniques to improve its ability to detect subtle semantic clues in abusive language. Furthermore, the use of focal loss improves performance by resolving the issues presented by imbalanced datasets. Their methods achieved accuracy scores of 0.68 and 0.96 on the two datasets, respectively.
Mozafari et al. [40] explored cross-lingual few-shot HS and offensive language detection using a meta-learning approach, particularly useful in low-resource languages. They leveraged both optimization-based and metric-based meta-learning models (MAML and Proto-MAML) across multiple languages. Testing on various datasets, Proto-MAML emerged as the most effective model, showcasing significant adaptability and achieving high-performance metrics, notably a peak accuracy of over 95% on multiple datasets. This study highlights the feasibility and effectiveness of meta-learning in managing the scarce data challenge in multilingual scenarios. Prasad et al. [48] developed a real-time, multi-lingual model for detecting hate and offensive speech on social networks using a meta-learning approach, focusing on English, Hindi, Hinglish, Bengali, and Marathi. Employing meta-learning allowed for rapid adaptation to new languages, which is particularly beneficial for low-resource languages with limited data. Extensive testing showed that meta-learning models generally outperform traditional ML-based models, especially when data is scarce. The model achieved its highest accuracy of 0.88 for the Bengali language. Hashmi et al. [19] presented a study on detecting misogynistic content in bilingual (English and Italian) online communications using FastText word embeddings and explainable AI (XAI) techniques. The proposed model combines these embeddings with language-specific transformer-based models, optimized through hyperparameter tuning and regularization, to enhance interpretability and detection accuracy. The study integrated traditional ML-based models such as Decision Trees (DT), Logistic Regression (LR), and Random Forests (RF), along with DL-based models like GRU, LSTM, and CNN-LSTM configurations, into their methodology. They further leveraged language-specific and general-purpose transformer-based models from Hugging Face,Footnote 1 including DeHateBert [4], ItalianBERT,Footnote 2 SetFit [11], BERT, and Robustly Optimized BERT (RoBERTa) [33], to refine their approach. By utilizing LIME for explainability, the study also offers insights into the decision-making process of the model. The approach achieved robust performance, with the CNN-LSTM configuration reaching an impressive F1-score of up to 0.95.
Njoku et al. [42] introduced ’MetaHate’, a DL-based HS detection system designed for metaverse environments, utilizing GloVe (Global Vectors for Word Representation) [47] word embeddings with a Multilayer Perceptron (MLP) [38] and a lightweight CNN model deployed on the Roblox server. The system employs XAI with LIME for transparency, enhancing user trust through interpretable AI. Notably, model quantization reduced its size by 93.59%, ensuring efficient real-time performance. This study advances HS detection within the metaverse, promoting safer virtual interactions. The MLP model reached an F1 score of up to 0.85, which indicates its robust predictive capabilities in classifying HS within the given dataset. After reviewing the existing literature, it is evident that while many studies have explored HS detection using traditional ML and DL-based approaches, only a few have addressed the unique challenges of low-resource bilingual contexts. In this study, we extend these methods by incorporating a hybrid Transformer-LSTM architecture and a Model-Agnostic Meta-Learning (MAML) framework with adaptive learning rate schedulers, tailored specifically for Norwegian and English. This novel approach allows for a nuanced understanding of cross-lingual and few-shot learning capabilities in HS detection.
Work methodology
The proposed research methodology of this study involves a systematic approach to achieving promising results as shown in Fig. 1. Each of the steps from our research methodology is further elaborated in detail below.
Data collection
Data collection is crucial for creating algorithms that can effectively identify HS online. Gathering and thoroughly analyzing relevant data is essential for optimizing these computational models. In our study, we focused on HS detection within bilingual contexts, collecting data from Twitter in both English and Norwegian over one month. Norwegian, being a low-resource language in this domain, prompted a systematic collection aimed at addressing the lack of data. The primary goal was to develop effective methods for scenarios where linguistic data is limited.
Our targeted collection strategy resulted in a corpus of 1100 textual entries, with a higher number of instances in Norwegian than in English. This imbalance reflects our aim to strengthen our models in Norwegian, a language that typically has less online content than more widely used global languages. This approach is essential for facilitating our cross-lingual learning experiments, allowing us to effectively explore and refine methods of detecting HS across different linguistic contexts. By analyzing a dataset in which Norwegian entries predominate, we aim to address the inherent challenges of sourcing HS-specific content in less-represented languages, thereby enhancing our ability to conduct meaningful cross-lingual analysis. Figure 2 and Table 2 show the language distribution and the count of instances in the dataset for both languages.
Data preprocessing
Data preprocessing is essential for enhancing classifier performance. This process involves removing redundant text and structuring data effectively, which significantly improves data quality and usability for both training and subsequent analysis [18]. Such meticulous data preparation greatly increases the efficiency of learning models [22]. In our research, we focused on two main columns: "text," which contained all user comments, and "label," which classified the comments into two distinct categories. The preprocessing steps for the "text" included converting all text to lowercase, eliminating HTML characters, discarding non-essential characters like ASCII symbols, removing stop words, and tokenizing the text.
Our preprocessing approach incorporated stemming of the Norwegian text to reduce words to their basic forms. This was done using the SnowballStemmerFootnote 3 from the NLTKFootnote 4 library, designed specifically for Norwegian. For English text, we utilized the PorterStemmer, also from the NLTK library, known for its efficiency in reducing English words to their root forms. We also applied Python’s RegExFootnote 5 library to remove elements such as numbers, punctuation, and specific patterns including email addresses, URLs, and phone numbers, which helped clean the dataset. After removing duplicate and null rows, the number of usable entries was reduced to 1043.
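A minimal sketch of this pipeline with NLTK and Python’s re module is given below; the exact regular expressions and tokenizer settings are our assumptions for illustration, as the paper does not list them.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

# Illustrative patterns for the elements removed with Python's re module.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")
HTML_RE = re.compile(r"<[^>]+>|&\w+;")
NON_LETTER_RE = re.compile(r"[^a-zA-ZæøåÆØÅ\s]")

STEMMERS = {"no": SnowballStemmer("norwegian"), "en": PorterStemmer()}
STOPWORDS = {"no": set(stopwords.words("norwegian")),
             "en": set(stopwords.words("english"))}

def preprocess(text, lang="no"):
    """Lowercase, strip noise, tokenize, drop stop words, and stem."""
    text = text.lower()
    for pattern in (HTML_RE, URL_RE, EMAIL_RE, PHONE_RE, NON_LETTER_RE):
        text = pattern.sub(" ", text)
    tokens = word_tokenize(text)
    return [STEMMERS[lang].stem(t) for t in tokens if t not in STOPWORDS[lang]]

print(preprocess("Besøk https://example.com nå!", lang="no"))
```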
Data annotation
Data annotation is important in the development of learning-based algorithms because it ensures that the training data is correctly labeled, allowing the models to learn and generate accurate predictions. In our data annotation process for identifying hateful or neutral text, we opted to utilize the Llama 3 model from Meta, specifically leveraging the 70B parameter version due to its enhanced annotation quality and refined reasoning capabilities [1].
Ollama serves as the platform to operate Llama 3Footnote 6 with this parameter scale locally and also enables the customization of models via a Modelfile [7]. The larger parameter count significantly improves the model’s ability to comprehend and process complex language structures, making it a superior choice for precision in natural language understanding tasks. We presented the model with the refined prompt, "Could you classify this text as either hateful or neutral?", to ensure clear and direct queries were made. The choice of Llama 3, particularly the 70B variant, was driven by its robust performance across a variety of benchmarks, showcasing its superior dialogue handling and safety features crucial for the sensitive nature of HS detection. This strategic selection aligns with our goal to achieve high accuracy and reliability in our cross-lingual HS identification efforts. After data collection and annotation, validation becomes a crucial step in ensuring the reliability of the model. The initial data annotation was performed using the Llama 3 language model, which provided preliminary labels for our dataset. The validation of these annotations was then conducted over a period spanning from May to mid-July 2024. This crucial phase was carried out with the assistance of native Norwegian speakers, specifically students from the Norwegian University of Science and Technology (NTNU). To assess the reliability of Llama 3’s annotations and to identify potential biases, a structured inter-annotator agreement protocol was implemented. For validation, different segments of the dataset, along with their corresponding labels, were provided to 5 students from NTNU from various disciplines, including linguistics and computer science. These students were deliberately chosen based on their native language proficiency and their diverse academic disciplines to ensure a comprehensive review of the data annotations.
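A minimal sketch of such an annotation call against a locally running Ollama server follows; the `llama3:70b` model tag, the endpoint, the prompt wrapping, and the answer-parsing rule are assumptions, since only the core classification prompt is given above.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
PROMPT = ('Could you classify this text as either hateful or neutral? '
          'Answer with one word. Text: "{text}"')

def annotate(text, model="llama3:70b"):
    """Ask a locally served Llama 3 model for a label and parse the reply."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": PROMPT.format(text=text),
              "stream": False},
        timeout=300)
    response.raise_for_status()
    answer = response.json()["response"].strip().lower()
    return "hateful" if "hateful" in answer else "neutral"

print(annotate("Ha en fin dag!"))  # expected: neutral
```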
Table 3 highlights the features and benefits of using the Llama 3 model, alongside Table 4 which displays some preprocessed instances from the bilingual dataset along with their annotated class labels.
Modeling approaches
After gathering and annotating the data, we utilized a variety of methods, including transformer-based models and MAML in conjunction with DL-based models such as LSTM. The inclusion of LSTM was particularly important due to its efficiency in speeding up computation, which was essential given our computational constraints, including limitations on processing power and memory capacity. These constraints necessitated the use of models that balance performance with computational efficiency to ensure timely processing within our available resources. By feeding the transformer’s encodings into the LSTM, we achieved more stable results. Our experimental framework also included both intra-lingual and cross-lingual learning using transformers, complemented by zero-shot and few-shot learning through MAML. We considered other DL models, such as GRU and BiLSTM, but opted not to include them in our study as their preliminary results were nearly identical to those achieved with our primary models, thereby aligning well with our research objectives.
Intra-lingual and cross-lingual learning with transformers
The Transformer is a model in NLP that performs sequence-to-sequence tasks using a self-attention mechanism, which effectively handles long-range dependencies. This architecture consists of two primary components: an encoder and a decoder [53].
Intra-lingual and cross-lingual learning involved the use of various transformers tailored to specific language needs, including language-specific models such as Nor-BERT, nb-BERT, and scandi-BERT, alongside multilingual models like mBERT, and general-purpose models such as BERT and XLNet. An in-depth analysis of Norwegian language-specific transformers is available in the literature [18, 29, 30]. Intra-lingual learning, where a model is trained and tested within the same language environment, was applied using the Norwegian language. This approach focuses on refining the model’s ability to understand and process nuances specific to Norwegian, enhancing both comprehension and prediction accuracy within this single-language framework. Mathematically, this can be represented by initially embedding, encoding, and then predicting outcomes from the target dataset \(\mathcal {D}_{\text {tgt}}\), and is encapsulated in the comprehensive loss calculation:
$$\begin{aligned} \mathcal {L}(\theta ) = - \sum _{(x_i, y_i) \in \mathcal {D}_{\text {tgt}}} \log \text {Classifier}(x_i; \theta )_{y_i} \end{aligned}$$(1)
In Eq. 1, \(\text {Classifier}(\cdot ; \theta )\) encapsulates the entire process of embedding, encoding, and applying the softmax function within a single parameterized function, where \(\theta \) represents all the trainable parameters, including those of the embedding layer, the encoder, and the output classifier (the weights \(W\) and bias \(b\)). \(\mathcal {D}_{\text {tgt}}\) is the input dataset in the target language used for both training and evaluation. The loss function directly computes the negative log-likelihood of the true class labels given the output probabilities from the classifier.
The training and evaluation cycle of the model, ensuring consistency in learning and testing environments, is summarized in Eq. 2:
$$\begin{aligned} \theta ^{*} = \arg \min _{\theta } \mathcal {L}\left( \theta ; \mathcal {D}_{\text {tgt}}^{\text {train}} \right) , \quad \hat{y} = \text {Classifier}\left( x; \theta ^{*} \right) , \; x \in \mathcal {D}_{\text {tgt}}^{\text {test}} \end{aligned}$$(2)
In our cross-lingual classification task, we deployed a transformer-based architecture that utilizes the encoder to generate deep contextual embeddings from input texts. This process begins by embedding the raw input text from both the source and target languages into a dense vector space, subsequently processed by the TransformerEncoder using self-attention mechanisms. The entire process from embedding to context encoding is represented in Eq. 3:
$$\begin{aligned} \left[ h_{\text {src}}; h_{\text {tgt}} \right] = \text {Encoder}\left( \left[ \text {Embed}(\mathcal {D}_{\text {src}}); \text {Embed}(\mathcal {D}_{\text {tgt}}) \right] ; \theta \right) \end{aligned}$$(3)
Here, \(\theta \) includes all the shared parameters within the Encoder, which applies to both the source and target embeddings. This notation implies that the embeddings for both datasets are stacked or concatenated before being fed into a single Encoder model, which processes the data using parameter-shared mechanisms such as self-attention and feed-forward networks within the transformer blocks. \( h_{\text {src}} \) and \( h_{\text {tgt}} \) are the contextual embeddings for the source and target language texts, respectively, produced by an Encoder that integrates the self-attention mechanism [53] described in Eq. 4:
$$\begin{aligned} \text {Attention}(Q, K, V) = \text {softmax}\left( \frac{Q K^{\top }}{\sqrt{d_k}} \right) V, \quad Q = XW_Q, \; K = XW_K, \; V = XW_V \end{aligned}$$(4)
where \(W_Q\), \(W_K\), and \(W_V\) are trainable weight matrices for queries, keys, and values, respectively, and \(d_k\) is the dimensionality of the keys, optimizing the attention mechanism’s focus across different segments of the text. The classification output is generated by passing the encoded embeddings through a dense layer with a softmax function to predict class probabilities, combined into the single Eq. 5 for clarity:
$$\begin{aligned} \hat{y} = \text {softmax}\left( W h + b \right) , \quad h \in \{ h_{\text {src}}, h_{\text {tgt}} \} \end{aligned}$$(5)
where \(W\) and \(b\) are the weights and biases of the output layer, trained to minimize the cross-entropy loss function across both languages, described in Eq. 6 as follows:
$$\begin{aligned} \mathcal {L} = - \frac{1}{N} \sum _{i=1}^{N} \left[ y_{i,\text {src}} \log p_{i,\text {src}} + y_{i,\text {tgt}} \log p_{i,\text {tgt}} \right] \end{aligned}$$(6)
where \(p_{i,\text {src}}\) and \(p_{i,\text {tgt}}\) are the predicted probabilities for the source and target classes, \(y_{i,\text {src}}\) and \(y_{i,\text {tgt}}\) are the true labels, and \(N\) is the total number of examples in both datasets.
This framework allows our model to effectively handle and classify texts across different languages by leveraging the robust encoding capabilities of the transformer. Training the model on the source language dataset \(\mathcal {D}_{\text {src}}\) and testing it on the target English dataset \(\mathcal {D}_{\text {tgt}}\) evaluates its ability to transfer learned knowledge and generalize across linguistic boundaries. The process emphasizes the model’s capability to manage complexities associated with multilingual understanding and categorization, ensuring accurate classification across diverse linguistic contexts. The configuration details and hyperparameters of transformer-based models are summarized in Table 5.
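A minimal PyTorch/Hugging Face sketch of this shared-encoder classifier (Eqs. 3–6) is given below; the mBERT checkpoint name and the use of the [CLS] embedding as the pooled representation are assumptions for illustration, not the exact experimental configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class CrossLingualClassifier(nn.Module):
    """Shared transformer encoder with a dense softmax head (Eqs. 3-5)."""

    def __init__(self, model_name="bert-base-multilingual-cased", n_classes=2):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.encoder = AutoModel.from_pretrained(model_name)   # shared theta
        self.out = nn.Linear(self.encoder.config.hidden_size, n_classes)  # W, b

    def forward(self, texts):
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               return_tensors="pt")
        h = self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] embedding
        return self.out(h)  # logits; softmax is folded into the loss below

model = CrossLingualClassifier()
loss_fn = nn.CrossEntropyLoss()                       # cross-entropy (Eq. 6)
logits = model(["Jeg hater deg", "Have a nice day"])  # Norwegian + English
loss = loss_fn(logits, torch.tensor([1, 0]))          # 1 = hateful, 0 = neutral
loss.backward()
```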
Model agnostic meta-learning for bilingual hate speech detection
Meta-learning, often described as "learning to learn," allows models to quickly adapt to new tasks with minimal data. It is invaluable in scenarios where task-specific data is scarce, such as with less-resourced languages.
Bilingual Model-Agnostic Meta-Learning (MAML)
Model-Agnostic Meta-Learning (MAML) is a versatile framework designed to rapidly train models to adapt to new tasks with only a few training examples [51, 54]. Unlike other meta-learning algorithms, MAML optimizes a model’s initial parameters in such a way that only minimal adjustments are needed for effective adaptation to a variety of new tasks [56]. This ability makes MAML especially powerful in scenarios where data is scarce or tasks vary significantly, allowing for quick generalization without the need for task-specific tuning. Its model-agnostic nature means it can be applied across different types of neural networks, enhancing its utility in diverse applications from natural language processing to computer vision.
In our bilingual HS task, Eq. 7 highlights the meta-optimization process, where \( \theta \) represents the model parameters, \( \alpha \) the inner-loop learning rate, \( \beta \) the meta-level learning rate, \( T_i \) the training tasks sampled according to their distribution \( p(T) \), and \( \mathcal {L} \) the task-specific loss function:
$$\begin{aligned} \theta \leftarrow \theta - \beta \nabla _{\theta } \sum _{T_i \sim p(T)} \mathcal {L}_{T_i}\left( f_{\theta _i'} \right) , \quad \theta _i' = \theta - \alpha \nabla _{\theta } \mathcal {L}_{T_i}\left( f_{\theta } \right) \end{aligned}$$(7)
MAML is utilized to optimize the model’s initial parameters using 15 different tasks from the Norwegian language dataset, each containing 100 examples. This setup allows the model to adapt efficiently to new tasks with a minimal number of gradient updates. The adaptation process for training in Norwegian and testing in English using zero-shot learning is described in Eq. 8:
$$\begin{aligned} \theta ' = \theta - \alpha \nabla _{\theta } \mathcal {L}_{T_i}\left( f_{\theta } \right) , \; T_i \sim p(T_{\text {norwegian}}), \quad \hat{y} = f_{\theta '}(x), \; x \in \mathcal {D}_{\text {english}} \end{aligned}$$(8)
MAML facilitates rapid adaptation and cross-lingual efficiency, enabling smooth transfer from Norwegian to English, thus accommodating varied linguistic contexts effectively.
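A minimal first-order sketch of this meta-optimization in PyTorch is shown below; the single inner gradient step, the SGD inner optimizer, the learning rates, and the ReduceLROnPlateau scheduler (standing in for the adaptive learning rate scheduler mentioned earlier) are illustrative assumptions rather than the exact experimental settings.

```python
import copy

import torch

def maml_outer_step(model, tasks, loss_fn, meta_opt, scheduler, alpha=1e-3):
    """One meta-update over a batch of tasks (first-order variant of Eq. 7)."""
    meta_opt.zero_grad()
    query_losses = []
    for (x_s, y_s), (x_q, y_q) in tasks:  # support/query tensors per task
        adapted = copy.deepcopy(model)    # task-specific copy of theta
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
        loss_fn(adapted(x_s), y_s).backward()
        inner_opt.step()                  # inner update: theta_i'
        inner_opt.zero_grad()
        q_loss = loss_fn(adapted(x_q), y_q)
        q_loss.backward()                 # first-order meta-gradient
        for p, ap in zip(model.parameters(), adapted.parameters()):
            if ap.grad is not None:       # accumulate task gradients on theta
                p.grad = ap.grad.clone() if p.grad is None else p.grad + ap.grad
        query_losses.append(q_loss.item())
    meta_opt.step()                       # outer update with learning rate beta
    mean_q = sum(query_losses) / len(query_losses)
    scheduler.step(mean_q)                # adaptive LR driven by query loss
    return mean_q

# Possible setup: meta_opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(meta_opt, patience=2)
```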
Hybrid transformer-LSTM architecture in MAML
To effectively combine deep contextual embeddings and sequence processing, we integrate a Transformer-LSTM architecture within our MAML framework. This architecture allows the model to capture complex dependencies and sequential information efficiently, enhancing its adaptability to new language tasks. We explored several recurrent neural network architectures, including GRU and BiLSTM, but selected LSTM as our base model due to its effective balance between computational efficiency and its capability to manage long-range dependencies. This decision was informed by comparative performance assessments, in which LSTM produced better results than the other RNN variants.
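A minimal PyTorch sketch of this hybrid is given below; the `ltg/norbert2` checkpoint name (standing in for the Nor-BERT component), the hidden size, and the use of the final LSTM hidden state as the sequence summary are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class TransformerLSTM(nn.Module):
    """Transformer token encodings fed into an LSTM, then a dense head."""

    def __init__(self, model_name="ltg/norbert2", hidden=128, n_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(tokens)   # final hidden state as summary
        return self.out(h_n[-1])
```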
Adaptation of the hybrid architecture in MAML
The meta-training of our hybrid Transformer-LSTM model leverages a series of tasks derived from the Norwegian dataset, strategically divided into support and query sets. This division is essential in MAML, simulating the model’s ability to quickly adapt to unseen tasks. The learning process for the base model involved 15 distinct tasks, with each iteration incorporating 100 diverse data points. The adaptation process is encapsulated by the following simplified Eq. 9:
$$\begin{aligned} \theta \leftarrow \theta - \beta \nabla _{\theta } \sum _{i} \mathcal {L}_{\text {query}}\left( f_{\theta _i'} \right) , \quad \theta _i' = \theta - \alpha \nabla _{\theta } \mathcal {L}_{\text {support}}\left( f_{\theta }; T_{i, \text {support}} \right) \end{aligned}$$(9)
where \( \theta \) denotes the initial parameters, \( \theta _i' \) the parameters updated after training on the support set \( T_{i, \text {support}} \), \( \mathcal {L}_{\text {support}} \) and \( \mathcal {L}_{\text {query}} \) the losses evaluated on the support and query sets, and \( \alpha \) and \( \beta \) the inner-loop and meta-optimization learning rates, respectively.
Practical application: zero-shot and few-shot learning
Our MAML-trained model employs both zero-shot and few-shot learning strategies for testing and adaptation to English, a language not included in the initial training data. The model utilizes parameters \( \theta ' \), meta-learned from the Norwegian dataset, to generalize to English without prior exposure (zero-shot learning) and is fine-tuned with a limited number of English examples (few-shot learning).
-
Zero-Shot Learning: This strategy applies the meta-learned parameters directly to English data, testing the model’s ability to transfer learning across linguistic boundaries, as expressed in Eq. 10:
$$\begin{aligned} \theta _{\text {zero-shot}} = \text {Transfer}\left( f_{\theta '}, \mathcal {D}_{\text {english}}\right) \end{aligned}$$(10)
-
Few-Shot Learning: In this approach, \( \theta ' \) is further adjusted using a small, representative set of English examples, enhancing the model’s accuracy and adaptability, as highlighted in Eq. 11:
$$\begin{aligned} \theta _{\text {few-shot}} = \theta ' - \beta \nabla _{\theta '} \sum _{T_j \in \mathcal {T}_{\text {english}}} \mathcal {L}_{T_j}\left( f_{\theta _j''}\right) \end{aligned}$$(11)
In these equations, \( f_{\theta '} \) abstracts the combined embedding and sequence modeling processes, and \( \mathcal {L}_{T_j} \) is the loss calculated on the query sets to evaluate rapid adaptation. This method ensures the model not only adapts to English data efficiently but also maintains robust performance across different language contexts, demonstrating the effectiveness of the MAML framework in managing diverse and challenging linguistic tasks.
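The two routines below sketch how the meta-learned parameters \( \theta ' \) might be applied in both regimes; the step count, learning rate, and the shape of the few-shot loader (e.g., ten labeled examples per class) are assumptions based on the experimental description, not the exact settings.

```python
import torch

def zero_shot_eval(meta_model, english_loader):
    """Apply meta-learned parameters theta' to English, no updates (Eq. 10)."""
    meta_model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in english_loader:
            preds = meta_model(inputs).argmax(dim=-1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

def few_shot_adapt(meta_model, english_shots, loss_fn, beta=1e-4, steps=5):
    """Fine-tune theta' on a handful of English examples (Eq. 11)."""
    opt = torch.optim.SGD(meta_model.parameters(), lr=beta)
    meta_model.train()
    for _ in range(steps):
        for inputs, labels in english_shots:  # e.g. ten per class
            opt.zero_grad()
            loss_fn(meta_model(inputs), labels).backward()
            opt.step()
    return meta_model
```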
Results and discussion
In our evaluation, we utilized standard metrics such as accuracy, precision, recall, and F1-score to quantitatively assess the performance of our models.
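A minimal sketch of the metric computation with scikit-learn follows; the weighted averaging scheme is our assumption, as the text does not state which average is reported.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    """Accuracy plus weighted-average precision, recall, and F1."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}

print(report([1, 0, 1, 1], [1, 0, 0, 1]))
```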
Computational efficiency
The computational resources used in this study are as follows:
-
Hardware and Optimization: All the experiments in this study were conducted on a MacBook M3 Max with 128GB of unified memory.
-
Transformer-Based Models: Due to their architectural complexity, transformer-based models necessitated about 20 min per epoch for training. Despite the longer duration, the significant improvements in detection capabilities justify the computational investment.
-
Model Optimization: In addition to post-training quantization (sketched after this list), we implemented several optimization strategies to further enhance model efficiency. These strategies included layer pruning, adjusting dropout rates, and applying batch normalization, which together helped minimize overfitting and speed up the training process.
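As an illustration of post-training quantization, the PyTorch snippet below dynamically quantizes the linear layers to 8-bit integers; the stand-in model, the qint8 target, and the layer set are assumptions, not the paper’s exact settings.

```python
import torch

# `trained_model` stands in for the fitted classifier from the experiments.
trained_model = torch.nn.Sequential(
    torch.nn.Linear(768, 128), torch.nn.ReLU(), torch.nn.Linear(128, 2))

# Post-training dynamic quantization of the linear layers to int8 weights.
quantized = torch.ao.quantization.quantize_dynamic(
    trained_model, {torch.nn.Linear}, dtype=torch.qint8)
print(quantized)
```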
In this section, we detail the methodology employed in our experiments, focusing on how we have structured our training and testing across various linguistic scenarios.
-
1.
Phased Analysis of Language Models: The train-test split for our intra-lingual experiment was set at 90% for training and 10% for testing, a decision driven by previous results indicating that this ratio provided the best performance outcomes for models operating within a single language. For the meta-training portion, the split was adjusted to 80% for training and 20% for validation across different tasks, accommodating the need for more robust testing scenarios typical in meta-learning frameworks. In our cross-lingual learning experiments, the models were exclusively trained on Norwegian data and subsequently tested on English, which served as the target language.
-
2.
Progression from Cross-Lingual Learning to Advanced Meta-Learning Techniques: The analysis of the results was segmented into four distinct phases: initially, we evaluated the model on the source language (\(\mathcal {S}\), Norwegian) to understand its performance within the native linguistic context. In the second phase, we engaged in cross-lingual learning using transformers, where the model trained on Norwegian data was tested on the target language (\(\mathcal {T}\), English). The third and fourth experiments were designed to explore more advanced meta-learning techniques. In the third phase, we applied zero-shot cross-lingual meta-learning strategies, training the model solely on Norwegian data and then evaluating it in English without any additional training examples from the target language. For the fourth phase, we extended this setup to few-shot cross-lingual meta-learning, incorporating a small number of examples from the target language to enhance adaptation and learning. This progression from cross-lingual learning with transformers to intricate meta-learning experiments was adopted for two primary reasons: firstly, the instances of the target language in our dataset were significantly fewer compared to the abundant data available in the source language; secondly, our project is keenly focused on advancing cross-lingual meta-learning capabilities.
Table 6 summarizes the train-test splits for the various language learning experiments. In our MAML framework, we implemented a train-test split for both the support and query sets, allocating 80% of the data to the support set and 20% to the query set for all tasks, as illustrated below.
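A minimal sketch of how such tasks could be assembled is given below; the sampling strategy and fixed seed are assumptions, while the task count, task size, and 80/20 support/query split follow the setup described above.

```python
import random

def make_tasks(texts, labels, n_tasks=15, task_size=100,
               support_frac=0.8, seed=42):
    """Sample meta-learning tasks, splitting each into support/query sets."""
    rng = random.Random(seed)
    data = list(zip(texts, labels))
    tasks = []
    for _ in range(n_tasks):
        task = rng.sample(data, task_size)    # 100 examples per task
        cut = int(task_size * support_frac)   # 80 support / 20 query
        tasks.append((task[:cut], task[cut:]))
    return tasks
```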
Analysis of results: intra-language model evaluation on Norwegian
Initially, we conducted an Intra-Language Model Evaluation using Norwegian as the source language (\(\mathcal {D}_{src}\)). For this evaluation, we utilized both language-specific and multilingual transformer-based pre-trained language models. The dataset was split into training and testing subsets at proportions of 80% and 20%, respectively.
The intra-language evaluation conducted on Norwegian reveals significant variations in performance among the different models, with \(Nor-BERT\) demonstrating a distinct advantage. \(Nor-BERT\) achieved remarkably high scores across all metrics, with precision, recall, accuracy, and F1-score each at 0.95. This exceptional performance can likely be attributed to the model’s specialized training on Norwegian language data, which ensures that it is well-optimized for the syntactic and semantic nuances of Norwegian. Comparatively, \(nb-BERT\), which is also tailored towards Norwegian but perhaps to a lesser degree than \(Nor-BERT\), shows strong results with an F1-score of 0.80 and consistent scores of 0.77 across the other metrics. This indicates that \(nb-BERT\), while not as finely tuned as \(Nor-BERT\), still effectively captures the linguistic features necessary for processing Norwegian text.
The other models in the evaluation, such as XLNet, BERT, mBERT, and scandiBERT, exhibit more moderate performance. scandiBERT, which is designed for Scandinavian languages, performs slightly better than general-purpose models like BERT and mBERT, with a 0.70 F1-score and slightly higher scores in precision and recall. This suggests that even a regional focus, as seen with scandiBERT, can yield improvements in model efficacy over more globally trained models like BERT and mBERT. BERT and mBERT show nearly identical performance in precision and F1-score but differ in recall and accuracy, where mBERT outperforms BERT. This could be due to mBERT’s exposure to multiple languages during training, providing it with a broader understanding that might be beneficial even in monolingual scenarios. XLNet, typically strong in a variety of NLP tasks, scores the lowest among the evaluated models in this intra-language setup. This might reflect its training regimen and architecture, which perhaps do not align as closely with the specific requirements of processing Norwegian text.
The sub-figures in Fig. 3 present the confusion matrices for BERT, mBERT, and Nor-BERT, respectively, each evaluated on the source language, Norwegian. We chose to highlight the confusion matrices for these three models because BERT functions as a versatile general-purpose transformer, mBERT has been trained across multiple languages including Norwegian, and Nor-BERT is specifically designed for Norwegian.
Analysis of results: cross-lingual transfer learning with transformers
Table 8 summarizes the performance of various Transformer-based models in a cross-lingual transfer learning setup, where models were trained on a Norwegian dataset (\(\mathcal {D}_{src}\)) and evaluated on an English dataset (\(\mathcal {D}_{tgt}\)) (see Table 7).
Both XLNet and BERT achieved identical performance, with a precision of 0.68, a recall of 0.52, and an accuracy of 0.52, accompanied by an F1-score of 0.55. This suggests moderate precision but lower recall, indicating that these models were conservative in predicting positive instances and missed a significant number of them. In comparison, mBERT, which benefits from multilingual training, delivered the best performance across all metrics, achieving precision and recall of 0.75 and 0.74, respectively, and an F1-score of 0.74. Language-specific models like scandiBERT and \(nb-BERT\), likely fine-tuned for Scandinavian languages, showed balanced performance, with precision and recall around 0.70 and an F1-score of 0.70. This balance suggests that these models effectively manage both false positives and false negatives, making them reliable for cross-lingual applications. \(Nor-BERT\) slightly outperformed the other Scandinavian models in recall and accuracy, indicating a nuanced understanding of Norwegian linguistic features that remains beneficial even when applied to English texts. The strong performance of multilingual models like mBERT and language-specific variants like \(Nor-BERT\) in cross-lingual tasks supports the utility of leveraging multilingual knowledge bases and specialized training to improve performance in linguistically diverse applications.
The following sub-figures in Fig. 4 represent the confusion matrices for BERT, mBERT, and Nor-BERT respectively.
Addressing RQ1, our findings demonstrate that intra-lingual and cross-lingual learning strategies are effective, yielding improved F1-scores and precision both in language-specific contexts such as Norwegian and in broader cross-lingual applications.
Analysis of results: zero-shot cross-lingual meta-learning
In response to RQ2, our meta-linguistic framework has proven its capability to adapt rapidly to new languages and dialects, as evidenced by the success of zero-shot learning approaches that effectively transition detection capabilities from Norwegian to English. For this experiment, we employed a zero-shot learning strategy to train the meta-model on the source language (\(\mathcal {D}_{src}\)). This approach involved training without using any examples from the target language (\(\mathcal {D}_{tgt}\)), which served as the foundational methodology for our training task. This section discusses the outcomes and implications of this technique. The results from our zero-shot cross-lingual learning analysis indicate significant variations in performance across different models, as summarized in Table 9.
\(Nor-BERT+LSTM\) outperforms other models across various metrics, achieving the highest recall of 0.79, accuracy of 0.79, and an F1-score of 0.79. This suggests that \(Nor-BERT\), specifically designed for Norwegian language data, more effectively generalizes to English in a zero-shot scenario compared to its counterparts. In comparison, models like XLNet and BERT with the integration of LSTM, show slightly lower efficacy for this cross-lingual application. \(XLNet+LSTM\), for instance, scores a precision of 0.66 and an F1-score of 0.67, reflecting its comparative struggle to adapt from Norwegian to English without direct training examples from the target language. Similarly, \(BERT+LSTM\) exhibits a moderate performance with an F1-score of 0.68, despite its slightly better accuracy and recall figures compared to \(XLNet+LSTM\). Moreover, other language-specific models like \(scandiBERT+LSTM\) and \(nbBERT+LSTM\), which are likely optimized for Scandinavian languages, display a marked improvement over the more general models, \(XLNet+LSTM\) and \(BERT+LSTM\), but still fall short of the results posted by \(Nor-BERT\). This highlights the potential benefits of using regionally specialized models, enhanced with LSTM layers, for cross-lingual transfer tasks.
Analysis of results: few-shot cross-lingual meta-learning
Regarding RQ3, enhancements in HS detection performance for Norwegian, due to cross-lingual meta-learning techniques, are clearly illustrated by the significant improvements in detection accuracy and recall rates observed in our few-shot learning experiments. In this experiment, we enriched the meta-trained model by introducing a small number of examples from both the hateful and neutral classes of the target language English (\(\mathcal {D}_{tgt}\)). These "few-shot" examples enabled the meta-model to grasp the morphological characteristics and specific task requirements of the English language. This strategic incorporation of a limited dataset substantially improved the performance of the few-shot Model-Agnostic Meta-Learning (MAML) approach, demonstrating a marked enhancement over the zero-shot configuration.
The results from the few-shot cross-lingual meta-learning experiment, as presented in Table 10, highlight improvements across all models when compared to the zero-shot learning scenario. Specifically, \(Nor-BERT+LSTM\) exhibited the most significant gains, achieving a precision of 0.91, recall of 0.89, accuracy of 0.89, and an F1-score of 0.90. These enhancements are attributed to the model’s exposure to a targeted selection of examples from both the hateful and neutral classes in English (\(\mathcal {D}_{tgt}\)). \(mBERT+LSTM\) also showed substantial improvements, recording some of the largest gains among the models, with a precision of 0.85, recall of 0.86, accuracy of 0.86, and an F1-score of 0.83.
On the other hand, models like \(XLNet+LSTM\) and \(BERT+LSTM\), while showing improvements over their zero-shot performances, still lagged behind the language-specific and multilingual models. \(XLNet+LSTM\) achieved a precision of 0.70 and an F1-score of 0.65, while \(BERT+LSTM\) improved to an F1-score of 0.68. \(scandiBERT+LSTM\) and \(nbBERT+LSTM\), specifically designed to align with Scandinavian linguistic features, outperformed the more generalized models but did not reach the effectiveness of \(mBERT+LSTM\) or \(Nor-BERT+LSTM\). The inclusion of even a limited number of examples from the target language significantly enhances model understanding and performance, providing a clear advantage over purely zero-shot approaches. Furthermore, the results obtained using the zero-shot meta-learning framework surpassed those of various models listed in Table 8, highlighting its effectiveness in low-resource scenarios.
The confusion matrices in Fig. 5 present classification results from zero-shot and few-shot cross-lingual meta-learning experiments for the source language Norwegian (\(\mathcal {D}_{src}\)) and the target language English (\(\mathcal {D}_{tgt}\)). In the zero-shot scenario, the model demonstrates high accuracy in identifying hateful content but exhibits a bias towards classifying neutral instances as hateful, likely due to the lack of exposure to \(\mathcal {D}_{tgt}\) examples. Conversely, in the few-shot scenario, after being trained with ten examples from both the hateful and neutral classes of \(\mathcal {D}_{tgt}\), the model shows markedly improved accuracy and balance, particularly in identifying neutral instances correctly. This highlights the effectiveness of including even a small number of target language examples in enhancing the model’s understanding and reducing bias in predictions.
In the zero-shot learning matrix, the model demonstrates a bias towards classifying neutral instances as hateful, indicating a limitation in handling unseen data from the target language without prior exposure. This reflects a challenge in generalization when no target language examples are provided during training. Conversely, the confusion matrix from the few-shot learning scenario shows a significant improvement in correctly classifying neutral instances, which illustrates the benefits of incorporating even a small number of target language examples into the training process. These examples help mitigate bias and enhance the model’s ability to accurately distinguish between classes under varied conditions, pointing to a clear pathway for improving model performance through strategic data augmentation. The sub-figures in Fig. 6 summarize the comparison of the results for all the experiments carried out in this study.
Work limitation
This study’s primary limitation is its focus on binary classification of HS, categorizing texts merely into ’neutral’ and ’hateful’ classes. This approach simplifies the complex spectrum of human speech, potentially overlooking more nuanced expressions of negativity that do not fit neatly into these categories. Additionally, our research is initially limited to Norwegian and English, driven by the specific goals of the project. Future research will aim to expand the linguistic scope and develop a multi-class classification system to capture a broader range of negative speech nuances, thereby enhancing the model’s applicability and effectiveness in diverse linguistic and cultural contexts. While this study provides valuable insights by focusing on HS detection in Norwegian and English, the choice of these two languages does limit the generalizability of our findings to other linguistic contexts. This focus was primarily driven by the need to develop and validate our methodologies in a controlled environment, involving a less-resourced language paired with a widely studied language. We recognize that extending our research to include a broader array of languages would greatly enhance the applicability and robustness of our results.
Conclusion and future work
This paper presented a hybrid approach to cross-lingual HS detection, employing MAML along with intra-lingual and cross-lingual learning strategies. The methodology demonstrated substantial adaptability and performance improvements across bilingual datasets. Our approach efficiently leveraged general-purpose, multilingual, and language-specific transformers along with meta-learning techniques to excel in both zero-shot and few-shot learning scenarios. This research work is limited to binary HS detection, categorizing texts into only two classes: neutral and hateful. This binary classification restricts our ability to detect more nuanced categories of speech. Looking ahead, our future work will focus on extending this approach to more diverse linguistic scenarios within the SOCYTIFootnote 7 project. We aim to enhance the model’s ability to detect and categorize HS across multiple classes, focusing on scenario-based and severity-based categories. This integration will improve SOCYTI’s functionality in preventing and investigating HS through detailed context analysis and robust data exploration.
Additionally, we will expand our methodology to accommodate multilingual data across various modalities, including text, images, and audio. This extension will explore advanced meta-learning techniques and enhance the model’s adaptability and effectiveness in handling complex, multimodal datasets. By incorporating these elements, we hope to provide a comprehensive framework that not only advances HS detection technology but also aligns with the evolving needs of digital communication platforms globally. These developments will contribute to the creation of more nuanced and effective HS detection frameworks, promoting inclusivity and safety in digital interactions.
Data Availability
Data will be made available on request.
References
AI@Meta (2024) Llama 3 model card https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md. Accessed 18 Apr 2024
Alsafari S, Sadaoui S (2021) Semi-supervised self-training of hate and offensive speech from social media. Appl Artif Intell 35(15):1621–1645
Alsafari S, Sadaoui S, Mouhoub M (2020) Hate and offensive speech detection on Arabic social media. Online Soc Netw Media 19:100096
Aluru SS, Mathew B, Saha P, et al (2020) Deep learning models for multilingual hate speech detection. arXiv preprint arXiv:2004.06465
Andreassen Svanes M, Seim Gunstad T (2020) Detecting and grading hateful messages in the norwegian language. Master’s thesis, NTNU
Awal MR, Lee RKW, Tanwar E et al (2023) Model-agnostic meta-learning for multilingual hate speech detection. IEEE Trans Comput Soc Syst 11:1086–1095
Baumann J, Kramer O (2024) Evolutionary multi-objective optimization of large language model prompts for balancing sentiments. In: International Conference on the Applications of Evolutionary Computation (Part of EvoStar), Springer, pp 212–224
Bensalem I, Rosso P, Zitouni H (2024) Toxic language detection: a systematic review of Arabic datasets. Expert Syst, p e13551
Bohra A, Vijay D, Singh V, et al (2018) A dataset of hindi-english code-mixed social media text for hate speech detection. In: Proceedings of the Second Workshop on computational modeling of people’s opinions, personality, and emotions in social media, pp 36–41
Davidson T, Warmsley D, Macy M, et al (2017) Automated hate speech detection and the problem of offensive language. In: Proceedings of the International AAAI Conference on web and social media, pp 512–515
Del Vigna F, Cimino A, Dell’Orletta F, et al (2017) Hate me, hate me not: Hate speech detection on Facebook. In: Proceedings of the first Italian Conference on cybersecurity (ITASEC17), pp 86–95
Devlin J, Chang MW, Lee K, et al (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Dey R, Salem FM (2017) Gate-variants of gated recurrent unit (gru) neural networks. In: 2017 IEEE 60th International Midwest Symposium on circuits and systems (MWSCAS), IEEE, pp 1597–1600
Founta A, Djouvas C, Chatzakou D, et al (2018) Large scale crowdsourcing and characterization of twitter abusive behavior. In: Proceedings of the International AAAI Conference on web and social media
Founta AM, Chatzakou D, Kourtellis N, et al (2019) A unified deep learning architecture for abuse detection. In: Proceedings of the 10th ACM Conference on web science, pp 105–114
García-Díaz JA, Jiménez-Zafra SM, García-Cumbreras MA et al (2023) Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers. Complex Intell Syst 9(3):2893–2914
Groshek J, Cutino C (2016) Meaner on mobile: incivility and impoliteness in communicating online. In: Proceedings of the 7th 2016 International Conference on Social Media & Society, pp 1–7
Hashmi E, Yayilgan SY (2024) Multi-class hate speech detection in the Norwegian language using fast-rnn and multilingual fine-tuned transformers. Complex Intell Syst 10(3):4535–4556
Hashmi E, Yamin MM, Imran S, et al (2024) Enhancing misogyny detection in bilingual texts using fasttext and explainable ai. In: 2024 International Conference on Engineering & Computing Technologies (ICECT), IEEE, pp 1–6
Hashmi E, Yamin MM, Yayilgan SY (2024) Securing tomorrow: a comprehensive survey on the synergy of artificial intelligence and information security. AI Ethics. https://doi.org/10.1007/s43681-024-00529-z
Hashmi E, Yayilgan SY, Hameed IA, et al (2024) Enhancing multilingual hate speech detection: From language-specific insights to cross-linguistic integration. IEEE Access 12:121507–121537
Hashmi E, Yayilgan SY, Shaikh S (2024) Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers. Soc Netw Anal Min 14(1):86
Hashmi E, Yayilgan SY, Yamin MM et al (2024) Advancing fake news detection: hybrid deep learning with fasttext and explainable ai. IEEE Access 12:44462–44480
Hashmi E, Yayilgan SY, Yamin MM et al (2025) Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations. Expert Syst Appl 264:125843
Hashmi E, Yayilgan SY, Yamin MM et al (2025) Enhancing misogyny detection in bilingual texts using explainable ai and multilingual fine-tuned transformers. Complex Intell Syst 11(1):39
Khanduja N, Kumar N, Chauhan A (2024) Telugu language hate speech detection using deep learning transformer models: corpus generation and evaluation. Syst Soft Comput. https://doi.org/10.1016/j.sasc.2024.200112
Khosla P, Teterwak P, Wang C et al (2020) Supervised contrastive learning. Adv Neural Inf Process Syst 33:18661–18673
Kibriya H, Siddiqa A, Khan WZ et al (2024) Towards safer online communities: deep learning and explainable ai for hate speech detection and classification. Comput Electr Eng 116:109153
Kummervold PE, De la Rosa J, Wetjen F, et al (2021) Operationalizing a national digital library: the case for a Norwegian transformer model. arXiv preprint arXiv:2104.09617
Kutuzov A, Barnes J, Velldal E, et al (2021) Large-scale contextualised language modelling for Norwegian. arXiv preprint arXiv:2104.06546
Lee S, Gilliland A (2024) Evolving definitions of hate speech: the impact of a lack of standardized definitions. In: International Conference on information, Springer, pp 141–156
Lepoutre M, Vilar-Lluch S, Borg E et al (2024) What is hate speech? the case for a corpus approach. Crim Law Philos 18(2):397–430
Liu Z, Lin W, Shi Y, et al (2021) A robustly optimized bert pre-training approach with post-training. In: China National Conference on Chinese Computational Linguistics, Springer, pp 471–484
Lu J, Lin H, Zhang X, et al (2023) Hate speech detection via dual contrastive learning. IEEE/ACM Trans Audio Speech Lang Process
Maity K, Poornash A, Bhattacharya S, et al (2024) Hatethaisent: sentiment-aided hate speech detection in Thai language. IEEE Trans Comput Soc Syst 11(5):5714–5727
Mandl T, Modha S, Kumar MA, et al (2020) Overview of the hasoc track at fire 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German. In: Proceedings of the 12th annual meeting of the forum for information retrieval evaluation, pp 29–32
Mazari AC, Boudoukhani N, Djeffal A (2024) Bert-based ensemble learning for multi-aspect hate speech detection. Clust Comput 27(1):325–339
Mfetoum IM, Ngoh SK, Molu RJJ et al (2024) A multilayer perceptron neural network approach for optimizing solar irradiance forecasting in central Africa with meteorological insights. Sci Rep 14(1):3572
Mittal D, Singh H (2023) Enhancing hate speech detection through explainable ai. In: 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), IEEE, pp 118–123
Mozafari M, Farahbakhsh R, Crespi N (2022) Cross-lingual few-shot hate speech and offensive language detection using meta learning. IEEE Access 10:14880–14896
Ndahinda FM, Mugabe AS (2024) Streaming hate: Exploring the harm of anti-Banyamulenge and anti-Tutsi hate speech on Congolese social media. J Genocide Res 26(1):48–72
Njoku JN, Eneh AU, Nwakanma CI, et al (2023) Metahate: Text-based hate speech detection for metaverse applications using deep learning. In: 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), IEEE, pp 979–984
Nkemelu D, Shah H, Best M, et al (2022) Tackling hate speech in low-resource languages with context experts. In: Proceedings of the 2022 International Conference on information and communication technologies and development, pp 1–11
i Orts ÒG (2019) Multilingual detection of hate speech against immigrants and women in twitter at semeval-2019 task 5: Frequency analysis interpolation for hate in speech detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp 460–463
Priyanshi, Singh R, Jain S, Singh A, Benedict SM (2024) Online hate speech in India: legal reforms and social impact on social media platforms (February 2, 2024)
Pasupa K, Karnbanjob W, Aksornsiri M (2022) Hate speech detection in Thai social media with ordinal-imbalanced text classification. In: 2022 19th International Joint Conference on computer science and software engineering (JCSSE), IEEE, pp 1–6
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Prasad D, Kadambari K, Mukati R, et al (2023) Real-time multi-lingual hate and offensive speech detection in social networks using meta-learning. In: TENCON 2023-2023 IEEE Region 10 Conference (TENCON), IEEE, pp 31–35
Romim N, Ahmed M, Talukder H, et al (2021) Hate speech detection in the bengali language: A dataset and its baseline evaluation. In: Proceedings of International Joint Conference on advances in computational intelligence: IJCACI 2020, Springer, pp 457–468
Saeed AM, Ismael AN, Rasul DL, et al (2022) Hate speech detection in social media for the kurdish language. In: The International Conference on innovations in computing research, Springer, pp 253–260
Su H, Hu J, Yu S et al (2024) Successive model-agnostic meta-learning for few-shot fault time series prognosis. Neurocomputing 595:127879
Thapliyal K, Thapliyal M, Thapliyal D (2024) Social media and health communication: a review of advantages, challenges, and best practices. In: Emerging technologies for health literacy and medical practice, pp 364–384
Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems 30
Vettoruzzo A, Bouguelia MR, Vanschoren J et al (2024) Advances and challenges in meta-learning: a technical review. IEEE Trans Pattern Anal Mach Intell 46(7):4763–4779
Wanasukapunt R, Phimoltares S (2021) Classification of abusive Thai language content in social media using deep learning. In: 2021 18th International Joint Conference on computer science and software engineering (JCSSE), IEEE, pp 1–6
Wang M, Gong Q, Wan Q et al (2024) A fast interpretable adaptive meta-learning enhanced deep learning framework for diagnosis of diabetic retinopathy. Expert Syst Appl 244:123074
Yamin MM, Hashmi E, Katt B (2024) Combining uncensored and censored llms for ransomware generation. In: International Conference on web information systems engineering, Springer, pp 189–202
Yu Y, Si X, Hu C et al (2019) A review of recurrent neural networks: Lstm cells and network architectures. Neural Comput 31(7):1235–1270
Zhu W, Gong H, Bansal R, et al (2021) Self-supervised euphemism detection and identification for content moderation. In: 2021 IEEE Symposium on Security and Privacy (SP), pp 229–246, https://doi.org/10.1109/SP40001.2021.00075
Acknowledgements
This research work was carried out as part of the SOCYTI project. The SOCYTI project has received funding from the Research Council of Norway as a Researcher Project for Technological Convergence related to Enabling Technologies under grant agreement no. 331736.
Funding
Open access funding provided by NTNU Norwegian University of Science and Technology (incl St. Olavs Hospital - Trondheim University Hospital).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Hashmi, E., Yayilgan, S.Y. & Abomhara, M. Metalinguist: enhancing hate speech detection with cross-lingual meta-learning. Complex Intell. Syst. 11, 179 (2025). https://doi.org/10.1007/s40747-025-01808-w