EiAP-BC: A Novel Emoji Aware Inter-Attention Pair Model for Contextual Spam Comment Detection Based on Posting Text
Abstract
1 Introduction
2 Related Works
Category | EiAP-BC Model Architecture | ASIM Architecture | Bi-ISCA Architecture | RE2 Architecture | BERT Architecture for Pair Input |
---|---|---|---|---|---|
Main Layers | 8 (Input, Embedding, Encoding, Inter-Alignment, Integration, Second Encoding (+Original Encoding), Pooling and Concatenation, and Prediction) | 8 (Input, Embedding, Encoding, Attention, Fusion, Matching Composition, Pooling, and Prediction) | 5 (Input, Embedding, Encoding, Bi-ISCA, Integration, and Prediction) | 7 (Input, Embedding, Encoder, Fusion, Alignment, Pooling, and Prediction) | 7 (Input, Multi-head Self-attention, Dropout, Add and Norm, Feed-forward Layer, Dropout, Add and Norm) in Transformer Encoder |
Total No of Blocks | - | 3 | - | 1–3 (as hyperparameters) | 12 blocks of Transformer Encoder |
Total Training Parameters | 6.8 M | Not mentioned | 1.12 M | 2.8 M | 109.5 M |
Epochs | 10 | 30 | 20 | 10 | 3 |
Input Layer | Siamese Pair | Siamese Pair | Siamese Pair | Siamese Pair | Input Pair |
Embedding Layer | Word embedding: Fasttext, GloVe, Word2Vec, Emoji2Vec (scenarios: average and concatenation), trained on the dataset | Word embedding: GloVe specific to Stack Overflow | Fasttext with dimension 30, trained on Twitter | Word embedding: GloVe pretrained 840B, 300 dimensions | Pre-trained mdhugol/indonesia-bert-sentiment-classification, Token and Positional Embedding, Max positional embedding = 512 |
Encoding Layer | Bi-LSTM pair with 128 units each, stacked | BiLSTM pair, each with 200 units | BiLSTM pair | 3 CNNs | BERT Tokenizer |
Attention Layer | Inter-attention; similarity scores via dot product; attention scores via softmax over the similarities between each position of vector A and the corresponding positions of vector B | Inter-attention; similarity scores via dot product; attention scores via softmax over the similarities between vector A and vector B | Intra-attention from the last cell state of the BiLSTM; inter-attention from the hidden states of the dot product between the final BiLSTM states | Inter-attention; similarity scores via dot product; attention scores via a weighted sum over the similarities between vector A and vector B | BERT attention mask |
Prediction Layer | Concatenation of the original encoding, the pooling outputs, the difference between the pooling outputs, and the element-wise multiplication of the pooling outputs | Concatenation of the pooling outputs, the difference between the pooling outputs, and the element-wise multiplication of the pooling outputs | Feed-forward neural network over the concatenated, flattened CNN outputs | Multi-layer feed-forward neural network over the relation between the final vectors A (v1) and B (v2), formed as [v1; v2; v1−v2; v1⊙v2] | Sigmoid |
Fusion/Integration Layer | A feed-forward neural network applied to the concatenation of the initial vector, its attention output, the subtraction of the initial vector and the attention output, and the multiplication of the initial vector and the attention output; the result is then passed to the second BiLSTM encoding layer | A feed-forward neural network applied to the concatenation of the initial vector, its attention output, the subtraction of the initial vector and the attention output, and the multiplication of the initial vector and the attention output | Four CNN layers with 64 filters each, which take the forward and backward inter-attention outputs of the comment and of the reply; each CNN unit is followed by a flattening layer | A feed-forward neural network applied to the concatenation of the original vector, its attention output, the subtraction of the initial vector and the attention output, and the multiplication of the initial vector and the attention output | - |
Emoji Features | Yes (emoji text and symbol) | - | - | - | Added emoji symbol on the Tokenizer layer |
Auxiliary Features | Yes (using 19 auxiliary generated selected features) | - | - | - | - |
Alignment/Matching Composition Layer | Original encoding features, previously aligned features, aligned features of contextual information, and second encoding features; matching composition layer using a CNN with two layers (filter: 64) | Attention in the alignment layer; matching composition layer using a BiLSTM pair | Matching composition layer using four CNN units, one for each forward and backward direction of Bi-ISCA (filter: 64) | Initial point-wise features, previously aligned features, and context information; this model does not use a matching composition layer | BERT model |
Pooling layer | Global max pooling layer | Max pooling layer | - | Max-over-time pooling layer | Using 128 units in accordance with the 12 heads of self-attention |
Dense Layer | Unit: 128, 64, 32, Activation: ReLU | Unit: 150 | Unit: 1920 | Unit: 150, activation: GeLU | - |
Dropout Layer | 0.2 | 0.2 | Not mentioned | 0.2 | 0.1 |
Datasets | Spam comments based on posting context on social media, SPAMID-PAIR | The duplicate question, Stack Overflow question answering | Sarcasm comments detection; SARC Reddit, FigLang 2020 Workshop | Common text pair classification; SNLI, SciTail, Quora Question Pair, WikiQA | Spam comments based on posting context on social media, SPAMID-PAIR |
Results | SPAMID-PAIR: emoji symbol: 87%, emoji text: 88% | Stack Overflow: 96% | SARC Reddit: 75.7%, FigLang 2020 Workshop: 91.7% | SNLI: 88.9%, SciTail: 86%, Quora: 89.2% | SPAMID-PAIR: emoji symbol: 88%, emoji text: 89% |
Training Time | 1 hour 55 minutes | Not mentioned | Not mentioned | Not mentioned | 3 hours 47 minutes |
Model Size | 179 MB | Not mentioned | Not mentioned | Not mentioned | 396 MB |
Vocab Size | 50,992 | Not mentioned | Not mentioned | Not mentioned | 30,525 |
References | Our proposed model | Pei et al. [26] | Mishra et al. [50] | Yang et al. [23] | The Model uses Mdhugol1 |
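To make the layer descriptions in the EiAP-BC column concrete, the following is a minimal, hedged tf.keras sketch of the pair architecture: shared stacked BiLSTM encoders, dot-product inter-attention, fusion via concatenation with subtraction and multiplication, a second encoding with the original embedding concatenated back in, a two-layer CNN matching composition with 64 filters, global max pooling, and a [v1; v2; v1−v2; v1⊙v2] prediction head with 128/64/32 dense units, dropout 0.2, and a sigmoid output. The sequence length, vocabulary size, and the randomly initialized Embedding layer are illustrative assumptions, and the emoji and auxiliary features are omitted; this is a sketch of the design summarized above, not the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB, EMB_DIM = 100, 50_000, 300   # illustrative assumptions

def make_encoder(units=128):
    # Stacked BiLSTM encoder (128 units each), shared by the posting and the comment.
    return tf.keras.Sequential([
        layers.Bidirectional(layers.LSTM(units, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(units, return_sequences=True)),
    ])

def fuse(h, aligned):
    # Fusion: concatenate the encoded sequence, its aligned counterpart,
    # their difference, and their element-wise product, then project with a dense layer.
    f = layers.Concatenate()([h, aligned,
                              layers.Subtract()([h, aligned]),
                              layers.Multiply()([h, aligned])])
    return layers.Dense(256, activation="relu")(f)

def compose(x):
    # Matching composition: two Conv1D layers (64 filters) + global max pooling.
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    return layers.GlobalMaxPooling1D()(x)

post_in = layers.Input((MAX_LEN,), dtype="int32", name="posting_tokens")
comm_in = layers.Input((MAX_LEN,), dtype="int32", name="comment_tokens")

embed = layers.Embedding(VOCAB, EMB_DIM)          # weights would come from the
post_e, comm_e = embed(post_in), embed(comm_in)   # averaged/concatenated embeddings

enc1 = make_encoder()
post_h, comm_h = enc1(post_e), enc1(comm_e)

# Inter-attention: dot-product similarity, softmax over the opposite sequence.
sim = layers.Dot(axes=(2, 2))([post_h, comm_h])                         # (B, Lp, Lc)
post_align = layers.Dot(axes=(2, 1))([layers.Softmax(axis=-1)(sim), comm_h])
sim_t = layers.Permute((2, 1))(sim)                                     # (B, Lc, Lp)
comm_align = layers.Dot(axes=(2, 1))([layers.Softmax(axis=-1)(sim_t), post_h])

# Concatenate the original embedding back in before the second encoding.
enc2 = make_encoder()
post_v = enc2(layers.Concatenate()([post_e, fuse(post_h, post_align)]))
comm_v = enc2(layers.Concatenate()([comm_e, fuse(comm_h, comm_align)]))

v1, v2 = compose(post_v), compose(comm_v)
merged = layers.Concatenate()([v1, v2,
                               layers.Subtract()([v1, v2]),
                               layers.Multiply()([v1, v2])])

x = merged
for units in (128, 64, 32):                       # dense head 128/64/32 with ReLU
    x = layers.Dropout(0.2)(layers.Dense(units, activation="relu")(x))
out = layers.Dense(1, activation="sigmoid")(x)    # spam / not-spam probability

model = Model([post_in, comm_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```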
3 Proposed Method

3.1 The Input Stage Using the SPAMID-PAIR Dataset
3.2 Data Exploration Stage and Scenario Creation
3.3 Data Pre-processing Stage

3.4 Tokenization and Feature Selection Stage
3.5 Embedding Layer Generation Stage
Word Embedding | Parameter(s) | Value(s) | Information |
---|---|---|---|
Fasttext | dim | 300 | Fasttext is built using the original Fasttext from www.fasttext.cc with a dimension of 300 |
| | sg | 1 | Using skip-gram |
| | wordNgrams | 1 or 2 | Length of word n-grams |
| | minn | 3 | Minimum character n-gram length |
| | maxn | 6 | Maximum character n-gram length |
| | epoch | 30 | Number of epochs |
| | min_count | 1 | Minimum word occurrences |
| | lr | 0.01 | Learning rate |
| | output | Binary | The trained Fasttext model is saved as a binary (.bin) file |
Word2Vec | vector_size | 300 | Word2Vec is built using Gensim Word2Vec with a dimension of 300 |
| | window | 5 | Window context to the left and right of the current token position |
| | min_count | 1 | Minimum token occurrences |
| | sg | 1 | Using skip-gram |
| | epoch | 30 | Number of epochs |
GloVe | vector_size | 300 | GloVe is built using the original GloVe from https://nlp.stanford.edu/projects/glove/ with a vector dimension of 300 |
| | memory | 5 | Memory usage |
| | vocab_min_count | 1 | Minimum word occurrences |
| | max_iter | 30 | Number of training iterations (epochs) |
| | window_size | 15 | Window context to the left and right of the current token position |
| | num_threads | 8 | Number of threads used by the system |
Emoji2Vec | out_dim | 300 | Emoji2Vec is built from the original emoji2vec GitHub repository https://github.com/uclnlp/emoji2vec with a dimension of 300 |
| | dropout | 0 | No dropout |
| | learning | 0.01 | Learning rate |
| | max_epoch | 40 | Number of epochs |
Word Embedding Layer | dimension | 300 | Dimension of each individual embedding |
| | dimension | 300 | Dimension when averaging the three or four embeddings |
| | dimension | 900 | Dimension when concatenating Fasttext, Word2Vec, and GloVe |
| | dimension | 1200 | Dimension when concatenating Fasttext, Word2Vec, GloVe, and Emoji2Vec |
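For reference, the sketch below shows how two of these embeddings could be trained with the listed parameters and then combined by averaging (300 dimensions) or concatenation (900 or, with Emoji2Vec, 1,200 dimensions). Gensim's Word2Vec and the official fasttext package expose the parameters named in the table; the corpus file name, the zero-vector fallback, and the GloVe placeholder are illustrative assumptions (GloVe and Emoji2Vec are trained with their own tools and their vectors would be loaded in the same way).

```python
import numpy as np
import fasttext
from gensim.models import Word2Vec

# Word2Vec: vector_size=300, window=5, min_count=1, sg=1 (skip-gram), 30 epochs.
sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]
w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1, epochs=30)

# fastText: skip-gram, dim=300, character n-grams of length 3..6, 30 epochs.
ft = fasttext.train_unsupervised("corpus.txt", model="skipgram", dim=300,
                                 minn=3, maxn=6, epoch=30, lr=0.01,
                                 minCount=1, wordNgrams=1)

def vectors(token, dim=300):
    """Look a token up in each embedding; zero vector when it is unknown."""
    v_w2v = w2v.wv[token] if token in w2v.wv else np.zeros(dim)
    v_ft = ft.get_word_vector(token)        # fastText backs off to char n-grams
    v_glove = np.zeros(dim)                 # placeholder: load from the GloVe output
    return v_w2v, v_ft, v_glove

def embed(token, mode="avg"):
    vs = vectors(token)
    # "avg" keeps 300 dimensions; "concat" yields 900 (or 1,200 with Emoji2Vec).
    return np.mean(vs, axis=0) if mode == "avg" else np.concatenate(vs)
```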
3.6 Proposed EiAP-BC Model and the Training Stage

3.7 Evaluation Stage
4 Results and Discussion
Model Name | Accuracy (%) | Macro Average F1-score (%) | Best Threshold |
---|---|---|---|
Emoji-text Scenario | |||
LSTM Pair 3 Emb Concatenation | 87.15 (≈87) | 82.72 (≈83) | 0.8 |
BiLSTM Pair Att 3 Embedding Avg | 87.23 (≈87) | 83.17 (≈83) | 0.73 |
LSTM–CNN Pair 3 Embedding Avg | 87.48 (≈87) | 83.27 (≈83) | 0.74 |
EiAP-BC using Token | 85.15 (≈85) | 80.06 (≈80) | 0.78 |
EiAP-BC 3-Embedding Concat* | 87.99 (≈88) | 83.58 (≈84) | 0.59 |
EiAP-BC 3-Embedding Avg | 87.94 (≈88) | 83.32 (≈83) | 0.78 |
EiAP-BC using BERT Embedding | 84.62 (≈85) | 79.07 (≈79) | 0.63 |
Fine-tuned BERT using Mdhugol | 88.07 (≈88) | 83.78 (≈84) | 0.50 |
Emoji-symbol Scenario | |||
LSTM Pair 4 Emb Concatenation | 85.02 (≈85) | 78.92 (≈79) | 0.84 |
BiLSTM Pair 4 Embedding Avg | 84.77 (≈85) | 78.08 (≈78) | 0.68 |
LSTM–CNN Pair 4 Embedding Avg | 84.4 (≈84) | 78.06 (≈78) | 0.69 |
EiAP-BC using Token | 82.65 (≈83) | 77.68 (≈78) | 0.89 |
EiAP-BC 4-Embedding Concat | 85.35 (≈85) | 80.22 (≈80) | 0.88 |
EiAP-BC 4-Embedding Avg* | 85.94 (≈86) | 80.20 (≈80) | 0.77 |
EiAP-BC using BERT Embedding | 84.41 (≈84) | 77.01 (≈77) | 0.65 |
Fine-tuned BERT using Mdhugol | 86.94 (≈87) | 83.78 (≈84) | 0.5 |
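The "Best Threshold" column reports the decision cut-off applied to each model's sigmoid output. A minimal sketch of how such a threshold can be selected, assuming validation labels and predicted probabilities are available (the variable names below are illustrative), is:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def best_threshold(y_true, y_prob, grid=np.arange(0.05, 1.00, 0.01)):
    # Sweep candidate cut-offs and keep the one with the highest macro F1-score.
    scored = [(t, f1_score(y_true, (y_prob >= t).astype(int), average="macro"))
              for t in grid]
    return max(scored, key=lambda ts: ts[1])        # (threshold, macro F1)

# Usage (hypothetical variable names):
# y_prob = model.predict([post_tokens, comm_tokens]).ravel()
# threshold, macro_f1 = best_threshold(y_val, y_prob)
# accuracy = accuracy_score(y_val, (y_prob >= threshold).astype(int))
```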
Model | Emoji-symbol Scenario and Dimension | Accuracy (%) | F1-score (%) |
---|---|---|---|
EiAP-BC 4 Embedding Concat | With Emoji2Vec embedding concatenation (1,200 dimensions) | 85.46 | 79.63 |
EiAP-BC 3 Embedding Concat | Without Emoji2Vec embedding concatenation (900 dimensions) | 85.56 | 80.35 |
Difference | | +0.1 | +0.72 |
EiAP-BC 4 Embedding Average | With Emoji2Vec embedding average (300 dimensions) | 85.58 | 79.72 |
EiAP-BC 3 Embedding Average | Without Emoji2Vec embedding average (300 dimensions) | 86.02 | 81.12 |
Difference | | +0.44 | +1.14 |
EiAP-BC Model (Auxiliary) | Accuracy (%) | Macro Average F1-score (%) | Best Threshold |
---|---|---|---|
Emoji-text Scenario | |||
EiAP 3 Embedding Concat* | 88.07 (≈88) | 83.75 (≈84) | 0.76 |
EiAP 3 Embedding Avg | 87.95 (≈88) | 83.26 (≈83) | 0.77 |
Emoji-symbol Scenario | |||
EiAP 4 Embedding Concat* | 85.55 (≈86) | 80.15 (≈80) | 0.71 |
EiAP 4 Embedding Avg | 85.34 (≈85) | 79.83 (≈80) | 0.72 |
Test Data | EMT | EMS | EMT add | EMS add | EiAP BCT | EiAP BCS | EiAP BCT add | EiAP BCS add | ChatGPT |
---|---|---|---|---|---|---|---|---|---|
A-1 (S) | True | True | True | True | True | False | True | True | False |
A-2 (NS) | True | True | False | False | True | True | True | True | True |
B-1 (S) | True | True | True | True | True | True | True | True | True |
B-2 (NS) | False | False | False | False | False | True | False | True | False |
C-1 (NS) | True | True | True | False | True | True | True | True | True |
C-2 (S) | False | False | False | True | False | True | False | False | False |
D-1 (S) | True | True | True | True | True | True | True | True | True |
D-2 (NS) | True | True | False | False | True | False | True | False | True |
E-1 (S) | True | True | False | True | True | True | True | True | True |
E-2 (NS) | True | True | True | False | True | True | True | True | True |
Accuracy | 80% | 80% | 50% | 50% | 80% | 80% | 80% | 80% | 70% |
Model Name | Dataset/Study Case | Accuracy (%) | Parameter (millions) | Best Threshold |
---|---|---|---|---|
IndoNLU IndoBERT-base-p2 (Benchmark)5 | Entailment Wrete (IndoNLU) | 78.68 | 124.5 | -
IndoNLU fastText CC-ID (6L) (Benchmark) | Entailment Wrete (IndoNLU) | 61.13 | 15.1 | -
IndoNLU IndoBERT-large-p2 (Benchmark) | Entailment Wrete (IndoNLU) | 80.30 | 335.2 | -
EiAP-BC Embedding Token | Entailment Wrete (IndoNLU) | 70 | 5.07 | 0.11 |
EiAP-BC 3 Embedding Concatenation | Entailment Wrete (IndoNLU) | 81* | 6.9 | 0.52 |
EiAP-BC 3 Embedding Average | Entailment Wrete (IndoNLU) | 73 | 3.2 | 0.12 |
ESIM+Elmo (Stanford)6 | SNLI English | 89 | 8 | - |
EiAP-BC Embedding Token | SNLI English | 74 | 25 | 0.5 |
EiAP-BC Fasttext | SNLI English | 86* | 3.2 | 0.5 |
EiAP-BC 3 Embedding Concatenation | SNLI English | 83 | 6.8 | 0.5 |
EiAP-BC 3 Embedding Average | SNLI English | 84 | 3.2 | 0.5 |
RE2 (Benchmark) [23] | SciTAIL | 86 | - | - |
Hierarchical BiLSTM Max Pooling (Benchmark) [66] | SciTAIL | 86 | - | - |
EiAP-BC Embedding Token | SciTAIL | 71 | 18.6 | 0.5 |
EiAP-BC Embedding Fasttext | SciTAIL | 85.5* | 3.2 | 0.5
EiAP-BC 3 Embedding Concatenation | SciTAIL | 84.2 | 6.8 | 0.5 |
EiAP-BC 3 Embedding Average | SciTAIL | 84.5 | 3.7 | 0.5 |
Finetuned RoBERTa (SotA) [34] | IndoNLI | 60.7 | - | - |
EiAP-BC Embedding Token | IndoNLI | 46.62 | 12.2 | - |
EiAP-BC Embedding Fasttext | IndoNLI | 65 | 3.2 | - |
EiAP-BC 3 Embedding Concatenation | IndoNLI | 66* | 6.8 | - |
EiAP-BC 3 Embedding Average | IndoNLI | 63 | 3.2 | - |
Removed Layers | Information (Using Emoji-Text Scenario) 3 Embedding Average | Accuracy (%) | Precision Macro Average (%) | F1-score Macro Average (%) | Best Threshold |
---|---|---|---|---|---|
Normal (Base) | EiAP-BC 3 Embedding Concat | 87.99 | 85.98 | 83.58 | 0.59 |
Embedding* | Without using the weighted pre-trained embedding | –2.84 | –4.66 | –3.52 | 0.78 |
Original Embedding | Without original embedding for concatenation before the second encoding layer | –0.2 | –0.74 | –0.11 | 0.69 |
Encoding*** | Without using the first and second encoding layer | –0.46 | –1.78 | –0.04 | 0.66 |
Attention | Without inter-attention layer | –0.24 | –1.35 | –0.16 | 0.67 |
CNN** | Without CNN layer | –0.6 | –0.21 | –1.24 | 0.77
Enc1 + Att | Without first encoding and attention layer | –0.3 | –0.8 | –0.29 | 0.69 |
Enc2 + Att | Without a second encoding and attention layer | –0.36 | –1.17 | –0.21 | 0.77 |
Att + CNN | Without attention and CNN layer | –0.1 | –12.7 | 0.42 | 0.7 |
Concat Att Features | Without concatenation of the attention layer with original embedding, subtraction, and multiplication | –0.13 | –0.5 | –0.31 | 0.68 |
Removed Layers | Information (Using Emoji-Text Scenario) 3 Embedding Concat | Accuracy (%) | Precision Macro Average (%) | F1-score Macro Average (%) | Best Threshold |
---|---|---|---|---|---|
Original Embedding and Concat | Without original embedding for concatenation before the second encoding layer (concat embedding) | –0.2 | 0.04 | –0.5 | 0.76 |
Encoding*** | Without using the first and second encoding layers (concat embedding) | –0.53 | –1.47 | –0.39 | 0.66 |
Attention | Without an inter-attention layer (concat embedding) | –0.2 | 0.04 | –0.51 | 0.74 |
CNN** | Without CNN layer (concat embedding) | –0.65 | –1.31 | –0.77 | 0.67 |
Enc1 + Att | Without first encoding and attention layer (concat embedding) | –0.13 | –0.64 | –0.03 | 0.62 |
Enc2 + Att**** | Without a second encoding and attention layer (concat embedding) | –0.57 | –1.96 | –0.15 | 0.64 |
Att + CNN | Without attention and CNN layer (concat embedding) | –0.07 | –0.26 | –0.08 | 0.79 |
Concat Att Features | Without concatenation of the attention layer with original embedding, subtraction, and multiplication (concat embedding) | –0.16 | –0.37 | –0.23 | 0.76 |
Removed Layers | Information (Using Emoji-Symbol Scenario) 4 Embedding Average | Accuracy (%) | Precision Macro Average (%) | F1-score Macro Average (%) | Best Threshold |
---|---|---|---|---|---|
Normal (Base) | EiAP 4 Embedding Average | 85.94 | 83.75 | 80.20 | 0.77 |
Embedding* | Without using the weighted pre-trained embedding | –3.15 | –5.36 | –2.76 | 0.89 |
Original Embedding | Without original embedding for concatenation before the second encoding layer | –0.28 | –0.87 | –0.12 | 0.65 |
Encoding | Without using the first and second encoding layer | –0.16 | –0.49 | –0.08 | 0.74 |
Attention | Without inter-attention layer | –0.15 | –0.49 | –0.06 | 0.81 |
CNN*** | Without CNN layer | –0.37 | –0.94 | –0.28 | 0.70 |
Enc1 + Att | Without first encoding and attention layer | –0.03 | –0.5 | –0.19 | 0.66 |
Enc2 + Att | Without a second encoding and attention layer | –0.06 | –1.42 | 1.4 | 0.66 |
Att + CNN | Without attention and CNN layer | 0.0 | –0.61 | –0.79 | 0.72 |
Concat Att Features** | Without concatenation of the attention layer with original embedding, subtraction, and multiplication | –0.51 | –1.58 | –0.16 | 0.79
Removed Layers | Information (Using Emoji-Symbol Scenario) 4 Embedding Concat | Accuracy (%) | Precision Macro Average (%) | F1-score Macro Average (%) | Best Threshold |
---|---|---|---|---|---|
Original Embedding and Concat | Without original embedding for concatenation before the second encoding layer (concat embedding) | –0.19 | –0.21 | –0.32 | 0.77 |
Encoding*** | Without using the first and second encoding layers (concat embedding) | –0.57 | –1.63 | –0.27 | 0.59 |
Attention | Without an inter-attention layer (concat embedding) | 0.11 | –0.92 | 0.87 | 0.7 |
CNN** | Without CNN layer (concat embedding) | –2.6 | –0.93 | –0.84 | 0.59 |
Enc1 + Att | Without first encoding and attention layer (concat embedding) | –0.26 | –1.16 | 0.14 | 0.76 |
Enc2 + Att | Without a second encoding and attention layer (concat embedding) | –0.25 | –0.64 | –0.19 | 0.78 |
Att + CNN | Without attention and CNN layer (concat embedding) | –0.19 | 0.68 | –0.79 | 0.67 |
Concat Att Features**** | Without concatenation of the attention layer with original embedding, subtraction, and multiplication (concat embedding) | –0.27 | –1.16 | 0.1 | 0.63 |
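Note that, in the four ablation tables above, the removed-layer rows report signed differences relative to the corresponding base row rather than absolute scores. For example, removing the pre-trained embedding weights in the emoji-text concatenation setting corresponds to an absolute accuracy of

$$\mathrm{Acc}_{\text{ablated}} = \mathrm{Acc}_{\text{base}} + \Delta = 87.99\% + (-2.84\%) = 85.15\%.$$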
5 Conclusion
Acknowledgments
Footnotes
References