TY - CHAP
T1 - The Impact of Data Augmentation on Sentiment Analysis of Translated Textual Data
AU - Omran, Thuraya
AU - Sharef, Baraa
AU - Grosan, Crina
AU - Li, Yongmin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Sentiment analysis is an application of natural language processing that requires an abundance of data, which is not always available. Data augmentation addresses this scarcity by creating synthetic training data from existing samples rather than collecting new ones. It boosts model performance, especially for deep learning models. Despite this, it has attracted very little attention from researchers in the Arabic NLP community, even though Arabic and its dialects are languages with scarce resources. In this study, an augmentation technique called random swap was applied with an LSTM deep learning model to classify three parallel datasets: Bahraini dialects, Modern Standard Arabic, and English. The results show that applying the augmentation technique improved the LSTM model by 14.06%, 12.57%, and 11.04% on the Bahraini dialects, Modern Standard Arabic, and English datasets, respectively, compared with no augmentation.
AB - Sentiment analysis is an application of natural language processing that requires an abundance of data, which is not always available. Data augmentation addresses this scarcity by creating synthetic training data from existing samples rather than collecting new ones. It boosts model performance, especially for deep learning models. Despite this, it has attracted very little attention from researchers in the Arabic NLP community, even though Arabic and its dialects are languages with scarce resources. In this study, an augmentation technique called random swap was applied with an LSTM deep learning model to classify three parallel datasets: Bahraini dialects, Modern Standard Arabic, and English. The results show that applying the augmentation technique improved the LSTM model by 14.06%, 12.57%, and 11.04% on the Bahraini dialects, Modern Standard Arabic, and English datasets, respectively, compared with no augmentation.
KW - Bahraini dialects
KW - Data augmentation
KW - LSTM
KW - Modern Standard Arabic
KW - translation-based
UR - http://www.scopus.com/inward/record.url?scp=85158164339&partnerID=8YFLogxK
U2 - 10.1109/ITIKD56332.2023.10099851
DO - 10.1109/ITIKD56332.2023.10099851
M3 - Conference paper
AN - SCOPUS:85158164339
T3 - 2023 International Conference on IT Innovation and Knowledge Discovery, ITIKD 2023
BT - 2023 International Conference on IT Innovation and Knowledge Discovery, ITIKD 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on IT Innovation and Knowledge Discovery, ITIKD 2023
Y2 - 8 March 2023 through 9 March 2023
ER -