Text Data Augmentation Using Generative Adversarial Networks – A Systematic Review

Authors

  • Kanishka Silva, University of Wolverhampton
  • Burcu Can, University of Stirling
  • Raheem Sarwar, Manchester Metropolitan University
  • Frederic Blain, Tilburg University
  • Ruslan Mitkov, University of Wolverhampton

DOI:

https://doi.org/10.33919/JCAL.23.1.1

Keywords:

Text Data Augmentation, Generative Adversarial Networks, Adversarial Training, Text Generation

Abstract

Insufficient data is one of the main obstacles in natural language processing tasks, and the most common remedy is to collect enough data to optimise the model. However, because neural models are data-hungry, recent research is moving towards increasing the number of training examples. Data augmentation is an emerging area that aims to increase the diversity of data and boost a model’s performance without collecting new data. The limitations of data augmentation, especially for textual data, stem mainly from the discrete nature of language data. Generative Adversarial Networks (GANs) were initially introduced for computer vision applications, aiming to generate highly realistic images by learning image representations. Recent research has focused on using GANs for text generation and augmentation. This paper presents the theoretical background of GANs and their use for text augmentation, alongside a systematic review of recent textual data augmentation applications such as sentiment analysis, low-resource language generation, hate speech detection and fraud review analysis. It also discusses the challenges in current research and future directions of GAN-based text augmentation, to pave the way for researchers, especially those working with scarce text resources.

Published

18.07.2023

How to Cite

Silva, K., Can, B., Sarwar, R., Blain, F., & Mitkov, R. (2023). Text Data Augmentation Using Generative Adversarial Networks – A Systematic Review. Journal of Computational and Applied Linguistics, 1, 6–38. https://doi.org/10.33919/JCAL.23.1.1