A Federated Approach to Predict Emojis in Hindi Tweets

Published:

Abstract:

The use of emojis adds a visual modality to textual communication. Predicting emojis, however, is challenging for computational approaches because emoji use is skewed, clustering into frequently used and rarely used emojis. Much of the research on emoji use has focused on high-resource languages and has framed emoji prediction around traditional server-side machine learning approaches, which can introduce privacy concerns because user data is transmitted to central storage. In this paper, we provide a benchmark dataset of 118k tweets for emoji prediction in Hindi. The dataset will be made publicly available upon request. We further show that a privacy-preserving approach, Federated Learning, exhibits performance comparable to traditional server-side transformer models.