Abstract:
The popularity of social media platforms is increasing day by day as it becomes
easier for internet users to generate, share, and describe their content. One of
the easiest ways to describe media content is through tags: keywords freely
assigned by users to describe their web content. However, images uploaded by
users often have no tags, incorrect tags, or tags that are too general.
In recent years, retrieving images through tags has become popular, since tags
make images easy to access. Different methods, e.g., tag co-occurrence,
clustering-based, and hybrid techniques, have been introduced over time to help
users generate tags and to improve tag quality. However, most research focuses
on the visual features of images and ignores their semantic content. We
therefore propose a model that works not only with the visual features of
images but also with textual features, namely the user-provided tags.
We use the YOLOv3 model to detect objects in images and capture their visual
content. A ResNet-18 model is used to extract image features, and similar
images are grouped together using k-means. A semantic embedding model, word2vec
with skip-gram, is then used to suggest additional tags by taking into
consideration the user-provided tags of similar images. The YFCC100M dataset is
used for the experiments. The experimental results show that combining the
image-based models with the semantic embedding model improves the accuracy of
the newly suggested tags by up to 42%.