Abstract:
Twitter is a social networking website that has become part of the daily routine of people
around the world. As of the first quarter of 2018, it had more than 336 million active users.
Twitter users share their opinions on any topic, or what is going on in their lives, with the
people who follow them by posting tweets, short messages of at most 140 characters. Given the
number of Twitter users, the volume of tweets is very large. Most Twitter users rely on
hashtags, a hash (#) symbol followed by a keyword or phrase, to categorize their tweets and
to make them easier to find in Twitter search. Hashtags also help keep the text of a tweet
short and to the point, thus saving time. However, only a small share of all tweets, as few
as 8%, contain hashtags, which compromises the quality of search results; this observation
motivated the present work. Most hashtags have a very short life span, because they are often
used as trends and the majority of them are used only during specific days. Trending hashtags
propagate easily among Twitter users through frequent usage, which eventually creates a
community with similar interests. Since Twitter introduced hashtag search, users and business
marketers have adopted the practice of using hashtags to organize their tweets into
inter-related discussions and to make those tweets easier to search. This drew our interest
and led us to the problem addressed here: recommending appropriate hashtags is vital both for
the user and for general search on Twitter.
We use a method for recommending hashtags for tweets that relies on Latent Dirichlet
Allocation (LDA). LDA is used to assign latent topics to the users' tweets, so that the
recommendations reflect each user's interest in specific topics. Associations and relatedness
between hashtags are determined from the co-occurrence of hashtags across tweets. A hashtag
may be used together with one or more other hashtags, or on its own, but each hashtag must
belong to one or more specific topics, e.g., Politics, Fashion, or Sports. Hashtags with low
frequencies were discarded so that they would not degrade the efficiency of the algorithm. We
use Probabilistic Matrix Factorization as a collaborative filtering technique to obtain
feature vectors for users and hashtags, from which hashtag recommendations are made.
Non-English tweets were discarded, and only English-language tweets were used for
implementation and evaluation. In addition, only the text of the tweets and the users'
screen names were used, and all other information
returned from the Twitter API was not used.
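
As an illustration of the co-occurrence step described above, a minimal sketch might look as follows. The regular expression, the function names, the `min_freq` threshold, and the sample tweets are illustrative assumptions rather than details of the implementation evaluated in this work.

```python
from collections import Counter
from itertools import combinations
import re

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the distinct hashtags of a tweet, lowercased and sorted."""
    return sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})


def cooccurrence_counts(tweets, min_freq=2):
    """Count hashtag frequencies and pairwise co-occurrences.

    Hashtags seen in fewer than `min_freq` tweets are dropped before
    pairs are counted (the threshold value is an assumption).
    """
    freq = Counter()
    tagged = []
    for text in tweets:
        tags = extract_hashtags(text)
        tagged.append(tags)
        freq.update(tags)  # each tweet contributes at most 1 per hashtag

    kept = {tag for tag, n in freq.items() if n >= min_freq}

    pairs = Counter()
    for tags in tagged:
        frequent = [t for t in tags if t in kept]
        # Tags are sorted, so each unordered pair has one canonical key.
        pairs.update(combinations(frequent, 2))
    return freq, pairs


# Hypothetical sample tweets, for illustration only.
tweets = [
    "Great match tonight #Sports #Football",
    "Transfer rumours again #football #sports",
    "Election debate recap #Politics",
    "New season kits revealed #Fashion #Football",
]
freq, pairs = cooccurrence_counts(tweets, min_freq=2)
```

In this toy input, only the hashtags that appear in at least two tweets survive the frequency filter, so the pair counts relate frequent hashtags to one another and ignore rare ones.
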
The advantage of the proposed approach is that it identifies the topics of a user's interest
and recommends general hashtags associated with those topics. The objective is to help
Twitter users choose appropriate hashtags that match their interest in a specific topic. To
validate the effectiveness of the proposed approach, a set of experiments was performed on a
collected dataset of tweets, comparing it with previously proposed approaches in a similar
context.