Abstract:
Social networking sites such as Facebook, Twitter, and LinkedIn are becoming
increasingly popular because they succeed in connecting people and have become
a major means of information dissemination, communication, and entertainment.
The popularity of these services motivates us to study the characteristics of
online social networks. Twitter is a micro-blogging service launched in July
2006 by Jack Dorsey. By 2012, more than 500 million users had subscribed to
Twitter, generating over 340 million posts and 1.6 billion search queries each
day. Due to the enormous number of posts generated on Twitter, it is often
difficult to understand what people are saying about a specific topic. Post
summarization is a technique to extract short summaries from the collection
of posts on a particular topic. In this research work, I used simple
K-Means clustering, an unsupervised machine learning technique, to
perform post summarization on Twitter status updates. Three different
distance metrics (Euclidean, cosine similarity, and Manhattan) were used in
K-Means clustering. The clustering results were evaluated against the previous
best post summarization algorithms, and the results showed that clustering
with Euclidean distance performs better than the existing post summarization
algorithms in terms of precision, recall, and F-measure.
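
As a rough illustration of the approach summarized above, the following is a
minimal sketch of K-Means based post summarization with a pluggable distance
function. It is not the implementation evaluated in this work. The
term-frequency representation, the choice of the post nearest to each centroid
as the summary, the deterministic initialization, and all function names
(vectorize, kmeans, summarize, and the three distance helpers) are assumptions
made only for this example.

```python
import math
from collections import Counter


def vectorize(posts):
    """Map each post to a term-frequency vector over a shared vocabulary."""
    vocab = sorted({w for p in posts for w in p.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for p in posts:
        v = [0.0] * len(vocab)
        for word, count in Counter(p.lower().split()).items():
            v[index[word]] = float(count)
        vectors.append(v)
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, so that smaller still means closer."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def kmeans(vectors, k, distance, iterations=20):
    # Assumption: the first k vectors seed the centroids so that the
    # example is reproducible; real runs usually use random restarts.
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for step in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        if step > 0 and new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        # Centroids are updated as the mean for every metric; this is a
        # simplification (the mean is only the exact minimizer for
        # squared Euclidean distance).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def summarize(posts, k, distance=euclidean):
    """Return one representative post per non-empty cluster."""
    vectors = vectorize(posts)
    assignment, centroids = kmeans(vectors, k, distance)
    summary = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: distance(vectors[i], centroids[c]))
            summary.append(posts[best])
    return summary


if __name__ == "__main__":
    sample = ["goal goal match", "goal match", "rain storm", "rain storm wind"]
    for metric in (euclidean, cosine_distance, manhattan):
        print(metric.__name__, summarize(sample, 2, metric))
```

Passing a different distance function to summarize is all that is needed to
compare the three metrics on the same set of posts. The selected posts can then
be scored against reference summaries using precision, recall, and F-measure.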