NUST Institutional Repository

Semantic Analysis of Micro-blogs

dc.contributor.author Niazi, Umar Hayat Khan
dc.date.accessioned 2020-11-05T10:02:09Z
dc.date.available 2020-11-05T10:02:09Z
dc.date.issued 2011
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/10245
dc.description Supervisor: Dr. Khalid Latif en_US
dc.description.abstract Micro-blogging platforms have become vital communication channels on the internet. Individuals use them to keep in touch with friends and family, whereas corporate users use them to introduce new products and services to their clients. Spammers also exploit the global reach of micro-blogs to spread irrelevant and offensive content such as viruses and pornography. By polluting these platforms with unrelated messages, spammers waste resources and user time and annoy legitimate users. Identifying an irrelevant message on such platforms is a challenging task. A user who sends legitimate messages most of the time and only occasionally sends junk replies cannot be declared a spammer. Similarly, public messages such as advertisements can be considered irrelevant by one reader but relevant by another, depending on their personal interests. These messages contain named entities, URLs, events, facts and figures, and the named entities have different relationships among them. With current state-of-the-art semantic information extraction and analysis techniques, it has become possible to extract these named entities and their relationships with each other. In this research we have implemented an algorithm to detect irrelevant messages on Twitter, a popular micro-blogging platform. Our algorithm uses semantic information extraction and analysis techniques to compute the relevance among different parts of a message and compares it with a user-set threshold. Messages with higher similarity among their components are most likely relevant, and messages with lower similarity are most likely irrelevant. We have validated our algorithm for detecting irrelevant messages on a dataset collected from Twitter. Our algorithm achieved a precision of up to 97%, with recall and F-measure of up to 100% and 97% respectively. en_US
dc.publisher SEECS, National University of Sciences and Technology, Islamabad. en_US
dc.subject Information Technology, Micro-blogs en_US
dc.title Semantic Analysis of Micro-blogs en_US
dc.type Thesis en_US
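
The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the record does not say how relevance is computed, so Jaccard overlap between term sets stands in for the semantic similarity measure, and every function name below (jaccard, message_relevance, is_relevant, precision_recall_f1) is hypothetical. The last function shows how precision, recall and F-measure are conventionally computed from predicted and true labels.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Term overlap between two parts (assumed stand-in for semantic similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def message_relevance(parts: list) -> float:
    """Mean pairwise similarity among the parts of one message."""
    pairs = list(combinations(parts, 2))
    if not pairs:
        # Assumption: a message with a single part is treated as coherent.
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def is_relevant(parts: list, threshold: float) -> bool:
    """Compare the relevance score with a user-set threshold."""
    return message_relevance(parts) >= threshold


def precision_recall_f1(predicted: list, actual: list) -> tuple:
    """Standard metrics; True marks the class being detected."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical messages, each split into term sets (e.g. text terms and link terms).
coherent = [{"nasa", "launch", "rocket"}, {"nasa", "rocket"}]
unrelated = [{"free", "pills"}, {"election", "results"}]
print(is_relevant(coherent, threshold=0.5))
print(is_relevant(unrelated, threshold=0.5))
```

In this sketch the threshold trades precision against recall: raising it marks more messages as irrelevant, and lowering it marks fewer.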


This item appears in the following Collection(s)

  • MS [432]
