Abstract:
Web crawlers, bots, and scrapers roam the web, sniffing, stealing, and harvesting data of interest from web applications. Web applications listen to all incoming requests for data and return responses in plain text without obfuscation, which bots can use to steal information. Bots exploit the patterns a website uses to present data to the user, relying on the DOM (Document Object Model) and powerful regular expressions (regex) to extract the required data from the full HTML content. Despite the adoption of SSL and HTTPS, applications remain vulnerable because of how they present data to clients. The goal of bots is to gather large volumes of data that can be used for malicious purposes or mass advertising. This is usually done by obtaining personal information such as email addresses, phone numbers, and Facebook IDs. With this information, attackers can send spam, commit identity theft, steal information (such as articles and research papers), and fake human activity. A safer web space demands efficient detection of such data theft attacks and the adoption of preventive measures to minimize the threat.
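To make the attack pattern concrete, the following is a minimal illustrative sketch (not taken from the paper) of how a scraper can pull personal data straight out of un-obfuscated HTML using regular expressions; the HTML snippet and the patterns are assumptions chosen for demonstration only.

```python
# Illustrative sketch: extracting personal data from plain, un-obfuscated HTML
# with regular expressions, as a scraping bot might do. The snippet and regex
# patterns below are hypothetical examples, not part of the original work.
import re

html = """
<div class="profile">
  <p>Contact: jane.doe@example.com</p>
  <p>Phone: +1-202-555-0173</p>
  <a href="https://facebook.com/jane.doe.42">Facebook</a>
</div>
"""

# Patterns a bot might use to harvest emails, phone numbers, and profile IDs.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")
FB_RE = re.compile(r"facebook\.com/([\w.]+)")

print("Emails:      ", EMAIL_RE.findall(html))
print("Phones:      ", PHONE_RE.findall(html))
print("Facebook IDs:", FB_RE.findall(html))
```

Because the server returns the page as readable plain text, no parsing of the DOM beyond simple pattern matching is needed for the bot to succeed, which is the vulnerability the abstract highlights.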