Abstract:
The web contains billions of pages today (an estimated 550 billion documents as of 2001), and many of them must be searched to find the required information. This information, however, is written in natural language text and is not actually understood by computer programs. I-ANSWER goes beyond keyword searching into the field of natural language processing in order to extract relevant information from the web and present it to the user in a concise form. The system first retrieves relevant information from the web through the Google and Yahoo! search engines, using a new algorithm for determining relevance. The relevant text is stored in a corpus. I-ANSWER then uses the parser developed by the Stanford University NLP research group to extract pieces of information from the text in the corpus, in the form of entities and relationships between pairs of entities. The extracted information can be used for question answering, summary generation, and ontology generation. This research project has demonstrated that a web information extraction system is both feasible and useful.
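To illustrate how information extracted as entities and relationships could support question answering, here is a minimal sketch. It assumes each relationship is stored as a (subject, relation, object) triple and that a question has already been mapped to a subject and a relation; `Triple`, `TripleStore`, `add`, and `answer` are hypothetical names chosen for illustration, not part of I-ANSWER or the Stanford parser.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical record for one relationship between two entities."""
    subject: str
    relation: str
    obj: str


class TripleStore:
    """Hypothetical in-memory index of extracted triples (an assumption, not I-ANSWER's design)."""

    def __init__(self) -> None:
        # Maps a lowercased (subject, relation) pair to every object seen with it.
        self._index: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add(self, triple: Triple) -> None:
        key = (triple.subject.lower(), triple.relation.lower())
        self._index[key].append(triple.obj)

    def answer(self, subject: str, relation: str) -> list[str]:
        # Assumes the question was already reduced to (subject, relation);
        # .get() avoids inserting empty entries for unknown keys.
        return list(self._index.get((subject.lower(), relation.lower()), []))


if __name__ == "__main__":
    store = TripleStore()
    store.add(Triple("Google", "founded_by", "Larry Page"))
    store.add(Triple("Google", "founded_by", "Sergey Brin"))
    print(store.answer("google", "founded_by"))
```

Keying on the lowercased (subject, relation) pair keeps lookups case-insensitive and constant-time; a real system would also need entity normalization and a way to map free-text questions onto relations, which this sketch does not attempt.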