Abstract:
Communication of information is critical to the success of any project, particularly in the construction industry. A construction project generates a large amount of data daily, most of it in textual form. This data is traditionally analyzed manually, a process that is slow, costly, and error-prone. Text mining has been used extensively to automate content analysis of unstructured text, but it falls short of understanding human language. Natural Language Processing (NLP) is an AI-based approach that allows a computer to interpret text in a more human-like manner. Rule-based NLP was chosen because the application domain is narrow and specific, a setting in which hand-crafted rules can yield better results. A framework for automated information extraction from construction correspondence was developed, together with a rule-based NLP system that implements it. The system takes as input the corpus, manually defined information extraction rules, an ontology, and a standard letter format. Field experts were asked what information should be extracted from a letter, and their answers were used as the column headers of a summary table. For validation, the system was fed sixty letters from an existing project, and the results were verified by field experts. Performance was measured with the F1 score, the harmonic mean of precision and recall. After repeated tuning of the rules, the score exceeded 95%, indicating that the framework can be implemented to automate correspondence content analysis in construction projects.
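To make the extraction and evaluation steps more concrete, the following is a minimal Python sketch. It is not the authors' system. The abstract does not specify the rule syntax, the ontology, or the summary-table headers, so this sketch assumes that each rule is a regular expression keyed by a hypothetical header (`reference_no`, `date`, `subject`). It also assumes that an extracted value counts as correct only when it exactly matches the expert-verified value. The ontology and corpus inputs are omitted. The F1 score is computed as $F_1 = 2PR/(P+R)$, where $P$ is precision and $R$ is recall.

```python
import re

# Hypothetical information-extraction rules: each summary-table header maps to
# a regex whose first capture group is the value to extract from a letter.
# These headers and patterns are illustrative assumptions, not the paper's rules.
RULES = {
    "reference_no": re.compile(r"Ref(?:erence)?\s*No\.?\s*[:\-]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*[:\-]\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE),
    "subject": re.compile(r"Subject\s*[:\-]\s*(.+)", re.IGNORECASE),  # '.' stops at end of line
}


def extract(letter: str) -> dict:
    """Apply every rule to one letter and return one row of the summary table."""
    row = {}
    for header, pattern in RULES.items():
        match = pattern.search(letter)
        row[header] = match.group(1).strip() if match else None
    return row


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted: list, gold: list) -> float:
    """Score extracted rows against expert-verified rows, field by field.

    A wrong non-empty prediction counts as both a false positive (wrong value
    emitted) and a false negative (correct value missed).
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        for header in RULES:
            p, g = pred.get(header), ref.get(header)
            if p is not None and p == g:
                tp += 1
            else:
                if p is not None:
                    fp += 1
                if g is not None:
                    fn += 1
    return f1_score(tp, fp, fn)
```

For example, calling `extract` on a letter whose header lines read `Ref No.: ABC/123`, `Date: 12/03/2021`, and `Subject: Delay in delivery of steel` fills all three fields. Passing the extracted rows and the expert-verified rows to `evaluate` then gives the F1 score over all fields. Counting a wrong value as both a false positive and a false negative is a common convention for slot filling. It is one choice among several, and the paper may count errors differently.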