NUST Institutional Repository

Document Zone Identification and Classification

dc.contributor.author Shamim, Nauman
dc.date.accessioned 2020-11-02T08:25:27Z
dc.date.available 2020-11-02T08:25:27Z
dc.date.issued 2015
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/8262
dc.description Supervisor: Dr. Khalid Latif en_US
dc.description.abstract Automated data extraction from resumes has a variety of applications, such as online recruiting and human resource management. An efficient technique for resume zone identification and classification can help data extraction from resumes and assist resume analysis against a job description. Segmenting a resume into zones is a challenging problem because the order and number of resume sections, their length, and their content representation do not follow a set model. Classifying resume segments is difficult because the classes of possible resume segments are not well defined in any previous work. Another issue is that the lengths of constituent segments vary widely across different resumes. Compared with text classification and segmentation, the problem of resume zone identification and classification is relatively unexplored, and the approaches already proposed have limitations in terms of the order of resume sections and classes. Based on the fact that a resume consists of sections and each section is preceded by a section heading, we propose a technique to efficiently segment a resume into its constituent segments. The textual and structural features of a section heading, along with its named entities, are used to detect section boundaries. To detect the content type of individual segments, we trained an SVM classifier over word vectors. We further trained the classifier over word vectors of named entities and root words from the contents of constituent resume segments. For training and testing, we developed a data set of 1730 segments from 300 resumes; at present there is no suitable data set available for research in this area. The work shows that the proposed technique segments resumes with high precision (0.91) and recall (0.85). en_US
dc.publisher SEECS, National University of Science & Technology en_US
dc.subject Document Zone, Identification, Classification, Computer Sciences en_US
dc.title Document Zone Identification and Classification en_US
dc.type Thesis en_US
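
The abstract's segmentation idea, that each resume section is preceded by a section heading, can be illustrated with a minimal sketch. This is not the thesis's implementation: the keyword list `HEADING_KEYWORDS`, the four-word length limit, and the capitalization test are hypothetical stand-ins for the textual and structural features the abstract mentions, and named-entity features and the SVM classification step are not covered.

```python
import re

# Hypothetical heading vocabulary, chosen for illustration only.
HEADING_KEYWORDS = {
    "education", "experience", "skills", "projects",
    "objective", "references", "publications", "certifications",
}


def is_heading(line):
    """Guess whether a line is a section heading from simple cues."""
    text = line.strip().rstrip(":").strip()
    if not text:
        return False
    words = re.findall(r"[A-Za-z]+", text)
    # Structural cue (assumed): headings are short.
    if not words or len(words) > 4:
        return False
    # Textual cue (assumed): the line contains a known heading keyword.
    has_keyword = any(w.lower() in HEADING_KEYWORDS for w in words)
    # Structural cue (assumed): headings are upper case or title case.
    looks_like_title = text.isupper() or text.istitle()
    return has_keyword and looks_like_title


def segment_resume(text):
    """Split resume text into (heading, body_lines) pairs.

    Lines before the first detected heading are returned with a
    heading of None.
    """
    segments = []
    heading, body = None, []
    for line in text.splitlines():
        if is_heading(line):
            # A new heading closes the segment collected so far.
            if heading is not None or body:
                segments.append((heading, body))
            heading, body = line.strip().rstrip(":").strip(), []
        elif line.strip():
            body.append(line.strip())
    if heading is not None or body:
        segments.append((heading, body))
    return segments


sample = """Jane Doe
jane@example.com

EDUCATION
MS Computer Science, 2015

Work Experience:
Software engineer, 2015 to 2018

Skills
Python, Java
"""

segments = segment_resume(sample)
for heading, body in segments:
    print(heading, body)
```

Each returned pair could then be passed to a separate classifier to label its content type. A fixed keyword list like this one would miss unusual headings, which is presumably why the abstract describes learned features rather than fixed rules.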


This item appears in the following Collection(s)

  • MS [375]
