Abstract:
Automated data extraction from resumes has a variety of applications, such
as online recruiting and human resource management. An efficient technique for
resume zone identification and classification can help data extraction from
resumes and assist resume analysis against a job description. The segmentation
of a resume into zones is a challenging problem because the order and number of
resume sections, their lengths, and their content representation do not follow a
set model. The classification of resume segments is difficult because the classes for
possible resume segments are not well defined in any previous work. Another
issue is that the lengths of constituent segments vary widely across different
resumes. In comparison to text classification and segmentation, the
problem of resume zone identification and classification is relatively unexplored,
and the approaches already proposed have limitations in terms of the order
of resume sections and the classes. Based on the fact that a resume consists
of sections and each section is preceded by a section heading, we propose
a technique to efficiently segment a resume into its constituent segments.
The textual and structural features, along with named entities, of a section
heading are used to detect section boundaries. To detect the content type
of individual segments, we trained an SVM classifier over word vectors. We
further trained the classifier over word vectors of named entities and root
words from the contents of constituent resume segments. For training and testing,
we developed a data set of 1730 segments from 300 resumes, as presently there
is no suitable data set available for research in this area. The work shows
that the proposed technique segments resumes with high precision (0.91) and
recall (0.85).
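
The heading-based segmentation idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the keyword list, the four-word length limit, and the capitalization test are assumptions chosen for illustration, and the named-entity features mentioned in the abstract are left out entirely. The names `HEADING_KEYWORDS`, `looks_like_heading`, and `segment_resume` are hypothetical.

```python
import re

# Hypothetical heading vocabulary (an assumption; the abstract does not list the classes).
HEADING_KEYWORDS = {
    "education", "experience", "skills", "projects",
    "objective", "summary", "certifications", "references",
}


def looks_like_heading(line: str) -> bool:
    """Guess whether a line is a section heading from simple textual and structural cues."""
    text = line.strip().rstrip(":")
    if not text:
        return False
    words = re.findall(r"[A-Za-z]+", text)
    # Structural cue (assumed): headings are short.
    if not words or len(words) > 4:
        return False
    # Textual cue (assumed): the line contains a known heading keyword.
    has_keyword = any(w.lower() in HEADING_KEYWORDS for w in words)
    # Structural cue (assumed): headings are written in upper case or title case.
    is_emphasized = text.isupper() or text.istitle()
    return has_keyword and is_emphasized


def segment_resume(lines):
    """Split lines into (heading, body_lines) pairs.

    Text that appears before the first detected heading is collected
    under the placeholder heading "header".
    """
    segments = [("header", [])]
    for line in lines:
        if looks_like_heading(line):
            segments.append((line.strip().rstrip(":"), []))
        elif line.strip():
            segments[-1][1].append(line.strip())
    return segments


resume = [
    "Jane Doe",
    "jane@example.com",
    "EDUCATION",
    "BSc Computer Science, 2015",
    "Work Experience:",
    "Software engineer at Acme, 2015-2020",
    "Skills",
    "Python, SQL",
]
for heading, body in segment_resume(resume):
    print(heading, body)
```

Each detected heading opens a new segment, so the order and number of sections need not follow a fixed model. In a fuller pipeline, the body of each segment would then be passed to a classifier to determine its content type.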