dc.description.abstract |
The legal domain remains one of the areas with many opportunities for improvement and innovation through computational
advancements. In Pakistan, in the recent past, the courts have made reported
judgments available to the public. As this data continues to grow at a rapid
rate, it has become essential to process this large body of data to better
meet the requirements of the respective stakeholders. However, extracting
the required information from this unstructured legal text is the main challenge.
Therefore, our goal is to build a machine learning system that can automatically extract information from these publicly available judgments of the
Supreme Court. Once this information has been extracted, it can then be
used by lawyers, judges, and civilians, as well as for policy making in
Pakistan. In this work, a total of thirteen entity types are extracted,
including dates, case numbers, respondent names, reference cases,
FIR numbers, person names, and references. A labeled dataset is created using
the publicly available legal judgments from the Supreme Court of Pakistan
according to annotation guidelines. A pre-trained BERT model is then further
trained and fine-tuned on the created dataset for Named Entity Recognition
to extract the desired information. Our model also improves on the results for
a similar available dataset of judgments from the Lahore High Court,
which has a smaller number of labels. |
en_US |