Canonicalizing Organization Names for Recruitment Domain

Published: 15 January 2020


The online recruitment industry relies on various Knowledge Bases (KBs) to power search and recommendation systems. These KBs comprise a large volume of diverse, non-standard named entities, because they are built from vast amounts of unstructured user-generated content (mostly CVs). The non-standard representation of each entity creates a significant vocabulary gap in the KB, which results in redundancy, incompleteness, and ambiguity in the retrieved information. The problem is even more challenging in domains where external sources of context do not exist.
To address these challenges, we propose a two-tier architecture that (a) finds the distance parameter for clustering entities using a novel pairwise similarity between all entity mentions, and (b) then uses these similarity scores to create canonical clusters, each representing a unique entity in the KB. Our experiments on proprietary data of 25,602 unique companies and 23,690 unique institutes show that the pairwise similarity score from a Siamese network (97% and 82% F1-score) outperforms standard string similarity measures. Finally, clustering methods over the similarity scores achieve 90% and 80% micro F1-score.
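The two tiers described above can be illustrated with a minimal sketch. It is not the authors' implementation. The Siamese network is replaced here by a plain string ratio from Python's `difflib` as a stand-in scorer, the threshold of 0.7 is an arbitrary illustrative value rather than a learned distance parameter, and single-linkage grouping is assumed because the abstract does not name the clustering method. The function names `similarity` and `canonical_clusters` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Tier 1 stand-in: pairwise score in [0, 1].

    Assumption: a character-level ratio replaces the Siamese network
    mentioned in the abstract.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def canonical_clusters(mentions, threshold=0.7):
    """Tier 2 sketch: merge mentions whose score is at least `threshold`.

    Assumption: single-linkage grouping via union-find; `threshold`
    plays the role of the distance parameter and is illustrative only.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Score every pair of mentions and link the similar ones.
    for i, j in combinations(range(len(mentions)), 2):
        if similarity(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect mentions by their cluster representative.
    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


if __name__ == "__main__":
    names = ["Infosys", "Infosys Ltd", "Google", "Google Inc"]
    for cluster in canonical_clusters(names):
        print(cluster)
```

Scoring all pairs is quadratic in the number of mentions, so a collection of the size reported above would likely need blocking or candidate filtering before pairwise scoring; that step is outside this sketch.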


  • (2023) Text Classification In The Wild: A Large-Scale Long-Tailed Name Normalization Dataset. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. doi:10.1109/ICASSP49357.2023.10096769. Online publication date: 4-Jun-2023
  • (2020) Canonicalizing Knowledge Bases for Recruitment Domain. Advances in Knowledge Discovery and Data Mining, pp. 500-513. doi:10.1007/978-3-030-47436-2_38. Online publication date: 11-May-2020




