Query-specific subtopic clustering

Published: 20 June 2022


We propose a Query-Specific Siamese Similarity Metric (QS3M) for query-specific clustering of text documents. Our approach uses fine-tuned BERT embeddings to train a non-linear projection into a query-specific similarity space. We build on the idea of Siamese networks but include a third component: a representation of the query. QS3M models the fine-grained similarity between text passages about the same broad topic and generalizes to queries unseen during training. The empirical evaluation for clustering employs two TREC datasets and a set of academic abstracts from arXiv. When used to obtain query-relevant clusters, QS3M achieves a 12% performance improvement on the TREC datasets over a strong BERT-based reference method and over many baselines such as TF-IDF and topic models. A similar improvement is observed for the arXiv dataset, suggesting that QS3M is applicable across domains. A qualitative evaluation gives insight into the strengths and limitations of the model.
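
The idea of a Siamese similarity conditioned on a query can be illustrated with a small sketch. This is not the authors' architecture: the concatenation of query and passage vectors, the single tanh layer, the cosine comparison, the random weights, and all function names are assumptions chosen for illustration, and the toy 3-dimensional vectors merely stand in for BERT embeddings.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Random weights standing in for a trained projection (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, query_vec, passage_vec):
    """Non-linear projection of a passage, conditioned on the query."""
    x = list(query_vec) + list(passage_vec)  # assumed: simple concatenation
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def query_specific_similarity(weights, query_vec, p1, p2):
    """Both passages pass through the same shared-weight projection."""
    return cosine(project(weights, query_vec, p1),
                  project(weights, query_vec, p2))


# Toy vectors standing in for BERT embeddings of two queries and two passages.
weights = make_projection(in_dim=6, out_dim=4)
query_a = [1.0, 0.0, 0.0]
query_b = [0.0, 1.0, 0.0]
p1 = [0.2, 0.9, 0.1]
p2 = [0.3, 0.8, 0.4]

sim_a = query_specific_similarity(weights, query_a, p1, p2)
sim_b = query_specific_similarity(weights, query_b, p1, p2)
print(sim_a, sim_b)
```

Because the query enters the projection, the same pair of passages can receive a different similarity score under each query. A pairwise similarity of this kind could then be handed to a standard clustering algorithm to group passages into query-relevant subtopics.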


Information & Contributors


Published In

JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
June 2022
392 pages
  • General Chairs: Akiko Aizawa, Thomas Mandl, Zeljko Carevic
  • Program Chairs: Annika Hinze, Philipp Mayr, Philipp Schaer


Author Tags

  1. clustering
  2. neural networks
  3. query-specific clustering
  4. siamese neural networks
  5. similarity metric
  6. topic detection
  7. topic model



