A Formal and Empirical Study of Unsupervised Signal Combination for Textual Similarity Tasks

  • Conference paper

Advances in Information Retrieval (ECIR 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Abstract

We present an in-depth formal and empirical comparison of unsupervised signal combination approaches in the context of tasks based on textual similarity. Our formal study introduces the concept of Similarity Information Quantity, and proves that the most salient combination methods are all estimations of Similarity Information Quantity under different statistical assumptions that simplify the computation. We also prove a Minimal Voting Performance theorem stating that, under certain plausible conditions, estimations of Information Quantity should at least match the performance of the best measure in the set. This explains, at least partially, why unsupervised combination methods perform robustly. Our empirical analysis compares a wide range of unsupervised combination methods in six different Information Access tasks based on textual similarity: Document Retrieval and Clustering, Textual Entailment, Semantic Textual Similarity, and the automatic evaluation of Machine Translation and Summarization systems. Empirical results on all datasets corroborate the result of the formal analysis and help establish recommendations on which combination method to use depending on the nature of the set of measures to be combined.
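For illustration, the kinds of unsupervised combination schemes compared in the paper (arithmetic, geometric and harmonic means, maximum and minimum over normalized scores) can be sketched as follows. This is a hypothetical implementation of the general idea, not the authors' code; the names `normalize` and `combine` are our own.

```python
from statistics import geometric_mean, harmonic_mean

def normalize(scores):
    """Min-max normalize a list of raw similarity scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine(measure_scores, method="mean"):
    """Combine per-instance scores from several similarity measures.

    measure_scores: list of lists, one inner list per measure,
    each containing one score per text-pair instance.
    """
    normalized = [normalize(m) for m in measure_scores]
    per_instance = list(zip(*normalized))  # group scores by instance
    combiners = {
        "mean": lambda xs: sum(xs) / len(xs),
        # Guard against zeros (see footnote 2): a single zero score
        # drives both the geometric and the harmonic mean to zero.
        "geometric": lambda xs: geometric_mean(xs) if all(xs) else 0.0,
        "harmonic": lambda xs: harmonic_mean(xs) if all(xs) else 0.0,
        "max": max,
        "min": min,
    }
    return [combiners[method](xs) for xs in per_instance]
```

For example, `combine([[1, 2, 3], [3, 2, 1]], "mean")` normalizes each measure to [0, 1] and averages them per instance, yielding `[0.5, 0.5, 0.5]`.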

Notes

  1. Explicit proofs are omitted for lack of space.

  2. Note that a zero value cancels the effect of the remaining measures in the geometric and harmonic means, and that maximum and minimum ultimately retain only one of the combined measures.

  3. http://www.nist.gov/speech/tests/mt.

  4. http://www.lsi.upc.edu/~nlp/Asiya.

  5. http://duc.nist.gov/.
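Footnote 2 can be checked numerically: a single zero score collapses the geometric and harmonic means regardless of the other measures, while maximum and minimum each report exactly one of the combined values. A minimal sketch (our own toy example, not from the paper):

```python
from statistics import harmonic_mean

# Three similarity measures score the same text pair; one returns zero.
scores = [0.0, 0.9, 0.8]

# Geometric mean: a single zero factor drives the whole product to zero.
geometric = (scores[0] * scores[1] * scores[2]) ** (1 / 3)

# The harmonic mean is likewise dominated by the zero value.
harmonic = harmonic_mean(scores)

# Maximum and minimum each keep only one of the combined measures.
print(geometric, harmonic, max(scores), min(scores))
```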

Acknowledgments

This research was supported by the Spanish Ministry of Science and Innovation (VoxPopuli Project, TIN2013-47090-C3-1-P and Vemodalen, TIN2015-71785-R).

Correspondence to Enrique Amigó.

A Appendix: Minimal Voting Performance Proof

Given two similarity instances \(x, y \in \varOmega \), we denote an increase in a signal f, in the information quantity \(\mathcal{I}_\mathcal{F}\), or in the true similarity sim by \(\varDelta f\equiv f(x)>f(y)\), \(\varDelta \mathcal{I}_\mathcal{F}\equiv \mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\) and \(\varDelta sim\equiv sim(x)>sim(y)\), respectively. Decreases are denoted analogously by \(\nabla f\). The theorem can then be expressed as \(P(\varDelta \mathcal{I}_\mathcal{F}| \varDelta sim) \ge P(\varDelta f|\varDelta sim), \forall f \in \mathcal{F}\). Assuming high granularity, we have \(P(\varDelta f)=P(\varDelta \mathcal{I}_\mathcal{F})=P(\varDelta sim)=\frac{1}{2}\). Therefore, by Bayes' rule, \(P(\varDelta sim| \varDelta f)=\frac{P(\varDelta f|\varDelta sim) \cdot P(\varDelta sim)}{P(\varDelta f)}=P(\varDelta f|\varDelta sim)\), and the same holds for any other conditional probability between these events. The theorem can therefore be rewritten as:

$$\begin{aligned}&P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F})\ge P(\varDelta sim|\varDelta f)\equiv P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \varDelta f) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\varDelta f)+P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\ge \\&~~~~~~P(\varDelta sim|\varDelta f, \varDelta \mathcal{I}_\mathcal{F}) \cdot P(\varDelta f|\varDelta \mathcal{I}_\mathcal{F})+P(\varDelta sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F}) \cdot P(\nabla f|\varDelta \mathcal{I}_\mathcal{F})\equiv \\&P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \varDelta f) \cdot (P(\varDelta \mathcal{I}_\mathcal{F}|\varDelta f)-P(\varDelta f|\varDelta \mathcal{I}_\mathcal{F}))+\\&~~~~~~P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)-P(\varDelta sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})\cdot P(\nabla f|\varDelta \mathcal{I}_\mathcal{F})\ge 0 \end{aligned}$$

Assuming high granularity, we have that \(P(\varDelta \mathcal{I}_\mathcal{F}|\varDelta f)-P(\varDelta f|\varDelta \mathcal{I}_\mathcal{F})=0\) and the previous expression is equivalent to:

$$P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)-P(\varDelta sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F}) \cdot P(\nabla f|\varDelta \mathcal{I}_\mathcal{F})\ge 0 $$

On the other hand, \(P(\varDelta sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})=1-P(\nabla sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})= 1-P(\varDelta sim|\nabla f, \varDelta \mathcal{I}_\mathcal{F})\). Assuming high granularity again, \(P(\nabla f|\varDelta \mathcal{I}_\mathcal{F})=P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\). Therefore, we need to prove that:

$$P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)-(1-P(\varDelta sim|\nabla f, \varDelta \mathcal{I}_\mathcal{F})) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\ge 0 \equiv $$
$$P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f) \cdot (P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)+P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f))-P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\ge 0 \equiv $$
$$2 \cdot P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)-P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\ge 0 \equiv $$
$$(2 \cdot P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f)-1) \cdot P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\ge 0 \equiv $$
$$2 \cdot P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f)-1\ge 0 \equiv P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f)\ge \frac{1}{2}$$

Then, we have to prove that \(P\big (\varDelta sim \ | \ \varDelta \mathcal{I}_\mathcal{F}, \nabla f \big ) \ge \frac{1}{2}\). Assuming SIH, we have that:

$$P\left( sim(x)>th \ | \ f^1(x), \ldots , f^n(x)\right) \simeq \mathcal{I}_\mathcal{F}(x)=\mathcal{I}\left( \mathcal{A}^\mathcal{F}_{\{f^1(x)..f^n(x)\}}\right) .$$

Therefore, when \(\mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\), we can infer that:

$$P\left( sim(x)>th \ | \ f^1(x), \ldots , f^n(x)\right)>P\left( sim(y)>th \ | \ f^1(y), \ldots , f^n(y)\right) $$

Since this holds for every value of th, we can infer that:

$$P\left( sim(x)> sim(y)\ | \ f^1(x), \ldots , f^n(x), \ f^1(y), \ldots , f^n(y),\mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\right) \ge \frac{1}{2}.$$

This holds even when a single measure decreases, i.e., when \(f^i(x)<f^i(y)\) for some i; therefore \(P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f)\ge \frac{1}{2}\), which completes the proof.
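The Minimal Voting Performance claim can also be probed empirically with a toy Monte Carlo simulation (our own sketch, not part of the paper's experiments): several noisy views of a latent true similarity are combined by simple averaging, and the combination orders instance pairs at least as accurately as the best single measure.

```python
import random

random.seed(0)

def ranking_accuracy(scores, truth):
    """Fraction of instance pairs ordered consistently with the truth."""
    n, correct, total = len(truth), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if truth[i] == truth[j]:
                continue
            total += 1
            if (scores[i] - scores[j]) * (truth[i] - truth[j]) > 0:
                correct += 1
    return correct / total

# Latent true similarities for 300 instances.
truth = [random.random() for _ in range(300)]

# Five noisy measures: each observes the truth plus independent noise.
measures = [[t + random.gauss(0, 0.3) for t in truth] for _ in range(5)]

# Unsupervised combination: plain average across measures.
combined = [sum(col) / len(col) for col in zip(*measures)]

best_single = max(ranking_accuracy(m, truth) for m in measures)
print(round(best_single, 3), round(ranking_accuracy(combined, truth), 3))
```

Averaging reduces the noise standard deviation by roughly \(1/\sqrt{n}\), so with independent errors the combined score is expected to rank pairs better than any individual measure, mirroring the theorem's conclusion.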

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Amigó, E., Giner, F., Gonzalo, J., Verdejo, F. (2017). A Formal and Empirical Study of Unsupervised Signal Combination for Textual Similarity Tasks. In: Jose, J., et al. (eds.) Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol. 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_29

  • DOI: https://doi.org/10.1007/978-3-319-56608-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56607-8

  • Online ISBN: 978-3-319-56608-5

  • eBook Packages: Computer Science (R0)
