Abstract
We present an in-depth formal and empirical comparison of unsupervised signal combination approaches in the context of tasks based on textual similarity. Our formal study introduces the concept of Similarity Information Quantity, and proves that the most salient combination methods are all estimations of Similarity Information Quantity under different statistical assumptions that simplify the computation. We also prove a Minimal Voting Performance theorem stating that, under certain plausible conditions, estimations of Information Quantity should at least match the performance of the best measure in the set. This explains, at least partially, why unsupervised combination methods perform robustly. Our empirical analysis compares a wide range of unsupervised combination methods in six different Information Access tasks based on textual similarity: Document Retrieval and Clustering, Textual Entailment, Semantic Textual Similarity, and the automatic evaluation of Machine Translation and Summarization systems. Empirical results on all datasets corroborate the result of the formal analysis and help establish recommendations on which combination method to use depending on the nature of the set of measures to be combined.
Notes
- 1. Explicit proofs are avoided due to lack of space.
- 2. Note that a zero value cancels the effect of the remaining measures in the geometric and harmonic means, and that maximum and minimum ultimately retain only one of the combined measures.
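Note 2 can be illustrated with a toy computation (a hypothetical sketch; the three measure scores below are made up for illustration):

```python
import math

# Three hypothetical similarity measures scored on the same instance pair.
scores = [0.8, 0.6, 0.0]  # the third measure returns zero

arithmetic = sum(scores) / len(scores)
geometric = math.prod(scores) ** (1 / len(scores))
# The harmonic mean tends to 0 as any score tends to 0 (and is undefined
# at exactly 0), so we treat it as 0 here.
harmonic = 0.0 if 0 in scores else len(scores) / sum(1 / s for s in scores)

print(arithmetic)  # 0.4666... -> still reflects the non-zero measures
print(geometric)   # 0.0      -> the zero cancels the other two measures
print(harmonic)    # 0.0      -> likewise dominated by the zero
print(max(scores), min(scores))  # each retains only one measure's value
```

The arithmetic mean degrades gracefully, while the geometric and harmonic means collapse to zero and max/min discard all but one measure, as the note states.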
Acknowledgments
This research was supported by the Spanish Ministry of Science and Innovation (VoxPopuli Project, TIN2013-47090-C3-1-P and Vemodalen, TIN2015-71785-R).
A Appendix: Minimal Voting Performance Proof
Given two similarity instances \((x, y) \in \varOmega ^2\), we denote an increase in a signal f, in the information quantity \(\mathcal{I}_\mathcal{F}(x)\), or in the true similarity sim(x) by \(\varDelta f\equiv f(x)>f(y)\), \(\varDelta \mathcal{I}_\mathcal{F}\equiv \mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\) and \(\varDelta sim\equiv sim(x)>sim(y)\), respectively. Decreases are denoted analogously by \(\nabla f\). The optimality theorem can thus be expressed as \(P(\varDelta \mathcal{I}_\mathcal{F}| \varDelta sim) \ge P(\varDelta f|\varDelta sim), \forall f \in \mathcal{F}\). Assuming high granularity, we have \(P(\varDelta f)=P(\varDelta \mathcal{I}_\mathcal{F})=P(\varDelta sim)=\frac{1}{2}\). Therefore, by Bayes' rule, \(P(\varDelta f| \varDelta sim)=\frac{P(\varDelta sim|\varDelta f) \cdot P(\varDelta f)}{P(\varDelta sim)}=P(\varDelta sim|\varDelta f)\), and the same symmetry holds for any other conditional probability between these events. Therefore, the optimality theorem can be rewritten as: \(P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}) \ge P(\varDelta sim|\varDelta f), \forall f \in \mathcal{F}\).
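The symmetry step can be checked numerically: whenever two binary events have equal marginal probability \(\frac{1}{2}\), as the granularity assumption gives, Bayes' rule makes the two conditional probabilities coincide. A minimal sketch with made-up joint probabilities:

```python
# Joint distribution over two binary events A (standing for Δf) and
# B (standing for Δsim), constructed so that P(A) = P(B) = 1/2.
p_joint = {  # P(A=a, B=b)
    (True, True): 0.35, (True, False): 0.15,
    (False, True): 0.15, (False, False): 0.35,
}
p_a = sum(v for (a, b), v in p_joint.items() if a)  # marginal P(A) = 0.5
p_b = sum(v for (a, b), v in p_joint.items() if b)  # marginal P(B) = 0.5
p_a_given_b = p_joint[(True, True)] / p_b           # P(A|B)
p_b_given_a = p_joint[(True, True)] / p_a           # P(B|A)
# Equal marginals make the two conditionals identical.
assert abs(p_a_given_b - p_b_given_a) < 1e-12
print(p_a_given_b, p_b_given_a)  # 0.7 0.7
```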
Assuming high granularity, we have that \(P(\varDelta \mathcal{I}_\mathcal{F}|\varDelta f)-P(\varDelta f|\varDelta \mathcal{I}_\mathcal{F})=0\) and the previous expression is equivalent to:
On the other hand, \(P(\varDelta sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})=1-P(\nabla sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})= 1-P(\varDelta sim|\nabla f, \varDelta \mathcal{I}_\mathcal{F})\). Assuming granularity, \(P(\nabla f|\varDelta \mathcal{I}_\mathcal{F})=P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\). Therefore, we need to prove that:
Then, we have to prove that \(P\big (\varDelta sim(x) \ | \ \varDelta \mathcal{I}_\mathcal{F}, \nabla f \big ) \ge \frac{1}{2}\). Assuming SIH, we have that:
Therefore, when \(\mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\), we can infer that:
This holds for every threshold value th, so we can infer that:
This remains true even when a single measure decreases, i.e. \(f^i(x)<f^i(y)\), so we can derive that \(P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f)\ge \frac{1}{2}\), which completes the proof.
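The Minimal Voting Performance claim can also be illustrated with a toy Monte Carlo simulation (a hypothetical sketch: plain averaging of the measures stands in for an estimation of Information Quantity, and the noise levels are made up):

```python
import random

random.seed(0)

# Toy simulation: each measure f_i(x) = sim(x) + Gaussian noise. Averaging
# the measures (a crude stand-in for an Information Quantity estimation)
# should rank instance pairs at least as well as the best single measure.
N = 5000
sims = [random.random() for _ in range(N)]          # true similarities
noise_levels = [0.3, 0.35, 0.4]                     # one per measure
measures = [[s + random.gauss(0, n) for s in sims] for n in noise_levels]
combined = [sum(m[i] for m in measures) / len(measures) for i in range(N)]

def pairwise_accuracy(scores):
    """Estimate P(Δscore | Δsim) over randomly sampled instance pairs."""
    hits = trials = 0
    for _ in range(20000):
        i, j = random.randrange(N), random.randrange(N)
        if sims[i] == sims[j]:
            continue
        trials += 1
        hits += (scores[i] > scores[j]) == (sims[i] > sims[j])
    return hits / trials

best_single = max(pairwise_accuracy(m) for m in measures)
print(pairwise_accuracy(combined), best_single)
```

With roughly independent noise, the combined score's pairwise accuracy matches or exceeds that of the best individual measure, in line with the theorem; with strongly correlated or wildly heterogeneous noise, the plausible conditions of the theorem no longer hold.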
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Amigó, E., Giner, F., Gonzalo, J., Verdejo, F. (2017). A Formal and Empirical Study of Unsupervised Signal Combination for Textual Similarity Tasks. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science(), vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_29