Abstract
We present an in-depth formal and empirical comparison of unsupervised signal combination approaches in the context of tasks based on textual similarity. Our formal study introduces the concept of Similarity Information Quantity, and proves that the most salient combination methods are all estimations of Similarity Information Quantity under different statistical assumptions that simplify the computation. We also prove a Minimal Voting Performance theorem stating that, under certain plausible conditions, estimations of Information Quantity should at least match the performance of the best measure in the set. This explains, at least partially, why unsupervised combination methods perform robustly. Our empirical analysis compares a wide range of unsupervised combination methods in six different Information Access tasks based on textual similarity: Document Retrieval and Clustering, Textual Entailment, Semantic Textual Similarity, and the automatic evaluation of Machine Translation and Summarization systems. Empirical results on all datasets corroborate the result of the formal analysis and help establish recommendations on which combination method to use depending on the nature of the set of measures to be combined.
Notes
- 1. Explicit proofs are avoided due to lack of space.
- 2. Note that a zero value cancels the effect of the remaining measures in the geometric and harmonic means, and that maximum and minimum ultimately retain only one of the combined measures.
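Note 2 can be illustrated with a toy computation (a hypothetical sketch; the three measure scores below are made up for illustration):

```python
import math

# Three hypothetical similarity measures scored on the same instance pair.
scores = [0.8, 0.6, 0.0]  # the third measure returns zero

arithmetic = sum(scores) / len(scores)
geometric = math.prod(scores) ** (1 / len(scores))
# The harmonic mean tends to 0 as any score tends to 0 (and is undefined
# at exactly 0), so we treat it as 0 here.
harmonic = 0.0 if 0 in scores else len(scores) / sum(1 / s for s in scores)

print(arithmetic)  # 0.4666... -> still reflects the non-zero measures
print(geometric)   # 0.0      -> the zero cancels the other two measures
print(harmonic)    # 0.0      -> likewise dominated by the zero
print(max(scores), min(scores))  # each retains only one measure's value
```

The arithmetic mean degrades gracefully, while the geometric and harmonic means collapse to zero and max/min discard all but one measure, as the note states.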
Acknowledgments
This research was supported by the Spanish Ministry of Science and Innovation (VoxPopuli Project, TIN2013-47090-C3-1-P and Vemodalen, TIN2015-71785-R).
A Appendix: Minimal Voting Performance Proof
Given two similarity instances \((x, y) \in \varOmega ^2\), we denote an increase in a signal f, in the information quantity \(\mathcal{I}_\mathcal{F}(x)\), or in the true similarity sim(x) by \(\varDelta f\equiv f(x)>f(y)\), \(\varDelta \mathcal{I}_\mathcal{F}\equiv \mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\) and \(\varDelta sim\equiv sim(x)>sim(y)\), respectively. Decreases are denoted analogously by \(\nabla f\). The optimality theorem can thus be expressed as \(P(\varDelta \mathcal{I}_\mathcal{F}| \varDelta sim) \ge P(\varDelta f|\varDelta sim), \forall f \in \mathcal{F}\). Assuming high granularity, we have \(P(\varDelta f)=P(\varDelta \mathcal{I}_\mathcal{F})=P(\varDelta sim)=\frac{1}{2}\). Therefore, by Bayes' rule, \(P(\varDelta f| \varDelta sim)=\frac{P(\varDelta sim|\varDelta f) \cdot P(\varDelta f)}{P(\varDelta sim)}=P(\varDelta sim|\varDelta f)\), and the same symmetry holds for any other conditional probability between these events. Therefore, the optimality theorem can be rewritten as: \(P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}) \ge P(\varDelta sim|\varDelta f), \forall f \in \mathcal{F}\).
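The symmetry step can be checked numerically: whenever two binary events have equal marginal probability \(\frac{1}{2}\), as the granularity assumption gives, Bayes' rule makes the two conditional probabilities coincide. A minimal sketch with made-up joint probabilities:

```python
# Joint distribution over two binary events A (standing for Δf) and
# B (standing for Δsim), constructed so that P(A) = P(B) = 1/2.
p_joint = {  # P(A=a, B=b)
    (True, True): 0.35, (True, False): 0.15,
    (False, True): 0.15, (False, False): 0.35,
}
p_a = sum(v for (a, b), v in p_joint.items() if a)  # marginal P(A) = 0.5
p_b = sum(v for (a, b), v in p_joint.items() if b)  # marginal P(B) = 0.5
p_a_given_b = p_joint[(True, True)] / p_b           # P(A|B)
p_b_given_a = p_joint[(True, True)] / p_a           # P(B|A)
# Equal marginals make the two conditionals identical.
assert abs(p_a_given_b - p_b_given_a) < 1e-12
print(p_a_given_b, p_b_given_a)  # 0.7 0.7
```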
Assuming high granularity, we have that \(P(\varDelta \mathcal{I}_\mathcal{F}|\varDelta f)-P(\varDelta f|\varDelta \mathcal{I}_\mathcal{F})=0\) and the previous expression is equivalent to:
On the other hand, \(P(\varDelta sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})=1-P(\nabla sim|\varDelta f, \nabla \mathcal{I}_\mathcal{F})= 1-P(\varDelta sim|\nabla f, \varDelta \mathcal{I}_\mathcal{F})\). Assuming granularity, \(P(\nabla f|\varDelta \mathcal{I}_\mathcal{F})=P(\varDelta \mathcal{I}_\mathcal{F}|\nabla f)\). Therefore, we need to prove that:
Then, we have to prove that \(P\big (\varDelta sim(x) \ | \ \varDelta \mathcal{I}_\mathcal{F}, \nabla f \big ) \ge \frac{1}{2}\). Assuming SIH, we have that:
Therefore, when \(\mathcal{I}_\mathcal{F}(x)>\mathcal{I}_\mathcal{F}(y)\), we can infer that:
This holds for every threshold value th, so we can infer that:
This remains true even when a single measure decreases, i.e. \(f^i(x)<f^i(y)\), so we can derive that \(P(\varDelta sim|\varDelta \mathcal{I}_\mathcal{F}, \nabla f)\ge \frac{1}{2}\), which completes the proof.
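The Minimal Voting Performance claim can also be illustrated with a toy Monte Carlo simulation (a hypothetical sketch: plain averaging of the measures stands in for an estimation of Information Quantity, and the noise levels are made up):

```python
import random

random.seed(0)

# Toy simulation: each measure f_i(x) = sim(x) + Gaussian noise. Averaging
# the measures (a crude stand-in for an Information Quantity estimation)
# should rank instance pairs at least as well as the best single measure.
N = 5000
sims = [random.random() for _ in range(N)]          # true similarities
noise_levels = [0.3, 0.35, 0.4]                     # one per measure
measures = [[s + random.gauss(0, n) for s in sims] for n in noise_levels]
combined = [sum(m[i] for m in measures) / len(measures) for i in range(N)]

def pairwise_accuracy(scores):
    """Estimate P(Δscore | Δsim) over randomly sampled instance pairs."""
    hits = trials = 0
    for _ in range(20000):
        i, j = random.randrange(N), random.randrange(N)
        if sims[i] == sims[j]:
            continue
        trials += 1
        hits += (scores[i] > scores[j]) == (sims[i] > sims[j])
    return hits / trials

best_single = max(pairwise_accuracy(m) for m in measures)
print(pairwise_accuracy(combined), best_single)
```

With roughly independent noise, the combined score's pairwise accuracy matches or exceeds that of the best individual measure, in line with the theorem; with strongly correlated or wildly heterogeneous noise, the plausible conditions of the theorem no longer hold.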
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Amigó, E., Giner, F., Gonzalo, J., Verdejo, F. (2017). A Formal and Empirical Study of Unsupervised Signal Combination for Textual Similarity Tasks. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science(), vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_29