Enhancing the stability and efficiency of spectral ordering with partial supervision and feature selection

Mavroeidis, Dimitrios; Bingham, Ella

doi:10.1007/s10115-009-0215-1

Regular Paper
Published: 03 June 2009

Knowledge and Information Systems, volume 23, pages 243–265 (2010)

Abstract

Several studies have demonstrated the prospects of spectral ordering for data mining. One successful application is the seriation of paleontological findings, i.e. ordering the sites of excavation, using data on mammal co-occurrences only. However, spectral ordering ignores the background knowledge that is naturally present in the domain: paleontologists can derive the ages of the sites to within some accuracy. On the other hand, the age information is uncertain, so the best approach would be to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision, we propose a novel semi-supervised spectral ordering algorithm that modifies the Laplacian matrix such that domain knowledge is taken into account. The algorithm also performs feature selection by discarding the features that contribute most to the unwanted variability of the data in bootstrap sampling. Moreover, we demonstrate the effectiveness of the proposed framework on the seriation of Usenet newsgroup messages, where the task is to recover the underlying flow of discussion. The theoretical properties of our algorithm are thoroughly analyzed, and it is demonstrated that the proposed framework enhances the stability of the spectral ordering output and induces computational gains.
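
For orientation, the unsupervised baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of plain spectral ordering only, not the authors' semi-supervised algorithm or their feature selection step, neither of which is specified in the abstract. The function name `spectral_order`, the toy site-by-taxon matrix, the use of shared-taxa counts as similarities, the unnormalized Laplacian, and the power iteration are all assumptions made for the example.

```python
import random


def spectral_order(X, iters=1000, seed=0):
    """Order the rows of a 0/1 matrix X by the Fiedler vector of the
    row co-occurrence graph (hypothetical helper, assumes a connected graph)."""
    n = len(X)
    # Similarity between two rows (sites): number of shared features (taxa).
    W = [[0 if i == j else sum(a * b for a, b in zip(X[i], X[j]))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Unnormalized graph Laplacian L = D - W.
    L = [[(deg[i] if i == j else 0) - W[i][j] for j in range(n)]
         for i in range(n)]
    # Power iteration on c*I - L, kept orthogonal to the constant vector,
    # converges to the eigenvector of the second-smallest eigenvalue of L.
    c = 2 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    # Sorting the rows by their Fiedler-vector entries gives the ordering.
    return sorted(range(n), key=lambda i: v[i])


# Toy data (made up): site i contains taxa i, i+1, i+2, so the true order is 0..5.
true_rows = [[1 if i <= t <= i + 2 else 0 for t in range(8)] for i in range(6)]
perm = [3, 0, 5, 1, 4, 2]            # shuffle the sites
X = [true_rows[p] for p in perm]
order = spectral_order(X)
recovered = [perm[i] for i in order]  # true positions in the recovered order
print(recovered)
```

On this toy input the recovered sequence is the true site order, either ascending or descending, since the sign of an eigenvector is arbitrary. A practical implementation would more likely call a library eigensolver than hand-rolled power iteration; the pure-Python version is used here only to keep the sketch self-contained.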

Author information

Authors and Affiliations

Dimitrios Mavroeidis: Department of Informatics, Athens University of Economics and Business, Athens, Greece
Ella Bingham: Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland

Corresponding author

Correspondence to Dimitrios Mavroeidis.

About this article

Cite this article

Mavroeidis, D., Bingham, E. Enhancing the stability and efficiency of spectral ordering with partial supervision and feature selection. Knowl Inf Syst 23, 243–265 (2010). https://doi.org/10.1007/s10115-009-0215-1

Received: 14 January 2009
Revised: 28 March 2009
Accepted: 11 April 2009
Published: 03 June 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s10115-009-0215-1
