Abstract
Data acquisition in biology and astronomy has grown at an unprecedented rate since the turn of the century. It is no exaggeration to say that the needs of these two sciences are pushing computer science research to new frontiers. This paper focuses on astronomy, which, since the inception of the Virtual Observatory and the commissioning of massive sky surveys, has been struggling to extract knowledge from a deluge of data.
Astrocomputing, which subsumes Astroinformatics, is a recent multidisciplinary field of research with computer science and astronomy at its core. In this article we discuss the opportunities and challenges that this emerging discipline opens up for machine learning and data mining research. We present a case study of ongoing work on the exploratory analysis of unclassified light curves. Although scientific analysis and interpretation of the results are still pending, the exercise demonstrates the merit of a customized exploratory approach. The approach is general and can be applied to light curves obtained from any survey. Because astronomy data processing must operate at a very large scale, we also discuss the scalability of the proposed method.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mittal, A., Santra, A., Bhatnagar, V., Khanna, D. (2014). Exploratory Analysis of Light Curves: A Case-Study in Astronomy Data Understanding. In: Madaan, A., Kikuchi, S., Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2014. Lecture Notes in Computer Science, vol 8381. Springer, Cham. https://doi.org/10.1007/978-3-319-05693-7_5
DOI: https://doi.org/10.1007/978-3-319-05693-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05692-0
Online ISBN: 978-3-319-05693-7
eBook Packages: Computer Science (R0)