DOI: 10.1145/3126858.3126893
research-article

Transductive Event Classification through Heterogeneous Networks

Published: 17 October 2017

Abstract

Events can be defined as "something that occurs at specific place and time associated with some specific actions". In general, events extracted from news articles and social networks are used to map information from the web to the various phenomena that occur in the physical world. A key step in establishing this mapping is event classification with machine learning algorithms, which has received considerable attention in the web document engineering field in recent years. Traditional machine learning approaches rely on vector space model representations and supervised classification. However, events combine multiple representations, such as textual data, temporal information, geographic location, and other types of metadata, and these are poorly captured when merged into a single vector space model. Moreover, supervised classification requires labeling a significant sample of events to build a training set, which hampers the practical application of event classification. In this paper, we propose a method called TECHN (Transductive Event Classification through Heterogeneous Networks), which models event metadata as different types of objects in a heterogeneous network. TECHN also automatically learns which types of network objects (event metadata) are most effective for the classification task. In addition, TECHN performs transductive classification, exploiting both the labeled events and the large number of unlabeled events. Experimental results show that TECHN obtains promising results, especially when different importance weights are assigned to each type of event metadata and only a small set of labeled events is available.
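The abstract does not give TECHN's update rules or its weight-learning procedure. To illustrate the general idea only, here is a minimal sketch, assuming a bipartite network in which each event links to metadata objects grouped by type (the types "term" and "location" are hypothetical examples), and a harmonic-function-style label propagation in which labeled events stay clamped and each object type contributes with an importance weight. The functions `propagate` and `select_weights`, the averaging update, and the leave-one-out grid search used to choose the weights are hypothetical stand-ins, not the authors' method.

```python
from collections import defaultdict
from itertools import product


def propagate(events, labels, classes, weights, iterations=20):
    """Hypothetical label propagation over a bipartite event-metadata network.

    events:  {event_id: {object_type: [object_name, ...]}}
    labels:  {event_id: class} for the labeled events only
    weights: {object_type: importance weight}; 0 disables a type
    """
    n = len(classes)
    # Metadata objects are keyed by (type, name), so a term and a location
    # that share a string remain distinct nodes.
    neighbors = defaultdict(list)
    for e, meta in events.items():
        for t, objs in meta.items():
            for o in objs:
                neighbors[(t, o)].append(e)

    # Labeled events start as one-hot vectors; unlabeled events start at zero.
    f = {e: [1.0 if labels.get(e) == c else 0.0 for c in classes] for e in events}

    for _ in range(iterations):
        # Object step: each metadata object takes the mean vector of its events.
        g = {obj: [sum(f[e][k] for e in evs) / len(evs) for k in range(n)]
             for obj, evs in neighbors.items()}
        # Event step: each unlabeled event averages its objects per type and
        # combines the types with the given weights; labeled events stay clamped.
        for e, meta in events.items():
            if e in labels:
                continue
            acc, total = [0.0] * n, 0.0
            for t, objs in meta.items():
                w = weights.get(t, 0.0)
                if w == 0.0 or not objs:
                    continue
                for k in range(n):
                    acc[k] += w * sum(g[(t, o)][k] for o in objs) / len(objs)
                total += w
            if total > 0:
                f[e] = [a / total for a in acc]

    # Predict the class with the highest score (ties go to the first class).
    return {e: classes[max(range(n), key=lambda k: f[e][k])] for e in events}


def select_weights(events, labels, classes, types, grid=(0.0, 0.5, 1.0)):
    """Hypothetical weight search: leave-one-out accuracy on labeled events."""
    best, best_score = None, -1
    for combo in product(grid, repeat=len(types)):
        if not any(combo):
            continue  # all-zero weights carry no information
        weights = dict(zip(types, combo))
        score = 0
        for held_out, true_class in labels.items():
            reduced = {e: c for e, c in labels.items() if e != held_out}
            pred = propagate(events, reduced, classes, weights)
            score += pred[held_out] == true_class
        if score > best_score:  # strict '>' keeps the first best combination
            best, best_score = weights, score
    return best


# Toy usage: four labeled events and two unlabeled ones (e5, e6).
classes = ["sports", "politics"]
events = {
    "e1": {"term": ["goal"], "location": ["stadium"]},
    "e2": {"term": ["goal"], "location": ["capital"]},
    "e3": {"term": ["vote"], "location": ["capital"]},
    "e4": {"term": ["vote"], "location": ["stadium"]},
    "e5": {"term": ["goal"], "location": ["capital"]},
    "e6": {"term": ["vote"], "location": ["stadium"]},
}
labels = {"e1": "sports", "e2": "sports", "e3": "politics", "e4": "politics"}

weights = select_weights(events, labels, classes, ("term", "location"))
predictions = propagate(events, labels, classes, weights)
print(weights)  # {'term': 0.5, 'location': 0.0}
print(predictions["e5"], predictions["e6"])  # sports politics
```

On this toy data only terms separate the classes (each location hosts one event of each class), so the search keeps the term type and zeroes out locations, a miniature version of the abstract's point that some metadata types matter more than others.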


Cited By

  • (2021) Embedding propagation over heterogeneous event networks for link prediction. 2021 IEEE International Conference on Big Data (Big Data), 4812-4821. DOI: 10.1109/BigData52589.2021.9671645. Online publication date: 15-Dec-2021.
  • (2019) A Sampling-Based Framework for Transductive Classification in Information Networks. 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), 657-662. DOI: 10.1109/BRACIS.2019.00120. Online publication date: Oct-2019.

        Published In

        ACM Other conferences
        WebMedia '17: Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web
        October 2017
        522 pages
        ISBN:9781450350969
        DOI:10.1145/3126858
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Sponsors

        • SBC: Brazilian Computer Society
        • CNPq: Conselho Nacional de Desenvolvimento Cientifico e Tecn
        • CGIBR: Comite Gestor da Internet no Brazil
        • CAPES: Brazilian Higher Education Funding Council

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. document engineering
        2. event classification
        3. web mining
        4. websensors

        Conference

        WebMedia '17: Brazilian Symposium on Multimedia and the Web
        October 17-20, 2017
        Gramado, RS, Brazil

        Acceptance Rates

        WebMedia '17 Paper Acceptance Rate 38 of 138 submissions, 28%;
        Overall Acceptance Rate 270 of 873 submissions, 31%

        Article Metrics

        • Downloads (last 12 months): 4
        • Downloads (last 6 weeks): 0
        Reflects downloads up to 17 Jan 2025
