A Multi-Label Multi-View Learning Framework for In-App Service Usage Analysis

Authors:

Hui Xiong

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 9, Issue 4

Article No.: 40, Pages 1 - 24

https://doi.org/10.1145/3151937

Published: 30 January 2018

Abstract

Service usage analysis, which aims to identify customers’ messaging behaviors from encrypted App traffic flows, has become a challenging and emergent task for service providers. Prior work usually starts by segmenting a traffic sequence into single-usage subsequences and then classifies the subsequences into different usage types. However, such approaches can suffer from inaccurate traffic segmentation and from mixed-usage subsequences. To address this challenge, we exploit a multi-label multi-view learning strategy and develop an enhanced framework for in-App usage analytics. Specifically, we first devise an enhanced traffic segmentation method to reduce mixed-usage subsequences. In addition, we develop a multi-label multi-view logistic classification method, which comprises two alignments. The first alignment exploits the classification consistency between the packet-length view and the time-delay view of traffic subsequences to improve classification accuracy. The second alignment combines the classification of single-usage subsequences and the post-classification of mixed-usage subsequences into a unified multi-label logistic classification problem. Finally, we present extensive experiments with real-world datasets to demonstrate the effectiveness of our approach. We find that the proposed multi-label multi-view framework can help overcome the problem of mixed-usage subsequences and can be generalized to latent activity analysis in sequential data, beyond in-App usage analytics.
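The two alignments described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a generic co-regularization penalty (the squared difference between the two views' predicted probabilities) as a stand-in for the view-consistency alignment, and it treats mixed-usage subsequences simply as rows with more than one positive label. The function names `fit_two_view_multilabel` and `predict_labels`, and the parameters `agree`, `l2`, `lr`, and `steps`, are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_two_view_multilabel(X_len, X_delay, Y, agree=1.0, l2=1e-2, lr=0.5, steps=500):
    """Fit one logistic model per view with an agreement penalty (illustrative only).

    X_len:   (n, d1) packet-length features per subsequence (assumed precomputed)
    X_delay: (n, d2) time-delay features per subsequence (assumed precomputed)
    Y:       (n, k) binary label matrix; a mixed-usage row has several 1s
    """
    n, k = Y.shape
    W1 = np.zeros((X_len.shape[1], k))
    W2 = np.zeros((X_delay.shape[1], k))
    for _ in range(steps):
        P1 = sigmoid(X_len @ W1)    # per-label probabilities, packet-length view
        P2 = sigmoid(X_delay @ W2)  # per-label probabilities, time-delay view
        D = P1 - P2                 # disagreement between the two views
        # Gradient w.r.t. the logits of: cross-entropy + (agree / 2) * D**2
        G1 = (P1 - Y) + agree * D * P1 * (1.0 - P1)
        G2 = (P2 - Y) - agree * D * P2 * (1.0 - P2)
        W1 -= lr * (X_len.T @ G1 / n + l2 * W1)
        W2 -= lr * (X_delay.T @ G2 / n + l2 * W2)
    return W1, W2


def predict_labels(X_len, X_delay, W1, W2, threshold=0.5):
    """Average the two views' probabilities and threshold each label independently."""
    P = 0.5 * (sigmoid(X_len @ W1) + sigmoid(X_delay @ W2))
    return (P >= threshold).astype(int)
```

Because each label gets its own sigmoid output, a single subsequence can be assigned several usage types at once, which is how this sketch folds single-usage and mixed-usage subsequences into one multi-label problem. Setting `agree=0` reduces it to two independent per-view classifiers.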

Index Terms

A Multi-Label Multi-View Learning Framework for In-App Service Usage Analysis
1. Information systems
  1. Information systems applications

Information

Published In

ACM Transactions on Intelligent Systems and Technology Volume 9, Issue 4

Research Survey and Regular Papers

July 2018

280 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3183892

Editor:
Yu Zheng
Microsoft Research, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 January 2018

Accepted: 01 October 2017

Revised: 01 September 2017

Received: 01 June 2017

Published in TIST Volume 9, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation of China (NSFS)
University of Missouri Research Board (UMRB)
Philosophy and Social Science Foundation of the Higher Education Institutions of Jiangsu Province, China
