research-article

A Multi-Label Multi-View Learning Framework for In-App Service Usage Analysis

Published: 30 January 2018

Abstract

Service usage analysis, which aims to identify customers’ messaging behaviors from encrypted App traffic flows, has become a challenging and emerging task for service providers. Prior work usually first segments a traffic sequence into single-usage subsequences and then classifies the subsequences into different usage types. However, such approaches can suffer from inaccurate traffic segmentation and from mixed-usage subsequences. To address this challenge, we exploit a multi-label multi-view learning strategy and develop an enhanced framework for in-App usage analytics. Specifically, we first devise an enhanced traffic segmentation method to reduce mixed-usage subsequences. In addition, we develop a multi-label multi-view logistic classification method, which comprises two alignments. The first alignment exploits the classification consistency between the packet-length view and the time-delay view of traffic subsequences to improve classification accuracy. The second alignment combines the classification of single-usage subsequences and the post-classification of mixed-usage subsequences into a unified multi-label logistic classification problem. Finally, we present extensive experiments with real-world datasets to demonstrate the effectiveness of our approach. We find that the proposed multi-label multi-view framework can help mitigate the problem of mixed-usage subsequences and can be generalized to latent activity analysis in sequential data, beyond in-App usage analytics.
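The idea of aligning two views inside a multi-label logistic classifier can be illustrated with a generic co-regularized model. The sketch below is not the authors' formulation: the loss, the squared-logit agreement penalty, the parameter names (`lam`, `l2`, `lr`, `steps`), the helper names, and the synthetic data standing in for the packet-length and time-delay views are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def make_toy_data(n=300, seed=0):
    """Synthetic stand-in for two views of the same subsequences (assumption)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 3))            # latent signal, one column per usage label
    Y = (z > 0).astype(float)              # a row may carry several labels (mixed usage)
    M1 = np.hstack([np.eye(3), 0.5 * np.ones((3, 2))])   # hypothetical "packet-length" view
    M2 = np.hstack([np.eye(3), -0.5 * np.ones((3, 1))])  # hypothetical "time-delay" view
    X1 = z @ M1 + 0.1 * rng.normal(size=(n, 5))
    X2 = z @ M2 + 0.1 * rng.normal(size=(n, 4))
    return X1, X2, Y

def fit_two_view(X1, X2, Y, lam=0.1, l2=1e-3, lr=0.05, steps=1000):
    """Gradient descent on per-view logistic losses plus a view-agreement penalty."""
    n, k = Y.shape
    W1 = np.zeros((X1.shape[1], k))
    W2 = np.zeros((X2.shape[1], k))
    losses = []
    for _ in range(steps):
        Z1, Z2 = X1 @ W1, X2 @ W2          # per-view logits, one column per label
        gap = Z1 - Z2                      # disagreement between the two views
        # Binary cross-entropy per label, written in a numerically stable form.
        bce = sum((np.logaddexp(0.0, Z) - Y * Z).sum() / n for Z in (Z1, Z2))
        reg = 0.5 * l2 * ((W1 ** 2).sum() + (W2 ** 2).sum())
        losses.append(bce + lam * (gap ** 2).sum() / n + reg)
        # Gradients of the loss above with respect to each view's weights.
        G1 = X1.T @ (sigmoid(Z1) - Y) / n + 2.0 * lam * X1.T @ gap / n + l2 * W1
        G2 = X2.T @ (sigmoid(Z2) - Y) / n - 2.0 * lam * X2.T @ gap / n + l2 * W2
        W1 -= lr * G1
        W2 -= lr * G2
    return W1, W2, losses

def predict(X1, X2, W1, W2, threshold=0.5):
    # Average the two views' probabilities, then threshold each label independently.
    P = 0.5 * (sigmoid(X1 @ W1) + sigmoid(X2 @ W2))
    return (P >= threshold).astype(int)

X1, X2, Y = make_toy_data()
W1, W2, losses = fit_two_view(X1, X2, Y)
accuracy = np.mean(predict(X1, X2, W1, W2) == Y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, per-label training accuracy {accuracy:.3f}")
```

Because each label is thresholded independently, a single subsequence can receive several usage labels at once, which is the property a multi-label treatment of mixed-usage subsequences relies on. Setting `lam=0` in this sketch reduces it to two independent per-view classifiers.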

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 9, Issue 4
    Research Survey and Regular Papers
    July 2018
    280 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/3183892
    Editor: Yu Zheng
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 January 2018
    Accepted: 01 October 2017
    Revised: 01 September 2017
    Received: 01 June 2017
    Published in TIST Volume 9, Issue 4

    Author Tags

    1. In-App analytics
    2. Internet traffic
    3. multi-label
    4. multi-view
    5. service usage

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Science Foundation of China (NSFS)
    • University of Missouri Research Board (UMRB)
    • Philosophy and Social Science Foundation of the Higher Education Institutions of Jiangsu Province, China

    Article Metrics

    • Downloads (Last 12 months): 39
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 15 Feb 2025

    Cited By

    • (2025) MagSpy: Revealing User Privacy Leakage via Magnetometer on Mobile Devices. IEEE Transactions on Mobile Computing 24:3 (2455-2469). DOI: 10.1109/TMC.2024.3495506. Online publication date: Mar-2025
    • (2025) A graph representation framework for encrypted network traffic classification. Computers & Security 148 (104134). DOI: 10.1016/j.cose.2024.104134. Online publication date: Jan-2025
    • (2024) CapsuleFormer: A Capsule and Transformer combined model for Decentralized Application encrypted traffic classification. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security (1418-1429). DOI: 10.1145/3634737.3637664. Online publication date: 1-Jul-2024
    • (2024) Detection and utilization of new-type encrypted network traffic in distributed scenarios. Engineering Applications of Artificial Intelligence 127 (107196). DOI: 10.1016/j.engappai.2023.107196. Online publication date: Jan-2024
    • (2023) TFE-GNN: A Temporal Fusion Encoder Using Graph Neural Networks for Fine-grained Encrypted Traffic Classification. Proceedings of the ACM Web Conference 2023 (2066-2075). DOI: 10.1145/3543507.3583227. Online publication date: 30-Apr-2023
    • (2023) BehavSniffer: Sniff User Behaviors from the Encrypted Traffic by Traffic Burst Graphs. 2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON) (456-464). DOI: 10.1109/SECON58729.2023.10287511. Online publication date: 11-Sep-2023
    • (2023) Identifying Fine-Grained Douyin User Behaviors via Analyzing Encrypted Network Traffic. 2023 19th International Conference on Mobility, Sensing and Networking (MSN) (868-875). DOI: 10.1109/MSN60784.2023.00128. Online publication date: 14-Dec-2023
    • (2022) A Concise Yet Effective Model for Non-Aligned Incomplete Multi-View and Missing Multi-Label Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 44:10, Part 1 (5918-5932). DOI: 10.1109/TPAMI.2021.3086895. Online publication date: 1-Oct-2022
    • (2022) How and when to stop the co-training process. Expert Systems with Applications 187 (115841). DOI: 10.1016/j.eswa.2021.115841. Online publication date: Jan-2022
    • (2021) MagThief: Stealing Private App Usage Data on Mobile Devices via Built-in Magnetometer. 2021 18th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON) (1-9). DOI: 10.1109/SECON52354.2021.9491601. Online publication date: 6-Jul-2021
