Elsevier

Knowledge-Based Systems

Volume 181, 1 October 2019, 104791
Adaptive resource prefetching with spatial–temporal and topic information for educational cloud storage systems

https://doi.org/10.1016/j.knosys.2019.05.034

Highlights

  • A novel topic model is proposed to estimate the resource popularity in terms of educational topics.

  • An adaptive prefetching inference approach is designed to predict popular educational resources.

  • An efficient prefetching management mechanism is introduced to support geo-distributed cloud storage systems.

Abstract

Proactively prefetching resources at datanodes within distribution networks plays a key role in improving the efficiency of data access for e-learning, and it requires semantic knowledge of educational applications and resource popularity. To capture such information, we exploit the high spatial–temporal locality of educational end-users’ requests. This paper aims to develop resource popularity modeling techniques for enhancing the performance of educational resource prefetching. Specifically, a novel topic model, built on accelerated spectral clustering and ontology concept similarity, is proposed to support resource access based on semantic features, including topic relevance and spatial–temporal locality. Using the proposed model, an adaptive prefetching inference approach is presented to identify resources that are likely to be popular in future data requests. Also, an efficient prefetching management mechanism incorporating replica techniques is suggested for designing cloud storage systems for geo-distributed educational applications. Experiments in a simulation setting and a real-world case study with seven million users across China are carried out. Results demonstrate that the proposed method performs favorably compared to state-of-the-art approaches.

Introduction

Online educational resources have grown explosively in recent years, and cloud storage systems based on geo-distributed datacenters, a promising solution for managing such large volumes of data, are widely used in educational applications in the era of cloud computing [1], [2], [3]. Meanwhile, the network traffic of these applications is also growing dramatically [4] and imposes a heavy burden on core networks, resulting in long resource access latency and a poor experience for end-users [5]. Proactively prefetching or caching resources at datanodes that bypass core networks is an effective way to alleviate the traffic burden, reduce access latency, and provide a favorable QoS (Quality of Service) for users [6], [7]. Because of the vast amount of educational resources delivered through core networks, we cannot expect to prefetch all resources for local end-users with only a few datanodes. Thus, it is necessary to develop effective prefetching strategies that place frequently requested educational resources at datanodes so as to reduce resource access latency for end-users. To achieve this goal, we need a method to characterize resource popularity in this domain. Most previous studies use statistical features to model resource popularity. For instance, some works assume that resource popularity follows a certain probability distribution and adopt statistical or machine learning techniques, such as a cost-effective scheme [8], collaborative similarity [9], PDE (Partial Differential Equation) & ODE (Ordinary Differential Equation) analysis [10], a transfer learning-based approach [11], and a linear learning model [12], to estimate the popularity of resources from historical data. These methods, however, do not take domain knowledge into consideration when modeling resource popularity.
In the educational domain, we can take advantage of the characteristics of learning behaviors, such as spatial–temporal patterns resulting from topic-oriented online resource demands [13], [14]. Here, we refer to statistical or semantic patterns of resource requests within a specific time slot and location as spatial–temporal locality [15], [16]. Such locality usually reflects the regularity of the curriculum activities that users engage in, which commonly focus on certain educational topics at a particular time and place [17], [18]. To leverage this information in practice to estimate the popularity of resources, the key is to discover the educational topics related to the corresponding locality. Thus, it is of great significance to develop an efficient prefetching mechanism that incorporates such topics to maximize the efficiency of datanode storage within high-speed distribution or access networks.
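
The notion of spatial–temporal locality described above can be illustrated with a minimal sketch. It is not the paper's model: it simply counts requests per (location, time slot) cell and reports the most requested resources in each cell. The log format, the resource names, the four-hour slot length, and the function name popular_by_locality are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical request log entries: (location, hour_of_day, resource_id).
REQUEST_LOG = [
    ("guangzhou", 9, "calculus-lecture-03"),
    ("guangzhou", 10, "calculus-lecture-03"),
    ("guangzhou", 10, "calculus-quiz-03"),
    ("guangzhou", 11, "calculus-lecture-03"),
    ("guangzhou", 20, "poetry-reading-01"),
    ("beijing", 9, "physics-lab-02"),
    ("beijing", 9, "physics-lab-02"),
    ("beijing", 10, "calculus-lecture-03"),
]


def popular_by_locality(log, top_k=1, slot_hours=4):
    """Return the top_k most requested resources per (location, time slot).

    The slot length is an assumed parameter, not a value from the paper.
    """
    counts = defaultdict(Counter)
    for location, hour, resource in log:
        time_slot = hour // slot_hours  # bucket the hour into a coarse slot
        counts[(location, time_slot)][resource] += 1
    return {
        cell: [resource for resource, _ in counter.most_common(top_k)]
        for cell, counter in counts.items()
    }


if __name__ == "__main__":
    for cell, resources in sorted(popular_by_locality(REQUEST_LOG).items()):
        print(cell, resources)
```

A frequency count like this captures only the statistical side of locality; the semantic side, that is, which educational topic explains a burst of requests, is what the topic model is meant to supply.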

In general, the locality occurs while topic-oriented resources are accessed. Accordingly, it is appropriate to build a topic model that uniformly describes the access locality and the topics. Concretely, taking large-scale user queries or data request logs as the initial input, the model first leverages an accelerated spectral clustering algorithm to detect possible educational topics. Then, an ontology similarity based algorithm is employed to help the model distinguish real topics from candidate educational topics. With the proposed model, a relevant prefetching inference approach is presented to match resources that are likely to be popular. Furthermore, an adaptive prefetching management mechanism, including the selection of datanodes for prefetching, the calculation of the prefetching window size, and the prefetching operations for different lifecycles of topics, is suggested for designing cloud storage systems for geo-distributed educational applications. The proposed algorithms are evaluated both by simulation experiments and on a real-world educational application named WorldUC (World-wide City of Universities) with more than seven million online users across China. The results show that our algorithms achieve state-of-the-art performance and greatly improve the efficiency of cloud storage systems for educational applications.
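
The first step of this pipeline, detecting candidate topics by spectral clustering of user queries, can be sketched as follows. This is plain two-way spectral clustering (a normalized-cut bipartition found by power iteration), not the accelerated algorithm proposed in the paper, whose details are not given in this excerpt. The query term sets, the use of Jaccard similarity as the affinity, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical queries, each reduced to a set of terms.
QUERIES = [
    {"calculus", "derivative", "limit"},
    {"calculus", "integral", "limit"},
    {"calculus", "derivative", "integral", "exam"},
    {"poetry", "tang", "dynasty", "exam"},
    {"poetry", "song", "dynasty"},
    {"poetry", "tang", "song"},
]


def jaccard(a, b):
    """Jaccard similarity of two term sets, used here as the affinity."""
    return len(a & b) / len(a | b)


def spectral_bipartition(items, iterations=200):
    """Split items into two clusters using the second eigenvector of the
    normalized affinity matrix D^-1/2 W D^-1/2."""
    n = len(items)
    w = [[0.0 if i == j else jaccard(items[i], items[j]) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in w]
    if any(d == 0.0 for d in deg):
        raise ValueError("every item needs at least one similar neighbour")
    # Normalized affinity shifted by the identity so all eigenvalues are >= 0.
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: D^1/2 * 1, normalized.
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]
    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Remove the leading eigenvector, then apply one power-iteration step.
        dot = sum(a * b for a, b in zip(x, v1))
        x = [a - dot * b for a, b in zip(x, v1)]
        x = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    # The sign of the rescaled second eigenvector gives the cluster label.
    return [1 if x[i] / math.sqrt(deg[i]) >= 0 else 0 for i in range(n)]


if __name__ == "__main__":
    print(spectral_bipartition(QUERIES))
```

On this toy input the calculus queries and the poetry queries fall into different clusters even though two of them share the term "exam". More than two topics would require further eigenvectors followed by a clustering step such as k-means.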

Our contributions are summarized as follows:

  1. By employing accelerated spectral clustering and ontology concept similarity, a novel topic model is proposed to estimate resource popularity by exploring the educational topics behind the spatial–temporal locality.

  2. With respect to the spatial–temporal locality, an adaptive, lightweight semantic prefetching inference approach is designed to predict popular educational resources.

  3. By incorporating replica techniques, a systematic and efficient prefetching management mechanism is introduced to support cloud storage systems for geo-distributed educational applications.
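
The role of ontology concept similarity in the first contribution can be illustrated with a small sketch. The excerpt does not say which similarity measure the paper uses, so the sketch substitutes the well-known Wu-Palmer measure; the toy concept hierarchy, the 0.6 threshold, and the helper names are likewise assumptions. A candidate topic is kept only if its terms are, on average, close to each other in the ontology.

```python
from itertools import combinations

# Hypothetical toy ontology: each concept maps to its parent (root -> None).
PARENT = {
    "education": None,
    "mathematics": "education",
    "calculus": "mathematics",
    "derivative": "calculus",
    "integral": "calculus",
    "literature": "education",
    "poetry": "literature",
}


def ancestors(concept):
    """Return the path from the concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the hierarchy, with the root at depth 1."""
    return len(ancestors(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(ancestors(b))
    # The first shared ancestor on a's path is the lowest common subsumer.
    lcs = next(c for c in ancestors(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def is_coherent_topic(terms, threshold=0.6):
    """Keep a candidate topic if its mean pairwise similarity is high enough.

    Expects at least two terms, all present in the ontology.
    """
    pairs = list(combinations(terms, 2))
    mean = sum(wu_palmer(a, b) for a, b in pairs) / len(pairs)
    return mean >= threshold


if __name__ == "__main__":
    print(is_coherent_topic(["derivative", "integral", "calculus"]))
    print(is_coherent_topic(["derivative", "poetry"]))
```

Terms missing from the ontology raise a KeyError here; a real system would need a policy for out-of-vocabulary terms.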

The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 formulates the problems, Section 4 describes the system architecture and solution overview, Section 5 details the algorithms for prefetching management, and Section 6 reports the experimental results, followed by the conclusion in Section 7.

Related work

Alongside job scheduling, the optimization of data placement in geo-distributed data-intensive applications is an important way to lower the impact (limited bandwidth or monetary cost) of the transmission network [19]. Prefetching techniques combined with replica management can mitigate this impact in such application environments [20]. Prefetching techniques are commonly used at different levels of cyber-based systems [21], such as instruction prefetching [22],

Problem formulation

As discussed before, to use the spatial–temporal locality to estimate the popularity of resources, it is vital to model the educational topics behind the resultant locality. Most topic models offer low explainability in their outputs and cannot be applied to trace the educational topics that cause a spatial–temporal locality [37], [38]. Thus, we leverage ontology techniques to build a more explainable topic model that traces topics for the resultant locality. Before detailing the model, some notations

System architecture

As depicted in Fig. 1, the architecture of the educational cloud storage system in this paper mainly consists of two components: essential modules and prefetching management. The essential modules consist of User Interface, Data Request Controller, Data Storage Servers, and Replica Management. The proposed Prefetching Management (PM) is a system optimization module that improves the efficiency of data access in cloud storage systems; its main parts are briefly described below.

Algorithms for prefetching management

In this section, we detail the algorithms that solve the problems mentioned in Section 3, including the construction of Eq. (2), the prefetching inference of Ŝ, the calculation of z and TD, and a prefetching management mechanism that incorporates the lifecycles of topics.

Performance evaluation

This section reports the results from both simulation experiments and a real-world case study on an educational application named WorldUC [47].

Conclusion

It is important to develop an effective way to estimate resource popularity for enhancing the performance of prefetching, but most previous research has not taken domain knowledge into account, resulting in unsatisfactory performance in resource popularity estimation. The paper explores the idea of modeling resource popularity estimation by taking advantage of spatial–temporal locality in the education domain and incorporates resource prefetching management with such

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61877020), the Science and Technology Projects of Guangdong Province, China (No. 2015A030401087, 2018B010109002), the Science and Technology Project of Guangzhou Municipality, China (No. 201904010393), and the Golden Seed Project of Challenge Cup of Extra-Curricular Academic Competition Works by South China Normal University, China (No. 19JXKC01).

References (56)

  • Rios-Alvarado A.B. et al. Learning concept hierarchies from textual resources for ontologies construction. Expert Syst. Appl. (2013)
  • Jia H. et al. Approximate normalized cuts without eigen-decomposition. Inform. Sci. (2016)
  • Lashkari F. et al. Efficient indexing for semantic search. Expert Syst. Appl. (2017)
  • Hu Q. et al. Learning peer recommendation using attention-driven CNN with interaction tripartite graph. Inform. Sci. (2019)
  • Jia J. et al. Bagging-based spectral clustering ensemble selection. Pattern Recognit. Lett. (2011)
  • Xu Y. et al. Effective community division based on improved spectral clustering. Neurocomputing (2018)
  • Zhou Y. et al. Personalized learning full-path recommendation model based on LSTM neural networks. Inform. Sci. (2018)
  • Caminero A.C. et al. VirTUal Remote laboratories management system (TUTORES): Using cloud computing to acquire university practical skills. IEEE Trans. Learn. Technol. (2016)
  • Baldassarre M.T. et al. Cloud computing for education: A systematic mapping study. IEEE Trans. Educ. (2018)
  • Ren X. et al. Datum: Managing data purchasing and data placement in a geo-distributed data market. IEEE/ACM Trans. Netw. (2018)
  • El Mhouti A. et al. Using cloud computing services in e-learning process: Benefits and challenges. Educ. Inf. Technol. (2018)
  • Ali W. et al. A survey of web caching and prefetching. Int. J. Adv. Soft Comput. Appl. (2011)
  • Tran T.X. et al. Cooperative hierarchical caching and request scheduling in a cloud radio access network. IEEE Trans. Mob. Comput. (2018)
  • Kim H.-C. et al. Performance impact of large file transfer on web proxy caching: A case study in a high bandwidth campus network environment. J. Commun. Netw. (2010)
  • Bhandarkar S.M. et al. Collaborative caching for efficient dissemination of personalized video streams in resource constrained environments. Multimedia Syst. (2014)
  • Xu Y. et al. Flow-level QoE of video streaming in wireless networks. IEEE Trans. Mob. Comput. (2016)
  • Bharath B. et al. A learning-based approach to caching in heterogenous small cell networks. IEEE Trans. Commun. (2016)
  • Yang P. et al. Content popularity prediction towards location-aware mobile edge Caching. IEEE Trans. Multimed. (2019)

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.05.034.