Adaptive resource prefetching with spatial–temporal and topic information for educational cloud storage systems

doi:10.1016/j.knosys.2019.05.034

Knowledge-Based Systems

Volume 181, 1 October 2019, 104791

https://doi.org/10.1016/j.knosys.2019.05.034

Highlights

• A novel topic model is proposed to estimate the resource popularity in terms of educational topics.
• An adaptive prefetching inference approach is designed to predict popular educational resources.
• An efficient prefetching management mechanism is introduced to support geo-distributed cloud storage systems.

Abstract

Proactively prefetching resources at datanodes within distribution networks plays a key role in improving the efficiency of data access for e-learning, and it requires semantic knowledge of educational applications and resource popularity. To capture such information, we should exploit the high spatial–temporal locality of educational end-users' requests. This paper aims to develop resource popularity modeling techniques for enhancing the performance of educational resource prefetching. Specifically, a novel topic model, built on an accelerated spectral clustering algorithm and an ontology concept similarity measure, is proposed to support resource access based on semantic features, including topic relevance and spatial–temporal locality. Using the proposed model, an adaptive prefetching inference approach is presented to identify resources that are likely to be popular in future data requests. In addition, an efficient prefetching management mechanism that incorporates replica techniques is proposed for designing cloud storage systems for geo-distributed educational applications. Experiments in a simulation setting and a real-world case study with seven million users across China are carried out. Results demonstrate that the proposed method performs favorably compared with state-of-the-art approaches.

Introduction

Online educational resources have grown explosively in recent years, and cloud storage systems based on geo-distributed datacenters, a promising solution for managing such large volumes of data, are widely used in educational applications in the era of cloud computing [1], [2], [3]. Meanwhile, the network traffic of these applications is also growing dramatically [4] and imposes a heavy burden on core networks, resulting in long resource access latency and a poor experience for end-users [5]. Proactively prefetching or caching resources at datanodes that bypass core networks is an effective way to alleviate the traffic burden, reduce access latency, and provide a favorable QoS (Quality of Service) for users [6], [7]. Given the vast amount of educational resources delivered through core networks, we cannot expect to prefetch all resources for local end-users with only a few datanodes. Thus, it is necessary to develop effective prefetching strategies that place frequently requested educational resources at datanodes so as to reduce resource access latency for end-users. To achieve this goal, we need a method to characterize resource popularity in this domain. Most previous studies use statistical features to model resource popularity. For instance, some works assume that resource popularity follows a certain probability distribution and adopt statistical or machine learning techniques, such as a cost-effective scheme [8], collaborative similarity [9], PDE (Partial Differential Equation) and ODE (Ordinary Differential Equation) analysis [10], a transfer learning-based approach [11], and a linear learning model [12], to estimate the popularity of resources from historical data. These methods, however, do not take domain knowledge into consideration when modeling resource popularity.
In the educational domain, we can take advantage of the characteristics of learning behaviors, such as spatial–temporal patterns resulting from topic-oriented online resource demands [13], [14]. Here, we refer to statistical or semantic patterns of resource requests within a specific time slot and location as the spatial–temporal locality [15], [16]. Such locality usually reflects the regularity of the curriculum activities that users engage in, which commonly focus on certain educational topics at a given period and place [17], [18]. To practically leverage such information to estimate the popularity of resources, the key is to discover the educational topics related to the corresponding locality. Thus, it is of great significance to develop an efficient prefetching mechanism that incorporates these topics to maximize the efficiency of datanode storage within high-speed distribution or access networks.
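
To make the notion of spatial–temporal locality concrete, the following minimal sketch counts requests per resource inside each (region, time slot) cell of a request log and ranks the most requested resources in one cell. It is only a frequency baseline under stated assumptions, not the topic model of this paper: the log format, the region names, the resource identifiers, and the helper names `locality_counts` and `top_k` are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log records: (ISO timestamp, region, resource id).
REQUEST_LOG = [
    ("2019-03-04T09:05:00", "guangzhou", "calculus-lecture-3"),
    ("2019-03-04T09:20:00", "guangzhou", "calculus-lecture-3"),
    ("2019-03-04T09:40:00", "guangzhou", "calculus-quiz-3"),
    ("2019-03-04T09:45:00", "beijing", "physics-lab-1"),
    ("2019-03-04T14:10:00", "guangzhou", "history-notes-2"),
    ("2019-03-04T09:55:00", "guangzhou", "calculus-lecture-3"),
]


def locality_counts(log, slot_hours=1):
    """Count requests per resource inside each (region, time slot) cell.

    The slot is derived from the hour of day only, so requests from
    different days that fall in the same hour share one cell.
    """
    cells = defaultdict(Counter)
    for stamp, region, resource in log:
        slot = datetime.fromisoformat(stamp).hour // slot_hours
        cells[(region, slot)][resource] += 1
    return cells


def top_k(cells, region, slot, k):
    """Return the k most requested resources of one cell."""
    return [res for res, _ in cells[(region, slot)].most_common(k)]


cells = locality_counts(REQUEST_LOG)
candidates = top_k(cells, "guangzhou", 9, k=2)
```

Here `candidates` holds the two resources requested most often from the hypothetical "guangzhou" region during the 09:00 slot, which is the kind of per-cell ranking a datanode in that region could use as a starting point for prefetching.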

In general, the locality occurs when topic-oriented resources are accessed. Accordingly, it is appropriate to build a topic model that uniformly describes the access locality and topics. Concretely, taking large-scale user queries or data request logs as the initial input, the model first leverages an accelerated spectral clustering algorithm to detect candidate educational topics. Then an ontology similarity-based algorithm is employed to distinguish real topics from the candidate educational topics. With the proposed model, a prefetching inference approach is presented to match resources that are likely to be popular. Furthermore, an adaptive prefetching management mechanism, including the selection of datanodes for prefetching, the calculation of the prefetching window size, and the prefetching operations for different lifecycles of topics, is proposed for designing cloud storage systems for geo-distributed educational applications. The proposed algorithms are evaluated both by simulation experiments and on a real-world educational application named WorldUC (World-wide City of Universities) with more than seven million online users across China. The results show that our algorithms achieve state-of-the-art performance and greatly improve the efficiency of cloud storage systems for educational applications.
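
The second step of this pipeline, filtering candidate topics by ontology similarity, can be illustrated with a small sketch. The excerpt does not specify which concept similarity measure the model uses, so the sketch swaps in the well-known Wu-Palmer measure over a toy concept tree; the ontology, the threshold, and the function names are assumptions made for illustration only.

```python
# Hypothetical toy ontology: child concept -> parent concept.
PARENT = {
    "mathematics": "root",
    "calculus": "mathematics",
    "derivative": "calculus",
    "integral": "calculus",
    "algebra": "mathematics",
    "sports": "root",
    "football": "sports",
}


def path_to_root(concept):
    """Return [concept, parent, ..., 'root']."""
    path = [concept]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the
    # lowest common subsumer (lcs).
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def topic_coherence(terms):
    """Mean pairwise similarity of the terms in one candidate cluster."""
    pairs = [(a, b) for i, a in enumerate(terms) for b in terms[i + 1:]]
    if not pairs:
        return 0.0  # a single term gives no evidence of a shared topic
    return sum(wu_palmer(a, b) for a, b in pairs) / len(pairs)


def keep_real_topics(candidates, threshold=0.6):
    """Keep candidate clusters whose terms are close in the ontology."""
    return [terms for terms in candidates if topic_coherence(terms) >= threshold]


candidate_topics = [
    ["derivative", "integral", "calculus"],  # concepts under one subtree
    ["derivative", "football"],              # unrelated concepts
]
real_topics = keep_real_topics(candidate_topics)
```

In this toy setting the first cluster is kept because its terms share the `calculus` subtree, while the second is discarded because its terms only meet at the root. A production ontology and a tuned threshold would be needed before drawing any conclusion from such scores.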

Our contributions are summarized as follows:

1. By employing an accelerated spectral clustering algorithm and an ontology concept similarity measure, a novel topic model is proposed to estimate resource popularity by exploring the educational topics behind the spatial–temporal locality.
2. With respect to the spatial–temporal locality, an adaptive, lightweight semantic prefetching inference approach is designed to predict popular educational resources.
3. By incorporating replica techniques, a systematic and efficient prefetching management mechanism is introduced to support cloud storage systems for geo-distributed educational applications.

The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 formulates the problems, Section 4 describes the system architecture and solution overview, Section 5 details the algorithms for prefetching management, and Section 6 reports the experimental results, followed by the conclusion in Section 7.

Section snippets

Related work

The optimization of data placement in geo-distributed data-intensive applications is, besides job scheduling, an important way to lower the impact (limited bandwidth or monetary cost) of the transmission network [19]. Prefetching techniques combined with replica management can mitigate this impact in such application environments [20]. Prefetching techniques are commonly used at different levels of cyber-based systems [21], such as instruction prefetching [22],

Problem formulation

As discussed before, to employ the spatial–temporal locality to estimate the popularity of resources, it is vital to model the educational topics behind the resultant locality. Most topic models offer low explainability in their outputs and cannot be applied to trace the educational topics that cause a spatial–temporal locality [37], [38]. Thus, we leverage ontology techniques to build a more explainable topic model to trace topics for the resultant locality. Before detailing the model, some notations

System architecture

As depicted in Fig. 1, the architecture of the educational cloud storage system in this paper mainly consists of two components: essential modules and prefetching management. The essential modules consist of the User Interface, Data Request Controller, Data Storage Servers, and Replica Management. The proposed Prefetching Management (PM) is a system optimization module that improves the efficiency of data access in cloud storage systems, and its main parts are briefly described below.
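
As a rough illustration of how a request controller could cooperate with prefetched copies, the sketch below serves a request from the user's regional datanode when a copy exists there and falls back to the central store otherwise. The class layout, method names, and region identifiers are hypothetical and are not taken from the architecture in Fig. 1.

```python
class DataRequestController:
    """Route a request to the user's regional datanode if it holds a copy."""

    def __init__(self, origin):
        self.origin = origin    # resource id -> content at the central store
        self.datanodes = {}     # region -> {resource id: content}

    def prefetch(self, region, resource_ids):
        """Copy the given resources from the origin to one regional datanode."""
        node = self.datanodes.setdefault(region, {})
        for rid in resource_ids:
            node[rid] = self.origin[rid]

    def fetch(self, region, rid):
        """Return (content, source), where source is 'datanode' or 'origin'."""
        node = self.datanodes.get(region, {})
        if rid in node:
            return node[rid], "datanode"
        return self.origin[rid], "origin"


controller = DataRequestController({"r1": "slides", "r2": "video"})
controller.prefetch("guangzhou", ["r1"])
local_hit = controller.fetch("guangzhou", "r1")
remote_miss = controller.fetch("guangzhou", "r2")
```

Only requests that miss the regional datanode travel to the origin, which is the traffic that prefetching tries to reduce. Replica consistency and eviction are deliberately left out of this sketch.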

Algorithms for prefetching management

In this section, we detail the algorithms that solve the problems mentioned in Section 3, including the construction of Eq. (2), the prefetching inference of $\hat{S}$, the calculation of $z$ and TD, and a prefetching management mechanism that incorporates the lifecycles of topics.
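
Since this excerpt does not define $z$, TD, or the lifecycle rules, the following sketch only illustrates the general idea of tying a prefetching window to the lifecycle stage of a topic. The stage names, the scaling factors, the `Topic` record, and the planning rule are all assumptions introduced for illustration and should not be read as the algorithms of this section.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    resources: list     # resource ids ranked by estimated popularity
    recent_hits: list   # request counts per time slot, oldest first


def lifecycle_stage(topic):
    """Classify a topic by comparing its last two slots (hypothetical rule)."""
    if len(topic.recent_hits) < 2:
        return "emerging"
    prev, last = topic.recent_hits[-2], topic.recent_hits[-1]
    if last == 0:
        return "expired"
    if last > prev:
        return "growing"
    if last < prev:
        return "fading"
    return "stable"


# Hypothetical share of the base window granted to each stage.
STAGE_FACTOR = {"emerging": 0.5, "growing": 1.0, "stable": 0.75,
                "fading": 0.25, "expired": 0.0}


def window_size(topic, base=4):
    """Number of resources to prefetch for a topic, scaled by its stage."""
    return int(base * STAGE_FACTOR[lifecycle_stage(topic)])


def plan(topics, capacity):
    """Return (to_prefetch, to_evict) resource lists for one datanode."""
    to_prefetch, to_evict = [], []
    # Serve the currently hottest topics first.
    ranked = sorted(topics,
                    key=lambda t: t.recent_hits[-1] if t.recent_hits else 0,
                    reverse=True)
    for topic in ranked:
        if lifecycle_stage(topic) == "expired":
            to_evict.extend(topic.resources)
            continue
        room = capacity - len(to_prefetch)
        to_prefetch.extend(topic.resources[:min(window_size(topic), room)])
    return to_prefetch, to_evict


topics = [
    Topic("calculus", ["c1", "c2", "c3", "c4", "c5"], [10, 30]),
    Topic("history", ["h1", "h2"], [20, 5]),
    Topic("physics", ["p1"], [8, 0]),
]
to_prefetch, to_evict = plan(topics, capacity=5)
```

In this example the growing topic receives the full window, the fading topic a single slot, and the expired topic is scheduled for eviction, so the datanode capacity follows the topics that are currently active.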

Performance evaluation

This section reports the results from both simulation experiments and a real-world case study on an educational application named WorldUC [47].
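
The excerpt does not list the evaluation metrics, so as an assumption the sketch below uses the hit ratio, a common measure for prefetching and caching schemes: the fraction of requests in a trace that are served from the prefetched set. The trace and the resource identifiers are made up for illustration.

```python
def hit_ratio(prefetched, trace):
    """Fraction of requests in `trace` served from the prefetched set."""
    if not trace:
        return 0.0
    stored = set(prefetched)
    hits = sum(1 for resource in trace if resource in stored)
    return hits / len(trace)


# Hypothetical request trace observed at one datanode.
trace = ["a", "b", "a", "c", "a", "d", "b", "a"]
baseline = hit_ratio([], trace)
with_prefetch = hit_ratio(["a", "b"], trace)
```

Comparing `with_prefetch` against `baseline` on the same trace gives a simple view of how much traffic a prefetching decision keeps away from the core network.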

Conclusion

It is important to develop an effective way to estimate resource popularity for enhancing the performance of prefetching, but most previous research has not taken domain knowledge into account, resulting in unsatisfactory resource popularity estimation. The paper explores the idea of modeling resource popularity estimation by taking advantage of spatial–temporal locality in the education domain and incorporates resource prefetching management with such

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61877020), the Science and Technology Projects of Guangdong Province, China (No. 2015A030401087, 2018B010109002), the Science and Technology Project of Guangzhou Municipality, China (No. 201904010393), and the Golden Seed Project of Challenge Cup of Extra-Curricular Academic Competition Works by South China Normal University, China (No. 19JXKC01).

References (56)

González-Martínez J.A. et al., Cloud computing and education: A state-of-the-art survey, Comput. Educ. (2015)
Muñoz-Merino P.J. et al., Precise effectiveness strategy for analyzing the effectiveness of students with educational resources and activities in MOOCs, Comput. Hum. Behav. (2015)
Lo J.-J. et al., Applying GIS to develop a web-based spatial-person-temporal history educational system, Comput. Educ. (2009)
Grabczewski K. et al., Saving time and memory in computational intelligence system with machine unification and task spooling, Knowl.-Based Syst. (2011)
Li X.S. et al., A self-learning pattern adaptive prefetching method for big data applications, Sustain. Comput.: Inform. Syst. (2018)
Huang Y.F. et al., Mining web logs to improve hit ratios of prefetching and caching, Knowl.-Based Syst. (2008)
Ali W. et al., Intelligent Naïve Bayes-based approaches for Web proxy caching, Knowl.-Based Syst. (2012)
Zhang N. et al., Using grouped linear prediction and accelerated reinforcement learning for online content caching
Zhang W. et al., Fast media caching for geo-distributed data centers, Comput. Commun. (2018)
Yeh J.-F. et al., Topic detection and tracking for conversational content by using conceptual dynamic latent dirichlet allocation, Neurocomputing (2016)
Rios-Alvarado A.B. et al., Learning concept hierarchies from textual resources for ontologies construction, Expert Syst. Appl. (2013)
Jia H. et al., Approximate normalized cuts without eigen-decomposition, Inform. Sci. (2016)
Lashkari F. et al., Efficient indexing for semantic search, Expert Syst. Appl. (2017)
Hu Q. et al., Learning peer recommendation using attention-driven CNN with interaction tripartite graph, Inform. Sci. (2019)
Jia J. et al., Bagging-based spectral clustering ensemble selection, Pattern Recognit. Lett. (2011)
Xu Y. et al., Effective community division based on improved spectral clustering, Neurocomputing (2018)
Zhou Y. et al., Personalized learning full-path recommendation model based on LSTM neural networks, Inform. Sci. (2018)
Caminero A.C. et al., VirTUal Remote laboratories management system (TUTORES): Using cloud computing to acquire university practical skills, IEEE Trans. Learn. Technol. (2016)
Baldassarre M.T. et al., Cloud computing for education: A systematic mapping study, IEEE Trans. Educ. (2018)
Ren X. et al., Datum: Managing data purchasing and data placement in a geo-distributed data market, IEEE/ACM Trans. Netw. (2018)
El Mhouti A. et al., Using cloud computing services in e-learning process: Benefits and challenges, Educ. Inf. Technol. (2018)
Ali W. et al., A survey of web caching and prefetching, Int. J. Adv. Soft Comput. Appl. (2011)
Tran T.X. et al., Cooperative hierarchical caching and request scheduling in a cloud radio access network, IEEE Trans. Mob. Comput. (2018)
Kim H.-C. et al., Performance impact of large file transfer on web proxy caching: A case study in a high bandwidth campus network environment, J. Commun. Netw. (2010)
Bhandarkar S.M. et al., Collaborative caching for efficient dissemination of personalized video streams in resource constrained environments, Multimedia Syst. (2014)
Xu Y. et al., Flow-level QoE of video streaming in wireless networks, IEEE Trans. Mob. Comput. (2016)
Bharath B. et al., A learning-based approach to caching in heterogenous small cell networks, IEEE Trans. Commun. (2016)
Yang P. et al., Content popularity prediction towards location-aware mobile edge Caching, IEEE Trans. Multimed. (2019)


^☆: No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.05.034.
