A Semantic Concast service for data discovery, aggregation and processing on NDN

https://doi.org/10.1016/j.jnca.2018.10.017

Abstract

Offering a flexible paradigm for intelligently discovering, aggregating and processing big distributed data is a crucial requirement in a large content-centric Internet. However, the major hindrances to this paradigm are the network's dynamic nature, traffic balance, wired forwarding and the absence of cooperation between communications and computations. In this paper, we present a scalable Semantic Concast service on Named Data Networking (NDN), which is considered a promising paradigm for the future Internet. The service enables intermediate nodes to cooperate in discovering, aggregating and processing data for a user's Interest that contains a hierarchical name and semantic constraints. Specifically, we introduce multiple types and strategies of data aggregation and processing for combining and processing the positive data and suppressing the negative, futile data, as well as a determination of response completeness, to enhance the recall and sharing of relevant results. The experiments demonstrate that the Semantic Concast service can effectively improve service quality, reduce network traffic and shorten response time.

Introduction

Distributed data storage and content-centric networking technologies connect big distributed data from everywhere. Further, cloud and edge computing technologies provide easy access to massive computational and storage resources, which are also distributed everywhere. Accordingly, the Internet has become a large repository that stores big distributed data and manages numerous computational resources. With thin terminals, more and more consumers want to retrieve, collect and aggregate (or combine) distributed data from the Internet. Most of the positive data need further processing, while all of the negative data need to be suppressed to reduce bandwidth consumption. However, in distributed networks, these issues are still handled separately by third-party platforms or applications, since the IP network only focuses on transmitting data blocks of limited size from one host to another. For example, if a consumer or application wants to retrieve the top-k data from a large network, it first retrieves a top-k list from each related data storage. Supposing there are n related data storages, n − 1 top-k lists' worth of the data received by the client is negative after the lists are aggregated (or combined) into a global top-k result. P2P networks over the IP network provide overlay multicast communication but are still clumsy in routing optimization and in the cooperation between data communication and semantic processing. Many applications have been proposed to address data discovery and aggregation on P2P (Abdullah et al. (2005); Akbarinia et al. (2006)), WSN (Al-Karaki et al. (2009)) and V2X (Jiru et al. (2014)), or to perform parallel data processing, such as MapReduce/Hadoop (Foundation (2018)), over the Internet. However, today these applications are confined to the host-based communication model and the separation of data addresses and semantics. Moreover, a large content-centric Internet lacks intelligence for in-network data discovery and processing. Therefore, the data discovery, aggregation and processing problems still cannot be addressed concurrently in distributed networks while transmitting a group of distributed data, which often causes delayed responses, heavy network traffic and lower data quality.
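
The top-k example above can be made concrete with a minimal Python sketch. It is only an illustration of client-side merging, not the paper's implementation; the function names, the `(score, identifier)` item layout and the sample storages are all hypothetical.

```python
import heapq
from typing import List, Tuple

# Hypothetical item layout: (score, identifier); a higher score ranks first.
Item = Tuple[float, str]


def local_top_k(items: List[Item], k: int) -> List[Item]:
    """Top-k list of a single storage, highest score first."""
    return heapq.nlargest(k, items)


def merge_top_k(partials: List[List[Item]], k: int) -> List[Item]:
    """Combine several local top-k lists into one global top-k list."""
    return heapq.nlargest(k, (item for part in partials for item in part))


# Three hypothetical storages (n = 3), each answering with its own top-2 list.
storages = [
    [(0.9, "a1"), (0.4, "a2"), (0.1, "a3")],
    [(0.8, "b1"), (0.7, "b2")],
    [(0.3, "c1"), (0.2, "c2")],
]
k = 2
partials = [local_top_k(s, k) for s in storages]
global_top = merge_top_k(partials, k)

received = sum(len(p) for p in partials)  # items shipped to the client
discarded = received - len(global_top)    # negative items dropped after merging
```

In this sketch the client receives n · k items and keeps only k of them, so (n − 1) · k items cross the network only to be discarded. Merging the partial lists closer to the sources is the kind of saving the paragraph above argues for.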

In fact, the sources of big data are coupled with the domains that generate them and with various features. Recently, a novel data-centric network paradigm called Named Data Networking (NDN) (Zhang et al. (2014)) has offered flexible access to huge collections of hierarchical categories, objects or functions (Liu et al. (2016); Sifalakis et al. (2014)). In NDN, Interests can be purposefully routed to related parts of the network or to a single source through the Interests’ hierarchical names and the Longest Prefix Match (LPM) algorithm. To return the results, each outgoing Interest is recorded in the PIT (Pending Interest Table) of every router it passes. At an NDN router, a received Data packet is identified by its hierarchical name and obtains its return faces by Exact Match (EM) between its name and the entries in the PIT; that is, the name of the returned data has to be the same as the name of the related PIT entry. A router may receive more than one chunk of response data from different faces, but only one of them is returned downstream. So one Interest fetches at most one Data from one source, and which Data is fetched depends on the response time of the Data. Generally, NDN uses the FCFS (First Come First Served) policy. Moreover, NDN supports authenticating response data, whether the data are in motion or at rest. In addition, thanks to the in-network caching mechanism, the massive cached data distributed over the Internet are also a valuable resource for shortening response time and reducing network traffic. So NDN could potentially become an efficient network architecture for distributed and dynamic big data services. However, NDN does not focus on aggregating and returning a group of distributed data at a time. That is, content-centric network technologies support neither concurrent discovery, aggregation and processing of positive data, nor duplicate differentiation and inferior data suppression over the Internet. Put differently, routers lack semantic cooperation in discovery, aggregation and computation, which causes high bandwidth consumption and lower service quality. Therefore, to explore some intelligence in content-centric networks and to efficiently discover, aggregate and process big distributed data simultaneously in networks, the influencing factors should be investigated and a distributed cooperation mechanism should be provided.
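
The forwarding behaviour described above (LPM against the FIB for Interests, exact match against the PIT for Data, and first-come-first-served return) can be sketched as follows. This is a simplified toy model under stated assumptions, not the NDN forwarder or its API: the `Router` class, its method names, the integer face identifiers and the sample prefixes are hypothetical, and caching, nonces and timeouts are omitted.

```python
from typing import Dict, List, Set, Tuple


def components(name: str) -> List[str]:
    """Split a hierarchical name such as '/news/sports' into its components."""
    return [c for c in name.split("/") if c]


class Router:
    """Toy NDN-style router: LPM on the FIB, exact match on the PIT."""

    def __init__(self, fib: Dict[str, Set[int]]) -> None:
        # FIB: name prefix -> outgoing faces (faces are plain integers here).
        self.fib: Dict[Tuple[str, ...], Set[int]] = {
            tuple(components(prefix)): faces for prefix, faces in fib.items()
        }
        # PIT: full Interest name -> downstream faces waiting for the Data.
        self.pit: Dict[str, Set[int]] = {}

    def longest_prefix_match(self, name: str) -> Set[int]:
        parts = tuple(components(name))
        # Try the longest candidate prefix first, then shorter ones.
        for length in range(len(parts), 0, -1):
            faces = self.fib.get(parts[:length])
            if faces is not None:
                return faces
        return set()

    def on_interest(self, name: str, in_face: int) -> Set[int]:
        if name in self.pit:
            # Same name already pending: remember the face, do not re-forward.
            self.pit[name].add(in_face)
            return set()
        out_faces = self.longest_prefix_match(name) - {in_face}
        if out_faces:
            self.pit[name] = {in_face}
        return out_faces

    def on_data(self, name: str) -> Set[int]:
        # Exact match: the first Data consumes the PIT entry (FCFS);
        # later Data with the same name find no entry and are dropped.
        return self.pit.pop(name, set())


router = Router({"/news": {1}, "/news/sports": {2, 3}})
forwarded = router.on_interest("/news/sports/today", in_face=0)
first = router.on_data("/news/sports/today")
second = router.on_data("/news/sports/today")
```

Here the Interest is multicast to both faces of the longest matching prefix, but only the first Data to arrive is returned downstream; the second finds no PIT entry. This is the n → 1 behaviour that prevents a group of distributed answers from being aggregated in the network.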

Based on the principles of NDN, this paper focuses on a new distributed cooperation mechanism that combines multicast and concast (Calvert et al. (2001)) (the reverse of multicast) communications to discover data from distributed storages, and then concurrently aggregates and processes the distributed response data in multiple routers. That is, we present a Semantic Concast service based on NDN for concurrently supporting distributed data discovery, aggregation and processing over the Internet. With the Semantic Concast service, each powerful router or processing node can aggregate and process multiple answers coming from different sources using a limited cache. The aggregated data are then returned swiftly downstream, guided by the PIT. The Semantic Concast service also aims to support a growing number of applications involving data discovery, aggregation and processing in the network layer by reinforcing NDN. Better still, when retrieval is restricted to a set of relevant tiny data, the bandwidth consumption and transmission cost do not exceed those of NDN.

The main technical contributions of our work are summarized as follows:

  • (1)

    We first define the Semantic Concast service, which combines multicast and concast communications for distributed data aggregation and processing after multiple data discovery, and then compare it with NDN.

  • (2)

    We present a novel semantic and cooperative mechanism across multiple routers for distributed data aggregation and processing, which empowers networks with some content-centric intelligence.

  • (3)

    For intelligent communication of Interests and response data, we present steerable Interest multicast and multiple-data concast protocols that extend NDN.

  • (4)

    We introduce a response data completeness estimation and tracing method for finishing data admission, aggregation and result sharing in networks in a timely manner. As part of the method, a rigorous loop and duplicate Interest suppression technique suppresses all invalid outgoing and incoming Interests, which contributes to estimating the completeness state of an answer in routers.
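
To make contribution (4) more tangible, the sketch below shows one way duplicate Interest suppression can feed a completeness estimate at a router. It is not the protocol proposed in the paper, whose details are not given here. It assumes that duplicates are detected by a (name, nonce) pair, as in standard NDN loop detection, that an answer counts as complete once every upstream face has replied, and that an upstream may reply with an empty (negative) answer; all class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class PendingAnswer:
    """Per-Interest state (hypothetical): unanswered faces and partial data."""

    def __init__(self, out_faces: Set[int]) -> None:
        self.waiting: Set[int] = set(out_faces)  # upstream faces yet to reply
        self.parts: List[str] = []               # positive data collected so far

    def complete(self) -> bool:
        return not self.waiting


class CompletenessTracker:
    def __init__(self) -> None:
        self.seen: Set[Tuple[str, int]] = set()       # (name, nonce) pairs seen
        self.pending: Dict[str, PendingAnswer] = {}

    def admit_interest(self, name: str, nonce: int, out_faces: Set[int]) -> bool:
        """Return False for a looped or duplicate Interest, which is suppressed."""
        if (name, nonce) in self.seen:
            return False
        self.seen.add((name, nonce))
        self.pending.setdefault(name, PendingAnswer(out_faces))
        return True

    def on_response(self, name: str, face: int, data: Optional[str]) -> bool:
        """Record one upstream reply; return True once the answer is complete.

        Assumption: data=None models an empty (negative) reply. Replies from
        unexpected faces or repeated replies are ignored and return False.
        """
        entry = self.pending.get(name)
        if entry is None or face not in entry.waiting:
            return False
        entry.waiting.discard(face)
        if data is not None:
            entry.parts.append(data)
        return entry.complete()


tracker = CompletenessTracker()
admitted = tracker.admit_interest("/shop/books", nonce=7, out_faces={1, 2})
duplicate = tracker.admit_interest("/shop/books", nonce=7, out_faces={1, 2})
after_first = tracker.on_response("/shop/books", face=1, data="top-k from face 1")
after_second = tracker.on_response("/shop/books", face=2, data=None)
```

Because the duplicate Interest is never forwarded, the set of faces the router waits on stays exact, so the router can tell when the last outstanding reply has arrived and release the aggregated result.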

The remainder of this paper is organized as follows. Section 2 discusses related work. In Section 3, we describe the semantic multicast and concast framework. Based on the proposed Semantic Concast service framework, Section 4 proposes the data aggregation and computation processing scheme in the concast phase. Section 5 discusses the issues that merit attention for the Semantic Concast service. Section 6 presents the experiments and an analysis of the performance of the Semantic Concast service. Finally, Section 7 gives the concluding remarks along with extensions and directions for future research.

Section snippets

Related works

NDN provides a multicast communication mode to send Interests to all relevant producers and uses the selection function f: n → 1 to return a single data item at every intermediate router, even though the router has received multiple data. So the aim of NDN is neither to carry multiple data coming from different producers to the consumers at a time, nor to support a data concast service for distributed data discovery, aggregation and processing. In fact, in many cases, there are lots of applications or components

The semantic multicast and concast framework

Our Semantic Concast service is based on the NDN infrastructure, which presents a novel communication infrastructure because of some distinct communication modes for request/data transmission.

Definition 1

Semantic multicast. In NDN, one Interest is semantically forwarded to all related routers or sources (generally more than one) through the Out_faces obtained by matching the hierarchical name of the Interest against the FIB in every router it passes. We call this data communication mode Semantic

Aggregation and computation in concast

In the concast phase, to efficiently aggregate and process the distributed and local response data in NDN, a minor aggregation and computation (called the Agg&C operation) is introduced. Overall, with minor and asynchronous aggregations and computations along the response paths, we can empower NDN with the ability to interleave data discovery, aggregation and processing in a network without violating, or at least without seriously violating, network principles such as traffic balance and wired forwarding. And the monolithic

Discussion

There are some issues that merit attention for the Semantic Concast service.

  • (1)

    Network scale and performance. The routing protocols used in NDN are similar to conventional routing protocols in the present-day Internet, such as OSPF. The main difference is the nature of naming. Moreover, although the Internet is composed of a large number of networks, it actually has a hierarchical architecture. According to the measurement results of the Internet (Zhang et al. (2008)), the average degree of nodes and

Implementation and evaluation

Based on the Semantic Concast service, we implement a hierarchical name-based multiple information retrieval application on ndnSIM (a popular NDN simulator based on ns-3) (Alexander et al. (2017)), which can realize top-k data discovery, aggregation and processing (e.g. top-k data ranking) for large distributed data on a distributed network. In our experiments, we use the DMOZ data (http://dmoztools.net/), an Open Directory Project that categorizes a large number of Web sites. The attributes of the

Conclusions

In this paper, we present a Semantic Concast service framework by extending the protocol of NDN. In our framework, a cooperation approach for intelligent data discovery, aggregation and computation is presented in a multi-hop content-centric network. The framework can efficiently discover, aggregate and process the most needed data simultaneously during data routing and transmission, and also shorten response time or largely reduce network traffic or balance between the two in the whole

Acknowledgement

This work was supported by grants from the Key Project of Hunan Provincial Education Department (17A070), the National Natural Science Foundation of China (61370227) and the Hunan Provincial Natural Science Foundation of China (2017JJ2081, 2018JJ4052). We are grateful to Prof. Lan Wang (the University of Memphis) and the anonymous reviewers for their helpful suggestions and comments.

References (30)

  • J. Chen et al.

    COPSS: an efficient content oriented publish/subscribe system

  • H. Dai et al.

    BFAST: unified and scalable index for NDN forwarding architecture

  • W. Drira et al.

    NDN-Q: an NDN query mechanism for efficient V2X data collection

  • E. Fasolo et al.

    In-network aggregation techniques for wireless sensor networks: a survey

    IEEE Wireless Commun.

    (2007)

  • T.A.S. Foundation

    Apache Hadoop

    (2018)

    Zhuhua Liao received the B.S. degree in Computer Science and Technology from Hunan Normal University, China in 2001; the M.S. degree in Computer Application Technology from Graduate University of Chinese Academy of Sciences in 2004; and the Ph.D. degree in Computer Architecture from Institute of Computing Technology, Chinese Academy of Sciences, in 2012. He is currently an Associate Professor with the College of Computer Science and Engineering, Hunan University of Science and Technology, China. His research interests include Named Data Networking, distributed data processing, and distributed computing. He has published more than 30 peer-reviewed papers in related areas, including in international journals such as Journal of Computational Science and Journal of Networks, and in conference proceedings such as ISPA, MMM, CIT and IFIP IIP.

    Zengde Teng received the B.S. degree in Computer Science and Technology from Anqing Normal University, China, in 2016. He is currently working towards his M.S. degree at Hunan University of Science and Technology, China. His current research interests include Named Data Networking and distributed data discovery.

    Jian Zhang received the B.S. degree in Computer Science and Technology from Hunan University of Science and Technology, China, in 2016. He is currently working towards his M.S. degree at Hunan University of Science and Technology, China. His current research interests include distributed data processing and aggregation.

    Yizhi Liu received the B.S. degree in Industrial Electronic Automation from Hunan University of Science and Technology, China in 1994; the M.S. degree in Computer Application Technology from Xiangtan University, China in 2003; and the Ph.D. degree in Computer Application Technology from Institute of Computing Technology, Chinese Academy of Sciences, in 2011. He is currently an Associate Professor with the College of Computer Science and Engineering, Hunan University of Science and Technology, China. His research interests include mobile computing, distributed data mining, and multimedia content analysis and retrieval. He has published more than 40 peer-reviewed papers in related areas, including in well-archived international journals such as IEEE Trans. on Multimedia (TMM), Future Generation Computer Systems (FGCS) and Journal of Visual Communication and Image Representation (JVCI), and in conference proceedings such as ICMR, ASSP, MM, ECCV and IFIP IIP.
