1 Introduction
The rapid growth of a large number of applications has led to a tremendous rise in data, and the data generated have stringent networking requirements. Traditional network devices have both data and control planes strongly coupled together with proprietary protocols and closed interfaces, which makes it difficult to handle issues such as policy enforcement and user-aware routing that can vary in complexity [17].
Software-defined networks (SDNs) are a recent network paradigm that separates the data and control planes [22]. This separation of planes and centralization of controllers offers a great deal of flexibility and innovation in the network for policy enforcement based on network requirements, thus removing vendor lock-in [6]. The communication between the control plane and data plane is governed by a southbound Application Programming Interface (API) known as OpenFlow [8]. The data plane uses the OpenFlow protocol to send network statistics to the control plane. The control plane then formulates policies for every flow in the network and thus imparts logic to the data plane, as depicted in Figure 1.
SDN has paved the path for easy handling of big data flows in networks, be it data streams from Internet of Things (IoT) devices to Cloud datacenters or intra-datacenter network traffic. This has been made possible by leveraging distributed information amassed across the network available to the SDN controller for informed decision-making. However, with the proliferation of next-generation, real-time IoT applications that vary greatly in terms of data frequency and data stream volumes, data traffic classification can substantially assist SDN controllers toward efficient routing and traffic engineering decisions. Existing works on network classification are limited by their application-centric nature, thus overlooking the key criterion for real-time IoT applications, namely, Quality of Service (QoS).
Machine learning (ML) helps to make effective decisions based on predictions from real-time and historical data [8]. Network statistics amassed at every switch of an SDN network (the switches collectively make up the data plane of SDN) can be easily monitored and leveraged by the controller to make intelligent decisions that are then implemented at the forwarding plane.
To cater to the demands of a large number of applications and to effectively handle conflicting resource requests, there is a need to design application-aware networks. For instance, underwater wireless sensor networks comprise nodes that are deployable on the surface and under the water. All nodes need to communicate and exchange information with other nodes in the same network and with the base station. Communication in the sensor network involves the transmission of data using acoustic, electromagnetic, or optical wave media. Among these media, acoustic communication is the most popular and widely used method because of its low attenuation in water. Attenuation arises from the absorption and conversion of energy into heat in the water, and acoustic signals operate at low frequencies, which enables them to be transmitted and received over long distances. The key requirement for this kind of application-aware networking is network traffic classification, although traffic classification is not easy to implement when the sensor networks are underwater. Traffic classification at the controller nevertheless helps to make informed decisions about the applications’ network requirements. Such traffic classification would pave the way for the segregation of large and small flows, which affect resource requirement fulfillment and thus datacenter performance considerably [
21]. This separation of large and small flows is necessary because large flows consume considerable bandwidth, and separating them prevents performance deterioration of small flows, which are typically delay intolerant. Further, for QoS-aware applications to meet their resource allocation requirements, network traffic classification is required, and fulfilling such network requirements is desirable for the seamless functioning of the network [
21]. With a centralized view of the whole network, traffic classification at the controller in SDN helps to formulate application-specific rules, which are critical for the network to work efficiently and seamlessly. However, accurate traffic classification is still an open research problem. In this article, an attempt has been made to address the software-defined traffic classification problem from a new perspective by employing evolutionary-based ML algorithms jointly for improved network traffic classification. The main contributions of this article include the incorporation of evolutionary algorithm-based classifiers for network traffic classification and the selection of suitable network traffic datasets obtained from real-life Internet data. Three classification algorithms, namely, Feed-Forward Neural Network (FFNN), Naïve Bayes, and Logistic Regression (LR), have been employed together with a hybrid Neural Network using Particle Swarm Optimization (NN-PSO) for fine-tuning the performance, particularly because readily available datasets do not exist for this purpose. To the best of our knowledge, the SDN traffic classification attempted in this study and the performance improvement with hybrid approaches are still unexplored in the literature.
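As a rough illustration of how a hybrid NN-PSO can operate, the sketch below uses particle swarm optimization to search the weights of a very small feed-forward network on a toy two-class problem. The data points, the network size, and the PSO coefficients (inertia 0.7, acceleration constants 1.5) are assumptions chosen only for illustration; they are not the configuration or datasets used in this study.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy flows: (normalized packet size, normalized duration) -> class label
DATA = [((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.30), 0),
        ((0.80, 0.90), 1), ((0.90, 0.70), 1), ((0.70, 0.85), 1)]

HIDDEN = 3
# Flat weight layout: 2*HIDDEN input weights, HIDDEN hidden biases,
# HIDDEN output weights, 1 output bias.
DIM = 4 * HIDDEN + 1


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Forward pass of a 2-HIDDEN-1 network whose weights are the flat list w."""
    out = w[-1]
    for h in range(HIDDEN):
        pre = w[2 * h] * x[0] + w[2 * h + 1] * x[1] + w[2 * HIDDEN + h]
        out += w[3 * HIDDEN + h] * sigmoid(pre)
    return sigmoid(out)


def loss(w):
    """Mean squared error of the network over the toy data."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def pso(n_particles=15, iters=100, inertia=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO over the network weight vector."""
    pos = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    initial_loss = gbest_loss
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(DIM):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < pbest_loss[i]:       # update personal best
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:      # update global best
                    gbest, gbest_loss = pos[i][:], current
    return gbest, initial_loss, gbest_loss


weights, initial_loss, final_loss = pso()
predictions = [round(forward(weights, x)) for x, _ in DATA]
```

Because the global best is only replaced by a strictly better candidate, `final_loss` can never exceed `initial_loss`; how close the predictions come to the labels depends on the assumed hyperparameters.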
The remainder of this article is organized as follows: Section 2 presents a brief background to this research and discusses the state of the art. Section 3 describes the underwater sensor networks related to ML techniques. Section 4 discusses the problem formulation and provides the solution methodology. Implementation details and results are discussed in Section 5. Lastly, Section 6 concludes and provides further avenues for this research.
2 Related Works
Underwater Wireless Sensor Networks (UWSNs) are developing quickly and receiving significant attention from both researchers and practitioners [23]. With advances in UWSN technology, sensors have become smarter, smaller, and more flexible, with lower power consumption, increased processing capacity, and the ability to operate in various underwater applications. Also, UWSN technology can be integrated with Internet Protocol-based systems to support the IoT and machine-to-machine (M2M) frameworks for real-time monitoring. The rapid growth of the UWSN domain and the availability of modern sensor node technologies have raised awareness of UWSNs every year, owing to their compatibility and broad application in various sectors.
There have been several attempts to classify network traffic into a set of varied categories. These categories include QoS-aware, flow-aware, and application-aware traffic classification. Most of the research has focused on application-aware traffic classification, and very few works have focused on classification based on QoS and flow awareness. QoS classification helps to detect the classes of a multitude of flows. Wang et al. [19] classified traffic into various classes based on QoS. Their work utilizes Deep Packet Inspection (DPI) and semi-supervised learning for the classification of traffic. DPI is first used to label a portion of the flows belonging to predefined applications. Flows of new applications are then classified using models trained on a priori known datasets with the Laplacian Support Vector Machine (SVM). This methodology categorizes known and unknown applications into a set of QoS classes. The reported results show that the proposed system has an accuracy of over 90%.
Flow-aware classification aims to segregate the network traffic into mice and elephant flows. Elephant flows transfer large volumes of data across the network, whereas mice flows are short-lived and usually delay intolerant. Glick et al. [5] focused on scheduling flows in a hybrid data center. ML techniques are employed at the edge of the network to perform elephant flow-aware traffic classification. This classification is used by the SDN controller to implement an efficient traffic flow optimization algorithm. Xiao et al. [20] used a two-stage, cost-effective strategy for the identification of elephant flows. First, the head packet is used to identify candidate elephant flows. Second, a decision tree is employed to determine whether the candidate flows are genuine elephant flows. Amaral et al. [2] deployed an OpenFlow-based SDN system in an enterprise network to collect traffic data. After the data are collected, several classifier algorithms are used to classify traffic flows into various applications. Li and Li [9] used a MultiClassifier to classify applications by combining ML-based and DPI-based classifiers. First, the ML-based classifier is applied to every newly arriving flow. Its result is accepted as the MultiClassifier output only if the reliability of the ML-based classifier is larger than a threshold value. Otherwise, a more accurate classifier such as DPI is used, and if DPI does not return “unknown,” its result is selected. Rossi and Valenti [18] classify applications running over the User Datagram Protocol (UDP). Their work presents an application-aware behavioral classification engine. Based on the counts of received packets and bytes, UDP-based traffic is classified with the help of the SVM algorithm. The SVM-based classification has an accuracy of over \(90\%\). Qazi et al. [16] propose a framework called Atlas, which is used to classify mobile applications. A crowdsourcing approach is used to collect ground truth data. The data collected from the end devices are used to train a decision tree, and the trained model helps to identify traffic flows belonging to mobile applications. The classification accuracy for the top 40 Google Play applications is over \(94\%\). Nakao and Du [13] identified mobile applications using deep Neural Networks (NNs). The data were collected from an experimental network. In the eight-layer deep NN model, five flow features are selected (packet size, TTL, destination port, destination address, and protocol type). The results demonstrate an accuracy of \(93.5\%\) for about 200 mobile applications.
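The ML-then-DPI fallback described for the MultiClassifier can be sketched as follows. This is only a minimal illustration of the decision logic as summarized above: the function names, the stub classifiers, the threshold of 0.8, and the behavior when DPI returns "unknown" (falling back to the ML label) are assumptions, not details taken from Li and Li [9].

```python
def multi_classify(flow, ml_classifier, dpi_classifier, threshold=0.8):
    """Return (label, source) for a flow using ML first and DPI as fallback.

    ml_classifier(flow)  -> (label, reliability in [0, 1])
    dpi_classifier(flow) -> label, or "unknown" when no signature matches
    """
    label, reliability = ml_classifier(flow)
    if reliability > threshold:
        return label, "ml"            # ML result is reliable enough
    dpi_label = dpi_classifier(flow)
    if dpi_label != "unknown":
        return dpi_label, "dpi"       # DPI recognized the application
    # Assumption: when DPI cannot decide either, keep the ML guess.
    return label, "ml-fallback"


# Hypothetical stub classifiers, used only to exercise the logic above.
def toy_ml(flow):
    if flow["dst_port"] == 443:
        return "web", 0.95            # confident prediction
    return "web", 0.40                # low-reliability guess


def toy_dpi(flow):
    if flow.get("payload", b"").startswith(b"SSH-"):
        return "ssh"
    return "unknown"


result_web = multi_classify({"dst_port": 443}, toy_ml, toy_dpi)
result_ssh = multi_classify({"dst_port": 22, "payload": b"SSH-2.0"}, toy_ml, toy_dpi)
result_other = multi_classify({"dst_port": 9999}, toy_ml, toy_dpi)
```

The design keeps the expensive DPI step off the common path: it runs only for flows on which the ML classifier is not sufficiently reliable.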
3 ML-based Data Organization for UWSN
UWSNs consist of sensor nodes and vehicles deployed underwater to monitor underwater conditions such as temperature and pressure. A UWSN is also known as an underwater acoustic network or underwater communication network. UWSNs remain challenging because of limited battery power and bandwidth and the need for dense sensor deployment. Important applications of UWSNs include oceanographic data collection, pollution and environment monitoring, disaster prevention, and scientific exploration of the underwater environment. These applications depend on the data collected and transmitted in the UWSN and help predict disasters such as floods, hurricanes, earthquakes, tsunamis, tornadoes, and volcanic eruptions. Data agglomeration is a process that can be used to address problems related to the collection and storage of data, and it may serve as a supplement to the routing process. UWSNs pose various functional challenges that have been addressed so far using ML techniques: Clustering and Data Agglomeration, Event Detection and Query Processing, Routing, and Localization and Object Tracking.
Data Agglomeration is an iterative clustering method. Initially, every data point forms a cluster of its own; the two nearest clusters are then joined to form a single cluster, and the process is repeated until the desired number of clusters is obtained. It is a bottom-up technique that works from the differences between the objects to be grouped. Data agglomeration can be carried out using Principal Component Analysis and the Self-Organizing Map technique. The \(K\)-means algorithm is a popular ML clustering algorithm for collaborative data processing in Clustering and Data Agglomeration. Because clustering is the network property of utmost concern, Neural Networks can be used for large-scale network clustering.
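The bottom-up merging procedure can be sketched in a few lines. The example below is a minimal illustration that assumes single-linkage (nearest-member) distance between clusters and uses hypothetical (temperature, pressure) readings; neither choice is specified in the text.

```python
import math


def agglomerate(points, n_clusters):
    """Bottom-up clustering: start with singletons, merge the two nearest clusters."""
    clusters = [[p] for p in points]  # every point starts as its own cluster

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the two nearest clusters
        del clusters[j]
    return clusters


# Hypothetical sensor readings: (temperature, pressure)
readings = [(10.0, 1.0), (10.3, 1.1), (15.0, 3.0), (15.1, 3.2), (20.0, 5.0)]
groups = agglomerate(readings, n_clusters=3)
```

With these readings, the two pairs of nearby points are merged first and the isolated reading remains a cluster of its own. The pairwise search is quadratic per merge, which is acceptable for a sketch but not for large networks.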
Event Detection and Query Processing are among the most important functional challenges of UWSN. Many related methods exist, including Event Recognition, Forest Fire Detection, Query Processing, Distributed Event Detection, and Query Optimization. A Bayesian algorithm is used for Event Recognition, \(K\)-Nearest Neighbor for Query Processing, a Neural Network for Forest Fire Detection, and Principal Component Analysis for Query Optimization [11].
Designing routing protocols for UWSN is quite challenging because of multiple characteristics that differentiate them from wireless infrastructure-less networks. Design challenges arise from constraints on bandwidth, energy, processing, and storage, so features such as energy efficiency, data transmission models, and sensor location are essential for UWSN. Self-Organizing Maps and Reinforcement Learning are used for data routing and routing enhancement in UWSN.
In object detection, an algorithm may find multiple detections of the same object. In UWSN, the object is first localized using SVM and a decision tree. Object detection and localization in UWSN is also a popular application of Convolutional Neural Networks (CNNs).
UWSNs also pose various non-functional challenges that have been addressed so far using ML techniques: Security and Anomaly Intrusion Detection, QoS, Data Integrity and Fault Detection, and Varied Applications.
Anomaly-based network intrusion detection helps protect networks against malicious activities. Outliers are extreme values that deviate from other observations in the data, and three algorithms can be used to detect them: Bayesian Belief Network, \(K\)-Nearest Neighbor, and SVM.
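As a small illustration of the \(K\)-Nearest Neighbor option, the sketch below scores each reading by its mean distance to its \(k\) nearest neighbors and flags the reading with the largest score. The scoring rule, the value \(k = 2\), and the sample readings are assumptions made for this example.

```python
def knn_outlier_scores(values, k=2):
    """Score each value by the mean distance to its k nearest neighbors."""
    scores = []
    for i, v in enumerate(values):
        # Distances from this value to every other value, smallest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical sensor readings with one extreme value.
readings = [10.1, 10.3, 9.9, 10.0, 25.0, 10.2]
scores = knn_outlier_scores(readings)
outlier_index = max(range(len(readings)), key=lambda i: scores[i])
```

Here the reading 25.0 lies far from all of its neighbors, so it receives the highest score and is reported as the outlier.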
In UWSN, random occurrences of faulty nodes degrade the QoS of the network, so an efficient fault detection scheme is needed to manage a large-scale UWSN. QoS refers to a set of technologies that work on a network to guarantee its ability to dependably run high-priority applications, and Neural Networks can be used to estimate it as well as to predict the accuracy and reliability of the sensor network. Air quality observation and intelligent lighting control are popular varied applications addressed using Neural Networks [12].
3.1 Illustrative Example
Consider a cluster of four machines in which nodes \(1\text{--}3\) are the WorkerNodes and node 4 is the MasterNode, which is implicit. Some tasks in the system are shown with a different notation, such as the tasks scheduled at nodes 2 and 3 after time \(3t\); these are the tasks that are yet to be completed. In the notation \(t++t\), the first \(t\) indicates that the task has already executed for time \(t\), and \(++t\) is the estimated time left for the completion of this task. Let us suppose that normal tasks take a time of \(t\) or \(2t\). There are a total of nine map tasks in the job. At \(4t\) in the diagram, all the tasks are scheduled; at this instant, we profile the nodes by the number of map tasks each has completed in the job. Node 1 has performed the majority of the computation, completing four tasks; node 2 has completed two tasks with another task currently being processed on it; and node 3 has completed one task with another task currently being processed on it. Since node 1 is free after \(4t\), it notifies the Jobtracker via heartbeat. Because all the tasks are already scheduled at \(4t\), node 1 becomes a good candidate for scheduling speculated tasks. We now check the remaining time of the tasks that are yet to be completed, i.e., those at nodes 2 and 3. Based on the data processed and the data left unprocessed, let the remaining time to complete the tasks at nodes 2 and 3 be \(t\) and \(3t\), respectively. The task at node 3 is the most suitable to be executed speculatively because it has the largest remaining time, and its backup time is \(2t\). Since the backup time of \(2t\) is less than the estimated remaining time of \(3t\), this task is speculatively executed at node 1 and hence completes within a time of \(2t\). After the task is completed at node 1, node 1 informs the Jobtracker of its completion, and the copy still running at node 3 is killed automatically.
Hence, in this scenario, speculation saves a time of \(t\) in the completion time of the job. Thus, performance improves with this controlled speculation.
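The decision made in this example can be written as a short sketch. The function name and data layout are hypothetical; times are expressed in units of \(t\), and the rule is the one used above: pick the running task with the largest estimated remaining time and speculate only if the backup time is smaller.

```python
def pick_speculative_task(remaining, backup_time):
    """Choose the task to speculate, if speculation pays off.

    remaining   -- dict mapping node id -> estimated remaining time (units of t)
    backup_time -- time needed to re-run the task on the free node (units of t)
    Returns (node, time_saved), or None when speculation would not help.
    """
    node = max(remaining, key=remaining.get)  # task with the largest remaining time
    if backup_time < remaining[node]:
        return node, remaining[node] - backup_time
    return None


# Values from the example: nodes 2 and 3 have t and 3t left; the backup takes 2t.
choice = pick_speculative_task({2: 1, 3: 3}, backup_time=2)
no_gain = pick_speculative_task({2: 1, 3: 2}, backup_time=2)
```

For the values in the example, the task at node 3 is selected and the saving is \(3t - 2t = t\); when the backup time is not smaller than the remaining time, no task is speculated.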
6 Conclusion
In this article, ML algorithms have been applied to the traffic classification of SDN networks for making informed decisions about underlying applications and their QoS requirements. Three ML classifiers, namely, FFNN, BN, and LR, were used alongside a hybrid NN-PSO to normalize datasets for classification purposes as well as to improve the efficacy of the training and testing datasets collected from open source sites. The accuracy of the traffic classification has been evaluated using the ML algorithms. Additionally, the implementation of NN-PSO enhances the accuracy of the traffic classification with the same classifiers. The proposed method is promising because it does not impose any processing overhead. Even though UWSNs have improved greatly in the past few years, there is still substantial room for improvement, especially in implementing systems on a large scale. In future work, researchers can offer better solutions for node mobility in scenarios with a large monitoring area (with a high neighborhood range) to investigate the effect on network connectivity, coverage, energy consumption, and network lifetime. To increase the efficiency of UWSNs and improve their performance, prospective research should focus on implementing cooperative control among a few underwater vehicles. In future work, the detection of flows from newer applications that are not yet part of the trained classifier will be explored, with implementations on a varied set of platforms (Windows, iOS, Linux).