1 Introduction

Big data, as the name implies, refers to amounts of information so large that database systems cannot store, compute, process, and analyze them into interpretable results within a reasonable time. Buried in these huge volumes of data is a wealth of information, such as correlations, unexplained patterns, and market trends, holding unprecedented knowledge and applications waiting to be discovered. However, traditional methods fail because the data volume is too large and it arrives too fast. This drives the continued development of a new generation of data storage devices and technologies, in the hope of extracting that valuable information from big data. Deep learning is the key to unlocking the big data era [1].

Deep learning is a sub-area of machine learning that simulates the human brain to analyze, interpret, and manipulate text, images [2], and voice [3]. It builds learning models with multiple hidden layers and trains them on massive data through supervised or unsupervised learning. After training, a model automatically learns features and improves classification or prediction accuracy. Deep learning models have two main advantages. The first is the ability to extract features: the resulting feature representation is more faithful to the original data, which greatly facilitates classification and visualization. The second is the ability to combine features: because deep learning uses complex network structures with many non-linear components, its feature-combination ability is very strong.

At the same time, the evolution of big data has spurred upgrades to hardware and software systems. Distributed architectures remove performance bottlenecks from algorithms, while parallel frameworks and training methods make deep learning faster and more efficient. Deep learning has transformed the mindset of problem solving and is the best partner for big data [4]. This study describes three typical deep learning models and their applications.

2 The Main Model of Deep Learning

Although the concept of deep learning [5] was introduced only recently, its algorithmic models have produced rich research achievements and have performed well in an era of rapid data growth. Deep learning algorithms often involve large-scale hidden layers and millions of parameters, allowing them to process vast amounts of data and fit complex models. This section presents three deep models: the Multilayer Perceptron, Convolutional Neural Networks, and Recurrent Neural Networks, along with algorithmic and model improvements for working with large-scale data.

2.1 Multilayer Perceptron

The multilayer perceptron [6, 7] is a feedforward artificial neural network model containing multiple neurons arranged in multiple layers. Nodes in adjacent layers are connected by edges whose weights are initialized randomly, usually with random numbers in [−0.5, 0.5]. The principle is to map a set of input vectors to a set of output vectors. The multilayer perceptron is a powerful tool in statistical analysis, pattern recognition, and optical character recognition (Fig. 1).

Fig. 1. Multilayer perceptron model

At every layer of the network, the activation of each neuron is computed as the sum, over all neurons in the previous layer connected to it, of the product of each neuron's output and the corresponding weight. An activation function is then used to normalize each neuron's output. For a K-layer multilayer perceptron, the mapping is expressed in matrix form as follows:

$$ y(x) = f_{K} ( \ldots f_{2} (w_{2}^{T} f_{1} (w_{1}^{T} x + b_{1} ) + b_{2} ) \ldots + b_{K - 1} ) $$
(1)
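For concreteness, the forward mapping of Eq. (1) can be sketched in a few lines of NumPy; the layer sizes and the tanh/identity activations below are illustrative assumptions, while the uniform initialization in [−0.5, 0.5] follows the text.

```python
import numpy as np

def mlp_forward(x, weights, biases, activations):
    """Forward pass of Eq. (1): y(x) = f_K(... f_2(w_2^T f_1(w_1^T x + b_1) + b_2) ...)."""
    a = x
    for w, b, f in zip(weights, biases, activations):
        a = f(w.T @ a + b)  # affine transform, then activation
    return a

# Toy 2-layer example; weights drawn uniformly from [-0.5, 0.5] as in the text
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.5, 0.5, (4, 3)), rng.uniform(-0.5, 0.5, (3, 2))]
biases = [np.zeros(3), np.zeros(2)]
y = mlp_forward(rng.standard_normal(4), weights, biases, [np.tanh, lambda z: z])
```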

The backpropagation algorithm uses the delta rule to compute a local gradient for each neuron, working backwards from the output layer to the input layer. First, the error of each output neuron is obtained:

$$ e_{j} (n) = d_{j} (n) - o_{j} (n) $$
(2)

The local gradient of the \( j \)-th neuron in the output layer \( L \) is then calculated:

$$ \delta_{j}^{(L)} (n) = e_{j}^{(L)} (n)\,f^{\prime} (u_{j}^{(L)} (n)) $$
(3)

Finally, the local gradient of the \( j \)-th neuron in each hidden layer \( l \) is calculated:

$$ \delta_{j}^{(l)} (n) = f^{\prime} (u_{j}^{(l)} (n))\sum\nolimits_{k} {\delta_{k}^{(l + 1)} (n)\,w_{kj}^{(l + 1)} (n)} $$
(4)

After obtaining the deltas for all neurons, the weights are adjusted according to the following formula:

$$ w_{ij}^{(l)} (n + 1) = w_{ij}^{(l)} (n) + \alpha \,[w_{ij}^{(l)} (n) - w_{ij}^{(l)} (n - 1)] + \eta \,\delta_{j}^{(l)} (n)\,y_{i}^{(l - 1)} (n) $$
(5)

For layer \( l \), the new weight is the current weight plus a momentum term with coefficient \( \alpha \), and the learning rate \( \eta \) multiplied by layer \( l \)'s delta and the output of the neurons of the previous layer \( l - 1 \). The delta rule implements an approximation of gradient descent on the error sum. With a sufficiently small learning rate, the delta rule will find a set of weights that minimizes the error.
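A minimal NumPy sketch of one delta-rule update for an output layer, following Eqs. (2)-(5); the sigmoid activation (whose derivative at the output \( o \) is \( o(1 - o) \)) and the default values of α and η are assumptions for illustration.

```python
import numpy as np

def delta_rule_update(o, d, w, w_prev, y_prev, alpha=0.9, eta=0.1):
    """One weight update for an output layer, Eqs. (2)-(5).

    o: layer outputs o_j(n); d: targets d_j(n); w, w_prev: weights at
    steps n and n-1; y_prev: outputs y_i^(l-1)(n) of the previous layer.
    """
    e = d - o                          # Eq. (2): output error
    delta = e * o * (1.0 - o)          # Eq. (3), assuming a sigmoid activation
    momentum = alpha * (w - w_prev)    # momentum term of Eq. (5)
    w_new = w + momentum + eta * np.outer(y_prev, delta)
    return w_new, delta
```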

Research [8] shows that the multilayer perceptron neural network has the ability of parallel processing and self-learning. When the network has more than two hidden nodes, it can approximate nonlinear functions with arbitrary precision. Zen et al. [9] proposed a speech synthesis model based on multilayer perceptrons using the algorithm in [10]. The model represents the input text as an input feature sequence; each frame of the sequence is mapped through multiple perceptron layers to the corresponding output features to generate speech parameters, from which speech is finally synthesized with a vocoder. The training data consists of 33,000 segments of US English speech recorded by a female professional speaker. In tests on this large dataset, the model outperforms the Hidden Markov Model method.

Large-scale data inevitably leads to slow training. To alleviate this problem, Cheng et al. [11] proposed Learning-NEAT (LNEAT), a network training method for large-scale data classification problems that simplifies network evolution by splitting a problem into several subtasks. The subtasks are learned by applying Back Propagation rules within the NEAT algorithm. LNEAT combines the advantages of the NEAT algorithm and the BP algorithm in topology and weight search, and overcomes the problems caused by using the NEAT algorithm alone. The LNEAT algorithm has achieved satisfactory results in speech recognition and greatly improves the speed of network training.

2.2 Convolutional Neural Networks

The convolutional neural network [12] is a deep learning method derived from artificial neural networks. In recent years it has achieved great success in the field of image recognition. By using local connections and weight sharing, a convolutional neural network retains a deep network structure while greatly reducing the number of network parameters, so the model generalizes well and is easier to train. A convolutional neural network is mainly composed of four parts: convolutional layers, pooling layers, fully connected layers, and a loss function. Through the cascading of these layers, feature extraction and feature combination of image features are realized (Fig. 2).

Fig. 2. Convolutional neural network model

The convolutional layer is used for feature extraction; usually, multiple convolutional layers are stacked to obtain deeper feature maps. Low-level convolutional layers perform edge detection on the image, and higher-level convolutional layers take those feature maps as input for further feature extraction. Each convolutional layer contains a number of convolution kernels, with different kernels extracting different features. The general expression is:

$$ x_{j}^{l} = f\left(\sum\limits_{i \in M_{j} } {x_{i}^{l - 1} *k_{ij}^{l} } + b_{j}^{l} \right) $$
(6)
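A naive NumPy sketch of Eq. (6): the output map is the activated sum of valid 2-D cross-correlations of the selected input maps \( M_j \) with their kernels, plus a bias. The tanh activation is an assumption, and real implementations use vectorized or GPU kernels rather than these explicit loops.

```python
import numpy as np

def conv_map(x_maps, kernels, b, f=np.tanh):
    """Eq. (6): x_j^l = f(sum_{i in M_j} x_i^(l-1) * k_ij^l + b_j^l)."""
    kh, kw = kernels[0].shape
    oh = x_maps[0].shape[0] - kh + 1   # "valid" output height
    ow = x_maps[0].shape[1] - kw + 1   # "valid" output width
    out = np.full((oh, ow), float(b))
    for x, k in zip(x_maps, kernels):  # sum over the input maps in M_j
        for r in range(oh):
            for c in range(ow):
                out[r, c] += np.sum(x[r:r + kh, c:c + kw] * k)
    return f(out)
```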

The activation function \( f \) introduces non-linear factors to enhance the network's expressive ability. Commonly used activation functions are as follows:

$$ Sigmoid(x) = \frac{1}{1 + e^{ - x} } $$
(7)
$$ TanH(x) = \frac{e^{x} - e^{ - x} }{e^{x} + e^{ - x} } $$
(8)
$$ \mathrm{ReLU}(x) = \max (0,x) $$
(9)
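These three activation functions translate directly into NumPy (Eqs. (7)-(9), with the conventional negative exponent in the sigmoid):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                              # Eq. (7)

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))   # Eq. (8)

def relu(x):
    return np.maximum(0.0, x)                                    # Eq. (9)
```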

After features are obtained by the convolutional layer, they are aggregated by the pooling layer [13], which combines the features at different locations. Common pooling methods are max pooling and average pooling:

$$ O_{i,j,k} = \mathop {MAX}\limits_{0 \le x \le m,\,0 \le y \le m} (I_{im + x,jm + y,k} ) $$
(10)
$$ O_{i,j,k} = \mathop {AVE}\limits_{0 \le x \le m,\,0 \le y \le m} (I_{im + x,jm + y,k} ) $$
(11)
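Both pooling operators of Eqs. (10)-(11) can be sketched in NumPy by reshaping the input into non-overlapping \( m \times m \) blocks; the single-channel input and the non-overlapping stride are simplifying assumptions.

```python
import numpy as np

def pool2d(I, m, mode="max"):
    """Eqs. (10)-(11): aggregate each m-by-m window of I by max or average."""
    h, w = I.shape[0] // m, I.shape[1] // m
    blocks = I[:h * m, :w * m].reshape(h, m, w, m)   # non-overlapping windows
    agg = np.max if mode == "max" else np.mean
    return agg(agg(blocks, axis=3), axis=1)          # reduce within each window
```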

The fully connected layer acts as a "classifier" in the convolutional neural network [14]. If the convolutional, pooling, and activation layers map the original data to a hidden feature space, then the fully connected layer maps the learned "distributed feature representation" to the sample label space. In practice, a fully connected layer can be implemented by a convolution operation: a fully connected layer whose preceding layer is also fully connected can be transformed into a convolution with a \( 1 \times 1 \) kernel, while a fully connected layer whose preceding layer is convolutional can be transformed into a global convolution with an \( h \times w \) kernel, where \( h \) and \( w \) are respectively the height and width of the previous layer's convolution output. The fully connected layer is usually located at the top of the network.

The loss function continually compares the network output with the target and uses the backpropagation algorithm to adjust the parameters of the whole network, continuously optimizing the network structure so that the network develops in the expected direction. The Euclidean distance loss function [15] and the softmax loss function are commonly used. The Euclidean distance loss function is a basic loss function that aims to reduce the Euclidean distance between the system output and a given label. The softmax objective function can be written as:

$$ J(\theta ) = - \frac{1}{m}\left[\sum\limits_{i = 1}^{m} {(1 - y^{(i)} )\log (1 - h_{\theta } (x^{(i)} )) + y^{(i)} \log h_{\theta } (x^{(i)} )} \right] $$
(12)
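A direct NumPy transcription of the objective in Eq. (12) for binary labels (the clipping of predictions is an implementation safeguard, not part of the formula):

```python
import numpy as np

def cross_entropy(h, y, eps=1e-12):
    """Eq. (12): J = -(1/m) * sum[(1-y)log(1-h) + y log h] over m examples."""
    h = np.clip(h, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean((1.0 - y) * np.log(1.0 - h) + y * np.log(h))
```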

Using convolutional neural networks on large-scale data requires multi-core GPUs, and the number of threads required for training depends on the size of the chosen filters. Researchers at Microsoft Research Asia [16] used a network with a depth of up to 100 layers in the ImageNet challenge, winning with an error rate as low as 3.57%. This network has more than five times as many layers as any neural network successfully used before, and it achieved good results on large-scale images.

When the space required on the GPU exceeds the available memory, data must be copied to CPU memory, but the transfer rate between GPU and CPU is relatively slow. Satish et al. [17] modeled data transfer as integer linear programs and used improved simulated annealing / mixed integer linear programming algorithms to significantly reduce data transfer between the GPU and CPU. Compared with the non-optimized method, a 30-fold reduction in data traffic was obtained, providing important support for convolutional neural networks processing large-scale data and for parallel computing.

2.3 Recurrent Neural Networks

Recurrent neural networks (RNN) are neural networks with fixed weights [18], external inputs, and internal states; they can be viewed as dynamical systems over the internal state, with the weights and external inputs as parameters. Depending on whether the basic variable is the neuron state or the local field state, they can be divided into static neural network models and local field neural network models; depending on how signals are processed, they can be divided into continuous-time and discrete-time systems.

Figure 3 shows an example of a fully unrolled RNN [19]. \( x_{t} \) is the input at step \( t \); \( s_{t} \) is the state of the hidden layer at step \( t \), which serves as the memory of the network. \( s_{t} \) is calculated from the current input and the hidden state of the previous step:

Fig. 3. Unrolled recurrent neural network model

$$ s_{t} = f(Ux_{t} + Ws_{t - 1} ) $$
(13)

\( f \) is generally a non-linear activation function such as \( \tanh \) or \( \mathrm{ReLU} \). \( s_{ - 1} \), the hidden state before the first input, is needed to calculate \( s_{0} \); since it does not exist, it is usually set to \( 0 \) in implementations. \( o_{t} \) is the output of step \( t \), expressed in vector form as

$$ o_{t} = \text{softmax} (Vs_{t} ) $$
(14)

The recurrent neural network introduces a ring structure, so the output at a given time depends not only on the current input but also on the state at the previous moment. Through weight sharing it can handle variable-length sequences, which enhances its robustness.
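The recurrence of Eqs. (13)-(14) can be sketched as follows; tanh is used for \( f \) and \( s_{-1} \) is set to zero, as noted above, while the matrix shapes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))           # shift for numerical stability
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """Eqs. (13)-(14): s_t = tanh(U x_t + W s_{t-1}); o_t = softmax(V s_t)."""
    s = np.zeros(W.shape[0])            # s_{-1} initialized to zero
    outputs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)      # Eq. (13): new hidden state
        outputs.append(softmax(V @ s))  # Eq. (14): output at step t
    return outputs, s
```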

However, the recurrent neural network is more complicated than the feedforward neural network, with greater computational cost and slower decoding, which limits its application in tasks with high real-time requirements and large amounts of data. To accelerate computation, Zhang et al. [20] proposed a frame-skipping method that reduces computational overhead by regularly dropping overlapping frames, directly reducing the number of frames the neural network must evaluate. It can be applied directly to recurrent neural network models by adding the necessary cross-state transitions to the Hidden Markov Model, while the network structure itself does not change. The method obtains a 2-4x speedup with little loss of accuracy.
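The core idea of frame skipping can be conveyed by a short sketch: the network is evaluated only on every \( k \)-th frame and its output is reused for the skipped frames. This illustrates the idea only; the HMM cross-state transitions used in [20] are omitted.

```python
def frame_skipping(frames, net_step, k=2):
    """Evaluate net_step on every k-th frame and reuse its output in between
    (illustrative sketch of the frame-skipping idea in [20])."""
    outputs, cached = [], None
    for t, frame in enumerate(frames):
        if t % k == 0 or cached is None:
            cached = net_step(frame)   # full network evaluation
        outputs.append(cached)         # skipped frames reuse the cached output
    return outputs
```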

Yosuke et al. [21] proposed a performance model for a distributed Deep Neural Network (DNN) training system called SPRINT (Fig. 4), which takes the DNN architecture and machine specifications as input parameters and accounts for the probability distributions of mini-batch size and gradient staleness that are core parameters of asynchronous SGD (ASGD) training. Using asynchronous GPU processing based on mini-batch SGD, the model estimated the time to process an entire dataset on supercomputers with thousands of GPUs with average errors of 5%, 9%, and 19%. This represents progress in handling large-scale data at speed.

Fig. 4. Timeline of ASGD training [21]

3 Deep Learning Applications in Big Data

This section introduces some applications of deep learning on large-scale data. The achievements of integrating big data and deep learning in engineering applications are mainly reflected in intelligent voice systems and machine vision. As research progresses and technology matures, it is also beginning to show advantages in the following areas.

3.1 Multi-function Network

Although the combination of deep learning and big data already offers a lot of flexibility, a trained network can still only solve one problem at a time. For example, a network is trained either to recognize images or to recognize speech; it cannot do both at once. There is not yet a network that recognizes objects both visually and aurally. Even with multi-task learning techniques, where a network can identify contours, gestures, shadows, text, and more while recognizing image categories, today's deep neural networks remain very limited compared with our versatile human brain.

At present, if an application needs different capabilities, multiple networks must be combined, which not only consumes enormous computing resources but also makes it difficult to form effective interactions between the networks. As for how to enable networks to achieve multiple goals at once, the current inspiration from the human brain is that it may be possible to connect networks responsible for different functions into a larger network. Noam et al. [22] introduced a sparsely gated mixture-of-experts layer (MoE) consisting of up to thousands of feedforward subnetworks, with a trainable gating network that determines a sparse combination of these experts; they then presented model architectures in which an MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than the state of the art at lower computational cost.
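A much-simplified sketch of the sparsely gated MoE idea: a gating network scores the experts, only the top-\( k \) are evaluated, and their outputs are mixed by the renormalized gate weights. The noisy gating and load-balancing terms of [22] are omitted here.

```python
import numpy as np

def moe_layer(x, experts, Wg, k=2):
    """Sparsely gated mixture of experts (simplified from [22]).

    x: input vector; experts: list of callables (the feedforward subnets);
    Wg: gating weight matrix. Only the top-k experts are evaluated.
    """
    logits = Wg @ x
    top = np.argsort(logits)[-k:]                  # indices of the k best experts
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                             # softmax over selected experts
    return sum(g * experts[i](x) for g, i in zip(gate, top))
```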

3.2 Medical System

Medical institutions are data-intensive industries, whether in pathology reports, treatment plans, or drug reports. Since many viruses and tumor cells are constantly evolving, diagnosing a disease and determining a treatment plan can be very difficult. Deep learning models can be built around these pain points in the medical field, using big data analysis to solve medical problems: for example, behavior analysis, visual query and analysis, multidimensional analysis, retention analysis, funnel analysis, and return-visit analysis can be combined to build deep learning models that analyze a patient's various problems in depth.

Google's DeepMind and IBM's Watson are active in this area; Watson in particular has exceeded human experts in diagnostic accuracy in some specific areas. Because the majority of medical cases are unstructured textual data, deep belief networks stacked from multiple restricted Boltzmann machines can automatically extract features from textual cases, effectively learn the knowledge in medical records, and support efficient diagnosis.

3.3 Translation

Real-time machine translation [23, 24] is one of the most promising directions for applying deep learning to big data. From the original rule-based machine translation built on hand-compiled rules, to later statistical machine translation (SMT), to today's neural machine translation (NMT), translation technology has been continuously updated over the past six decades. In particular, since deep learning technology entered the spotlight around 2012, machine translation accuracy records have been constantly renewed. Translation technology based on deep learning adopts an end-to-end structure: it requires no hand-crafted features, its network structure design is simple, and it needs no word segmentation, alignment, syntax tree design, or other complex engineering, making it very suitable for working with large amounts of data.

Dzmitry and Yoshua [25] observed that mapping a source-language sentence into a fixed-length hidden vector through an encoder is the bottleneck limiting NMT translation quality: when the decoder generates the target-language sentence, each output word actually relates to only part of the input sentence. Based on this, they proposed an "attention mechanism" that lets the model automatically find the key word-to-word correspondences between the source sentence and the target sentence, without restricting the length of the hidden vector, and gave a content-based method for computing attention. The essence is to use a bidirectional Long Short-Term Memory (LSTM) network to learn the importance of each word in the source sentence. This method is very effective and can also be applied to deep learning in many other large-scale-data fields, such as question answering, sentence-level inference over large corpora, entity extraction, and long document generation (Fig. 5).

Fig. 5. Graphical illustration of the proposed model generating the \( t \)-th target word \( y_{t} \) given a source sentence \( (x_{1} ,x_{2} , \ldots ,x_{T} ) \) [25]
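The content-based attention computation can be sketched as follows: each encoder state is scored against the previous decoder state, the scores are normalized by a softmax, and the context vector is the resulting weighted sum. The additive scoring form follows [25], but the parameter shapes are assumptions and the surrounding encoder/decoder is omitted.

```python
import numpy as np

def attention_context(s_prev, h_enc, Wa, Ua, va):
    """Attention of [25]: e_j = va . tanh(Wa s_prev + Ua h_j);
    alpha = softmax(e); context = sum_j alpha_j h_j."""
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in h_enc])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # attention weights over source words
    return alpha @ np.stack(h_enc), alpha     # context vector and weights
```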

3.4 Playing Strategic Games

While people were still shocked by the impact of AlphaGo, the DeepMind team brought a new surprise without pause. On October 18 (London time), DeepMind [26] announced the most powerful version of AlphaGo yet, code-named AlphaGo Zero, which trains by reinforcement learning through self-play. After several days of training and almost 5 million games of self-play, AlphaGo Zero was able to surpass humans and defeat all previous versions of AlphaGo. Its neural network takes the board position \( s \) as input and outputs a vector of move probabilities \( p \) with components \( p_{a} = \Pr (a|s) \) for each action \( a \), together with a scalar value \( v \approx E[z|s] \) estimating the expected outcome \( z \) from position \( s \). AlphaGo Zero learns these winning probabilities entirely from self-play, and the results are then used to guide the program's search.
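Schematically, the dual-head output described above can be written as below; `forward` stands for the (unspecified) network body, and the softmax/tanh heads are a plausible reading of the text rather than DeepMind's exact implementation.

```python
import numpy as np

def policy_value(s, forward):
    """AlphaGo Zero-style heads (schematic): position s -> (p, v), where
    p_a = Pr(a|s) over actions and v estimates the expected outcome E[z|s]."""
    logits, v_raw = forward(s)        # assumed network body returning two heads
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax over moves
    return p, float(np.tanh(v_raw))   # value squashed into [-1, 1]
```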

The DeepMind team said in its official blog that Zero combines an updated neural network with a reorganized search algorithm. As training deepened, the performance of the system improved little by little; through the powerful combination of neural network and search algorithm, the results obtained from self-play kept getting better while the neural network became more and more accurate.

4 Opportunities and Challenges

Current deep learning algorithms and big data technologies perform far below their theoretically achievable performance. Unsupervised learning can address real-world recognition of very large object sets: by training directly on unlabeled data without the intervention of hand-built models, it can learn the rules, patterns, and features of large-scale data and uncover knowledge that the human brain cannot directly extract, which humans can then use to solve practical problems. Fully tapping the hidden value in big data can serve human life.

Although existing data volumes are already large, they are still not enough: the complexity, dimensionality, and diversity of common datasets do not cover all possible boundary conditions in the real world. In existing distributed systems, large amounts of data and parameters must be transmitted between nodes, and the communication cost is high; once the number of nodes exceeds a certain point, further speedup cannot be obtained. Designing such distributed systems requires DNN algorithm experts and systems experts to work together: the solution may require modifying algorithms to match the underlying hardware architecture, and it also requires systems experts to design powerful single-node machines as well as high-density, efficiently communicating servers. Second, deep learning models on big data involve very large data and computation volumes, often requiring weeks or even months of training; parallel training is necessary to improve training speed, but how to coordinate and synchronize different data across multiple nodes may need to be redesigned from an algorithmic perspective.

5 Conclusion

This study introduced three models of deep learning along with the challenges and applications of each model in big data environments. In this era of massive data growth, the first issue in handling vast amounts of data is how to effectively analyze and process it and mine its value. Deep learning methods play a key role in processing big data by adaptively extracting internal representations from the data, minimizing human involvement, and providing greater generalization. If artificial intelligence is likened to a rocket, then deep learning is the rocket engine and big data is the rocket fuel; only together can they launch the rocket into space. In the context of big data, as deep learning research continues to deepen, the efficient combination of the two will surely make computers more intelligent, assisting human decision-making and bringing benefits to mankind.