A novel directional object detection method for piled objects using a hybrid region-based convolutional neural network

https://doi.org/10.1016/j.aei.2021.101448

Abstract

Digital transformation is an information technology (IT) process that integrates digital information with operating processes. Its introduction to the workplace can promote the development of progressively efficient manufacturing processes, accelerating competition in terms of speed and production capacity. Equipment combined with computer vision has begun to replace manpower in certain industries, including manufacturing. However, current object detection methods are unable to identify the actual rotation angle of a specific grasping target when objects are piled. Hence, this study proposes a framework based on deep learning that integrates two object detection models. Faster R-CNN (region-based convolutional neural network) is utilized to search for the direction reference point of the target, and Mask R-CNN is adopted to obtain the segmentation, which not only forms the basis of an area filter but also generates a rotated bounding box via the minAreaRect function. After integrating the output of the two models, the location and actual rotation angle of the target can be obtained. The purpose of this research is to provide a robot arm with the position and angle information of the topmost object for grasping. An empirical dataset of piled footwear insoles was employed to test the proposed method during the assembly process. Results show that the detection accuracy reached 96.26%. Implementing the proposed method in the manufacturing process not only saves the manpower otherwise required to sort products but also reduces process time, thereby enlarging production capacity. The proposed method can serve as part of a smart manufacturing system to enhance an enterprise’s competitiveness in the future.

Introduction

Paralleling the advance of technology, many factories have gradually introduced automation equipment to minimize the burdens of production, reduce human error, increase production capacity, and lower costs. Among these automations, the use of robotic arms to grasp specific objects presents a classic problem associated with the object detection task in computer vision. As it happens, piled and irregularly arranged objects are very common in real industrial scenarios. For example, certain processes utilize injection molding machines to manufacture products with identical specifications. These products are often stacked at slightly different angles when placed onto the collection platform. Consequently, an additional process to rearrange these objects is needed, which increases both the manpower required and the cycle time of the manufacturing process. Manufacturers have used two different methods to grasp such targets at the correct angle from a group of piled objects. One is to dispatch personnel to pick up the targets manually. As working time increases, workers' efficiency decreases quickly because of physical fatigue; the accuracy of identifying and rectifying problems drops, and the related costs (such as time cost or rework cost) increase. The second method, incorporating automation, is to engage computer vision technology to help a robotic arm grasp the target objects. Compared to the first method, the second enables more stable overall performance at a relatively low and consistent cost. A shoe manufacturer faces exactly this problem: it mass-produces insoles with injection molding machines on its production line and must allocate extra workers to sort the output. Thus, we applied a deep learning method with computer vision to achieve automation, reducing process time, enlarging production capacity, and further enhancing the enterprise’s competitiveness.

In order to achieve certain goals involving artificial intelligence, researchers have developed a systematic approach that can automatically extract information from raw data, learn the features of that information, and enable the system to make judgements about it. This is known as machine learning [12]. Deep learning is a popular branch of machine learning [51]. It involves algorithms that use artificial neural networks as a framework to analyze data [41]. Because deep learning can automatically extract and learn features from data through neurons in a network’s hidden layers, it can be very effective at analyzing data [16]. Deep learning methods exhibit good performance and can be applied in many fields, such as computer vision [33] and natural language processing [2], [7]. Deep learning is now considered one of the main enablers of digital transformation in many industries. It has been widely employed in companies to improve productivity and competitiveness while helping accelerate digital transformation. Using the data collected from smart devices, digital transformation can easily be adapted for use in the Industry 4.0 era [40].

Computer vision has a wide range of applications in real life, such as image recognition [6], action recognition [33], object localization, image restoration, tracking [20], and motion analysis [9]. To train a robot arm to accurately grasp an object, we need to explore the problem of object detection—that is, combining object recognition with precise localization [45]. Deep learning models can be used to address this challenge. Krizhevsky et al. [22] proposed AlexNet, a convolutional neural network (CNN) that won the 2012 ImageNet competition and marked a major milestone in deep learning. Since then, deep learning models for images have developed rapidly, and more and more researchers have conducted investigations related to object detection. Current object detection models can be divided into two types, namely “two-stage detectors” and “one-stage detectors”. Girshick et al. [11] first proposed a region-based convolutional neural network (R-CNN) machine-learning model to help solve the object detection problem. Since then, many related algorithms have been developed based on this model, including Fast R-CNN [10], Faster R-CNN [37], and Feature Pyramid Networks [26]. These models extract the candidate regions of the target through a neural network or algorithm at the beginning of the detection process and then use another neural network for classification, which places them in the two-stage detector category. The other model types complete both recognition and localization in one neural network and are thus recognized as one-stage detectors, such as YOLO (you only look once) [35], SSD (single shot MultiBox detector) [28], and RetinaNet [27]. Both types of object detection algorithms (two-stage and one-stage) have advantages and disadvantages owing to their architectures.
Sultana et al. [44] reviewed recent object detection models based on convolutional neural networks and obtained the results shown in Table 1. The two-stage detector has higher object recognition and positioning accuracy; however, its inference speed is slower than that of the one-stage detector because it must first propose candidate regions through an algorithm [18]. One main goal of this research, in addition to obtaining the bounding boxes of objects, was to calculate the rotation angle of each detected object, so detection accuracy matters more to us than inference speed. Although Faster R-CNN is not the fastest solution, its architecture is similar to that of Mask R-CNN; we therefore believe the two can be further integrated into one model in the future to shorten detection time and save computer memory. Thus, we select Faster R-CNN, a two-stage method, as the detection model for the direction reference point in our investigation.

Several models for object detection have been mentioned. Most of them output horizontally aligned bounding boxes, including the R-CNN variants and the YOLO series. While many object detection models for rotated bounding boxes have been proposed, determining how to grasp a target at the correct angle when objects are piled has remained a challenge. Most of this research has been applied to aerial images, in which objects are not stacked. For example, Bhat [5] developed a YOLO-based model that incorporates angle into the loss calculation. Zhong and Ao [57] adopted a new rotation-decoupled anchor matching strategy on an FPN-based architecture to detect arbitrarily oriented targets. When two overlapping objects are detected, however, such models cannot recognize which one is on top and can be picked up. We utilized Mask R-CNN to solve this problem. The inference results of Mask R-CNN provide a pixel-level mask drawn along the contours of each object. We used this property to determine the unobstructed objects in the image and to obtain rotated bounding boxes. Additionally, detected masks with smaller areas are discarded because the corresponding objects may be covered by other objects.

We propose a framework based on deep learning that integrates two object detection models. Faster R-CNN is utilized to search for the direction reference point of the object, and Mask R-CNN is adopted to obtain the segmentation, which not only forms the basis of an area filter but also generates a rotated bounding box. The proposed framework can identify relatively complete objects and assess their rotation angles for grasping among piled objects. The proposed artificial intelligence (AI) approach, based on deep learning, brings several benefits by analyzing digitalized image data and using the results to replace manpower with stable robotic devices. It thereby also encourages companies to carry out digital transformation to improve production efficiency and corporate competitiveness. The remainder of this study is organized as follows. In Section 2, we present a literature review to identify the research gap. In Section 3, we describe the methodology and process in greater detail. In Section 4, a case study is presented to validate the proposed method. Finally, we conclude the work and provide directions for future research in Section 5.

Section snippets

Literature review

This section presents a literature review of related work. In Section 2.1, we address the application of deep learning in the industrial area, including defect detection and the prediction of remaining useful life (RUL). In Section 2.2, we introduce the development history of object detection and briefly review common object detection models. Finally, we summarize the shortcomings of these models and compare them with the proposed model.

Methodology

The framework for this research is divided into three stages, as shown in Fig. 3. The first stage involved preparation of the dataset for training the deep learning models, including collecting data and labelling images. In the second stage, image data and annotated files were used to train the Faster R-CNN and Mask R-CNN models. The third stage involved integrating and analyzing the results generated by Faster R-CNN and Mask R-CNN, which enabled identifying the position and rotation angle of
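The third-stage integration can be illustrated with a small sketch. The exact matching rule used by the authors is not given in this snippet, so the following makes two labeled assumptions: each Faster R-CNN reference point is paired with the rotated box whose center is nearest to it (a stand-in for a containment test), and the object's full 0–360° orientation is taken as the direction from the box center toward its reference point, since `minAreaRect` alone only reports an angle within a 90° range.

```python
import math

def object_rotation(box_center, ref_point):
    """Resolve a full 0-360 degree orientation for one object.
    The direction reference point found by Faster R-CNN fixes the
    heading that minAreaRect's ambiguous angle cannot provide.
    Coordinates are image coordinates, where y grows downward;
    the returned angle is measured counter-clockwise from +x."""
    dx = ref_point[0] - box_center[0]
    dy = ref_point[1] - box_center[1]
    # Negate dy because the image y-axis points downward.
    return math.degrees(math.atan2(-dy, dx)) % 360.0

def assign_reference_points(box_centers, ref_points):
    """Pair each rotated-box center with its nearest reference point
    (an illustrative stand-in for checking which box contains it),
    returning (center, angle) tuples for the grasping robot."""
    pairs = []
    for center in box_centers:
        best = min(ref_points,
                   key=lambda p: (p[0] - center[0]) ** 2
                               + (p[1] - center[1]) ** 2)
        pairs.append((center, object_rotation(center, best)))
    return pairs
```

With the center and angle of the topmost, unoccluded object, the robot arm can be commanded to rotate its gripper accordingly before grasping.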

Case study

The focal company for this case study was a shoe original equipment manufacturer (OEM), i.e., a company that produces parts and equipment that may be marketed by another manufacturer. Its shoemaking and development technology leads worldwide competitors and is widely trusted by the world's major leading brands. The company has an annual production capacity of about 17 million pairs of shoes, an annual value of production of about 2.2 billion dollars, and

Conclusion

This research integrates two deep learning models to address the problem of piled objects with irregular arrangement. In a final test involving 30 fine-tuned images, the results show that the grasping accuracy, referring to the outcome generated by integrating Faster R-CNN and Mask R-CNN, achieved a success rate of 96.26% with a reasonable computation time. The main contributions of this research can be divided into academic and practical aspects. Academically, the proposed method integrates

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to thank the Ministry of Science and Technology of Taiwan for financially supporting this research under Contract no. MOST 109-2628-E-007-002-MY3.

References (61)

  • X. Yin et al.

    Ensemble deep learning based semi-supervised soft sensor modeling method and its application on quality prediction for coal preparation process

    Adv. Eng. Inform.

    (2020)
  • J.P. Yun et al.

    Automated defect inspection system for metal surfaces based on deep learning and data augmentation

    J. Manuf. Syst.

    (2020)
  • J. Zhang et al.

    Long short-term memory for machine remaining life prediction

    J. Manuf. Syst.

    (2018)
  • N.H. Aung, Y.K. Thu, S.S. Maung, Feature Based Myanmar Fingerspelling Image Classification Using SIFT, SURF and BRIEF,...
  • S. Bacchi et al.

    Deep learning natural language processing successfully predicts the cerebrovascular cause of transient ischemic attack-like presentations

    Stroke

    (2019)
  • B. Benjdira, T. Khursheed, A. Koubaa, A. Ammar, K. Ouni, Car detection using unmanned aerial vehicles: Comparison...
  • A. Bhat, Aerial Object Detection using Learnable Bounding Boxes,...
  • M.C. Chiu et al.

    Applying transfer learning to achieve precision marketing in an omni-channel system–a case study of a sharing kitchen platform

    Int. J. Prod. Res.

    (2021)
  • M.C. Chiu et al.

    An integrative machine learning method to improve fault detection and productivity performance in a cyber-physical system

    J. Comput. Inform. Sci. Eng.

    (2020)
  • S.L. Colyer et al.

    A review of the evolution of vision-based motion analysis and the integration of advanced computer vision methods towards developing a markerless system

    Sports Med.-Open

    (2018)
  • R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp....
  • R. Girshick et al.

    Rich feature hierarchies for accurate object detection and semantic segmentation

  • I. Goodfellow et al.

    Deep learning

    (2016)
  • D. Guo, F. Sun, H. Liu, T. Kong, B. Fang, N. Xi, A hybrid deep architecture for robotic grasp detection, in: 2017 IEEE...
  • K. He et al.

    Mask R-CNN

  • K. He et al.

    Spatial pyramid pooling in deep convolutional networks for visual recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • A.K. Jain et al.

    Artificial neural networks: A tutorial

    Computer

    (1996)
  • Y. Jiang, X. Zhu, X. Wang, S. Yang, W. Li, H. Wang, P. Fu, Z. Luo, R2CNN: Rotational region CNN for orientation robust...
  • L. Jiao et al.

    A survey of deep learning-based object detection

    IEEE Access

    (2019)
  • H.S. Kang et al.

    Smart manufacturing: Past research, present findings, and future directions

    Int. J. Precision Eng. Manuf.-green Technol.

    (2016)