
Feature selection generating directed rough-spanning tree for crime pattern analysis

  • Soft Computing Techniques: Applications and Challenges
  • Published in Neural Computing and Applications

Abstract

Nowadays, crime is a major threat to society, affecting the normal life of human beings all over the world, so it is important to rid the world of criminal activity in all its forms. The main motivation of this work is to understand various crime patterns in order to prevent future crime events. Although research on such problems is ongoing, no existing work addresses the roughness or ambiguity present in crime reports. The present work extracts all possible crime features from the crime reports and selects only the important features required for crime pattern analysis. For this purpose, it develops a purely supervised feature selection model integrating rough set theory and graph theory (the spanning tree of a directed weighted graph). The crime reports are preprocessed, and crime features are extracted to represent each report as a feature vector (i.e., a set of distinct crime features). For crime pattern analysis, the main objective of our work, not all extracted features are essential; rather, a minimal subset of relevant features is sufficient. Thus, feature selection is the main contribution of the paper: it not only enhances the efficiency of the subsequent mining process but also increases its correctness. A rough set theory-based relative indiscernibility relation is defined to measure the similarity between two features relative to the crime type. Based on the similarity scores, a weighted directed graph is constructed whose nodes are the features and whose edge from u to v is weighted by the inverse of the similarity of feature v to u.
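The graph construction described above can be sketched as follows. This is a minimal illustration: the feature names and similarity scores below are hypothetical placeholders, and the paper's actual relative indiscernibility computation is not reproduced.

```python
# Sketch: build the directed weighted graph of the abstract.
# Each directed edge u -> v is weighted by the inverse of the similarity
# of feature v to u, so highly similar pairs get cheap edges and survive
# in a *minimum* spanning arborescence.

features = ['weapon', 'place', 'time']           # hypothetical feature nodes
sim = {                                          # hypothetical similarity scores
    ('weapon', 'place'): 0.8, ('place', 'weapon'): 0.5,
    ('weapon', 'time'): 0.2,  ('time', 'weapon'): 0.4,
    ('place', 'time'): 0.1,   ('time', 'place'): 0.25,
}

# Edge weight = 1 / similarity; zero-similarity pairs get no edge.
edges = {(u, v): 1.0 / s for (u, v), s in sim.items() if s > 0}
print(edges[('weapon', 'place')])  # 1.25
```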
Then, a minimum spanning tree (termed the rough-spanning tree) is generated from the constructed directed graph using the Chu–Liu/Edmonds algorithm, and the importance of each node is measured using the weights of the edges and the degrees (in-degrees and out-degrees) of the nodes in the spanning tree. Finally, a feature selection algorithm is proposed that iteratively selects the most important node and removes it from the spanning tree until the modified graph (not necessarily a tree) becomes a null graph. The selected nodes constitute the feature subset sufficient for crime pattern analysis. The method is evaluated using various statistical measures and compared with related state-of-the-art methods to demonstrate its effectiveness in crime pattern analysis. The Wilcoxon rank-sum test, a popular nonparametric alternative to the two-sample t test, is performed to show that the proposed supervised model is statistically significant.
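The spanning-tree step can be sketched with a compact implementation of the Chu–Liu/Edmonds algorithm. This is a generic minimum-arborescence sketch with illustrative node names and weights; the paper's importance measure and iterative removal loop are not reproduced here.

```python
def chu_liu_edmonds(nodes, edges, root):
    """Minimum spanning arborescence of a directed graph.
    nodes: hashable ids; edges: dict (u, v) -> weight.
    Returns the set of directed edges of the arborescence rooted at root."""
    # 1. For every non-root node keep its cheapest incoming edge.
    best_in = {}
    for (u, v), w in edges.items():
        if v != root and (v not in best_in or w < best_in[v][1]):
            best_in[v] = (u, w)
    # 2. Look for a cycle among the chosen edges.
    cycle = None
    for start in nodes:
        path, v = [], start
        while v in best_in and v not in path:
            path.append(v)
            v = best_in[v][0]
        if v in path:
            cycle = path[path.index(v):]
            break
    if cycle is None:                     # no cycle: chosen edges are the answer
        return {(u, v) for v, (u, _) in best_in.items()}
    # 3. Contract the cycle into a super-node, re-weighting edges that enter it.
    cyc = set(cycle)
    super_node = ('cycle', tuple(cycle))  # fresh id that cannot clash with nodes
    new_nodes = [n for n in nodes if n not in cyc] + [super_node]
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in cyc and v in cyc:
            continue
        if v in cyc:      # entering edge pays for dropping one cycle edge
            key, nw = (u, super_node), w - best_in[v][1]
        elif u in cyc:    # leaving edge keeps its weight
            key, nw = (super_node, v), w
        else:
            key, nw = (u, v), w
        if key not in new_edges or nw < new_edges[key]:
            new_edges[key], origin[key] = nw, (u, v)
    # 4. Solve the contracted graph, then expand the cycle again.
    sub = chu_liu_edmonds(new_nodes, new_edges, root)
    result, entering = set(), None
    for u, v in sub:
        ou, ov = origin[(u, v)]
        result.add((ou, ov))
        if v == super_node:
            entering = ov  # cycle node now reached from outside the cycle
    result.update((best_in[v][0], v) for v in cycle if v != entering)
    return result

# Illustrative 3-node example (weights stand in for inverse similarities).
edges = {('r', 'a'): 4, ('r', 'b'): 4, ('a', 'b'): 1, ('b', 'a'): 1}
tree = chu_liu_edmonds(['r', 'a', 'b'], edges, 'r')
print(sum(edges[e] for e in tree))  # total weight 5
```

From the resulting arborescence, a node-importance score can then be derived from the incident edge weights and in/out-degrees, as the abstract describes.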


Notes

  1. Throughout the paper, the term feature is used instead of conditional feature.

  2. As the relative indiscernibility relation is defined on conditional features relative to the decision feature, conditional features that are more correlated with the decision feature are selected. Thus, the algorithm selects conditional features that are independent of each other yet correlated with the decision feature, which is the main objective of our feature selection algorithm.
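As a toy illustration of note 2 (a hypothetical proxy; the paper's exact relative indiscernibility formula is not reproduced here), the similarity of two conditional features relative to the decision feature can be approximated by how often the two features agree about object pairs that the decision feature discerns:

```python
from itertools import combinations

def relative_similarity(f_u, f_v, decision):
    """Hypothetical proxy for relative indiscernibility: over object pairs
    that the decision feature discerns, count how often features u and v
    agree about whether the pair is indiscernible."""
    pairs = [(i, j) for i, j in combinations(range(len(decision)), 2)
             if decision[i] != decision[j]]
    if not pairs:
        return 1.0
    agree = sum((f_u[i] == f_u[j]) == (f_v[i] == f_v[j]) for i, j in pairs)
    return agree / len(pairs)

# Toy crime reports: two conditional features and the decision (crime type).
weapon = ['knife', 'gun', 'knife', 'gun']
place  = ['home', 'street', 'home', 'street']
crime  = ['assault', 'robbery', 'assault', 'robbery']
print(relative_similarity(weapon, place, crime))  # 1.0: identical partitions
```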

References

  1. Abdi H, Williams LJ (2010) Principal component analysis. WIREs Comput Stat 2(4):433–459

  2. Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5(4):537–550

  3. Bazlamac CF, Hindi KS (2001) Minimum-weight spanning tree algorithms: a survey and empirical study. Comput Oper Res 28(8):767–785

  4. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):1–12

  5. Broutin N, Devroye L, McLeish E (2008) Note on the structure of Kruskal’s algorithm. Algorithmica 56(2):141

  6. Chu YJ, Liu TH (1965) On the shortest arborescence of a directed graph. Sci Sin 14:1396–1400

  7. Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal Complex Syst 1695:1–9. http://igraph.org

  8. Das AK, Sengupta S, Bhattacharyya S (2018) A group incremental feature selection for classification using rough set theory based genetic algorithm. Appl Soft Comput 65(C):400–411

  9. Das P, Das AK (2017) In: 8th international conference on computing, communication and networking technologies, pp 1–6

  10. Deo N (1974) Graph theory with applications to engineering and computer science. Prentice-Hall Inc, Upper Saddle River

  11. Fazayeli F, Wang L, Mandziuk J (2008) Feature selection based on the rough set theory and expectation-maximization clustering algorithm. In: Chan C-C, Grzymala-Busse JW, Ziarko WP (eds) Rough sets and current trends in computing. Springer, Berlin, pp 272–282

  12. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18

  13. Hu XT, Lin TY, Han J (2003) A new rough sets model based on database systems. In: Wang G, Liu Q, Yao Y, Skowron A (eds) Rough sets, fuzzy sets, data mining, and granular computing. Springer, Berlin, pp 114–121

  14. Huda RK, Banka H (2018) Efficient feature selection and classification algorithm based on PSO and rough sets. Neural Comput Appl. https://doi.org/10.1007/s00521-017-3317-9

  15. Jalil MMA, Ling CP, Noor NMM, Mohd F (2017a) Knowledge representation model for crime analysis. Procedia Comput Sci 116:484–491

  16. Jalil MMA, Mohd F, Noor NMM (2017b) A comparative study to evaluate filtering methods for crime data feature selection. Procedia Comput Sci 116:113–120

  17. Janeela Theresa MM, Joseph Raj V (2016) A maximum spanning tree-based dynamic fuzzy supervised neural network architecture for classification of murder cases. Soft Comput 20(6):2353–2365

  18. Edmonds J (1967) Optimum branchings. J Res Natl Bureau Stand 71:233–240

  19. Keerthika T, Premalatha K (2016) Rough set reduct algorithm based feature selection for medical domain. J Chem Pharm Sci 9(2):896–902

  20. Lehrmann A, Huber M, Polatkan AC, Pritzkau A, Nieselt K (2013) Visualizing dimensionality reduction of systems biology data. Data Min Knowl Discov 27(1):146–165

  21. Loper E, Bird S (2002) NLTK: the natural language toolkit. In: Proceedings of the ACL-02 workshop on effective tools and methodologies for teaching natural language processing and computational linguistics, vol 1, pp 63–70

  22. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826

  23. Maji P, Paul S (2011) Rough set based maximum relevance-maximum significance criterion and gene selection from microarray data. Int J Approx Reason 52(3):408–426

  24. Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv:1301.3781

  25. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013b) Distributed representations of words and phrases and their compositionality. arXiv:1310.4546

  26. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582

  27. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11(5):341–356

  28. Pawlak Z (1998) Rough set theory and its applications to data analysis. Cybern Syst 29(7):661–688

  29. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238

  30. Rosvall M, Axelsson D, Bergstrom CT (2009) The map equation. Eur Phys J Spec Top 178(1):13–23

  31. Sabu MK (2018) A rough set based feature selection approach for the prediction of learning disabilities. Int J Adv Comput Eng Netw 2(12):43–48

  32. Sengupta S, Das AK (2012) Single reduct generation based on relative indiscernibility of rough set theory. Int J Soft Comput 3(1):107–119

  33. Shalabi LA (2017) Perceptions of crime behavior and relationships: rough set based approach. Int J Comput Sci Inf Secur 15(3):413–420

  34. Singh B, Sankhwar JS, Vyas OP (2014) Optimization of feature selection method for high dimensional data using fisher score and minimum spanning tree. In: 2014 annual IEEE India conference (INDICON), pp 1–6

  35. Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng 25(1):1–14

  36. Bird S, Klein E, Loper E (2009) Natural language processing with Python. O'Reilly Media, Sebastopol

  37. Jerald Beno TR, Karnan M (2012) Dimensionality reduction: rough set based feature reduction. Int J Sci Res Publ 2(9):1–6

  38. Taha K, Yoo PD (2017) Using the spanning tree of a criminal network for identifying its leaders. IEEE Trans Inf Forensics Secur 12(2):445–453

  39. Weng J, Young DS (2017) Some dimension reduction strategies for the analysis of survey data. J Big Data 4(1):43

  40. Yager RR, Alajlan N (2015) Dempster–Shafer belief structures for decision making under uncertainty. Knowl Based Syst 80(C):58–66

  41. Yang HH, Moody J (1999) Feature selection based on joint mutual information. In: Proceedings of international ICSC symposium on advances in intelligent data analysis, pp 22–25

  42. Alapati YK, Sindhu SSK (2015) Relevant feature selection from high-dimensional data using MST based clustering. Int J Emerg Trends Sci Technol 2(3):1997–2001

  43. Zadeh L (1965) Fuzzy sets. Inf Control 8(3):338–353

  44. Zaher AA, Berretta R, Arefin AS, Moscato P (2015) FSMEC: a feature selection method based on the minimum spanning tree and evolutionary computation. In: Proceedings of the 13th Australasian data mining conference (AusDM 2015), pp 129–139

  45. Zhang M, Yao JT (2004) A rough sets based approach to feature selection. In: IEEE annual meeting of the fuzzy information processing society (NAFIPS 2004), vol 1, pp 434–439

Author information

Corresponding author

Correspondence to Priyanka Das.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. This manuscript has not been published previously (in part or in full), and no data have been fabricated or manipulated to support our conclusions.

About this article

Cite this article

Das, P., Das, A.K. & Nayak, J. Feature selection generating directed rough-spanning tree for crime pattern analysis. Neural Comput & Applic 32, 7623–7639 (2020). https://doi.org/10.1007/s00521-018-3880-8
