ABSTRACT
The General Society Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life, and structure of contemporary society. GSS datasets are regarded as one of the authoritative sources that government and organizational practitioners use to make data-driven policies. Previous analytic approaches for GSS datasets combine expert knowledge with simple statistics. In this paper, we propose a comprehensive data management and data mining approach for GSS datasets. The approach operates in two phases: a data management phase, which improves the quality of GSS data through attribute preprocessing and filter-based attribute selection, and a data mining phase, which extracts hidden knowledge from the dataset through prediction, classification, association, and clustering analysis. By leveraging data mining techniques, the proposed approach explores knowledge in a fine-grained manner with minimal human intervention. We evaluate the approach with experiments on the Chinese General Social Survey dataset.
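As an illustration of the filter-based attribute selection step in the data management phase, the following is a minimal sketch, not the authors' implementation. It assumes information gain as the filter metric (the abstract does not name one), and the record layout, attribute names, and the `select_attributes` helper are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_attributes(rows, target, k):
    """Filter step: rank attributes by information gain and keep the top k.

    The metric is an assumption made for this sketch; any filter score
    computed independently of the downstream model could be substituted.
    """
    labels = [row[target] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
        if name != target
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy records; attribute names and values are illustrative only.
rows = [
    {"education": "high", "region": "east", "gender": "f", "income": "high"},
    {"education": "high", "region": "west", "gender": "m", "income": "high"},
    {"education": "low", "region": "east", "gender": "m", "income": "low"},
    {"education": "low", "region": "west", "gender": "f", "income": "low"},
]

selected, scores = select_attributes(rows, target="income", k=1)
print(selected, scores)
```

Because a filter method scores each attribute without training a model, the selected attributes can be passed unchanged to any of the later prediction, classification, association, or clustering analyses.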