Abstract
Data mining and statistical learning techniques are powerful analysis tools that have yet to be incorporated into urban studies and transportation research. In this work, we analyze an activity-based travel survey conducted in the Chicago metropolitan area on a demographically representative sample of its population. Detailed data on activities by time of day were collected from more than 30,000 individuals (in 10,552 households) who participated in a 1-day or 2-day survey conducted from January 2007 to February 2008. We examine this large-scale data set to explore three critical issues: (1) the inherent daily activity structure of individuals in a metropolitan area, (2) the variation of individual daily activities, that is, how they grow and fade over time, and (3) clusters of individual behaviors and the socio-demographic information they reveal. We find that the population can be clustered into 8 representative groups according to weekday activities and 7 according to weekend activities. Our results enrich the traditional division into only three groups (workers, students and non-workers) and provide clusters based on activities at different times of day. The generated clusters, combined with socio-demographic information, provide a new perspective for urban and transportation planning, as well as for emergency response and spreading dynamics, by addressing when, where, and how individuals interact with places in metropolitan areas.
Additional information
Responsible editor: Fei Wang, Hanghang Tong, Phillip Yu, Charu Aggarwal.
About this article
Cite this article
Jiang, S., Ferreira, J. & González, M.C. Clustering daily patterns of human activities in the city. Data Min Knowl Disc 25, 478–510 (2012). https://doi.org/10.1007/s10618-012-0264-z
DOI: https://doi.org/10.1007/s10618-012-0264-z