Mobile social group sizes and scaling ratio

  • Original Article
AI & SOCIETY

Abstract

Social data mining has become an emerging area of research in information and communication technology. Its scope has expanded significantly in recent years with advances in telecommunication technologies and the rapidly increasing accessibility of computing resources and mobile devices. People increasingly engage in and rely on phone communications for both personal and business purposes, and mobile phones have become an indispensable part of life for many people. In this article, we perform social data mining on mobile social networking by presenting a simple but efficient method to define social closeness and social grouping, which are then used to identify social group sizes and a scaling ratio close to 8. We conclude that the mobile social network is a subset of the face-to-face social network and that the two groupings are not necessarily the same; hence, the scaling ratios are distinct.

References

  • Bhaskar P, Ahamed SI (2007) Privacy in pervasive computing and open issues. In: Proceedings of the second international conference on availability, reliability and security, pp 147–154

  • Birtchnell J (1997) Personality set within an octagonal model of relating. In: Plutchik R, Hope C (eds) Circumplex models of personality and emotions. American Psychological Association, pp 155–182

  • Bleecker J (2006) What’s your social doing in my mobile? Design patterns for mobile social software. In: Proceedings of WWW2006 workshop MobEA IV—empowering the mobile web, pp 1–6

  • Cheng J, Wong S, Yang H, Lu S (2007) SmartSiren: virus detection and alert for smartphones. In: Proceedings of the 5th international conference on mobile systems, applications and services, pp 258–271

  • Coleman JS (1964) An introduction to mathematical sociology. Collier-Macmillan, London

  • Dantu R, Kolan P (2005) Detecting spam in VoIP networks. In: Proceedings of the steps to reducing unwanted traffic on the internet workshop, pp 5–10

  • Dunbar RIM (1992) Neocortex size as a constraint on group size in primates. J Hum Evol 22:469–493

  • Dunbar RIM (1993) Co-evolution of neocortex size, group size, and language in humans. Behav Brain Sci 16:681

  • Dunbar RIM, Spoors M (1995) Social networks, support cliques, and kinship. Hum Nat 6:273–291

  • Erzan A, Eckmann JP (1997) q-analysis of fractal sets. Phys Rev Lett 78:3245

  • Freeman L (1978) Centrality in social networks conceptual clarification. Soc Networks 1:215–239

  • Granovetter MS (1973) The strength of weak ties. Am J Sociol 78:1360–1380

  • Hakkila J, Mantyjarvi J (2005) Collaboration in context-aware mobile phone applications. In: Proceedings of the 38th annual Hawaii international conference on system sciences (HICSS’05), pp 33–39

  • Hill RA, Dunbar RIM (2003) Social network size in humans. Hum Nat 14:53–72

  • Hossain L, Chung KK, Murshed SH (2007) Exploring temporal communication through social networks. In: Proceedings of IFIP international federation for information processing, pp 19–20

  • Jamieson L (1998) Intimacy, personal relationship in modern societies. Polity Press & Blackwell Publishers Ltd., Oxford

  • Kayser K, Himle DP (1994) Dysfunctional beliefs about intimacy. J Cogn Psychother 8:127–139

  • Kolan P, Dantu R (2007) Socio-technical defense against voice spamming. ACM Trans Auton Adapt Syst 2(1):1–44

  • Kolan P, Dantu R, Cangussu JW (2008) Nuisance level of a voice call. ACM Trans Multi Comput Commun Appl (TOMCCAP) 5(1):6–22

  • Kottak CP (1991) Cultural anthropology. McGraw-Hill, New York

  • Marsden PV, Campbell KE (1984) Measuring tie strength. Soc Forces 63(2):482–501

  • Mesch GS, Talmud I (2006) Online friendship formation, communication channels, and social closeness. Int J Inter Sci 1(1):29–44

  • Milne LD (1999) Social therapy. A guide to social support interventions for mental health practitioners. Wiley, Chichester

  • Nolker RD, Zhou L (2005) Social computing and weighting to identify member roles in online communities. In: Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence, pp 87–93

  • Orlofsky JL, Marcia JE, Lesser IM (1973) Ego identity status and the intimacy versus isolation crisis of young adulthood. J Pers Soc Psychol 27:211–219

  • Phithakkitnukoon S, Dantu R (2008) UNT mobile phone communication dataset, University Of North Texas, http://nsl.unt.edu/santi/data_desc.pdf

  • Popovic M, Milne D, Barrett P (2003) The scale of perceived interpersonal closeness (PICS). Clin Psychol Psychother 10:286–301

  • Press W, Teukolsky S, Flannery B (1996) Numerical recipes in FORTRAN: the art of scientific computing. Cambridge University Press, Cambridge

  • Schaefer MT, Olson DH (1981) Assessing intimacy: the PAIR inventory. J Marital Fam Ther 7:47–60

  • Scupin R (1992) Cultural anthropology—a global perspective. Prentice Hall, Englewood Cliffs

  • Shang J, Croson R (2005a) The impact of social influence on the voluntary provision of public goods. Working paper, University of Pennsylvania

  • Shang J, Croson R (2005b) The impact of social comparisons on nonprofit fundraising. Forthcoming in research in experimental economics

  • Shang J, Croson R, Reed II A (2008) “I” give, but “We” give more: the impact of identity and the mere social information effect on donation behavior. J Mark Res 45:1–10

  • Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc B 53:683–690

  • Sherman MD, Thelen MH (1996) Fear of intimacy scale: validation and extension with adolescents. J Soc Pers Relat 13:507–521

  • Smith GE, Berger PD (1996) The impact of direct marketing appeals on charitable marketing effectiveness. Acad Market Sci 24(3):219–232

  • Sornette D (1998) Multiplicative processes and power laws. Phys Rep 297:239

  • Wan X, Milios E, Kalyaniwalla N, Janssen J (2008) Link-based anomaly detection in communication networks. In: Proceedings of the 2008 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, pp 402–405

  • Wasserman L (2005) All of statistics: a concise course in statistical inference. Springer Texts in Statistics, Berlin

  • Wong W, Moore A, Cooper G, Wagner M (2002) Rule-based anomaly pattern detection for detecting disease outbreaks. Eighteenth national conference on Artificial intelligence, pp 217–223

  • Zhdanova AV, Predoiu L, Pellegrini T, Fensel D (2007) A social networking model of a web community. In: Proceedings of 10th international symposium on social communication, pp 537–541

  • Zhou WX, Sornette D (2002) Generalized q analysis of log-periodicity: applications to critical ruptures. Phys Rev E 66:046111

  • Zhou WX, Sornette D, Hill RA, Dunbar RIM (2005) Discrete hierarchical organization of social group sizes. Proc R Soc B Biol Sci 272(1561):439–444

Acknowledgments

This work is supported by the National Science Foundation under grants CNS-0627754, CNS-0619871 and CNS-0551694.

Author information

Corresponding author

Correspondence to Santi Phithakkitnukoon.

Appendices

Appendix

Terminology

  • Associated User is a person who has made call(s) to, or received call(s) from, the center user.

  • Call frequency is the number of calls.

  • Callee is a person/entity receiving a call.

  • Caller is a person/entity generating a call.

  • Center user is a phone user for whom the social closeness and grouping of all Associated Users are computed.

  • High active user is a center user who makes more than ten outgoing calls per day.

  • Low active user is a center user who makes fewer than six outgoing calls per day.

  • Medium active user is a center user who makes between six and ten outgoing calls per day.

  • Talk time is the call duration.
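
The three activity levels defined above can be expressed as a small classification rule. The following is a minimal sketch, not the authors' code: the function name `activity_level` is hypothetical, and treating both six and ten calls per day as "medium" is an assumption about how the boundaries are meant to be read.

```python
def activity_level(outgoing_calls_per_day: float) -> str:
    """Classify a center user by average outgoing calls per day.

    Assumption: the 'medium' band includes both endpoints (six and ten).
    """
    if outgoing_calls_per_day > 10:
        return "high"
    if outgoing_calls_per_day < 6:
        return "low"
    return "medium"


# Hypothetical daily averages for three center users.
levels = [activity_level(x) for x in (12.5, 8.0, 3.2)]
```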

Mathematical equations and formulations

Normalized call frequency and normalized call duration

F(i, j) is the normalized call frequency between user i and user j (normalized to the maximum call frequency among all users with whom user i communicates), given by Eq. (3). T(i, j) is the normalized call duration, or talk time, between user i and user j (normalized to the maximum talk time among all users with whom user i communicates), given by Eq. (4).

$$ F(i,j) = {\frac{f(i,j)}{{\mathop {\text{max}}\limits_{{k \in U_{i} }} \{ f(i,k)\} }}}, $$
(3)
$$ T(i,j) = {\frac{t(i,j)}{{\mathop {\text{max}}\limits_{{k \in U_{i} }} \{ t(i,k)\} }}}, $$
(4)

where f(i, j) is the total number of calls (call frequency) between user i and user j, t(i, j) is the total call duration (talk time) between user i and user j, and \( U_{i} = \{ 1, 2, \ldots, N\} \) is the set of all N users associated with user i (i.e., all users who have made calls to or received calls from user i).
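
As an illustration of Eqs. (3) and (4), the sketch below normalizes per-contact totals for a single center user by the largest total. The helper name `normalize` and the sample totals are hypothetical and are not taken from the dataset.

```python
def normalize(totals: dict[str, float]) -> dict[str, float]:
    """Divide each per-contact total by the largest total (Eqs. 3 and 4)."""
    peak = max(totals.values())
    if peak <= 0:
        raise ValueError("at least one total must be positive")
    return {contact: value / peak for contact, value in totals.items()}


# Hypothetical totals for one center user i and three Associated Users.
calls = {"C1": 12, "C2": 3, "C3": 6}               # f(i, k)
talk_time = {"C1": 40.0, "C2": 80.0, "C3": 20.0}   # t(i, k)

F = normalize(calls)       # {'C1': 1.0, 'C2': 0.25, 'C3': 0.5}
T = normalize(talk_time)   # {'C1': 0.5, 'C2': 1.0, 'C3': 0.25}
```

By construction, the contact with the largest total receives the value 1 and every other contact falls in the interval (0, 1].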

Mathematical proof of property 1

From Eqs. (1), (3), and (4), S(i, j) can be defined as

$$ S(i,j) = \sqrt {\left( {1 - {\frac{f(i,j)}{{\mathop {\text{max}}\limits_{{k \in U_{i} }} \{ f(i,k)\} }}}} \right)^{2} + \left( {1 - {\frac{t(i,j)}{{\mathop {\text{max}}\limits_{{k \in U_{i} }} \{ t(i,k)\} }}}} \right)^{2} } , $$
(5)

and S(j, i) can be defined as

$$ S(j,i) = \sqrt {\left( {1 - {\frac{f(j,i)}{{\mathop {\text{max}}\limits_{{m \in U_{j} }} \{ f(j,m)\} }}}} \right)^{2} + \left( {1 - {\frac{t(j,i)}{{\mathop {\text{max}}\limits_{{m \in U_{j} }} \{ t(j,m)\} }}}} \right)^{2} } $$
(6)

Since f(i, j) = f(j, i) and t(i, j) = t(j, i), Eq. (6) can be rewritten as

$$ S(j,i) = \sqrt {\left( {1 - {\frac{f(i,j)}{{\mathop {\text{max}}\limits_{{m \in U_{j} }} \{ f(j,m)\} }}}} \right)^{2} + \left( {1 - {\frac{t(i,j)}{{\mathop {\text{max}}\limits_{{m \in U_{j} }} \{ t(j,m)\} }}}} \right)^{2} } $$
(7)

If social closeness is symmetric, i.e., S(i, j) = S(j, i), then

$$ \sqrt {\left( {1 - {\frac{f(i,j)}{{\mathop {\text{max}}\limits_{{k \in U_{i} }} \{ f(i,k)\} }}}} \right)^{2} + \left( {1 - {\frac{t(i,j)}{{\mathop {\text{max}}\limits_{{k \in U_{i} }} \{ t(i,k)\} }}}} \right)^{2} } = \sqrt {\left( {1 - {\frac{f(i,j)}{{\mathop {\text{max}}\limits_{{m \in U_{j} }} \{ f(j,m)\} }}}} \right)^{2} + \left( {1 - {\frac{t(i,j)}{{\mathop {\text{max}}\limits_{{m \in U_{j} }} \{ t(j,m)\} }}}} \right)^{2} } , $$
(8)

where equality is guaranteed when \( \mathop {\text{max}}\limits_{{k \in U_{i} }} \{ f(i,k)\} = \mathop {\text{max}}\limits_{{m \in U_{j} }} \{ f(j,m)\} \) and \( \mathop {\text{max}}\limits_{{k \in U_{i} }} \{ t(i,k)\} = \mathop {\text{max}}\limits_{{m \in U_{j} }} \{ t(j,m)\} \); barring a coincidental cancellation between the two squared terms, these conditions are also necessary.

Note: Symmetry of the social closeness here means that the social closeness between users i and j as perceived by user i, S(i, j), is the same as that perceived by user j, S(j, i). For example, suppose Subj. #1 and Subj. #2 have been communicating with each other via mobile phones, so that they have established a mobile social relationship. In other words, Subj. #1 is an Associated User of Subj. #2, and Subj. #2 is also an Associated User of Subj. #1. Suppose we ask each subject (independently) to quantify the social closeness between them with a value from 0 to \( \sqrt 2 \) (where 0 implies the closest and \( \sqrt 2 \) implies the furthest). If Subj. #1 rates the social closeness between him and Subj. #2 as S(1, 2), and Subj. #2 rates the social closeness between him and Subj. #1 as S(2, 1), we say that the social closeness between Subj. #1 and Subj. #2 is “symmetric” if S(1, 2) = S(2, 1).

According to Eq. (5), the social closeness S(i, j) depends on four parameters: the call frequency between user i and user j, f(i, j); the call duration between user i and user j, t(i, j); the maximum call frequency among all Associated Users with whom user i communicates, \( \mathop {\text{max}}\limits_{{k \in U_{i} }} \{ f(i,k)\} \); and the maximum call duration among all Associated Users with whom user i communicates, \( \mathop {\text{max}}\limits_{{k \in U_{i} }} \{ t(i,k)\} \). Likewise, the social closeness S(j, i) depends on the same parameters from user j’s perspective: f(j, i), t(j, i), \( \mathop {\text{max}}\limits_{{m \in U_{j} }} \{ f(j,m)\} \), and \( \mathop { \max }\limits_{{m \in U_{j} }} \{ t(j,m)\} \), as given by Eq. (6).

Since the call frequency between user i and user j is always equal to the call frequency between user j and user i, i.e., \( f(i,j) = f(j,i) \), and the call duration between user i and user j is always equal to the call duration between user j and user i, i.e., \( t(i,j) = t(j,i) \), as shown in Eq. (7), symmetry is guaranteed only when \( \mathop {\text{max}}\limits_{{k \in U_{i} }} \{ f(i,k)\} = \mathop {\text{max}}\limits_{{m \in U_{j} }} \{ f(j,m)\} \) and \( \mathop {\text{max}}\limits_{{k \in U_{i} }} \{ t(i,k)\} = \mathop {\text{max}}\limits_{{m \in U_{j} }} \{ t(j,m)\} \), as shown in Eq. (8). Symmetric social closeness is therefore rare, because two users seldom share the same maximum call frequency and the same maximum talk time.
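
The asymmetry discussed in this note can be checked numerically. The sketch below evaluates Eq. (5) for a pair of users who share the same call frequency and talk time with each other but have different personal maxima; the function name `closeness` and all numbers are hypothetical.

```python
import math


def closeness(f_ij: float, t_ij: float, f_max: float, t_max: float) -> float:
    """Social closeness of Eq. (5): distance from the point (1, 1) in (F, T) space."""
    return math.hypot(1 - f_ij / f_max, 1 - t_ij / t_max)


# Hypothetical pair: users 1 and 2 exchanged 5 calls totalling 30 min.
f_12, t_12 = 5, 30.0

# User 1 talks more with someone else (maxima 10 calls, 60 min);
# for user 2, user 1 is the top contact (maxima 5 calls, 30 min).
s_12 = closeness(f_12, t_12, f_max=10, t_max=60.0)
s_21 = closeness(f_12, t_12, f_max=5, t_max=30.0)

print(s_12, s_21)  # s_12 is about 0.707, s_21 is 0.0
```

Here user 2 perceives user 1 as the closest possible contact (S = 0), while user 1 places user 2 at a distance of about 0.707, so the closeness is not symmetric.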

Dirac’s delta functions and probability density function f(s)

The probability density function f(s) is given by Eq. (9), where \( \delta \) is Dirac’s delta function and N is the number of grouping clusters. A probability density function represents the density of probability at each point s, in the sense that the probability that S lies in a small interval in the vicinity of s is approximately f(s)h, i.e., Pr[s < S ≤ s + h] ≈ f(s)h. Dirac’s delta is a generalized function representing an infinitely sharp peak bounding unit area: \( \delta (s - s_{i} ) \) is zero everywhere except at \( s = s_{i} \), where its value is infinitely large, such that its total integral is equal to one

$$ f(s) = \sum\limits_{i = 1}^{N} \delta \left( {s - s_{i} } \right). $$
(9)
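
A sum of delta functions cannot be evaluated on a grid directly, so a numerical treatment needs some smoothing. The sketch below replaces each delta in Eq. (9) with a narrow Gaussian of unit area; this substitution, the function name `smoothed_density`, the bandwidth, and the sample values are all assumptions made for illustration and are not necessarily the procedure used in the article. As written, Eq. (9) integrates to N rather than to one, and the sketch keeps that convention.

```python
import math


def smoothed_density(s: float, samples: list[float], bandwidth: float) -> float:
    """Approximate sum_i delta(s - s_i) with unit-area Gaussians of width `bandwidth`."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    return sum(
        math.exp(-0.5 * ((s - s_i) / bandwidth) ** 2) for s_i in samples
    ) / norm


# Hypothetical closeness values s_i (each lies between 0 and sqrt(2)).
samples = [0.2, 0.5, 0.9]

# Riemann sum over [-1, 2]: the total area approaches N = len(samples).
step = 0.001
area = sum(smoothed_density(-1 + step * i, samples, 0.05) for i in range(3001)) * step
```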

(H, q)-derivative

The (H, q)-derivative is given by Eq. (10). It is a q-analog of the ordinary derivative: for H = 1, the usual definition of a derivative is recovered as q → 1. For q ≠ 1, it is more than a derivative, since it compares the relative variations of f(s) and of s when s is magnified by the finite factor q. Intuitively, the (H, q)-derivative tests the scale invariance property of the function f(s). It is a natural tool for describing discrete scale invariance, as a fixed finite q compares f(s) with f(sq) at s magnified by a fixed factor, and thus it also compares f(sq) with \( f(sq^{2} ) \), \( f(sq^{2} ) \) with \( f(sq^{3} ) \), and so on. \( D_{q}^{H = 1} f(s) \) recovers the standard q-derivative \( D_{q} f(s) \). There are two control parameters: the discrete scale factor q, devised to characterize the log-periodic structure, and the exponent H, formulated to account for a possible power-law dependence, i.e., to correct for the existence of trends in semi-log/log–log plots (Zhou and Sornette 2002)

$$ D_{q}^{H} f(s) \triangleq {\frac{f(s) - f(qs)}{{[(1 - q)s]^{H} }}}. $$
(10)
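
Eq. (10) translates directly into a few lines of code. The sketch below is illustrative only: the function name `hq_derivative` is hypothetical, and restricting the arguments to 0 < q < 1 and s > 0 is an assumption made so that the denominator stays positive for non-integer H.

```python
def hq_derivative(f, s: float, q: float, H: float = 1.0) -> float:
    """(H, q)-derivative of Eq. (10): (f(s) - f(q s)) / ((1 - q) s)^H."""
    if not 0 < q < 1:
        raise ValueError("this sketch assumes 0 < q < 1")
    if s <= 0:
        raise ValueError("this sketch assumes s > 0")
    return (f(s) - f(q * s)) / ((1 - q) * s) ** H


def square(x: float) -> float:
    return x * x


def root(x: float) -> float:
    return x ** 0.5


# With H = 1 and q close to 1, the ordinary derivative 2s is recovered.
near_derivative = hq_derivative(square, 2.0, q=0.999999)   # close to 4

# For the power law s^0.5 with H = 0.5, the result does not depend on s.
d_small = hq_derivative(root, 1.0, q=0.5, H=0.5)
d_large = hq_derivative(root, 7.0, q=0.5, H=0.5)
```

The second check illustrates the role of H: when f(s) is a pure power law with exponent H, the (H, q)-derivative is constant in s, so any remaining structure reflects deviations from the trend.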

The Lomb periodogram

The Lomb periodogram or Lomb power \( P(\omega ) \) is given by Eq. (11). The Lomb periodogram is a method of estimating a frequency spectrum based on a least squares fit of sinusoids to data samples, similar to Fourier analysis. It is also known as Least-squares spectral analysis (LSSA).

$$ P(\omega ) = {\frac{1}{{2\sigma^{2} }}}\left\{ {{\frac{{\left[ {\sum\limits_{s} f(s){ \cos }\omega (s - \tau (\omega ))} \right]^{2} }}{{\sum\limits_{s} \mathop { \cos }\nolimits^{ 2} \omega (s - \tau (\omega ))}}} + {\frac{{\left[ {\sum\limits_{s} f(s){ \sin }\omega (s - \tau (\omega ))} \right]^{2} }}{{\sum\limits_{s} \mathop { \sin }\nolimits^{2} \omega (s - \tau (\omega ))}}}} \right\}, $$
(11)

where \( \sigma^{2} \) is the variance of f(s) and \( \tau (\omega ) \) is given by Eq. (12)

$$ \tau (\omega ) = {\frac{1}{2\omega }}{ \arctan }\left\{ {{\frac{{\sum\limits_{s} { \sin }2\omega s}}{{\sum\limits_{s} { \cos }2\omega s}}}} \right\}. $$
(12)
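
Eqs. (11) and (12) can be evaluated with the standard library alone. The sketch below is a plain, unoptimized illustration under stated assumptions: the function name `lomb_power` is hypothetical, the values are centered on their mean before projection (the usual convention for the normalized Lomb periodogram, although Eq. (11) writes f(s) directly), `atan2` is used in place of the arctangent of the ratio to avoid division by zero, and the test signal is synthetic.

```python
import math


def lomb_power(s_values: list[float], f_values: list[float], omega: float) -> float:
    """Lomb power P(omega) of Eq. (11) with the offset tau(omega) of Eq. (12)."""
    n = len(s_values)
    mean = sum(f_values) / n
    variance = sum((v - mean) ** 2 for v in f_values) / (n - 1)

    # Eq. (12): tau makes the sine and cosine terms orthogonal on the samples.
    tau = math.atan2(
        sum(math.sin(2 * omega * s) for s in s_values),
        sum(math.cos(2 * omega * s) for s in s_values),
    ) / (2 * omega)

    cos_w = [math.cos(omega * (s - tau)) for s in s_values]
    sin_w = [math.sin(omega * (s - tau)) for s in s_values]
    cos_term = sum((v - mean) * c for v, c in zip(f_values, cos_w)) ** 2 / sum(
        c * c for c in cos_w
    )
    sin_term = sum((v - mean) * d for v, d in zip(f_values, sin_w)) ** 2 / sum(
        d * d for d in sin_w
    )
    return (cos_term + sin_term) / (2 * variance)


# Synthetic, unevenly sampled sinusoid with angular frequency 3.
s_values = [0.1 * k + 0.03 * ((7 * k) % 5) for k in range(100)]
f_values = [math.sin(3.0 * s) for s in s_values]

candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
peak = max(candidates, key=lambda w: lomb_power(s_values, f_values, w))
```

Among the candidate frequencies, the power is largest at the frequency of the underlying sinusoid, which is how a dominant log-periodic component would show up in practice.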

Survey of mobile social groups

The following document is the survey that we have used for our analysis of mobile social groups and its validation in Sect. 2.2:

- - - - - Survey begins here - - - - -

Behavior analysis of mobile phone users

This project aims to provide a better understanding of the behavior, patterns, and social structure of mobile phone users, and to facilitate research and development of mobile social applications as we take early evolutionary steps toward a new era of mobile and pervasive computing, which aims to enhance quality of life with more sensitive and responsive mobile devices.

Survey process

The survey process is carried out in two simple steps. First, you will download the call detail records (records containing the details of each call you have dialed or received) from your service provider’s website. The information required for the survey from the downloaded call records is explained in the Data Collection process. You are requested to bring the downloaded information (soft copy) to a 15-min session. Then, you will review how to identify mobile social closeness for your associated callers/callees in the Mobile Social Closeness Identification process and provide your social closeness for each associated call ID in the Feedback process.

Data collection

You are requested to download the call detail records from your cellular service provider for the last 3 months (a longer period is preferred). You should be able to download these call records in Excel sheet format (we can help you if you have any problem in this regard). Next, you need to merge the call records of all those months into a single Excel file (again, we can help you if required). The call information needed for our survey is described in the following table. You may have to remove some unnecessary data (fields) from the Excel sheet.

 

Date        Start time   Type       Anonymous call ID   Talk time
1/5/2008    2:20 PM      Incoming   C1                  2
1/6/2008    3:15 PM      Outgoing   C2                  28

  1. Date: Date when the call took place. This should be in the format MM/DD/YYYY.

  2. Start Time: Start time of the call. This should be in the format HH:MM AM/PM.

  3. Type: The call type, i.e., whether the call is an “Incoming” or an “Outgoing” call. Some service providers record this field as “Incoming” for an incoming call and the destination location for an outgoing call.

  4. Anonymous Call ID: You can assign an anonymous Call ID to each caller/callee. If you are unable to do that, we can run your data through our system and generate a set of anonymous IDs, one for each caller/callee.

  5. Talk time: The amount of time spent during the call.
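
Records in this layout can be reduced to the per-contact call frequency and talk time used in Eqs. (3) and (4). The sketch below assumes the merged sheet has been exported as CSV with the five column names listed above; the function name `summarize` is hypothetical, the first two rows come from the example table, and the third row is made up for illustration.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """Date,Start time,Type,Anonymous call ID,Talk time
1/5/2008,2:20 PM,Incoming,C1,2
1/6/2008,3:15 PM,Outgoing,C2,28
1/7/2008,9:05 AM,Outgoing,C1,5
"""


def summarize(csv_text: str) -> tuple[dict[str, int], dict[str, float]]:
    """Return (call frequency, total talk time) per anonymous call ID."""
    calls: dict[str, int] = defaultdict(int)
    talk: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        call_id = row["Anonymous call ID"]
        calls[call_id] += 1                      # incoming and outgoing both count
        talk[call_id] += float(row["Talk time"])
    return dict(calls), dict(talk)


calls, talk = summarize(SAMPLE)
# calls == {'C1': 2, 'C2': 1}; talk == {'C1': 7.0, 'C2': 28.0}
```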

Mobile social closeness identification

You are requested to attend a 15-min session for the data analysis. You are requested to identify the Social Closeness for each Call ID as follows:

  • Enter “1” if Call ID indicates the person who is a Socially Closest Member:

    These are the people with whom you maintain the highest social connectivity. Most of the calls you receive come from individuals within this category. You receive more calls from them and you tend to talk with them for longer periods. Typically, these people are family members, friends, and colleagues.

  • Enter “2” if Call ID indicates the person who is a Socially Near Member:

    People in this group are not as highly connected as family members and friends, but when you connect with them, you talk to them for considerably longer periods. Mostly, you observe an intermittent frequency of calls from these people. These people are typically neighbors and distant relatives.

  • Enter “3” if Call ID indicates the person who is a Socially Distant Member:

    These individuals have less connection with your social life. These people call you less frequently, and you rarely acknowledge them. Among these would be, for example, a newsletter group or a private organization to which you have previously subscribed. This group also includes individuals who have had no previous interaction or communication with you. You have the least tolerance for calls from them, e.g., strangers, telemarketers, and fund raisers.

Feedback

Your call records will be processed using our system to extract all distinct Call IDs, and then you will be asked to identify Social Closeness for each Call ID as shown in an example below.

 

Anonymous call ID   Social closeness
C1                  1
C2                  3
C3                  2
C4                  1
...                 ...

- - - - - Survey ends here - - - - -

About this article

Cite this article

Phithakkitnukoon, S., Dantu, R. Mobile social group sizes and scaling ratio. AI & Soc 26, 71–85 (2011). https://doi.org/10.1007/s00146-009-0230-5

  • DOI: https://doi.org/10.1007/s00146-009-0230-5