Event history, spatial analysis and count data methods for empirical research in information systems

Information Technology and Management

Abstract

Many interesting business and technology problems in IS and e-commerce research center on events and the variables that influence them. Researchers are often interested in studying the timing, patterns, and frequencies of events. Some of those events are related to the timing of strategic decisions such as new technology adoption, functionality upgrades to established software products, new outsourcing contracts, and the termination of failing IS projects. Others are external events that have significant implications for the performance of firms, the structure of industries affected by IT, and the viability of various aspects of the economy. Event history methods (also known as survival analysis and duration analysis methods in the medical sciences, public health and biostatistics literature), spatial analysis, and count data analysis offer rigorous methods for empirical analysis that can provide rich insights into research issues that arise in association with identifiable events. This article provides a current survey of these methods and an in-depth discussion of how researchers can apply them to study technology adoption problems and related issues in IS and e-commerce. We offer a framework for mapping the methods to applicable problems, and discuss the relevant variants of the methods. We also illustrate the range of research questions that can be asked and answered through the use of the methods.

Notes

  1. For a fuller discussion of semiparametric methods and related regression approaches, the interested reader should see Cameron and Trivedi [36] and Horowitz [89]. In addition to these three basic categories of techniques, there are also advanced models that deal with events driven by multiple competing risks, recurrence of the same event for the subject, heterogeneity of survival patterns among subgroups of subjects, accelerated likelihood of failure due to the impact of the explanatory variables, and long-term survival of a fraction of the population.

  2. Spatial dependence is observed when \( \mathrm{Cov}(Y_{i}, Y_{j}) \ne 0 \) for \( i \ne j \), where \( i \) and \( j \) refer to locations and \( Y \) is a variable measuring an event of interest.

  3. This is commonly represented in terms of a log-linear function of the explanatory variables of the multiplicative Poisson regression model, with \( \lambda_{i} = \exp \left( \beta_{0} + \sum_{j} \beta_{j} x_{ij} \right). \) The estimated coefficients \( \beta_{j} \) of the independent variables \( x_{ij} \) can be interpreted as elasticities of the expected value of \( y \) with respect to \( x_{j} \) when the regressor enters the Poisson regression model as \( \log(x_{j}) \). A key issue with Poisson regression is that it constrains the variance to equal the mean \( \lambda \); this is viewed as a strong assumption that narrows the model's range of applicability.

  4. The validity of Poisson regression model maximum likelihood estimation results also is affected by model misspecification. As a result, the literature on count data models has extensively explored and developed models that expand the capabilities of count data modeling, so it is possible to estimate explanatory models to test relevant theory and hypotheses [29, 39, 49, 65].

  5. Although we chose to highlight six levels of analysis, this does not mean that these methods cannot be used to study technology adoption and other phenomena at other levels of analysis. For example, they can be applied to study technology and standards adoption among countries for digital wireless technologies such as the Global System for Mobile Communications (GSM) or Code Division Multiple Access (CDMA).

  6. Other nonparametric methods include life tables, which characterize the rate of survivorship in a population at risk, and are helpful for estimating the survival function using intervals of duration. Another is the Nelson-Aalen estimator for the cumulative hazard function [98]. This involves developing a staircase function that specifies when an observation fails or when an event occurs in time, as well as the number of observations for which events have not been observed just prior to the occurrence of the event.

  7. In addition to the Cox proportional hazards model, there are other useful semiparametric survival analysis techniques, including the additive hazards model. With this kind of model, the covariates are assumed to have an additive rather than a multiplicative effect on the hazard rate [90]. Another approach is called rank regression, which is useful for comparing two distributions using the rank of the natural logarithm of the durations [113].

  8. For readers who are interested in exploring the use of event history methods in other disciplines, we offer additional coverage of some of these works in Appendix 1.

  9. Bayesian survival analysis also supports semiparametric and fully-parametric models, as well as more advanced methods, such as the frailty and cure-rate models. In these models, gamma distributions are used for the priors. For the exponential model, we can formulate the prior distribution for the hazard rate λ = φ(x′ β) to be a gamma prior. By specifying the parameters of the gamma prior and our weight or confidence on this prior, we can formulate the posterior distribution π (θ|D) and use the Gibbs sampler to estimate the parameters. For a Weibull model with a shape parameter λ = x′ β and another parameter α, λ and α are treated as independent and specified to follow normal and gamma distributions, respectively. It is common to assume that the βs follow a multivariate normal distribution. By specifying the priors, we can formulate the joint posterior of α and β given D and estimate the parameters. In addition, gamma distributions are often used as prior distributions for the baseline hazard in the Cox proportional hazards model and for the frailty term in shared frailty models.

  10. In the case of the spatial lag model, ordinary least squares (OLS) estimates are biased and inconsistent [5]. So estimation of the spatial lag model needs to be done via maximum likelihood or through the use of instrumental variables. The spatial error model can be estimated via maximum likelihood. The model with both a spatially-lagged dependent variable and spatial dependence in the error term is complicated to estimate, and thus this model is rarely used in practice.

  11. Similar to the cross-sectional spatial models, the estimation of spatial panel models can be performed using maximum likelihood, or instrumental variables and the generalized method of moments (GMM) approaches [10].

  12. Note that, similar to cross-sectional spatial models, constraining the spatial parameters results in three different models. Setting ρ ≠ 0 and γ ≠ 0 results in the mixed spatial lag and error model. Setting γ = 0 yields the spatial probit lag model only, while setting ρ = 0 yields the spatial probit error model.

  13. The spatial probit model is known to be complex to estimate, however. This is because the joint probabilities in the likelihood function of this model are multi-dimensional multivariate normal probabilities [7]. Four techniques to estimate spatial probit models are: (1) the expectation–maximization (EM) algorithm, (2) the GMM approach, (3) the Gibbs sampling approach, and (4) the recursive importance sampling approach. The EM algorithm uses a two-step process, an expectation step and a maximization step, to estimate the expected likelihood function of the latent model. Pinkse and Slade [137] develop a GMM approach to assess spatial error correlation in a spatial discrete model. Lesage [116] proposes Bayesian spatial discrete choice methods that use a Gibbs sampling approach to solve a spatial model based on a latent continuous variable. Finally, the recursive importance sampling estimator proposed by Beron and Vijverberg [24] uses simulation to estimate the multivariate normal probability function.

  14. We provide a brief summary of other interdisciplinary applications of spatial analysis methods in Appendix 2. For a review of the application of spatial analysis methods in regional economic and social science, see Anselin [6], Anselin et al. [9], Anselin [8], and Goodchild et al. [69].

  15. These include SpaceStat, S+ SpatialStats, GeoDa, PySpace, the Spatial Econometrics Toolbox for MATLAB, R and Stata, and GeoBUGS (for geographical Bayesian inference using the Gibbs sampler).

  16. Similar to the other empirical modeling approaches that we have discussed, there is plentiful software support for count data modeling. See Cameron and Trivedi [35] for Gauss, Limdep and Stata; Liu and Cela [118] for SAS; Venables and Ripley [163] for S, now Spotfire S+; and Zeileis et al. [175] for R.

  17. Greene [72] showed that it is possible to create overdispersion in the negative binomial model. This is because it goes beyond the Poisson model’s representation: it uses a noisy form of the mean function, which permits a larger number of predicted zero values for the dependent variable. Berk and McDonald [22] caution us about the use of this model, and provide four implications for estimation practice. First, they note that it is critical to specify the full set of explanatory variables for negative binomial regression model estimation, along with an appropriate functional form for the mean value function, similar to the Poisson regression model. Failure to do so creates the omitted variables problem and introduces systematic bias in the estimation results. Second, if the theoretical arguments and knowledge of the empirical regularities of the data fail to match the various assumptions of the models, and cause an analyst to lack confidence that the systematic part of the model being used is right, then it may not be a good idea to use these models at all. Berk [21] suggests a fallback position involving what he called a descriptive data analysis. Third, even though an analyst may be confident that the systematic part of the model is right, there still is no guarantee that the usual tests of statistical inference are going to be right also. Fourth, though the authors suggest the negative binomial model as a workaround, they still remind us that it may be a stretch to trust the more attractive p-values that the negative binomial produces.

  18. Such models do not properly capture the structure of the empirical estimation problem, however; there is no assumption with these models that there will be no zeros observed [85]. Instead, it is necessary and logically consistent to eliminate the possibility that any zeros are estimated at all, by adjusting the requirements of the underlying distribution of the observed counts. The approach used involves the application of zero-truncated Poisson models and zero-truncated negative binomial models [35, 169].

  19. Similar to some of the other models that we have discussed, count data models that have an excess number of zeros have also been studied in terms of their robustness and estimation capabilities. For example, there is a score test to evaluate zero inflation in the presence of correlated count data [172]. There is also a test to evaluate the extent to which non-zero count values of the dependent variable are overdispersed in a zero-inflated Poisson mixed regression model [172]. The motivation for this test and model arises in contexts of computer operations that have many small intra-day network service problems, but also major network outages of longer duration. There are other kinds of robustness checks that can be applied that will be useful for research on IT, e-commerce and technology adoption.

  20. The authors point out that the Bayesian nonparametric approach is more effective in fitting the data of the study, and that this is partly due to the fact that an initial parametric model cannot be specified easily based on the data that it estimates. Nevertheless, some specification iteration is to be expected when the analyst identifies the defects of the parametric model that prompt its re-specification.

  21. This has many applications in e-market settings, including in online group-buying Web sites such as Woot! (www.woot.com), DailyDeal (www.dailydeal.com), and ShuangTuan (www.shuangtuan.com) in China, and for social couponing services, including Groupon (www.groupon.com) and its joint venture with TenCent (www.tencent.com) in China called GaoPeng (www.gaopeng.com), as well as LivingSocial (www.livingsocial.com).
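
As an illustration of the log-linear Poisson mean function described in Note 3, the following is a minimal sketch in Python. The coefficient values, the data, and the function names `poisson_mean` and `poisson_log_likelihood` are hypothetical and chosen only for illustration; they do not come from the article. The sketch shows that, under this specification, raising one regressor by a unit multiplies the expected count by the exponential of its coefficient.

```python
import math

def poisson_mean(beta0, betas, x):
    """Log-linear mean: lambda_i = exp(beta0 + sum_j beta_j * x_ij)."""
    return math.exp(beta0 + sum(b * v for b, v in zip(betas, x)))

def poisson_log_likelihood(beta0, betas, X, y):
    """Sum of log Poisson probabilities of the counts y given covariate rows X."""
    total = 0.0
    for row, count in zip(X, y):
        lam = poisson_mean(beta0, betas, row)
        # log P(Y = count) = count * log(lam) - lam - log(count!)
        total += count * math.log(lam) - lam - math.lgamma(count + 1)
    return total

# Hypothetical coefficients and data (assumptions for illustration only).
beta0, betas = 0.5, [0.2, -0.1]
X = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]]
y = [2, 3, 1]

lam_base = poisson_mean(beta0, betas, [1.0, 2.0])
lam_plus = poisson_mean(beta0, betas, [2.0, 2.0])  # first regressor raised by one unit
ratio = lam_plus / lam_base                        # equals exp(betas[0])
loglik = poisson_log_likelihood(beta0, betas, X, y)
```

In practice the coefficients would be chosen to maximize `poisson_log_likelihood`; the software packages listed in Note 16 do this directly.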
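
The staircase construction of the Nelson-Aalen estimator in Note 6 can be sketched as follows. This is a minimal illustration that assumes right-censored durations and uses the standard form of the estimator, in which the cumulative hazard increases at each event time by the number of events divided by the number of subjects still at risk. The function name `nelson_aalen` and the sample data are hypothetical.

```python
def nelson_aalen(durations, observed):
    """Return [(t, H(t))] at each distinct event time.

    durations: time to event or to censoring for each subject
    observed:  True if the event occurred, False if right-censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    cumulative = 0.0
    steps = []
    for t in event_times:
        # Subjects whose event has not been observed just prior to time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        cumulative += events / at_risk
        steps.append((t, cumulative))
    return steps

# Hypothetical adoption times in months; False marks firms that had not
# adopted by the time they were last observed (assumption for illustration).
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
steps = nelson_aalen(durations, observed)
```

Each tuple in `steps` is one step of the staircase: the estimate stays flat between event times and jumps only when an event is observed.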
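
The zero-truncated Poisson model mentioned in Note 18 can be illustrated with a short sketch. It assumes the standard construction, in which the Poisson probabilities for positive counts are rescaled by the probability of observing a nonzero count, so that zeros receive no probability mass. The function names and the value of `lam` are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary Poisson probability of count y with mean lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zero_truncated_poisson_pmf(y, lam):
    """P(Y = y | Y > 0): Poisson probability rescaled by 1 - P(Y = 0)."""
    if y < 1:
        return 0.0  # zeros cannot occur under truncation
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

def zero_truncated_mean(lam):
    """Expected count under truncation; always larger than lam."""
    return lam / (1.0 - math.exp(-lam))

lam = 1.5  # hypothetical rate, for illustration only
# Probabilities over the positive counts should sum to (nearly) one.
total = sum(zero_truncated_poisson_pmf(y, lam) for y in range(1, 60))
```

The rescaling raises the mean above `lam`, which is why fitting an untruncated Poisson model to data that cannot contain zeros misstates the underlying rate.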
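
The role of the gamma prior in Note 9 is easiest to see in the simplest special case, which we sketch here under assumptions that are ours rather than the article's: an exponential model with a constant hazard \( \lambda \) and no covariates, durations \( t_{1}, \ldots, t_{n} \), and indicators \( \delta_{i} \) equal to 1 when the event is observed and 0 when the duration is right-censored. With a \( \mathrm{Gamma}(a, b) \) prior (shape \( a \), rate \( b \)), and writing \( d = \sum_{i} \delta_{i} \) and \( T = \sum_{i} t_{i} \), the posterior is available in closed form:

\[ L(\lambda \mid D) = \prod_{i=1}^{n} \lambda^{\delta_{i}} e^{-\lambda t_{i}} = \lambda^{d} e^{-\lambda T}, \qquad \pi(\lambda) \propto \lambda^{a-1} e^{-b\lambda}, \]

\[ \pi(\lambda \mid D) \propto \lambda^{a+d-1} e^{-(b+T)\lambda}, \quad \text{so} \quad \lambda \mid D \sim \mathrm{Gamma}(a + d,\; b + T), \qquad E[\lambda \mid D] = \frac{a + d}{b + T}. \]

Once the hazard depends on covariates through \( x'\beta \), this closed form is generally no longer available, which is consistent with the note's reliance on the Gibbs sampler for estimation.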

References

  1. Albuquerque P, Bronnenberg BJ, Corbett CJ (2007) A spatiotemporal analysis of the global diffusion of ISO9000 and ISO14000 certification. Manage Sci 53(3):451–468

  2. Allison PD, Waterman RP (2002) Fixed-effects negative binomial regression models. Sociol Methodol 32(1):247–265

  3. Anderson DM (2010) Estimating the economic value of ice climbing in Hyalite Canyon: an application of travel cost count data models that account for excess zeros. J Environ Manage 91(4):1012–1020

  4. Andress HJ (1989) Recurrent unemployment—the West German experience: an exploratory analysis using count data models with panel data. Eur Sociol Rev 5(3):275–297

  5. Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht

  6. Anselin L (1992) Space and applied econometrics: introduction. Reg Sci Urban Econ 22(3):307–316

  7. Anselin L (1999) Spatial econometrics. Working paper, School of Social Sciences, University of Texas at Dallas, TX

  8. Anselin L (2007) Spatial econometrics in RSUE: retrospect and prospect. Reg Sci Urban Econ 37(4):450–456

  9. Anselin L, Florax RJGM, Rey SJ (2004) Econometrics for spatial models: recent advances. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin

  10. Anselin L, LeGallo J, Jayet H (2008) Spatial panel econometrics. In: Matyas L, Sevestre P (eds) The econometrics of panel data: fundamentals and recent developments in theory, 3rd edn. Springer, Berlin

  11. Arranz JM, Muro J (2004) Recurrent unemployment, welfare benefits and heterogeneity. Int Rev Appl Econ 18(4):423–441

  12. Aten B (1996) Evidence of spatial autocorrelation in international prices. Rev Income Wealth 42(2):149–163

  13. Aten B (1997) Does space matter? International comparisons of the prices of tradables and nontradables. Int Reg Sci Rev 20(1–2):35–52

  14. Audretsch DB, Mahmood T (1995) New firm survival: new results using a hazard function. Rev Econ Stat 77(1):97–103

  15. Bago d’Uva T (2005) Latent class models for use of primary care: evidence from a British panel. Health Econ 14(9):873–892

  16. Bago d’Uva T (2006) Latent class models for utilization of health care. Health Econ 15(4):329–343

  17. Banerjee S, Kauffman RJ, Wang B (2007) Modeling Internet firm survival using Bayesian dynamic models with time-varying coefficients. Electron Commer Res Appl 6(3):332–342

  18. Baskerville RL, Myers MD (2009) Fashion waves in information systems research and practice. MIS Q 33(4):647–662

  19. Basu S, Thibodeau TG (1998) Analysis of spatial autocorrelation in house prices. J Real Estate Finance Econ 17(1):61–85

  20. Beck N, Gleditsch KS, Beardsley K (2006) Space is more than geography: using spatial econometrics in the study of political economy. Int Stud Q 50(1):27–44

  21. Berk RA (2003) Regression analysis: a constructive critique. Sage Publications, Newbury Park

  22. Berk RA, McDonald J (2007) Overdispersion and Poisson regression. Working paper, Department of Statistics and Department of Criminology, University of Pennsylvania, Philadelphia, PA

  23. Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. J Am Stat Assoc 47(259):501–515

  24. Beron KJ, Vijverberg WP (2004) Probit in a spatial context: a Monte Carlo analysis. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin

  25. Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc B 36(2):192–236

  26. Bhattacharjee S, Gopal RD, Lertwachara K, Marsden JR, Telang R (2007) The effect of digital sharing technologies on music markets: a survival analysis of albums on ranking charts. Manage Sci 53(9):1359–1374

  27. Bhattacherjee A, Premkumar G (2004) Understanding changes in belief and attitude toward information technology usage: a theoretical model and longitudinal test. MIS Q 28(2):229–254

  28. Bilgic A, Florkowski W (2003) Application of hurdle negative binomial count data model to demand for black bass fishing in the southeastern United States. Presented at the Southern Agricultural Economics Association Annual Meeting, Mobile, AL

  29. Boes S (2004) Empirical likelihood in count data models: the case of endogenous regressors. Working paper, Socioeconomic Institute, University of Zurich, Zurich, Switzerland

  30. Bolton RN (1998) A dynamic model of the duration of the customer’s relationship with a continuous service provider: the role of satisfaction. Market Sci 17(1):45–65

  31. Brunger WG (2009) The impact of the Internet on airline fares: the ‘Internet price effect’. J Revenue Pricing Manage 9(1–2):66–93

  32. Cameron AC, Trivedi PK (1986) Econometric models based on count data: comparisons and applications of some estimators and tests. J Appl Econometr 1(1):29–54

  33. Cameron AC, Trivedi PK (1990) Regression-based tests for overdispersion in the Poisson model. J Econometr 46(3):347–364

  34. Cameron AC, Trivedi PK (1996) Count data models for financial data. In: Maddala GS, Rao CR (eds) Statistical methods in finance, Handbook of statistics, vol 14. Elsevier, Amsterdam, pp 363–391

  35. Cameron AC, Trivedi PK (1998) Regression analysis of count data. Econometric society monograph no. 30. Cambridge University Press, Cambridge, UK

  36. Cameron AC, Trivedi PK (2005) Microeconometrics: methods and applications. Cambridge University Press, Cambridge, UK

  37. Case A (1992) Neighborhood influence and technological change. Reg Sci Urban Econ 22(3):491–508

  38. Case A, Rosen HS, Hines JR (1993) Budget spillovers and fiscal policy interdependence: evidence from the States. J Public Econ 52(3):285–307

  39. Chib S, Winkelmann R (2001) Markov chain Monte Carlo analysis of correlated count data. J Bus Econ Stat 19(4):428–435

  40. Chin HCC, Quddus MA (2003) Applying the random effect negative binomial model to examine traffic accident occurrence at signalized intersections. Accid Anal Prev 35(2):253–259

  41. Cho W (2003) Contagion effects and ethnic contribution networks. Am J Polit Sci 47(2):368–387

  42. Choi J, Hui SK, Bell DR (2010) Spatiotemporal analysis of imitation behavior across new buyers at an online grocery retailer. J Market Res 47(1):75–89

  43. Clemons EK, Reddi SP, Row MC (1993) The impact of information technology on the organization of economic activity: ‘the move to the middle’ hypothesis. J Manage Inform Syst 10(2):9–35

  44. Cohen J, Paul C (2005) Agglomeration economies and industry location decisions: the impacts of spatial and industrial spillovers. Reg Sci Urban Econ 35(3):215–237

  45. Cox D (1975) Partial likelihood. Biometrika 62(2):269–275

  46. Cragg JG (1971) Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 39(5):829–844

  47. Cressie N (1993) Statistics for spatial data. Wiley, New York

  48. Crowder MJ (2001) Classical competing risks. Chapman and Hall/CRC, Boca Raton, FL

  49. Dagne GA (2010) Bayesian semiparametric zero-inflated Poisson model for longitudinal count data. Math Biosci 224:126–130

  50. Dai Q, Kauffman RJ (2004) Partnering for perfection: an economics perspective on B2B electronic market strategic alliances. In: Tomak K (ed) Economics, IS and e-commerce. Idea Group Publishing, Harrisburg, pp 43–79

  51. Dai Q, Kauffman RJ (2009) Cooperative strategies to leverage network effects: evaluating network partnerships in the B2B software market. Working paper, Lebow College of Business, Drexel University, Philadelphia, PA

  52. Danaher PJ, Hardie BGS, Putsis WP (2001) Marketing-mix variables and the diffusion of successive generations of a technological innovation. J Market Res 38(4):501–514

  53. Danö AM (2002) Unemployment and health conditions: a count data approach. Presented at the 57th Econometric Society European meeting, Venice, Italy, August 25–28. Available at www.econometricsociety.org/meetings/esem02/cdrom/papers/753/ESEM2002.pdf

  54. Darmofal D (2006) Spatial econometrics and political science. Working paper, Department of Political Science, University of South Carolina, Columbia, SC

  55. Dean C, Lawless JF (1989) Tests for detecting overdispersion in Poisson regression models. J Am Stat Assoc 84(406):467–472

  56. Deb P, Trivedi PK (1997) Demand for medical care by the elderly: a finite mixture approach. J Appl Econometr 12(3):313–336

  57. Deb P, Trivedi PK (2002) The structure of demand for healthcare: latent class versus two-part models. J Health Econ 21(4):601–625

  58. Dennis AR, Garfield MJ (2003) The adoption and use of GSS in project teams: toward more participative processes and outcomes. MIS Q 27(2):289–323

  59. Dionne G, Gagné R, Gagnon F, Vanasse C (1997) Debt, moral hazard and airline safety: empirical evidence. J Econometr 79(2):379–402

  60. Dwivedi A, Dwivedi SN, Deo S, Shukla R (2010) Statistical models for predicting number of involved nodes in breast cancer patients. Health 2(7):641–651

  61. Elhorst JP, Blien U, Wolf K (2007) New evidence on the wage curve: a spatial panel approach. Int Reg Sci Rev 30(2):173–191

  62. Fleming MM (2004) Techniques for estimating spatially dependent discrete choice models. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin

  63. Forman C, Gron A (2011) Vertical integration and information technology investment in the insurance industry. J Law Econ Organ 27(1):180–218

  64. Frondel M, Vance C (2011) Rarely enjoyed? A count data analysis of ridership in Germany’s public transport. Transp Policy 18(2):425–433

  65. Ghosh SK, Mukhopadhyay P, Lu JC (2006) Bayesian analysis of zero-inflated count data models. J Stat Plann Inf 136:1360–1375

  66. Giacomini R, Granger CWJ (2004) Aggregation of space-time processes. J Econometr 118(1–2):7–26

  67. Gleditsch K, Ward M (2000) Peace and war in time and space: the role of democratization. Int Stud Q 44(1):1–29

  68. Goo J, Song Y, Kishore R, Nam K, Rao HR (2007) An investigation of the factors that influence the duration of IT outsourcing relationships. Decis Support Syst 42(4):2107–2125

  69. Goodchild M, Anselin L, Appelbaum R, Harthorn B (2000) Toward spatially integrated social science. Int Reg Sci Rev 23(2):139–159

  70. Granados NF, Kauffman RJ, Lai HC, Lin HC (2011) Decommoditization, resonance marketing and IT: an empirical study of air travel services amidst channel conflict. J Manage Inform Syst 28(2) (in press)

  71. Granados NF, Gupta A, Kauffman RJ (2012) Online and offline demand and price elasticities: evidence from the air travel industry. Inform Syst Res (in press)

  72. Greene W (2007) Econometric analysis, 6th edn. Prentice Hall, Englewood Cliffs

  73. Greene W (2007) Functional form and heterogeneity in count data. Working paper, Stern School of Business, New York University, New York

  74. Gregor S (2006) The nature of theory in information systems. MIS Q 30(3):611–642

  75. Grogger JT, Carson RT (1991) Models for truncated counts. J Appl Econometr 6(3):225–238

  76. Grover V, Lyytinen K, Srinivasan A, Tan B (2008) Contributing to rigorous and forward thinking explanatory theory. J Assoc Inform Syst 9(2):40–47

  77. Gupta PL, Gupta RC, Tripathi RC (1996) Analysis of zero-adjusted count data. Comput Stat Data Anal 23(2):207–218

  78. Gurmu S, Elder J (2008) A bivariate zero-inflated count data regression model with unrestricted correlation. Econ Lett 100(2):245–248

  79. Gurmu S, Trivedi P (1998) Semi-parametric estimation of hurdle regression models with an application to Medicaid utilization. J Appl Econometr 12(3):225–242

  80. Gurmu S, Rilstone P, Stern S (1999) Semiparametric estimation of count regression models. J Econometr 88:123–150

  81. Hall DB (2000) Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics 56(4):1030–1039

  82. Hall DB, Berenhaut KS (2002) Score tests for heterogeneity and overdispersion in zero-inflated Poisson and binomial regression. Can J Stat 30(3):415–430

  83. Harrison T, Ansell J (2002) Customer retention in the insurance industry: using survival analysis to predict cross-selling opportunities. J Finan Serv Market 6(3):229–239

  84. Hausman JA, Hall B, Griliches Z (1984) Econometric models for count data with an application to the patents-R&D relationship. Econometrica 52(4):909–938

  85. Hilbe J (2007) Negative binomial regression. Cambridge University Press, Cambridge, UK

  86. Hom PW, Kinicki AJ (2001) Toward a greater understanding of how dissatisfaction drives employee turnover. Acad Manag J 44(5):975–987

  87. Honaker J (2008) Unemployment and violence in Northern Ireland: a missing data model for ecological inference. Working paper, University of California, Los Angeles, CA; presented at the Summer Meetings of the Society for Political Methodology, Tallahassee, FL, July 2005

  88. Honjo Y (2000) Business failure of new firms: an empirical analysis using a multiplicative hazards model. Int J Ind Organ 18(4):557–574

  89. Horowitz JL (2009) Semiparametric and nonparametric methods in econometrics. Springer, New York, NY

  90. Hosmer DW, Lemeshow S (1999) Applied survival analysis: regression modeling of time to event data. Wiley, New York, NY

  91. Houston DJ (2007) Are helmet laws protecting young motorcyclists? J Safety Res 38(3):329–336

  92. Hu XJ, Sun J, Wei LJ (2003) Regression parameter estimation from panel counts. Scand J Stat 30(1):25–43

  93. Huang CY, Wang MC, Zhang T (2006) Analyzing panel count data with informative observation times. Biometrika 93(4):763–775

  94. Ibrahim JG, Chen MH, Sinha D (2001) Bayesian survival analysis. Springer, New York

  95. Josefek RA, Kauffman RJ (1998) Duration of IT human capital employment. MIS Research Center, Carlson School of Management, University of Minnesota, Minneapolis, MN

  96. Jung RC, Kukuk M, Liesenfeld R (2006) Time series of count data: modeling, estimation and diagnostics. Comput Stat Data Anal 51(4):2350–2364

  97. Kalbfleisch JD, Lawless JF (1985) The analysis of panel count data under a Markov assumption. J Am Stat Assoc 80(392):863–871

  98. Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, Hoboken

  99. Kamakura WA, Kossar BS, Wedel M (2004) Identifying innovators for the cross-selling of new products. Manage Sci 50(8):1120–1133

  100. Kauffman RJ, Mohtadi H (2004) Proprietary and open systems adoption in e-procurement: a risk-augmented transaction cost perspective. J Manage Inform Syst 21(1):137–166

  101. Kauffman RJ, Techatassanasoontorn AA (2005) International diffusion of digital mobile technology: a coupled-hazard state-based approach. Inf Technol Manage 6(2):253–292

  102. Kauffman RJ, Techatassanasoontorn AA (2009) Understanding early diffusion of digital wireless phones. Telecommun Policy 33(8):432–450

  103. Kauffman RJ, Tsai J (2009) The unified procurement strategy for enterprise software: a test of the “move to the middle” hypothesis. J Manage Inform Syst 26(2):177–204

  104. Kauffman RJ, Wang B (2008) Tuning into the digital channel: evaluating business model fit for Internet firm survival. Inf Technol Manage 9(3):215–232

  105. Kauffman RJ, Wang B (2008) Developing rich insights on public Internet firm entry and exit based on survival analysis and data visualization. In: Jank W, Shmueli G (eds) Statistical methods in e-commerce research. Wiley, New York

  106. Kauffman RJ, McAndrews JJ, Wang YM (2000) Opening the ‘black box’ of network externalities in network adoption. Inform Syst Res 11(1):61–82

  107. Kennedy BS (2005) Does race predict stroke readmission? An analysis using the truncated negative binomial model. J Natl Med Assoc 97(5):699–713

  108. Klein JP, Moeschberger ML (1997) Survival analysis: techniques for censored and truncated data. Springer, New York

  109. Kockelman K, Bottom J, Kweon YJ, Ma J, Wang X (2006) Safety impacts and other implications of raised speed limits. National Highway Project Research Report #17–23. University of Texas, Austin, TX

  110. Krnjajić M, Kottas A, Draper D (2008) Parametric and nonparametric Bayesian model specification: a case study involving models for count data. Comput Stat Data Anal 52:2110–2128

  111. Kweon YJ, Kockelman K (2005) The safety effects of speed limit changes: use of panel data models, including speed, use, and design variables. Transp Res Rec 1908:148–158

  112. Lambert D (1992) Zero-inflated Poisson regression with an application to defects in manufacturing. Technometrics 34(1):1–14

  113. Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, Hoboken

  114. Le CT (1997) Applied survival analysis. Wiley, New York

  115. Lee AH, Wang K, Yau KKW, Somerford PJ (2003) Truncated negative binomial mixed regression modeling of ischaemic stroke hospitalizations. Stat Med 22(7):1129–1139

  116. Lesage JP (2000) Bayesian estimation of limited dependent variable spatial autoregressive models. Geograph Anal 32(1):19–35

  117. Light A, Omori Y (2004) Unemployment insurance and job quits. J Labor Econ 22(1):159–188

  118. Liu WS, Cela J (2008) Count data models in SAS. SAS Global Forum, San Antonio

  119. Liu FY, Hua KA, Xie F (2011) A hybrid communication solution to distributed moving query monitoring systems. Electron Commer Res Appl 10(2):214–228

  120. Long JS (1997) Regression models for categorical and limited dependent variables. Sage Publications, Thousand Oaks

  121. Longhi S, Nijkamp P (2007) Forecasting regional labor market developments under spatial autocorrelation. Int Reg Sci Rev 30(2):100–119

  122. Ma J, Kockelman KM, Damien P (2008) A multivariate Poisson-lognormal regression model for prediction of crash counts by severity using Bayesian methods. Accid Anal Prev 40(4):964–975

  123. Manchanda P, Dubé JP, Goh KY, Chintagunta PK (2006) The effect of banner advertising on Internet purchasing. J Market Res 43(1):98–108

  124. Mann A, Kauffman RJ, Han K, Nault BR (2011) Are there contagion effects in IT and business process outsourcing? Decision Supp Syst (in press)

  125. Markus ML, Robey D (1988) Information technology and organizational change. Manage Sci 34(5):583–598

  126. McCabe BPM, Martin GM (2005) Bayesian predictions of low count time series. Int J Forecast 21:315–330

  127. Miaou SP (1994) The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accid Anal Prev 26(4):471–482

  128. Moreno E, Giron J (1998) Estimating with incomplete count data: a Bayesian approach. J Stat Plann Inf 66:147–159

  129. Morenoff J, Sampson RJ (1997) Violent crime and the spatial dynamics of neighborhood transition: Chicago 1970–1990. Social Forces 76(1):31–64

  130. Mossholder KW, Settoon RP, Henagan SC (2005) A relational perspective on turnover: examining structural, attitudinal, and behavioral predictors. Acad Manag J 48(4):607–618

  131. Mullahy J (1986) Specification and testing of some modified count data models. J Econometr 33(3):341–365

  132. Mundlak Y (1978) On the pooling of time series and cross section data. Econometrica 46(1):59–85

  133. Munkin MK, Trivedi PK (1986) Simulated maximum likelihood estimation of multivariate mixed-Poisson regression models. Econ J 2(1):29–48

  134. Novo AA (2004) Contagious currency crises: a spatial probit approach. Working paper, Economic Research Department, Banco de Portugal, Lisbon, Portugal

  135. Orlikowski WJ, Iacono CS (2001) Research commentary: desperately seeking the “IT” in IT research: a call to theorizing the IT artifact. Inform Syst Res 12(2):121–134

  136. Paelinck J, Klaassen L (1979) Spatial econometrics. Saxon House, Farnborough

  137. Pinkse J, Slade ME (1998) Contracting in space: an application of spatial statistics to discrete-choice models. J Econometr 85(1):125–154

  138. Quddus M (2008) Time-series count data models: an empirical application to traffic accidents. Accid Anal Prev 40(4):1732–1741

  139. Ravichandran V, Rai A (2003) Structural analysis of the impact of knowledge creation and knowledge embedding on software process capability. IEEE Trans Eng Manag 50(3):270–284

  140. Ray G, Wu D, Konana P (2009) Competitive environment and the relationship between IT and vertical integration. Inform Syst Res 20(4):585–603

  141. Raymond JE, Beard TR, Gropper DM (1993) Modeling the consumer’s decision to replace durable goods: a hazard function approach. Appl Econ 25(10):1287–1292

  142. Rose NL (1990) Profitability and product quality: economic determinants of airline performance. J Polit Econ 98(5):944–964

  143. Rose NL, Joskow PL (1990) The diffusion of new technologies: evidence from the electric utility industry. RAND J Econ 21(3):354–373

  144. Rose CE, Martin SW, Wannemuehler KA, Plikaytis BD (2006) On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data. J Biopharm Stat 16(4):463–468

  145. Ross SM (1995) Stochastic processes. Wiley, New York, NY

  146. Russo RP, Shmueli G, Jank W, Shyamalkumar ND (2010) Models for bid arrival and bidder arrivals in online auctions. In: Balakrishnan N (ed) Methods and applications of statistics in business, finance and management sciences. Wiley, Newark, pp 293–309

  147. Saloner G, Shepard A (1995) Adoption of technologies with network effects: an empirical examination of the adoption of automated teller machines. RAND J Econ 26(3):479–501

  148. Sampson RJ, Morenoff J, Earls F (1999) Beyond social capital: spatial dynamics of collective efficacy for children. Am Sociol Rev 64(5):633–660

  149. Shively TS, Kockelman K, Damien P (2010) A Bayesian semi-parametric model to estimate relationships between crash counts and roadway characteristics. Transp Res Part B 44(5):699–715

  150. Shmueli G, Russo RP, Jank W (2007) The BARISTA: a model for bid arrivals in online auctions. Ann Appl Stat 1(2):412–441

  151. Shonkwiler JS, Shaw WD (1996) Hurdle count data models in recreation demand analysis. J Agric Resour Econ 21(2):210–219

  152. Sidorova A, Evangelopoulos N, Valacich JS, Ramakrishnan T (2008) Uncovering the intellectual core of the information systems discipline. MIS Q 32(3):467–482

  153. Sinha RK, Chandrashekaran M (1992) A split hazard model for analyzing the diffusion of innovations. J Market Res 29(1):116–127

  154. Sood A, Tellis GJ (2011) Demystifying disruption: a new model for understanding and predicting disruptive technologies. Market Sci 30(2):339–354

  155. Stephan PE, Gurmu S, Sumell AJ, Black GC (2007) Who’s patenting in the university? Evidence from the survey of doctorate recipients. Econ Innov New Technol 16(2):71–99

  156. Telang R, Boatwright P, Mukhopadhyay T (2004) A mixture model for Internet search engine visits. J Market Res 41(2):206–214

  157. Tolnay SE, Deane G, Beck EM (1996) Vicarious violence: spatial effects on southern lynchings, 1890–1919. Am J Sociol 102(3):788–815

  158. Trevor CO (2001) Interactions among actual ease-of-movement determinants and job satisfaction in the prediction of voluntary turnover. Acad Manag J 44(4):621–638

  159. Trivedi PK (1997) Introductions: econometric models of event counts. J Appl Econometr 12(3):199–201

  160. Trivedi PK, Munkin MK (2009) Recent developments in cross section and panel count models, working paper, University of California, Davis, CA

  161. Van de Ven A, Poole MP (1995) Explaining development and change in organizations. Acad Manag Rev 20(3):510–540

  162. Van den Poel D, Larivière B (2004) Customer attrition analysis for financial services using proportional hazard models. Eur J Oper Res 157(1):196–217

  163. Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York

  164. Wang K, Yau KKW, Lee AH (2002) A hierarchical Poisson mixture regression model to analyze material length and hospital stay. Stat Med 21(23):3639–3654

  165. Ward MD, Gleditsch KS (2002) Location, location, location: an MCMC approach to modeling the spatial context of war and peace. Polit Anal 10(3):244–260

  166. Wedel M, Desarbo WS, Bult JR, Ramaswamy V (1993) A latent class Poisson regression model for heterogeneous count data. J Appl Econometr 8(4):397–411

  167. Windmeijer FAG, Santos Silva JMC (1997) Endogeneity in count data models: an application to demand for health care. J Appl Econometr 12(3):281–294

  168. Winkelmann R (1995) Duration dependence and dispersion in count data models. J Bus Econ Stat 13(4):467–474

  169. Winkelmann R (2000) Econometric analysis of count data, 3rd edn. Springer, New York

  170. Winkelmann R (2000) Seemingly unrelated negative binomial regression. Oxford Bull Econ Stat 62(4):553–560

  171. Winkelmann R, Zimmermann KF (1995) Recent developments in count data modeling: theory and application. J Econ Surv 9(1):1–24

  172. Xiang LM, Lee AH, Yau KKW, McLachlan GJ (2006) A score test for zero-inflation in correlated count data. Stat Med 25(10):1660–1671

  173. Yang Z, Hardin JW, Addy CL, Vuong QH (2007) Testing approaches for overdispersion in Poisson regression versus the generalized Poisson model. Biomed J 49(4):565–584

  174. Yelland P (2009) Bayesian forecasting for low-count time-series using state-space models: an empirical evaluation for inventory management. Int J Prod Econ 118:95–103

  175. Zeileis A, Kleiber C, Jackman S (2007) Regression models for count data in R. Research report series, 53, Department of Statistics and Mathematics. Wirtschaftsuniversität Wien, Vienna, Austria

  176. Zhu K, Kraemer KL, Gurbaxani V, Xu SX (2006) Migration to open-standard interorganizational systems: network effects, switching costs, and path dependency. MIS Q 30(Special issue):515–539

Acknowledgments

The authors would like to thank the guest editor, Christopher Westland, and the anonymous reviewers of this special issue article in Information Technology and Management for their helpful comments and encouragement. Rob Kauffman also thanks Singapore Management University, National Sun Yat-sen University, and the W.P. Carey Chair in IS at Arizona State University for research funding and support.

Author information

Corresponding author

Correspondence to Robert J. Kauffman.

Appendices

Appendix 1: Interdisciplinary studies illustrating the application of event history methods

We offer a brief summary of additional studies in a number of disciplines beyond IS that illustrate broad applications of event history methods. There are many other applications of event history methods in marketing, human resources and labor economics, and economics that can guide future efforts (see Table 8). Several articles merit brief discussion here because they illustrate the applications of these modeling and methodological ideas that seem most actionable.

Table 8 Representative studies from other disciplines involving event history methods

1.1 Marketing

Marketing researchers have used event history methods to examine the adoption of new product and technology innovations [52, 153], consumers’ decisions to replace an existing durable product [141], the loss of customers by companies [30, 83, 162], and consumers’ repeat product purchase decisions [99, 123]. The Cox proportional hazards model has been the most frequently used method, since it allows researchers to examine the factors that affect the underlying failure process that leads to the occurrence of the events. In addition, researchers have used parametric analysis, accelerated failure time (AFT) models, frailty models and Bayesian survival analysis. Using these survival analysis techniques, IS and e-commerce researchers can investigate technology adoption by individuals and organizations, the decision to discontinue using a technology or to replace it with a newer technology, the duration of outsourcing contracts, and the length of time that a software application remains on the most popular software list on Download.com.
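To make the mechanics of the Cox proportional hazards model concrete, the following Python sketch evaluates the log partial likelihood for a single covariate and locates its maximum by a crude grid search. The data, the variable names and the grid search are our own illustrative assumptions; they are not drawn from any of the studies cited above, and applied work would rely on an established survival analysis package.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i in range(len(times)):
        if events[i] == 0:
            continue  # censored cases enter only through the risk sets below
        # Risk set: subjects that have not yet failed or been censored at times[i].
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        total += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return total

# Hypothetical data (not from any cited study): months until adoption,
# event = 1 if adoption was observed and 0 if censored, x = 1 for large firms.
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]

# Crude grid search over beta in [-3, 3]; real software uses Newton-type methods.
grid = [b / 100 for b in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(b, times, events, x))
hazard_ratio = math.exp(beta_hat)  # hazard of large firms relative to small firms
```

A positive estimate of the coefficient means that, in this made-up sample, large firms adopt at a higher rate than small firms at any point in time. The baseline hazard never has to be specified, which is the property that makes the model semiparametric.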

1.2 Human resources and labor economics

Researchers have used survival analysis to examine employee turnover [86, 130, 158], the decision to quit on the part of employees and managers [117], and the duration of unemployment [11]. In addition to the Cox proportional hazards model, this stream of research also uses recurrent event survival analysis, since researchers often observe multiple instances of turnover and unemployment for the same individual. Similar research issues in the IS and e-commerce area include IT professionals’ turnover and duration of unemployment.

1.3 Economics

Survival analysis, especially the Cox proportional hazards model, has been frequently used in empirical analyses of the diffusion of new technologies [143, 147] and firm survival [14, 88]. Parametric and semiparametric proportional hazards models have been leveraged, allowing researchers to examine the impact of a set of explanatory variables on adoption decisions or firm survival. These studies provide IS and e-commerce researchers with examples of how different survival analysis techniques can be applied to examine the adoption and diffusion of new technologies and the survival of IT and e-commerce firms.

Appendix 2: Interdisciplinary studies illustrating the application of spatial analysis methods

There are also many areas of application for spatial analysis methods that occur in sociology and criminology, political science and political economy, and trade and economics, among others. We will cover a representative sample of the different kinds of applications (see Table 9).

Table 9 Representative studies from other disciplines involving spatial analysis methods

2.1 Sociology and criminology

Three studies in sociology examine neighborhood effects in population dynamics, child development, and violence. Morenoff and Sampson [129] looked at the effects of homicide and ecological factors (socioeconomic disadvantage, ethnicity, age composition, and residential stability) on changes in the neighborhood population in Chicago from 1970 to 1990. They tested the hypothesis that neighborhood population change is a contagion-bearing process, so that a change in one area is likely to affect other nearby neighborhoods. The results indicate the extent to which spatial interactions influence the diffusion of violent crime and population change: the higher the crime in the neighborhoods surrounding a given area, the greater its population loss. Sampson et al. [148] used the spatial lag model to study the spatial dynamics of the social processes that produce collective efficacy for children in the Chicago area. They found that neighborhoods benefit from being in proximity to those that have high levels of adult-child exchange and shared expectations for child social control.
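The key ingredient of a spatial lag model is the spatially lagged outcome Wy, where W is a weights matrix that encodes which areas are neighbors. The Python sketch below builds a row-standardized W for four hypothetical neighborhoods arranged in a line and computes Wy. The layout, the outcome values and the function names are assumptions made purely for illustration, and the sketch stops short of estimating the spatial autoregressive parameter.

```python
def row_standardize(adjacency):
    """Scale each row of a 0/1 neighbor matrix so that its weights sum to 1."""
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([v / total if total else 0.0 for v in row])
    return weights

def spatial_lag(weights, y):
    """Return Wy: for each area, the weighted average of its neighbors' outcomes."""
    return [sum(w * y_j for w, y_j in zip(row, y)) for row in weights]

# Hypothetical layout: four neighborhoods in a line (1-2-3-4),
# where only adjacent neighborhoods count as neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
y = [10.0, 20.0, 30.0, 40.0]  # hypothetical outcome score per neighborhood

W = row_standardize(adjacency)
Wy = spatial_lag(W, y)  # neighbor averages that enter the model as a regressor
```

In a spatial lag specification, Wy appears on the right-hand side alongside the usual covariates, so a neighborhood's outcome is allowed to depend on the average outcome of its neighbors.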

Tolnay et al. [157] studied the lynching of African Americans across counties in ten southern U.S. states. They tested two competing theories that might explain the spatial dependence of lynchings. Contagion theory predicts that lynchings in one area increase the likelihood that they will occur in other nearby areas. Deterrence theory predicts that the probability of such events declines when they have occurred in other areas. Their findings support the deterrence model of spatial dependence. Similarly, IS and e-commerce researchers can examine contagion effects in IT adoption in different regions, the provision of outsourcing services in neighboring cities and countries, the emergence of IT and e-commerce firms in different cities, states, or countries, and the migration of IT professionals in the marketplace.

2.2 Political science

Two studies look at spatial effects in war involvement and political campaign donations. Ward and Gleditsch [165] modeled the dependence of the likelihood of a country’s war involvement on the war involvement of other proximate countries using an autologistic model and the MCMC estimation method. Autologistic models have found much use in the evaluation of Markov random field theory and spatial correlation representations for pixel locations in computer and robotic vision [25]. Ward and Gleditsch’s [165] model shows good predictive ability in an out-of-sample forecast for countries that were drawn into civil wars and international conflicts in the 1989–1998 period. Cho [41] estimated spatial effects in campaign donations in the U.S. from 1980 to 1998. The results suggest that campaign donations from Asian American contribution networks are spatially-clustered with both spatial dependence and spatial heterogeneity. This research offers analogies for IS and e-commerce modeling, including the spatial clustering of competition among similar business models in a variety of industry sectors that have been major areas of technological innovation, capital investment and business development.

2.3 Political economy

Case et al. [38] analyzed the impact of state budgets on the expenditures of other nearby states. Their theoretical perspective is based on the effects of spillovers between states that are in close proximity or share similar economic characteristics and demographic qualities. For example, school expenditures in the Washington, DC area may have provided the impetus for Maryland and Virginia, since they are neighboring states, to spend more. The authors found that state government fiscal decisions are influenced significantly by the budgeting actions of nearby states. Another interesting study is by Novo [134], who used a spatial probit model to study the contagion effects among European countries in the 1992 currency crisis. The author’s findings confirm that problems with perceptions about the strengths and weaknesses of currencies were transmitted through the community’s trade channels.

2.4 Economics

Several studies in various sub-fields of economics have investigated spatial dependence in other contexts. For example, Basu and Thibodeau [19] examined spatial autocorrelation in the prices of single-family properties in Dallas, Texas between 1991 and 1993. They found that structural characteristics (lot characteristics, neighborhood amenities, accessibility to other locations) did not explain all of the variation in transaction prices. Instead, there was strong evidence of spatial autocorrelation in transaction prices across submarkets. Cohen and Paul [44] attempted to answer why manufacturing activities seem to concentrate in a few regions. In particular, they examined the contribution of spatial spillovers and agglomeration economies to the performance of the food manufacturing industry in the U.S. Their results suggest that the geographic concentration of food manufacturers is motivated by cost considerations from locating close to suppliers and markets. The agglomeration of IT companies and outsourcing service providers can also be examined using similar methods.
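Spatial autocorrelation of the kind discussed above is often summarized with Moran's I, a widely used statistic that the studies cited here are not claimed to have used. The Python sketch below computes it for two hypothetical arrangements of prices along a line of four locations. All values and names are illustrative assumptions.

```python
def morans_i(values, weights):
    """Moran's I: positive when neighbors are similar, negative when they differ."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical binary contiguity weights for four locations in a line.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([100.0, 110.0, 200.0, 210.0], weights)    # similar neighbors
alternating = morans_i([100.0, 200.0, 110.0, 210.0], weights)  # dissimilar neighbors
```

The clustered arrangement yields a positive value and the alternating one a negative value, which mirrors the distinction between positive and negative spatial autocorrelation.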

Appendix 3: Interdisciplinary studies illustrating the application of count data methods

We will discuss a number of application areas of count data methods including: transportation, criminal recidivism, healthcare, manufacturing and factory processes, the creation of patents in research and development activities, and labor and unemployment issues (see Table 10).

Table 10 Representative studies from other disciplines involving count data methods

3.1 Transportation accidents

One of the well-known application areas for count data modeling is motivated by the study of the frequency of airline accidents and incidents, including the work of Rose [142] and Dionne et al. [59]. The original work by Rose [142] explores data on 16 American passenger airlines between 1957 and 1986 that involved the incidence of aircraft damage and human injuries and deaths. Dionne et al.’s [59] research explored similar issues for 120 Canadian airlines during the 1976–1987 period. In both instances, a problem arose with correlation between the incidence of accidents and the actions of regulatory authorities to require the airlines to implement improved practices that were intended to diminish the likelihood of accidents.

Chib and Winkelmann [39] examined this kind of data using a model that represents the accident count correlations using correlated latent effects. Their model combines Poisson counts, latent effects that follow a multivariate Gaussian distribution, and a number of context-relevant covariates. The latter include: the number of an airline’s departures, the operating margin to gauge an airline’s profitability, the average stage length of an airline’s city-pair routes, the airline’s cumulative experience in terms of miles flown, and the extent to which the airline’s flights had international components, as well as airline firm and year fixed effects. The authors estimated the model using Markov chain Monte Carlo (MCMC) methods, a Bayesian simulation-based approach.

A number of interesting applications of count data estimation models cluster in the area of vehicle accidents. For example, Houston [91] modeled youth motorcycle fatalities for 15- to 20-year-olds from 1974 to 2004 using a family of negative binomial regression models as a basis for understanding the relevant explanatory factors. By using a variety of fixed effects in their models, the authors were able to show regional and temporal variations in the incidence of deaths. They also were able to evaluate the beneficial impacts of mandatory universal helmet laws versus partial coverage helmet laws at the state level in the U.S. The authors’ methodology and careful consideration of estimation bias were relatively effective in addressing omitted variables and unobserved heterogeneity, even though the fixed effects specification assumed temporal stability in the effects of the stratifying variables. This work offers a useful analogy for how IS researchers might conduct empirical studies of the incidence of customer information privacy breaches, corporate information breaches, and related problems concerning the effectiveness of what organizations do to safeguard sensitive information for their stakeholders.

Another work in the transportation safety area suggests a different kind of analogy for the development of useful research in the IS discipline, based on the characteristics of different technologies, applications and systems, users and adopter groups, business processes, and business partners. Shively et al. [149] used a Bayesian semiparametric model to evaluate the relationship between the characteristics of more than 7,700 two-lane rural roadways and roadway segments in the state of Washington and the vehicle crash counts that occurred on them in 2002. The authors’ approach brings a set of continuous variables into their model via unknown functional forms and a second set of categorical variables via linear functions. They use measures such as speed limits, average annualized daily traffic, roadway width, degree of road curvature, and vertical grade, among other variables, to predict the number of crashes. Their approach goes beyond prior and more standard approaches involving panel data negative binomial regression models that model covariate effects in linear form [109, 127], incorporate random effects [40, 111], and employ Bayesian approaches to estimation [122]. Taken together, these works, which extend the negative binomial regression model’s assumptions and estimation approach, offer a useful roadmap for the application of these methods in IS research on issues such as the occurrence of defects related to the characteristics of software applications, development teams, project development tools and organizational environments.
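Negative binomial models of the kind used in these transportation studies are usually preferred to Poisson regression when the variance of the counts exceeds their mean. The Python sketch below compares the two moments for a hypothetical sample and backs out a method-of-moments dispersion parameter under the common NB2 assumption that the variance equals mu + alpha * mu^2. The counts and the function name are illustrative assumptions, and applied work would estimate alpha by maximum likelihood together with the covariate effects.

```python
from statistics import mean, variance

def nb2_dispersion(counts):
    """Method-of-moments alpha for Var = mu + alpha * mu**2 (0 means Poisson-like)."""
    m = mean(counts)
    v = variance(counts)  # sample variance with n - 1 in the denominator
    return max(0.0, (v - m) / (m * m))

# Hypothetical counts, e.g., security incidents per business unit in a year.
counts = [0, 0, 1, 0, 7, 2, 0, 12, 1, 3]

sample_mean = mean(counts)
sample_var = variance(counts)
alpha = nb2_dispersion(counts)  # values well above 0 point to overdispersion
```

A Poisson model forces the variance to equal the mean, so a sample variance several times larger than the sample mean, as in this made-up example, is a signal to consider the negative binomial alternative.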

3.2 Public transportation

Public transportation is another area in which the application of count data modeling approaches offers lessons for IS researchers. This area has become increasingly important as oil and energy resource prices have steadily risen, reflecting the need for greater awareness of business policies that emphasize resource sustainability. Research that demonstrates the application of zero-inflated count data models in this area is attributable to Frondel and Vance [64], although it builds on earlier work that pioneered zero-inflated Poisson models in the estimation of manufacturing defects [112] and on the more general zero-inflated models and zero-inflated negative binomial models that we discussed earlier. The need for such models arises when the analyst observes many zeros for the dependent variable in a count data model. The authors investigate public transportation use in Germany during the five-day work-week, and how it is influenced by vehicle fuel prices and transit fares. The zeros arise because the authors modeled individual riders in the catchment areas of potential public transportation users that are served. On any given day, many do not use public transportation, and others never use it at all. The authors’ approach permitted them to model the effects of key covariates while controlling for individual-level user attributes, as well as the characteristics of the system as a whole.

The public transportation context suggests the presence of two latent regimes: one involving users and the other involving non-users. We see similar problems in IS research, including the use of virus scanning software and PC firewalls, choices to opt out of or opt into corporate information privacy programs, and the non-adoption versus adoption and use of a variety of social networking tools and environments and online news and advertising services (e.g., Facebook and LinkedIn, RSS and other pushed Internet news services, and Groupon, Social Living and other online purchase opportunity awareness building services). The opportunity to leverage zero-inflated models is also likely to occur in the context of music and video downloads among free and fee-based online streaming digital media services, in the study of users who consume digital media content differently via different distribution channels. A similar analogy applies to consumers’ travel-related purchase patterns for airline tickets and hotel stays, where there is significant interest in issues of information transparency [71], alternative channel strategies [31] and product and service decommoditization [70].
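The two-regime idea can be written down directly as a zero-inflated Poisson probability function: with some probability an individual is a non-user who always records a zero, and otherwise the count follows an ordinary Poisson distribution. The Python sketch below is a minimal illustration. The mixing probability and the Poisson rate are hypothetical values rather than estimates from any study discussed here.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: with probability pi the subject is a never-user."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

# Hypothetical parameters: 40% never-users, and users average 2 trips per week.
pi, lam = 0.4, 2.0

p_zero = zip_pmf(0, pi, lam)                                # zeros from both regimes
mean_count = sum(k * zip_pmf(k, pi, lam) for k in range(60))  # equals (1 - pi) * lam
```

The probability of a zero combines never-users with users who happen to record no events in the period, which is why an ordinary Poisson model fitted to such data tends to understate the number of zeros.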

3.3 Healthcare

There have been other efforts to model a variety of healthcare issues using count data modeling approaches. Another aspect of Chib and Winkelmann’s [39] research involves the application of their MCMC approach to explain patient visits for different healthcare services (including doctor’s office visits, non-doctor office visits, hospital outpatient visits, emergency room visits and so on). Whereas they specifically considered the possibility of correlated visit outcomes based on different provisions for healthcare insurance, prior research viewed the different outcomes as independent [56, 133]. These approaches translate almost immediately to the technology adoption arena, for example when firms face new regulatory requirements for accounting process changes and information that call for multiple modifications to their systems and the choice of new kinds of supporting technologies.

The research of Deb and Trivedi [56, 57], Bago d’Uva [15, 16], and Gurmu and Elder [78] offers useful guidance for IS researchers who wish to distinguish between adopters who are frequent users versus infrequent users, as opposed to adopters versus non-adopters. This is true for healthcare services, for example, where few people are non-users, even though not many people are likely to be frequent users. Gurmu and Elder [78] explored empirical count data models that involve two kinds of zeros: one type is the choices that consumers, technology adopters, or users make, and the other is measurement error where the analyst is unable to record relevant count-related behaviors. This occurs with research scientists who apply for patents and do not receive them, and for other research scientists who obtain patents for their inventions outside the observation time window [155].

The methods and empirical work that these authors present use the negative binomial model as a foundation. Their work suggests that hurdle models for count data analysis do not offer sufficient structure to take into account frequent versus infrequent use, and so they propose the use of another class of models called finite mixture models. The method involves the identification of latent classes. In the healthcare context, this could be something like a person’s long-term unobservable health status (e.g., related to advancing problems with diabetes, heart disease or other issues). In the IS context, latent classes may arise with respect to technology adoption due to changes in the functionality of the available technologies, evolving business practices and business models, and so on, that cause individuals to be differentially pre-disposed to higher or lower levels of technology usage. This kind of modeling approach, according to Bago d’Uva [16], can be applied in such a way that it is possible to allow or disallow the parameters in the model that address overdispersion to vary with the latent classes.
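The latent class idea can be illustrated with a two-component finite mixture. The authors discussed above build on the negative binomial model; for brevity, the Python sketch below swaps in Poisson components, and the class shares and usage rates are hypothetical. It computes the mixture probability of a count and the posterior probability that an individual with a given count belongs to each latent class.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def mixture_pmf(k, shares, rates):
    """Finite mixture: weighted sum of the class-specific Poisson probabilities."""
    return sum(s * poisson_pmf(k, r) for s, r in zip(shares, rates))

def class_posteriors(k, shares, rates):
    """Posterior probability of each latent class given an observed count k."""
    joint = [s * poisson_pmf(k, r) for s, r in zip(shares, rates)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical classes: 80% infrequent users (rate 1) and 20% frequent users (rate 8).
shares = [0.8, 0.2]
rates = [1.0, 8.0]

post_low = class_posteriors(0, shares, rates)    # someone with no recorded use
post_high = class_posteriors(10, shares, rates)  # someone with ten recorded uses
```

In contrast to a hurdle model, every class here can generate zeros as well as positive counts, so the mixture separates infrequent from frequent users rather than users from non-users.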

The reader should not conclude that only more complex count data models are useful in empirical research in healthcare. An important problem in the treatment of women’s health involves diagnosing breast cancer from axillary dissection of cancerous breast tissue. This helps clinicians to determine the counts of involved nodes to set up a basis for assessing the severity of the illness, the likelihood of survival, and the appropriate treatment for their patients. Dwivedi et al. [60] evaluated the Poisson, zero-inflated Poisson, negative binomial, and zero hurdle/negative binomial regression models for the extent to which visible skin changes, location and tumor size were associated with cancerous nodes for 1,152 Indian women from 1983 to 2005.

Two results are interesting to consider for IS researchers, since they illustrate the sophistication of observation that is possible with this kind of econometric analysis. First, the researchers found that the negative binomial model fits the data better than any of the other models that were used, which is consistent with the large number of uninvolved nodes (the zeros, in this case) that often are present when the cancer is diagnosed early. Second, the zero-inflated negative binomial and the hurdle regression models nevertheless predicted the number of involved nodes better. The conclusion the authors drew was that the prediction of involved nodes was a by-product of the overdispersion of uninvolved nodes, as well as some other unobserved heterogeneity that was not captured by the explanatory variables. They recommend that doctors base their treatment on the results of zero-inflated negative binomial regression estimates, since these permit the analysis to focus on uninvolved nodes that are at high risk of becoming cancerous.

The work of Rose et al. [144] is similar in its comparisons of these kinds of models in a study of vaccine adverse event count data for at-risk and not-at-risk populations. The issue is whether it is appropriate to permit zeroes to arise from both the at-risk (what they call sample zeroes) and the not-at-risk groups (what they call structural zeroes). The authors conclude that overdispersion of zero outcomes may not always be sufficient cause for the analyst to choose zero-inflated negative binomial regression or hurdle regression. They show empirically, based on their data, that when at-risk and not-at-risk patients are considered together (the sample and structural zeroes together), a zero-inflated negative binomial regression is better. When estimating data with only at-risk patients, hurdle regression seems to work better.
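The distinction between the two sources of zeros can be made concrete with a small sketch. It uses a Poisson base rather than the negative binomial to keep the arithmetic short, and the parameter values (`lam`, `pi_structural`, `p_zero`) are hypothetical; it is an illustration of the two model structures, not a reproduction of the estimates in Rose et al. [144].

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing count k when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi_structural):
    """Zero-inflated Poisson: zeros come from two sources.

    pi_structural is the share of not-at-risk subjects (structural zeros).
    At-risk subjects can still record a zero by chance (sample zeros).
    """
    at_risk = (1.0 - pi_structural) * poisson_pmf(k, lam)
    return (pi_structural + at_risk) if k == 0 else at_risk

def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson: one process generates all zeros, and a
    zero-truncated Poisson generates the positive counts."""
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))
    return (1.0 - p_zero) * truncated

# Hypothetical parameter values, chosen only for illustration.
lam, pi_structural, p_zero = 1.5, 0.4, 0.5

structural = pi_structural
sample = (1.0 - pi_structural) * poisson_pmf(0, lam)
print(f"ZIP    P(y=0) = {zip_pmf(0, lam, pi_structural):.3f} "
      f"= structural {structural:.3f} + sample {sample:.3f}")
print(f"Hurdle P(y=0) = {hurdle_pmf(0, lam, p_zero):.3f} (single source)")
```

The zero-inflated form splits the zero probability into a not-at-risk part and an at-risk part, whereas the hurdle form treats every zero as the outcome of one binary process. That difference is one way to read the finding that the preferred model depends on whether not-at-risk subjects are in the sample.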

3.4 Unemployment

To wrap up our discussion of applied contexts that may offer useful modeling analogies for IS researchers who are interested in the application of count data modeling, we will next consider three studies that deal with labor and employment. (For a more in-depth review, see [169].) There is a substantial literature available that offers useful research designs, modeling structures, and estimation approach choices in which various count data models are employed.

An example of the issues and estimation structures that are employed is found in the work of Andress [4], who studied the recurrence of unemployment among West German men between 1977 and 1982. The author points out the contrast between having access to event data, which are more micro-level and more informative, in comparison to count data, which are more aggregative and less informative. He also points out four other issues: problems with underestimation of counts due to the use of retrospective data and only registered instances of unemployment, the sampling of employment status as an endogenous variable (which raises the question of the underlying structural model, and leaves the analysis open to over-sampling of higher-risk groups), one-shot observations of independent variables that actually are changing over time, and panel attrition of participants across the timeline of the study. The author distinguishes between two useful concepts. One is statistical dependence, which occurs when “events appear to be dynamically dependent, but in fact, some individuals have higher risks of experiencing an event than others” [4] due to some unobserved heterogeneity. The other is causal dependence, which occurs when there is something beyond the individual that is a clear driver of the outcome that is counted. In this context, an example is that prior spells of unemployment might drive the observation of recurring unemployment; in other words, there is some causal link. The author extends the Poisson regression model with a gamma mixing distribution to permit overdispersion of zero event counts. He refers to his approach as an apparent contagion model or a spurious occurrence dependence model, with the idea of emphasizing the effort he has made to sort out causal and statistical dependence.
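A short simulation can show why gamma-mixed Poisson counts look "contagious" even when no event causes another. The sketch below is an illustration under assumed values (`mu = 2`, `alpha = 1`), not the model estimated by Andress [4]: each simulated person has a fixed, unobserved risk multiplier drawn from a gamma distribution with mean 1, and counts are Poisson given that multiplier.

```python
import math
import random
import statistics

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_counts(mu, alpha, n, seed=0):
    """Simulate n event counts with mean mu.

    alpha > 0 adds unobserved heterogeneity through a gamma multiplier
    with mean 1 and variance alpha; alpha == 0 gives a plain Poisson.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # gammavariate(shape, scale): shape 1/alpha and scale alpha give mean 1.
        frailty = rng.gammavariate(1.0 / alpha, alpha) if alpha > 0 else 1.0
        counts.append(draw_poisson(mu * frailty, rng))
    return counts

# Hypothetical values: same mean, with and without heterogeneity.
for label, alpha in [("plain Poisson", 0.0), ("gamma-mixed", 1.0)]:
    y = simulate_counts(mu=2.0, alpha=alpha, n=10000, seed=42)
    print(f"{label:14s} mean = {statistics.mean(y):.2f}  "
          f"variance = {statistics.variance(y):.2f}  "
          f"share of zeros = {y.count(0) / len(y):.2f}")
```

In the mixed case the variance exceeds the mean (in theory, mu + alpha * mu^2), and zeros are more frequent, although no simulated event raises the chance of a later one. This is the statistical dependence that the gamma mixing distribution is meant to absorb.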

Numerous observers have suggested that unemployment tends to predispose people to healthcare problems, since they are less likely to be able to visit the doctor or receive the appropriate kinds of hospital treatment that are called for in funded health insurance programs. Danö [53] evaluated causality in this setting, using panel data from 1981 to 1996 for a 10% sample of the entire population of Denmark, based on the frequency of doctor visits and consultations. The author used multiple count data estimation approaches, two of which involve the negative binomial regression model with fixed effects and a Mundlak formulation random effects specification [132]. The latter formulation permits individual-specific effects to be correlated with time-varying explanatory variables in the model, something that was not possible with fixed effects at the time the author did this research. An interesting outcome of this research, which illustrates the power that refined econometric models have for challenging the conventional wisdom of an area, is that the author found no significant effects of unemployment on health among men or women, even after correcting for the possible correlations between individual-specific effects and the relevant explanatory variables. All of the count data models that the author used required the assumption of exogenous explanatory variables, so the author also estimated a reduced generalized method of moments (GMM) model that did not require this assumption, and was still able to establish the same general results. This approach to empirical research with count data models further emphasizes how post-estimation robustness checks can increase the perceived reliability of the statistical findings.
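For readers unfamiliar with the Mundlak device, a generic textbook statement for a panel count model is sketched below. The notation ($c_i$, $\bar{x}_i$, $\gamma$, $u_i$, $T_i$) is assumed here for illustration and is not taken from Danö [53], whose exact specification may differ.

```latex
% Mundlak-type random effects for a panel count model, generic form.
% Symbols are illustrative and not taken from [53] or [132].
\begin{align}
  \mathrm{E}[y_{it} \mid x_{it}, c_i] &= \exp\!\left(x_{it}'\beta + c_i\right), \\
  c_i &= \bar{x}_i'\gamma + u_i,
  \qquad \bar{x}_i = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it},
\end{align}
% u_i is assumed independent of the regressors. The term \bar{x}_i'\gamma
% lets the individual effect c_i be correlated with the time-varying
% regressors; \gamma = 0 gives the usual random effects model.
```

A test of $\gamma = 0$ is then one way to check whether the individual-specific effects are correlated with the explanatory variables.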

3.5 Political economy

Another interesting work in the labor and political economy area explores the connection between the results of a variety of count data models and other associated events that occur later, but are subject to unknown lags. This is a useful method for IS and e-commerce researchers, since it is often the case that the observation of one kind of event, or a cluster of events, may be tied to other events that come later. An example is the signing on or the departure of members of a firm’s top management team once the chief executive officer resigns. Honaker [87] estimated a family of count data models in order to make the case that there is a causal link between the unemployment level in the population and the number of instances of political violence in Northern Ireland. In this research, the author employed several different models, including Poisson, negative binomial, zero-inflated Poisson and hurdle regression, and again showed the different efficacies of the various formulations based on the estimation outcomes. The author’s estimation approach also involved the modeling of instances when one side retaliates against a violent attack by the other side, by building a lag structure for violence into a count data estimation model.

An important observation in this research, and one that should be relevant to IS researchers who undertake e-commerce and technology adoption studies, is that lagging the dependent variable further increases the number of zero-valued dependent variable counts. The workaround suggested by the author is to compute a rolling average of the events over some period of time that makes sense for the study context. A second key observation that is useful for IS research, e-commerce and technology adoption modeling settings is that there will be a most likely time to the observation of the lagged event associated with the initial event. It may be typical to assume, as the author has, that the likelihood of a tied event diminishes after the occurrence of the first event. This observation may be helpful for those who are interested in studying time-clustered technology adoption that is subject to contagion effects from prior adoption events [124]. Honaker [87] evaluated the retaliation events with geometrically distributed and quadratically distributed lags, which are well-suited to creating reaction curves of the appropriate shape. This approach also provides a micro-level foundation for the theoretical explanation of aggregate behavior.
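The two devices mentioned above, a rolling average of past events and a geometrically declining lag, can be sketched in a few lines. The event series, the window length, and the decay rate are hypothetical, and the function names are introduced here only for illustration. The quadratic lag is omitted because its exact shape is not described in the text.

```python
def rolling_mean(counts, window):
    """Trailing average of event counts over `window` periods."""
    return [sum(counts[t - window + 1 : t + 1]) / window
            for t in range(window - 1, len(counts))]

def geometric_weights(decay, n_lags):
    """Weights proportional to decay**k for k = 0..n_lags-1, scaled to sum to 1."""
    raw = [decay ** k for k in range(n_lags)]
    total = sum(raw)
    return [w / total for w in raw]

def lagged_exposure(counts, weights):
    """Weighted sum of past counts; weights[0] applies to period t-1."""
    n_lags = len(weights)
    return [sum(w * counts[t - 1 - k] for k, w in enumerate(weights))
            for t in range(n_lags, len(counts))]

# Hypothetical weekly event counts with many zeros.
events = [0, 0, 3, 0, 0, 6, 0, 1]

smoothed = rolling_mean(events, window=3)
weights = geometric_weights(decay=0.5, n_lags=3)
exposure = lagged_exposure(events, weights)

print("rolling mean   :", [round(v, 2) for v in smoothed])
print("lag weights    :", [round(w, 3) for w in weights])
print("lagged exposure:", [round(v, 2) for v in exposure])
```

Either series could then enter a count regression as an explanatory variable in place of the raw lagged count. Both are far less dominated by zeros than the raw series, and the geometric weights encode the assumption that the likelihood of a tied event declines with time since the initial event.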

Cite this article

Kauffman, R.J., Techatassanasoontorn, A.A. & Wang, B. Event history, spatial analysis and count data methods for empirical research in information systems. Inf Technol Manag 13, 115–147 (2012). https://doi.org/10.1007/s10799-011-0106-5
