Event history, spatial analysis and count data methods for empirical research in information systems

Kauffman, Robert J.; Techatassanasoontorn, Angsana A.; Wang, Bin

doi:10.1007/s10799-011-0106-5

Published: 28 September 2011

Volume 13, pages 115–147, (2012)

Information Technology and Management

Abstract

Many interesting business and technology problems in IS and e-commerce research center on events and the variables that influence them. Researchers are often interested in studying the timing, patterns, and frequencies of events. Some of those events are related to the timing of strategic decisions such as new technology adoption, functionality upgrades to established software products, new outsourcing contracts, and the termination of failing IS projects. Still others are external events that have significant implications for the performance of firms, the structure of industries affected by IT, and the viability of various aspects of the economy. Event history methods (also known as survival analysis and duration analysis methods in the medical sciences, public health and biostatistics literature), spatial analysis, and count data analysis offer rigorous methods for empirical analysis that can provide rich insights into research issues that arise in association with identifiable events. This article provides a current survey of these methods and an in-depth discussion of how researchers can apply them to study technology adoption problems and related issues in IS and e-commerce. We offer a framework for mapping the methods to applicable problems, and discuss the relevant variants of the methods. We also illustrate the range of research questions that can be asked and answered through the use of the methods.

Notes

For a fuller discussion of semiparametric methods and related regression approaches, the interested reader should see Cameron and Trivedi [36] and Horowitz [89]. In addition to these three basic categories of techniques, there are also advanced models that deal with events driven by multiple competing risks, recurrence of the same event for the subject, heterogeneity of survival patterns among subgroups of subjects, accelerated likelihood of failure due to the impact of the explanatory variables, and long-term survival of a fraction of the population.
Spatial dependence is observed when \( \operatorname{Cov}(Y_{i}, Y_{j}) \neq 0 \) for \( i \neq j \), where i and j refer to locations and Y is a variable measuring an event of interest.
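One common summary of this kind of dependence is Moran's I, which is not named in this note and is used here only as an illustration. The sketch below assumes a binary contiguity weights matrix and made-up values for four locations on a line; the function name and data are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values y_i and spatial weights w_ij (with w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between pairs of locations.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return (n / w_sum) * (cross / total)


# Hypothetical example: four locations on a line, neighbours share a border.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = [1.0, 2.0, 3.0, 4.0]     # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbours are dissimilar

print(morans_i(clustered, W), morans_i(alternating, W))
```

A positive value indicates that neighbouring locations tend to have similar values, and a negative value indicates that they tend to differ.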
This commonly is represented in terms of a log-linear function of the explanatory variables of the multiplicative Poisson regression model, with \( \lambda_{i} = \exp \left( \beta_{0} + \sum_{j} \beta_{j} x_{ij} \right) \). Because the model is log-linear, each estimated coefficient \( \beta_{j} \) can be interpreted as a semi-elasticity: a one-unit increase in \( x_{ij} \) multiplies the expected value of y by \( \exp(\beta_{j}) \). When an explanatory variable enters the model as \( \log(x_{j}) \), its coefficient can be interpreted as an elasticity. A key issue with Poisson regression is that it requires the variance of y to equal its mean λ; this is viewed as a strong assumption that diminishes the range of the model's applicability.
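The log-linear mean function can be estimated by maximum likelihood with a few Newton-Raphson steps. The following is a minimal sketch for a single explanatory variable, assuming made-up data; the function name `fit_poisson` and the data are hypothetical and not taken from the article.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Newton-Raphson MLE for lambda_i = exp(b0 + b1 * x_i)."""
    # Start from the intercept-only fit.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum(xi * (yi - li) for xi, yi, li in zip(x, y, lam))
        # Information matrix [[a, b], [b, c]] (the negative Hessian).
        a = sum(lam)
        b = sum(xi * li for xi, li in zip(x, lam))
        c = sum(xi * xi * li for xi, li in zip(x, lam))
        det = a * c - b * b
        # Newton step: beta += inverse(information) * score.
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Hypothetical counts that grow with x.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(x, y)
print("multiplicative effect of one unit of x:", math.exp(b1))
```

At the estimate, the fitted means reproduce the total count and the x-weighted total count, which is a convenient check on convergence.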
The validity of Poisson regression model maximum likelihood estimation results also is affected by model misspecification. As a result, the literature on count data models has extensively explored and developed models that expand the capabilities of count data modeling, so it is possible to estimate explanatory models to test relevant theory and hypotheses [29, 39, 49, 65].
Although we chose to highlight six levels of analysis, this does not mean that these methods cannot be used to study technology adoption and other phenomena at other levels of analysis. For example, they can be applied to study technology and standards adoption among countries for digital wireless technologies such as the Global System for Mobile Communications (GSM) or Code Division Multiple Access (CDMA).
Other nonparametric methods include life tables, which characterize the rate of survivorship in a population at risk, and are helpful for estimating the survival function using intervals of duration. Another is the Nelson-Aalen estimator for the cumulative hazard function [98]. This involves developing a staircase function that specifies when an observation fails or when an event occurs in time, as well as the number of observations for which events have not been observed just prior to the occurrence of the event.
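The staircase construction of the Nelson-Aalen estimator can be sketched in a few lines. This is a minimal illustration assuming right-censored data and made-up durations; the function name and inputs are hypothetical.

```python
def nelson_aalen(durations, observed):
    """Return [(t, H(t))] at each distinct event time.

    durations: time to the event or to censoring for each subject.
    observed:  True if the event was seen, False if the subject was censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    cumulative = 0.0
    steps = []
    for t in event_times:
        # Subjects still under observation just prior to time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events that occur exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        cumulative += events / at_risk
        steps.append((t, cumulative))
    return steps


# Hypothetical sample: five subjects, two of them censored.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
for t, h in nelson_aalen(durations, observed):
    print(t, round(h, 4))
```

Each step adds the number of events at that time divided by the number of subjects still at risk, so censored subjects contribute to the denominators but never trigger a step.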
In addition to the Cox proportional hazards model, there are other semiparametric survival analysis techniques that are useful, including the additive hazards model. With this kind of model, the covariates are assumed to have an additive rather than a multiplicative effect on the hazard rate [90]. Another approach is called rank regression, which is useful for comparing two distributions using the rank of the natural logarithm of the durations [113].
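The contrast between the multiplicative and additive specifications can be written out as follows. This is a sketch in generic notation with constant coefficients: \( h_{0}(t) \) for the baseline hazard, \( x \) for the covariate vector and \( \beta \) for the coefficients are symbols assumed here rather than taken from the article's own equations.

\[ h(t \mid x) = h_{0}(t)\,\exp(x'\beta) \qquad \text{(Cox proportional hazards: multiplicative)} \]

\[ h(t \mid x) = h_{0}(t) + x'\beta \qquad \text{(additive hazards)} \]

In the multiplicative form, a one-unit change in covariate \( x_{j} \) scales the hazard by \( \exp(\beta_{j}) \); in the additive form, it shifts the hazard by \( \beta_{j} \).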
For readers who are interested in exploring the use of event history methods in other disciplines, we offer additional coverage of some of these works in Appendix 1.
Bayesian survival analysis also supports semiparametric and fully parametric models, as well as more advanced methods, such as the frailty and cure-rate models. In these models, gamma distributions are used for the priors. For the exponential model, we can formulate the prior distribution for the hazard rate λ = φ(x′β) to be a gamma prior. By specifying the parameters of the gamma prior and our weight or confidence on this prior, we can formulate the posterior distribution π(θ|D) and use the Gibbs sampler to estimate the parameters. For a Weibull model with a parameter λ = x′β and a shape parameter α, λ and α are treated as independent and specified to follow normal and gamma distributions, respectively. It is common to assume that the βs follow a multivariate normal distribution. By specifying the priors, we can formulate the joint posterior of α and β given D and estimate the parameters. In addition, gamma distributions are often used as prior distributions for the baseline hazard in the Cox proportional hazards model and for the frailty term in shared frailty models.
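The role of the gamma prior is easiest to see in a simplified case. The sketch below assumes an exponential model with a single constant hazard rate and no covariates, which is a simplification of the model in this note; in that case the gamma prior is conjugate and the posterior is available in closed form, so no Gibbs sampling is needed. The function name and data are hypothetical.

```python
def gamma_exponential_posterior(prior_shape, prior_rate, durations, observed):
    """Posterior gamma parameters for a constant hazard rate.

    With right-censored exponential durations the likelihood is
    lambda ** events * exp(-lambda * exposure), so a Gamma(shape, rate)
    prior updates to Gamma(shape + events, rate + exposure).
    """
    events = sum(1 for e in observed if e)
    exposure = sum(durations)          # total time at risk, censored or not
    return prior_shape + events, prior_rate + exposure


# Hypothetical prior and data (two censored durations).
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
shape, rate = gamma_exponential_posterior(2.0, 10.0, durations, observed)
print("posterior mean hazard:", shape / rate)
```

A larger prior rate and shape give the prior more weight relative to the data, which is one way to express confidence in the prior.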
In the case of the spatial lag model, ordinary least squares (OLS) estimates are biased and inconsistent [5], so estimation of the spatial lag model needs to be done via maximum likelihood or through the use of instrumental variables. The spatial error model can be estimated via maximum likelihood. The model with both a spatially-lagged dependent variable and spatial dependence in the error term is complicated to estimate, and thus is rarely used in practice.
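The reason OLS fails for the spatial lag model can be seen from the model's reduced form. The following sketch assumes standard notation that is not reproduced in this note: \( W \) is an \( n \times n \) spatial weights matrix, \( \rho \) is the spatial lag parameter, \( \varepsilon \) is an independent error term, and \( I - \rho W \) is assumed to be invertible.

\[ y = \rho W y + X\beta + \varepsilon \;\Longrightarrow\; y = (I - \rho W)^{-1}(X\beta + \varepsilon) \]

\[ W y = W (I - \rho W)^{-1} X\beta + W (I - \rho W)^{-1} \varepsilon \]

The regressor \( Wy \) therefore contains the error term and is correlated with \( \varepsilon \) whenever \( \rho \neq 0 \). This endogeneity is what maximum likelihood and instrumental variables estimation are designed to handle.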
Similar to the cross-sectional spatial models, the estimation of spatial panel models can be performed using maximum likelihood, or instrumental variables and the generalized method of moments (GMM) approaches [10].
Note that, similar to cross-sectional spatial models, constraining the spatial parameters results in three different models. Allowing ρ ≠ 0 and γ ≠ 0 results in the mixed spatial lag and error model. Setting γ = 0 permits estimating the spatial probit lag model only, while setting ρ = 0 permits estimating the spatial probit error model only.
The spatial probit model is known to be complex to estimate, however. This is because the joint probabilities in the likelihood function of this model are multi-dimensional multivariate normal probabilities [7]. Four techniques to estimate spatial probit models are: (1) the expectation–maximization (EM) algorithm, (2) the GMM approach, (3) the Gibbs sampling approach, and (4) the recursive importance sampling approach. The EM algorithm uses a two-step process, an expectation step and a maximization step, to estimate the expected likelihood function of the latent model. Pinkse and Slade [137] develop a GMM approach to assess spatial error correlation in a spatial discrete model. Lesage [116] proposes Bayesian spatial discrete choice methods that use a Gibbs sampling approach to solve a spatial model based on a latent continuous variable. Finally, the recursive importance sampling estimator proposed by Beron and Vijverberg [24] uses simulation to estimate the multivariate normal probability function.
We provide a brief summary of other interdisciplinary applications of spatial analysis methods in Appendix 2. For a review of the application of spatial analysis methods in regional economic and social science, see Anselin [6], Anselin et al. [9], Anselin [8], and Goodchild et al. [69].
These include SpaceStat, S+ SpatialStats, GeoDa, PySpace, the Spatial Econometrics Toolbox for MATLAB, R and Stata, and GeoBUGS (for geographical Bayesian inference using the Gibbs sampler).
Similar to the other empirical modeling approaches that we have discussed, there is plentiful software support for count data modeling. See Cameron and Trivedi [35] for Gauss, Limdep and Stata; Liu and Cela [118] for SAS; Venables and Ripley [163] for S, now Spotfire S+; and Zeileis et al. [175] for R.
Greene [72] showed that the negative binomial model can accommodate overdispersion because it goes beyond the Poisson model’s representation: it uses a noisy form of the mean function, which permits the model to predict a larger number of zero values for the dependent variable. Berk and McDonald [22] caution us about the use of this model, and provide four implications for estimation practice. First, they note that it is critical to specify the full set of explanatory variables for negative binomial regression model estimation, along with an appropriate functional form for the mean value function, similar to the Poisson regression model. Failure to do so creates the omitted variables problem and introduces systematic bias in the estimation results. Second, if the theoretical arguments and knowledge of the empirical regularities of the data fail to match the various assumptions of the models, and cause an analyst to lack confidence that the systematic part of the model being used is right, then it may not be a good idea to use these models at all. Berk [21] suggests a fallback position involving what he called a descriptive data analysis. Third, even though an analyst may be confident that the systematic part of the model is right, there still is no guarantee that the usual tests of statistical inference are going to be right also. Fourth, though the authors suggest the negative binomial model as a workaround, they still remind us that it may be a stretch to trust the more attractive p-values that the negative binomial produces.
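The way the negative binomial model relaxes the Poisson restriction can be checked numerically. The sketch below assumes the common NB2 parameterization with mean μ and dispersion α, in which the variance is μ + αμ²; this parameterization and the numbers used are assumptions for illustration, not taken from the cited papers.

```python
import math


def nb2_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y, mean mu, dispersion alpha."""
    r = 1.0 / alpha
    p = r / (r + mu)
    log_prob = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_prob)


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


mu, alpha = 3.0, 0.5                      # hypothetical values
support = range(400)                      # the tail beyond this is negligible
mean = sum(y * nb2_pmf(y, mu, alpha) for y in support)
var = sum((y - mean) ** 2 * nb2_pmf(y, mu, alpha) for y in support)

print("mean:", mean, "variance:", var)
print("P(0) NB2:", nb2_pmf(0, mu, alpha), "P(0) Poisson:", poisson_pmf(0, mu))
```

With the same mean, the NB2 distribution has a larger variance than the Poisson distribution and puts more probability on zero counts.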
Such models do not properly capture the structure of the empirical estimation problem, however; there is no assumption with these models that there will be no zeros observed [85]. Instead, it is necessary and logically consistent to eliminate the possibility that any zeros are estimated at all, by adjusting the requirements of the underlying distribution of the observed counts. The approach used involves the application of zero-truncated Poisson models and zero-truncated negative binomial models [35, 169].
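The adjustment behind the zero-truncated Poisson model amounts to rescaling the Poisson probabilities so that the counts 1, 2, 3, ... sum to one. A minimal sketch, with a hypothetical rate parameter and function names:

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k with rate lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def zt_poisson_pmf(k, lam):
    """Zero-truncated Poisson: Poisson conditioned on the count being positive."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


lam = 1.5                                   # hypothetical rate
support = range(1, 200)
total = sum(zt_poisson_pmf(k, lam) for k in support)
mean = sum(k * zt_poisson_pmf(k, lam) for k in support)
print("total probability:", total, "mean:", mean)
```

The mean of the truncated distribution is lam / (1 - exp(-lam)), which is larger than lam because zeros have been ruled out.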
Similar to some of the other models that we have discussed, count data models that have an excess number of zeros have also been studied in terms of their robustness and estimation capabilities. For example, there is a score test to evaluate zero inflation in the presence of correlated count data [172]. There is also a test to evaluate the extent to which non-zero count values of the dependent variable are overdispersed in a zero-inflated Poisson mixed regression model [172]. The motivation for this test and model arises in contexts of computer operations that have many small intra-day network service problems, but also major network outages of longer duration. There are other kinds of robustness checks that can be applied that will be useful for research on IT, e-commerce and technology adoption.
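The tests mentioned above build on the zero-inflated Poisson distribution. The sketch below shows only that basic distribution, assuming a fixed rate λ and a fixed extra-zero probability π with no covariates or random effects, so it is much simpler than the mixed regression model in this note; the names and values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k with rate lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: an extra zero with probability pi, else Poisson."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


lam, pi = 2.0, 0.3                          # hypothetical values
support = range(200)
mean = sum(k * zip_pmf(k, lam, pi) for k in support)
var = sum((k - mean) ** 2 * zip_pmf(k, lam, pi) for k in support)
print("P(0):", zip_pmf(0, lam, pi), "mean:", mean, "variance:", var)
```

Even without any further overdispersion in the non-zero counts, the extra zeros push the variance above the mean.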
The authors point out that the Bayesian nonparametric approach is more effective in fitting the data of the study, and that this is partly due to the fact that an initial parametric model cannot be specified easily based on the data that it estimates. Nevertheless, some specification iteration is to be expected when the analyst identifies the defects of the parametric model that prompt its re-specification.
This has many applications in e-market settings, including in online group-buying Web sites such as Woot! (www.woot.com), DailyDeal (www.dailydeal.com), and ShuangTuan (www.shuangtuan.com) in China, and for social couponing services, including Groupon (www.groupon.com) and its joint venture with TenCent (www.tencent.com) in China called GaoPeng (www.gaopeng.com), as well as LivingSocial (www.livingsocial.com).

References

Albuquerque P, Bronnenberg BJ, Corbett CJ (2007) A spatiotemporal analysis of the global diffusion of ISO9000 and ISO14000 certification. Manage Sci 53(3):451–468
Allison PD, Waterman RP (2002) Fixed-effects negative binomial regression models. Sociol Methodol 32(1):247–265
Anderson DM (2010) Estimating the economic value of ice climbing in Hyalite Canyon: an application of travel cost count data models that account for excess zeros. J Environ Manage 91(4):1012–1020
Andress HJ (1989) Recurrent unemployment—the West German experience: an exploratory analysis using count data models with panel data. Eur Sociol Rev 5(3):275–297
Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
Anselin L (1992) Space and applied econometrics: introduction. Reg Sci Urban Econ 22(3):307–316
Anselin L (1999) Spatial econometrics. Working paper, School of Social Sciences, University of Texas at Dallas, TX
Anselin L (2007) Spatial econometrics in RSUE: retrospect and prospect. Reg Sci Urban Econ 27(4):450–456
Anselin L, Florax RJGM, Rey SJ (2004) Econometrics for spatial models: recent advances. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin
Anselin L, LeGallo J, Jayet H (2008) Spatial panel econometrics. In: Matyas L, Sevestre P (eds) The econometrics of panel data: fundamentals and recent developments in theory, 3rd edn. Springer, Berlin
Arranz JM, Muro J (2004) Recurrent unemployment, welfare benefits and heterogeneity. Int Rev Appl Econ 18(4):423–441
Aten B (1996) Evidence of spatial autocorrelation in international prices. Rev Income Wealth 42(2):149–163
Aten B (1997) Does space matter? International comparisons of the prices of tradables and nontradables. Int Reg Sci Rev 20(1–2):35–52
Audretsch DB, Mahmood T (1995) New firm survival: new results using a hazard function. Rev Econ Stat 77(1):97–103
Bago d’Uva T (2005) Latent class models for use of primary care: evidence from a British panel. Health Econ 14(9):873–892
Bago d’Uva T (2006) Latent class models for utilization of health care. Health Econ 15(4):329–343
Banerjee S, Kauffman RJ, Wang B (2007) Modeling Internet firm survival using Bayesian dynamic models with time-varying coefficients. Electron Commer Res Appl 6(3):332–342
Baskerville RL, Myers MD (2009) Fashion waves in information systems research and practice. MIS Q 33(4):647–662
Basu S, Thibodeau TG (1998) Analysis of spatial autocorrelation in house prices. J Real Estate Finance Econ 17(1):61–85
Beck N, Gleditsch KS, Beardsley K (2006) Space is more than geography: using spatial econometrics in the study of political economy. Int Stud Q 50(1):27–44
Berk RA (2003) Regression analysis: a constructive critique. Sage Publications, Newbury Park
Berk RA, McDonald J (2007) Overdispersion and Poisson regression. Working paper, Department of Statistics and Department of Criminology, University of Pennsylvania, Philadelphia, PA
Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. J Am Stat Assoc 47(259):501–515
Beron KJ, Vijverberg WP (2004) Probit in a spatial context: a Monte Carlo analysis. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc B 36(2):192–236
Bhattacharjee S, Gopal RD, Lertwachara K, Marsden JR, Telang R (2007) The effect of digital sharing technologies on music markets: a survival analysis of albums on ranking charts. Manage Sci 53(9):1359–1374
Bhattacherjee S, Premkumar G (2004) Understanding changes in belief and attitude toward information technology usage: a theoretical model and longitudinal test. MIS Q 28(2):229–254
Bilgic A, Florkowski W (2003) Application of hurdle negative binomial count data model to demand for black bass fishing in the southeastern United States. Presented at the Southern Agricultural Economics Association Annual Meeting, Mobile, AL
Boes S (2004) Empirical likelihood in count data models: the case of endogenous regressors. Working paper, Socioeconomic Institute, University of Zurich, Zurich, Switzerland
Bolton RN (1998) A dynamic model of the duration of the customer’s relationship with a continuous service provider: the role of satisfaction. Market Sci 17(1):45–65
Brunger WG (2009) The impact of the Internet on airline fares: the ‘Internet price effect’. Journal of Revenue and Pricing Management 9(1–2):66–93
Cameron AC, Trivedi PK (1986) Econometric models based on count data: comparisons and applications of some estimators and tests. J Appl Econometr 1(1):29–54
Cameron AC, Trivedi PK (1990) Regression-based tests for overdispersion in the Poisson model. J Econometr 46(3):347–364
Cameron AC, Trivedi PK (1996) Count data models for financial data. In: Maddala GS, Rao CR (eds) Statistical methods in finance, Handbook of statistics, vol 14. Elsevier, Amsterdam, pp 363–391
Cameron AC, Trivedi PK (1998) Regression analysis of count data. Econometric society monograph no. 30. Cambridge University Press, Cambridge, UK
Cameron AC, Trivedi PK (2005) Microeconometrics: methods and applications. Cambridge University Press, Cambridge, UK
Case A (1992) Neighborhood influence and technological change. Reg Sci Urban Econ 22(3):491–508
Case A, Rosen HS, Hines JS (1993) Budget spillovers and fiscal policy interdependence: evidence from the States. Journal of Public Economics 52(3):285–307
Chib S, Winkelmann R (2001) Markov chain Monte Carlo analysis of correlated count data. J Bus Econ Stat 19(4):428–435
Chin HCC, Quddus MA (2003) Applying the random effect negative binomial model to examine traffic accident occurrence at signalized intersections. Accid Anal Prev 35(2):253–259
Cho W (2003) Contagion effects and ethnic contribution networks. Am J Polit Sci 47(2):368–387
Choi J, Hui SK, Bell DR (2010) Spatiotemporal analysis of imitation behavior across new buyers at an online grocery retailer. J Market Res 47(1):75–89
Clemons EK, Reddi SP, Row MC (1993) The impact of information technology on the organization of economic activity: ‘the move to the middle’ hypothesis. J Manage Inform Syst 10(2):9–35
Cohen J, Paul C (2005) Agglomeration economies and industry location decisions: the impacts of spatial and industrial spillovers. Reg Sci Urban Econ 35(3):215–237
Cox D (1975) Partial likelihood. Biometrika 62(2):269–275
Cragg JG (1971) Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 39(5):829–844
Cressie N (1993) Statistics for spatial data. Wiley, New York
Crowder MJ (2001) Classical competing risks. Chapman and Hall/CRC, Boca Raton, FL
Dagne GA (2010) Bayesian semiparametric zero-inflated Poisson model for longitudinal count data. Math Biosci 224:126–130
Dai Q, Kauffman RJ (2004) Partnering for perfection: an economics perspective on B2B electronic market strategic alliances. In Tomak K (ed) Economics, IS and e-commerce. Idea Group Publishing, Harrisburg, pp 43–79
Dai Q, Kauffman RJ (2009) Cooperative strategies to leverage network effects: evaluating network partnerships in the B2B software market, Working paper, Lebow College of Business, Drexel University, Philadelphia, PA
Danaher PJ, Hardie BGS, Putsis WP (2001) Marketing-mix variables and the diffusion of successive generations of a technological innovation. J Market Res 38(4):501–514
Danö AM (2002) Unemployment and health conditions: a count data approach. Presented at the 57th Econometric Society European meeting, Venice, Italy, August 25–28. Available at www.econometricsociety.org/meetings/esem02/cdrom/papers/753/ESEM2002.pdf
Darmofal D (2006) Spatial econometrics and political science. Working paper, Department of Political Science, University of South Carolina, Los Angeles
Dean C, Lawless JF (1989) Tests for detecting overdispersion in Poisson regression models. J Am Stat Assoc 84(406):467–472
Deb P, Trivedi PK (1997) Demand for medical care by the elderly: a finite mixture approach. J Appl Econometr 12(3):313–336
Deb P, Trivedi PK (2002) The structure of demand for healthcare: latent class versus two-part models. J Health Econ 21(4):601–625
Dennis AR, Garfield MJ (2003) The adoption and use of GSS in project teams: toward more participative processes and outcomes. MIS Q 27(2):289–323
Dionne G, Gagné R, Gagnon F, Vanasse C (1997) Debt, moral hazard and airline safety: empirical evidence. J Econometr 79(2):379–402
Dwivedi A, Dwivedi SN, Deo S, Shukla R (2010) Statistical models for predicting number of involved nodes in breast cancer patients. Health 2(7):641–651
Elhorst JP, Blien U, Wolf K (2007) New evidence on the wage curve: a spatial panel approach. Int Reg Sci Rev 30(2):173–191
Fleming MM (2004) Techniques for estimating spatially dependent discrete choice models. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin
Forman C, Gron A (2011) Vertical integration and information technology investment in the insurance industry. J Law Econ Organ 27(1):180–218
Frondel M, Vance C (2011) Rarely enjoyed? A count data analysis of ridership in Germany’s public transport. Transp Policy 18(2):425–433
Ghosh SK, Mukhopadhyay P, Lu JC (2006) Bayesian analysis of zero-inflated count data models. J Stat Plann Inf 136:1360–1375
Giacomini R, Granger CWJ (2004) Aggregation of space-time processes. J Econometr 118(1–2):7–26
Gleditsch K, Ward M (2000) Peace and war in time and space: the role of democratization. Int Stud Q 44(1):1–29
Goo J, Song Y, Kishore R, Nam K, Rao HR (2007) An investigation of the factors that influence the duration of IT outsourcing relationships. Decis Support Syst 42(4):2107–2125
Goodchild M, Anselin L, Appelbaum R, Harthorn B (2000) Toward spatially integrated social science. Int Reg Sci Rev 23(2):139–159
Granados NF, Kauffman RJ, Lai HC, Lin HC (2011) Decommoditization, resonance marketing and IT: an empirical study of air travel services amidst channel conflict. J Manage Inform Syst 28(2) (in press)
Granados NF, Gupta A, Kauffman RJ (2012) Online and offline demand and price elasticities: evidence from the air travel industry. Inform Syst Res (in press)
Greene W (2007) Econometric analysis, 6th edn. Prentice Hall, Englewood Cliffs
Greene W (2007) Functional form and heterogeneity in count data. Working paper, Stern School of Business, New York University, New York
Gregor S (2006) The nature of theory in information systems. MIS Q 30(3):611–642
Grogger JT, Carson RT (1991) Models for truncated counts. J Appl Econometr 6(3):225–238
Grover V, Lyytinen K, Srinivasan A, Tan B (2008) Contributing to rigorous and forward thinking explanatory theory. J Assoc Inform Syst 9(2):40–47
Gupta PL, Gupta RC, Tripathi RC (1996) Analysis of zero-adjusted count data. Comput Stat Data Anal 23(2):207–218
Gurmu S, Elder J (2008) A bivariate zero-inflated count data regression model with unrestricted correlation. Econ Lett 100(2):245–248
Gurmu S, Trivedi P (1998) Semi-parametric estimation of hurdle regression models with an application to MedicAid utilization. J Appl Econometr 12(3):225–242
Gurmu S, Rilstone P, Stern S (1999) Semiparametric estimation of count regression models. J Econometr 88:123–150
Hall DB (2000) Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics 56(4):1030–1039
Hall DB, Berenhaut KS (2002) Score tests for heterogeneity and overdispersion in zero-inflated Poisson and binomial regression. Can J Stat 30(3):415–430
Harrison T, Ansell J (2002) Customer retention in the insurance industry: using survival analysis to predict cross-selling opportunities. J Finan Serv Market 6(3):229–239
Hausman JA, Hall B, Griliches Z (1984) Econometric models for count data with an application to the patents-R&D relationship. Econometrica 52(4):909–938
Hilbe J (2007) Negative binomial regression. Cambridge University Press, Cambridge, UK
Hom PW, Kinicki AJ (2001) Toward a greater understanding of how dissatisfaction drives employee turnover. Acad Manag J 44(5):975–987
Honaker J (2008) Unemployment and violence in Northern Ireland: a missing data model for ecological inference. Working paper, University of California, Los Angeles, CA; presented at the Summer Meetings of the Society for Political Methodology, Tallahassee, FL, July 2005
Honjo Y (2000) Business failure of new firms: an empirical analysis using a multiplicative hazards model. Int J Ind Organ 18(4):557–574
Horowitz JL (2009) Semiparametric and nonparametric methods in econometrics. Springer, New York, NY
Hosmer DW, Lemeshow S (1999) Applied survival analysis: regression model of time to event data. Wiley, New York, NY
Houston DJ (2007) Are helmet laws protecting young motorcyclists? J Safety Res 38(3):329–336
Hu XJ, Sun J, Wei LJ (2003) Regression parameter estimation from panel counts. Scand J Stat 30(1):25–43
Huang CY, Wang MC, Zhang T (2006) Analyzing panel count data with informative observation times. Biometrika 93(4):763–775
Ibrahim JG, Chen MH, Sinha D (2001) Bayesian survival analysis. Springer, New York
Josefek RA, Kauffman RJ (1998) Duration of IT human capital employment. MIS Research Center, Carlson School of Management, University of Minnesota, Minneapolis, MN
Jung RC, Kukuk M, Liesenfeld R (2006) Time series of count data: modeling, estimation and diagnostics. Comput Stat Data Anal 51(4):2350–2364
Kalbfleisch JD, Lawless JF (1985) The analysis of panel count data under a Markov assumption. J Am Stat Assoc 80(392):863–871
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, Hoboken
Kamakura WA, Kossar BS, Wedel M (2004) Identifying innovators for the cross-selling of new products. Manage Sci 50(8):1120–1133
Kauffman RJ, Mohtadi H (2004) Proprietary and open systems adoption in e-procurement: a risk-augmented transaction cost perspective. J Manage Inform Syst 21(1):137–166
Kauffman RJ, Techatassanasoontorn AA (2005) International diffusion of digital mobile technology: a coupled-hazard state-based approach. Inf Technol Manage 6(2):253–292
Kauffman RJ, Techatassanasoontorn AA (2009) Understanding early diffusion of digital wireless phones. Telecommun Policy 33(8):432–450
Kauffman RJ, Tsai J (2009) The unified procurement strategy for enterprise software: a test of the “move to the middle” hypothesis. J Manage Inform Syst 26(2):177–204
Kauffman RJ, Wang B (2008) Tuning into the digital channel: evaluating business model fit for Internet firm survival. Inf Technol Manage 9(3):215–232
Kauffman RJ, Wang B (2008) Developing rich insights on public Internet firm entry and exit based on survival analysis and data visualization. In: Jank W, Shmueli G (eds) Statistical methods in e-commerce research. Wiley, New York
Kauffman RJ, McAndrews JJ, Wang YM (2000) Opening the ‘black box’ of network externalities in network adoption. Inform Syst Res 11(1):61–82
Kennedy BS (2005) Does race predict stroke readmission? An analysis using the truncated negative binomial model. J Natl Med Assoc 97(5):699–713
Klein JP, Moeschberger ML (1997) Survival analysis: techniques for censored and truncated data. Springer, New York
Kockelman K, Bottom J, Kweon YJ, Ma J, Wang X (2006) Safety impacts and other implications of raised speed limits. National Highway Project Research Report #17–23. University of Texas, Austin, TX
Krnjajic M, Kottas A, Draper D (2008) Parametric and nonparametric Bayesian model specification: a case study involving models for count data. Comput Stat Data Anal 52:2110–2128
Kweon YJ, Kockelman K (2005) The safety effects of speed limit changes: use of panel data models, including speed, use, and design variables. Transp Res Rec 1908:148–158
Lambert D (1992) Zero-inflated Poisson regression with an application to defects in manufacturing. Technometrics 34(1):1–14
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, Hoboken
Le CT (1997) Applied survival analysis. Wiley, New York
Lee AH, Wang K, Yau KKW, Somerford PJ (2003) Truncated negative binomial mixed regression modeling of ischaemic stroke hospitalizations. Stat Med 22(7):1129–1139
Lesage JP (2000) Bayesian estimation of limited dependent variable spatial autoregressive models. Geograph Anal 32(1):19–35
Light A, Omori Y (2004) Unemployment insurance and job quits. J Labor Econ 22(1):159–188
Liu WS, Cela J (2008) Count data models in SAS. SAS Global Forum, San Antonio
Liu FY, Hua KA, Xie F (2011) A hybrid communication solution to distributed moving query monitoring systems. Electron Commer Res Appl 10(2):214–228
Long JS (1997) Regression models for categorical and limited dependent variables. Sage Publications, Thousand Oaks
Longhi S, Nijkamp P (2007) Forecasting regional labor market developments under spatial autocorrelation. Int Reg Sci Rev 30(2):100–119
Ma J, Kockelman KM, Damien P (2008) A multivariate Poisson-lognormal regression model for prediction of crash counts by severity using Bayesian methods. Accid Anal Prev 40(4):964–975
Manchanda P, Dubé JP, Goh KY, Chintagunta PK (2006) The effect of banner advertising on Internet purchasing. J Market Res 43(1):98–108
Mann A, Kauffman RJ, Han K, Nault BR (2011) Are there contagion effects in IT and business process outsourcing? Decision Supp Syst (in press)
Markus ML, Robey D (1988) Information technology and organizational change. Manage Sci 34(5):583–598
McCabe BPM, Martin GM (2005) Bayesian predictions of low count time series. Int J Forecast 21:315–330
Miaou SP (1994) The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accid Anal Prev 26(4):471–482
Moreno E, Giron J (1998) Estimating with incomplete count data: a Bayesian approach. J Stat Plann Inf 66:147–159
Morenoff J, Sampson RJ (1997) Violent crime and the spatial dynamics of neighborhood transition: Chicago 1970–1990. Social Forces 76(1):31–64
Mossholder KW, Settoon RP, Henagan SC (2005) A relational perspective on turnover: examining structural, attitudinal, and behavioral predictors. Acad Manag J 48(4):607–618
Mullahy J (1986) Specification and testing of some modified count data models. J Econometr 33(3):341–365
Mundlak Y (1978) On the pooling of time series and cross section data. Econometrica 46(1):59–85
Munkin MK, Trivedi PK (1999) Simulated maximum likelihood estimation of multivariate mixed-Poisson regression models. Econ J 2(1):29–48
Novo AA (2004) Contagious currency crises: a spatial probit approach. Working paper, Economic Research Department, Banco de Portugal, Lisbon, Portugal
Orlikowski WJ, Iacono CS (2001) Research commentary: desperately seeking the “IT” in IT research: a call to theorizing the IT artifact. Inform Syst Res 12(2):121–134
Paelinck J, Klaassen L (1979) Spatial econometrics. Saxon House, Farnborough
Pinkse J, Slade ME (1998) Contracting in space: an application of spatial statistics to discrete-choice models. J Econometr 85(1):125–154
Quddus M (2008) Time-series count data models: an empirical application to traffic accidents. Accid Anal Prev 40(4):1732–1741
Ravichandran V, Rai A (2003) Structural analysis of the impact of knowledge creation and knowledge embedding on software process capability. IEEE Trans Eng Manag 50(3):270–284
Ray G, Wu D, Konana P (2009) Competitive environment and the relationship between IT and vertical integration. Inform Syst Res 20(4):585–603
Raymond JE, Beard TR, Gropper DM (1993) Modeling the consumer’s decision to replace durable goods: a hazard function approach. Appl Econ 25(10):1287–1292
Rose NL (1990) Profitability and product quality: economic determinants of airline performance. J Polit Econ 98(5):944–964
Rose NL, Joskow PL (1990) The diffusion of new technologies: evidence from the electric utility industry. RAND J Econ 21(3):354–373
Rose CE, Martin SW, Wannemuehler KA, Plikaytis BD (2006) On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data. J Biopharm Stat 16(4):463–468
Ross SM (1995) Stochastic processes. Wiley, New York, NY
Russo RP, Shmueli G, Jank W, Shyamalkumar ND (2010) Models for bid arrival and bidder arrivals in online auctions. In: Balakrishnan N (ed) Methods and applications of statistics in business, finance and management sciences. Wiley, Newark, pp 293–309
Saloner G, Shepard A (1995) Adoption of technologies with network effects: an empirical examination of the adoption of automated teller machines. RAND J Econ 26(3):479–501
Sampson RJ, Morenoff J, Earls F (1999) Beyond social capital: spatial dynamics of collective efficacy for children. Am Sociol Rev 64(5):633–660
Shively TS, Kockelman K, Damien P (2010) A Bayesian semi-parametric model to estimate relationships between crash counts and roadway characteristics. Transp Res Part B 44(5):699–715
Shmueli G, Russo RP, Jank W (2007) The BARISTA: a model for bid arrivals in online auctions. Ann Appl Stat 1(2):412–441
Shonkwiler JS, Shaw WD (1996) Hurdle count data models in recreation demand analysis. J Agric Resour Econ 21(2):210–219
Sidorova A, Evangelopoulos N, Valacich JS, Ramakrishnan T (2008) Uncovering the intellectual core of the information systems discipline. MIS Q 32(3):467–482
Sinha RK, Chandrashekaran M (1992) A split hazard model for analyzing the diffusion of innovations. J Market Res 29(1):116–127
Sood A, Tellis GJ (2011) Demystifying disruption: a new model for understanding and predicting disruptive technologies. Market Sci 30(2):339–354
Stephan PE, Gurmu S, Sumell AJ, Black GC (2007) Who’s patenting in the university? Evidence from the survey of doctorate recipients. Econ Innov New Technol 16(2):71–99
Telang R, Boatwright P, Mukhopadhyay T (2004) A mixture model for Internet search engine visits. J Market Res 41(2):206–214
Tolnay SE, Deane G, Beck EM (1996) Vicarious violence: spatial effects on southern lynchings, 1890–1919. Am J Sociol 102(3):788–815
Trevor CO (2001) Interactions among actual ease-of-movement determinants and job satisfaction in the prediction of voluntary turnover. Acad Manag J 44(4):621–638
Trivedi PK (1997) Introductions: econometric models of event counts. J Appl Econometr 12(3):199–201
Trivedi PK, Munkin MK (2009) Recent developments in cross section and panel count models, working paper, University of California, Davis, CA
Van de Ven A, Poole MS (1995) Explaining development and change in organizations. Acad Manag Rev 20(3):510–540
Van den Poel D, Larivière B (2004) Customer attrition analysis for financial services using proportional hazard models. Eur J Oper Res 157(1):196–217
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Wang K, Yau KKW, Lee AH (2002) A hierarchical Poisson mixture regression model to analyse maternity length of hospital stay. Stat Med 21(23):3639–3654
Ward MD, Gleditsch KS (2002) Location, location, location: an MCMC approach to modeling the spatial context of war and peace. Polit Anal 10(3):244–260
Wedel M, Desarbo WS, Bult JR, Ramaswamy V (1993) A latent class Poisson regression model for heterogeneous count data. J Appl Econometr 8(4):397–411
Windmeijer FAG, Santos Silva JMC (1997) Endogeneity in count data models: an application to demand for health care. J Appl Econometr 12(3):281–294
Winkelmann R (1995) Duration dependence and dispersion in count data models. J Bus Econ Stat 13(4):467–474
Winkelmann R (2000) Econometric analysis of count data, 3rd edn. Springer, New York
Winkelmann R (2000) Seemingly unrelated negative binomial regression. Oxford Bull Econ Stat 62(4):553–560
Winkelmann R, Zimmermann KF (1995) Recent developments in count data modeling: theory and application. J Econ Surv 9(1):1–24
Xiang LM, Lee AH, Yau KKW, McLachlan GJ (2006) A score test for zero-inflation in correlated count data. Stat Med 25(10):1660–1671
Yang Z, Hardin JW, Addy CL, Vuong QH (2007) Testing approaches for overdispersion in Poisson regression versus the generalized Poisson model. Biomed J 49(4):565–584
Yelland P (2009) Bayesian forecasting for low-count time-series using state-space models: an empirical evaluation for inventory management. Int J Prod Econ 118:95–103
Zeileis A, Kleiber C, Jackman S (2007) Regression models for count data in R. Research report series, 53, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Vienna, Austria
Zhu K, Kraemer KL, Gurbaxani V, Xu SX (2006) Migration to open-standard interorganizational systems: network effects, switching costs, and path dependency. MIS Q 30(Special issue):515–539


Acknowledgments

The authors would like to thank the guest editor, Christopher Westland, and the anonymous reviewers of this special issue article in Information Technology and Management for their helpful comments and encouragement. Rob Kauffman also thanks Singapore Management University, National Sun Yat-sen University, and the W.P. Carey Chair in IS at Arizona State University for research funding and support.

Author information

Authors and Affiliations

School of Information Systems, and Lee Kong Chian School of Business, Singapore Management University, Singapore, Singapore
Robert J. Kauffman
Faculty of Business and Law, Auckland University of Technology, Auckland, New Zealand
Angsana A. Techatassanasoontorn
College of Business, University of Texas-Pan American, Edinburg, TX, USA
Bin Wang


Corresponding author

Correspondence to Robert J. Kauffman.

Appendices

Appendix 1: Interdisciplinary studies illustrating the application of event history methods

We offer a brief summary of additional studies in a number of disciplines beyond IS that illustrate broad applications of event history methods. There are many other applications of event history methods in marketing, human resources and labor economics, and economics that can guide future efforts (see Table 8). We briefly discuss several articles whose use of event history methods illustrates the modeling and methodological ideas that seem most actionable.

Table 8 Representative studies from other disciplines involving event history methods


1.1 Marketing

Marketing researchers have used event history methods to examine the adoption of new product and technology innovations [52, 153], consumers’ decisions to replace an existing durable product [141], the loss of customers by companies [30, 83, 162], and consumers’ repeat product purchase decisions [99, 123]. The Cox proportional hazards model has been the most frequently used method, since it allows researchers to examine the factors that affect the underlying failure process that leads to the occurrence of the events. In addition, researchers have used parametric analysis, AFT models, frailty models and Bayesian survival analysis. Using these survival analysis techniques, IS and e-commerce researchers can investigate technology adoption by individuals and organizations, the decision to discontinue using a technology or to replace it with a newer technology, the duration of outsourcing contracts, and the length of time a software application remains on the most popular software list on Download.com.
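To make the Cox proportional hazards idea more concrete, the following minimal sketch evaluates the partial log-likelihood for a single covariate. It is only an illustration under stated assumptions: there are no tied event times, the data are hypothetical (they are not drawn from any study cited here), and the names `cox_partial_loglik`, `times`, `events` and `size` are ours. In practice a researcher would rely on an established statistical package rather than a grid search.

```python
import math

def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored spells enter only through the risk sets of others
        # Risk set: every unit still under observation at time t_i.
        risk = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * x[i] - math.log(risk)
    return total

# Hypothetical data: months until a firm adopts a technology (event = 1)
# or leaves the observation window without adopting (event = 0, censored).
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 0, 1, 1, 0]
size = [1.2, 0.4, 0.3, 0.9, 0.1]  # illustrative covariate, e.g. scaled firm size

# Crude grid search for the coefficient that maximizes the partial likelihood.
grid = [b / 100 for b in range(-500, 501)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(times, events, size, b))
print(f"beta_hat = {beta_hat:.2f}, hazard ratio per unit = {math.exp(beta_hat):.2f}")
```

A positive coefficient indicates that larger values of the covariate are associated with a higher hazard, which here means earlier adoption. The baseline hazard never has to be specified, which is one reason the model is popular.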

1.2 Human resources and labor economics

Researchers have used survival analysis to examine employee turnover [86, 130, 158], the decision to quit on the part of employees and managers [117], and the duration of unemployment [11]. In addition to the Cox proportional hazards model, this stream of research also uses recurrent event survival analysis, since researchers often observe multiple instances of turnover and unemployment on the same individual. Similar research issues in the IS and e-commerce area include IT professionals’ turnover and duration of unemployment.

1.3 Economics

Survival analysis, especially the Cox proportional hazards model, has been frequently used in empirical analyses of the diffusion of new technologies [143, 147] and firm survival [14, 88]. Parametric and semiparametric proportional hazards models have been leveraged, allowing researchers to examine the impact of a set of explanatory variables on the adoption decisions or firm survival. These studies provide IS and e-commerce researchers with examples of how different survival analysis techniques can be applied to examine the adoption and diffusion of new technologies and the survival of IT and e-commerce firms.

Appendix 2: Interdisciplinary studies illustrating the application of spatial analysis methods

There are also many areas of application for spatial analysis methods that occur in sociology and criminology, political science and political economy, and trade and economics, among others. We will cover a representative sample of the different kinds of applications (see Table 9).

Table 9 Representative studies from other disciplines involving spatial analysis methods


2.1 Sociology and criminology

Three studies in sociology examine neighborhood effects in population dynamics, child development, and violence. Morenoff and Sampson [129] looked at the effects of homicide and ecological factors (socioeconomic disadvantage, ethnicity, age composition, and residential stability) on changes in the neighborhood population in Chicago from 1970 to 1990. They tested the hypothesis that neighborhood population change is a contagion-bearing process, so a change in one area is likely to affect other nearby neighborhoods. The results indicate that spatial interactions shape the diffusion of violent crime and population change: the higher the crime in the neighborhoods surrounding a given area, the greater its population loss. Sampson et al. [148] used the spatial lag model to study the spatial dynamics in social processes that produce collective efficacy for children in the Chicago area. They found that neighborhoods benefit from being in proximity to those that have high levels of adult-child exchange and shared expectations for child social control.

Tolnay et al. [157] studied the lynching of African Americans across counties in ten southern U.S. states. They tested two competing theories that might explain the spatial dependence of lynchings. Contagion theory predicts that lynchings in one area increase the likelihood that they will occur in other nearby areas. Deterrence theory predicts that the probability of such events declines when they have occurred in other areas. Their findings support the deterrence model of spatial dependence. Similarly, IS and e-commerce researchers can examine contagion effects in IT adoption in different regions, the provision of outsourcing services in neighboring cities and countries, the emergence of IT and e-commerce firms in different cities, states, or countries, and the migration of IT professionals in the marketplace.
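The contrast between contagion and deterrence can be read off the sign of a single parameter in a generic spatial lag specification. The notation below is ours and is offered only as a sketch; it is not necessarily the exact specification estimated in the studies above.

```latex
% Generic spatial lag model (illustrative notation):
%   y   : n x 1 vector of outcomes, one per spatial unit
%   W   : n x n spatial weights matrix, w_{ij} > 0 if units i and j are neighbors, w_{ii} = 0
%   X   : n x k matrix of covariates, with coefficient vector \beta
%   rho : spatial autoregressive parameter
\begin{align}
  y &= \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n), \\
  y &= (I_n - \rho W)^{-1} (X\beta + \varepsilon) \quad \text{(reduced form, when } I_n - \rho W \text{ is invertible)}.
\end{align}
% rho > 0: outcomes in neighboring units move together (consistent with contagion).
% rho < 0: a high outcome nearby lowers the local outcome (consistent with deterrence).
```

Because the spatially lagged outcome $Wy$ is correlated with the error term, ordinary least squares is generally inconsistent for this model, which is why maximum likelihood or instrumental variable estimators are typically used.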

2.2 Political science

Two studies look at spatial effects in war involvement and political campaign donations. Ward and Gleditsch [165] modeled the dependence of the likelihood of a country’s war involvement on the war involvement of other proximate countries using an autologistic model and the MCMC estimation method. Autologistic models have found much use in the evaluation of Markov random field theory and spatial correlation representations for pixel locations in computer and robotic vision [25]. Ward and Gleditsch’s [165] model shows good predictive ability in an out-of-sample forecast for countries that were drawn into civil wars and international conflicts in the 1989–1998 period. Cho [41] estimated spatial effects in campaign donations in the U.S. from 1980 to 1998. The results suggest that campaign donations from Asian American contribution networks are spatially clustered, with both spatial dependence and spatial heterogeneity. This research offers analogies for IS and e-commerce modeling, including the spatial clustering of competition among similar business models in a variety of industry sectors that have been major areas of technological innovation, capital investment and business development.

2.3 Political economy

Case et al. [38] analyzed the impact of state budgets on the expenditures of other nearby states. Their theoretical perspective is based on the effects of spillovers between states that are in close proximity or share similar economic characteristics and demographic qualities. For example, school expenditures in the Washington, DC area may have provided the impetus for Maryland and Virginia, since they are neighboring states, to spend more. The authors found that state government fiscal decisions are influenced significantly by the budgeting actions of nearby states. Another interesting study is by Novo [134], who used a spatial probit model to study the contagion effects among European countries in the 1992 currency crisis. The author’s findings confirm that problems with perceptions about the strengths and weaknesses of currencies were transmitted through the community’s trade channels.

2.4 Economics

Several studies in various sub-fields of economics have investigated spatial dependence in other contexts. For example, Basu and Thibodeau [19] examined spatial autocorrelation in the prices of single-family properties in Dallas, Texas between 1991 and 1993. They found that structural characteristics (lot characteristics, neighborhood amenities, accessibility to other locations) did not explain all of the variation in transaction prices. Instead, there was strong evidence of spatial autocorrelation in transaction prices across submarkets. Cohen and Paul [44] attempted to answer why manufacturing activities seem to concentrate in a few regions. In particular, they examined the contribution of spatial spillovers and agglomeration economies on the performance of the food manufacturing industry in the U.S. Their results suggest that the geographic concentration of food manufacturers is motivated by cost considerations from locating close to suppliers and markets. The agglomeration of IT companies and outsourcing service providers can also be examined using similar methods.
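A simple way to check for the kind of spatial autocorrelation described above is Moran's I. This statistic is not named in this appendix; we use it here as a standard diagnostic, and the adjacency structure and values below are invented purely for illustration.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of neighboring units.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Four hypothetical submarkets arranged on a line; 1 marks adjacent pairs.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 2.0, 3.0, 4.0]    # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbors are dissimilar

i_clustered = morans_i(clustered, W)      # positive: neighbors resemble each other
i_alternating = morans_i(alternating, W)  # negative: neighbors differ
print(i_clustered, i_alternating)
```

Applied to regression residuals rather than raw values, a clearly positive statistic would signal that the explanatory variables have not absorbed all of the spatial structure, which is the situation Basu and Thibodeau report for transaction prices.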

Appendix 3: Interdisciplinary studies illustrating the application of count data methods

We will discuss a number of application areas of count data methods including: transportation, criminal recidivism, healthcare, manufacturing and factory processes, the creation of patents in research and development activities, and labor and unemployment issues (see Table 10).

Table 10 Representative studies from other disciplines involving count data methods


3.1 Transportation accidents

One of the well-known application areas for count data modeling is motivated by the study of the frequency of airline accidents and incidents, including the work of Rose [142] and Dionne et al. [59]. The original work by Rose [142] explores data on 16 American passenger airlines between 1957 and 1986 that involved the incidence of aircraft damage and human injuries and deaths. Dionne et al.’s [59] research explored similar issues for 120 Canadian airlines during the 1976–1987 period. In both instances, a problem arose with correlation between the incidence of accidents and the actions of regulatory authorities to require the airlines to implement improved practices that were intended to diminish the likelihood of accidents.

Chib and Winkelmann [39] examined this kind of data using a model that represents the accident count correlations using correlated latent effects. They estimate a model with Poisson counts, latent effects that follow a multivariate Gaussian distribution, and a number of context-relevant covariates. The latter include: the number of an airline’s departures, the operating margin to gauge an airline’s profitability, the average stage length of an airline’s city-pair routes, the airline’s cumulative experience in terms of miles flown, and the extent to which the airline’s flights had international components, as well as airline firm and year fixed effects. The authors used Markov chain Monte Carlo (MCMC) methods, a Bayesian simulation-based approach.
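The structure just described, Poisson counts with correlated Gaussian latent effects, can be written compactly as follows. The indices and symbols are our own illustrative notation rather than a transcription of the authors' model.

```latex
% Correlated count outcomes via latent effects (illustrative notation):
%   y_{ij}  : count for unit i on outcome (or occasion) j, j = 1, ..., J
%   x_{ij}  : covariate vector; \beta_j : outcome-specific coefficients
%   b_i     : J x 1 vector of latent effects for unit i
\begin{align}
  y_{ij} \mid b_{ij} &\sim \text{Poisson}(\mu_{ij}), &
  \mu_{ij} &= \exp\!\left(x_{ij}'\beta_j + b_{ij}\right), \\
  b_i = (b_{i1}, \dots, b_{iJ})' &\sim N_J(0, D).
\end{align}
% Conditional on b_i the J counts are independent; the off-diagonal elements of D
% induce correlation among them and also generate overdispersion.
```

The likelihood requires integrating over $b_i$, which has no closed form. This is one reason simulation-based estimation such as MCMC is attractive for this class of models.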

A number of interesting applications of count data estimation models cluster in the area of vehicle accidents. For example, Houston [91] modeled youth motorcycle fatalities for 15- to 20-year-olds from 1974 to 2004 using a family of negative binomial regression models as a basis for understanding the relevant explanatory factors. By using a variety of fixed effects in their models, the authors were able to show regional and temporal variations in the incidence of deaths. They also were able to evaluate the beneficial impacts of mandatory universal helmet laws versus partial coverage helmet laws at the state level in the U.S. The authors’ methodology and careful consideration of estimation bias were relatively effective in addressing omitted variables and unobserved heterogeneity, even though the fixed effects modeling specification assumed that there was temporal stability with respect to the effects of the stratifying variables. This work offers a useful analogy for how IS researchers might conduct empirical studies of a spectrum of issues that arise with respect to the incidence of customer information privacy breaches, corporate information breaches, and a variety of related problems concerning the effectiveness of what organizations do to safeguard sensitive information for their stakeholders.

Another work in the transportation safety area is suggestive of a different kind of analogy for the development of useful research in the IS discipline based on the characteristics of different technologies, applications and systems, users and adopter groups, business processes, and business partners. Shively et al. [149] used a Bayesian semiparametric model to evaluate the relationship between the characteristics of more than 7,700 two-lane rural roadways and roadway segments, and vehicle crash counts that occurred on them in the state of Washington in 2002. The authors’ approach brings a set of continuous variables into their model via unknown functional forms and a second set of categorical variables via linear functions. They use measures such as speed limits, average annualized daily traffic, roadway width, degree of road curvature, and vertical grade, among other variables, to predict the number of crashes. Their approach goes beyond prior and more standard approaches involving panel data negative binomial regression models with covariate effects that are modeled in linear form [109, 127], incorporate random effects [40, 111], and employ Bayesian approaches to the estimation of the data [122]. Taken together, these different works that extend the negative binomial regression model’s assumptions and estimation approach offer a useful roadmap for the application of these methods in IS research involving such issues as the occurrence of defects related to the characteristics of software applications, development teams, project development tools and organizational environments.

3.2 Public transportation

Public transportation is another area in which IS researchers can learn from the application of count data modeling approaches. This area has become increasingly important as oil and energy resource prices have steadily risen, reflecting the need for greater awareness of business policies that emphasize resource sustainability. Research that demonstrates the application of zero-inflated count data models in this area is attributable to Frondel and Vance [64], although it builds on earlier work that pioneered zero-inflated Poisson models in the estimation of manufacturing defects [112] and on the more general zero-inflated models and zero-inflated negative binomial models that we discussed earlier. The need for such models arises when the analyst observes many zeros for the dependent variable in a count data model. The authors investigate public transportation use in Germany during the five-day work-week, and how it is influenced by vehicle fuel prices and transit fares. The zeros arise because the authors modeled individual riders in the catchment areas of potential public transportation users that are served. On any given day, many do not use public transportation, and others never use it at all. The authors’ approach permitted them to model the effects of key covariates while controlling for individual-level user attributes, as well as the characteristics of the system as a whole.

The public transportation context suggests the presence of two latent regimes: one involving users and the other involving non-users. We see similar problems in IS research including the use of virus scanning software and PC firewalls, choices to opt out or opt into corporate information privacy programs, the non-adoption versus adoption and use of a variety of social networking tools and environments, and online news and advertising services (e.g., Facebook and LinkedIn, RSS and other pushed Internet news services, and Groupon, LivingSocial and other online purchase opportunity awareness building services). The opportunity to leverage zero-inflated models also is likely to occur in the context of music and video downloads among free and fee-based online streaming digital media services for the study of users who consume digital media content differently via different distribution channels. A similar analogy applies to consumers’ travel-related purchase patterns for airline tickets and hotel stays, where there is significant interest in issues of information transparency [71], alternative channel strategies [31] and product and service decommoditization [70].
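The two-regime idea behind zero-inflated models can be illustrated with a minimal zero-inflated Poisson (ZIP) sketch. The parameter values are invented for illustration and do not come from Frondel and Vance's data; the function and variable names are ours.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a share pi are non-users who always record zero;
    the remaining 1 - pi are potential users with Poisson(lam) counts."""
    from_users = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + from_users if k == 0 else from_users

# Illustrative values only: 40% never-users, 2 trips per week among potential users.
pi, lam = 0.4, 2.0

zip_mean = sum(k * zip_pmf(k, pi, lam) for k in range(60))  # equals (1 - pi) * lam
share_zero_zip = zip_pmf(0, pi, lam)
share_zero_poisson = poisson_pmf(0, zip_mean)  # plain Poisson with the same mean

print(f"mean = {zip_mean:.2f}")
print(f"P(0) under ZIP = {share_zero_zip:.3f}, under Poisson = {share_zero_poisson:.3f}")
```

With the same mean, the zero-inflated distribution puts noticeably more mass on zero than a plain Poisson does. That excess of observed zeros is the empirical signature that motivates this class of models.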

3.3 Healthcare

Other efforts have modeled a variety of healthcare issues using count data approaches. Another aspect of Chib and Winkelmann’s [39] research involves the application of their MCMC approach to explain patient visits for different healthcare services (including doctor’s office visits, non-doctor office visits, hospital outpatient visits, emergency room visits and so on). Although they specifically considered the possibility of correlated visit outcomes based on different provisions for healthcare insurance, prior research viewed the different outcomes as independent [56, 133]. These approaches translate almost immediately to the technology adoption arena, for example when new regulatory requirements for accounting processes and information reporting force firms to make multiple changes and adjustments to their systems and to choose new kinds of supporting technologies.

The research of Deb and Trivedi [56, 57], Bago d’Uva [15, 16], and Gurmu and Elder [78] offers useful guidance for IS researchers who wish to distinguish between adopters who are frequent users versus infrequent users, as opposed to adopters versus non-adopters. This is true for healthcare services, for example, where few people are non-users, even though not many people are likely to be frequent users. Gurmu and Elder [78] explored empirical count data models that involve two kinds of zeros: one type is the choices that consumers, technology adopters, or users make, and the other is measurement error where the analyst is unable to record relevant count-related behaviors. This occurs with research scientists who apply for patents and do not receive them, and for other research scientists who obtain patents for their inventions outside the observation time window [155].

The methods and empirical work that these authors present use the negative binomial model as a foundation. Their work suggests that hurdle models for count data analysis do not offer sufficient structure to take into account frequent versus infrequent use, and so they propose the use of another class of models called finite mixture models. The method involves the identification of latent classes. In the healthcare context, this could be something like a person’s long-term unobservable health status (e.g., related to advancing problems with diabetes, heart disease or other issues). In the IS context, latent classes may arise with respect to technology adoption due to changes in the functionality of the available technologies, evolving business practices and business models, and so on, that cause individuals to be differentially pre-disposed to higher or lower levels of technology usage. This kind of modeling approach, according to Bago d’Uva [16], can be applied in such a way that it is possible to allow or disallow the parameters in the model that address overdispersion to vary with the latent classes.
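A latent class (finite mixture) negative binomial model of the kind described above can be sketched as follows. The notation is ours and is meant only to show where the class-specific parameters enter.

```latex
% Finite mixture of C negative binomial components (illustrative notation):
%   \pi_c    : probability of belonging to latent class c
%   \mu_{ic} : class-specific conditional mean for individual i
%   \alpha_c : class-specific overdispersion parameter
\begin{align}
  f(y_i \mid x_i) &= \sum_{c=1}^{C} \pi_c \, f_{\text{NB}}\!\left(y_i \mid \mu_{ic}, \alpha_c\right),
  \qquad \pi_c > 0, \quad \sum_{c=1}^{C} \pi_c = 1, \\
  \mu_{ic} &= \exp\!\left(x_i'\beta_c\right), \\
  \operatorname{Var}(y_i \mid x_i, c) &= \mu_{ic} + \alpha_c \, \mu_{ic}^{2}.
\end{align}
% Restricted variant: \alpha_c = \alpha for all c, so overdispersion does not vary by class.
% Unrestricted variant: each class has its own \alpha_c.
```

With $C = 2$, the classes might correspond to infrequent and frequent users. Unlike in a hurdle model, both classes can generate zeros and positive counts.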

The reader should not conclude that only more complex count data models are useful in empirical research in healthcare. An important problem in the treatment of women’s health involves diagnosing breast cancer from axillary dissection of cancerous breast tissue. This helps clinicians to determine the counts of involved nodes to set up a basis for assessing the severity of the illness, the likelihood of survival, and the appropriate treatment for their patients. Dwivedi et al. [60] evaluated the Poisson, zero-inflated Poisson, negative binomial, and zero hurdle/negative binomial regression models for the extent to which visible skin changes, location and tumor size were associated with cancerous nodes for 1,152 Indian women from 1983 to 2005.

Two results are interesting to consider for IS researchers, since they illustrate the sophistication of observation that is possible with this kind of econometric analysis. First, the researchers found that the negative binomial model fits the data better than any of the other models that were used, which is consistent with the large number of uninvolved nodes (the zeros, in this case) that often are present when the cancer is diagnosed early. The zero-inflated negative binomial and the hurdle regression models predicted the number of involved nodes better though. The conclusion the authors drew was that the prediction of involved nodes was a by-product of the overdispersion of uninvolved nodes, as well as some other unobserved heterogeneity that was not captured by the explanatory variables. They recommend that doctors base their treatment on the results of zero-inflated negative binomial regression estimates, since it permits the analysis to focus on uninvolved nodes that are at high risk of becoming cancerous.

The work of Rose et al. [144] is similar in its comparisons of these kinds of models in a study of vaccine adverse event count data for at-risk and not-at-risk populations. The issue is whether it is appropriate to permit zeroes to arise from both the at-risk group (what they call sample zeroes) and the not-at-risk group (what they call structural zeroes). The authors conclude that overdispersion of zero outcomes may not always be sufficient cause for the analyst to choose zero-inflated negative binomial regression or hurdle regression. They show empirically, based on their data, that when at-risk and not-at-risk patients are considered (the sample and structural zeroes together), then a zero-inflated negative binomial regression is better. When estimating data with only at-risk patients, hurdle regression seems to work better.
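The distinction between sample and structural zeroes can be seen by placing a hurdle model next to a zero-inflated one. The sketch below uses Poisson rather than negative binomial components to keep it short, and all parameter values and names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def hurdle_pmf(k, p_zero, lam):
    """Hurdle Poisson: one process decides zero vs. positive; positive counts
    follow a Poisson truncated at zero, so every zero has a single source."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

def zip_structural_share(pi, lam):
    """Under zero-inflated Poisson, the share of observed zeros that are
    structural (not-at-risk group) rather than sample zeros (at-risk group)."""
    sample_zero = (1.0 - pi) * math.exp(-lam)
    return pi / (pi + sample_zero)

# Illustrative values: 40% not at risk, rate of 2 events among those at risk.
structural = zip_structural_share(0.4, 2.0)
hurdle_total = sum(hurdle_pmf(k, 0.5, 2.0) for k in range(60))

print(f"structural share of ZIP zeros = {structural:.3f}")
print(f"hurdle probabilities sum to {hurdle_total:.6f}")
```

If the sample contains only at-risk units, the not-at-risk share is zero and no structural zeros remain to be modeled. This is in line with the finding that a hurdle specification can be adequate in that case.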

3.4 Unemployment

To wrap up our discussion of applied contexts that may offer useful modeling analogies for IS researchers who are interested in the application of count data modeling, we will next consider three studies that deal with labor and employment. (For a more in-depth review, see [169].) There is a substantial literature available that offers useful research designs, modeling structures, and estimation approach choices in which various count data models are employed.

An example of the issues and estimation structures that are employed is found in the work of Andress [4], who studied the recurrence of unemployment among West German men between 1977 and 1982. The author points out the contrast between event data, which are more micro-level and more informative, and count data, which are more aggregated and less informative. He also points out four other issues: underestimation of counts due to the use of retrospective data and only registered instances of unemployment; the sampling of employment status as an endogenous variable (which raises the question of the underlying structural model, and leaves the analysis open to over-sampling of higher-risk groups); one-shot observations of independent variables that actually change over time; and panel attrition of participants across the timeline of the study. The author distinguishes between two useful concepts. One is statistical dependence, which occurs when “events appear to be dynamically dependent, but in fact, some individuals have higher risks of experiencing an event than others” [4] due to some unobserved heterogeneity. The other is causal dependence, which occurs when there is something beyond the individual that is a clear driver of the outcome that is counted. In this context, an example is that prior spells of unemployment might drive the observation of recurring unemployment; in other words, there is some causal link. The author extends the Poisson regression model with a gamma mixing distribution to permit overdispersion of zero event counts. He refers to his approach as an apparent contagion model or a spurious occurrence dependence model, with the idea of emphasizing the effort he has made to sort out causal and statistical dependence.
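The effect of a gamma mixing distribution on a Poisson count can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not Andress's [4] estimation procedure: the mean rate, the gamma shape, the sample size and all function names are hypothetical. Each simulated individual receives a personal risk multiplier drawn from a gamma distribution with mean one, so that some individuals are more event-prone than others even though no event causes another.

```python
import math
import random
import statistics


def draw_poisson(lam, rng):
    """Draw one Poisson count by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(mean_rate, n, shape=None, seed=0):
    """Simulate n event counts.

    With shape=None every individual shares the same rate (pure Poisson).
    Otherwise each individual's rate is scaled by a gamma-distributed risk
    with mean 1 and variance 1/shape (unobserved heterogeneity).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        risk = 1.0 if shape is None else rng.gammavariate(shape, 1.0 / shape)
        counts.append(draw_poisson(mean_rate * risk, rng))
    return counts


def dispersion_ratio(counts):
    """Variance-to-mean ratio; close to 1 for Poisson data, above 1 when overdispersed."""
    return statistics.pvariance(counts) / statistics.mean(counts)


# Hypothetical settings: an average of 2 events per person, 20,000 people.
homogeneous = simulate_counts(mean_rate=2.0, n=20000)
heterogeneous = simulate_counts(mean_rate=2.0, n=20000, shape=1.0)
print("Poisson dispersion ratio:", round(dispersion_ratio(homogeneous), 2))
print("gamma-mixed dispersion ratio:", round(dispersion_ratio(heterogeneous), 2))
```

The gamma-mixed counts have a similar mean but a much larger variance. A pattern of this kind can look like contagion in aggregate data even though it is produced entirely by differences in individual risk, which is the sense of the term statistical dependence used above.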

Numerous observers have suggested that unemployment tends to predispose people to healthcare problems, since they are less likely to be able to visit the doctor or receive the appropriate kinds of hospital treatment that are called for in funded health insurance programs. Danö [53] evaluated causality in this setting, using panel data from 1981 to 1996 on a 10% sample of the entire population of Denmark, based on the frequency of doctor visits and consultations. The author uses multiple count data estimation approaches, two of which involve the negative binomial regression model: one with fixed effects and one with a Mundlak formulation random effects specification [132]. The latter formulation permits individual-specific effects to be correlated with time-varying explanatory variables in the model, something that was not possible with fixed effects at the time the author did this research. An interesting outcome of this research, which illustrates the power that refined econometric models have for challenging the conventional wisdom of an area, is that the author found no significant effects of unemployment on health among men and women, even after correcting for the possible correlations between individual-specific effects and the relevant explanatory variables. All of the count data models that the author used required the assumption of exogenous explanatory variables, so the author also estimated a reduced generalized method of moments (GMM) model that did not require this assumption, and was still able to establish the same general results. This approach in empirical research with count data models further emphasizes how post-estimation robustness checks can increase the perceived reliability of the statistical findings.
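The Mundlak formulation works by adding each individual's average of every time-varying regressor to a random effects specification, so that the individual effect is allowed to be correlated with those regressors. The sketch below shows only this data-construction step, not the estimation itself, and is not drawn from Danö [53]. The toy panel, the variable names (`person`, `unemployed`, `visits`) and the helper `add_mundlak_means` are all hypothetical.

```python
from collections import defaultdict


def add_mundlak_means(rows, id_key, time_varying_keys):
    """Return copies of the rows with person-level means of the time-varying regressors added.

    For each key in time_varying_keys a new column '<key>_mean' holds the
    average of that variable over all periods observed for the same person.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_periods = defaultdict(int)
    for row in rows:
        n_periods[row[id_key]] += 1
        for key in time_varying_keys:
            sums[row[id_key]][key] += row[key]

    augmented = []
    for row in rows:
        new_row = dict(row)  # copy, so the original panel is left unchanged
        for key in time_varying_keys:
            new_row[key + "_mean"] = sums[row[id_key]][key] / n_periods[row[id_key]]
        augmented.append(new_row)
    return augmented


# Hypothetical toy panel: two people observed in two years.
panel = [
    {"person": 1, "year": 1981, "unemployed": 0, "visits": 2},
    {"person": 1, "year": 1982, "unemployed": 1, "visits": 3},
    {"person": 2, "year": 1981, "unemployed": 0, "visits": 0},
    {"person": 2, "year": 1982, "unemployed": 0, "visits": 1},
]
augmented = add_mundlak_means(panel, "person", ["unemployed"])
for row in augmented:
    print(row)
```

In a count model, both `unemployed` and `unemployed_mean` would then enter as regressors for `visits`, with the coefficient on the mean term absorbing the correlation between the individual effect and unemployment status.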

3.5 Political economy

Another interesting work in the labor and political economy area explores the connection between the results of a variety of count data models and other associated events that occur later, but are subject to unknown lags. This is a useful method for IS and e-commerce researchers, since it is often the case that the observation of one kind of event, or a cluster of events, may be tied to other events that come later. An example is the signing on or the departure of members of a firm’s top management team once the chief executive officer resigns. Honaker [87] estimated a family of count data models in order to make the case that there is a causal link between the unemployment level in the population and the number of instances of political violence in Northern Ireland. In this research, the author employed several different models, including Poisson, negative binomial, zero-inflated Poisson and hurdle regression, and again showed the different efficacies of the various formulations based on the estimation outcomes. The author’s estimation approach also involved modeling instances in which one side retaliates against a violent attack by the other side, by building a lag structure for violence into a count data estimation model.

An important observation in this research, and one that should be relevant to other IS researchers who undertake e-commerce and technology adoption studies, is that lagging the dependent variable further increases the number of zero-valued dependent variable counts. The workaround suggested by the author is to compute a rolling average of the events over some period of time that makes sense for the study context. A second key observation that is useful for IS research, e-commerce and technology adoption modeling settings is to recognize that there will be a most likely time to the observation of the lagged event associated with the initial event. It may be typical to assume, as the author has, that the likelihood of a tied event diminishes after the occurrence of the first event. This observation may be helpful for those who are interested in studying time-clustered technology adoption that is subject to contagion effects from prior adoption events [124]. Honaker [87] evaluated the retaliation events with geometrically-distributed and quadratically-distributed lags, which are well-suited to creating reaction curves of the appropriate shape. This approach also provides a micro-level foundation for the theoretical explanation of aggregate behavior.

About this article

Cite this article

Kauffman, R.J., Techatassanasoontorn, A.A. & Wang, B. Event history, spatial analysis and count data methods for empirical research in information systems. Inf Technol Manag 13, 115–147 (2012). https://doi.org/10.1007/s10799-011-0106-5

Published: 28 September 2011
Issue Date: September 2012
DOI: https://doi.org/10.1007/s10799-011-0106-5
