Expressing causal relationships in conceptual database schemas
Introduction
Conceptual schema design is a crucial phase in the database design process. The quality of the final database (regardless of logical implementation model) and of its applications depends largely on the quality of the conceptual schema. A conceptual model is intended to serve as a formal representation of the requirements specifications. Hence, it is important that a conceptual schema capture the specified requirements as completely and unambiguously as possible (Jarvenpaa et al., 1989). The need to bridge the gap between requirements specifications and data modeling has been identified as a critical area of research (Navathe, 1992). This paper seeks to help bridge this gap by hypothesizing the need for additional cognitive relationship types in conceptual schemas, and demonstrating the need for one such construct, causation.
To understand the weakness of current data models, it is important to examine the kinds of relationships that are likely to be present in a requirements document. Managers and other end-users are typically not trained in database design, and therefore express requirements as they perceive them in the world using natural relationships among objects. These expressions of requirements reflect the natural relationship types humans use to organize knowledge. Research in fields as diverse as cognitive psychology, philosophy, and rhetoric has identified numerous such relationships between entities, such as causal, motivational, and hierarchical relationships (Brockriede et al., 1960; Browne et al., 1998; Curley et al., 1995). A list of these relationships appears in Table 1.
Given the breadth of relationship types that are likely to occur in requirements specifications, it is not surprising that semantically rich data models that permit explicit specification of cognitive relationships such as association (sign in Table 1), generalization/specialization (generalization and individuation in Table 1), and aggregation (various hierarchical relationships in Table 1) have been found to be better for conceptual schema design than traditional models such as relational, hierarchical, and network models (Hull et al., 1987). For example, in a study of the literature comparing the usability of various conceptual data models (traditional and semantic), Batra et al. (1994) found that semantic models such as the Entity–Relationship Model (Chen, 1976) and its derivatives were best suited for supporting conceptual database design. Further, Jarvenpaa et al. (1989) found that end-users were able to express relationships better using semantic models. Navathe (1992) identified five characteristics that a good conceptual model must possess: Expressiveness, Simplicity, Minimality, Formality, and Unique Interpretation. The key characteristic distinguishing semantic models from traditional models is the expressiveness of the relationship constructs supported by them (Burt et al., 1990). This expressiveness allows designers to create abstractions of real-world information by mapping that information into basic human concepts (Tsichritzis et al., 1982). Thus, a semantic model can better capture the user's perception of data relevant to an application (as defined by the requirements) (Navathe, 1992).
As noted, most semantic models allow explicit specification of association, generalization/specialization, and aggregation relationships. However, a review of Table 1 shows that these represent only a subset of the relationships that are likely to be used by people in describing an application environment. Hence, while semantic models may have higher image fidelity than traditional data models, i.e., schemas created using semantic models may conform better to users' views of the world (Everest, 1986), they are still limited in the types of relationships available in them. Thus, using current semantic models for conceptual schema design may result in abstractions of application environments in which some important information is either not represented or is represented inappropriately.
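As a rough illustration of the three relationship constructs named above, the following Python sketch models each one with plain data classes. All class and field names (Person, Employee, Department, Project, WorksOn) are hypothetical and chosen only for this example; they do not come from the paper, and a real semantic model would express these constructs diagrammatically rather than in code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Supertype in a generalization/specialization hierarchy (hypothetical)."""
    name: str


@dataclass
class Employee(Person):
    """Specialization: every Employee is also a Person."""
    salary: float = 0.0


@dataclass
class Project:
    """An independent entity that employees can be associated with."""
    code: str


@dataclass
class Department:
    """Aggregation: a department is a whole made up of employee parts."""
    title: str
    members: List[Employee] = field(default_factory=list)


@dataclass
class WorksOn:
    """Association: links one Employee to one Project."""
    employee: Employee
    project: Project
```

Note that none of these three constructs states that one thing brings about another; under this sketch, a causal statement would have to be forced into a generic association.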
One relationship type that is not supported in current semantic data models is causation. Causation is a fundamental aspect of cognition, and is the most common type of relationship revealed in studies of human reasoning (Curley et al., 1995; Schustack, 1988). For example, in an empirical study of managerial reasoning, two-thirds of the relationships expressed by subjects were causal in nature (Curley et al., 1995). Hence, causal relationships undoubtedly are part of users' problem representations, and it is likely that such relationships will be found in requirements specifications.
Data modelers are most likely to encounter causal relationships in the form of business rules (McFadden et al., 1999) or conditional requirements statements. Such a rule or statement, though not representing causality in its purest form, is an informal use of causation; it provides a condition whose presence makes a critical difference to the occurrence of an outcome (Schustack, 1988). The importance of causal statements in requirements documents is likely to increase in the future, because embedding business rules in the form of triggers is becoming increasingly prevalent in commercial databases.
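To make the link between a conditional business rule and a database trigger concrete, the sketch below encodes one hypothetical rule, "if an item's quantity falls below its reorder level, then a reorder is placed", as a SQLite trigger driven from Python's standard sqlite3 module. The tables, columns, and the rule itself are assumptions invented for illustration; they are not taken from the paper or from any particular commercial system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical causal rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (
            item_id       INTEGER PRIMARY KEY,
            quantity      INTEGER NOT NULL,
            reorder_level INTEGER NOT NULL
        );
        CREATE TABLE reorder (
            reorder_id INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id    INTEGER NOT NULL REFERENCES item(item_id)
        );
        -- Condition (cause): quantity crosses below reorder_level.
        -- Outcome (effect): a row is added to the reorder table.
        CREATE TRIGGER low_stock_causes_reorder
        AFTER UPDATE OF quantity ON item
        WHEN NEW.quantity < NEW.reorder_level
             AND OLD.quantity >= OLD.reorder_level
        BEGIN
            INSERT INTO reorder (item_id) VALUES (NEW.item_id);
        END;
        """
    )
    return conn


def reorder_count(conn: sqlite3.Connection) -> int:
    """Return how many reorders the trigger has generated so far."""
    return conn.execute("SELECT COUNT(*) FROM reorder").fetchone()[0]
```

In this sketch the rule fires only when the quantity crosses the threshold, so further decreases below the reorder level do not create duplicate reorders. The causal statement lives entirely in the trigger; nothing in the two table definitions records that low stock leads to a reorder.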
Although the pervasiveness of causation in problem solvers' representations has been empirically demonstrated (e.g., Curley et al., 1995; Tversky et al., 1980; Wilkin, 1996), none of the models used for database design provide sufficient means for capturing causal relationships (Hull et al., 1987). The inability to express these relationships is likely to lead to conceptual schemas that do not completely represent the requirements. The focus of this paper is on investigating the effects of the lack of constructs in semantic models for capturing causation on analysts' ability to express causal relationships mentioned in a requirements document.
Hypotheses
Two groups of subjects were sought for the study, one familiar with semantic data modeling techniques and one unfamiliar. The rationale for the two groups was as follows. Research has demonstrated that people organize information using causation under appropriate circumstances (Schustack, 1988). Hence, subjects unfamiliar with database modeling (the database-naive group) should use causal relationships as naturally appropriate in modeling an application environment. However, because current
Methodology
An experimental hypothesis testing methodology was used to investigate users' ability to represent causal relationships in an application scenario. Subjects were 78 students recruited from information systems classes at an eastern US university who received course credit for their participation. Subjects were categorized as either database-naive (“Naive”) or database-knowledgeable (“Knowledgeable”) for purposes of analysis. A brief questionnaire distributed after the experimental task was used
Results
As a check on whether the naive group was handicapped by the lack of formal training in modeling, we tested to see whether members of the two groups captured the essential entities in the model to the same extent. Six entities were identified by the researchers as critical to representing the content of the scenario. The number of entities expressed by each subject was tallied (to be counted, the entity had to be explicitly stated by the subject). The mean number of entities expressed by group
Conclusions and future research
Our objective in this paper has been to investigate the effects of the lack of causal constructs in semantic models on analysts' ability to express causal relationships in conceptual schemas. We reported the outcome of an experiment that examined the extent to which database-naive and database-knowledgeable people were able to model causal statements embedded in a short case. We found that the conceptual representations created by the naive subjects expressed causal relationships better than
References (17)
- et al. Effects of data model and task characteristics on designer performance: a laboratory study. International Journal of Human-Computer Studies (1994)
- et al. Data analysis and learning: an experimental study of data modeling tools. International Journal of Man-Machine Studies (1989)
- Brockriede, W., Ehninger, D., 1960. Toulmin on argument: an interpretation and application. Quarterly Journal of...
- et al. Evoking information in probability assessment: knowledge maps and reasoning-based directed questions. Management Science (1997)
- Browne, G.J., Curley, S.P., 1998. Reasoning with category knowledge in probability forecasting: typicality and...
- et al. Information models and modeling techniques for information systems. Annual Review of Information Science and Technology (1990)
- The entity-relationship model: toward a unified view of data. ACM Transactions on Database Systems (1976)
- et al. Arguments in the practical reasoning underlying constructed probability responses. Journal of Behavioral Decision Making (1995)
Cited by (10)
Improving information requirements determination: A cognitive perspective
2002, Information and Management
A research note on representing part-whole relations in conceptual modeling
2012, MIS Quarterly: Management Information Systems
Knowledge representation: A conceptual modeling approach
2012, Journal of Database Management
Conceptual modeling of events for active information systems
2008, Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications
Ontological foundations for active information systems
2007, International Journal of Intelligent Information Technologies
A contingency model for requirements development
2007, Journal of the Association for Information Systems
V. Ramesh is an Assistant Professor in the Department of Accounting and Information Systems, Kelley School of Business at Indiana University. His research interests are in heterogeneous databases, database modeling, and group support systems. His papers have been published in ACM Transactions on Information Systems, IEEE Expert, Information Systems and other journals. He received his Ph.D. in Business Administration (MIS) from the University of Arizona. He also holds an M.S. in Computer Science from the University of Iowa and a B.E. in Computer Science from the Birla Institute of Technology, Mesra (Ranchi), India.
Glenn J. Browne received his Ph.D. in MIS and Decision Sciences from the University of Minnesota. His research interests include systems development, semantic modeling, and basic decision-making processes. His papers have appeared in Management Science and other journals.