Expressing causal relationships in conceptual database schemas

doi:10.1016/S0164-1212(98)10081-X

Journal of Systems and Software

Volume 45, Issue 3, 15 March 1999, Pages 225-232

https://doi.org/10.1016/S0164-1212(98)10081-X

Abstract

Conceptual schema design is a crucial phase in the database design process. The quality of the final database, regardless of the logical implementation model, depends largely on the quality of the conceptual schema. Since conceptual schemas serve as formal representations of the requirements specification for a database, it is critical that a schema capture the requirements as completely and unambiguously as possible. Many studies have shown that semantic models, such as the Extended Entity–Relationship model, are better suited to conceptual database design than traditional models such as the relational, hierarchical, and network models. This is primarily because of their ability to capture explicitly many "natural" cognitive relationship types that are likely to occur in requirements specifications, e.g., association, generalization/specialization, and aggregation. However, the relationships that can be specified in a semantic model represent only a subset of those that people are likely to use in describing an application environment. Thus, using current semantic models for conceptual database design may result in abstractions of application environments in which some important information from the requirements is either not represented or is represented inappropriately.

This paper seeks to help bridge the gap between requirements specifications and data modeling by hypothesizing the need to support additional cognitive relationship types in conceptual models. We demonstrate the need for one such relationship type, causation. Specifically, we investigate how the lack of constructs for capturing causation in semantic models affects analysts' ability to express causal relationships mentioned in a requirements document.

We found that subjects not familiar with data modeling expressed causal relationships better in their representations than did subjects who had some prior exposure to data modeling. This suggests that the lack of constructs for capturing causation in semantic models hinders the ability of people trained in data modeling techniques to recognize and express causal relationships in conceptual schemas. The results also point to the need to develop semantic models that provide constructs for capturing causation and other cognitive relationships.

Introduction

Conceptual schema design is a crucial phase in the database design process. The quality of the final database (regardless of the logical implementation model) and of its applications depends largely on the quality of the conceptual schema. A conceptual model is intended to serve as a formal representation of the requirements specification. Hence, it is important that a conceptual schema capture the specified requirements as completely and unambiguously as possible (Jarvenpaa et al., 1989). Bridging the gap between requirements specifications and data modeling has been identified as a critical area of research (Navathe, 1992). This paper seeks to help bridge this gap by hypothesizing the need for additional cognitive relationship types in conceptual schemas and by demonstrating the need for one such construct, causation.

To understand the weakness of current data models, it is important to examine the kinds of relationships that are likely to be present in a requirements document. Managers and other end-users are typically not trained in database design and therefore express requirements as they perceive them, using natural relationships among objects in the world. These expressions of requirements reflect the natural relationship types humans use to organize knowledge. Research in fields as diverse as cognitive psychology, philosophy, and rhetoric has identified numerous such relationships between entities, such as causal, motivational, and hierarchical relationships (Brockriede et al., 1960; Browne et al., 1998; Curley et al., 1995). A list of these relationships appears in Table 1.

Given the breadth of relationship types likely to occur in requirements specifications, it is not surprising that semantically rich data models, which permit explicit specification of cognitive relationships such as association (sign in Table 1), generalization/specialization (generalization and individuation in Table 1), and aggregation (various hierarchical relationships in Table 1), have been found to be better for conceptual schema design than traditional models such as the relational, hierarchical, and network models (Hull et al., 1987). For example, in a study of the literature comparing the usability of various conceptual data models (traditional and semantic), Batra et al. (1994) found that semantic models such as the Entity–Relationship Model (Chen, 1976) and its derivatives were best suited for supporting conceptual database design. Further, Jarvenpaa et al. (1989) found that end-users were able to express relationships better using semantic models. Navathe (1992) identified five characteristics that a good conceptual model must possess: Expressiveness, Simplicity, Minimality, Formality, and Unique Interpretation. The key characteristic distinguishing semantic models from traditional models is the expressiveness of the relationship constructs they support (Burt et al., 1990). This expressiveness allows designers to create abstractions of real-world information by mapping that information into basic human concepts (Tsichritzis et al., 1982). Thus, a semantic model can better capture the user's perception of the data relevant to an application, as defined by the requirements (Navathe, 1992).

As noted, most semantic models allow explicit specification of association, generalization/specialization, and aggregation relationships. However, a review of Table 1 shows that these represent only a subset of the relationships that people are likely to use in describing an application environment. Hence, while semantic models may have higher image fidelity than traditional data models, i.e., schemas created using semantic models may conform better to users' views of the world (Everest, 1986), they remain limited in the relationship types they offer. Thus, using current semantic models for conceptual schema design may result in abstractions of application environments in which some important information is either not represented or is represented inappropriately.¹
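
The correspondence described above can be made concrete with a small sketch. It assumes a simplified encoding of our own: the names `Construct`, `TABLE1_TO_CONSTRUCT`, and `expressible` are hypothetical, and only the Table 1 relationship types explicitly paired with constructs in the preceding discussion (plus causal) are included. A type mapped to `None` has no dedicated construct in a typical EER-style semantic model.

```python
from enum import Enum
from typing import Optional


class Construct(Enum):
    """Relationship constructs commonly supported by EER-style semantic models."""
    ASSOCIATION = "association"
    GENERALIZATION_SPECIALIZATION = "generalization/specialization"
    AGGREGATION = "aggregation"


# Hypothetical mapping from a subset of Table 1 relationship types to
# semantic-model constructs, following the pairings given in the text.
# None marks a relationship type with no dedicated construct.
TABLE1_TO_CONSTRUCT: dict[str, Optional[Construct]] = {
    "sign": Construct.ASSOCIATION,
    "generalization": Construct.GENERALIZATION_SPECIALIZATION,
    "individuation": Construct.GENERALIZATION_SPECIALIZATION,
    "hierarchical": Construct.AGGREGATION,
    "causal": None,
}


def expressible(relationship_type: str) -> bool:
    """Return True if the relationship type maps to a supported construct."""
    return TABLE1_TO_CONSTRUCT.get(relationship_type) is not None


# Relationship types that a designer would have to leave out or force into
# some other construct when building the schema.
unsupported = [t for t, c in TABLE1_TO_CONSTRUCT.items() if c is None]
print(unsupported)  # ['causal']
```

In this encoding, a requirement phrased causally has nowhere to go except into an ill-fitting association, which is the kind of inappropriate representation the paragraph above warns about.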

One relationship type that is not supported in current semantic data models is causation. Causation is a fundamental aspect of cognition and is the most common type of relationship revealed in studies of human reasoning (Curley et al., 1995; Schustack, 1988). For example, in an empirical study of managerial reasoning, two-thirds of the relationships expressed by subjects were causal in nature (Curley et al., 1995). Hence, causal relationships are undoubtedly part of users' problem representations, and such relationships are likely to appear in requirements specifications.

Data modelers are most likely to encounter causal relationships in the form of business rules (McFadden et al., 1999) or conditional requirements statements. Such a rule or statement, though not representing causality in its purest form, is an informal use of causation; it provides a condition whose presence makes a critical difference to the occurrence of an outcome (Schustack, 1988). The importance of causal statements in requirements documents is likely to increase in the future, because embedding business rules in the form of triggers is becoming increasingly prevalent in commercial databases.
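
As an illustration of a conditional business rule embedded as a trigger, the following minimal sketch uses Python's built-in `sqlite3` module. The rule, table names, and column names (`inventory`, `reorders`, `reorder_point`, `low_stock_reorder`) are hypothetical and are not taken from the study's scenario: when an item's quantity falls below its reorder point (the condition), a reorder record is created (the outcome).

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding a hypothetical reorder business rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inventory (
            item_id       INTEGER PRIMARY KEY,
            quantity      INTEGER NOT NULL,
            reorder_point INTEGER NOT NULL
        );
        CREATE TABLE reorders (
            reorder_id INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id    INTEGER NOT NULL REFERENCES inventory(item_id)
        );
        -- Business rule: if the quantity of an item drops below its reorder
        -- point, then a reorder is generated. The WHEN clause is the
        -- condition; the INSERT is the outcome it brings about.
        CREATE TRIGGER low_stock_reorder
        AFTER UPDATE OF quantity ON inventory
        WHEN NEW.quantity < NEW.reorder_point
             AND OLD.quantity >= OLD.reorder_point
        BEGIN
            INSERT INTO reorders (item_id) VALUES (NEW.item_id);
        END;
        """
    )
    return conn


conn = build_db()
conn.execute("INSERT INTO inventory VALUES (1, 10, 5)")
conn.execute("UPDATE inventory SET quantity = 7 WHERE item_id = 1")  # still above the reorder point
conn.execute("UPDATE inventory SET quantity = 3 WHERE item_id = 1")  # crosses the reorder point
print(conn.execute("SELECT item_id FROM reorders").fetchall())  # [(1,)]
```

The `OLD.quantity >= OLD.reorder_point` test makes the rule fire only when the threshold is crossed, not on every update while stock stays low. Note that the cause-and-effect link lives only in procedural trigger code; nothing in the conceptual schema records it.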

Although the pervasiveness of causation in problem solvers' representations has been empirically demonstrated (e.g., Curley et al., 1995; Tversky et al., 1980; Wilkin, 1996), none of the models used for database design provide sufficient means for capturing causal relationships (Hull et al., 1987). The inability to express these relationships is likely to lead to conceptual schemas that do not completely represent the requirements. This paper investigates how the lack of constructs for capturing causation in semantic models affects analysts' ability to express causal relationships mentioned in a requirements document.
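
Purely to illustrate what a causal construct might record, the sketch below defines a hypothetical `CausalRelationship` alongside ordinary entities. It is not a construct from any existing semantic model, nor the one used in this study; the names `Entity`, `CausalRelationship`, `Schema`, and `add_causal`, and the example rule, are all assumptions. Following the informal sense of causation above, it pairs a condition on one entity with the outcome it brings about on another.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class CausalRelationship:
    """Hypothetical construct: a condition on `cause` brings about `outcome` on `effect`."""
    cause: Entity
    condition: str
    effect: Entity
    outcome: str

    def as_rule(self) -> str:
        """Render the relationship as an if-then requirement statement."""
        return (f"IF {self.cause.name}: {self.condition} "
                f"THEN {self.effect.name}: {self.outcome}")


@dataclass
class Schema:
    entities: set[Entity] = field(default_factory=set)
    causal: list[CausalRelationship] = field(default_factory=list)

    def add_causal(self, rel: CausalRelationship) -> None:
        """Add a causal relationship; both participants must already be schema entities."""
        missing = {rel.cause, rel.effect} - self.entities
        if missing:
            raise ValueError(f"unknown entities: {sorted(e.name for e in missing)}")
        self.causal.append(rel)


# Hypothetical example, not drawn from the study's case.
account = Entity("Account")
customer = Entity("Customer")
schema = Schema({account, customer})
rule = CausalRelationship(account, "balance overdue 60 days", customer, "credit suspended")
schema.add_causal(rule)
print(rule.as_rule())
# IF Account: balance overdue 60 days THEN Customer: credit suspended
```

Making the relationship a first-class schema element, rather than a comment or trigger, is what would let a causal requirement survive into the conceptual schema.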

Section snippets

Hypotheses

Two groups of subjects were sought for the study, one familiar with semantic data modeling techniques and one unfamiliar. The rationale for the two groups was as follows. Research has demonstrated that people organize information using causation under appropriate circumstances (Schustack, 1988). Hence, subjects unfamiliar with database modeling (the database-naive group) should naturally use causal relationships where appropriate when modeling an application environment. However, because current

Methodology

An experimental, hypothesis-testing methodology was used to investigate users' ability to represent causal relationships in an application scenario. Subjects were 78 students recruited from information systems classes at an eastern US university who received course credit for their participation. For purposes of analysis, subjects were categorized as either database-naive ("Naive") or database-knowledgeable ("Knowledgeable"). A brief questionnaire distributed after the experimental task was used

Results

As a check on whether the naive group was handicapped by its lack of formal training in modeling, we tested whether members of the two groups captured the essential entities in the model to the same extent. Six entities were identified by the researchers as critical to representing the content of the scenario. The number of entities expressed by each subject was tallied (to be counted, an entity had to be explicitly stated by the subject). The mean number of entities expressed by group
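
The tallying procedure can be sketched as follows, assuming each subject's representation has already been reduced to the set of entity names it explicitly states. The six names below are placeholders, since the study's actual critical entities are not listed in this excerpt, and `count_critical` and `group_means` are hypothetical helpers; no study data is reproduced.

```python
from statistics import mean

# Placeholder names standing in for the six critical entities identified by
# the researchers; the actual entity names are not given in this excerpt.
CRITICAL_ENTITIES = frozenset({"E1", "E2", "E3", "E4", "E5", "E6"})


def count_critical(stated: set[str]) -> int:
    """Count the critical entities explicitly stated in one subject's representation.

    Entities not in CRITICAL_ENTITIES are ignored, and each critical entity
    counts at most once.
    """
    return len(CRITICAL_ENTITIES & stated)


def group_means(subjects: dict[str, list[set[str]]]) -> dict[str, float]:
    """Mean number of critical entities expressed per subject, for each group."""
    return {group: mean(count_critical(rep) for rep in reps)
            for group, reps in subjects.items()}
```

Called with a mapping such as `{"Naive": [...], "Knowledgeable": [...]}`, `group_means` yields the per-group means that a significance test would then compare.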

Conclusions and future research

Our objective in this paper has been to investigate the effects of the lack of causal constructs in semantic models on analysts' ability to express causal relationships in conceptual schemas. We reported the outcome of an experiment that examined the extent to which database-naive and database-knowledgeable people were able to model causal statements embedded in a short case. We found that the conceptual representations created by the naive subjects expressed causal relationships better than

V. Ramesh is an Assistant Professor in the Department of Accounting and Information Systems, Kelley School of Business at Indiana University. His research interests are in heterogeneous databases, database modeling, and group support systems. His papers have been published in ACM Transactions on Information Systems, IEEE Expert, Information Systems and other journals. He received his Ph.D. in Business Administration (MIS) from the University of Arizona. He also holds an M.S. in Computer Science

References (17)

Batra, D., et al., 1994. Effects of data model and task characteristics on designer performance: a laboratory study. International Journal of Human-Computer Studies.
Jarvenpaa, S., et al., 1989. Data analysis and learning: an experimental study of data modeling tools. International Journal of Man-Machine Studies.
Brockriede, W., Ehninger, D., 1960. Toulmin on argument: an interpretation and application. Quarterly Journal of...
Browne, G.J., et al., 1997. Evoking information in probability assessment: knowledge maps and reasoning-based directed questions. Management Science.
Browne, G.J., Curley, S.P., 1998. Reasoning with category knowledge in probability forecasting: typicality and...
Burt, P.V., et al., 1990. Information models and modeling techniques for information systems. Annual Review of Information Science and Technology.
Chen, P.P., 1976. The entity-relationship model: toward a unified view of data. ACM Transactions on Database Systems.
Curley, S.P., et al., 1995. Arguments in the practical reasoning underlying constructed probability responses. Journal of Behavioral Decision Making.

There are more references available in the full text version of this article.

Cited by (10)

Improving information requirements determination: A cognitive perspective
2002, Information and Management
Requirements determination is a critical phase of information systems development, but much evidence suggests that the process can and should be improved. Because the bulk of requirements determination occurs early in the development of a system, improvements can yield significant benefits for the entire systems development process. This paper first discusses a three-stage descriptive model of the requirements determination process. Four classes of difficulties in determining systems requirements are then used to organize and describe particular problems that occur within each stage of the process, together with the cognitive and behavioral theories that underlie them. The paper then describes techniques that can address the problems and presents theoretical considerations that analysts can use in applying the techniques to improve requirements determination.
A research note on representing part-whole relations in conceptual modeling
2012, MIS Quarterly: Management Information Systems
Knowledge representation: A conceptual modeling approach
2012, Journal of Database Management
Conceptual modeling of events for active information systems
2008, Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications
Ontological foundations for active information systems
2007, International Journal of Intelligent Information Technologies
A contingency model for requirements development
2007, Journal of the Association for Information Systems

Glenn J. Browne received his Ph.D. in MIS and Decision Sciences from the University of Minnesota. His research interests include systems development, semantic modeling, and basic decision-making processes. His papers have appeared in Management Science and other journals.
