1 Introduction

Healthcare has long strived to make information promptly available at the point of care, to ensure continuity of patient treatment and related measures. Information sharing across the entire continuum of care has become a critical demand. In this regard, considerable effort has been made over the last few decades to provide standards that bring interoperability to healthcare delivery. Health Level Seven (HL7) is one of the leading healthcare standards, addressing the exchange of information among disparate healthcare organizations (Health Level 7). The major causes of the inability to share information are the lack of a standard format for healthcare information and the lack of an infrastructure to enable sharing (Wu and Hadzic 2008). HL7 provides a well-defined infrastructure to support interoperable communication among healthcare systems.

HL7 Version 3 (V3) adopts an object-oriented approach using Unified Modeling Language (UML) principles. HL7 V3 strives to achieve semantic interoperability through the Reference Information Model (RIM) (HL7 Reference Information Model 2003). The HL7 RIM is the source of all the information subjects used in HL7 specifications, in the form of classes, attributes and relationships. The purpose of a reference model is to share consistent meaning beyond a local context. Healthcare organizations mostly store their data in traditional relational databases, which hold information in the form of tables and their constituent fields. Transferring data between systems in a meaningful manner is a critical aspect of information sharing, so to enable medical databases to easily exchange information we need standard formats. The HL7 RIM is a comprehensive UML model representing concepts of the healthcare domain through classes, containing their attributes and connected by associations. Therefore, there exist either one-to-one or one-to-many correspondences between the fields of a clinical data model and the RIM.

Achieving interoperability through HL7 RIM and database mapping is an appealing but complex problem. The RIM is the foundation of healthcare interoperability and covers a vast domain. There are problems with the RIM documentation, which is not only unclear but also poorly integrated with other parts of the HL7 documentation (Smith and Ceuster 2006). Manually tracing out relevant classes and attributes requires considerable effort in understanding the RIM concepts. In addition, identifying the most appropriate match in the local clinical database is a mammoth task. The major challenges in mapping the RIM to clinical databases are as follows:

  1. The RIM, encompassing a vast domain, is multifarious and complex.

  2. Databases are highly heterogeneous with respect to the data models, the data schemas, the query languages they support, and the terminologies they recognize (Sujansky 2001).

  3. Different databases with the same real-world semantics may have different schema structures.

  4. There is currently no tool available to automate the RIM and database schema mapping process.

In this paper, we address the issue of aligning clinical relational databases with the HL7 RIM to achieve interoperability. The manual process of identifying RIM-to-schema mappings is tedious, time-consuming, error-prone and expensive. To reduce the manual effort, we propose a technique that performs the mappings semi-automatically and generates custom code to enable the transactions without development effort or manual intervention. The proposed approach exploits database catalog information and metadata such as element names, data types and structural properties in the mapping process. In summary, the proposed technique:

  1. Loads the schema of the clinical database into the application.

  2. Automatically detects the mappings between the clinical database and the RIM by applying different mapping strategies. A mapping knowledge repository is established after analyzing several existing healthcare databases and their mappings with the HL7 RIM; its ultimate purpose is to further improve the accuracy of the mapping process.

  3. Allows the user, through the interface, to validate or edit/re-map the correspondences.

  4. Once all the mappings are defined, generates a Mapping Specification containing all the mapping information, i.e. database tables and fields with their associated RIM classes and attributes.

  5. To enable the transactions, provides autonomous code generation from the Mapping Specification. The Code Generator component focuses primarily on generating custom classes and Hibernate mapping files for the runtime system to store/retrieve data from the source database.

RIM and database schema mappings help in sharing and exchanging information among different healthcare systems and in making real-time decisions. The approach minimizes user intervention, saves time and reduces errors. This tool will help bridge existing relational databases with HL7 V3 messaging, making it easy to integrate with an organization's existing infrastructure. The rest of the paper is structured as follows: Section 2 presents an analysis of existing healthcare databases and the issues involved in mapping them to the HL7 RIM. Section 3 explains our approach for autonomous mapping generation, and Section 4 validates it using formal specifications. Section 5 walks through an application scenario. Related work is discussed in Section 6 and the experimental results are shown in Section 7. Finally, Section 8 elucidates the conclusion and future work.

2 Analysis of healthcare databases

To ground our understanding of healthcare databases and deliver a complete solution, we analyzed several functioning laboratory databases. We used a renowned clinical laboratory as a case study; it has several branches and collection points in different cities of Pakistan. Each branch/collection point runs a laboratory information system to manage and process patient test data, but transferring test orders and test results from one branch/collection point to the branch where the tests are performed is a major problem. Currently, the laboratory uses a mailing service for collecting test orders and specimens, and phone calls for dictating test results.

2.1 RIM and schema mapping—a case study

The database used as a case study runs in a Windows environment and uses the Microsoft SQL Server relational database management system (ver. 2000). The data model underlying the case is centered on the concepts of a “test order” and “test results”. The database comprises about eighty tables; some of the tables are locally managed and need not be mapped to the RIM. About 25 tables need to be mapped to the RIM to support HL7 messaging of test orders and test results between the different collection points and the central unit of the Lab.

2.1.1 Placer order

When a patient visits the Lab to place a test order, the Lab stores certain information regarding the patient's demographics, the test and the specimen. When we map this information to the HL7 RIM, we find it in the PlacerOrder RMIM for observation orders (Laboratory Domain 2008) under the Lab domain. The corresponding classes in the RMIM include Person, ObservationRequest, Specimen, etc. In the following section, we provide mapping details for some of the important parameters.

For the PATIENT table, the corresponding classes in the RMIM are Person (Entity) and PatientClinical (Role). The Person class has an attribute “name” with data type BAG, to which we can map the patient's name. For the column Sex, there is an attribute “administrativeGenderCode” with data type CE; CE is coded data that consists of a coded value. For administrativeGenderCode, the values are F (Female), M (Male) and UN (Undifferentiated). The same representation is followed in the Lab, with the exception of the last value, UN. The Address column is mapped to the “addr” attribute, which has data type BAG. The Fax column is mapped to “telecom”, which also has the BAG data type; Mobile and PhoneNo are likewise mapped to “telecom”. Since telecom is a BAG, it can accumulate multiple column values. The remaining attributes in the Person class are not required, as those concepts are not used in the Lab.
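As a small illustration, the translation of a local Sex column to administrativeGenderCode values could be sketched roughly as follows (a hypothetical sketch; the class and method names are illustrative and not part of the paper's tool, and the fallback to UN is an assumption about how locally absent values might be handled):

```java
// Hypothetical sketch: translating the Lab's Sex column values to the
// HL7 V3 administrativeGenderCode value set (F, M, UN).
public class GenderCodeMapper {
    /** Maps a local Sex column value to an HL7 administrativeGenderCode. */
    public static String toAdministrativeGenderCode(String localSex) {
        if (localSex == null) return "UN";
        switch (localSex.trim().toUpperCase()) {
            case "F": case "FEMALE": return "F";
            case "M": case "MALE":   return "M";
            default:                 return "UN"; // not used locally; assumed fallback
        }
    }
}
```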

There exist many-to-many relationships between HL7 concepts and database tables; in other words, an HL7 concept may correspond to one or more database tables and vice versa. For example, tblPatient is mapped to two classes, Patient and Person. Further, attributes in different classes may have the same representation; e.g. there is an ‘addr’ attribute in the Person class as well as in the Patient class, but each has its own meaning and usage: ‘addr’ in the Person class means the postal/residential address, whereas ‘addr’ in the Patient class means the address of the entity while in that role (Laboratory Domain 2008).

Similarly, we have mapped other tables and their corresponding attributes, such as Test and Specimen; these mappings are shown partially in Table 1.

Table 1 Partial view of mapping

2.1.2 Result event

In this section, we explain the Result Event with respect to Lab test result processing. The ObservationEvent (Act) represents a related group of observations. Observations are actions performed in order to determine result value(s); observation result values (Observation.value) include test results. We select the ObservationEvent from the ObservationEventChoice, as it fulfils our requirement of carrying the actual result. A test may consist of multiple test parameters, which can be accommodated by the recursive relationship facility provided by the RIM. For a particular test, the following participants are involved in the Lab:

A data enterer in the Lab enters the data; in the Laboratory RMIM, we relate this to the dataEnterer participation. A doctor in the Lab verifies all test results before they are delivered to the concerned party, either a hospital or a patient; in the Lab RMIM, we relate this to the verifier participation.

In the Lab, some tests are performed by lab technicians using certain chemical reagents and some are performed by machines independently; in the Lab RMIM, we relate both to the author participation. When a test is performed for a particular patient, we relate the Patient class (Role) and the ObservationEvent (Act) through the recordTarget participation. The majority of tests require a specimen; the specimen participation is used to identify the specimen on which the observations are made. In the Lab, specimens are kept in bottles with an identification number that identifies the specimen; the Container (Entity) in the R_Specimen CMET is used for this purpose.

These conceptual mappings paved the way to understanding real-world database variations in healthcare and to building the Mapping Knowledge Repository, which contains healthcare-related terms and their variations.

3 Proposed architecture

Based on our previous work (Umer et al. 2009, 2010) on mapping between relational databases and the HL7 RIM, we enhanced the proposed architecture by refining the existing components as well as adding new components and mapping strategies. The major components of the proposed architecture, depicted in Fig. 1, are discussed in detail below.

Fig. 1 RSM Architecture

3.1 Mapping knowledge repository

The Mapping Knowledge Repository (MKR) stores the possible mappings found in clinical laboratory databases. It was built by analyzing functioning laboratory databases and is designed to be extended with more mapping knowledge through community involvement. In the following section, we elaborate the structure and the major functions of the knowledge repository.

  a. Structure of the Mapping Knowledge Repository

    The MKR provides two essential functions: one is to store mapping knowledge, the other is to accommodate various input/output functions that let users view, edit and create new mappings. The MKR is based on XML; its partial structure is shown in Fig. 2.

    Fig. 2 Structure of Repository

    We selected XML for its flexibility and extensibility, since the MKR must be extended as it evolves. An XML repository retrieval system can receive new documents containing new elements without breaking the system (XML database).

  b. Integrated Searching & Browsing Interface

    A GUI is provided to browse and search the MKR. When the user searches for an element, its complete information is displayed. For instance, if the user searches for specimen information, complete information regarding the Specimen class and its attributes is displayed, along with descriptions and data types. If the user is looking for a particular attribute, its description along with its parent class is displayed. This provides better insight for sorting out the most appropriate attribute for a particular clinical database element. To ease the user's task, we have made the following enrichments.

    • Categorize the RIM classes: RIM classes are organized in a hierarchy so that similar information is kept together. This helps the user find the appropriate mappings.

    • Provide user-level descriptions: Despite the RIM's complexity, the descriptions provided to the user are simple and clear, helping the user identify correct mappings for unmapped attributes. E.g. the ‘telecom’ attribute is explained as “A telephone number (voice or fax), e-mail address, or other locator for a resource mediated by telecommunication equipment”, which tells the user that any contact number or e-mail can be mapped to this attribute. The tool also displays the appropriate mapping choices to the user.

  c. Evolution of Mapping Knowledge Repository

    The evolution of the MKR involves the community through a registration process. Registered users are allowed to update the MKR by adding new matching columns and tables found in their clinical databases. Besides this, the MKR is updated automatically while the tool performs mappings, i.e. when the user edits or re-maps as explained in section 3.4. The evolution of the MKR will improve its matching results. Users can request the updates made to the MKR.

    The MKR is enriched with RIM content representation and navigation. As the system is continuously refined, we plan to conduct tests and evaluations of the mappings in the knowledge repository with real users' participation. We expect that user evaluation will provide valuable input for the enhancement of the MKR.
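Putting together the structure described in (a), a single MKR entry might look like the following sketch (the element and attribute names here are illustrative assumptions; the actual structure is the one shown in Fig. 2):

```xml
<!-- Hypothetical MKR fragment: one RIM class with known column-name variants -->
<mkr>
  <class rim="Person">
    <attribute rim="name" datatype="BAG">
      <variant>name</variant>
      <variant>pname</variant>
      <variant>patname</variant>
    </attribute>
    <attribute rim="administrativeGenderCode" datatype="CE">
      <variant>sex</variant>
      <variant>gender</variant>
    </attribute>
  </class>
</mkr>
```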

3.2 Schema loader

The Schema Loader loads the data model of the target clinical database and the corresponding MIF as discussed earlier. This component extracts schema information regarding tables, columns, data types, etc. Loading the schema is a prerequisite step for mapping.

3.3 Mapping controller

This component is the core engine of the process; it exploits database constraints at the structural level (Rahm and Bernstein 2001). For every column in the database, it searches for the most appropriate match by applying pattern matching algorithms and binds the column to the RMIM accordingly. We use existing pattern matching algorithms, with certain modifications to address our needs; our matching algorithms include the concept of n-gram matching (N gram) and the Boyer–Moore algorithm (Boyer Moore Algorithm). We have modified them according to our matching criteria and allow the user to select different matching strategies. For example, in Partial Match we use the n-gram concept to find a pattern that is a subset of an element found in the MKR. This component uses the MKR for its processing, as shown in Fig. 1.
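As an illustration of the n-gram idea, the following sketch computes a Dice-style bigram similarity between two element names. The exact scoring used by the Mapping Controller is not specified in the paper; this is an assumed variant for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative n-gram (here bigram) similarity between element names.
public class NGramMatcher {
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        String t = s.toLowerCase();
        for (int i = 0; i + 2 <= t.length(); i++) grams.add(t.substring(i, i + 2));
        return grams;
    }

    /** Returns a similarity score in [0, 1] between two element names. */
    public static double similarity(String a, String b) {
        Set<String> ga = bigrams(a), gb = bigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);                       // shared bigrams
        return 2.0 * common.size() / (ga.size() + gb.size());
    }
}
```

With such a score, `patname` and `patientname` rank far closer to each other than either does to an unrelated column name.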

  • Data Types Handler: Most RIM data types, such as ED (Encapsulated Data) and CS (Coded Simple), are complex and differ from the data types supported by an RDBMS. When local clinical data types are mapped to HL7, the semantics must be carefully preserved, which is the ultimate goal of HL7 V3 interoperability. For example, “Patient Name” is normally stored in local clinical databases as a single column of type varchar, whereas HL7 adds semantics by defining the person name parts, like given name, family name and prefix, under a BAG data type. The Data Types Handler addresses this complexity by mapping the name components in the database to the defined parts of the HL7 name. The other data types are handled by similar well-defined strategies.

  • Name Handler: This component tries to find the corresponding patterns of a table/column name in the RMIM for appropriate mapping. Different pattern matching rules are defined, each with an attached confidence value indicating how accurate the rule's mappings are. If the confidence is high, we perform the mapping automatically; if it is low, we confirm the match with the user to improve accuracy. The matching rules and confidence values are briefly explained as follows:

    i. Exact Match Rule: We have assigned this rule a confidence value of 1. It finds the exact pattern in the MKR. For example, the Knowledge Repository stores mappings for patient name as name, pname, patname, etc. If an element named pname or patname is found in the database, this is an exact match.

    ii. Pattern Match Rule: This rule traces out a pattern within a database element, i.e. the complete pattern is found in the element plus some other characters; its confidence value is 0.75. For example, the Knowledge Repository stores mappings for patient name as name, pname, patname, etc. If an element named patientname is found in the database, it contains the pattern ‘name’ plus ‘patient’.

    iii. Synonyms Match Rule: This rule matches a synonym of the database element; its confidence value is 0.5. For example, the Knowledge Repository stores mappings for test description as desc, description, comments, etc. If an element named summary is found in the database, this is a synonym match.

    iv. Sub-Pattern Match Rule: This rule traces out a subset of an exact pattern stored in the Knowledge Repository within the database element; its confidence value is 0.25. For example, the Knowledge Repository stores mappings for patient name as name, pname, patname, etc. If an element named p_nam is found in the database, this is a sub-pattern match.

  • Locality Handler: The principle of locality refers to the use of data elements within relatively close storage locations (Principle of Locality). The location of a column within a particular table helps identify the corresponding class in the RIM. The tool displays the candidate attributes to the user with clear explanations to help identify the mappings. For example, for a column named age the tool cannot identify the corresponding attribute on its own, so it looks into the parent table, i.e. Patient. This helps identify all possible mappings for the attribute in the Person class. Finally, the user selects the best-suited mapping for the element.

  • Regular Expressions: Regular expressions (regexes) are also utilized carefully to obtain effective mapping results. A regex is a pattern describing a certain amount of text, which makes regexes ideally suited to searching, text processing and data validation (RegularExpressions). Searching with regular expressions yields results with just one search instead of many. For example, the different patterns for Patient Name, such as pname, pat_name and patient_name, can be stated as one regular expression:

    ▪ “(p(atient|at)?(\_)?(\s)?name)”

      Similarly, for the different telecommunication identifiers commonly used (mobile, phone, cell phone, fax, etc.) we can write a generic pattern:

    ▪ “((tele|cell|mobile)(\_)?phone)|((fax|mobile|phone)(no|number|no.)?)”

      Similarly, for the different addresses commonly used (office address, residential address, home address, etc.) we can write a generic pattern:

    ▪ “((Home|Office|residential)(\_)?Address)|((Address))”
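The Name Handler's rule cascade, with its confidence values of 1, 0.75, 0.5 and 0.25, can be sketched as follows. The repository contents, synonym table and containment tests below are simplified assumptions for illustration, not the tool's actual implementation:

```java
import java.util.List;
import java.util.Map;

// Hedged sketch of the Name Handler's confidence-ranked rule cascade.
public class NameHandler {
    // Illustrative stand-ins for the MKR's stored variants and synonyms.
    static final List<String> KNOWN = List.of("name", "pname", "patname");
    static final Map<String, String> SYNONYMS = Map.of("summary", "description");

    /** Returns the confidence value of the best matching rule for a column name. */
    public static double match(String column) {
        String c = column.toLowerCase();
        if (KNOWN.contains(c)) return 1.0;                      // i.  Exact Match
        for (String k : KNOWN) if (c.contains(k)) return 0.75;  // ii. Pattern Match
        if (SYNONYMS.containsKey(c)) return 0.5;                // iii. Synonyms Match
        for (String k : KNOWN)                                  // iv. Sub-Pattern Match
            for (int len = k.length() - 1; len >= 3; len--)
                if (c.contains(k.substring(0, len))) return 0.25;
        return 0.0;                                             // no match found
    }
}
```

A caller would auto-accept matches at confidence 1 and ask the user to confirm the lower-confidence ones, as Section 3.4 describes.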

3.4 Constraint handler

This component improves mapping accuracy when the confidence value of a mapping is low. If two or more mappings are identified for a single concept, the tool asks the user to decide which is most appropriate: it presents a list of possible mappings and the user selects the one he/she finds most suitable. If a mapping is wrongly identified, the user can request that the particular element be re-mapped. The Pattern Match Rule and Sub-Pattern Match Rule require confirmation before a mapping is finalized because they have lower confidence values.

3.5 Validation process

Once the user has finished the mappings, the next step is validation. During validation, the tool displays any missing mappings to the user, who may re-map or save the mappings and finally validate the output mappings.

The Mapping Engine performs the mappings as per the mapping strategies. The user can edit the mappings by selecting a mapped element and then requesting either to delete the mapping or to re-map the element. On selecting the Delete option, the selected mapping is deleted; on selecting the Re-Map option, the tool deselects the mapping and searches again for all possible best mappings, from which the user selects the one that best suits the requirement. The new mappings specified by the user not only fulfil the user's requirement but also help update the Knowledge Repository: the tool saves the new mappings suggested by the user during the mapping process, thus evolving the MKR.

3.6 Mapping specification file

Once the mappings are complete, the next step is to generate the Mapping Specification File, which contains all the mappings produced during the mapping generation process. The structure of the Mapping Specification File is designed in XML with a meaningful representation, partially shown in Fig. 3. This file looks similar to the MKR because both are organized around HL7 RIM concepts, but it differs greatly in content and purpose: it contains the mappings between the HL7 RIM and the target relational database.

Fig. 3 Partial Mapping File

3.7 Custom code generation

The identification of the correct correspondences between HL7 RIM and database concepts is a highly non-trivial process, and interpreting the output mappings to enable data exchange is also cumbersome. To address the data exchange, we have designed a Code Generator component that creates queries for transactions on the database, as shown in Fig. 4.

Fig. 4 Code Generation

This component scans all the mappings from the Mapping Specification File (generated in step 1, as shown in Fig. 4). For all the tables found, Hibernate mapping files and custom classes are generated.

HL7 JavaSIG is an API for message generation and parsing (HL7 JavaSIG). A message is parsed into the corresponding custom class and stored into the database using a Hibernate mapping file. Hibernate is an open source framework that transparently persists objects in a relational database; it is mature and adopted by a large developer community (Hibernate). The mapping files tell Hibernate which table in the database to access and which columns in that table to use. The basic structure of a mapping file is shown in Fig. 5:

Fig. 5 Partial View of Hibernate Mapping File

The Code Generator supports generating custom classes for different IDE project types to work with Hibernate. These custom classes are like Java Bean classes with certain additions.
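A hypothetical example of such a generated custom class for the PATIENT table follows; the class and field names are illustrative assumptions, since the real generator derives them from the Mapping Specification File:

```java
// Illustrative sketch of a generated Java Bean-style custom class.
// Field names are hypothetical; Hibernate would bind this class to the
// PATIENT table via the corresponding .hbm mapping file.
public class Patient {
    private String patientId;
    private String name;                     // mapped to Person.name (BAG)
    private String administrativeGenderCode; // mapped from the Sex column (CE)

    public String getPatientId() { return patientId; }
    public void setPatientId(String patientId) { this.patientId = patientId; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getAdministrativeGenderCode() { return administrativeGenderCode; }
    public void setAdministrativeGenderCode(String code) {
        this.administrativeGenderCode = code;
    }
}
```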

4 RSM validation using formal specifications

The Mapping Knowledge Repository holds the mapping knowledge and is described using formal methods as follows:

  • [HL7CLASS, RELATION, ATTRIBUTE, COLUMN]

  • HL7CLASS: Set of all classes of HL7

  • RELATION: Set of all possible tables intended to be mapped to a HL7CLASS

  • ATTRIBUTE: Set of all attributes of a HL7CLASS

  • COLUMN: Set of all possible columns intended to be mapped to an ATTRIBUTE

  • ATTRIBUTE ↠ HL7CLASS

figure a

where, known is the set of all the classes of HL7, and mapping is a function which, when applied to certain classes, gives the mappings associated with them.
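Assuming the schemas are written in standard Z notation, the state just described might be sketched as follows (an illustrative reconstruction only; the authoritative schema is the one shown in figure a):

```latex
% Illustrative Z-style sketch (using the zed/fuzz LaTeX macros).
\begin{schema}{MKR}
  known : \power HL7CLASS \\
  mapping : HL7CLASS \pfun \power RELATION
\where
  known = \dom mapping
\end{schema}
```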

figure b

Operation: Add a new mapping, and we describe it with a schema:

figure c

We expect that the set of classes known to the system will be augmented with the new class:

  • known’ = known ∪ {class?}

In fact we can prove this from the specification of AddMapping, using the invariants on the state before and after the operation:

figure d

Operation: Find a mapping, and we describe it with a schema:

figure e

Operation: Initial state of the system

figure f

Handling constraints: If the user attempts to add a class that already exists.

figure g

Similarly, we have formally specified other components of RSM and proved them.

5 Application scenario

In this section, the working process is elaborated with a complete running example. The target database contains information regarding lab tests, such as Patient, Specimen, Test Order and Test Results. First, the target clinical database is loaded along with the corresponding RMIM. We discuss only the portion of the database related to PlacerOrder concepts. The patient demographics table is scanned sequentially and passed to the Mapping Controller, which runs different matching algorithms and tries to find matching terms in the MKR. For instance, take the first attribute of the tblPatient table, Sex. Using the Perfect Match algorithm, the tool finds the most appropriate map in the MKR; in Perfect Match, we exploit the parent table details as well as the column information. For the Patient table, the associated class in the RIM is Person, and for Sex the match found is administrativeGenderCode. Perfect Match provides these maps to the Mapping Controller, which first checks the confidence value (here 1) and then displays the results to the user. Similarly, the tool attempts to map the remaining attributes of tblPatient in order, i.e. Title, Age, PhoneNo., GuardianName, Fax, BloodGroup, PanelId, PointId, Address, Mobile, Remarks, PatientId and NIC, running the different matching algorithms over the mapping knowledge and presenting the results to the user as shown in Table 2.

Table 2 Partial output mappings

The user is then given the choice to edit these mappings. Once the user is done, the Mapping Specification File is generated as shown in Fig. 6.

Fig. 6 Output Mapping Specification File

These mappings are then utilized to generate queries that enable data exchange between the RIM and the database. We use the Hibernate framework, with additional custom classes, to enable the transaction flow. The Code Generator scans the Mapping Specification File and generates the custom classes and the corresponding Hibernate mapping files (.hbm). Once the classes were built along with the .hbm files, we successfully tested the data exchange using JavaSIG API message generation.

6 Related work

The work of AnHai Doan et al. (2003) focuses on building data integration systems that provide a uniform query interface to the sources. They propose a multi-strategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information, either in the schemas of the sources or in their data, and then combines the predictions of the modules using a meta-learner. Their experiments validate the utility of multi-strategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy. It is an effective framework that could be used in any domain, and it helped us polish our mapping strategies for RIM and clinical schema mappings.

caAdapter (caAdapter) is an open source tool set that aims to facilitate data mapping and transformation among various kinds of data sources. Using caAdapter, the user can perform object-model-to-data-model mapping through a drag & drop facility: classes are mapped to the corresponding tables, attributes to the corresponding columns, and object relationships to table relationships, as required by the user. caAdapter provides a graphical user interface for mapping without automation; its mapping service requires human intervention to manually trace out all attributes in the RIM, which is a laborious and time-consuming task.

MS BizTalk Mapper helps in creating and editing maps to translate or transform messages (MS Biztalk Server). These maps are used in orchestrations and may also be used in send-port message processing. Our application, in contrast, focuses on different work, i.e. identifying matching correspondences between healthcare databases and the HL7 RIM in a semi-automatic manner and then handling the database queries through code generation. The two mechanisms therefore differ in their objectives and outputs.

Jason Lyman et al. (2003) laid the groundwork for mapping a relational data warehouse to the HL7 RIM. Their efforts focused primarily on mapping the Clinical Data Repository (CDR), a unique information resource at the University of Virginia Health System, where most information centers on the concept of a “visit”, “case” or “encounter”. They manually identified CDR-to-RIM mappings for a few clinical scenarios. This work is an important step in identifying matching concepts between clinical databases and the RIM; a shortcoming of the scheme is that it only addresses static mapping of the RIM to the CDR, without any automation.

COMA++, a schema matching tool widely adopted in the database community, utilizes a composite approach to combine different match algorithms. COMA++ implements significant improvements and offers a comprehensive infrastructure for solving large real-world match problems (Aumueller et al. 2005). The tool has effective matching algorithms but lacks healthcare domain knowledge as well as code generation.

7 System evaluation

Table 3 compares the related applications, caAdapter and COMA++, with our technique, RIM and Schema Mapper (RSM).

Table 3 Feature matrix of mapping

7.1 Experimental results

We have evaluated RSM on three real-world domains. Our goals were to evaluate the matching accuracy of RSM, to measure the relative contribution of domain knowledge, and to examine the usefulness of the mapping and the code generation. We found that approximately 35–65% of the mappings are correctly identified; the remaining unidentified mappings are either not compatible with the HL7 RMIM or concern organizational functions such as authentication and locations. The RSM experiments and their results are given in Table 4.

Table 4 RSM experimental results

7.2 Lab Schema 1 details

We tested RSM on the OpenEMR database, a free medical practice management, electronic medical records, prescription writing and medical billing application (OpenEMR). We took the tables relevant to the laboratory domain and tested them in our application; the Patient table is shown in Fig. 7. Among its 26 columns, RSM calculated 21 mappings to the RIM (12 perfect matches, 5 subset matches and 4 partial subset matches). The mappings generated by the application were validated against the conceptual mappings obtained in the analysis phase.

Fig. 7 OpenEMR Patient Data
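The three match categories can be sketched roughly as token-overlap tests between a column name and a RIM attribute name. The tokenization and category rules below are illustrative assumptions, not RSM's actual algorithm.

```python
# Hypothetical sketch of perfect / subset / partial-subset match categories,
# based on token overlap between field names. Thresholds and token logic are
# our own illustrative choices.
import re

def tokens(name: str) -> set:
    """Split a name like 'patient_birth_date' or 'birthDate' into lowercase tokens."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # break camelCase
    return {p.lower() for p in re.split(r"[_\s]+", spaced) if p}

def classify_match(db_column: str, rim_attribute: str) -> str:
    a, b = tokens(db_column), tokens(rim_attribute)
    if a == b:
        return "perfect"          # identical token sets
    if a <= b or b <= a:
        return "subset"           # one token set contained in the other
    if a & b:
        return "partial subset"   # some tokens shared
    return "no match"
```

For example, `classify_match("patient_birth_date", "birthDate")` yields a subset match, since the RIM attribute's tokens are contained in the column's tokens.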

7.3 Lab Schema 2 details

RSM was also tested on a real-world database that runs in a Windows environment on the Microsoft SQL Server 2000 relational database management system. The data model underlying Lab Schema 2 centers on the concepts of a “test order” and “test results”, and is partially shown in Fig. 8.

Fig. 8 Lab Schema 2

7.4 Lab Schema 3 details

Another real-world database, running in a Windows environment on the MySQL relational database management system, was also tested. The data model underlying Lab Schema 3 likewise centers on the concepts of a “test order” and “test results”, as shown in Fig. 9.

Fig. 9 Lab Schema 3

7.5 Comparison

Finally, RSM was compared with other tools to determine its mapping accuracy and other statistics. Lab Schema 1 was loaded into COMA++ along with the corresponding RMIMs; the results are given in Table 5. COMA++ shows lower precision and recall than RSM, so our technique provides better mapping results and can serve effectively in making healthcare databases interoperable.

Table 5 RSM vs COMA++
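Precision and recall here follow the standard schema-matching definitions: precision is the fraction of tool-proposed mappings that are correct, and recall is the fraction of the manually validated (gold) mappings that the tool found. A minimal sketch, where the mapping pairs shown in the test are hypothetical:

```python
def precision_recall(candidate: set, reference: set):
    """Standard schema-matching quality measures:
    precision = |C ∩ R| / |C|, recall = |C ∩ R| / |R|,
    where C are tool-proposed mappings and R the gold mappings."""
    true_positives = len(candidate & reference)
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```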

There is a significant tradeoff between time and precision when comparing automated and manual mapping processes. A comparable manual mapping for the case study took us approximately one month to complete. To achieve the goal of interoperable data, it is necessary to choose a middle path that balances time and efficiency. For the small number of remaining unmatched terms, it is simpler to perform a quick manual lookup of laboratory terms by browsing the MKR and RMIMs.

8 Discussion

Interoperability is the key to achieving maximum time-saving and accuracy benefits for the healthcare industry. Aligning the data stored in the database of a particular healthcare organization with the HL7 V3 specification helps achieve this objective. Our research contributions, namely an architecture design that confines diverse requirements in a unified way, will help bridge local healthcare databases with standard HL7 messaging. We provided conceptual mappings, rule design, a locality handler, and code generation to handle the database transactions. We built a semi-automatic mapping solution and tested it on real-world data sets with fruitful results. Not all mappings could be found automatically, for the following reasons:

  • Database heterogeneity

  • Some database fields are meant only for local processing

  • Small noise components in some element names are hard to remove

Due to representational heterogeneity in databases, we cannot build a global knowledge base that achieves 100% mappings. Once the knowledge base is complete, the proposed tool can perform mappings to the maximum level. Alternatively, future health and medicine databases could follow the HL7 RIM model standard, so that the mapping process is easily handled.

These mappings are performed once; afterwards, all transactions are handled automatically. The code generation provides a run-time environment that enables database communication by utilizing the output mappings. Our application considerably reduces the development time, effort, and cost currently expended on manual work.
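As a rough sketch of this idea (the column and attribute names below are assumptions, not our actual output), the stored mappings can drive a run-time transformation of a relational row into RIM attribute values, which the generated code would then serialize as an HL7 V3 message:

```python
# Hypothetical output-mapping table: database column -> RIM attribute.
MAPPINGS = {
    "pid": "Patient.id",
    "dob": "Person.birthTime",
    "sex": "Person.administrativeGenderCode",
}

def row_to_rim(row: dict) -> dict:
    """Apply the stored column-to-RIM mappings to one database row,
    ignoring unmapped (local-only) columns."""
    return {rim_attr: row[col]
            for col, rim_attr in MAPPINGS.items() if col in row}
```

Columns with no mapping, such as fields used only for local processing, simply pass through unmapped.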

9 Conclusion and future work

This paper discusses the proposed architecture and the components involved in performing the mappings in a semi-automatic manner, and in utilizing these mappings to enable transactions with no development overhead. The system makes aggressive use of healthcare domain knowledge to guide the searching and mapping process. As a first step, the developed system can map data in the healthcare laboratory domain; however, the architecture is kept modular and extensible to other domains, e.g. Patient Administration and Billing and Accounting. New searching mechanisms and new mapping rules can be added easily. Finally, the system helps users interact with it to generate mappings more efficiently, and also facilitates transactions with no programming overhead. Our experimental results show 35–65% accuracy on real laboratory systems, demonstrating the promise of the approach.

The proposed scheme is an effective step toward bringing clinical databases into compliance with the RIM. It reduces the overhead of manual mapping and thus saves users' valuable time. The mapping strategies are intelligently handled to make the system robust. Evolution of the tool over time will serve the community more effectively by providing the maximum number of mappings.

Future work includes, first, refining the rules for other domains such as patient administration and accounting and billing. Due to the modular approach, new rules can be incorporated into the application without adding a new module or component: we will analyze a new domain, formulate new rules, and test them against running systems. Secondly, we will improve the application interface to make it more interactive and user-friendly, e.g. by improving the mapping graphics, enabling zooming of the mapping lines, and showing mapping descriptions. Thirdly, we will improve the Mapping Controller component by incorporating more mapping strategies, such as extending the rules with data types and data contents. Most RIM data types, e.g. ED (Encapsulated Data) and CS (Coded Simple Value), are complex and differ from RDBMS-supported data types. When local clinical data types are mapped to HL7, the semantics must be carefully preserved, which is the ultimate goal of HL7 V3 interoperability. The Data Type Handler addresses this complexity by mapping the components of a name in the database to the defined parts of the HL7 Name; the other data types are handled using similar well-defined strategies.
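As a minimal sketch of the Data Type Handler idea, a flat database name string can be decomposed into the part types of the HL7 V3 EN/PN data type. The part-type codes GIV (given) and FAM (family) come from the HL7 EntityNamePartType vocabulary; the splitting heuristic itself is an illustrative assumption, and real names also carry prefixes, suffixes, and use codes not handled here.

```python
# Hypothetical sketch: decompose a flat name string into HL7 V3 name parts.
# Assumes the last whitespace-separated token is the family name and all
# preceding tokens are given names; real data needs richer parsing.
def map_name_to_hl7(full_name: str) -> list:
    parts = full_name.split()
    if not parts:
        return []
    hl7_parts = [{"partType": "GIV", "value": p} for p in parts[:-1]]
    hl7_parts.append({"partType": "FAM", "value": parts[-1]})
    return hl7_parts
```

For instance, "John Smith" would map to a GIV part "John" and a FAM part "Smith", which the XML serialization would emit as `<given>` and `<family>` elements.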