Abstract
Data curation in collaborative databases requires collaborators to interact until they converge and agree on the content of their data. In previous work, we presented a cloud-based collaborative database system that promotes and enables collaboration and data curation scenarios. The system classifies each version of a data item as pending, approved, or rejected. The database Principal Investigators (PIs) approve or reject a version based on its value. The system also allows collaborators to view the status of each version and to help PIs make decisions by providing feedback based on their experiments and/or opinions. Most importantly, the system provides history-tracking mechanisms that trace the modifications, approvals, and rejections made by both collaborators and PIs on different versions of a data item. We call this system the Update-Pending-Approval (UPA) model. In this paper, we describe a high-level SQL query interface language through which PIs and collaborators interact with the UPA framework. We define a set of UPA keywords, used as part of the history-tracking mechanism, that select specific versions of a data item, and a set of UPA options that select specific versions based on possible future decisions of PIs. We implemented the query interface on top of Apache Phoenix, since the UPA system itself is implemented on top of Apache HBase. We evaluate the performance of the UPA query language by executing several queries of varying complexity and discuss the results.
Additional information
This publication was made possible by the support of NPRP Grant 4-1534-1-247 from the Qatar National Research Fund (a member of Qatar Foundation) and the National Science Foundation under Grants IIS-1117766 and IIS-0964639. The statements made herein are solely the responsibility of the authors.
About this article
Cite this article
Mershad, K., Malluhi, Q.M., Ouzzani, M. et al. COACT: a query interface language for collaborative databases. Distrib Parallel Databases 36, 121–151 (2018). https://doi.org/10.1007/s10619-017-7213-1
DOI: https://doi.org/10.1007/s10619-017-7213-1