Abstract
Molecular Dynamics (MD) simulation is often used to study the properties of chemical interactions in domains such as drug development, where conducting real experimental studies is costly and/or unsafe. Trajectories generated from MD simulations provide detailed location data for every atom in the experiment. Analyzing this data leads to an atomic- and molecular-level understanding of the interactions among the constituents of the system of interest. However, the data is extremely large and poses formidable storage and processing challenges for analyzing and querying the associated atom-level motion trajectories. We take a first step towards applying domain-specific generalization techniques to trajectory compression algorithms, with the aim of reducing storage requirements and speeding up the processing of within-distance queries over MD simulation data. We demonstrate that this generalization-aware compression, when applied to the dataset used in this case study, yields significant efficiency improvements without sacrificing the effectiveness of within-distance queries for threshold-based detection of molecular events of interest, such as the formation of hydrogen bonds (H-Bonds).
Research was partly supported by the Eppley Foundation for Research.
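To make the notion of a threshold-based within-distance query over a compressed trajectory concrete, the following is a minimal sketch and not the method of the paper. It assumes each atom's trajectory is a list of (x, y, z) positions with one entry per frame. Uniform subsampling with linear interpolation stands in for the generalization-aware compression, and the 3.5 cutoff is an illustrative value only. All function names are hypothetical.

```python
import math


def within_distance_frames(traj_a, traj_b, threshold):
    """Return the frame indices at which two atoms are within `threshold`."""
    return [
        t
        for t, (p, q) in enumerate(zip(traj_a, traj_b))
        if math.dist(p, q) <= threshold
    ]


def subsample(traj, step):
    """Naive stand-in for compression: keep every `step`-th frame and the last."""
    if not traj:
        return []
    kept = list(range(0, len(traj), step))
    if kept[-1] != len(traj) - 1:
        kept.append(len(traj) - 1)
    return [(t, traj[t]) for t in kept]


def reconstruct(samples):
    """Rebuild one position per frame by linear interpolation between kept frames."""
    if not samples:
        return []
    out = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        for t in range(t0, t1):
            w = (t - t0) / (t1 - t0)
            out.append(tuple(a + w * (b - a) for a, b in zip(p0, p1)))
    out.append(samples[-1][1])
    return out


# Hypothetical data: a fixed donor atom and an acceptor approaching it.
donor = [(0.0, 0.0, 0.0)] * 5
acceptor = [(5.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 0.0, 0.0),
            (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

cutoff = 3.5  # illustrative distance threshold, an assumption of this sketch
exact = within_distance_frames(donor, acceptor, cutoff)
approx = within_distance_frames(donor, reconstruct(subsample(acceptor, 2)), cutoff)
print(exact, approx)
```

Because the acceptor in this toy example moves along a straight line at constant speed, interpolation recovers the dropped frames exactly and both queries report the same frames. For real MD trajectories the two answers can differ, which is the trade-off between storage and query effectiveness that the abstract refers to.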
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Anowar, M.H. et al. (2022). Generalization Aware Compression of Molecular Trajectories. In: Chiusano, S., Cerquitelli, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2022. Lecture Notes in Computer Science, vol 13389. Springer, Cham. https://doi.org/10.1007/978-3-031-15740-0_20
DOI: https://doi.org/10.1007/978-3-031-15740-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15739-4
Online ISBN: 978-3-031-15740-0
eBook Packages: Computer Science, Computer Science (R0)