Abstract
Extracting the most relevant information from a dataset is paramount when the dataset is large. For data from a numeric domain, a pervasive modelling approach is to represent the data as vectors, which enables a range of geometric techniques. This paper introduces projection as a natural and powerful means of scoring the relevance of vectors. As yet, there are no effective indexing techniques for quickly retrieving the vectors in a dataset that have large projections onto a query vector. We address that gap by introducing the first indexing algorithms for vectors of arbitrary dimension, producing indices with strong sub-linear, output-sensitive worst-case query cost and linear data structure size guarantees in the I/O cost model. We improve this query cost markedly for the special case of two dimensions. These algorithms derive from a novel geometric insight presented in this paper: the concept of a data vector’s cap.
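To make the query concrete, the following is a minimal sketch of the naive linear scan that an index of this kind is meant to improve upon. It is not the paper's algorithm. The function names, the sample data, and the use of a fixed threshold on the scalar projection are assumptions made for illustration; the abstract does not specify how "large" is defined.

```python
import math


def scalar_projection(v, q):
    """Signed length of the projection of v onto q: (v . q) / |q|."""
    q_norm = math.sqrt(sum(x * x for x in q))
    if q_norm == 0:
        raise ValueError("query vector must be non-zero")
    return sum(a * b for a, b in zip(v, q)) / q_norm


def scan_large_projections(data, q, threshold):
    """Baseline: inspect every data vector (linear cost in the dataset size).

    The threshold is a hypothetical parameter, assumed here as one way
    to express "large projection".
    """
    return [v for v in data if scalar_projection(v, q) >= threshold]


# Hypothetical two-dimensional dataset and query.
data = [(3.0, 4.0), (1.0, 0.0), (-2.0, 5.0), (6.0, 1.0)]
hits = scan_large_projections(data, (1.0, 0.0), 2.5)
```

This scan touches every vector regardless of how many qualify, which is the cost that a sub-linear, output-sensitive index would avoid.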
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chester, S., Thomo, A., Venkatesh, S., Whitesides, S. (2011). Indexing for Vector Projections. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20152-3_27
DOI: https://doi.org/10.1007/978-3-642-20152-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20151-6
Online ISBN: 978-3-642-20152-3
eBook Packages: Computer Science (R0)