NBLucene: Flexible and Efficient Open Source Search Engine

Zhang, Zhaohua; Ye, Benjun; Huang, Jiayi; Stones, Rebecca; Wang, Gang; Liu, Xiaoguang

doi:10.1007/978-3-319-39937-9_39

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9658)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

The most popular open source projects for text search are designed to support many features. They are written in Java for cross-platform use. For research, however, execution efficiency matters more, and this is a weakness of applications written in Java. It is also difficult for Java programs to exploit the parallel mechanisms of modern computer systems, such as SIMD instructions and GPUs. To this end, we extend an open source text search project written in C++ for research purposes.

Our approach is to define a flexible and efficient search engine architecture built from extensible application programming interfaces, so that researchers can readily implement and modify search engine algorithms and strategies. Moreover, we integrate a generic mathematical encoding library that can be used to compress the inverted index. We also implement a complete framework for result summarization, including snippet generation and cache strategies. Experimental results show that the new architecture significantly improves on the original work.
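The abstract does not say which codecs the integrated encoding library provides. Purely as an illustration of what compressing an inverted index can involve, the following C++ sketch applies variable-byte encoding to the gaps between sorted document IDs, one widely used technique. The function names `vbyte_encode` and `vbyte_decode` are hypothetical and are not taken from NBLucene.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not NBLucene's API.

// Encode a non-decreasing list of document IDs as variable-byte gaps.
// Each gap is split into 7-bit groups, least significant first; the high
// bit is set only on the final byte of each gap.
std::vector<std::uint8_t> vbyte_encode(const std::vector<std::uint32_t>& doc_ids) {
    std::vector<std::uint8_t> out;
    std::uint32_t prev = 0;
    for (std::uint32_t id : doc_ids) {
        assert(id >= prev);             // the posting list must be sorted
        std::uint32_t gap = id - prev;  // store the difference, not the ID
        prev = id;
        while (gap >= 128) {
            out.push_back(static_cast<std::uint8_t>(gap & 0x7F));
            gap >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(gap | 0x80));  // terminator byte
    }
    return out;
}

// Decode the byte stream produced by vbyte_encode back into document IDs.
std::vector<std::uint32_t> vbyte_decode(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> doc_ids;
    std::uint32_t gap = 0;
    std::uint32_t prev = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        gap |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if (b & 0x80) {                 // last byte of this gap
            prev += gap;                // undo the gap transform
            doc_ids.push_back(prev);
            gap = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    return doc_ids;
}
```

Small gaps, which dominate the posting lists of frequent terms, take a single byte instead of four. SIMD-oriented codecs of the kind the abstract alludes to trade this byte-at-a-time loop for block-wise decoding.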

Author information

Authors and Affiliations

College of Computer and Control Engineering, Nankai University, Tianjin, China
Zhaohua Zhang, Benjun Ye, Jiayi Huang, Rebecca Stones, Gang Wang & Xiaoguang Liu

Corresponding authors

Correspondence to Gang Wang or Xiaoguang Liu.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Bin Cui
The George Washington University, Washington, D.C., USA
Nan Zhang
Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jianliang Xu
University of Texas Rio Grande Valley, Edinburg, Texas, USA
Xiang Lian
Jiangxi University of Finance and Economics, Nanchang, Jiangxi, China
Dexi Liu

About this paper

Cite this paper

Zhang, Z., Ye, B., Huang, J., Stones, R., Wang, G., Liu, X. (2016). NBLucene: Flexible and Efficient Open Source Search Engine. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_39

DOI: https://doi.org/10.1007/978-3-319-39937-9_39
Published: 28 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39936-2
Online ISBN: 978-3-319-39937-9
eBook Packages: Computer Science, Computer Science (R0)
