
Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition

Cluster Computing

Abstract

Scientific literature contains many meaningful objects such as Figures, Tables, Definitions, and Algorithms, which we hereafter call Knowledge Cells. An advanced academic search engine is expected to take advantage of Knowledge Cells and their various relationships to obtain more accurate search results. Further, it is expected to provide fine-grained search over Knowledge Cells for deep-level information discovery and exploration. It is therefore important to identify and extract Knowledge Cells and their various relationships, which are often intrinsic and implicit in articles. With the exponential growth of scientific publications, the discovery and acquisition of such academic knowledge pose practical challenges. For example, existing algorithmic methods can hardly extend to handle the diverse layouts of journals, nor scale up to process massive document collections. As crowdsourcing has become a powerful paradigm for large-scale problem solving, especially for tasks that are difficult for computers but easy for humans, we cast academic knowledge discovery and acquisition as a crowdsourced database problem and present a hybrid framework that integrates the accuracy of crowdsourcing workers with the speed of automatic algorithms. In this paper, we introduce our current system implementation, a platform for academic knowledge discovery and acquisition (PANDA), as well as some interesting observations and promising future directions.
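As a rough illustration of the hybrid framework described above, the following minimal Python sketch routes automatically extracted candidates by confidence: high-confidence Knowledge Cells are accepted directly, while uncertain ones are verified by crowd workers whose votes are aggregated by majority. All names, the threshold, and the toy ask_crowd stub are hypothetical illustrations, not the PANDA implementation.

    # Minimal sketch (not the paper's implementation) of a hybrid
    # algorithmic-crowdsourcing acquisition loop.
    from collections import Counter
    from dataclasses import dataclass
    from typing import List

    CONFIDENCE_THRESHOLD = 0.8   # hypothetical cut-off for trusting the algorithm alone
    NUM_WORKERS = 3              # hypothetical number of crowd votes per uncertain candidate

    @dataclass
    class Candidate:
        paper_id: str
        cell_type: str       # e.g. "Figure", "Table", "Definition", "Algorithm"
        confidence: float    # score assigned by the automatic extractor

    def ask_crowd(candidate: Candidate, num_workers: int) -> List[str]:
        # Stand-in for a real crowdsourcing call; here every worker simply accepts.
        return ["accept"] * num_workers

    def acquire(candidates: List[Candidate]) -> List[Candidate]:
        accepted = []
        for cand in candidates:
            if cand.confidence >= CONFIDENCE_THRESHOLD:
                accepted.append(cand)                    # fast path: algorithm only
            else:
                votes = ask_crowd(cand, NUM_WORKERS)     # slow path: crowd verification
                label, _ = Counter(votes).most_common(1)[0]
                if label == "accept":
                    accepted.append(cand)
        return accepted

    if __name__ == "__main__":
        demo = [Candidate("paper-1", "Figure", 0.95), Candidate("paper-1", "Table", 0.40)]
        print([c.cell_type for c in acquire(demo)])      # one accepted directly, one via the crowd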




Notes

  1. http://scholar.google.com/.

  2. http://dblp.uni-trier.de/.

  3. http://dl.acm.org/.

  4. http://arxiv.org/.

  5. https://en.wikipedia.org/wiki/Digital_curation.

  6. http://proquest.libguides.com/deepindexing.

  7. http://search.proquest.com.

  8. http://www.sciencedirect.com/science/search.

  9. http://citeseer.ist.psu.edu/.

  10. http://pdfbox.apache.org/.

  11. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  12. We extend standard SQL statements to illustrate these examples. Tables such as papers and cells can be either relational tables or non-relational data collections, and functions such as “relations” and “contains” can be built-in functions; this does not affect the problem statement. An illustrative rendering of such a query appears after these notes.

  13. https://s3-us-west-2.amazonaws.com/cropfigure/templates.html.
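To make note 12 concrete, here is a small, self-contained Python rendering of the kind of fine-grained query such an extended SQL statement might express, with papers and cells as toy in-memory tables and contains/relations modelled as ordinary functions. The field names, sample rows, and the query itself are illustrative assumptions, not the paper's actual schema or extended-SQL syntax.

    # Illustrative only: toy in-memory versions of the "papers" and "cells" tables
    # from note 12, with "contains" and "relations" modelled as plain functions.
    papers = [
        {"paper_id": "p1", "title": "A survey of skyline queries"},
        {"paper_id": "p2", "title": "Crowdsourced entity resolution"},
    ]
    cells = [
        {"cell_id": "c1", "paper_id": "p1", "type": "Figure",     "caption": "Skyline of hotel prices"},
        {"cell_id": "c2", "paper_id": "p1", "type": "Definition", "caption": "Definition of dominance"},
        {"cell_id": "c3", "paper_id": "p2", "type": "Table",      "caption": "Accuracy of workers"},
    ]
    cell_relations = [("c1", "c2", "illustrates")]   # e.g. Figure c1 illustrates Definition c2

    def contains(cell, keyword):
        """Toy stand-in for a built-in 'contains' predicate."""
        return keyword.lower() in cell["caption"].lower()

    def relations(cell_id):
        """Toy stand-in for a built-in 'relations' function: relations involving a cell."""
        return [r for r in cell_relations if cell_id in (r[0], r[1])]

    # Fine-grained search: Figures about "skyline" together with their related cells,
    # roughly what an extended-SQL statement over papers/cells might express.
    for cell in cells:
        if cell["type"] == "Figure" and contains(cell, "skyline"):
            paper = next(p for p in papers if p["paper_id"] == cell["paper_id"])
            print(paper["title"], "->", cell["caption"], relations(cell["cell_id"]))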


Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant No. 61472427) and the Research Funds of Renmin University of China (Grant No. 11XNJ003).

Author information


Corresponding author

Correspondence to Jiaheng Lu.


About this article


Cite this article

Dong, Z., Lu, J., Ling, T.W. et al. Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition. Cluster Comput 20, 3629–3641 (2017). https://doi.org/10.1007/s10586-017-1089-8

