Abstract
Scientific literature contains many meaningful objects such as Figures, Tables, Definitions, and Algorithms, which we hereafter call Knowledge Cells. An advanced academic search engine is expected to take advantage of Knowledge Cells and their various relationships to obtain more accurate search results, and further to provide fine-grained search over Knowledge Cells for deep-level information discovery and exploration. It is therefore important to identify and extract Knowledge Cells and their various relationships, which are often intrinsic and implicit in articles. With the exponential growth of scientific publications, discovering and acquiring such useful academic knowledge poses practical challenges. For example, existing algorithmic methods can hardly extend to handle the diverse layouts of journals, nor scale up to process massive document collections. As crowdsourcing has become a powerful paradigm for large-scale problem solving, especially for tasks that are difficult for computers but easy for humans, we cast academic knowledge discovery and acquisition as a crowdsourced database problem and present a hybrid framework that integrates the accuracy of crowdsourcing workers with the speed of automatic algorithms. In this paper, we introduce our current system implementation, a platform for academic knowledge discovery and acquisition (PANDA), along with some interesting observations and promising future directions.
Notes
We extend standard SQL statements to illustrate these examples. Tables such as papers and cells can be either relational tables or non-relational data collections, and functions such as "relations" and "contains" can be built-in functions; this choice does not affect the problem statement.
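As a hypothetical sketch of such an extended statement (the table names papers and cells and the relations/contains built-ins come from the note above; the column names and exact syntax are illustrative assumptions, not the system's actual grammar), a query linking an Algorithm cell to its related Figure cells might look like:

-- Illustrative extended SQL only; schema and syntax are assumed.
SELECT c2.id, c2.caption
FROM papers p, cells c1, cells c2
WHERE c1.paper_id = p.id
  AND c2.paper_id = p.id
  AND c1.type = 'Algorithm'
  AND c2.type = 'Figure'
  AND relations(c1, c2)                      -- built-in: the two cells are related
  AND contains(p.title, 'crowdsourcing');    -- built-in: keyword match on text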
Acknowledgements
This study was funded by the National Natural Science Foundation of China (Grant No. 61472427) and the Research Funds of Renmin University of China (Grant No. 11XNJ003).
Cite this article
Dong, Z., Lu, J., Ling, T.W. et al. Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition. Cluster Comput 20, 3629–3641 (2017). https://doi.org/10.1007/s10586-017-1089-8