
Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition

Cluster Computing

Abstract

Scientific literature contains many meaningful objects such as Figures, Tables, Definitions, and Algorithms, which we hereafter call Knowledge Cells. An advanced academic search engine is expected to take advantage of Knowledge Cells and their various relationships to obtain more accurate search results. Further, it is expected to provide fine-grained search over Knowledge Cells for deep-level information discovery and exploration. It is therefore important to identify and extract Knowledge Cells and their various relationships, which are often intrinsic and implicit in articles. With the exponential growth of scientific publications, the discovery and acquisition of such academic knowledge pose practical challenges. For example, existing algorithmic methods can hardly extend to handle the diverse layouts of journals, nor scale up to process massive document collections. As crowdsourcing has become a powerful paradigm for large-scale problem solving, especially for tasks that are difficult for computers but easy for humans, we cast academic knowledge discovery and acquisition as a crowdsourced database problem and present a hybrid framework that integrates the accuracy of crowdsourcing workers with the speed of automatic algorithms. In this paper, we introduce our current system implementation, a platform for academic knowledge discovery and acquisition (PANDA), as well as some interesting observations and promising future directions.
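As a rough illustration of the hybrid framework described above, the following minimal Python sketch routes automatically extracted candidates by confidence: high-confidence Knowledge Cells are accepted directly, while uncertain ones are verified by crowd workers whose votes are aggregated by majority. All names, the threshold, and the toy ask_crowd stub are hypothetical illustrations, not the PANDA implementation.

    # Minimal sketch (not the paper's implementation) of a hybrid
    # algorithmic-crowdsourcing acquisition loop.
    from collections import Counter
    from dataclasses import dataclass
    from typing import List

    CONFIDENCE_THRESHOLD = 0.8   # hypothetical cut-off for trusting the algorithm alone
    NUM_WORKERS = 3              # hypothetical number of crowd votes per uncertain candidate

    @dataclass
    class Candidate:
        paper_id: str
        cell_type: str       # e.g. "Figure", "Table", "Definition", "Algorithm"
        confidence: float    # score assigned by the automatic extractor

    def ask_crowd(candidate: Candidate, num_workers: int) -> List[str]:
        # Stand-in for a real crowdsourcing call; here every worker simply accepts.
        return ["accept"] * num_workers

    def acquire(candidates: List[Candidate]) -> List[Candidate]:
        accepted = []
        for cand in candidates:
            if cand.confidence >= CONFIDENCE_THRESHOLD:
                accepted.append(cand)                    # fast path: algorithm only
            else:
                votes = ask_crowd(cand, NUM_WORKERS)     # slow path: crowd verification
                label, _ = Counter(votes).most_common(1)[0]
                if label == "accept":
                    accepted.append(cand)
        return accepted

    if __name__ == "__main__":
        demo = [Candidate("paper-1", "Figure", 0.95), Candidate("paper-1", "Table", 0.40)]
        print([c.cell_type for c in acquire(demo)])      # one accepted directly, one via the crowd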




Notes

  1. http://scholar.google.com/.

  2. http://dblp.uni-trier.de/.

  3. http://dl.acm.org/.

  4. http://arxiv.org/.

  5. https://en.wikipedia.org/wiki/Digital_curation.

  6. http://proquest.libguides.com/deepindexing.

  7. http://search.proquest.com.

  8. http://www.sciencedirect.com/science/search.

  9. http://citeseer.ist.psu.edu/.

  10. http://pdfbox.apache.org/.

  11. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  12. We extend standard SQL statements to illustrate these examples. Tables such as papers and cells can be either relational tables or non-relational data collections, and functions such as “relations” and “contains” can be built-in functions; this does not affect the problem statement. An illustrative rendering of such a query appears after these notes.

  13. https://s3-us-west-2.amazonaws.com/cropfigure/templates.html.
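To make note 12 concrete, here is a small, self-contained Python rendering of the kind of fine-grained query such an extended SQL statement might express, with papers and cells as toy in-memory tables and contains/relations modelled as ordinary functions. The field names, sample rows, and the query itself are illustrative assumptions, not the paper's actual schema or extended-SQL syntax.

    # Illustrative only: toy in-memory versions of the "papers" and "cells" tables
    # from note 12, with "contains" and "relations" modelled as plain functions.
    papers = [
        {"paper_id": "p1", "title": "A survey of skyline queries"},
        {"paper_id": "p2", "title": "Crowdsourced entity resolution"},
    ]
    cells = [
        {"cell_id": "c1", "paper_id": "p1", "type": "Figure",     "caption": "Skyline of hotel prices"},
        {"cell_id": "c2", "paper_id": "p1", "type": "Definition", "caption": "Definition of dominance"},
        {"cell_id": "c3", "paper_id": "p2", "type": "Table",      "caption": "Accuracy of workers"},
    ]
    cell_relations = [("c1", "c2", "illustrates")]   # e.g. Figure c1 illustrates Definition c2

    def contains(cell, keyword):
        """Toy stand-in for a built-in 'contains' predicate."""
        return keyword.lower() in cell["caption"].lower()

    def relations(cell_id):
        """Toy stand-in for a built-in 'relations' function: relations involving a cell."""
        return [r for r in cell_relations if cell_id in (r[0], r[1])]

    # Fine-grained search: Figures about "skyline" together with their related cells,
    # roughly what an extended-SQL statement over papers/cells might express.
    for cell in cells:
        if cell["type"] == "Figure" and contains(cell, "skyline"):
            paper = next(p for p in papers if p["paper_id"] == cell["paper_id"])
            print(paper["title"], "->", cell["caption"], relations(cell["cell_id"]))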


Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant No. 61472427) and the Research Funds of Renmin University of China (Grant No. 11XNJ003).

Author information


Corresponding author

Correspondence to Jiaheng Lu.


About this article


Cite this article

Dong, Z., Lu, J., Ling, T.W. et al. Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition. Cluster Comput 20, 3629–3641 (2017). https://doi.org/10.1007/s10586-017-1089-8

