Abstract
This study observes researchers’ behavior in Iranian scientific databases to determine the research gaps and priorities in their fields of research. Text mining and natural language processing techniques were used to identify what researchers are looking for and to analyze existing research works. We process information about the behavior of researchers working in environmental science, together with the existing research works in the Iranian scientific database. Search trends in all areas are evaluated by analyzing the users’ search data. The trend analysis indicates that, from February 2013 to July 2015, the growth of researchers’ requests in some environmental domains, such as Industry, Training, Assessment, Material, Water and Pollution, was 1.5 to 2 times that of overall requests. Combining the trend analysis with query clustering yielded four priority zones. The research priorities for each environmental research area were then determined. The results show that Training, Pollution, Rangeland, Management and Law are the environmental research domains with the largest research gaps in Iran, whereas the Forest, Soil and Industry domains are adequately covered by existing research. Finally, we describe the steps for implementing a decision support system for environmental research management. Researchers, managers and policy makers can use the proposed “research demand and supply monitoring” (RDSM) system to make appropriate decisions and allocate their resources more efficiently.
Notes
Available at: http://thesauri.irandoc.ac.ir/.
The fraction of the records that are relevant to the query that are successfully retrieved.
The fraction of retrieved records that are relevant to the query.
An open source engine for full text search.
Boolean operators are used to connect search terms and define the relationship between them (e.g. +, −, &, not).
Frequent words that carry little meaning, i.e. stop words (comparable to “the”, “in”, “as”… in English).
The Islamic theory or philosophy of law.
Acknowledgements
The authors gratefully acknowledge the support of the Iranian Research Institute for Information Science and Technology (IRANDOC), and especially the help of IRANDOC’s R&D department.
Cite this article
Rabiei, M., Hosseini-Motlagh, SM. & Haeri, A. Using text mining techniques for identifying research gaps and priorities: a case study of the environmental science in Iran. Scientometrics 110, 815–842 (2017). https://doi.org/10.1007/s11192-016-2195-8
DOI: https://doi.org/10.1007/s11192-016-2195-8