Abstract
Text visualization has become a significant tool that facilitates knowledge discovery and the insightful presentation of large amounts of data. This paper presents ViStA, a visualization system for exploring Arabic text. We report on its design and implementation and on some of the experiments we conducted with the system. Such tools help Arabic language analysts explore, understand, and discover interesting knowledge hidden in text data. We used statistical techniques from information retrieval to identify relevant documents, coupled with natural language processing (NLP) tools to process the text. For text visualization, the system uses a hybrid approach that combines latent semantic indexing for feature selection with multidimensional scaling for dimensionality reduction. Initial results confirm the viability of this approach for Arabic text visualization and other Arabic NLP applications.
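The hybrid pipeline described above (term weighting, latent semantic indexing, then multidimensional scaling) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not ViStA's implementation. The documents are assumed to be already tokenized and stemmed by upstream NLP tools. The smoothed TF-IDF weighting, the cosine distance between LSI document vectors, and classical (Torgerson) MDS are our own choices, since the abstract does not specify these details. The function names (`tfidf_matrix`, `lsi`, `cosine_distances`, `classical_mds`, `visualize`) and the transliterated tokens are hypothetical.

```python
import numpy as np
from collections import Counter


def tfidf_matrix(docs):
    """Term-by-document TF-IDF matrix (rows = terms, columns = documents).

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so no term weight is zero.
    """
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    tf = np.zeros((len(vocab), n_docs))
    for j, doc in enumerate(docs):
        for term, count in Counter(doc).items():
            tf[index[term], j] = count
    df = np.count_nonzero(tf, axis=1)
    idf = np.log((1 + n_docs) / (1 + df)) + 1
    return tf * idf[:, None], vocab


def lsi(tfidf, k):
    """Latent semantic indexing: keep the top-k singular triplets of the TF-IDF
    matrix and return each document as a k-dimensional row vector."""
    _, s, vt = np.linalg.svd(tfidf, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # shape (n_docs, k)


def cosine_distances(X):
    """Pairwise cosine distances (1 - cosine similarity) between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1, norms)
    D = np.clip(1 - unit @ unit.T, 0, None)
    D = (D + D.T) / 2  # remove floating-point asymmetry
    np.fill_diagonal(D, 0)
    return D


def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:dims]      # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0, None))
    return eigvecs[:, top] * scale


def visualize(docs, k=2, dims=2):
    """TF-IDF -> LSI (k latent dimensions) -> cosine distances -> MDS layout."""
    tfidf, _ = tfidf_matrix(docs)
    return classical_mds(cosine_distances(lsi(tfidf, k)), dims)


if __name__ == "__main__":
    # Hypothetical pre-stemmed documents; transliterated roots stand in for Arabic stems.
    docs = [["ktb", "elm", "drs"], ["ktb", "elm", "qra"],
            ["swq", "tjr", "mal"], ["swq", "mal", "bye"]]
    for i, (x, y) in enumerate(visualize(docs)):
        print(f"doc {i}: ({x:+.3f}, {y:+.3f})")
```

In this toy corpus, documents 0 and 1 share the stems `ktb` and `elm` and land at essentially the same point, well separated from documents 2 and 3. `k` must not exceed the number of terms or documents. Running MDS on distances measured in the reduced LSI space, rather than on raw TF-IDF vectors, is one plausible reading of how the two techniques are combined.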
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This paper was written while the first author was on sabbatical leave from The University of Jordan to Princess Sumaya University of Technology (PSUT), Amman, Jordan.
About this article
Cite this article
Hammo, B., Obeid, N. & Huzayyen, I. ViStA: a visualization system for exploring Arabic text. Int J Speech Technol 19, 237–247 (2016). https://doi.org/10.1007/s10772-015-9286-4
DOI: https://doi.org/10.1007/s10772-015-9286-4