ViStA: a visualization system for exploring Arabic text

  • Special Issue Article
International Journal of Speech Technology

Abstract

Text visualization has become an important tool for knowledge discovery and for presenting large amounts of data in an insightful way. This paper presents ViStA, a visualization system for exploring Arabic text. We report on its design, its implementation, and some of the experiments we conducted with it. Such tools help Arabic language analysts explore and understand text data effectively and discover interesting knowledge hidden in it. We used statistical techniques from the field of Information Retrieval to identify the relevant documents, coupled with natural language processing (NLP) tools to process the text. For text visualization, the system uses a hybrid approach that combines latent semantic indexing for feature selection with multidimensional scaling for dimensionality reduction. Initial results indicate that this approach is viable for Arabic text visualization and for other Arabic NLP applications.
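
The hybrid approach named in the abstract can be illustrated with a minimal sketch. This is not the ViStA implementation: the TF-IDF weighting, the rank `k`, the Euclidean distance, the use of classical (Torgerson) multidimensional scaling, and all function names are assumptions made for illustration, and the placeholder tokens stand in for Arabic text that a real system would first pass through NLP preprocessing such as stemming.

```python
import numpy as np

def tfidf(docs):
    """Build a term-by-document TF-IDF matrix from whitespace-tokenised docs."""
    vocab = sorted({t for d in docs for t in d.split()})
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d.split():
            tf[index[t], j] += 1
    df = np.count_nonzero(tf, axis=1)      # number of documents containing each term
    idf = np.log(len(docs) / df)
    return tf * idf[:, None]

def lsi(X, k):
    """Latent semantic indexing: keep the top-k singular directions.
    Returns one k-dimensional row per document."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T

def pairwise_distances(Z):
    """Euclidean distances between all pairs of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, dims=2):
    """Classical MDS: double-centre the squared distances, then embed
    using the leading eigenvectors of the resulting matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:dims]     # indices of the largest eigenvalues
    return V[:, order] * np.sqrt(np.clip(w[order], 0, None))

# Hypothetical toy corpus: two documents per topic.
docs = [
    "cat dog pet",
    "dog cat animal pet",
    "stock market price",
    "market price trade stock",
]
Z = lsi(tfidf(docs), k=2)                  # documents in the reduced LSI space
coords = classical_mds(pairwise_distances(Z), dims=2)  # 2D layout for plotting
```

Each row of `coords` is a 2D position for one document, so documents that share vocabulary should land near each other when plotted. In this sketch LSI compresses the term space first and MDS then produces the screen layout, which mirrors the division of labour the abstract describes.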

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Bassam Hammo.

Additional information

This paper was written while the first author was on sabbatical leave from the University of Jordan at Princess Sumaya University of Technology (PSUT), Amman, Jordan.

About this article

Cite this article

Hammo, B., Obeid, N. & Huzayyen, I. ViStA: a visualization system for exploring Arabic text. Int J Speech Technol 19, 237–247 (2016). https://doi.org/10.1007/s10772-015-9286-4

  • DOI: https://doi.org/10.1007/s10772-015-9286-4
