ViStA: a visualization system for exploring Arabic text

Hammo, Bassam; Obeid, Nadim; Huzayyen, Israa

doi:10.1007/s10772-015-9286-4

Special Issue Article
Published: 09 June 2015

Volume 19, pages 237–247, (2016)

International Journal of Speech Technology

Abstract

Text visualization has become a significant tool that facilitates knowledge discovery and insightful presentation of large amounts of data. This paper presents ViStA, a visualization system for exploring Arabic text. We report on its design, its implementation, and some of the experiments we conducted with it. Tools of this kind help Arabic language analysts explore, understand, and discover interesting knowledge hidden in text data. We used statistical techniques from the field of Information Retrieval to identify relevant documents, coupled with sophisticated natural language processing (NLP) tools to process the text. For text visualization, the system uses a hybrid approach that combines latent semantic indexing for feature selection with multidimensional scaling for dimensionality reduction. Initial results confirm the viability of this approach for Arabic text visualization and for other Arabic NLP applications.
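
The hybrid approach named in the abstract can be illustrated with a minimal sketch. This is not the ViStA implementation, which the abstract does not detail: it assumes a plain term-document count matrix, uses a truncated SVD as the latent semantic indexing step, and uses classical (Torgerson) multidimensional scaling to obtain 2-D coordinates. The function names `lsi_embed` and `classical_mds`, the toy matrix, and the choice `k=3` are hypothetical and chosen only for illustration; term weighting and Arabic preprocessing are omitted.

```python
import numpy as np


def lsi_embed(term_doc, k):
    """Latent semantic indexing step: keep the k strongest SVD dimensions.

    term_doc has one row per term and one column per document.
    Returns one row per document in the k-dimensional latent space.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def classical_mds(points, dims=2):
    """Classical MDS: low-dimensional coordinates that approximate the
    pairwise Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)            # squared pairwise distances
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]          # indices of the largest ones
    top_vals = np.clip(vals[order], 0.0, None)     # guard against tiny negatives
    return vecs[:, order] * np.sqrt(top_vals)


# Toy term-document counts (rows = terms, columns = documents).
# The numbers are invented for illustration only.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

latent = lsi_embed(counts, k=3)          # documents in a 3-D latent space
coords = classical_mds(latent, dims=2)   # 2-D coordinates for plotting
```

Each row of `coords` is a document position that could be drawn as a point on a scatter plot. In this toy matrix, documents 0 and 1 share most of their terms and land closer to each other than to document 2, which is the kind of grouping such a map is meant to expose.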

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

King Abdullah II School for Informational Technology, The University of Jordan, Amman, Jordan
Bassam Hammo, Nadim Obeid & Israa Huzayyen

Corresponding author

Correspondence to Bassam Hammo.

Additional information

This paper was written while the first author was on sabbatical leave from The University of Jordan at Princess Sumaya University of Technology (PSUT), Amman, Jordan.

About this article

Cite this article

Hammo, B., Obeid, N. & Huzayyen, I. ViStA: a visualization system for exploring Arabic text. Int J Speech Technol 19, 237–247 (2016). https://doi.org/10.1007/s10772-015-9286-4

Received: 09 March 2015
Accepted: 28 May 2015
Published: 09 June 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10772-015-9286-4
