TopicsRanksDC: Distance-Based Topic Ranking Applied on Two-Class Data

Yousef, Malik; Qundus, Jamal Al; Peikert, Silvio; Paschke, Adrian

doi:10.1007/978-3-030-59028-4_2


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1285)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

In this paper, we introduce TopicsRanksDC, a novel approach that ranks topics by the distance between the two clusters that each topic generates. We assume that the data consists of text documents, each associated with one of two classes. The approach ranks each topic found in these documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are then represented as two clusters, one for each class. We compute four distance metrics between the two clusters: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results for LDA topics with those for random topics. The results show that LDA topics rank much higher than random topics. The results of the TopicsRanksDC tool are promising for future work on enabling search engines to suggest related topics.
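The four cluster distances named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each document is a row of term counts, that a topic is given by the column indices of its words, and that the two clusters are the rows of each class restricted to those columns. The function names, the toy matrix, and the choice of Euclidean distance are all hypothetical, and topic detection with LDA is omitted, so the topics here are picked by hand.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_distances(pos, neg):
    """Four distances between two clusters of points (rows)."""
    d = pairwise_distances(pos, neg)
    return {
        "single": float(d.min()),      # closest pair across the clusters
        "complete": float(d.max()),    # farthest pair across the clusters
        "average": float(d.mean()),    # mean over all cross-cluster pairs
        "centroid": float(np.linalg.norm(pos.mean(axis=0) - neg.mean(axis=0))),
    }


def rank_topics(doc_term, labels, topics, metric="average"):
    """Sort topics by the chosen distance between the two class clusters.

    topics maps a topic name to the column indices of its words
    (an assumed representation, chosen for this sketch).
    """
    scores = {}
    for name, cols in topics.items():
        sub = doc_term[:, cols]  # keep only the topic's words
        scores[name] = cluster_distances(sub[labels == 1], sub[labels == 0])[metric]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: 4 documents x 4 words; the first two documents belong to class 1.
doc_term = np.array(
    [[3, 0, 1, 1],
     [4, 1, 1, 0],
     [0, 3, 1, 1],
     [1, 4, 0, 1]],
    dtype=float,
)
labels = np.array([1, 1, 0, 0])
topics = {"discriminative": [0, 1], "uninformative": [2, 3]}

ranking = rank_topics(doc_term, labels, topics)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting, the topic whose words occur with different frequencies in the two classes yields well-separated clusters and is ranked first, while the topic whose words are spread evenly across both classes scores close to zero.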

This work has been partially supported by the “Wachstumskern Qurator – Corporate Smart Insights” project (03WKDA1F) funded by the German Federal Ministry of Education and Research (BMBF).

Notes

1. Joachims, T.: A Statistical Learning Model of Text Classification with Support Vector Machines. In: Proceedings of the Conference on Research and Development in Information Retrieval, SIGIR (2001).
2. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006).
3. https://archive.org/details/stackexchange

Author information

Authors and Affiliations

Zefat Academic College, Zefat, Israel
Malik Yousef
The Galilee Digital Health Research Center (GDH), Zefat, Israel
Malik Yousef
Data Analytics Center (DANA), Fraunhofer FOKUS, Berlin, Germany
Jamal Al Qundus, Silvio Peikert & Adrian Paschke

Corresponding author

Correspondence to Malik Yousef.

Editor information

Editors and Affiliations

Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Wien, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Oberösterreich, Austria
Ismail Khalil
Software Competence Center Hagenberg, Linz, Austria
Lukas Fischer
Software Competence Center Hagenberg, Linz, Austria
Bernhard Moser
Software Competence Center Hagenberg, Linz, Austria
Atif Mashkoor
Johannes Kepler University of Linz, Linz, Austria
Johannes Sametinger
University of Innsbruck, Innsbruck, Tirol, Austria
Anna Fensel
Software Competence Center Hagenberg, Linz, Austria
Jorge Martinez-Gil

About this paper

Cite this paper

Yousef, M., Qundus, J.A., Peikert, S., Paschke, A. (2020). TopicsRanksDC: Distance-Based Topic Ranking Applied on Two-Class Data. In: Kotsis, G., et al. Database and Expert Systems Applications. DEXA 2020. Communications in Computer and Information Science, vol 1285. Springer, Cham. https://doi.org/10.1007/978-3-030-59028-4_2

DOI: https://doi.org/10.1007/978-3-030-59028-4_2
Published: 12 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59027-7
Online ISBN: 978-3-030-59028-4
eBook Packages: Computer Science, Computer Science (R0)
