Three-Tier Clustering: An Online Citation Clustering System

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Abstract

In this paper, we present a three-tier clustering method in which data objects are described by a number of feature dimensions. In this approach, the similarity between objects along each feature dimension is computed first. The inter-object similarity is then computed from the feature-dimension similarities using a Bayesian multi-causal model. Finally, objects are clustered based on the computed similarity. An online citation entry clustering system was built using this approach. It accepts user queries in the form of author names. Such queries are sent to citation/bibliography search engines, and the returned entries are clustered based on feature dimensions such as authors, title, and place of publication. After clustering, entries from different authors with similar names form different clusters, which are presented to the user. Preliminary experimental results indicate the effectiveness of the proposed clustering approach. The architecture of the three-tier clustering framework, the feature representation of a citation entry, a belief network model for inter-object similarity computation, and a special cluster evaluation technique are discussed in detail.
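
The three tiers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the form of the Bayesian multi-causal model, so a noisy-OR combination is assumed here as one common causal-independence model. The Jaccard token overlap, the feature names, the weights in `WEIGHTS`, the threshold, and the single-link grouping are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical per-feature weights: how strongly a match on one feature
# alone suggests that two entries belong together (assumed values).
WEIGHTS = {"authors": 0.6, "title": 0.3, "venue": 0.5}


def jaccard(a, b):
    """Tier 1: similarity along one feature dimension (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def object_similarity(x, y):
    """Tier 2: combine per-feature similarities with a noisy-OR (assumed)."""
    p_not_same = 1.0
    for feature, weight in WEIGHTS.items():
        s = jaccard(x.get(feature, ""), y.get(feature, ""))
        p_not_same *= 1.0 - weight * s
    return 1.0 - p_not_same


def cluster(entries, threshold=0.5):
    """Tier 3: single-link grouping of entries above a similarity threshold."""
    parent = list(range(len(entries)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if object_similarity(entries[i], entries[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up entries returned for a query on the author name "W Wang".
ENTRIES = [
    {"authors": "w wang h jiang", "title": "citation clustering online",
     "venue": "waim"},
    {"authors": "w wang h jiang w lou", "title": "clustering citation entries",
     "venue": "waim"},
    {"authors": "w wang r smith", "title": "protein folding dynamics",
     "venue": "biophysics journal"},
]

print(cluster(ENTRIES))  # → [[0, 1], [2]]
```

In this toy input the first two entries share co-authors, title words, and venue, so they are grouped together, while the third entry, which shares only the queried name, forms its own cluster.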

This work is partially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China (AOE97/98.EG05) and a grant from the National 973 project of China (No. G1998030414).

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, H., Lou, W., Wang, W. (2001). Three-Tier Clustering: An Online Citation Clustering System. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_22

  • DOI: https://doi.org/10.1007/3-540-47714-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42298-3

  • Online ISBN: 978-3-540-47714-3

  • eBook Packages: Springer Book Archive
