Skip to main content

An Improved Algorithm for Extracting Research Communities from Bibliographic Data

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6193))

Included in the following conference series:

  • 660 Accesses

Abstract

In this paper we improve the performance of the community extraction algorithm in [1] from bibliographic data, which was originally proposed for web community discovery by [2]. A web community is considered to be a set of web pages holding a common topic, in other words, it is a dense subgraph induced in web graph. Such subgraphs obtained by the max-flow algorithm are called max-flow communities, and this algorithm was improved to obtain research communities from bibliographic data by the strategy for selection of community nodes in [1]. We propose an improvement of this algorithm by carefully selecting initial seed node, and show the performance of this algorithm by experiments for the list of many keywords frequently appearing in data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Horiike, T., Takahashi, Y., Kuboyama, T., Sakamoto, H.: Extracting research communities by improved maximum flow algorithm. In: Velásquez, J.D., Ríos, S.A., Howlett, R.J., Jain, L.C. (eds.) KES 2009, Part II. LNCS, vol. 5712, pp. 472–479. Springer, Heidelberg (2009)

    Google Scholar 

  2. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: KDD 2000, pp. 150–160 (2000)

    Google Scholar 

  3. Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.: Self-organization and identification of web communities. IEEE Computer 35(3), 66–71 (2002)

    Google Scholar 

  4. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the web for emerging cyber-communities. Computer Networks 31(11-16), 1481–1493 (1999)

    Article  Google Scholar 

  5. Chakrabarti, S., Dom, B., Raghavan, P., Rajagopalan, S., Gibson, D., Kleinberg, J.M.: Automatic resource compilation by analyzing hyperlink structure and associated text. Computer Networks 30(1-7), 65–74 (1998)

    Google Scholar 

  6. Gibson, D., Kleinberg, J.M., Raghavan, P.: Inferring web communities from link topology. In: Hypertext 1998, pp. 225–234 (1998)

    Google Scholar 

  7. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large-scale knowledge bases from the web. In: VLDB 1999, pp. 639–650 (1999)

    Google Scholar 

  8. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. In: SODA 1998, pp. 668–677 (1998)

    Google Scholar 

  9. Goldberg, A., Tarjan, R.: A new approach to the maximal flow problem. In: STOC 1986, pp. 136–146 (1986)

    Google Scholar 

  10. Ford Jr., L., Fulkerson, D.: Maximal flow through a network. Canadian Journal of Mathematics 8, 399–404 (1956)

    MATH  MathSciNet  Google Scholar 

  11. Edmonds, J., Karp, R.M.: Theoretical improvements in algorithmic efficiency for network flow problems. J. ACM 19(2), 248–264 (1972)

    Article  MATH  Google Scholar 

  12. CiteSeer.IST: http://citeseer.ist.psu.edu/

  13. Imafuji, N., Kitsuregawa, M.: Effects of maximum flow algorithm on identifying web community. In: WIDM 2002, pp. 43–48 (2002)

    Google Scholar 

  14. Toyoda, M., Kitsuregawa, M.: Creating a web community chart for navigating related communities. In: Hypertex 2001, pp. 103–112 (2001)

    Google Scholar 

  15. Imafuji, N., Kitsuregawa, M.: Finding a web community by maximum flow algorithm with hits score based capacity. In: DASFAA 2003, pp. 101–106 (2003)

    Google Scholar 

  16. Dean, J., Henzinger, M.R.: Finding related pages in the world wide web. Computer Networks 31(11-16), 1467–1479 (1999)

    Article  Google Scholar 

  17. Asano, Y., Nishizeki, T., Toyoda, M., Kitsuregawa, M.: Mining communities on the web using a max-flow and a site-oriented framework. IEICE Transactions 89-D(10), 2606–2615 (2006)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nakamura, Y., Horiike, T., Taira, Y., Sakamoto, H. (2010). An Improved Algorithm for Extracting Research Communities from Bibliographic Data. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-14589-6_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics