DOI: 10.1145/1882992.1883058
poster

An exploratory study of news article clustering for web-based bio-surveillance

Published: 11 November 2010

Abstract

Online news articles provide rich and timely information for disease outbreak surveillance. However, finding the articles relevant to disease outbreaks among the large volume of online publications is not trivial. In this study, we examined the use of text clustering techniques to organize online articles. To take surveillance analysts' expertise into account when clustering articles, we considered selecting informative word features in a supervised manner. Our experiments suggest that supervised feature selection can significantly reduce the size of the feature set without affecting the utility of the resulting clusters. In addition, we observed that the clustering algorithm could yield consistent results when a small number of selected features was used.
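The abstract does not name the selection criterion or the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that analysts assign each article a category label and that terms are scored with the maximum chi-square statistic over categories, a common supervised feature-selection criterion. It also assumes spherical k-means (cosine similarity over unit-length term-frequency vectors) as the clustering step. The function names (`chi_square_max`, `select_features`, `vectorize`, `spherical_kmeans`) and the toy articles and labels are hypothetical illustrations, not the authors' code or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (deliberately simple: no stemming or stopwords)."""
    return re.findall(r"[a-z]+", text.lower())


def chi_square_max(docs, labels):
    """Assumed criterion: score each term by its largest chi-square statistic
    against any one analyst-assigned category (that category vs. all others)."""
    doc_sets = [set(tokenize(d)) for d in docs]
    n = len(docs)
    scores = {}
    for term in set().union(*doc_sets):
        best = 0.0
        for cls in set(labels):
            n_cls = sum(1 for y in labels if y == cls)
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)  # in class, has term
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)  # other class, has term
            c = n_cls - a          # in class, lacks term
            d = (n - n_cls) - b    # other class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically for determinism)."""
    scores = chi_square_max(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(doc, features):
    """Term-frequency vector over the selected features, scaled to unit length."""
    counts = Counter(tokenize(doc))
    vec = [float(counts[f]) for f in features]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    # The dot product equals cosine similarity because inputs are unit length (or all zero).
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, max_iter=20):
    """Assumed clustering step: k-means with cosine similarity and deterministic
    farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Seed with the vector least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    assign = None
    for _ in range(max_iter):
        new_assign = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_assign == assign:
            break  # converged: no article changed cluster
        assign = new_assign
        for j in range(k):
            members = [v for v, lab in zip(vectors, assign) if lab == j]
            if members:
                summed = [sum(col) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in summed))
                if norm:
                    centroids[j] = [x / norm for x in summed]
    return assign


# Hypothetical toy articles, each labeled by an analyst with a disease category.
DOCS = [
    "influenza outbreak reported at poultry farm",
    "new influenza cases reported in city",
    "influenza vaccine shortage reported",
    "cholera outbreak reported after flooding",
    "cholera cases confirmed in camp",
    "cholera deaths rise after flooding",
]
LABELS = ["flu", "flu", "flu", "cholera", "cholera", "cholera"]

features = select_features(DOCS, LABELS, k=4)
clusters = spherical_kmeans([vectorize(d, features) for d in DOCS], k=2)
print(features, clusters)  # ['cholera', 'influenza', 'after', 'flooding'] [0, 0, 0, 1, 1, 1]
```

In this toy run, four selected terms out of roughly twenty distinct words are enough to separate the two categories. Whether that kind of reduction preserves cluster utility on real news feeds is an empirical question; one way to probe the consistency the abstract mentions is to rerun the clustering across several values of `k` and compare the resulting assignments.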


Information

Published In

ACM Other conferences
IHI '10: Proceedings of the 1st ACM International Health Informatics Symposium
November 2010
886 pages
ISBN:9781450300308
DOI:10.1145/1882992
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2010

Author Tags

  1. biosurveillance
  2. clustering
  3. feature selection
  4. text mining

Qualifiers

  • Poster

Conference

IHI '10
IHI '10: ACM International Health Informatics Symposium
November 11 - 12, 2010
Arlington, Virginia, USA

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 204
  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 08 Mar 2025
