Using Sub-sequence Patterns for Detecting Organ Trafficking Websites

  • Conference paper
Multimedia Communications, Services and Security (MCSS 2013)

Abstract

This paper presents a novel method for mining suspicious websites from the World Wide Web using state-of-the-art pattern mining and machine learning methods. In this document, the term “suspicious website” means any website that contains known or suspected violations of law. Although we present our evaluation on illegal online organ trading, the method described in this paper is generic and can be used to detect any specific kind of website. We use an iterative setting in which each iteration unearths both normal and suspicious websites. These newly detected websites are added to our training examples and used in the next iteration. The first iteration uses user-supplied seed normal and suspicious websites. We show that accuracy increases in the initial iterations but decreases as the number of iterations grows further. This is due to the bias caused by adding a large number of normal websites and to the noise that automatic labeling introduces into the training examples.
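The iterative setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or the classifier, so a simple word-frequency scorer stands in for the sub-sequence patterns and the learned model. The function names, the `threshold` parameter, and the toy texts are all assumptions made for illustration.

```python
from collections import Counter

LABELS = ("normal", "suspicious")

def tokenize(text):
    # Whitespace tokenization; a stand-in for real page preprocessing.
    return text.lower().split()

def train(examples):
    # Count word frequencies per label from (text, label) pairs.
    counts = {label: Counter() for label in LABELS}
    for text, label in examples:
        counts[label].update(tokenize(text))
    return counts

def score(counts, text):
    # Positive score leans "suspicious", negative leans "normal".
    # Each word contributes its relative frequency under each label.
    total = {label: sum(c.values()) or 1 for label, c in counts.items()}
    s = 0.0
    for word in tokenize(text):
        s += counts["suspicious"][word] / total["suspicious"]
        s -= counts["normal"][word] / total["normal"]
    return s

def bootstrap(seeds, unlabeled, iterations=3, threshold=0.1):
    # Hypothetical loop: retrain on the current examples, label the pool,
    # and move confidently labeled texts into the training examples.
    training = list(seeds)
    pool = list(unlabeled)
    for _ in range(iterations):
        counts = train(training)
        remaining = []
        for text in pool:
            s = score(counts, text)
            if abs(s) >= threshold:
                label = "suspicious" if s > 0 else "normal"
                training.append((text, label))
            else:
                remaining.append(text)
        if len(remaining) == len(pool):
            break  # nothing new was labeled, so further iterations cannot help
        pool = remaining
    return training, pool

# Toy seed and unlabeled texts, invented for this sketch.
seeds = [
    ("kidney for sale contact donor", "suspicious"),
    ("weather forecast sunny weekend", "normal"),
]
unlabeled = [
    "buy kidney donor urgent",
    "sunny weekend football match",
    "urgent buy liver",
    "football match results",
]
training, pool = bootstrap(seeds, unlabeled)
```

In this toy run, texts that share no words with the seeds can only be labeled after an earlier iteration has added related examples, which mirrors how the training examples grow across iterations. Because every automatic label is trusted in later iterations, any mislabeled text becomes training noise, which is the effect the abstract attributes the later drop in accuracy to.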

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jung Pandey, S., Manandhar, S., Kleszcz, A. (2013). Using Sub-sequence Patterns for Detecting Organ Trafficking Websites. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2013. Communications in Computer and Information Science, vol 368. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38559-9_15

  • DOI: https://doi.org/10.1007/978-3-642-38559-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38558-2

  • Online ISBN: 978-3-642-38559-9

  • eBook Packages: Computer Science, Computer Science (R0)