Using Sub-sequence Patterns for Detecting Organ Trafficking Websites

Jung Pandey, Suraj; Manandhar, Suresh; Kleszcz, Agnieszka

doi:10.1007/978-3-642-38559-9_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 368)

Included in the following conference series:

International Conference on Multimedia Communications, Services and Security

Abstract

This paper presents a novel method for mining suspicious websites from the World Wide Web using state-of-the-art pattern mining and machine learning methods. In this document, the term “suspicious website” refers to any website that contains known or suspected violations of law. Although we present our evaluation on illegal online organ trading, the method described in this paper is generic and can be used to detect any specific kind of website. We use an iterative setting in which, at each iteration, we unearth both normal and suspicious websites. These newly detected websites are added to our training examples and used in subsequent iterations. The first iteration uses user-supplied seed sets of normal and suspicious websites. We show that accuracy increases in the initial iterations but decreases as the number of iterations grows further. This is due to the bias caused by adding a large number of normal websites and to the noise that is automatically introduced into the training examples.
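The abstract describes the iterative loop only in outline, so the following is a minimal sketch under stated assumptions, not the authors' implementation. Each page is assumed to be reduced to plain text. Features are ordered word sub-sequences with gaps allowed inside a small window, which is one reading of the title's "sub-sequence patterns"; the actual pattern miner is not specified here. A small multinomial naive Bayes classifier is swapped in for whatever classifier the paper uses. Crawled pages are folded back into the training set only when the predicted class probability reaches a hypothetical `threshold`. All function names, parameters, and the toy data are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def subsequence_features(text, max_len=2, window=4):
    """Count ordered word sub-sequences (gaps allowed) of length 1..max_len
    whose words all fall inside a window of `window` tokens.
    Hypothetical stand-in for the paper's sub-sequence pattern miner."""
    tokens = text.lower().split()
    feats = Counter()
    for i in range(len(tokens)):
        span = tokens[i:i + window]
        for n in range(1, max_len + 1):
            # Anchor every pattern at span[0] so each occurrence is counted once.
            for rest in combinations(range(1, len(span)), n - 1):
                feats[" ".join([span[0]] + [span[j] for j in rest])] += 1
    return feats


class NaiveBayes:
    """Multinomial naive Bayes over sub-sequence counts (illustrative
    substitute for whichever classifier the paper actually uses)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.priors}
        for doc, y in zip(docs, labels):
            self.counts[y].update(subsequence_features(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict_proba(self, doc):
        feats = subsequence_features(doc)
        n_docs = sum(self.priors.values())
        v = len(self.vocab)
        scores = {}
        for c, cnt in self.counts.items():
            s = math.log(self.priors[c] / n_docs)
            for f, k in feats.items():
                if f in self.vocab:  # ignore patterns never seen in training
                    s += k * math.log((cnt[f] + self.alpha) / (self.totals[c] + self.alpha * v))
            scores[c] = s
        m = max(scores.values())  # log-sum-exp normalisation for stability
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}


def self_train(seed_docs, seed_labels, unlabeled, iterations=3, threshold=0.9):
    """Label crawled pages and fold confident ones back into training."""
    docs, labels, pool = list(seed_docs), list(seed_labels), list(unlabeled)
    history = []  # number of pages added in each iteration
    for _ in range(iterations):
        model = NaiveBayes().fit(docs, labels)
        remaining, added = [], 0
        for doc in pool:
            label, p = max(model.predict_proba(doc).items(), key=lambda kv: kv[1])
            if p >= threshold:  # only confident predictions become training data
                docs.append(doc)
                labels.append(label)
                added += 1
            else:
                remaining.append(doc)
        pool = remaining
        history.append(added)
        if added == 0:
            break
    return NaiveBayes().fit(docs, labels), history


# Toy usage with hypothetical seed and crawled pages.
seeds = ["buy kidney donor for sale cash", "kidney for sale contact donor",
         "university course timetable and library", "weather forecast for the weekend"]
seed_labels = ["suspicious", "suspicious", "normal", "normal"]
crawled = ["kidney donor wanted pay cash", "library opening hours and course list"]
model, history = self_train(seeds, seed_labels, crawled)
print(history, model.predict_proba("kidney for sale"))
```

In a sketch like this, `threshold` is the main knob. Since the abstract attributes the late-iteration accuracy drop to the influx of normal websites and to automatically introduced noise, a stricter threshold or a per-class cap on additions per iteration would be natural (untested) mitigations to try.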

Author information

Authors and Affiliations

University of York, Heslington, York, YO10 5GH, UK
Suraj Jung Pandey & Suresh Manandhar
AGH University of Science and Technology, Al. A. Mickiewicza 30, 30-059, Krakow, Poland
Agnieszka Kleszcz

Editor information

Editors and Affiliations

Department of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059, Krakow, Poland
Andrzej Dziech
Multimedia Systems Department, Gdansk University of Technology, Narutowicza 11/12, Gdansk, Poland
Andrzej Czyżewski

About this paper

Cite this paper

Jung Pandey, S., Manandhar, S., Kleszcz, A. (2013). Using Sub-sequence Patterns for Detecting Organ Trafficking Websites. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2013. Communications in Computer and Information Science, vol 368. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38559-9_15

DOI: https://doi.org/10.1007/978-3-642-38559-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38558-2
Online ISBN: 978-3-642-38559-9
eBook Packages: Computer Science, Computer Science (R0)
