Scalable semantic analytics on social networks for addressing the problem of conflict of interest detection

Published: 03 March 2008


In this article, we demonstrate the applicability of semantic techniques to the detection of Conflict of Interest (COI). We explain the common challenges involved in building scalable Semantic Web applications, in particular those addressing connecting-the-dots problems. We describe in detail the challenges involved in two important aspects of building Semantic Web applications, namely, data acquisition and entity disambiguation (or reference reconciliation). We extend our previous work, in which we integrated the collaborative network of a subset of DBLP researchers with persons in a Friend-of-a-Friend (FOAF) social network. Our method finds the connections between people, measures collaboration strength, and includes heuristics that use friendship/affiliation information to provide an estimate of potential COI in a peer-review scenario. Evaluations are presented by measuring what could have been the COI between accepted papers in various conference tracks and their respective program committee members. The experimental results demonstrate that the approach scales to a dataset of over 3 million entities (all bibliographic data from DBLP and a large collection of FOAF documents).
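To make the idea of measuring collaboration strength and estimating potential COI more concrete, the following is a minimal Python sketch and not the method used in the article. It assumes a common co-authorship weighting in which a paper with n authors contributes 1/(n - 1) to each pair of its authors. The threshold value, the level names ("high", "medium", "low", "none"), and the function names `collaboration_strength`, `neighbors`, and `coi_level` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_strength(papers):
    """Return {frozenset({a, b}): weight} for every co-author pair.

    Assumption (not taken from the article): a paper with n distinct
    authors adds 1 / (n - 1) to the weight of each pair of its authors.
    """
    strength = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        if len(unique) < 2:
            continue  # a single-author paper creates no collaboration
        share = 1.0 / (len(unique) - 1)
        for a, b in combinations(unique, 2):
            strength[frozenset((a, b))] += share
    return dict(strength)


def neighbors(person, strength, friends):
    """People directly linked to `person` by co-authorship or friendship."""
    linked = set()
    for pair in list(strength) + list(friends):
        if person in pair:
            linked |= pair - {person}
    return linked


def coi_level(reviewer, author, strength, friends, threshold=1.0):
    """Hypothetical heuristic that labels the potential COI of a pair."""
    pair = frozenset((reviewer, author))
    weight = strength.get(pair, 0.0)
    if weight >= threshold:
        return "high"    # strong direct collaboration
    if weight > 0.0 or pair in friends:
        return "medium"  # weak collaboration or a declared friendship
    if neighbors(reviewer, strength, friends) & neighbors(author, strength, friends):
        return "low"     # connected only through an intermediary
    return "none"


# Toy data: author lists of three papers and one friendship link.
papers = [["alice", "bob"], ["alice", "bob", "carol"], ["carol", "dave"]]
friends = {frozenset(("dave", "erin"))}
strength = collaboration_strength(papers)

for reviewer, author in [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]:
    print(reviewer, author, coi_level(reviewer, author, strength, friends))
```

Keying pairs by `frozenset` keeps the relationships symmetric, so the reviewer and author can be passed in either order. A real system would first have to reconcile the DBLP and FOAF identities of each person, which this sketch takes as given.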



