Elsevier

Decision Support Systems

Volume 43, Issue 4, August 2007, Pages 1348-1361
An associate constraint network approach to extract multi-lingual information for crime analysis

https://doi.org/10.1016/j.dss.2006.04.011

Abstract

International crime and terrorism have drawn increasing attention in recent years. Retrieving relevant information from criminal records and suspect communications is important in combating international crime and terrorism. However, most of this information is written in languages other than English and is stored in various locations. Information sharing between countries therefore presents the challenge of cross-lingual semantic interoperability. In this work, we propose a new approach – the associate constraint network – to generate a cross-lingual concept space from a parallel corpus, and benchmark it against a previously developed technique, the Hopfield network. The associate constraint network is a constraint programming based algorithm, and the problem of generating the cross-lingual concept space is formulated as a constraint satisfaction problem. Nodes and arcs in an associate constraint network represent the terms extracted from parallel corpora and their associations. Constraints are defined for the nodes in the associate constraint network, and node consistency and network satisfaction are also defined. Backmarking is developed to search for a feasible solution. Our experimental results show that the associate constraint network outperforms the Hopfield network in precision, recall and efficiency. The cross-lingual concept space that is generated with this method can help crime analysts determine the relevance of criminals, crimes, locations and activities in multiple languages, which is information that is not available in traditional thesauri and dictionaries.

Introduction

In this age of globalisation and information revolution, the threat of international crime and terrorism has increased significantly. Rapid changes in technology have opened new opportunities for trafficking contraband, conducting illicit trade, laundering money and engaging in large-scale economic crimes. New forms of terrorism are also emerging. These threats have a great effect on the citizens, businesses and national security of many countries. Some evidence of cross-border international crime and terrorism is shown below:

  • Algerian national Ahmed Ressam, who was associated with an extremist group that has ties to Al Qaeda, attempted to smuggle bomb making material into the United States from Canada. He was arrested in Port Angeles, Washington.

  • The number of annual fatalities in terrorist-related violence in south Asia far exceeds the death toll in the Middle East, which is traditionally held to be the cradle of terrorism [2].

  • Illegal migration that is facilitated by organized alien smuggling networks is on the rise. The “human cargo” is often kept in cramped, unhealthy and dangerous conditions, and many women and children are smuggled across borders for sexual exploitation and forced labour.

  • The tremendous cost of legally disposing of pollutants and dangerous chemicals has created new illicit business opportunities, and many criminal organizations illegally export toxic wastes to countries in Eastern and Central Europe, Asia and Africa.

  • Intellectual property rights (IPR) crimes cause tremendous revenue losses for the entertainment industries, and the explosion of digitisation and the Internet has enabled the IPR violators to effortlessly copy and distribute electronic products.

  • International criminals produce, distribute and use counterfeit money for profit, to make illicit transactions, to finance illegal operations, and to promote illicit activities.

  • Finally, high-tech crimes, in particular against computer networks, are becoming an increasing law enforcement and national security problem because of the growing reliance of government entities, public utilities, industries, business and financial institutions on electronic data and information storage, retrieval and transmission.

All of this evidence demonstrates the need for a strong alliance of the law enforcement, defence and security agencies in different countries, and for channels through which these agencies can exchange information and intelligence on international criminal and terrorist activities.

The successful combating of international crime and terrorism relies on information sharing between countries to evaluate threats and vulnerabilities and to issue the necessary warnings. A knowledge management system can enable the retrieval of relevant information about a threat from criminal records and suspect communications in multiple languages before the threat causes widespread harm. This creates the challenge of cross-lingual semantic interoperability.

The language barrier is a major problem in knowledge management for international crime analysis. Terrorists and criminals may communicate through emails and bulletin boards in languages other than English. Many of the words that are used in such communications are unknown words that do not exist in dictionaries, such as the names of criminals or terrorists. A typical dictionary-based approach is unable to provide a translation for such terms. Furthermore, translations may not be consistent across different regions. The translation of “Bin Laden”, for example, may be different in China, Hong Kong and Taiwan. It is impossible to update these translations manually. The use of an automated cross-lingual concept space has proved to be promising in solving the problem of cross-lingual semantic interoperability. A concept space is a semantic network that consists of concepts (noun phrases in the textual domain) and related concepts, in which the association between concepts is computed from co-occurrence relationships. A concept is a noun phrase that represents something that is conceived in the mind. For a given language, a concept may be represented by a word or words, or by a morpheme, an idiomatic expression, a tone or word order. Several concepts may be represented by a single word in one language, but may be translated as one word, two words, a phrase or even a sentence in another language [15]. A concept space for two languages is called a bilingual concept space. A bilingual concept space that represents the association of concepts across the languages is also known as a cross-lingual concept space. In this work, we focus on an English/Chinese cross-lingual concept space.
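To make the co-occurrence idea concrete, the following minimal Python sketch computes an asymmetric association weight between terms from a toy set of aligned English/Chinese documents. The weighting used here (the fraction of documents containing term a that also contain term b) is an illustrative assumption and not necessarily the weighting used in this work; the function name `cooccurrence_weights` and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence_weights(documents):
    """Return weights[a][b] = (#docs containing a and b) / (#docs containing a).

    This simple ratio is an assumed weighting chosen for illustration only.
    """
    doc_freq = defaultdict(int)   # number of documents containing each term
    pair_freq = defaultdict(int)  # number of documents containing both terms
    for terms in documents:
        unique = set(terms)
        for term in unique:
            doc_freq[term] += 1
        for a, b in permutations(unique, 2):
            pair_freq[(a, b)] += 1
    weights = defaultdict(dict)
    for (a, b), count in pair_freq.items():
        weights[a][b] = count / doc_freq[a]
    return weights


# Hypothetical toy corpus: each entry holds the terms extracted from one
# aligned English/Chinese document pair.
docs = [
    ["police", "drugs", "警察", "毒品"],
    ["police", "robbery", "警察"],
    ["drugs", "毒品"],
]
w = cooccurrence_weights(docs)
```

In this toy corpus every document that mentions "police" also mentions "警察", so the two terms receive the strongest possible association, even though no dictionary was consulted.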

The research problem in this work is to build a cross-lingual concept space to resolve the problem of cross-lingual semantic interoperability. Such a cross-lingual concept space must be capable of supporting users to search across language boundaries for relevant information to combat international crime and terrorism. For example, users may submit the query ‘peer to peer’ to search for information about the illegal downloading of copyrighted electronic files. An automatic cross-lingual concept space would then provide related concepts in the same language and other languages, such as ‘P2P’, ‘[Chinese term]’ (peer to peer), ‘[Chinese term]’ (peer to peer), ‘[Chinese term]’ (client), ‘client’, ‘[Chinese term]’ (server), and ‘server’. The related concepts can then be used to expand the original query to search for relevant information in multiple languages.
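The query expansion step described above can be sketched in a few lines of Python. The association weights, the threshold and the function name `expand_query` are all hypothetical values chosen for illustration; in a real cross-lingual concept space the related terms would include Chinese as well as English concepts.

```python
def expand_query(query, associations, threshold=0.5, limit=5):
    """Return the query followed by its most strongly associated concepts.

    `associations` maps a term to {related_term: weight}. The threshold and
    limit are illustrative assumptions, not parameters from the paper.
    """
    related = associations.get(query, {})
    # Strongest associations first; ties broken alphabetically for determinism.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    expansion = [term for term, weight in ranked if weight >= threshold]
    return [query] + expansion[:limit]


# Hypothetical association weights for the 'peer to peer' example.
associations = {
    "peer to peer": {"P2P": 0.9, "client": 0.7, "server": 0.6, "police": 0.1},
}
expanded = expand_query("peer to peer", associations)
```

The expanded term list can then be submitted to the retrieval system in place of the single original keyword.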

In our previous work [14], we investigated the Hopfield network for generating a cross-lingual concept space to support cross-lingual information retrieval, using the Hong Kong SAR Police Department's Web corpus as the test bed for crime analysis. The results showed the technique to be promising, with high precision and recall. However, the Hopfield network has two shortcomings. Firstly, it is inefficient on a large network of associations between English and Chinese terms: the convergence process is time consuming, especially for general terms. General terms usually have a small semantic distance to many other terms, so many terms can be activated and the network may not converge. Secondly, the results from the cross-lingual concept space are not consistent, because the Hopfield network is essentially a random process, and the results generated by different convergence processes from the same input are therefore not necessarily the same.

In this work, we propose an associate constraint network approach to tackle the problem of cross-lingual semantic interoperability. The cross-lingual concept space is modelled as an associate constraint network, and the problem of generating cross-lingual concepts is formulated as a constraint satisfaction problem. The nodes in the associate constraint network represent the bilingual terms extracted from a parallel corpus, and the arcs represent the associations between the extracted terms. The constraints on the nodes are presented. Node consistency and network satisfaction are then defined. A constraint propagation technique that is known as backmarking is proposed to solve the constraint satisfaction problem. Using backmarking, various items of information in different languages are linked together according to the conceptual relationships that are embedded in a parallel corpus and are presented to the analyst as the result for a single query. In our experiment, the English/Chinese daily press releases that are issued by the Hong Kong Police Department are used. Such a cross-lingual knowledge base can aid the pursuit and apprehension of suspects, the search for evidence and the allocation of resources. An adequate knowledge base could form the basis for collective action in response to the elusive tactics of global terrorists.

Automatic generation of cross-lingual concept space

Due to the limitation of a dictionary-based approach to cross-lingual information retrieval and the infeasibility of manually constructing a sophisticated bilingual dictionary or multilingual thesaurus, most recent works have focused on a corpus-based approach to the problem of cross-lingual semantic interoperability. The corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model. A parallel corpus is a collection of

Associate constraint network approach

To overcome the shortcomings of the Hopfield network in the automatic construction of a cross-lingual concept space, we propose a constraint programming based algorithm. The cross-lingual concept space is modelled as an associate constraint network, and the problem of generating the cross-lingual concept space is formulated as a constraint satisfaction problem.
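As a rough illustration of this formulation, the Python sketch below treats each term as a node with a binary value (1 if the term belongs to the concept space for a query, 0 otherwise) and searches for an assignment in which every node satisfies its constraint. Several parts are assumptions made only for this sketch: the node constraint (a node is active exactly when the summed arc weights from its active neighbours reach a threshold `theta`), the toy weights, and all function names. The search is plain chronological backtracking with a consistency check on partial assignments, used here as a simpler stand-in for the backmarking procedure named in the paper.

```python
def symmetric(arcs):
    """Build a symmetric adjacency map from {(a, b): weight}."""
    weights = {}
    for (a, b), w in arcs.items():
        weights.setdefault(a, {})[b] = w
        weights.setdefault(b, {})[a] = w
    return weights


def node_consistent(node, assignment, weights, theta, query):
    """Check whether `node` can still satisfy the assumed threshold constraint."""
    if node == query:
        return assignment[node] == 1  # the query term is always active
    lower = 0.0  # support from neighbours already assigned 1
    upper = 0.0  # lower bound plus what unassigned neighbours could still add
    for other, w in weights.get(node, {}).items():
        value = assignment.get(other)
        if value == 1:
            lower += w
            upper += w
        elif value is None:
            upper += w
    if assignment[node] == 1:
        return upper >= theta  # enough support is still reachable
    return lower < theta       # not yet forced to be active


def solve(nodes, weights, theta, query, assignment=None):
    """Chronological backtracking; returns a satisfying assignment or None."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(nodes):
        return dict(assignment)
    node = next(n for n in nodes if n not in assignment)
    for value in (1, 0):
        assignment[node] = value
        if all(node_consistent(n, assignment, weights, theta, query)
               for n in assignment):
            result = solve(nodes, weights, theta, query, assignment)
            if result is not None:
                return result
        del assignment[node]
    return None


# Hypothetical toy network with non-negative arc weights.
nodes = ["peer to peer", "P2P", "server", "robbery"]
weights = symmetric({
    ("peer to peer", "P2P"): 0.9,
    ("peer to peer", "server"): 0.4,
    ("P2P", "server"): 0.4,
    ("server", "robbery"): 0.1,
})
solution = solve(nodes, weights, theta=0.5, query="peer to peer")
concept_space = [n for n in nodes if solution[n] == 1]
```

Because the search order and the constraint check are deterministic, the same query on the same network always yields the same assignment, which is the kind of consistency the Hopfield network is said to lack.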

User interface of the multilingual information extraction system for international crime analysis

We developed a multilingual information extraction system for international crime analysis using the associate constraint network approach. The graphical user interface of the system is presented in Figs. 3, 4 and 5.

The user enters the keyword of interest in the ‘Query’ text field and then clicks the ‘Navigate the concept space’ button (Fig. 3). The system then conducts the backmarking propagation based on the user query and generates a bilingual concept space. The English and Chinese

Experiments

We conducted an experiment to measure the performance of the associate constraint network approach in generating a cross-lingual concept space and benchmarked it with the previous approach, the Hopfield network.
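For readers unfamiliar with the measures, precision and recall for a single query can be computed as in the Python sketch below. The retrieved and relevant term lists are hypothetical, and the sketch uses the standard set-based definitions; it is not a reconstruction of the evaluation protocol used in the experiment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical output of a concept space and hypothetical expert judgements.
retrieved = ["P2P", "client", "server", "robbery"]
relevant = ["P2P", "client", "server", "download"]
precision, recall = precision_recall(retrieved, relevant)
```

Here three of the four retrieved terms are relevant and three of the four relevant terms are retrieved, so both measures equal 0.75.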

Conclusion

In light of the increasing threat of international crime and terrorism that has resulted from globalisation and rapid changes in technology, information sharing and effective methods for retrieving multi-lingual information to evaluate threats and vulnerabilities are vital. To identify and share information on a threat before it causes widespread harm, an intelligent system is required to retrieve relevant information from criminal records and suspect communications. Most

References (23)

  • R. Bartak, Theory and practice of constraint propagation.
  • B. Chellaney, Fighting terrorism in Southern Asia: the lessons of history, International Security (Winter 2001/2002).
  • H. Chen et al., Automatic construction of networks of concepts characterizing document databases, IEEE Transactions on Systems, Man and Cybernetics (1992).
  • H. Chen et al., A concept space approach to addressing the vocabulary problem in scientific information retrieval: an experiment on the Worm Community System, Journal of the American Society for Information Science (1997).
  • L.F. Chien, PAT-tree-based keyword extraction for Chinese information retrieval.
  • J.P. Courtial et al., A system based on associational logic for the interrogation of databases, Journal of Information Science (1987).
  • S. He, Translingual alteration of conceptual information in medical translation: a cross-language analysis between English and Chinese, Journal of the American Society for Information Science (2000).
  • J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences of the United States of America (1982).
  • J.J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proceedings of the National Academy of Sciences of the United States of America (1984).
  • V. Kumar, Algorithms for constraint satisfaction problems: a survey, AI Magazine (1992).
  • M.L. Larson, Meaning-Based Translation: A Guide to Cross-Language Equivalence (1998).
Christopher C. Yang is an associate professor in the Department of Systems Engineering and Engineering Management at the Chinese University of Hong Kong. He received his B.S., M.S., and Ph.D. in Electrical and Computer Engineering from the University of Arizona. He has also been a faculty member in the Department of Computer Science and Information Systems at the University of Hong Kong and a research scientist in the Department of Management Information Systems at the University of Arizona. His recent research interests include cross-lingual information retrieval and knowledge management, Web search and mining, security informatics, text summarization, multimedia retrieval, information visualization, digital libraries, and electronic commerce. He has published over 120 refereed journal and conference papers in the Journal of the American Society for Information Science and Technology (JASIST), Decision Support Systems (DSS), IEEE Transactions on Image Processing, IEEE Transactions on Robotics and Automation, IEEE Computer, Information Processing and Management, Journal of Information Science, Graphical Models and Image Processing, Optical Engineering, Pattern Recognition, International Journal of Electronic Commerce, Applied Artificial Intelligence, IWWWC, SIGIR, ICIS, CIKM, and more. He has edited several special issues on multilingual information systems, knowledge management, and Web mining in JASIST and DSS. He has also frequently served as an invited panelist in the NSF Review Panels in the US. He was the chairman of the Association for Computing Machinery Hong Kong Chapter.

Kar Wing Li is currently at the Department of Information Systems, The City University of Hong Kong. Before joining the Department of Information Systems, he worked as Assistant Professor at the Department of Computing, The Polytechnic University of Hong Kong. He completed his PhD at the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. He received his B.Eng. in Information System Engineering from Imperial College, University of London, U.K. and his M.Phil. from the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. Before studying at the Chinese University of Hong Kong, he worked as a researcher in different departments of the University of Hong Kong and Hong Kong Polytechnic University. His research specialization is in the areas of cross-lingual information retrieval, multimedia information retrieval, digital libraries, internet information retrieval, knowledge management, machine translation, neural networks and constraint networks. His research has been published in several leading journals such as the Journal of the American Society for Information Science and Technology and Information Processing and Management, and in the proceedings of international conferences such as the ACM/IEEE Joint Conference, WWW, the International Conference on Asia Digital Libraries, and others.
