The implementation of a secure and pervasive multimodal Web system architecture

https://doi.org/10.1016/j.infsof.2005.12.012

Abstract

While most users currently access Web applications through Web browser interfaces, pervasive computing is emerging and offers new ways of accessing Internet applications from any device at any location, using various interface modes to interact with end users. The PC and its back-end servers remain important in a pervasive system, and the technology could involve new ways of interfacing with a PC and/or various types of gateways to back-end servers. In this research, a cellular phone was used as the pervasive device for accessing an Internet application prototype, a multimodal Web system (MWS), through voice user interface technology.

This paper describes how the MWS was developed to provide a secure interactive voice channel using an Apache Web server, a voice server, and Java technology. Securing multimodal applications proves more challenging than securing traditional Internet applications. Various standards have been developed within the context of the Java 2 Micro Edition (J2ME) platform to secure multimodal and wireless applications. In addition to covering these standards and their applicability to the MWS implementation, this paper also shows that multimodal user-interface pages can be generated with an XSLT stylesheet, which transforms XML documents into various formats including XHTML, WML, and VoiceXML.

Introduction

As wireless local area networks (LANs) become popular, ‘anytime anywhere’ connectivity is becoming a reality and pervasive computing is spreading worldwide [1], [2], [3]. With the rapid spread of mobile phones and the convergence of the phone and the personal digital assistant (PDA), there is an increasing demand for a multimodal platform that combines the modalities of various interface devices to reach a greater population of users. Surveys recently conducted in Taiwan show that the rate of Internet access from wireless devices is low [4], and that the voice channel is still the preferred choice. Voice interaction escapes the physical limitations of keypads and displays as mobile devices become ever smaller, and it is much easier to say a few words than to thumb them in on a keypad where multiple key presses may be needed for each English letter or Chinese character [5]. Therefore, from a business viewpoint, offering common services through both a Web browser interface and a voice telephony interface is a valuable approach.

Traditionally, interactive voice response (IVR) systems have been based on proprietary hardware and software technology, with development and deployment tightly integrated on the same hardware platform [6], [7]. This has resulted in high development costs. Non-portable proprietary software cannot be deployed on different platforms, and it is inherently difficult to upgrade or modify [8]. A multimodal language is needed to support human–computer dialogs via spoken input and audio output. As a solution, Voice eXtensible Markup Language (VoiceXML), a markup language for creating voice user interfaces, bridges the gap between the Web and the speech world [9] by using speech and telephone touch-tone recognition for input, and prerecorded audio and text-to-speech (TTS) synthesis for output [10]. It is based on the World Wide Web Consortium's (W3C's) eXtensible Markup Language (XML) and leverages the Web paradigm for application development and deployment. With a common language, application developers, platform vendors, and tool providers can all benefit from code portability and reuse [9]. Furthermore, providing voice access to Web-based applications is an attractive way to reduce the cost of building and delivering new capabilities to telephone customers. VoiceXML makes it possible for companies to write shared business logic once and focus their resources on developing only the specific user interface for each device they support.
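The idea of writing shared logic once and adding only a device-specific presentation layer can be illustrated with the XSLT approach mentioned in the abstract. The following is a minimal sketch, not the MWS implementation: the `account` XML document, the two stylesheets, and the class and method names are all hypothetical, and only the standard `javax.xml.transform` API is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MultimodalRenderer {
    // Hypothetical device-neutral XML produced once by the shared business logic.
    static final String ACCOUNT_XML =
        "<account><owner>Alice</owner><balance>120.50</balance></account>";

    private static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/account'>";
    private static final String XSL_TAIL = "</xsl:template></xsl:stylesheet>";

    // Illustrative stylesheet for the voice channel (VoiceXML 2.0 output).
    static final String TO_VOICEXML = XSL_HEAD
        + "<vxml version='2.0' xmlns='http://www.w3.org/2001/vxml'><form><block><prompt>"
        + "Hello <xsl:value-of select='owner'/>. "
        + "Your balance is <xsl:value-of select='balance'/> dollars."
        + "</prompt></block></form></vxml>" + XSL_TAIL;

    // Illustrative stylesheet for the browser channel (XHTML output).
    static final String TO_XHTML = XSL_HEAD
        + "<html xmlns='http://www.w3.org/1999/xhtml'><head><title>Balance</title></head>"
        + "<body><p><xsl:value-of select='owner'/>: <xsl:value-of select='balance'/></p>"
        + "</body></html>" + XSL_TAIL;

    // Picks a stylesheet by modality; the modality names are assumptions.
    static String stylesheetFor(String modality) {
        switch (modality) {
            case "voice": return TO_VOICEXML;
            case "browser": return TO_XHTML;
            default: throw new IllegalArgumentException("Unknown modality: " + modality);
        }
    }

    // Applies one stylesheet to the shared XML and returns the rendered page.
    static String render(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(ACCOUNT_XML, stylesheetFor("voice")));
        System.out.println(render(ACCOUNT_XML, stylesheetFor("browser")));
    }
}
```

In this sketch, supporting a further device type such as a WML phone would mean adding one more stylesheet and one more `case`, while the code that produces the XML stays unchanged.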

Since VoiceXML is a child language of XML, a VoiceXML-based system may inherit many common security issues from XML [11]. Furthermore, voice signals are unprotected while they are transmitted through the voice gateway, so the voice system must be secured end to end to achieve secure conversation. VoiceXML's ability to seamlessly link a number of sources, databases, Web applications, and human–machine dialogues is one of the technology's most useful features, but it also makes VoiceXML a security risk by providing a potential point of entry for wiretapping, malicious code, and hackers. Among these, human–machine dialogues are especially easy for eavesdroppers to wiretap [12]. When this happens, sensitive information such as the user's transaction records or credit card number could be stolen. Thus, it is necessary to ensure secure communication in the MWS application [12]. The proposed MWS system applies modern cryptography to encrypt voice packets in the transmission channels. A further research goal is to find a cryptographic algorithm that offers a reasonable balance between voice quality and security for voice transmission. The secure voice system was implemented as a software application so that it can be integrated with the MWS system.

Section snippets

System description

Fig. 1 illustrates the conceptual model of our voice-enabled Web system, which serves in our research as an example of an MWS system. When a caller places a call to a designated phone number, a computer at the voice site (i.e. the voice server) answers the call and retrieves the initial VoiceXML script from a VoiceXML content server, which can be a Web server located anywhere on the Web. An interpreter at the voice site parses and executes the script by playing prompts, capturing responses,
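The parse-and-execute step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the interpreter used in the MWS: the sample script, the class name `VoiceScriptWalker`, and the `PLAY`/`CAPTURE` step labels are hypothetical. The sketch only lists what a voice server would play and capture, in document order, and does not perform speech recognition or synthesis.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class VoiceScriptWalker {
    // Hypothetical initial script, as it might be fetched from a content server.
    static final String SAMPLE_VXML =
        "<vxml version='2.0' xmlns='http://www.w3.org/2001/vxml'>"
        + "<form id='login'>"
        + "<field name='pin' type='digits'><prompt>Please say your PIN.</prompt></field>"
        + "<block><prompt>Thank you.</prompt></block>"
        + "</form></vxml>";

    // Returns the dialog steps implied by the script: each prompt is played,
    // and a prompt inside a <field> is followed by capturing that field.
    static List<String> planDialog(String vxml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Scripts come from the network, so enable the parser's secure processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(vxml)));

            List<String> steps = new ArrayList<>();
            NodeList prompts = doc.getElementsByTagNameNS("*", "prompt");
            for (int i = 0; i < prompts.getLength(); i++) {
                Element prompt = (Element) prompts.item(i);
                steps.add("PLAY: " + prompt.getTextContent().trim());
                Element parent = (Element) prompt.getParentNode();
                if ("field".equals(parent.getLocalName())) {
                    steps.add("CAPTURE: " + parent.getAttribute("name"));
                }
            }
            return steps;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse VoiceXML script", e);
        }
    }

    public static void main(String[] args) {
        for (String step : planDialog(SAMPLE_VXML)) {
            System.out.println(step);
        }
    }
}
```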

Multimodal application security with Java

Java's modular nature allows it not only to expand and develop solutions for new computational problems but to drive wireless and multimodal applications. Java 2 Platform, Micro Edition (J2ME), which is designed for non-browser based devices, keeps some of the Java 2 Platform, Standard Edition (J2SE) core library APIs, and substitutes others with lightweight components through the javax.microedition package [16], [17]. Multimodal applications may use both wireless and voice devices. It is

Voice coding and encryption concern

Encryption protects against interception by a third party. No encryption scheme can be guaranteed to be 100% secure [19]. Encryption algorithms such as AES and Blowfish provide a great deal of security, indeed better than that of any generally available digital cellular telephone [9]. For security reasons, we used a secure, H.323-compliant VoIP application in place of NetMeeting, which also reduced cost.
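As an illustration of encrypting individual voice packets with the two algorithms named above, the following minimal sketch uses the standard `javax.crypto` API. It is not the scheme used in the MWS: the cipher mode (CBC with PKCS5 padding), the 128-bit key size, the 160-byte frame size, and the class name `VoiceFrameCipher` are assumptions, since this excerpt does not state them. CBC without a message authentication code does not detect tampering, so an authenticated mode would normally be preferred in a new design.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class VoiceFrameCipher {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a fresh session key for "AES" or "Blowfish".
    static SecretKey newKey(String algorithm, int keyBits) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance(algorithm);
            generator.init(keyBits);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key generation failed", e);
        }
    }

    private static Cipher cipherFor(SecretKey key) throws GeneralSecurityException {
        return Cipher.getInstance(key.getAlgorithm() + "/CBC/PKCS5Padding");
    }

    // Encrypts one voice frame; the returned packet is IV followed by ciphertext.
    static byte[] encrypt(SecretKey key, byte[] frame) {
        try {
            Cipher cipher = cipherFor(key);
            byte[] iv = new byte[cipher.getBlockSize()];
            RANDOM.nextBytes(iv); // a fresh random IV for every packet
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(frame);
            byte[] packet = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, packet, 0, iv.length);
            System.arraycopy(body, 0, packet, iv.length, body.length);
            return packet;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Splits the packet into IV and ciphertext and recovers the voice frame.
    static byte[] decrypt(SecretKey key, byte[] packet) {
        try {
            Cipher cipher = cipherFor(key);
            int ivLength = cipher.getBlockSize();
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(packet, 0, ivLength));
            return cipher.doFinal(packet, ivLength, packet.length - ivLength);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] frame = new byte[160]; // assumed frame: 20 ms of 8 kHz, 8-bit audio
        for (String algorithm : new String[] {"AES", "Blowfish"}) {
            SecretKey key = newKey(algorithm, 128);
            byte[] packet = encrypt(key, frame);
            boolean ok = Arrays.equals(frame, decrypt(key, packet));
            System.out.println(algorithm + ": " + packet.length + " bytes, round trip ok = " + ok);
        }
    }
}
```

Under these assumptions the block size determines the per-packet overhead: a 160-byte frame becomes a 192-byte packet with AES (16-byte blocks) and a 176-byte packet with Blowfish (8-byte blocks). Overhead of this kind is one of the factors to weigh when balancing voice quality against security.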

When sending voice traffic over IP networks, many factors affect

Experiment method

The functional blocks of the voice encryption process for the proposed voice-enabled Web system are shown in Fig. 3. We modified the IP telephony software of an open source project developed by John Walker and Brian C. Wiles. The software can run on various Unix and Unix-like systems equipped with audio hardware to connect to the voice server, and it uses the machine's audio input and output facilities to digitize and later reconstruct the sound. To securely transfer a randomly generated

Performance benchmark

Our system was implemented on a laptop with a 1.8 GHz Pentium 4 M and 512 MB of RAM, and on a desktop with a 2.4 GHz Pentium 4 and 512 MB of RAM. Both machines run the Microsoft Windows 2000 operating system. Evaluating the system's performance requires defining goals and objectives for the MWS system. Goals are conditions to be achieved, but they are not specific enough to prescribe a process for achieving them. The goals of our MWS system implementation (i.e. the major categories

Conclusion

To support various types of pervasive devices in the conventional way, multiple applications have to be developed independently, each serving one type of device. This practice sharply increases the cost and complexity of a system, and makes it harder to manage, whenever new devices or changes are introduced. To resolve this issue, we adopted a new octopus-like software application architecture that enables a single application to interface simultaneously with various types of distributed devices

Future research

Beyond the traditional PC-based Web browser interface and the speech-based VUI, we have recently enhanced our MWS system to work with other wireless devices including Palm PDA, BlackBerry, and Pocket PC. More experiments will be designed and conducted on these wireless devices to gain a more comprehensive understanding of the security issues of the MWS system and their corresponding solutions. The MWS system can be applied to various business products and services

Acknowledgements

The authors thank the editors and anonymous reviewers for their invaluable comments and suggestions. The authors would also like to thank the National Science Council, Taiwan, for financially supporting this work under contract no. NSC-93-2213-E-005-022.

References (40)

  • M.J. O'Grady et al., Mobile devices and intelligent agents—towards a new generation of applications and services, Information Sciences (2005)
  • K. Turner, Analysing interactive voice services, Computer Networks (2004)
  • K. Beznosov et al., Introduction to Web services and their security, Information Security Technical Report (2005)
  • D. Estrin et al., Connecting the physical world with pervasive networks, IEEE Pervasive Computing (2002)
  • A.J. Marsh et al., Enabling pervasive computing with smart phones, IEEE Pervasive Computing (2005)
  • Institute for Information Industry, ACI-FIND, Focus on Internet News and Data, 2004, Available from:...
  • Directorate General of Telecommunicate, Analysis of Mobile Phone Subscribers in 2003, 2003, Available from:...
  • X. Jin, G. Zhu, Research on realization scheme of interactive voice response (IVR) system, Proceedings of SPIE—The...
  • R. Dettmer, It's good to talk [speech technology for on-line services access], IEE Review (2003)
  • J.A. Larson, VoiceXML and the W3C speech interface framework, IEEE Multimedia (2003)
  • J.A. Larson, VoiceXML 2.0 and the W3C speech interface framework, Proceedings of ASRU '01, IEEE Workshop on Automatic Speech Recognition and Understanding (2001)
  • M.D. Collier, Current threats to and technical solutions for voice security, Proceedings of the Aerospace Conference (2002)
  • A. Rodriguez, W.-K. Ho, G. Kempny, M. Pedreschi, N. Richards, IBM WebSphere Voice Server 2.0 Implementation Guide, IBM...
  • S. Adler, A. Berglund, J. Caruso, Extensible Stylesheet Language (XSL), Version 1.0, W3C Recommendation,...
  • E. Burke, JAVA & XSLT, O'Reilly, CA,...
  • Sun Microsystems, Information on J2ME, 2004, Available from:...
  • Sun Microsystems, Information on J2SE, 2004, Available from:...
  • IBM, Using JCE in J2ME environment, 2004, Available from:...
  • M. Bellare et al., Breaking and provably repairing the SSH authenticated encryption scheme: a case study of the encode-then-encrypt-and-MAC paradigm, ACM Transactions on Information and System Security (2004)
  • C. Seibert et al., Assessing the user-perceived quality of packet voice in networks with mobile users, Proceedings of the sixth ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems (2003)
Shuchih Ernest Chang received his MSCS and PhD degrees, both from the University of Texas at Austin, in 1987 and 1994, respectively. Before becoming an assistant professor at the Institute of Electronic Commerce, National Chung Hsing University, Taiwan, he worked at UBS Financial Services Inc. in the USA as a divisional vice president for about 5 years. He has about 15 years of working experience in various major computer and financial service firms in the USA, including Unisys, IBM, Sun Microsystems, JP Morgan, Bear Stearns, and UBS. His research interests lie in Internet technologies, e-Commerce, enterprise application architecture, information security management, and voice-enabled Web applications. Dr Chang is a member of IEEE and the IEEE Communications Society, and he can be reached at [email protected].

Boris Minkin is currently a divisional vice president at UBS Financial Services Inc. Boris has more than 12 years of experience working in the areas of information technology and financial services. In his professional career, Boris has successfully completed numerous IT projects and has become an IBM Certified WSAD Enterprise Developer, IBM Certified WebSphere Solution Developer, IBM Certified WebSphere Systems Expert, and Sun Certified Java Programmer. In addition to holding a BS degree in Electrical Engineering and Computer Science from St Petersburg Marine Technical University, Boris is currently pursuing his master's degree at Stevens Institute of Technology in NJ, USA. His professional interests are in Internet technology, enterprise application architecture, Java technology, multiplatform applications, object-oriented methodology and applications, network management, and relational database design. Mr Minkin can be reached at [email protected].
