Using mathematical expressions in a statistical language

https://doi.org/10.1016/j.csda.2006.10.022

Abstract

Statistical models consist of mathematical expressions and algorithms. In a statistical system, such models are described in a statistical language. A statistical language is therefore useful if it can handle mathematical equations directly and connect them seamlessly with algorithms. This ability enables users to write and read programs intuitively and easily. A statistical language with this ability has been implemented in the statistical system Jasp using Java and MathML technologies.

Introduction

To perform a successful statistical analysis, we often have to construct new statistical models that are appropriate for the given data, using the statistical language of a particular system. All statistical languages have been developed to make such tasks as easy as possible by adopting available computer technologies. For example, Chambers and Hastie (1991) proposed an advanced statistical language for expressing complicated statistical models.

Many statistical models are represented by mathematical expressions written with special characters and notation, such as Greek letters, summation signs, and differential operators. Almost all the statistical languages in statistical systems are character-based and cannot directly handle mathematical expressions. Thus, we have to convert mathematical expressions into the commands of the statistical system that is being used.

There are some software products in which mathematical expressions can be used directly. Many of them focus on mathematical operations such as symbolic and numerical computation. Mathematica (Wolfram, 2003) is the most popular and comprehensive among them. It is a powerful tool for performing mathematical operations that seamlessly integrates a numerical and symbolic computational engine, a graphics system, a programming language, a documentation system, and advanced connectivity to other applications. It can use both mathematical expressions and programming language expressions. In addition, there are some statistical libraries and programs for Mathematica. Abell et al. (1998) described many examples of introductory statistical analyses. MathStatica (Rose and Smith, 2002) is an add-on package for Mathematica that provides a unified approach to mathematical statistics, covering topics such as multivariate distributions, generating functions, and symbolic maximum likelihood estimation.

This paper describes the importance of mathematical expressions in a statistical language. Mathematical expressions, together with the functions already included in a system, are useful for writing programs for new statistical functions.

In particular, we consider how to handle mathematical expressions directly in the statistical system Jasp (Nakano et al., 2000). Jasp is an experimental statistical system that has been developed by adopting Java technologies. It has several new features such as an original statistical language (Kobayashi et al., 2002), a user interface (Yamamoto et al., 2002), extensibility (Kobayashi et al., 2003), and distributed computing functions (Yamamoto et al., 2004).

We add features for handling mathematical expressions by building on the existing features of Jasp. A user can directly write programs that involve mathematical expressions by using an enhanced Jasp editor. Furthermore, the Jasp language allows the user to write a program with less programming work because it has the characteristics of both procedural and object-oriented languages.

We note that the Jasp mathematical editor is based on the semantics of mathematical expressions. In contrast, the Mathematica user interface, for example, is based on the form of mathematical expressions. Thus, mathematical expressions edited in the Mathematica user interface cannot always be evaluated without additional information. In our implementation, the Mathematical Markup Language (MathML) (The World Wide Web Consortium, 2001) is used for handling mathematical expressions. We also improve the Java mathematical editor Jex (Levine, 2005) to handle the semantics of mathematical expressions.
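The difference between form and semantics can be illustrated with the two kinds of markup that MathML defines. The following is a minimal sketch in Python, not part of Jasp or Jex: the example expression a(b+c), the variable names, and the omission of the MathML namespace are all assumptions made for illustration. The presentation markup records only the visual tokens, so it does not say whether a(b+c) is a product or a function application, whereas the content markup names the operators explicitly.

```python
import xml.etree.ElementTree as ET

# Presentation MathML: describes how "a(b+c)" looks (identifiers and operator glyphs).
PRESENTATION = (
    "<mrow><mi>a</mi><mo>(</mo><mi>b</mi><mo>+</mo><mi>c</mi><mo>)</mo></mrow>"
)

# Content MathML: states what the expression means (a times the sum of b and c).
CONTENT = (
    "<apply><times/><ci>a</ci>"
    "<apply><plus/><ci>b</ci><ci>c</ci></apply>"
    "</apply>"
)


def tags(xml_text):
    """Return the element names of an XML fragment in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


presentation_tags = tags(PRESENTATION)  # only layout tokens: mrow, mi, mo
content_tags = tags(CONTENT)            # includes the operators times and plus

print(presentation_tags)
print(content_tags)
```

An evaluator that receives only the presentation form would need extra information to decide how to interpret the parentheses, which is the situation described above for form-based editors.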

In the next section, we discuss mathematical expressions in statistical models. The advantages of using mathematical expressions directly in a statistical language are considered in Section 3. Section 4 explains the implementation details of the features for handling mathematical expressions in Jasp. Section 5 gives some concluding remarks.

Section snippets

Statistical models and their descriptions

Almost all statistical models are described using mathematical expressions. Traditional models are defined and estimated using rather simple mathematical expressions. Recent models, however, are described and solved using algorithms in addition to mathematical expressions.

An example of a traditional model is the linear regression model. The regression coefficients are estimated by a simple matrix calculation $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$ for many cases (except those where the inverse calculation is
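The least-squares estimate in the formula above can be sketched in plain Python as follows. This is an illustrative sketch, not Jasp code: the function names and the small data set are assumptions, and instead of forming the inverse explicitly the sketch solves the equivalent normal equations $(X^{\top}X)\beta = X^{\top}y$ by Gaussian elimination.

```python
def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical data lying exactly on the line y = 1 + 2 x (first column is the intercept).
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
beta = ols(X, y)
print(beta)
```

Even this one-line formula expands into several helper routines in a character-based language, which is the gap between the written expression and the program that the paper addresses.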

Language design for using mathematical expressions

It is clear that using mathematical expressions directly in a statistical system has significant advantages. However, the design details of the language and user interface for this purpose are not clear.

Implementation for handling mathematical expressions in Jasp

We modified and extended the existing Jasp system to provide features for handling mathematical expressions. Our implementation consists of three components: a language for describing mathematical expressions, a mathematical editor, and a translator that converts the mathematical expressions into Jasp commands.
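The role of the translator component can be sketched as a recursive walk over a content MathML tree that emits a textual command. The sketch below is a hypothetical illustration in Python: the function `translate`, the operator table, and the infix output syntax are assumptions, and the output is not actual Jasp command syntax.

```python
import xml.etree.ElementTree as ET

# Content MathML operators that this sketch renders as infix symbols (assumed subset).
INFIX = {"plus": "+", "minus": "-", "times": "*", "divide": "/"}


def translate(node):
    """Translate a content MathML element into an infix command string."""
    if node.tag in ("ci", "cn"):  # identifier or number
        return node.text.strip()
    if node.tag == "apply":       # first child is the operator, the rest are operands
        operator, *operands = list(node)
        parts = [translate(operand) for operand in operands]
        if operator.tag == "minus" and len(parts) == 1:
            return "(-" + parts[0] + ")"
        if operator.tag in INFIX:
            return "(" + (" " + INFIX[operator.tag] + " ").join(parts) + ")"
        # Any other operator element, e.g. <sin/>, is written as a function call.
        return operator.tag + "(" + ", ".join(parts) + ")"
    raise ValueError("unsupported element: " + node.tag)


expression = ET.fromstring(
    "<apply><times/><ci>a</ci>"
    "<apply><plus/><ci>b</ci><cn>2</cn></apply>"
    "</apply>"
)
command = translate(expression)
print(command)  # (a * (b + 2))
```

Because the input tree already states which operator applies to which operands, the translation needs no guessing about the meaning of the notation.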

Concluding remarks

In this paper, we describe the modification and extension of the Jasp system to directly use mathematical expressions in a statistical language. With this feature, users gain advantages such as easier description and better readability of statistical models, and the reusability of programs in other related software.

We note that our approach is to extend the existing statistical system. Therefore, added abilities for handling mathematical expressions can be integrated with other statistical

References (24)

  • M.L. Abell et al. Statistics with Mathematica (1998)
  • J.M. Chambers et al. Statistical Models in S (1991)
  • Design Science, 2005. MathType 5.2....
  • T. Fujiwara et al. An implementation of a statistical language based on Java. J. Japanese Soc. Comput. Statist. (2001)
  • R. Gentleman et al. Lexical scope and statistical computing. Graphical Statist. (2000)
  • D.E. Knuth. The TeXbook (1984)
  • I. Kobayashi et al. A procedural and object-oriented statistical scripting language. Comput. Statist. (2002)
  • I. Kobayashi et al. Extensibilities of a Java-based statistical system. J. Japanese Soc. Comput. Statist. (2003)
  • Levine, D.K., 2005. Jex—a Java equation editor for openoffice 2.0,...
  • Maplesoft, 2005. Maple 10....
  • P. McCullagh et al. Generalized Linear Models (1989)
  • Mozilla Foundation, 2005. Mozilla 1.7.12....