Elsevier

Astronomy and Computing

Volume 41, October 2022, 100663

Full length article
Multi-dimensional time-series subsequence clustering for visual feature analysis of blazar observation datasets

https://doi.org/10.1016/j.ascom.2022.100663

Highlights

  • Clustering methods for multi-dimensional time-dependent blazar observation datasets.

  • A novel framework for interactive visual clustering analysis of blazar datasets.

  • Evaluations of our clustering methods and visual analysis framework through two practical case studies by a blazar research expert.

Abstract

Exploring hidden structures in subsequences extracted from long-term time-series data is one of the primary tasks of time-series data analysis. Clustering is one of the most commonly used techniques in that context; however, several open issues must be addressed, such as how to extract subsequences with little overlap and how to define inter-subsequence similarity. In multi-dimensional data analysis especially, correlations among variables must also be taken into account. To support users’ exploration of the universalities of subsequences, we incorporate multi-dimensional time-series subsequence clustering methods and a visual clustering analysis interface into TimeTubesX, an integrated visual analytics environment for multi-dimensional time-dependent observation datasets of blazars. TimeTubesX extracts and filters subsequences of various lengths according to the characteristics of the data and clusters them automatically, taking correlations among observed attributes into account. It then allows users to visually examine the clustering results in terms of cluster features, inter-cluster transitions, and temporal distributions of clusters. Through two practical case studies, we demonstrate how the enhanced TimeTubesX enables users to see not only instances but also universalities (i.e., time-series motifs or cluster prototypes) in time-series observations of blazars.

Introduction

Blazars are among the brightest and most energetic objects in the universe. To demystify the physics of the magnetic field inside the relativistic jet ejected from a blazar’s central black hole, blazar researchers observe the light from the blazar. Blazar researchers at the Hiroshima Astrophysical Science Center (HASC) scrutinize optical photo-polarimetric and near-infrared observation datasets to identify characteristic blazar behaviors, such as light bursts (i.e., flares) and rotating polarization (i.e., rotations); to explore recurring patterns of time variation; and to validate hypotheses. TimeTubesX (Sawada et al., 2022) is an integrated visual analytics environment for multi-dimensional time-dependent blazar datasets. Its visual encoding, together with feature extraction driven by users’ experience and expertise, enables efficient and detailed analysis of instances in long-term observation datasets. Nevertheless, it cannot sufficiently help users identify hidden universalities (“the fact of being true or right at all times and in all places”, Oxford University Press (2022)) in blazar datasets.

Exploring features underlying subsequences of long-term time series is a major challenge in time-series data analysis. Clustering is a useful approach for this task because it automates part of the data mining and feature exploration without user intervention. The sliding window method is the most commonly used way to extract subsequences from a long time series. However, according to Keogh and Lin (2005), time-series subsequence clustering via the sliding window method is meaningless, because the heavily overlapping subsequences it produces prevent a reasonable classification of subsequences. This suggests that the meaningfulness of clustering results depends strongly on how subsequences are extracted, as well as on the clustering method. Moreover, when clustering multi-dimensional time-dependent data, defining a cluster prototype (i.e., a time-series motif; Lin et al., 2002) and measuring the similarity between subsequences and cluster prototypes are also challenging, because correlations among variables must be considered.
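The overlap problem can be made concrete: with a step of one point, two consecutive windows of width w share w − 1 of their w points, so neighboring subsequences are nearly identical. The following is a minimal Python sketch of sliding-window extraction and of the fraction of points shared by consecutive windows; the function names and the toy series are hypothetical illustrations, not TimeTubesX code.

```python
def sliding_windows(series, width, step=1):
    """Return all subsequences of length `width`, one starting every `step` points."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def overlap_fraction(width, step):
    """Fraction of points shared by two consecutive windows (0 if step >= width)."""
    return max(width - step, 0) / width


series = [0.1, 0.4, 0.9, 0.3, -0.2, -0.6, 0.0, 0.5]
windows = sliding_windows(series, width=4)
print(len(windows))            # 5
print(windows[0])              # [0.1, 0.4, 0.9, 0.3]
print(windows[1])              # [0.4, 0.9, 0.3, -0.2]
print(overlap_fraction(4, 1))  # 0.75
print(overlap_fraction(4, 4))  # 0.0
```

Setting the step equal to the window width removes the overlap, but the window boundaries then fall at arbitrary positions relative to features such as flares.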

To enable blazar researchers to examine universalities in blazar datasets, we integrated multi-dimensional time-series subsequence clustering methods and a novel visual analysis framework for the clustering results into TimeTubesX. The clustering methods extract subsequences of various lengths from a long-term observation dataset, taking missing data and observation frequencies into account, and then filter out overlapping subsequences based on subsequence features. The clustering methods consider correlations among variables and compute means of subsequences without smoothing out their features. The visual analysis framework of TimeTubesX helps users intuitively interpret and evaluate the clustering results. It also allows users to fine-tune the clustering process to obtain a more reasonable clustering result. Consequently, the latest version of TimeTubesX supports not only instance exploration but also universality exploration of blazar datasets.
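The similarity measure itself is not specified in this overview. One common way to respect correlations among variables, shown here as an assumption rather than as the method TimeTubesX uses, is dependent multivariate dynamic time warping (DTW_D): a single warping path is shared by all variables, so values observed at the same epoch (e.g., polarization components and flux) stay aligned with each other. The function name `dtw_dependent` is hypothetical.

```python
import math


def dtw_dependent(a, b):
    """Dependent multivariate DTW distance between subsequences a and b.

    a and b are sequences of points with the same number of variables,
    e.g. [(q, u, flux), ...]. All variables share one warping path, and the
    local cost is the squared Euclidean distance across variables.
    """
    if not a or not b:
        raise ValueError("subsequences must be non-empty")
    dims = len(a[0])
    if any(len(p) != dims for p in list(a) + list(b)):
        raise ValueError("all points must have the same number of variables")

    n, m = len(a), len(b)
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            acc[i][j] = local + min(acc[i - 1][j],      # a advances, b's point repeats
                                    acc[i][j - 1],      # b advances, a's point repeats
                                    acc[i - 1][j - 1])  # both advance
    return math.sqrt(acc[n][m])


# A time-stretched copy of a two-variable subsequence has distance 0.
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
b = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
print(dtw_dependent(a, b))                # 0.0
print(dtw_dependent([(0, 0)], [(3, 4)]))  # 5.0
```

By contrast, independent DTW (DTW_I) warps each variable separately and sums the per-variable distances, which can align a change in one variable with an unrelated change in another; the shared path of DTW_D rules that out.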

The contributions of this paper are three-fold:

  • Clustering methods for multi-dimensional time-dependent blazar observation datasets;

  • A novel framework for interactive visual clustering analysis of blazar datasets;

  • Evaluations of our clustering methods and visual analysis framework through two practical case studies by our domain expert.

Section snippets

Related work

This section reviews prior work on clustering time-series data and on visual analysis environments for clustering results.

Design of TimeTubesX

This section describes the blazar observation datasets, TimeTubesX’s visual encoding for blazar datasets, and the main visualization panel of TimeTubesX.

Domain analysis

This section gives an overview of the domain goals for blazar data analysis, the requirements for clustering methods, and blazar researchers’ tasks for analyzing clustering results.

Clustering methods

This section explains the clustering methods for multi-dimensional time-dependent blazar datasets. The clustering process of TimeTubesX is divided into five steps: tweaking parameters, extracting subsequences, filtering subsequences, preprocessing, and clustering. In the first step, users choose the variables to be considered, the subsequence length range, filtering options for extracted subsequences, whether to normalize each subsequence, the clustering method, number of
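The criteria used in the filtering step are not given here. As a hypothetical illustration (the scoring rule and the `max_overlap` threshold are assumptions, not the paper’s criteria), a greedy filter can rank variable-length candidate subsequences by a feature score and discard any candidate that overlaps an already accepted one by more than a given fraction of the shorter interval.

```python
def overlap_ratio(a, b):
    """Overlap of two (start, length) intervals, relative to the shorter one."""
    (s1, len1), (s2, len2) = a, b
    shared = min(s1 + len1, s2 + len2) - max(s1, s2)
    return max(shared, 0) / min(len1, len2)


def filter_overlapping(candidates, max_overlap=0.5):
    """Greedily keep high-scoring candidates that barely overlap.

    candidates: (start, length, score) tuples, where `score` stands for a
    user-chosen subsequence feature (hypothetical here). Candidates are
    visited from highest to lowest score; each is kept only if its overlap
    with every already kept candidate is at most `max_overlap`.
    """
    kept = []
    for start, length, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if all(overlap_ratio((start, length), (s, ln)) <= max_overlap
               for s, ln, _ in kept):
            kept.append((start, length, score))
    return sorted(kept)  # back in temporal order


candidates = [(0, 10, 0.9), (2, 10, 0.8), (9, 5, 0.7), (20, 8, 0.4)]
print(filter_overlapping(candidates))
# [(0, 10, 0.9), (9, 5, 0.7), (20, 8, 0.4)]
```

Visiting candidates in score order means that, of two heavily overlapping subsequences, the one with the stronger feature survives, which addresses the near-duplicate problem of plain sliding windows.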

Visual clustering analysis interface

This section presents the visual analysis interface for the clustering result. TimeTubesX provides seven kinds of views for users to examine and evaluate a clustering result visually. Fig. 4 shows a collection of the federated views for visual clustering analysis. The cluster feature view in (a) and the clustering evaluation view in (b) give an overview of a clustering result (Section 6.1). Meanwhile, the timelines view in (c) and cluster stripes mapped on the period selector in Fig. 2(3) show the
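The indices shown in the clustering evaluation view are not listed here. The reference list includes Calinski and Harabasz (1974) and Davies and Bouldin (1979), so cluster-validity indices of that kind are a plausible ingredient; under that assumption, the following minimal sketch computes the Davies-Bouldin index (lower values mean compact, well-separated clusters) on plain feature vectors with Euclidean distance. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def centroid(points):
    """Component-wise mean of equal-length feature vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of clusters.

    Each cluster is a non-empty list of equal-length feature vectors.
    S_i is the mean distance of cluster i's members to its centroid and
    M_ij the distance between centroids i and j; the index averages, over
    clusters, the worst ratio (S_i + S_j) / M_ij. Lower is better.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


near = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
far = [[(0, 0), (0, 2)], [(20, 0), (20, 2)]]
print(davies_bouldin(near))  # 0.2
print(davies_bouldin(far))   # 0.1
```

The sketch raises `ZeroDivisionError` if two centroids coincide; a production version would need to handle that case explicitly.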

Case studies

To demonstrate the effectiveness of the present clustering methods and the visual clustering analysis functions of TimeTubesX, this section briefly introduces two case studies conducted by our domain expert (the second author).

Discussions

Fig. 8 illustrates the human-in-the-loop process of the latest TimeTubesX, which is based on the visualization discovery process by Johnson et al. (2006). Compared with the original visualization discovery process, TimeTubesX’s visual analytics framework now has an independent analyzer to refine blazar researchers’ data analysis. One or more input datasets are passed to the analyzer. If needed, datasets for the same blazar from different sources are merged into a single

Conclusion and future work

Our clustering methods and visual clustering analysis framework help blazar researchers find universalities in multi-dimensional time-dependent blazar datasets. The feature analysis with the proposed methods suggested that there may be regularities in seemingly random polarization variations that cannot be identified through instance analysis of time series alone. Therefore, the latest TimeTubesX allows users to see the entire forest as well as the individual trees. Readers can try the running

CRediT authorship contribution statement

N. Sawada: Conceptualization, Methodology, Software, Writing – original draft, Visualization. M. Uemura: Formal analysis, Data curation. I. Fujishiro: Supervision, Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The present work has been financially supported in part by a MEXT KAKENHI Grant-in-Aid for Scientific Research (A), Japan, No. JP21H04916.

References (49)

  • Bjornsson C.I.

    Polarization properties of a source in relativistic motion

    Astrophys. J.

    (1982)
  • Calinski T. et al.

    A dendrite method for cluster analysis

    Commun. Stat.

    (1974)
  • Cao N. et al.

    DICON: Interactive visual analysis of multidimensional clusters

    IEEE Trans. Vis. Comput. Graphics

    (2011)
  • Cavallo M. et al.

    Clustrophile 2: Guided visual clustering analysis

    IEEE Trans. Vis. Comput. Graphics

    (2019)
  • Chen J.R.

    Making clustering in delay-vector space meaningful

    Knowl. Inf. Syst.

    (2007)
  • Chen Y. et al.

    Sequence synopsis: Optimize visual summary of temporal event data

    IEEE Trans. Vis. Comput. Graphics

    (2018)
  • Dafas P.A. et al.

    Applied Temporal Rule Mining to Time Series, Technical Report

    (2005)
  • Davies D.L. et al.

    A cluster separation measure

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1979)
  • Ferstl F. et al.

    Time-hierarchical clustering and visualization of weather forecast ensembles

    IEEE Trans. Vis. Comput. Graphics

    (2017)
  • Fujishiro I. et al.

    TimeTubes: Visual exploration of observed blazar datasets

    J. Phys.: Conf. Ser.

    (2018)
  • Ismail Fawaz H. et al.

    Deep learning for time series classification: A review

    Data Min. Knowl. Discov.

    (2019)
  • Jang S. et al.

    MotionFlow: Visual abstraction and aggregation of sequential patterns in human motion tracking data

    IEEE Trans. Vis. Comput. Graphics

    (2016)
  • Johnson C. et al.

    NIH-NSF visualization research challenges report

    (2006)
  • Kammer D. et al.

    Glyphboard: Visual exploration of high-dimensional data combining glyphs with dimensionality reduction

    IEEE Trans. Vis. Comput. Graphics

    (2020)