Full length article
Multi-dimensional time-series subsequence clustering for visual feature analysis of blazar observation datasets
Introduction
Blazars are among the brightest and most energetic objects in the universe. To understand the physics of the magnetic field inside the relativistic jet ejected from a blazar's central black hole, researchers observe the light that the blazar emits. Blazar researchers at the Hiroshima Astrophysical Science Center (HASC) scrutinize optical photo-polarimetric and near-infrared observation datasets to identify characteristic blazar behaviors, such as light bursts (i.e., flares) and rotated polarization (i.e., rotation); to explore recurring time variation patterns; and to validate hypotheses. TimeTubesX (Sawada et al., 2022) is an integrated visual analytics environment for multi-dimensional time-dependent blazar datasets. Its visual encoding, together with feature extraction driven by users’ experience and expertise, enables efficient and detailed analysis of instances in long-term observation datasets. Nevertheless, it cannot sufficiently help users identify hidden universalities (“the fact of being true or right at all times and in all places”, Oxford University Press (2022)) in blazar datasets.
Exploring features underlying subsequences of long-term time series is a major challenge in time-series data analysis. Clustering is a useful approach to this task because it partially realizes data mining and feature exploration without user intervention. The sliding window method is the most common way to extract subsequences from a long time series. However, according to Keogh and Lin (2005), time-series subsequence clustering via the sliding window method is meaningless, because the overlapping subsequences it produces prevent a reasonable classification of subsequences. This suggests that the meaningfulness of clustering results depends strongly on how subsequences are extracted, as well as on the clustering method. Moreover, when clustering multi-dimensional time-dependent data, defining a cluster prototype (i.e., a time-series motif; Lin et al., 2002) and measuring the similarity between subsequences and cluster prototypes are also challenging, because correlations among variables must be considered.
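As a rough illustration of the overlap issue, the following Python sketch extracts fixed-width sliding windows from a toy series and counts how many samples two adjacent windows share. It is not part of TimeTubesX; the function name `sliding_windows`, its parameters, and the toy data are assumptions introduced here for illustration only.

```python
import numpy as np

def sliding_windows(series, width, step=1):
    """Return every length-`width` subsequence of a 1-D series, `step` samples apart.

    Assumes len(series) >= width (hypothetical helper, for illustration only).
    """
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

# Toy data with distinct values (not a blazar light curve).
series = np.arange(10.0)
windows = sliding_windows(series, width=4)

# With step=1, adjacent windows share width - 1 samples,
# so neighboring subsequences are near-duplicates of each other.
shared = np.intersect1d(windows[0], windows[1]).size
print(windows.shape, shared)
```

With a step of one sample, each window differs from its neighbor by a single point, which is the kind of redundancy that motivates filtering overlapping subsequences before clustering.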
To enable blazar researchers to examine universalities in blazar datasets, we integrated multi-dimensional time-series subsequence clustering methods and a novel visual analysis framework for the clustering results into TimeTubesX. The clustering methods extract subsequences of various lengths from a long-term observation dataset, taking missing data and observation frequencies into account, and then filter overlapping subsequences according to subsequence features. They also consider correlations among variables and compute means of subsequences without smoothing out their features. The visual analysis framework of TimeTubesX helps users intuitively interpret and evaluate the clustering results. It also allows users to fine-tune the clustering process to obtain a more reasonable clustering result. Consequently, the latest version of TimeTubesX supports not only instance exploration but also universality exploration of blazar datasets.
The contributions of this paper are three-fold:
- Clustering methods for multi-dimensional time-dependent blazar observation datasets;
- A novel framework for interactive visual clustering analysis of blazar datasets;
- Evaluations of our clustering methods and visual analysis framework through two practical case studies by our domain expert.
Section snippets
Related work
This section reviews prior research attempts at clustering time-series data and visual analysis environments for clustering results.
Design of TimeTubesX
This section describes the blazar observation datasets, TimeTubesX’s visual encoding for blazar datasets, and the main visualization panel of TimeTubesX.
Domain analysis
This section overviews domain goals for blazar data analysis, requirements for clustering methods, and blazar researchers’ tasks for the clustering result analysis.
Clustering methods
This section explains the clustering methods for multi-dimensional time-dependent blazar datasets. The clustering process of TimeTubesX is divided into five steps: tweaking parameters; extracting subsequences; filtering subsequences; preprocessing; and clustering. In the first step, users choose which variables to consider, the subsequence length range, filtering options for extracted subsequences, whether to normalize each subsequence, clustering method, number of
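The extraction and preprocessing steps can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the TimeTubesX implementation: the helper names `extract_subsequences` and `z_normalize`, the `max_missing_ratio` threshold, and the encoding of missing observations as NaN are all hypothetical choices made for illustration.

```python
import numpy as np

def extract_subsequences(data, min_len, max_len, max_missing_ratio=0.2):
    """Return (start, length) pairs for windows of every length in
    [min_len, max_len] whose share of rows with a missing value (NaN)
    does not exceed max_missing_ratio (an assumed criterion)."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data).any(axis=1)  # True where any variable is missing
    kept = []
    for length in range(min_len, max_len + 1):
        for start in range(0, len(data) - length + 1):
            if missing[start:start + length].mean() <= max_missing_ratio:
                kept.append((start, length))
    return kept

def z_normalize(subseq):
    """Scale each variable (column) of one 2-D subsequence to zero mean
    and unit variance, ignoring NaN; constant columns are only centered."""
    mean = np.nanmean(subseq, axis=0)
    std = np.nanstd(subseq, axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return (subseq - mean) / std

# Toy two-variable series with one missing observation (row 2).
data = np.array([[1.0, 10.0], [2.0, 20.0], [np.nan, 30.0],
                 [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]])
kept = extract_subsequences(data, min_len=3, max_len=4, max_missing_ratio=0.0)
start, length = kept[0]
normalized = z_normalize(data[start:start + length])
```

Here only windows with no missing rows are retained; a real pipeline would also need the filtering and clustering steps, which this sketch does not attempt to reproduce.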
Visual clustering analysis interface
This section presents the visual analysis interface for the clustering result. TimeTubesX provides seven kinds of views for users to examine and evaluate a clustering result visually. Fig. 4 shows a collection of the federated views for visual clustering analysis. The cluster feature view in (a) and the clustering evaluation view in (b) give an overview of a clustering result (Section 6.1). Meanwhile, the timelines view in (c) and cluster stripes mapped on the period selector in Fig. 2(3) show the
Case studies
To demonstrate the effectiveness of the present clustering methods and visual clustering analysis functions of TimeTubesX, this section briefly introduces two case studies conducted by our domain expert (the second author).
Discussions
Fig. 8 illustrates the human-in-the-loop process of the latest TimeTubesX, which is based on the visualization discovery process by Johnson et al. (2006). Compared with the original visualization discovery process, TimeTubesX’s visual analytics framework now has an independent analyzer that makes blazar researchers’ data analysis more sophisticated. One or more input datasets are transmitted to the analyzer. If needed, datasets for the same blazar from different sources are merged into a single
Conclusion and future work
Our clustering methods and visual clustering analysis framework help blazar researchers find universalities in multi-dimensional time-dependent blazar datasets. The feature analysis with the proposed methods suggested that polarizations that appear to vary at random may in fact follow regularities, which cannot be identified through instance analysis of time series alone. Therefore, the latest TimeTubesX allows users to see the entire forest, as well as individual trees. Readers can try the running
CRediT authorship contribution statement
N. Sawada: Conceptualization, Methodology, Software, Writing – original draft, Visualization. M. Uemura: Formal analysis, Data curation. I. Fujishiro: Supervision, Conceptualization, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The present work has been financially supported in part by a MEXT KAKENHI Grant-in-Aid for Scientific Research (A), Japan, No. JP21H04916.
References (49)
- et al. Time-series clustering - A decade review. Inf. Syst. (2015)
- et al. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit. (2011)
- et al. Selective subsequence time series clustering. Knowl.-Based Syst. (2012)
- Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. (1987)
- et al. Unfolding preprocessing for meaningful time series clustering. Neural Netw. (2006)
- et al. TimeCluster: Dimension reduction applied to temporal data for visual analytics. Vis. Comput. (2019)
- et al. Deep time-series clustering: A review. Electron. (2021)
- et al. Revealing patterns and trends of mass mobility through spatial and temporal abstraction of origin-destination movement data. IEEE Trans. Vis. Comput. Graphics (2017)
- et al. Cupid: Cluster-based exploration of geometry generators with parallel coordinates and radial trees. IEEE Trans. Vis. Comput. Graphics (2014)
- et al. Parallel sets: Visual analysis of categorical data
- Polarization properties of a source in relativistic motion. Astrophys. J.
- A dendrite method for cluster analysis. Commun. Stat.
- DICON: Interactive visual analysis of multidimensional clusters. IEEE Trans. Vis. Comput. Graphics
- Clustrophile 2: Guided visual clustering analysis. IEEE Trans. Vis. Comput. Graphics
- Making clustering in delay-vector space meaningful. Knowl. Inf. Syst.
- Sequence synopsis: Optimize visual summary of temporal event data. IEEE Trans. Vis. Comput. Graphics
- Applied Temporal Rule Mining to Time Series. Technical Report
- A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell.
- Time-hierarchical clustering and visualization of weather forecast ensembles. IEEE Trans. Vis. Comput. Graphics
- TimeTubes: Visual exploration of observed blazar datasets. J. Phys.: Conf. Ser.
- Deep learning for time series classification: A review. Data Min. Knowl. Discov.
- MotionFlow: Visual abstraction and aggregation of sequential patterns in human motion tracking data. IEEE Trans. Vis. Comput. Graphics
- NIH-NSF visualization research challenges report
- Glyphboard: Visual exploration of high-dimensional data combining glyphs with dimensionality reduction. IEEE Trans. Vis. Comput. Graphics