Online Induction of Probabilistic Real-Time Automata

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. Both steps can be very time intensive when a PRTA is to be induced for massive or even unbounded datasets. The merge step can be processed efficiently, as scalable online clustering algorithms exist. The creation of the PTA, however, can still be very time consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing them and then applying a clustering based on maximum frequent patterns. The approach is tested against a predefined synthetic automaton and real-world datasets, on which it proves scalable and stable. Moreover, we present a broad evaluation on a real-world disease group dataset that shows the applicability of such a model to the analysis of medical processes.
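
For illustration only, the first of the two steps named in the abstract, building a prefix tree acceptor from timed event sequences, might be sketched as follows. This is a minimal sketch and not the authors' implementation: the names `Node`, `insert`, and `transition_probability` are hypothetical, the input is assumed to be a list of (symbol, delay) pairs, and the merge and clustering step is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One PTA state: visit count, observed delays, outgoing transitions."""
    count: int = 0
    delays: List[float] = field(default_factory=list)
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert(root: Node, sequence: List[Tuple[str, float]]) -> None:
    """Add one timed sequence of (symbol, delay) pairs to the tree in place."""
    node = root
    node.count += 1
    for symbol, delay in sequence:
        # Sequences sharing a prefix share the same path of states.
        node = node.children.setdefault(symbol, Node())
        node.count += 1
        node.delays.append(delay)


def transition_probability(node: Node, symbol: str) -> float:
    """Relative frequency of leaving `node` via `symbol` (0.0 if unseen)."""
    child = node.children.get(symbol)
    return child.count / node.count if child and node.count else 0.0


# Hypothetical toy data: three short timed sequences.
root = Node()
for seq in [[("a", 1.0), ("b", 2.5)], [("a", 1.5), ("c", 0.5)], [("b", 3.0)]]:
    insert(root, seq)
```

Because every new sequence only walks and extends one path, the tree can be updated one instance at a time; the cost the abstract points to comes from the tree growing with the data before any states are merged.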

Author information

Corresponding author

Correspondence to Jana Schmidt.

Additional information

A preliminary version of the paper was published in the Proceedings of ICDM 2012.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 81 kb)

About this article

Cite this article

Schmidt, J., Kramer, S. Online Induction of Probabilistic Real-Time Automata. J. Comput. Sci. Technol. 29, 345–360 (2014). https://doi.org/10.1007/s11390-014-1435-8

  • DOI: https://doi.org/10.1007/s11390-014-1435-8
