
Predictive approaches for the UNIX command line: curating and exploiting domain knowledge in semantics deficit data

Published in: Multimedia Tools and Applications

Abstract

The command line has long been the most efficient way to interact with UNIX-flavored systems, offering the flexibility and efficiency preferred by professionals. Such a system relies on manually typed commands that instruct the machine to carry out tasks. This human-computer interface is tedious, especially for beginners, and the command line has therefore never enjoyed an overwhelming reception among new users. To improve user-friendliness and take a step towards a more intuitive command line, we propose two predictive approaches, both based on deep learning, that can be integrated into the command line interface and benefit all kinds of users, especially novices. The first approach builds on the sequence-to-sequence (Seq2seq) model with joint learning: it leverages continuous representations of a self-curated, exhaustive knowledge base (KB) of command descriptions to enhance the embeddings used by the model. The second builds on the attention-based transformer architecture and employs a pretrained model. This allows the model to evolve dynamically over time, adapting to different circumstances by learning as the system is used. To validate our idea, we experimented with our models on three major publicly available UNIX command line datasets and achieved benchmark results using GloVe and Word2Vec embeddings. We find that the transformer-based framework performs better on two of the three datasets in a semantics-deficit scenario such as UNIX command line prediction; however, the Seq2seq-based model outperforms the bidirectional encoder representations from transformers (BERT) based model on the larger dataset.
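The prediction task the abstract describes can be made concrete with a toy baseline. The sketch below is NOT the paper's Seq2seq or BERT model; it is a minimal frequency-count predictor, written purely for illustration, that frames the problem the same way: given the command a user just typed, suggest the most likely next command from past sessions. The sample session log is invented.

```python
from collections import Counter, defaultdict

def train(history):
    """Count bigram (previous command -> next command) transitions
    observed in a session log."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    return transitions

def predict(transitions, prev, k=1):
    """Return the k commands seen most often right after `prev`."""
    return [cmd for cmd, _ in transitions[prev].most_common(k)]

# Invented example session: edit-compile loops dominate.
session = ["cd", "ls", "vim", "make", "ls", "vim", "make", "ls"]
model = train(session)
print(predict(model, "ls"))   # "vim" followed "ls" twice -> ['vim']
```

The neural models in the paper replace these raw counts with learned continuous representations, which is what lets them generalize to command contexts never seen verbatim in training.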



Notes

  1. http://saul.cpsc.ucalgary.ca/pmwiki.php/HCIResources/HCIWWWUnixDataSets

  2. https://linux.die.net/man/

  3. http://www.schonlau.net/intrusion.html

  4. https://archive.ics.uci.edu/ml/datasets/UNIX+User+Data

  5. worst case

  6. Target word: the word whose representation we want to learn.

  7. Context word: a word that co-occurs with the target word within some contextual window.

  8. \(c_{j}\) and \(d\) denote the word embedding (vector) of the word \(\tilde c_{j}\) and its dimensionality (a user-specified hyperparameter), respectively.

  9. The semantic relations that exist between two entities within the KB used in this context.

  10. For instance, a context command co-occurring 4 tokens away from a target command contributes a count of \(\frac {1}{4}\) to the co-occurrence total.
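Note 10 describes distance-weighted co-occurrence counting (the weighting GloVe uses when building its co-occurrence matrix): a context token at distance \(d\) from the target contributes \(1/d\) rather than a full count. A minimal sketch, assuming a symmetric window of 4 tokens and an invented command stream:

```python
from collections import defaultdict

def cooccurrence(tokens, window=4):
    """Distance-weighted co-occurrence counts: a pair of tokens
    d positions apart contributes 1/d to each direction's count."""
    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        for j in range(lo, i):        # scan left context only;
            d = i - j                  # adding both key orders makes
            counts[(target, tokens[j])] += 1.0 / d   # the matrix symmetric
            counts[(tokens[j], target)] += 1.0 / d
    return counts

stream = ["ls", "cd", "grep", "ls", "cat"]
c = cooccurrence(stream)
# "ls"/"grep" co-occur at distance 2 and distance 1:
print(c[("ls", "grep")])   # 0.5 + 1.0 = 1.5
```

A pair exactly 4 tokens apart thus contributes \(1/4\), matching the example in note 10.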


Author information

Corresponding author

Correspondence to Thoudam Doren Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Singh, T.D., Khilji, A.F.U.R., Divyansha et al. Predictive approaches for the UNIX command line: curating and exploiting domain knowledge in semantics deficit data. Multimed Tools Appl 80, 9209–9229 (2021). https://doi.org/10.1007/s11042-020-10109-y
