Abstract
Deep neural networks are naturally “black boxes,” offering little insight into how or why they make decisions. These limitations diminish the likelihood that such systems will be adopted for important tasks or trusted as teammates. We design and employ an introspective method to abstract neural activation patterns into human-interpretable strategies and to identify relationships between environmental conditions (why), strategies (how), and performance (result) in a deep reinforcement learning two-dimensional pursuit game. For example, we found that activation patterns abstracted into “head-on” or “L-shaped” maneuver strategies were successful and intuitively corresponded to favorable initial conditions. Moreover, we characterize machine commitment by introducing a novel measure based on analysis of time-series neural activation patterns over the course of a game, and we reveal significant correlations between machine commitment and performance. By uncovering temporally dependent machine “thought processes” and commitment through introspection, we contribute to the larger explainable artificial intelligence initiative, increasing transparency and trust in machine learning systems.
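The commitment measure itself is defined in the full paper; as a rough illustration of the underlying idea, one could score how consistently a game's per-timestep activations remain within a single abstracted strategy cluster. The sketch below is a hypothetical entropy-based proxy (the function name and exact definition are illustrative, not the paper's measure):

```python
import math
from collections import Counter

def commitment(labels):
    """Hypothetical commitment proxy: 1 minus the normalized Shannon
    entropy of the per-timestep strategy labels observed in one game.

    A game that stays in a single strategy cluster scores 1.0; a game
    that splits its time uniformly across clusters scores near 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    if k <= 1:
        return 1.0  # the agent never switched strategies
    # Shannon entropy of the empirical label distribution (natural log)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Normalize by the maximum entropy over k observed clusters
    return 1.0 - entropy / math.log(k)
```

For instance, a game labeled `['A'] * 10` scores 1.0 (fully committed), while a game alternating evenly between two strategies scores 0.0.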
Data availability
The authors confirm that the data supporting the findings of this research are available within the article and its supplementary materials.
Acknowledgements
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001119S0030. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA. The authors extend gratitude to BAE Systems FAST Labs™ for supporting this publication.
Funding
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001119S0030.
Author information
Authors and Affiliations
Contributions
JA conceptualized and designed the methods. JA and SS implemented the methods and created the visualizations. JA analyzed the results and wrote the main manuscript text. JA performed project administration and secured funding. SG served as advisor. SG and SS contributed to the text through review and revision. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Ethical Approval
All principles of ethical and professional conduct have been followed. No human and/or animal research was conducted as part of this submission.
Consent for Publication
BAE Systems has approved this research for public release, unlimited distribution. Not export controlled per ES-FL-051121-0060.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
42979_2023_1747_MOESM1_ESM.zip
Supplementary file1 Data and the trained DDPG models (Actor_ddpg, Critic_ddpg, TargetActor_ddpg, and TargetCritic_ddpg) are available in a separate .zip file (“Supplementary Data.zip”). The data needed to plot the paths for each game are contained in experimentpositions.csv, where each row corresponds to the time step, pursuer position and velocity, target position and velocity, and target maximum speed. The data associated with the game outcome and conditions are available in experimentgamestats.csv, where each row corresponds to the initial distance, initial angle to the target, maximum target speed, and the game outcome (−1 for a loss by exceeding the maximum distance, 0 for a loss by reaching the maximum time, and 1 for a successful capture, i.e., a win). The cluster assignments for each game are provided in k_means_clusters.csv. Additionally, pickled files of the actions (all_actions.p); conditions: initial angles to the target (all_angles.p), initial distances to the target (all_dists.p), and target maximum speeds (all_speeds.p); behaviors (all_embeddings.p); and game outcomes (all_results.p) are uploaded for convenience. (ZIP 903383 KB)
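Assuming the column order described above and no header row (both assumptions to verify against the actual files), the game-statistics file could be parsed with a short sketch like the following:

```python
import csv

def load_game_stats(path):
    """Parse experimentgamestats.csv, where each row is assumed to be
    (initial_distance, initial_angle, target_max_speed, outcome), with
    outcome encoded as -1 (max-distance loss), 0 (max-time loss), or
    1 (successful capture). Adjust if the file carries a header row."""
    games = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            dist, angle, speed, outcome = row[:4]
            games.append({
                "initial_distance": float(dist),
                "initial_angle": float(angle),
                "target_max_speed": float(speed),
                "outcome": int(float(outcome)),
            })
    return games
```

The dictionaries returned make downstream filtering straightforward, e.g. selecting only games whose `outcome` equals 1 when studying successful-capture strategies.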
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Allen, J.F., Schmidt, S. & Gabriel, S.A. Uncovering Strategies and Commitment Through Machine Learning System Introspection. SN COMPUT. SCI. 4, 322 (2023). https://doi.org/10.1007/s42979-023-01747-8