Abstract
Annotation tools play a critical role in generating the datasets that fuel machine learning applications. With the advent of Foundation Models, particularly those based on Transformer architectures and large language models, training on comprehensive multimodal datasets has become substantially more practical. This not only enables robust generalization across diverse data categories and knowledge domains but also calls for a novel form of annotation, prompt engineering, for qualitative model fine-tuning. These advances open new avenues for machine intelligence to identify, forecast, and replicate human behavior more precisely, addressing historical limitations that contribute to algorithmic inequity. Nevertheless, the volume and complexity of the data required to train multimodal models pose significant engineering challenges, particularly with regard to bias, and no consensus has yet emerged on how to conduct this annotation work in an ethically responsible, secure, and efficient manner. This historical literature review traces advances in these technologies from 2018 onward, underscores significant contributions, and identifies knowledge gaps and avenues for future research pertinent to the development of Transformer-based multimodal Foundation Models. An initial survey of over 724 articles yielded 156 studies that met the criteria for historical analysis; these were further narrowed to 46 key papers spanning 2018–2022. The paper includes six figures and examines the transformation of the research landscape in machine-assisted behavioral annotation, with particular attention to bias.
Acknowledgements
The authors wish to extend their gratitude to Alexander Kruel and Karoly Zsolnai-Fehér for producing timely news bulletins on the evolving state of the art in machine learning. The authors also wish to thank A. Safronov for editing assistance.
Funding
This research received no external funding.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Watson, E., Viana, T. & Zhang, S. Machine Learning Driven Developments in Behavioral Annotation: A Recent Historical Review. Int J of Soc Robotics 16, 1605–1618 (2024). https://doi.org/10.1007/s12369-024-01117-1