Abstract:
Infants learn through interactions with the environment. Thus, to understand infants' early learning experiences, it is critical to quantify their natural learning input: where infants go, what they touch, and what they see. Wearable sensors can record locomotor and hand movements, but cannot recover the context that prompted the behaviors. Egocentric views from head cameras and eye trackers require annotation to process the videos and miss much of the surrounding context. Third-person video captures infant behavior in the entire scene but may misrepresent the egocentric view. Moreover, third-person video requires machine or human annotation to make sense of the behaviors, and either method alone is sorely lacking. Computer vision is not sufficiently reliable to quantify much of infants' complex, variable behavior, and human annotation cannot reliably quantify 3D coordinates of behavior without laborious hand digitization. Thus, we pioneered a new system of behavior detection from third-person video that capitalizes on the integrated power of computer vision and human annotation to quantify infants' locomotor, manual, and egocentric visual interactions with the environment. Our system estimates a real infant's interaction with a physical environment during free play by projecting a “virtual” infant in a “virtual” 3D environment with known coordinates of all furniture, objects, and surfaces. Our methods for using human-in-the-loop computer vision have broad applications for reliable quantification of locomotor, manual, and visual behaviors outside the purview of standard algorithms or human annotation alone.
Date of Conference: 20-23 May 2024
Date Added to IEEE Xplore: 27 August 2024