1 Introduction
Environmental design is a discipline that involves externalizing and iterating changes to the physical environment to meet design goals [50]. An architect designing an open-space living area might want to visualize how taking down a wall would affect the existing room arrangement, and a landscape designer might want to show how proposed plantings will look given the site plan of a garden. Consumers also face these issues in everyday life. A homeowner might want to see the effect of changing a regular door to a French door or introducing convertible furniture.
In this work, we focus on supporting users, regardless of their design expertise, in environmental design prototyping. Unlike application prototyping, where users modify virtual assets to explore digital design options, we use the term “prototyping” to describe the task of exploring changes to a physical environment. Environmental design prototyping is critical to ensure various spaces are functional and efficient, but design prototyping for a physical space can be costly and difficult. Users need to externalize and iterate design ideas quickly, but the size and rigidity of a physical environment pose many challenges. Physically realizing a design can be costly if it involves heavy or irreversible manipulation: a single person might not be able to move a large piece of furniture, and a wall that has been removed cannot be easily rebuilt. Modifying a digital replica of the physical environment would be easier, but current practice is tedious and has a steep learning curve. It involves a large collection of hardware and software, such as using a dedicated 3D scanner, processing the scan data to create a textured mesh model, editing the model in desktop 3D software, and exporting the design for mobile viewing on site.
Inspired by Ben Shneiderman’s design principles for creativity support tools [45], we seek to develop a novel workflow for environmental design prototyping that 1) is easy for novices to use, 2) reduces frustrating file conversions, and 3) enables users to rapidly generate multiple alternatives.
The first and fundamental requirement for such an environmental design prototyping workflow is that users should be able to capture the physical environment easily. This is made possible by consumer augmented reality (AR) and depth sensing technologies, which are increasingly accessible but still under-explored in current environmental design tools. A LiDAR-equipped iPad can capture an entire home in 3D. Although the quality of the resulting scan may not be as good as that created by a professional laser scanner, a rough 3D replica can still provide sufficient spatial context for environmental design prototyping. AR viewing is also supported out of the box. Users can quickly view the scan by walking around and get an approximate sense of the design as if the changes had been made in the real space [49].
Current practices often include many file conversions, causing great frustration for designers. The raw data from a 3D scanner is usually a point cloud. For visualization, the traditional graphics pipeline transfers the raw data to a desktop machine for further processing, such as extracting a polygon mesh from the point cloud and computing texture mapping. This is notoriously complicated because it requires great technical expertise, involves multiple algorithms and software packages, and takes significant processing time. As a reference, one of the most optimized 3D scanner apps, Polycam [40], takes about three minutes to produce a textured mesh for a bedroom, while a custom program can produce a colored point cloud for the same space in less than ten seconds. The additional processing time required for mesh generation is simply too long for an iterative turnaround cycle, especially if the environment is large or the design requires multiple scans of different areas and objects.
Traditional mesh editors also prevent users from quickly creating multiple design alternatives. Even though carefully reconstructed and texture-mapped meshes can be geometrically and photometrically accurate, and they are widely used in the film and video game industries, editing such meshes takes tremendous effort and is not amenable to rapid environmental design prototyping. Simple Boolean geometric operations, such as adding to a scene by performing a union with a new object, or removing part of a scene by intersecting with another object, remain a challenge that is still actively investigated by an extensive body of graphics research [35, 54]. Even the best approaches can fail if the underlying mesh has problems such as not being watertight. Performing direct vertex manipulation on a mesh using current desktop 3D software is straightforward, but the results often do not reflect users’ creative intentions effectively, as illustrated by the famous short film Cubic Tragedy [10]. Mesh editing has a steep learning curve and can cause frustration, as it is prone to errors in manipulating vertex connectivity and in texture mapping.
Based on our analysis of the three design principles above, this paper mainly focuses on two research questions. First, we explore whether point clouds are suitable for rapid environmental design prototyping, with the goal of providing a more user-centric experience than mesh editing. Raster tools like Adobe Photoshop [3] are more widely used for editing captured images [14], whereas vector tools like Adobe Illustrator [1] are more suitable for rigid and precise editing. Inspired by this, we choose the point cloud as the “raster” representation for editing 3D scans because of its freeform structure and its amenability to fast iterative design prototyping. Second, we investigate whether integrating point cloud capture and editing in a single in-situ AR workflow could help users explore design ideas quickly without having to physically modify an environment. To answer these questions, we develop PointShopAR, a tablet-based system with integrated point cloud capture and editing to support environmental design prototyping.
PointShopAR leverages the LiDAR scanner available on consumer tablets like iPads to let users quickly capture their spatial context as colored point clouds. Using the same device in editor mode, users can perform a variety of point cloud editing operations informed by our formative study, including selection, transformation, hole filling, drawing, and animation. PointShopAR provides an integrated capture and editing experience in situ, helping users quickly explore spatial design ideas. Our user study shows that point clouds can represent the design context effectively and serve as a versatile medium for a variety of design tasks in AR, including the design of objects, interior environments, buildings, and landscapes.
In summary, this paper makes the following contributions:
• A system that integrates the capture and editing workflows, in which the faster feedback loop enabled by eliminating 3D model construction allows designers to experience and iterate the design rapidly.
• A demonstration that the point cloud is an effective prototyping primitive, making it easy and intuitive to capture and edit a 3D spatial context.
• A variety of application scenarios for our system showing its versatility and effectiveness, including object design, interior design, architectural design, and landscape design.
3 Design Considerations
To inform our system design, we conducted an interview-based formative study to understand how designers use their spatial context for environmental design. We invited six designers (P1–P6) to participate. All held advanced degrees in architecture, and four had worked for top architecture firms. We carried out open-ended interviews with two goals: understanding how designers capture and edit spatial context when designing for a physical environment, and identifying the challenges and limitations they face with their current tool sets. We asked participants to describe recent design projects, how they used spatial context in their designs, and how they edited 3D scans to explore environmental design ideas.
The interviews helped us understand that spatial context plays a very important role in design projects of different scales. For example, building a public school requires an urban plan, building a residential house requires elevation and vegetation maps, and interior design requires a floor plan and furniture dimensions. However, all designers said that it is very difficult to capture spatial context due to the lack of accessible tools. Context representation is also limited: site maps are highly abstract, photos and videos do not support changing to a new viewpoint, and 3D scanning is expensive and time consuming. P1 considered site visits important and preferred designing in situ because the immersive spatial context is no longer available back in the studio.
Editing the spatial context is also challenging. P2, P4, and P5 rely on heavyweight desktop software like AutoCAD [9] and Rhino [8] to manipulate 3D data. P1 said that a common challenge with these tools is the mindset switch in design: using them to manipulate 3D scenes on a desktop computer means learning and finding the right feature for each 3D operation. P3 wanted to modify an environment by selecting and manipulating walls and ceilings, but the lack of a streamlined design prototyping workflow made this very difficult. P4–P6 said that they still prefer to sketch on paper to explore and prototype environmental design ideas. All participants reported familiarity with AR technology and its potential for architectural design. They mentioned that a top use case of AR is letting designers experience their design in place to get a more realistic sense of how the physical environment would be altered.
The formative study showed designers’ desire for a faster capture-editing feedback loop, more freeform 3D editing, and the ability to experience their designs in AR, all of which would let them evaluate and iterate their design choices more rapidly and facilitate environmental design prototyping. Based on these interviews, we distill our design considerations for developing the PointShopAR system as follows:
• Users must be able to capture their spatial context easily and quickly. Our capture representation must be amenable to rapid prototyping.
• Users must be able to edit the 3D capture using intuitive and direct interactions, such as selection, 3D manipulation, hole filling, and drawing.
• Users must be able to experience and iterate their design in situ and animate objects to illustrate new functionalities in the design.
8 Limitations and Future Work
Our system validates the use of point clouds for rapid prototyping of environmental designs in AR. However, it has limitations, and many areas remain for future work.
The iPad LiDAR sensor and the ARKit camera tracking are designed for operation in room-sized environments, so we optimized our scanning and point cloud creation mechanisms for this case. Unfortunately, this means that our system is not very useful for small-scale tasks like arranging items on a table or desk, or for large-scale tasks like designing a building complex. Improved algorithms, such as combining multiple scans or using machine learning approaches to infer details, could likely overcome some of these hardware limitations. Editing higher-fidelity representations like NeRF [51] would also be an interesting future direction.
PointShopAR lets users experience their designs in the current environment, but the ability to experience them elsewhere is limited. Approaches explored in DistanciAR [49] could address this. Our system also has all the building blocks needed to support collaborative design; we are interested in seeing how it can be used across multiple tablets and head-mounted displays.
The current selection method works well in many cases, but it can be difficult to tune the selection, especially when the environment prevents the user from observing the selection from certain angles. It would help to adopt techniques that let the user change the viewpoint without moving [49]. We would also like to support non-bounding-box selection methods for irregular objects and objects in crowded environments, such as letting the user draw a curve as in SpaceCast and TraceCast [52].
Object manipulation would be easier if the user could incorporate physical constraints between objects and the device. It is easy to rotate a door around a vertical axis, but modification would be even easier if the user could place the axis at the door’s edge. Similarly, placement and movement would be easier if objects could snap to detected planes in the environment. The user interactions would also be more intuitive if the tablet’s pose could be linked to different manipulation modes [30], so the user can avoid switching modes using extra buttons.
Our inpainting algorithm fills patches using the average color of the surrounding points, leading to patches that do not integrate well into the scene if the lighting is uneven or the background has multiple objects. A flood-fill algorithm could provide better color matching. We could also improve the spacing and alignment of points to make patches integrate better with the surroundings.
PointShopAR currently captures a static scene as a point cloud, but the corresponding modules can be readily extended to support recording dynamic scenes to fully explore the potential of the point cloud representation. This would allow freeform animation of spatial elements in combination with other features like morphing. We believe the future incorporation of dynamic point cloud support can enable more creative prototyping for environmental design, such as exploring the relationship between spatial design and moving people. A dynamic point cloud recording can also provide useful information about lighting for the AR renderer to provide a more immersive experience.
9 Conclusion
In this paper we introduced PointShopAR, a novel system for prototyping environmental design in AR that integrates point cloud capture and editing on a tablet device. Our user studies helped identify three major contributions.
First, we showed that point clouds are quick and easy to capture, enabling a system that lets users start editing after just a short delay. The turnaround time is sufficiently fast that users can effectively switch back and forth between capturing and editing if their task requires it.
Second, we showed that point clouds support a wide variety of editing operations. The representation is intuitive enough that first-time users were able to make meaningful changes to their environment in about 20 minutes.
Finally, we showed that a tablet-based AR interface is an effective way to edit point clouds and iterate the design. Users could carry out their design on site, move around to view the scene from different angles, and use a combination of touch gestures and device movement to manipulate their physical environment.
We look forward to seeing how future research will be able to expand on our work and take AR point cloud capture and editing in new directions.
A Appendix: Point Cloud Operations
This section provides details of various point cloud operations implemented in PointShopAR.
A.1 Generation
After the user finishes the scan, all the recorded data is sent to a web service to generate a point cloud. We first recover the depth maps in meters (range: 15 m) and keep depth values with high confidence. They are used to build a truncated signed distance function (TSDF) volume [11, 36], integrating the RGBD images every 30 frames with the corresponding extrinsic and intrinsic camera parameters. The TSDF uses a voxel size of 5/512 meter and 0.05 as the truncation value. We then extract a colored point cloud from the TSDF volume and downsample it with a voxel size of 5/256 meter. The server sends the point cloud to the front-end iPad for rendering and editing.
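As a concrete illustration, the following sketch shows this integration step using Open3D-style APIs; the paper does not name the server-side library, and the frames container with its color, depth, intrinsic, and pose fields is a hypothetical placeholder for the recorded scan data.

```python
import numpy as np
import open3d as o3d

# Hypothetical per-frame data: color (uint8 HxWx3), depth (float32, meters),
# intrinsic (o3d.camera.PinholeCameraIntrinsic), pose (4x4 camera-to-world).
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=5.0 / 512,                 # voxel size from the paper
    sdf_trunc=0.05,                         # truncation value from the paper
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)

for i, frame in enumerate(frames):          # `frames` is an assumed container
    if i % 30 != 0:                         # integrate every 30 frames
        continue
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(frame.color),
        o3d.geometry.Image(frame.depth),
        depth_scale=1.0,                    # depth already in meters
        depth_trunc=15.0,                   # sensor range
        convert_rgb_to_intensity=False,
    )
    extrinsic = np.linalg.inv(frame.pose)   # world-to-camera transform
    volume.integrate(rgbd, frame.intrinsic, extrinsic)

pcd = volume.extract_point_cloud()                     # colored point cloud
pcd = pcd.voxel_down_sample(voxel_size=5.0 / 256)      # downsample for the iPad
```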
To interact with points efficiently, we build a three-level octree on the iPad by sorting all points by their XYZ coordinates and computing the size and center of the enclosing bounding box. For each point, we compute a three-digit index of its relative position in an 8 × 8 × 8 subdivision of the volume, e.g., 026 to indicate that it is in the 0th eighth of the volume in the x direction, the 2nd eighth in the y direction, and the 6th eighth in the z direction. We first create a fully-populated octree with 512 (= 8³) leaf nodes, and then traverse all 3D points to update the octree. Each octree node records the number of points in its space partition (all children, grandchildren, etc.) and the xyz octree index (e.g., 026). Each leaf node includes a list of all 3D points in its volume. The octree significantly improves the speed of point-related computations.
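A minimal sketch of this leaf indexing is shown below, in Python for illustration only; the on-device implementation is in Swift and the function names here are ours.

```python
import numpy as np

def leaf_indices(points, bbox_min, bbox_max):
    """Assign each point a three-digit leaf index (one digit per axis)
    in the 8 x 8 x 8 subdivision of the enclosing bounding box."""
    extent = np.maximum(np.asarray(bbox_max) - np.asarray(bbox_min), 1e-9)
    digits = np.floor((points - np.asarray(bbox_min)) / extent * 8).astype(int)
    return np.clip(digits, 0, 7)            # keep boundary points in range

def build_leaves(points, bbox_min, bbox_max):
    """Group point indices by leaf cell; interior nodes sum these counts."""
    leaves = {}
    for i, (x, y, z) in enumerate(leaf_indices(points, bbox_min, bbox_max)):
        leaves.setdefault((x, y, z), []).append(i)   # e.g. key (0, 2, 6) ~ "026"
    return leaves
```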
A.2 Rendering
Once the point cloud has been processed, we begin rendering it on the iPad. We use Apple’s SceneKit for point cloud visualization and editing. For each point, we record its XYZ coordinates, RGB color, and the index of its octree node. A point cloud object is then defined as a list of such points. For each point cloud object, we also maintain its vertex, color, and index buffers to create an SCNGeometry. We can visualize the point cloud once the SCNGeometry is added to the scene graph. The point cloud is relocalized so the user can view it from the current AR camera in situ. The user can also adjust point opacity and point size using two sliders in the editing interface to make sure they can understand the spatial context represented by the point cloud.
A.3 Point Selection
Fig. 11 shows how we use the octree to enable efficient point selection. Starting at the top level, we project the eight vertices of each octree space partition onto the screen plane and check whether the tapped point falls within the bounding box of the resulting eight points. If not, we ignore all points in that partition. Otherwise we recur down to the next level of the tree. Because the tree often contains large areas with no points, we further optimize this by skipping partitions that had no points assigned to them in the construction. The resulting performance is logarithmic in the number of points. For example, it takes about half a second for a tap to select a 3D point from a point cloud object containing hundreds of thousands of points.
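A simplified sketch of this pruning recursion follows; the explicit view-projection matrix and the rule for picking among surviving candidates (closest to the tap in screen space) are our assumptions, since on device the projection is done through the AR camera.

```python
import numpy as np

def project(pts, view_proj, viewport):
    """Project Nx3 world points to 2D screen coordinates with a combined
    view-projection matrix (an assumption; the app uses the AR camera)."""
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ view_proj.T
    ndc = homo[:, :2] / homo[:, 3:4]
    return (ndc * 0.5 + 0.5) * np.asarray(viewport)

def select(node, tap, view_proj, viewport):
    """Return (point, screen distance to tap) or None, pruning octree
    partitions whose projected corners do not enclose the tap."""
    if node.count == 0:
        return None                                  # skip empty partitions
    corners = project(node.corners, view_proj, viewport)   # 8 cell corners
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    if np.any(tap < lo) or np.any(tap > hi):
        return None                                  # tap outside this partition
    if node.is_leaf:
        d = np.linalg.norm(project(node.points, view_proj, viewport) - tap, axis=1)
        return node.points[d.argmin()], d.min()
    hits = [select(c, tap, view_proj, viewport) for c in node.children]
    hits = [h for h in hits if h is not None]
    return min(hits, key=lambda h: h[1]) if hits else None
```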
A.4 Inpainting
Our inpainting algorithm has the following steps. First, we use the singular value decomposition (SVD) to find a best-fit plane for each list of control points. We then create one to three rectangular patches that fill the hole but do not extend too far beyond it. For a single-plane hole, we project the control points onto the fitted plane and take their bounding box. For two- and three-plane holes, we find the intersection lines between each pair of planes. Then, for each plane, we project its control points onto its fitted plane and also onto the intersection lines between its plane and the others. We then take the bounding box of these projected points. The result is one rectangle in space, two intersecting rectangles, or three rectangles that intersect at a single point. We sample each rectangle in two dimensions with a spacing of 5/256 meter—the same resolution as our original capture—to generate a grid of points that make a patch. Finally, we assign the average color of the control points on the patch’s plane to the patch’s points and merge them all to create a point cloud object to send back to the iPad.
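The single-plane case can be sketched as follows; the helper names are ours, and the clipping against plane intersection lines used for two- and three-plane holes is omitted.

```python
import numpy as np

def fit_plane(control_pts):
    """SVD best-fit plane: the right singular vector with the smallest
    singular value is the plane normal."""
    centroid = control_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(control_pts - centroid)
    return centroid, vt[-1]

def single_plane_patch(control_pts, control_colors, spacing=5.0 / 256):
    """Sample a rectangular grid over the bounding box of the control points
    projected onto their fitted plane; color it with their average color."""
    centroid, n = fit_plane(control_pts)
    u = np.cross(n, [0.0, 0.0, 1.0])                 # in-plane basis vector
    if np.linalg.norm(u) < 1e-6:                     # normal parallel to z axis
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    rel = control_pts - centroid
    uv = np.stack([rel @ u, rel @ v], axis=1)        # plane coordinates
    (u0, v0), (u1, v1) = uv.min(axis=0), uv.max(axis=0)
    grid = np.array([centroid + a * u + b * v
                     for a in np.arange(u0, u1 + spacing, spacing)
                     for b in np.arange(v0, v1 + spacing, spacing)])
    colors = np.tile(control_colors.mean(axis=0), (len(grid), 1))
    return grid, colors
```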