tl;dr at the bottom 😃
I'm completely fascinated by product design. Every day I comb through design trends, observing talented creators and artists to see if I can absorb a bit of that intuition. Although it's fairly new to me, the world of 3d interface design is the most interesting of them all. This type of design goes into interfaces for Augmented Reality and other forms of spatial computing. While AR is pretty cool, it is also extraordinarily difficult to produce smooth, predictable, and beautiful experiences. At Placenote, we're constantly looking for ways to improve design for our AR SDK, so I went ahead and looked into Google Lens to see how it utilizes 3d interface design.
"Spatial computing is human interaction with a machine in which the machine retains and manipulates referents to real objects and spaces"
Google Lens Product Teardown
Google Lens was introduced in 2017 as a tool that expands on how users interact with search. I first encountered Google Lens years after it launched while watching my coworker use it to identify a specific type of maple tree in Canada. Very Canadian, I know. He held out a leaf, pointed the camera at it, and within a few seconds had the exact species of tree on his phone. I was extremely impressed by the quick search results and thought it was an intuitive bit of software.
The best part about the product is that the experience was so good that the design felt invisible. Invisibility is a goal that many designers pursue as a way to gently fold aesthetics into a product that creates an emotional connection and bond with the user. Invisible interactions are what I'm most interested in; the interaction happens so seamlessly that you perform a task without thinking about it. Google Lens nailed this. The experience was so intuitive that I didn't realize it was there in the first place. With Google Lens, it just worked, and very well at that.
The Issue of Optimizing Screen Real Estate
As AR developers, we know that mobile AR is the most common interface for building spatial computing apps. It has the lowest barrier to entry, decent performance, and is perfect for testing future ideas you might have for head-mounted displays. However, working with mobile AR has its restrictions. One of the biggest pains in design is managing screen real estate. When working with an app that uses a camera viewfinder, you're walking a fine line between too much UI and user confusion. A crowded user interface is counterproductive when the user needs the camera as a viewfinder to navigate their environment. On the flip side, not having enough UI can leave a user feeling lost, with no clue of how to use the tool at all.
Google was aware of this issue as they built Google Lens. The designers were thoughtful about the use of UI on the camera screen by mimicking familiar camera UI to keep the confusion for new users at a very low level. On the top of the screen, there is an exit button, a toggle flash, Google Lens logo, import from camera roll button, and settings.
The UI also mimics some of the traditional parts of a Google search. Keeping these features aligned with Google's most well-known product keeps the need for user education very low. The bottom buttons are simply there to help categorize search results before the search even begins. The breakdown is Translate, Text, Default Search, Shopping, and Dining, similar to how a traditional Google search breaks down its results (All, Images, Videos, News, etc.). In Google Lens, the "Default Search" button can be compared to the "All" tab on a normal Google search. The "Translate" button is for printed word translation, "Text" is for copying printed text to your phone's clipboard, "Shopping" shows shopping results, and "Dining" yields restaurant results.
A User Interface That Incentivizes Physical Movement
One of the most difficult parts of creating an AR application is trying to get users to move. Simply making a user walk around their house to interact with digital content is a deceptively hard problem. "Of course you need to move your phone to use an AR app!" you might say, but you'd be surprised how many people freeze when they are handed an AR app that requires them to point the camera, walk, and/or interact with their surroundings.
Google Lens doesn't have this problem. Rather than coach a user to move through copious onboarding screens and instructions, they figured out a way to encourage a user to want to move around their environment and use the product. The application uses something that I've dubbed "Inferred Point of Interest" or "Inferred POI". These Inferred POIs are scene-intelligent, meaning that they react to how the app is used in the environment. As the user points Google Lens around, it will infer some points that it can associate Google results with. If we look at the GIF below, you can see how the white circle POI picked a stool that it thinks may be interesting to me.
In addition to the dots placed on the screen, there is a bit of persistence capability built into the POI feature. As you move your phone around, the dots will follow the objects they originally tagged. Even if an object briefly leaves the camera frame, it will still be there if you pan back across it. I assume Google is using some sort of AI-enabled image classification to get the inferred points that it finds in the scene. When you do click on one of the dots, the camera frame freezes on that position and presents the search results instantly with impressive accuracy most of the time. At the coffee shop where I was testing this, it picked up a listing for the exact style of stool in about 2 seconds.
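To make the persistence idea concrete, here is a minimal sketch of how a marker could survive an object briefly leaving the frame. This is not how Google Lens is implemented; it is only an illustration. `Marker`, `MarkerStore`, `max_missed_frames`, and the assumption that some detector hands back a stable `object_id` with a screen position for each visible object are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    object_id: str          # hypothetical stable ID supplied by a detector
    x: float                # last known screen position of the object
    y: float
    missed_frames: int = 0  # consecutive frames in which the object was not seen


class MarkerStore:
    """Remembers markers for a grace period after their object leaves the frame."""

    def __init__(self, max_missed_frames=90):
        # Assumption: roughly 3 seconds at 30 fps before a marker is forgotten.
        self.max_missed_frames = max_missed_frames
        self.markers = {}

    def update(self, detections):
        """detections maps object_id -> (x, y) for objects visible this frame.

        Returns the markers that should be drawn for this frame.
        """
        for object_id, (x, y) in detections.items():
            marker = self.markers.get(object_id)
            if marker is None:
                # First sighting: tag the object with a new marker.
                self.markers[object_id] = Marker(object_id, x, y)
            else:
                # Seen again: the dot follows the object and the counter resets.
                marker.x, marker.y, marker.missed_frames = x, y, 0

        for object_id in list(self.markers):
            if object_id not in detections:
                marker = self.markers[object_id]
                marker.missed_frames += 1
                # Forget the marker only after it has been gone for too long.
                if marker.missed_frames > self.max_missed_frames:
                    del self.markers[object_id]

        # Only markers whose object is visible right now are drawn.
        return [m for m in self.markers.values() if m.missed_frames == 0]
```

The design choice worth noticing is the grace period: a marker is hidden as soon as its object is out of view, but it is only discarded after a run of missed frames, so panning back shows the same dot rather than a new one.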
Another thing Google did to tastefully encourage interaction with the POIs is to make a POI larger the closer it is to the vertical center of the screen. This gives the feeling that the markers are in the environment and get larger as you get closer to them, but in reality it's just an optical illusion. When a user walks toward things they're interested in, the POI that is most interesting to them is usually at the center of the screen.
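The size trick can be sketched in a few lines. This is an illustration under my own assumptions, not Google's actual formula: I assume the size falls off linearly with the marker's distance from the center measured along the screen's y axis, and the names `marker_radius`, `min_radius`, and `max_radius` are made up for the example.

```python
def marker_radius(marker_y, screen_height, min_radius=8.0, max_radius=24.0):
    """Return a radius that grows as the marker nears the vertical center."""
    center_y = screen_height / 2.0
    # Normalized distance from the center: 0.0 at the center, 1.0 at the
    # top or bottom edge. Clamp so off-screen markers never go below min_radius.
    distance = min(abs(marker_y - center_y) / center_y, 1.0)
    # Linear interpolation between the largest and smallest size.
    return max_radius - (max_radius - min_radius) * distance


# On an 800 px tall screen:
print(marker_radius(400, 800))  # 24.0 (dead center, largest)
print(marker_radius(200, 800))  # 16.0 (halfway to the top edge)
print(marker_radius(0, 800))    # 8.0 (at the edge, smallest)
```

Nothing here knows about depth or real-world distance. The marker only reacts to its 2D screen position, which is why the effect is an illusion rather than true spatial awareness.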
The inferred POIs and instant search results provide a very special experience that I haven't seen yet in spatial applications. It's so easy to use that there is no need for onboarding, and the user automatically moves around their environment scanning things with the camera because they enjoy the experience. Google Lens will identify things that you wouldn't even think to point the camera at, which adds to the fun experience of the tool. Additionally, it shows the capability of the product while gently guiding a user to a series of specific actions. It's assuming it knows what the user wants to look at, but it's not forcing them to click on anything. Whether this was planned or not (it probably was), it's a perfect way of guiding users along a specific experience: open camera > walk around and point the camera at things > submit search results.
How This Applies to Spatial Computing
So Google Lens may look cool and all, but how does it fit into spatial computing design? In its simplest form, it's just a camera and image search, not some multiplayer AR FPS with environmental occlusions. While that may be true, I think Google Lens is a great starting point for designers to look at when designing any app that is supposed to work in a 3d environment.
The most impressive takeaway that I got from this application is that the user experience was solid enough that there was no need for onboarding. The inferred POIs do their job of invisibly pushing the user all around their environment. The UI in the camera view is unobstructed, clear, and straight to the point. It will be interesting to see how AI can push the boundaries of spatial user experience. Even though the Google Lens workflow is still fairly basic in the world of spatial apps, there is a lot to be learned here. I'd encourage other people to give it a shot and see what other things you might gather from using a tool like this.
tl;dr
- Google Lens is remarkably intuitive to use
- "Invisibility design" makes interactions happen so seamlessly that you perform a task without thinking about it
- No need for onboarding at all
- Effective management of screen space, to not clutter the camera view
- Use of AI-enabled point-of-interest markers in the camera view incentivizes all user actions, including movement and in-app searches
- Cool interaction design to give the illusion that the markers are super spatially aware
This is the first product teardown that I've put on paper, but I think there may be more in the future! If you're working on a tough problem, take the time to step back and look around at the not-so-obvious things to gather a bit of insight. Write them down in a notebook so you don't forget them. A lot of times I will find great product ideas in things as simple as a bag of coffee.