Here at Placenote, we believe that we're witnessing a rapid change in the way people interact with computers. We've entered the age of spatial computing: we no longer have to settle for applications confined to the boundaries of a phone screen, and we can now build programs that interact with the world around us.
Developing these spatial apps has a bit of a learning curve, especially when it comes to working with 3D content. Although we take care of providing accurate camera poses (position and orientation) and a map of the space, it's still up to you, the amazing developer, to properly place your content in your scene.
In today's post, we'll teach you some of the background knowledge behind 3D applications, which will make the process of building your spatial application a little bit smoother!
- Creating spatial 3D apps can be a little daunting due to the new tools and terminology.
- 3D coordinate frames are the foundation of these applications. Read the article to find out more!
- Different tools have different coordinate frame conventions. Read the article to find out what they are!
- The world, camera, and local content coordinate frames are the 3 key frames you need to know about when building your app. Read the article to find out why!
- If I could fit this entire article into a TLDR, I wouldn't have written an article.
Why Should I Care???
Making the leap from developing a 2D application to 3D can be a little daunting, especially if it means starting to use some new tools like Unity or SceneKit. These programs have a ton of functionality and the documentation typically dives right into some of the technical terms. We've found that there's a significant amount of trial and error involved when you first get started with building a spatial app, especially when it comes to properly placing content and setting up your scene. Although coordinate frames can be boring (sorry in advance), they really are the foundation for any AR application. Having a strong understanding of the relationship between coordinate frames will save you a ton of time as you build your AR app.
Defining 3D Space
First, we have to define the world around us. We live in three-dimensional space, which means that three values are required to fully define the position of any point in the world. These three values can be expressed along the x, y, and z-axes of the coordinate frame shown below. The three axes are mutually orthogonal, meaning that each pair of axes meets at a 90-degree angle.
The intersection point of the 3 axes (the black dot) is the origin, and its position expressed in [x, y, z] coordinates is [0, 0, 0]. The coordinate frame is the foundation for defining content in 3D space. Without a reference coordinate frame, the definition of a point is meaningless; we always have to specify the reference frame to provide context.
For example, let's say we want to define a point p that is positioned at 3 metres in the x-axis, 2 metres in the y-axis, and 5 metres in the z-axis relative to this coordinate frame. This point in [x, y, z] coordinates is expressed as [3, 2, 5], and it would look as follows:
The position is also called the translation of the point with respect to the reference frame. Typically in our AR application, we define a coordinate frame called World, and this frame is the fixed reference for all of our content.
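To make the "always specify the reference frame" idea concrete, here is a minimal Python sketch that stores a point together with the name of its frame. The second frame, `Table`, and its origin at [1, 0, 2] in World are hypothetical, and we assume its axes are parallel to World's, so re-expressing a point in it is just a subtraction of the frame's origin.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """A 3D point whose coordinates only make sense together with its frame."""
    frame: str
    x: float
    y: float
    z: float

# Hypothetical: origin of a "Table" frame expressed in World coordinates.
# Assumes Table's axes are parallel to World's (no rotation between them).
TABLE_ORIGIN_IN_WORLD = (1.0, 0.0, 2.0)

def world_to_table(p: Point) -> Point:
    """Re-express a World point in the (translation-only) Table frame."""
    if p.frame != "World":
        raise ValueError(f"expected a World point, got {p.frame}")
    ox, oy, oz = TABLE_ORIGIN_IN_WORLD
    return Point("Table", p.x - ox, p.y - oy, p.z - oz)

p_world = Point("World", 3.0, 2.0, 5.0)  # the point p from the example
p_table = world_to_table(p_world)
print(p_table)  # Point(frame='Table', x=2.0, y=2.0, z=3.0)
```

The same physical point has different coordinates in each frame, which is why a bare [3, 2, 5] is ambiguous until you say which frame it belongs to.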
You might be asking, "so what does this have to do with AR? I just want to put a 3D horse on my coffee table". You're right, most of the time we are working with objects instead of points. Objects in 3D have 6 Degrees of Freedom (DoF); to fully define an object in 3D space, we have to also define its orientation as well as its position. This requires 3 more parameters (hence 6 DoF - 3 to define translation, 3 to define rotation). These 3 parameters are the rotation around the x-axis (roll), y-axis (pitch), and z-axis (yaw).
All this information brought together is called an object's pose (position and orientation). This fully defines the 3D transformation of the object with respect to the reference coordinate frame.
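One common way to store a full pose in code is a 4x4 homogeneous transform: a 3x3 rotation matrix plus a translation column. The sketch below builds one from the six parameters above. It assumes a right-handed frame, angles in radians, and the composition order R = Rz(yaw) · Ry(pitch) · Rx(roll); that order is our choice for illustration, and Unity and SceneKit each apply Euler angles in their own fixed order.

```python
import math

def rot_x(a):
    """Rotation about x (roll, in this article's naming); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    """Rotation about y (pitch, in this article's naming)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    """Rotation about z (yaw, in this article's naming)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def pose(translation, roll, pitch, yaw):
    """Build a 4x4 homogeneous transform from 3 translation + 3 rotation parameters.

    Assumes R = Rz(yaw) @ Ry(pitch) @ Rx(roll); real tools may use another order.
    """
    r = matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))
    top = [r[i] + [float(translation[i])] for i in range(3)]
    return top + [[0.0, 0.0, 0.0, 1.0]]

# The horse at [3, 2, 5], yawed by 90 degrees: six numbers fully define its pose.
horse = pose([3, 2, 5], roll=0.0, pitch=0.0, yaw=math.radians(90))
for row in horse:
    print([round(v, 3) for v in row])
```

The last column holds the translation and the upper-left 3x3 block holds the orientation, so a single matrix carries all six degrees of freedom.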
To build on the previous example, let's say we take our 3D horse and place it at [3, 2, 5] as before.
We can now visualize the effects of a 90-degree roll, pitch, and yaw when each is applied separately to the model.
Or of course, any combination of roll, pitch, and yaw can be applied to Li'l Sebastian here.
In all cases, the position of Li'l Sebastian is unchanged, but since we are rotating the model, the horse's orientation, and therefore its pose, changes.
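Here is a small sketch of that idea, storing a pose as a (rotation, translation) pair. It assumes a right-handed frame and column vectors, and the helper `rotate_in_place` is hypothetical: it spins the object about its own origin by 90 degrees of roll, which changes the rotation but leaves the translation untouched.

```python
import math

def rot_x(a):
    """Rotation about x (roll); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotate_in_place(rotation, translation, delta):
    """Spin an object about its own origin: orientation changes, position does not."""
    return matmul(rotation, delta), list(translation)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
position = [3.0, 2.0, 5.0]

new_rot, new_pos = rotate_in_place(identity, position, rot_x(math.radians(90)))
local_y_axis = [row[1] for row in new_rot]  # 2nd column = horse's y-axis in World
print(new_pos)                              # [3.0, 2.0, 5.0]
print([round(v, 6) for v in local_y_axis])  # [0.0, 0.0, 1.0]
```

After a 90-degree roll, the horse's own y-axis now points where its z-axis used to, while its position is still [3, 2, 5].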
Left-Handed versus Right-Handed Coordinate Frames
Depending on the application you're using, coordinate frames can be defined using either the right-hand or the left-hand convention. The right- and left-hand rules are an easy way to remember the directions of the x, y, and z-axes. While it would be nice for every application to use the same convention, this is not the case, because all standards were created to be ignored.
All of the frames shown above in this article are right-handed coordinate frames. If you use your right hand (thumb for the x-axis, index finger for the y-axis, and middle finger for the z-axis), you can align your fingers with the 3D coordinate frame axes. If you try the same thing with your left hand, your middle finger, and therefore the positive z-axis, points in the opposite direction!
Really Important Note: Actually try to create the coordinate frame using your fingers. You'll look absolutely ridiculous. It's a rite of passage in the computer vision world.
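If you'd rather not rely on finger gymnastics, the same test can be done with a cross product: a frame is right-handed when x × y points along +z. The sketch below assumes the three axes are written as vectors in some right-handed reference frame; the `handedness` helper is our own, not part of any SDK.

```python
def cross(a, b):
    """Cross product, using the usual (right-handed) formula."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def handedness(x_axis, y_axis, z_axis):
    """Classify three orthogonal axes, written in a right-handed reference frame.

    The frame is right-handed when x cross y points along +z (positive triple product).
    """
    triple = dot(cross(x_axis, y_axis), z_axis)
    if abs(triple) < 1e-9:
        raise ValueError("axes are degenerate (not a valid 3D frame)")
    return "right-handed" if triple > 0 else "left-handed"

print(handedness([1, 0, 0], [0, 1, 0], [0, 0, 1]))   # right-handed
print(handedness([1, 0, 0], [0, 1, 0], [0, 0, -1]))  # left-handed (z flipped)
```

Flipping any single axis switches the handedness; flipping two switches it back.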
Placenote SDK is compatible with both Unity (left-handed) and Swift/SceneKit (right-handed), so it's always a good idea to know your application's coordinate frame convention before you get started. Below are examples of coordinate frames as seen in Unity and Swift.
Left and Right-Handed Rotations
When switching between a left and right-handed coordinate system, the direction of a positive rotation angle also changes.
We see that for all the rotation angles, the arrow points in the opposite direction when compared to the other coordinate frame convention. However, it's easy to determine the proper direction by using your hand! Point your thumb (left or right hand, depending on the convention) in the direction of the axis arrow, and then curl your fingers closed. Your fingers will curl in the direction of a positive rotation.
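What does this mean when you move a rotation from one convention to the other? A minimal sketch, assuming the two frames share their x and y axes and differ only in the sign of z: a rotation matrix R from the right-handed frame becomes M·R·M in the left-handed one, where M = diag(1, 1, -1) flips z. The check below shows that a rotation of θ about x or y becomes -θ, while a rotation about z keeps its sign, because in that case the axis itself was reversed too and the two flips cancel.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

# M flips the z-axis; it is its own inverse, so M R M re-expresses R in the mirrored frame.
MIRROR_Z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

def to_mirrored_frame(r):
    return matmul(MIRROR_Z, matmul(r, MIRROR_Z))

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

theta = math.radians(30)
print(close(to_mirrored_frame(rot_x(theta)), rot_x(-theta)))  # True: x angle flips sign
print(close(to_mirrored_frame(rot_y(theta)), rot_y(-theta)))  # True: y angle flips sign
print(close(to_mirrored_frame(rot_z(theta)), rot_z(theta)))   # True: z angle keeps its sign
```

In other words, the physical motion is identical; only the numbers used to describe it change with the convention.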
Important Coordinate Frames
For an AR application, there are several key coordinate frames that we have to be aware of.
World Coordinate Frame
The first frame is the world coordinate frame. This is a fixed reference frame that remains static for the duration of the application. When you are using an ARKit app, the pose of the world frame is determined as soon as the AR session starts: its origin is placed at the initial position of your camera and stays there, and it is oriented as seen below, with the positive y-axis pointing in the opposite direction to gravity.
In Unity, the world frame has the same position and orientation; however, because it is a left-handed frame, the positive z-axis points away from the user. This frame is extremely important, as it is the global reference for all 3D objects in the scene.
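Since the two world frames share their origin, x-axis, and y-axis and differ only in the direction of z (as described above), converting a position between them amounts to negating z. Here is a minimal sketch; `arkit_to_unity_position` is a hypothetical helper for illustration, not a Placenote or Unity API.

```python
def arkit_to_unity_position(p):
    """Flip z: ARKit world (right-handed, +z toward the user) -> Unity world (left-handed, +z away).

    Assumes both frames share the same origin, x-axis, and y-axis.
    """
    x, y, z = p
    return (x, y, -z)

# ARKit's camera looks along its -z axis, so a point 1 m in front of the
# camera's starting position sits at z = -1 in ARKit's world frame.
in_front_arkit = (0.0, 0.0, -1.0)
print(arkit_to_unity_position(in_front_arkit))  # (0.0, 0.0, 1.0)
```

Because flipping z twice gets you back where you started, the same function also converts from Unity back to ARKit.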
Camera Coordinate Frame
The second important frame is the camera coordinate frame. This coordinate frame is placed at the center of the phone's camera and is used specifically to denote the pose of the phone at any given time. As per ARKit's documentation, the camera frame is a right-handed coordinate frame, with the x-axis pointing down the long side of the device, and the z-axis pointing towards the user. Using the right-hand rule, we can determine the direction of the y-axis.
Similar to the world frame, once we bring this into Unity, this becomes a left-handed coordinate frame. The x-axis still points down the long side of the device, but now the z-axis points away from the user, in the direction of the camera.
Unlike the world coordinate frame, which is always fixed, the camera coordinate frame moves as the user walks around. The only aspect that is "fixed" about the camera coordinate frame is that it is always placed at the center of the camera. Therefore, at each timestep (that is, each image provided by ARKit), the pose of the camera will have changed with respect to the world frame. Tracking the camera poses as you walk through a scene might look like this:
The camera coordinate frame is important because it's the user's interface with the rest of the world and your AR content. The user's AR experience is entirely dependent on the accuracy of the camera pose (which we're constantly working to improve!), and as we'll discuss in Part 2 of this blog series, Unity and SceneKit both provide functions that are especially convenient when applied to the camera transformation with respect to the world frame.
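A typical use of the camera pose is placing content relative to the user, for example 1 m in front of the phone. The sketch below uses the standard rigid transform p_world = R · p_cam + t, where (R, t) is the camera pose in the world frame. The camera pose values are made up for illustration, and we assume ARKit's right-handed camera frame, in which the lens looks along -z (the z-axis points toward the user, as noted above).

```python
import math

def rot_y(a):
    """Rotation about the World y-axis (up); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def camera_to_world(point_cam, cam_rotation, cam_position):
    """p_world = R_wc * p_cam + t_wc, where (R_wc, t_wc) is the camera pose in World."""
    return [sum(cam_rotation[i][k] * point_cam[k] for k in range(3)) + cam_position[i]
            for i in range(3)]

# Hypothetical camera pose at one timestep: turned 90 degrees about World y.
cam_rotation = rot_y(math.radians(90))
cam_position = [0.5, 1.2, -0.3]

# In ARKit's camera frame the lens looks along -z, so "1 m ahead" is [0, 0, -1].
one_metre_ahead = [0.0, 0.0, -1.0]
p = camera_to_world(one_metre_ahead, cam_rotation, cam_position)
print([round(v, 3) for v in p])  # [-0.5, 1.2, -0.3]
```

Because the camera pose changes every frame, the same point in camera coordinates maps to a different world position at each timestep.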
Object Coordinate Frame
The third important frame to know is the coordinate frame for any AR objects in the world. Whenever we place any content in the world, we are defining the pose of the object with respect to a reference frame.
We can actually represent each object as having its own coordinate frame, usually placed at the object's center. A coordinate frame is created for every single object placed in the scene. Following up with our horse example, we see that its local coordinate frame would appear as such. The subscript w represents the world frame, while the subscript h represents the horse frame.
This coordinate frame is always fixed at the center of the object, but as the horse moves with respect to the world frame, the local coordinate frame will also move.
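In code, a point that is fixed on the horse (say, its nose) keeps the same coordinates in the horse frame h forever; only its world coordinates change when the horse moves. A minimal sketch using p_w = R_wh · p_h + t_wh, assuming a right-handed frame; the nose position [1, 0, 0] and the 90-degree yaw are hypothetical values.

```python
import math

def rot_z(a):
    """Rotation about z (yaw, in this article's naming); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def local_to_world(p_local, rotation, position):
    """p_w = R_wh * p_h + t_wh: express a point given in the horse frame (h) in World (w)."""
    return [sum(rotation[i][k] * p_local[k] for k in range(3)) + position[i] for i in range(3)]

# Hypothetical: the nose sits 1 m along the horse's own x-axis.
nose_h = [1.0, 0.0, 0.0]
horse_rot = rot_z(math.radians(90))  # 90 degrees of yaw

before = local_to_world(nose_h, horse_rot, [3.0, 2.0, 5.0])
after = local_to_world(nose_h, horse_rot, [4.0, 2.0, 5.0])  # horse moved 1 m along World x
print([round(v, 3) for v in before])  # [3.0, 3.0, 5.0]
print([round(v, 3) for v in after])   # [4.0, 3.0, 5.0]
```

`nose_h` never changes; the horse frame carries it along as the horse moves through the world.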
Tying it Together
To wrap up this article, we will look at a scene in Unity (left-handed convention!) with a camera and a single object. The world frame is positioned at [0, 0, 0]. The camera frame is positioned at [-3, 3, 5] and is rotated 20 degrees in pitch and 120 degrees in roll. The horse is positioned at [5, 1, 3] and is rotated -60 degrees in pitch.
Viewing this scene from the camera shows us what the user would experience.
We've set the horse's pose via the inspector.
One important thing to note is that the rotation is always expressed in the local coordinate frame, not the world frame! For example, let's say we apply 40 degrees of roll. The horse rotates about its own (content) x-axis, not the world x-axis. As we can see in the right-hand corner, the world x-axis (red) is parallel with the z-axis (blue) of the horse.
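The difference between "rotate about my own axis" and "rotate about the world axis" comes down to multiplication order. This sketch illustrates the idea, not Unity's internals: Unity is left-handed, stores rotations as quaternions, and applies inspector Euler angles in its own order. Here we assume right-handed matrices with column vectors, where post-multiplying (R · Δ) applies Δ in the object's local frame and pre-multiplying (Δ · R) applies it in the world frame.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def column(m, j):
    """The j-th column of a rotation matrix = the object's j-th axis in World."""
    return [m[i][j] for i in range(3)]

horse = rot_y(math.radians(-60))  # -60 degrees of pitch (y-axis)
roll = rot_x(math.radians(40))    # the extra 40 degrees of roll (x-axis)

local_rot = matmul(horse, roll)   # roll about the horse's own x-axis
world_rot = matmul(roll, horse)   # roll about the World x-axis

# Rolling about the local x-axis leaves that axis exactly where it was;
# rolling about the World x-axis moves it.
print([round(v, 3) for v in column(horse, 0)])
print([round(v, 3) for v in column(local_rot, 0)])
print([round(v, 3) for v in column(world_rot, 0)])
```

The first two printed axes match, while the third differs, which is why the same "40 degrees of roll" can produce two different orientations.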
The updated inspector pose is now:
Now, we can move the horse closer to the camera. If we set the world position to [2, 1, 3], this is the new updated scene and camera view.
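We can sanity-check that the horse really did get closer using only the positions from this example. Distances don't depend on handedness, so the left-handed Unity coordinates can be used directly; the camera's rotation is ignored here, since it doesn't affect the distance.

```python
import math

camera = (-3.0, 3.0, 5.0)
horse_before = (5.0, 1.0, 3.0)
horse_after = (2.0, 1.0, 3.0)

def offset(a, b):
    """Vector from a to b, in World coordinates."""
    return tuple(bi - ai for ai, bi in zip(a, b))

print(round(math.dist(camera, horse_before), 2))  # 8.49
print(round(math.dist(camera, horse_after), 2))   # 5.74
print(offset(camera, horse_after))                # (5.0, -2.0, -2.0)
```

Moving the horse from [5, 1, 3] to [2, 1, 3] shortens the camera-to-horse distance from about 8.49 m to about 5.74 m.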
We are hiring - check out our job openings! If you or someone you know wants to help build the future of computing, reach out to us :).