Bring your iOS or iPadOS game to visionOS
Discover how to transform your iOS or iPadOS game into a uniquely visionOS experience. Increase the immersion (and fun factor!) with a 3D frame or an immersive background. And invite players further into your world by adding depth to the window with stereoscopy or head tracking.
Chapters
- 0:00 - Introduction
- 1:42 - Render on visionOS
- 3:48 - Compatible to native
- 6:41 - Add a frame and a background
- 8:00 - Enhance the rendering
Resources
Related Videos
WWDC24
- Build a spatial drawing app with RealityKit
- Build compelling spatial photo and video experiences
- Discover RealityKit APIs for iOS, macOS, and visionOS
- Explore game input in visionOS
- Render Metal with passthrough in visionOS
Hello. I'm Olivier, a Software Engineer on RealityKit and visionOS. In this video, I'll show you how you can transform your iOS or iPadOS game into an even more immersive experience on visionOS.
For example, the game Wylde Flowers is a cozy life and farming simulator. Here is the iPad version of the game.
And here is the same iPad version running on visionOS as a compatible app, in a window. The latest version of Wylde Flowers that you can get from the visionOS App Store has many enhancements specific to the platform.
For example, there is a 3D frame around the window. And this frame changes based on the gameplay: for example, when the player interacts with the garden, the frame shows a 3D model of the garden and additional UI.
There is also an immersive background around the game, with dandelions falling or a hummingbird flying around.
Finally, the game view is rendered in stereoscopy, to add more depth to the scene.
Those enhancements make playing Wylde Flowers on visionOS even better, and in this video I'll show you how you can do the same for your iOS or iPadOS game.
I'll first talk about the rendering technologies available on visionOS. Then I'll show you how to convert your iOS app to a native visionOS app. I'll show you how to add a RealityKit frame and a background to your game. And finally, I'll show you how to enhance the Metal rendering of your game with stereoscopy, head tracking and VRR.
First, let’s go over the rendering technologies available on visionOS. You can do 3D rendering on visionOS with RealityKit and Metal.
RealityKit is a framework that you can use directly in Swift or indirectly through technologies like Unity PolySpatial. It enables multiple apps in the same Shared Space, on visionOS. For example, you can use RealityKit to render content in a volumetric window, like the games Game Room and LEGO Builder’s Journey. Or you can use RealityKit to render content in an ImmersiveSpace, like Super Fruit Ninja and Synth Riders. You can also render directly with Metal on visionOS, which you might choose if you need a custom rendering pipeline or if it would not be practical for you to port your game to RealityKit.
There are two main modes for rendering with Metal on visionOS.
You can run your game as a compatible app in a window, where your app behaves very similarly to how it would on an iPad. This is the case for the iPad version of Wylde Flowers, running as a compatible app on visionOS.
A big advantage of having your app in a window is that it can run alongside other apps in the Shared Space. For example, you can play the game while using Safari or messaging friends.
You can also use Compositor Services to run your game as a fully immersive app where the game's camera is controlled by the player's head. For example, the sample from the video “Render Metal with passthrough in visionOS” is rendered using Compositor Services. Learn more by watching that video.
It is very easy to run your iPad app as a compatible app on visionOS. Making your game fully immersive with Compositor Services creates a much more vivid experience, but it might require a significant redesign and rework, since the player has full control of the camera and can look anywhere in the scene.
In this video, I will present a set of techniques that sit in between those two modes: I will show you how you can start with a compatible app and progressively add features to increase the immersion and leverage the capabilities of Vision Pro.
The first and easiest step is to run your game as a compatible app on visionOS. I'll use the Metal Deferred Lighting sample as an example of an iOS game. This video shows the sample running on iPad.
You can download the iOS version of this sample on the developer website at developer.apple.com.
Let’s start by compiling the app with the iOS SDK and run it on visionOS as a compatible app.
Compatible apps will run in a window on visionOS.
Note that both touch controls and game controllers work on visionOS out of the box, giving a uniform experience across all platforms.
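If your game already reads controllers through the GameController framework on iOS, that same code keeps working. Here is a minimal sketch; the game.moveCharacter call is a hypothetical hook into your game's input system, not part of the sample.

import GameController

// Observe controller connections; the same code runs on iOS, iPadOS, and visionOS.
NotificationCenter.default.addObserver(
    forName: .GCControllerDidConnect, object: nil, queue: .main
) { notification in
    guard let controller = notification.object as? GCController else { return }
    // Read thumbstick input from the extended gamepad profile.
    controller.extendedGamepad?.leftThumbstick.valueChangedHandler = { _, x, y in
        game.moveCharacter(x: x, y: y) // hypothetical hook into the game's input system
    }
}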
You can find more information about game input in the video “Explore game input in visionOS”. Compatible apps work great on visionOS, but let's convert the app to a native app to start using the visionOS SDK. Go to the Build Settings of the app and select the iOS target. Currently, it is marked as Designed for iPad because it uses the iOS SDK on visionOS.
Let’s add Apple Vision as a supported destination, to compile the app with the visionOS SDK.
You might get a few compilation errors but if the app was made for iOS, most of the code should compile.
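A common way to handle an iOS-only API that doesn't compile against the visionOS SDK is to gate it with conditional compilation. Here is a minimal, hypothetical sketch: UIScreen is unavailable on visionOS, and falling back to the view's trait collection is an assumption, not guidance from the sample.

import UIKit

// Resolve the content scale for the Metal layer.
func preferredContentScale(for view: UIView) -> CGFloat {
    #if os(visionOS)
    // Assumption: derive the scale from the view's trait collection on visionOS.
    return view.traitCollection.displayScale
    #else
    return UIScreen.main.scale
    #endif
}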
You have several options to display the content of your Metal iOS game on visionOS. You can render to a CAMetalLayer, which can easily be integrated into UI views. Or you can render directly to a RealityKit texture with the new LowLevelTexture API.
You can start with a CAMetalLayer if it is easier, but I recommend moving to a LowLevelTexture to get the most control.
If you want to render to a CAMetalLayer, you can create a UIView that contains it.
And you can create a CAMetalDisplayLink to get a callback every frame.
Here is what the code looks like. The UIView declares a CAMetalLayer as its layerClass. It then creates a CAMetalDisplayLink to get a render callback. And finally, it renders the CAMetalLayer in the callback, every frame.
You can use LowLevelTextures in a similar way. You can create a LowLevelTexture with a given pixel format and resolution. You can then create a TextureResource from the LowLevelTexture and use it anywhere in a RealityKit scene. And you can use a Metal command queue to draw to the LowLevelTexture, through an MTLTexture.
Here is how you can do it in code. It creates a LowLevelTexture. Then creates a TextureResource from it, to be used anywhere in the RealityKit scene.
And then draws into the texture every frame.
For more details about LowLevelTexture, see the video “Build a spatial drawing app with RealityKit”. Now that we have converted our game to a native visionOS app, we can add visionOS-specific features. For example, we can increase the immersion of the app by adding a frame around the game view and a background in an ImmersiveSpace.
The game Cut The Rope 3 has a dynamic frame around its window.
The frame is rendered with RealityKit and the game is rendered with Metal.
You can achieve this with a ZStack: a Metal view renders the game to a texture, as shown earlier, and a RealityView loads a 3D model of the frame around the game.
The frame can be dynamic, by using a @State variable. For example, in Cut The Rope 3, the frame changes depending on the level. You can also add an immersive background behind your game. The game Void-X is a good example of that. Most of the gameplay takes place in the window, but Void-X adds rain and lightning in the background and bullets flying all over your room in 3D, to increase immersion.
You can create the background with an ImmersiveSpace in SwiftUI.
And you can put your iOS game in a WindowGroup.
You can also share state between the window and the ImmersiveSpace by storing a model object in a SwiftUI @State property and passing it to both scenes.
I've shown you how to add elements around your game. I'll now take you through some techniques you can use to enhance the Metal rendering of your game. First, I'll show you how to add stereoscopy, to add depth to your game. Then how to add head tracking, to make your game look like a physical window into another world. And finally, I'll show you how to add VRR to your game, to improve the performance.
You can add stereoscopy to your game, to add more depth to the scene, similar to how stereoscopic movies work.
Here's the scene from the Deferred Lighting sample rendered with stereoscopy. For illustrative purposes, I'll show the stereoscopy in anaglyph, with red and cyan tints; but on the Vision Pro, each eye would see a different image.
Stereoscopy essentially works by showing a different image to each eye. You can achieve this on visionOS by using a RealityKit ShaderGraph, and more specifically the CameraIndex node.
And you can provide a different image to each eye, to achieve stereoscopy.
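As a rough sketch of that approach, assuming you have authored a shader graph material in Reality Composer Pro whose CameraIndex node selects between two texture inputs: the material path, file name, parameter names, and the leftEyeTexture, rightEyeTexture, and gameViewportEntity variables below are assumptions, and the code runs in an async throwing context.

import RealityKit
import RealityKitContent // assumed Reality Composer Pro package providing realityKitContentBundle

// Load the shader graph material that picks a texture per eye via its CameraIndex node.
var stereoMaterial = try await ShaderGraphMaterial(
    named: "/Root/StereoscopicMaterial",   // assumed material path
    from: "Immersive.usda",                // assumed scene file
    in: realityKitContentBundle
)

// Bind one rendered texture per eye; the parameter names must match the graph's inputs.
try stereoMaterial.setParameter(name: "LeftImage", value: .textureResource(leftEyeTexture))
try stereoMaterial.setParameter(name: "RightImage", value: .textureResource(rightEyeTexture))

// Apply the material to the entity that displays the game viewport.
if var model = gameViewportEntity.components[ModelComponent.self] {
    model.materials = [stereoMaterial]
    gameViewportEntity.components.set(model)
}

This pairs naturally with the LowLevelTexture approach shown earlier: render the left and right views into two textures each frame, and the material presents the correct one to each eye.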
The depth effect from stereoscopy comes from the distance between the views of an object in each image, which is called parallax.
This parallax will make your eyes converge more or less, depending on the distance between you and the object. Your eyes will be parallel when looking at infinity and will converge when looking at something close, which is one of the cues that your brain will use to judge distances. This is how stereoscopy can bring depth to your scene, as if it was a physical miniature in front of you or a physical window into another world.
Objects with negative parallax appear in front of the image. Objects without parallax, which means that the two images overlap, appear on the image plane, just like in a 2D image. And objects with positive parallax appear behind the image plane.
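To make the relationship between parallax and depth concrete, here is a small sketch of the simplified geometry, assuming a viewer with a given eye separation looking at a window plane at a known distance; the function and its values are illustrative, not part of the sample.

/// Screen-space parallax, in meters, for a point at objectDistance meters from the viewer,
/// displayed on a window plane windowDistance meters away (simplified pinhole model).
/// Positive values appear behind the window, negative values in front, zero on the plane.
func parallax(objectDistance: Float,
              windowDistance: Float,
              eyeSeparation: Float = 0.063) -> Float {
    // Similar triangles: each eye's ray to the object crosses the window plane
    // offset by half of this amount, in opposite directions.
    return eyeSeparation * (objectDistance - windowDistance) / objectDistance
}

parallax(objectDistance: 1.0, windowDistance: 1.0)   // 0: on the window plane
parallax(objectDistance: 1000, windowDistance: 1.0)  // ≈ 0.063: approaches the eye separation at infinity
parallax(objectDistance: 0.5, windowDistance: 1.0)   // ≈ -0.063: negative, in front of the window

In this model the parallax of real geometry never exceeds the eye separation, which is why content whose parallax is larger than the viewer's eye separation reads as beyond infinity, a situation discussed later in this section.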
And this is just a representation of what stereoscopy feels like, when viewed from the front.
In practice, if you look from the side, you cannot see anything coming out of the window, since the content is simply displayed on a rectangle.
If you want objects to come out of the bounds of the rectangle, you can render them with RealityKit and APIs such as the new portal-crossing API. See the video “Discover RealityKit APIs for iOS, macOS, and visionOS” for an example of portal crossing.
And if you don’t use the player’s head position, the scene will look projected when viewed from the side. I’ll show later how to use the head position. Here is a diagram showing how stereoscopy works: the object is perceived at the intersection of the two rays, and the perceived depth varies depending on the parallax between the two images. As the parallax changes, the position of the intersection point also changes.
Also note that for a given stereoscopic image, the perceived depth varies depending on the size and position of the window, even if the images do not change.
The video "Build compelling spatial photo and video experiences” goes over more details about creating stereoscopic content for Vision Pro.
One of the main situations I recommend avoiding is content that renders beyond infinity. When looking at content, your eyes should either converge or be parallel. Content goes beyond infinity if the parallax becomes bigger than the distance between the viewer’s eyes: the rays diverge and there is no intersection point. That situation never happens when looking at real content, and it is very uncomfortable for the viewer.

One way to solve this is to display the stereoscopic image on an infinitely far plane. For example, here is an image rendered on the window plane: some content appears behind the image plane and some content appears in front of it. By rendering the content on an infinitely far image plane, for example through a portal like Spatial Photos do, content with a parallax of 0 appears at infinity, and everything else appears in front of it, with a negative parallax. This way, you can guarantee that all the content appears in front of infinity.
I also recommend adding a slider to the settings of your game, for the player to adjust the intensity of the stereoscopy to their comfort, which you can implement by changing the distance between the two virtual cameras. In order to generate the stereoscopic images, you will need to update your game loop to render to each eye. The game loop of the Deferred Lighting sample on iOS looks something like this.
The sample updates the game state and the animations. The sample then does offscreen renders, such as shadow maps. And then it renders to the screen. Finally, the sample presents the render.
For stereoscopy, you will need to duplicate the screen rendering for each eye. And for the best performance, you can use Vertex Amplification to render both eyes with the same draw calls.
There's an article about Vertex Amplification on the developer documentation.
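As a rough sketch, assuming both eye views are rendered into a two-slice texture array and that device, pipelineDescriptor, renderPassDescriptor, commandBuffer, and scene already exist in your renderer (all names here are illustrative, not from the sample):

import Metal

// 1. Check support and allow two amplified views when building the render pipeline.
precondition(device.supportsVertexAmplificationCount(2),
             "Fall back to one render pass per eye if amplification is unavailable.")
pipelineDescriptor.maxVertexAmplificationCount = 2

// 2. Render both eyes into a two-slice texture array in a single pass.
renderPassDescriptor.renderTargetArrayLength = 2
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!

// Map amplification indices 0 and 1 to render target slices 0 and 1.
let viewMappings = [
    MTLVertexAmplificationViewMapping(viewportArrayIndexOffset: 0, renderTargetArrayIndexOffset: 0),
    MTLVertexAmplificationViewMapping(viewportArrayIndexOffset: 0, renderTargetArrayIndexOffset: 1)
]
encoder.setVertexAmplificationCount(2, viewMappings: viewMappings)

// 3. In the vertex shader, use the [[amplification_id]] attribute to select the
//    per-eye view and projection matrices, so one set of draw calls renders both eyes.
scene.draw(with: encoder)
encoder.endEncoding()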
For example, here is how I adapted the code of the Deferred Lighting Sample. It starts by encoding the shadow passes, once. Then, it goes over each view and sets the appropriate camera matrices. And finally, it encodes the rendering commands to the color and depth textures of that view.
Stereoscopy adds depth to the scene, similar to a stereoscopic movie or a Spatial photo. To make your game look even more like a window into another world, you can also add head tracking. For example, here is the Deferred Lighting sample with head tracking. The camera moves as my head moves.
You can get the position of the player’s head by opening an ImmersiveSpace and using ARKit. You can then get the head position from ARKit every frame and pass it to your renderer to control the camera.
Here is what the code looks like. It first imports ARKit. It then creates an ARKitSession and a WorldTrackingProvider. And, every frame, it queries the head transform.
Also note that windows and ImmersiveSpaces have their own coordinate spaces on visionOS. The head transform from ARKit is in the coordinate space of the ImmersiveSpace. To use it in a window, you can convert the position to the window’s coordinate space.
Here is how you can do it in code. You can get the head position in the coordinate space of the ImmersiveSpace, from ARKit. You can then get the transformMatrix of an Entity in your window, relative to the ImmersiveSpace, using this new API from visionOS 2.0.
You can invert this matrix and convert the head position to the window space. Finally, you can set this camera position to your renderer.
To get the best results, make sure to predict the head position. Because it takes time for the frame to be rendered and then displayed, use an estimate of that latency to predict where the head will be, so that the render matches the final head position as closely as possible. ARKit will do the head prediction for you if you give it an estimated render time for your app. In the sample, I used 33 milliseconds for the estimated presentationTime, which corresponds to 3 frames at 90 fps.

To make your game look like it is rendered through a physical window, you will also need to build an asymmetric projection matrix. A fixed projection matrix will not match the shape of the window; you have to make the camera frustum pass through the window. For example, you can use the vectors from the camera to the left and right sides of the window to build the projection matrix. One advantage of building the projection matrix this way is that you can use a near clipping plane aligned with the window, which prevents objects from intersecting the sides of the window.
Here is what the code looks like. You start with the position of the camera and the 3D bounds of the viewport. The camera faces -Z from the given position. You then compute the distances to each side of the viewport, and you use those distances to build an asymmetric projection matrix. This is how you can use head tracking to make your game look like a physical window into another world.

Stereoscopy increases the immersion of your game, but it also increases the rendering cost, since your game needs to render twice as many fragments. You can offset some of this cost by using Variable Rasterization Rates (VRR) to improve the rendering efficiency of your game. Variable Rasterization Rates is a Metal feature for rendering at a variable resolution across the screen.
You can use it to lower the resolution at the periphery and keep full resolution at the center. If you are using head tracking, you can build a VRR map from the head transform, since you know whether a pixel is at the center of the field of view or at the periphery. If you are in the Shared Space, you don’t have access to the head position, but you can still build a VRR map by using the AdaptiveResolutionComponent and placing the components in a 2D grid over your game viewport.
The AdaptiveResolutionComponent gives you an approximation of the size in pixels that a 1 meter cube would take on the screen at this 3D location. For example, here, the values go from 1024 to 2048 pixels.
This video shows how the values on the AdaptiveResolutionComponent change as the camera gets further and closer.
You can extract a horizontal and a vertical VRR map from the 2D grid. For smoother results, you can interpolate each map before passing it to your Metal renderer. Once your content is rendered with VRR, you will have to remap it to the display by inverting the VRR map. This is how you can use Variable Rasterization Rates to improve the performance of your game by adapting the rendering resolution to the camera transform. With all those enhancements, you can make your game even better on visionOS, just like Wylde Flowers was transformed for visionOS.
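To make the Metal side of this concrete, here is a minimal sketch of building a rasterization rate map from horizontal and vertical quality arrays. The rate values, renderWidth, renderHeight, device, and renderPassDescriptor are illustrative assumptions; in practice the arrays would come from the head transform or from the AdaptiveResolutionComponent grid described above.

import Metal

// Quality per screen zone: full resolution at the center, reduced at the periphery.
let horizontalRates: [Float] = [0.4, 1.0, 1.0, 1.0, 0.4]
let verticalRates:   [Float] = [0.4, 1.0, 0.4]

// Describe one layer of the rate map and copy the per-zone rates into it.
let layerDescriptor = MTLRasterizationRateLayerDescriptor(
    sampleCount: MTLSizeMake(horizontalRates.count, verticalRates.count, 0))
for (i, rate) in horizontalRates.enumerated() { layerDescriptor.horizontalSampleStorage[i] = rate }
for (i, rate) in verticalRates.enumerated() { layerDescriptor.verticalSampleStorage[i] = rate }

// Build the rate map for the full render target size.
let rateMapDescriptor = MTLRasterizationRateMapDescriptor(
    screenSize: MTLSizeMake(renderWidth, renderHeight, 0),
    layer: layerDescriptor)
let rateMap = device.makeRasterizationRateMap(descriptor: rateMapDescriptor)!

// Attach the rate map to the render pass so fragments are shaded at variable resolution.
// Use rateMap.physicalSize(layer: 0) to size the intermediate texture, then remap it
// to the display in a final pass, as described above.
renderPassDescriptor.rasterizationRateMap = rateMap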
In this video, we have seen how you can bring your iOS game to visionOS; how you can add a frame and an ImmersiveSpace to increase the immersion of your game; how you can add stereoscopy and head tracking to your Metal renderer, to make your game look like a window into another world; and how you can use VRR to optimize performance. I hope these techniques help you make your iOS games even better on visionOS. I'm looking forward to playing your games on my Vision Pro.
-
5:44 - Render with Metal in a UIView
// Render with Metal in a UIView.
import UIKit
import Metal
import QuartzCore

class CAMetalLayerBackedView: UIView, CAMetalDisplayLinkDelegate {
    var displayLink: CAMetalDisplayLink!
    var renderFunction: ((CAMetalDrawable) -> Void)?

    override class var layerClass: AnyClass { return CAMetalLayer.self }

    func setup(device: MTLDevice) {
        let metalLayer = self.layer as! CAMetalLayer
        metalLayer.device = device

        displayLink = CAMetalDisplayLink(metalLayer: metalLayer)
        displayLink.add(to: .current, forMode: .default)
        displayLink.delegate = self
    }

    func metalDisplayLink(_ link: CAMetalDisplayLink,
                          needsUpdate update: CAMetalDisplayLink.Update) {
        let drawable = update.drawable
        renderFunction?(drawable)
    }
}
-
6:20 - Render with Metal to a RealityKit LowLevelTexture
// Render Metal to a RealityKit LowLevelTexture.
let lowLevelTexture = try! LowLevelTexture(descriptor: .init(
    pixelFormat: .rgba8Unorm,
    width: resolutionX,
    height: resolutionY,
    depth: 1,
    mipmapLevelCount: 1,
    textureUsage: [.renderTarget]
))

let textureResource = try! TextureResource(
    from: lowLevelTexture
)
// Assign textureResource to a material.

let commandBuffer: MTLCommandBuffer = queue.makeCommandBuffer()!
let mtlTexture: MTLTexture = lowLevelTexture.replace(using: commandBuffer)
// Draw into the mtlTexture.
-
7:06 - Metal viewport with a 3D RealityKit frame around it
// Metal viewport with a 3D RealityKit frame
// around it.
struct ContentView: View {
    @State var game = Game()

    var body: some View {
        ZStack {
            CAMetalLayerView { drawable in
                game.render(drawable)
            }
            RealityView { content in
                content.add(try! await Entity(named: "Frame"))
            }.frame(depth: 0)
        }
    }
}
-
7:45 - Windowed game with an immersive background
// Windowed game with an immersive background
@main
struct TestApp: App {
    @State private var appModel = AppModel()

    var body: some Scene {
        WindowGroup {
            // Metal render
            ContentView(appModel)
        }

        ImmersiveSpace(id: "ImmersiveSpace") {
            // RealityKit background
            ImmersiveView(appModel)
        }
        .immersionStyle(selection: .constant(.progressive), in: .progressive)
    }
}
-
13:11 - Render to multiple views for stereoscopy
// Render to multiple views for stereoscopy.
override func draw(provider: DrawableProviding) {
    encodeShadowMapPass()

    for viewIndex in 0..<provider.viewCount {
        scene.update(viewMatrix: provider.viewMatrix(viewIndex: viewIndex),
                     projectionMatrix: provider.projectionMatrix(viewIndex: viewIndex))

        let commandBuffer = beginDrawableCommands()

        if let color = provider.colorTexture(viewIndex: viewIndex, for: commandBuffer),
           let depthStencil = provider.depthStencilTexture(viewIndex: viewIndex, for: commandBuffer) {
            encodePass(into: commandBuffer, color: color, depth: depthStencil)
        }

        endFrame(commandBuffer)
    }
}
-
13:55 - Query the head position from ARKit every frame
// Query the head position from ARKit every frame.
import ARKit

let arSession = ARKitSession()
let worldTracking = WorldTrackingProvider()
try await arSession.run([worldTracking])

// Every frame
guard let deviceAnchor = worldTracking.queryDeviceAnchor(
    atTimestamp: CACurrentMediaTime() + presentationTime
) else { return }

let transform: simd_float4x4 = deviceAnchor
    .originFromAnchorTransform
-
14:22 - Convert the head position from the ImmersiveSpace to a window
// Convert the head position from the ImmersiveSpace to a window.
// Note: .position and .transform(_:) are presumably small matrix helper
// extensions, not built-in simd API.
let headPositionInImmersiveSpace: SIMD3<Float> = deviceAnchor
    .originFromAnchorTransform
    .position

let windowInImmersiveSpace: float4x4 = windowEntity
    .transformMatrix(relativeTo: .immersiveSpace)

let headPositionInWindow: SIMD3<Float> = windowInImmersiveSpace
    .inverse
    .transform(headPositionInImmersiveSpace)

renderer.setCameraPosition(headPositionInWindow)
-
15:05 - Query the head position from ARKit every frame
// Query the head position from ARKit every frame.
import ARKit

let arSession = ARKitSession()
let worldTracking = WorldTrackingProvider()
try await arSession.run([worldTracking])

// Every frame
guard let deviceAnchor = worldTracking.queryDeviceAnchor(
    atTimestamp: CACurrentMediaTime() + presentationTime
) else { return }

let transform: simd_float4x4 = deviceAnchor
    .originFromAnchorTransform
-
15:47 - Build the camera and projection matrices
// Build the camera and projection matrices.
let cameraPosition: SIMD3<Float>
let viewportBounds: BoundingBox

// Camera facing -Z
let cameraTransform = simd_float4x4(AffineTransform3D(translation: Size3D(cameraPosition)))

let zNear: Float = viewportBounds.max.z - cameraPosition.z
let l /* left   */: Float = viewportBounds.min.x - cameraPosition.x
let r /* right  */: Float = viewportBounds.max.x - cameraPosition.x
let b /* bottom */: Float = viewportBounds.min.y - cameraPosition.y
let t /* top    */: Float = viewportBounds.max.y - cameraPosition.y

let cameraProjection = simd_float4x4(rows: [
    [2 * zNear / (r - l),                   0, (r + l) / (r - l),      0],
    [                  0, 2 * zNear / (t - b), (t + b) / (t - b),      0],
    [                  0,                   0,                 1, -zNear],
    [                  0,                   0,                 1,      0]
])