WebXR Augmented Reality Module - Level 1

Editor’s Draft,

This version:
https://immersive-web.github.io/webxr-ar-module/
Latest published version:
https://www.w3.org/TR/webxr-ar-module-1/
Implementation Report:
https://wpt.fyi/results/webxr/ar-module?label=experimental&label=master&aligned
Feedback:
GitHub
Editors:
(Google)
(Google [Mozilla until 2020])
(Meta)
Former Editor:
(Amazon [Microsoft until 2018])
Participate:
File an issue (open issues)
Mailing list archive
W3C’s #immersive-web IRC

Abstract

The WebXR Augmented Reality module expands the WebXR Device API with the functionality available on AR hardware.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the Immersive Web Working Group as an Editors' Draft. This document is intended to become a W3C Recommendation. Feedback and comments on this specification are welcome. Please use GitHub issues. Discussions may also be found in the working group's mailing list archives.

Publication as an Editors' Draft does not imply endorsement by W3C and its Members. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

The WebXR Augmented Reality Module is designed as a module to be implemented in addition to the WebXR Device API. It was originally included in the WebXR Device API, which was later divided into a core specification and modules.

1. Introduction

Hardware that enables Virtual Reality (VR) and Augmented Reality (AR) applications is now broadly available to consumers, offering an immersive computing platform with both new opportunities and challenges. The ability to interact directly with immersive hardware is critical to ensuring that the web is well equipped to operate as a first-class citizen in this environment. The WebXR Augmented Reality module expands the functionality available to developers when their code is running on AR hardware.

1.1. Terminology

Augmented Reality describes a class of XR experiences in which virtual content is aligned and composed with the real-world environment before being displayed to users.

XR hardware can be divided into categories based on display technology: additive light, pass-through, and opaque.

Devices described as having an additive light display technology, also known as see-through, use transparent optical displays to present virtual content. On these devices, the user may always be able to see through to the real-world environment regardless of developer requests during session creation.

Note: Such devices typically will not do any compositing in software, relying on the natural compositing afforded by transparent displays.

Examples of such devices include the HoloLens 2 and Magic Leap devices.

Devices described as having a pass-through display technology use an opaque display to combine virtual content with a camera stream of the real-world environment. On these devices, the real-world environment will only be visible when the developer has made an explicit request for it during session creation.

Note: Such devices will typically use cameras to collect images of the real world, and composite the AR scene with these images in software before displaying them to the user.

Examples of such devices include handheld mobile AR with a phone, and the Varjo XR-3 device.

Devices described as having an opaque display technology fully obscure the real-world environment and do not provide a way to view the real-world environment.

Note: Such devices are typically VR devices that have chosen to allow "immersive-ar" sessions in an attempt to provide a compatibility path for AR content on VR devices.

2. WebXR Device API Integration

2.1. XRSessionMode

The WebXR Device API categorizes XRSessions based on their XRSessionMode. This module enables use of the "immersive-ar" XRSessionMode enum.

A session mode of "immersive-ar" indicates that the session’s output will be given exclusive access to the immersive XR device display and that content is intended to be blended with the real-world environment.

On compatible hardware, user agents MAY support "immersive-vr" sessions, "immersive-ar" sessions, or both. Supporting the additional "immersive-ar" session mode does not change the requirement that user agents MUST support "inline" sessions.

NOTE: This means that "immersive-ar" sessions support all the features and reference spaces that "immersive-vr" sessions do, since both are immersive sessions.

The following code checks to see if "immersive-ar" sessions are supported.
navigator.xr.isSessionSupported('immersive-ar').then((supported) => {
  if (!supported) { return; }
  // 'immersive-ar' sessions are supported.
  // Page should advertise AR support to the user.
});
The following code attempts to retrieve an "immersive-ar" XRSession.
let xrSession;

navigator.xr.requestSession("immersive-ar").then((session) => {
  xrSession = session;
});
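
The two snippets above can be combined into a single mode-selection step. The following is a minimal, non-normative sketch. The helper name `pickSessionMode` and its `xr` parameter are hypothetical and not defined by this module; `xr` stands in for `navigator.xr` so the function can be exercised with a stub. It assumes the page prefers AR, then VR, and otherwise falls back to "inline".

```javascript
// Hypothetical helper (not part of this module): returns the first session
// mode the user agent reports as supported, in order of preference.
// `xr` is expected to behave like navigator.xr; it may be undefined.
async function pickSessionMode(xr, preferred = ["immersive-ar", "immersive-vr"]) {
  if (!xr) { return null; } // WebXR is not available at all.
  for (const mode of preferred) {
    try {
      if (await xr.isSessionSupported(mode)) { return mode; }
    } catch (e) {
      // The query itself may be rejected; this sketch treats that as "unsupported".
    }
  }
  // User agents that support WebXR are required to support "inline" sessions.
  return "inline";
}
```

In a page this might be called as `pickSessionMode(navigator.xr)`, with the result used to decide which entry button to show. The later call to `requestSession()` for an immersive mode would normally be made from a user gesture handler.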

2.2. XREnvironmentBlendMode

When drawing XR content, it is often useful to understand how the rendered pixels will be blended by the XR Compositor with the real-world environment.
enum XREnvironmentBlendMode {
  "opaque",
  "alpha-blend",
  "additive"
};

partial interface XRSession {
  // Attributes
  readonly attribute XREnvironmentBlendMode environmentBlendMode;
};

The environmentBlendMode attribute MUST report the XREnvironmentBlendMode value that matches the blend technique currently being performed by the XR Compositor.
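
As a non-normative illustration, a page might adapt its background rendering to the reported value. The sketch below is only one possible approach. The helper name `backgroundHintsFor` and the shape of its return value are hypothetical. It assumes that with "opaque" the real world is not visible, that with "alpha-blend" transparent pixels reveal the real world, and that with "additive" black pixels contribute no light.

```javascript
// Hypothetical helper: maps an XREnvironmentBlendMode value to rendering hints.
// clearColor is [r, g, b, a]; drawBackground says whether the page should
// render its own backdrop (for example, a skybox).
function backgroundHintsFor(blendMode) {
  switch (blendMode) {
    case "alpha-blend":
      // Fully transparent clear color lets the real-world environment show through.
      return { drawBackground: false, clearColor: [0, 0, 0, 0] };
    case "additive":
      // Light is added to the real world, so a black clear color adds nothing.
      return { drawBackground: false, clearColor: [0, 0, 0, 1] };
    case "opaque":
    default:
      // The real world is not visible (or the value is unknown), so the page
      // supplies the entire scene.
      return { drawBackground: true, clearColor: [0, 0, 0, 1] };
  }
}
```

A render loop could then pass the result to the graphics context, for example `gl.clearColor(...backgroundHintsFor(xrSession.environmentBlendMode).clearColor)`.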

2.3. XRInteractionMode

Sometimes the application will wish to draw UI that the user may interact with. WebXR allows for a variety of form factors, including both handheld phone AR and head-worn AR. For different form factors, the UI belongs in different spaces to facilitate smooth interaction. For example, the UI for handheld phone AR will likely be drawn directly on the screen without projection, while the UI for head-worn AR will likely be drawn a small distance from the head so that users may use their controllers to interact with it.

enum XRInteractionMode {
    "screen-space",
    "world-space",
};

partial interface XRSession {
  // Attributes
  readonly attribute XRInteractionMode interactionMode;
};

The interactionMode attribute describes the best space (according to the user agent) for the application to draw interactive UI for the current session.

Note: The WebXR DOM Overlays module, if supported, can be used in some of these cases instead.
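
A page could branch on this attribute when laying out its interface. The following non-normative sketch shows one way to do so. The helper name `uiPlacementFor`, the returned object, and the 1 meter distance are all hypothetical choices made for illustration and are not specified by this module.

```javascript
// Illustrative distance only; this module does not define a value.
const UI_DISTANCE_METERS = 1.0;

// Hypothetical helper: decides where interactive UI should be drawn for a
// given XRInteractionMode. Returns null for values it does not recognize
// (for example, when the attribute is not implemented).
function uiPlacementFor(interactionMode) {
  switch (interactionMode) {
    case "screen-space":
      // Draw directly on the screen, without projection.
      return { projected: false, distanceMeters: 0 };
    case "world-space":
      // Draw in the world, a small distance from the viewer.
      return { projected: true, distanceMeters: UI_DISTANCE_METERS };
    default:
      return null;
  }
}
```

For example, `uiPlacementFor(xrSession.interactionMode)` could be evaluated once after the session starts, with a `null` result handled by whatever default the application prefers.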

2.4. XR Compositor Behaviors

When presenting content to the XR device, the XR Compositor MUST apply the appropriate blend technique to combine virtual pixels with the real-world environment. The appropriate technique is determined based on the XR device's display technology and the mode.

NOTE: When using a device that performs alpha-blend environment blending, use of a baseLayer with no alpha channel will result in the real-world environment being completely obscured. It should be assumed that this is intentional on the part of the developer, and the user agent may wish to suspend compositing of the real-world environment as an optimization in such cases.
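
The effect of the layer's alpha channel can be made explicit when the baseLayer is created. The following is a non-normative sketch under stated assumptions. The helper name `attachBaseLayer` and its `passthrough` option are hypothetical, and `LayerCtor` is assumed to be `XRWebGLLayer` in a browser; it is passed as a parameter only so the sketch can be exercised with a stub.

```javascript
// Hypothetical helper: creates a base layer and attaches it to the session.
// With passthrough = true the layer keeps its alpha channel, so transparent
// pixels can reveal the real world on alpha-blend devices. With
// passthrough = false the layer has no alpha channel and, on such devices,
// the real-world environment is completely obscured.
function attachBaseLayer(session, gl, LayerCtor, { passthrough = true } = {}) {
  const layer = new LayerCtor(session, gl, { alpha: passthrough });
  session.updateRenderState({ baseLayer: layer });
  return layer;
}
```

In a page this might be called as `attachBaseLayer(xrSession, gl, XRWebGLLayer)`.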

The XR Compositor MAY make additional color or pixel adjustments to optimize the experience, but MUST NOT perform occlusion based on pixel depth relative to real-world geometry; rendered content MUST only be composed on top of the real-world background. The timing of composition MUST NOT depend on the blend technique or source of the real-world environment.

NOTE: Future modules may enable automatic or manual pixel occlusion with the real-world environment.

The XR Compositor MUST NOT automatically grant the page access to any additional information such as camera intrinsics, media streams, real-world geometry, etc.

NOTE: Developers may request access to an XR Device's camera, should one be exposed through the existing Media Capture and Streams specification. However, doing so does not provide a mechanism to query the XRRigidTransform between the camera’s location and the native origin of the viewer reference space. It also does not provide a guaranteed way to determine the camera intrinsics necessary to match the view of the real-world environment. As such, the effectiveness of computer vision algorithms will be significantly hampered. Future modules or specifications may enable such functionality.

2.5. First Person Observer Views

Many AR devices have a camera; however, the camera is typically not aligned with the eyes. When doing video capture of the session for streaming or saving to a file, it is suboptimal to simply composite this camera feed with one of the rendered eye feeds, as there will be an internal offset. Devices may use reprojection or other tricks to fix up the stream, but some may expose a secondary view, the first-person observer view, which has an eye of "none".

Site content MUST explicitly opt-in to receiving a first-person observer view by enabling the "secondary-views" feature descriptor.

Enabling the "secondary-views" feature for a session that supports first-person observer views SHOULD NOT enable the first-person observer view unconditionally on every frame of the session; rather, it will only expose this view in the views array for frames when capture is going on.

While the XRSession has a blend technique exposed by the environmentBlendMode, first-person observer views always use alpha-blend environment blending.

Site content may wish to know which view is the first-person observer view so that it can account for the different blend technique, or choose to render UI elements differently. XRView objects that correspond to the first-person observer view have their isFirstPersonObserver attribute returning true.

partial interface XRView {
  readonly attribute boolean isFirstPersonObserver;
};
For most programs, supporting secondary views is simply a matter of:
let session = await navigator.xr.requestSession("immersive-ar", {optionalFeatures: ["secondary-views"]});
let space = await session.requestReferenceSpace("local");
// perform other set up
let gl; // obtain a graphics context here

// The frame callback receives a timestamp followed by the XRFrame.
session.requestAnimationFrame(function(time, frame) {
  let pose = frame.getViewerPose(space);
  if (!pose) { return; } // no pose is available for this frame

  // IMPORTANT: use `view of pose.views` here instead of
  // directly indexing the first two or three elements
  for (const view of pose.views) {
    render(session, gl, view);
  }
});

function render(session, gl, view) {
  // render content to the view
  // potentially use view.isFirstPersonObserver if necessary to
  // distinguish between compositing info
}
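
Where content needs to treat the first-person observer view differently, the views of a frame can be partitioned before rendering. The following non-normative sketch shows one way to do this. The helper name `splitViews` is hypothetical, and `views` is assumed to be the `views` array of an XRViewerPose. A missing isFirstPersonObserver attribute is treated as false.

```javascript
// Hypothetical helper: separates first-person observer views from the
// other views of a frame. Because the observer view is only exposed while
// capture is going on, `observers` will often be empty.
function splitViews(views) {
  const primary = [];
  const observers = [];
  for (const view of views) {
    if (view.isFirstPersonObserver) {
      observers.push(view);
    } else {
      primary.push(view);
    }
  }
  return { primary, observers };
}
```

A renderer could then draw `primary` views according to the session's environmentBlendMode and draw `observers` views as alpha-blended content.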

3. Privacy & Security Considerations

Implementations of the AR Module MUST NOT expose camera images to the content; rather, they MUST handle any compositing with the real world in their own implementations via the XR Compositor. Further extensions to this module MAY expose real-world information (like raw camera frames or lighting estimation); however, they MUST gate this behavior on an additional feature descriptor that requires user consent.

Compared to the WebXR Device API it extends, the AR module only provides some additional details about the nature of the device it is running on via the environmentBlendMode and interactionMode attributes. It allows websites to start an XR session as "immersive-ar" which blends the real world behind the XR scene.

Although this module does not allow websites to access camera images, this may not be obvious to end users, and user agents SHOULD make it clear to them.

Changes

Changes from the First Public Working Draft 10 October 2019

4. Acknowledgements

The following individuals have contributed to the design of the WebXR Device API specification:

And a special thanks to Vladimir Vukicevic (Unity) for kick-starting this whole adventure!

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[COMPOSITING-1]
Rik Cabanier; Nikos Andronikos. Compositing and Blending Level 1. URL: https://drafts.fxtf.org/compositing-1/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/
[WEBXR]
Brandon Jones; Manish Goregaokar; Rik Cabanier. WebXR Device API. URL: https://immersive-web.github.io/webxr/

Informative References

[MEDIACAPTURE-STREAMS]
Cullen Jennings; et al. Media Capture and Streams. URL: https://w3c.github.io/mediacapture-main/

IDL Index

enum XREnvironmentBlendMode {
  "opaque",
  "alpha-blend",
  "additive"
};

partial interface XRSession {
  // Attributes
  readonly attribute XREnvironmentBlendMode environmentBlendMode;
};

enum XRInteractionMode {
    "screen-space",
    "world-space",
};

partial interface XRSession {
  // Attributes
  readonly attribute XRInteractionMode interactionMode;
};

partial interface XRView {
  readonly attribute boolean isFirstPersonObserver;
};