Norbert Kovacs has overseen technical strategy and product development as CTO of Augmented Reality company INDE since 2012, working on award-winning projects. We asked him to explain the basics of AR in a way that "non-techies" can understand it – he did, and the answers are not only interesting but also really useful.

What is Augmented Reality?

Augmented Reality is a technology that extends the user’s view of the real world with digital, virtual content. AR enhances, extends or teaches about the real world with in-context digital content.

Snapshot from a large-screen AR experience created by INDE

What's the difference between Augmented Reality and Virtual Reality?

In our experience, there is a lot (and I mean a LOT) of confusion when it comes to AR and VR. Even mainstream media tends to confuse the two. There are countless examples of article titles claiming some fact about AR while going on and on about VR. It’s interesting, because there is a very well-defined difference between the two: VR places the viewer into digital surroundings, while AR places digital objects around the viewer. In VR the viewer experiences a totally standalone, most of the time unique (virtual) reality, while in AR the reality is the real world, extended with digital content.

How do you ensure the digital additions spatially register with the real world, in real time?

Oh, this is a topic we could go on about for hours. In general, to provide a perfect experience it’s important for the digital content to appear accurately placed in the real world. There are different techniques and technologies that help with that. Without going too much into detail or giving a comprehensive technical overview, these are the three most-used techniques:

        1.) Visual keypoint matching – This method uses a visual trigger, or marker. This is the “standard” AR that comes to mind for most people when they hear about Augmented Reality. Unique features (corners, edges) of the visual target are extracted and stored in a target database. A system using this method constantly extracts features from the live camera view and compares them with those stored in the database. Once a match is found (and after a series of other calculations), an algorithm calculates a virtual flat surface based on the position and angle of the visual target, and all of the digital content is placed in space relative to that flat surface. Whether the visual target remains in the camera view is academic at that point; the flat surface remains the anchor of the digital content.
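The last step described above – deriving the flat anchor surface from matched keypoints – boils down to estimating a homography between the stored target and the live camera view. Below is a minimal numpy sketch of that estimation using the standard direct linear transform (DLT); it is an illustration of the idea, not the pipeline INDE or any specific framework uses, and the function names are my own.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (DLT method).

    src, dst: (N, 2) arrays of matched keypoint coordinates, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A)
    # H is the null vector of A: the right singular vector with the
    # smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to (N, 2) points (homogeneous divide)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In a marker-based system, `src` would be the marker corners in the reference image and `dst` the corresponding corners found in the camera frame; the recovered `H` is what anchors the flat surface the content sits on.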

        2.) Spatial mapping – this is where AR (or MR) gets interesting. Usually supported by a depth sensor or similar hardware, the framework constantly builds a virtual representation of the real world (sort of like a 3D scan). This virtual copy of the real world is then used mainly for two things:

        • Detect flat surfaces in the real world (horizontal and vertical)
        • Use the virtual “mesh” of the real world for occlusion

Using spatial mapping we can define the virtual anchors in the real space and position virtual content in relation to those anchors. Whenever the viewer returns to that space, the virtual content will be in the same place.
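The first bullet above – finding flat surfaces in a captured point cloud – is commonly done with a RANSAC-style plane fit. Here is a simplified numpy sketch of that idea, assuming we already have a point cloud; real frameworks work on streaming depth data and are far more elaborate.

```python
import numpy as np

def fit_plane_ransac(points, iters=200, threshold=0.02, rng=None):
    """Find the dominant plane in a 3D point cloud (RANSAC sketch).

    points: (N, 3) array of 3D points in metres.
    Returns (normal, d, inlier_mask) with normal . p + d = 0.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iters):
        # Pick 3 random points and compute the plane through them.
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        normal = normal / norm
        d = -normal @ sample[0]
        # Count points within `threshold` of this candidate plane.
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model[0], best_model[1], best_inliers
```

An anchor can then be placed on the detected plane, and content positioned relative to it; the same fitted geometry is what later makes occlusion tests possible.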

The interesting part here is not really the spatial positioning of content, though. Occlusion of the virtual content by real-world objects is just as important, if not more so. To give the perfect illusion of virtual content in the real world, occlusion is necessary, and it will be (or already is) a focus of development. Right now occlusion is achievable using special depth sensors (think HoloLens or Magic Leap) to build a high-enough-quality copy of the real world.
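Once a depth map of the reconstructed real-world mesh is available, the occlusion test itself is conceptually simple: a virtual pixel is drawn only if it is closer to the camera than the real surface at that pixel. A minimal numpy sketch of that per-pixel compositing follows; the function name and array layout are my own, not any framework's API.

```python
import numpy as np

def composite_with_occlusion(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Per-pixel occlusion: show a virtual pixel only where the virtual
    surface is closer to the camera than the reconstructed real world.

    real_depth / virtual_depth: (H, W) depth in metres; virtual_depth is
    np.inf where the virtual object does not cover the pixel.
    camera_rgb / virtual_rgb: (H, W, 3) colour images.
    """
    visible = virtual_depth < real_depth        # (H, W) boolean mask
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]         # draw only unoccluded pixels
    return out
```

This is why the quality of the real-world copy matters so much: any error in `real_depth` shows up directly as virtual content bleeding through (or hiding behind) real objects.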

        3.) Device sensors (accelerometer, gyroscope, magnetometer, GPS) – this technique involves no visual search or mapping of the real world. Instead, the viewer is the anchor of the virtual space: content is placed in relation to the viewer, positioned usually with the device’s compass and held in place by combining multiple sensors. The easiest way to imagine this is to think of a VR scene you’re watching in Cardboard, for example. The scene has a table, and a room as a background. Now replace the room with the camera view of the real world, but keep the table there. That’s basically it. A big drawback of this approach is that the sensors are usually either not accurate enough to keep the virtual content locked to a certain point, or it’s impossible to detect movement of the viewer with this technique alone. As ARKit and ARCore showed, though, when you support this technique with visual tracking the results can be very convincing. We were able to successfully mix this approach with the visual trigger technique, resulting in convincing AR experiences.
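The "held in place by combining multiple sensors" part is often done with a complementary filter: integrate the gyroscope for smooth short-term rotation, and let the accelerometer's gravity reading slowly correct the drift. Below is a simplified one-axis sketch of a single filter step, as an illustration only; real implementations fuse all three axes plus the magnetometer.

```python
def complementary_filter(pitch, gyro_rate, accel_pitch, dt, alpha=0.98):
    """One update step of a one-axis complementary filter.

    pitch: current pitch estimate (rad).
    gyro_rate: angular velocity from the gyroscope (rad/s) -- smooth
        but drifts over time.
    accel_pitch: pitch implied by the gravity vector in the
        accelerometer reading (rad) -- noisy but drift-free.
    dt: time since the last update (s).
    """
    # Trust the integrated gyro short-term, the accelerometer long-term.
    gyro_estimate = pitch + gyro_rate * dt
    return alpha * gyro_estimate + (1 - alpha) * accel_pitch
```

With an orientation estimate like this plus the compass heading, content can be pinned at a fixed bearing and elevation relative to the viewer, which is exactly the "table stays put while you look around" behaviour described above.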

It’s important to mention that ARKit and ARCore have successfully integrated visual tracking with sensor data, providing a result that rivals 3D spatial mapping in accuracy. So it is a very exciting time to watch those two frameworks and others to see how they move forward with occlusion, for example.

How does mobile AR work? Can you explain the core components that make it work?

Mobile AR works by utilising an AR framework that handles the placement of virtual content in the real world, and a rendering engine that provides the virtual content. The two are inseparable, and AR as a technology means absolutely nothing without meaningful content. Mobile AR usually uses visual triggers to start a piece of content tied to that visual: think of a movie poster triggering the trailer of that particular movie. With the introduction of ARKit for iOS devices and ARCore for supported Android devices, visual tracking mixed with sensor data has also become an important new way of locking AR content to the real world.
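The framework/renderer split described above can be pictured as a per-frame loop: the tracking framework estimates the camera pose, anchors pin content to fixed points in the world, and the rendering engine draws from that pose. All names below are hypothetical; this is a structural sketch, not any real SDK's API.

```python
class Anchor:
    """A fixed point in world space that content is attached to."""
    def __init__(self, world_position):
        self.world_position = world_position  # (x, y, z) in world space

def ar_frame(tracker, renderer, camera_frame, anchors):
    """One iteration of a hypothetical AR render loop."""
    pose = tracker.estimate_pose(camera_frame)   # where is the camera now?
    renderer.draw_background(camera_frame)       # live camera as backdrop
    for anchor in anchors:
        # Content stays fixed in world space; only the camera pose changes
        # from frame to frame, which is what makes it look "locked" in place.
        renderer.draw_content(anchor.world_position, camera_pose=pose)
```

The key design point is that content is expressed relative to anchors, not to the screen: the framework's only job each frame is to refresh the camera pose, and the renderer re-projects the same world-space content through it.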

Is the technology behind large screens and mobile AR different?

Yes. Our BroadcastAR product was launched well before it was feasible to create high-quality AR experiences on mobile, due to not-so-reliable tracking and the low hardware specs of mobile devices at the time. It does not include visual tracking, but it simply doesn’t need to. Without any special device or distracting technology between the viewer and the virtual characters, people in front of the large screen can experience AR as freely as possible.

How long has INDE been creating AR?

We’ve been working with AR since 2010, back when it meant placing a cube on a black square marker, and even that resulted in a “woah, that’s magic” reaction. AR has been at the center of our work ever since, and we have experienced the worst and the best of this world over the years. It’s hard to express how excited and passionate we are about AR.

What type of professionals are needed to create an AR app?

That is entirely dependent on what kind of virtual content will be shown to the viewer and what the functionality of the app will be. AR technology itself means nothing without meaningful and high quality content just as Photoshop itself isn’t valuable unless you use it to create a digital painting or touch up a photo.

This AR content can be 3D, but it can also be an intuitive user interface, useful spatially placed in-context information, and so on.

So it’s really hard to answer this question. You definitely need a good UX designer, since it’s safe to say that AR requires a new approach to how users interact with and consume content. You need a very good content team that creates high-quality content, and a very good engineering team who can program the content to do what it needs to do and appear where it needs to appear.



An interview with Norbert Kovacs, CTO of INDE, one of the world’s top Augmented Reality companies.