You may remember my brief involvement with AR standardisation (the annotated whiteboards are here). As a result I’m on the mailing list for various things, and the Open Geospatial RFC for the ARML 2.0 spec just dropped. Being on a plane, I got around to reading it. I have comments.
Back in Barca, in order to “simplicate and add lightness”, I wanted to make two points. First, that field naming and typing are more important than angle-bracket jihad (there was a row about HTML or JSON style). Second, that relative coordinates should be a possibility, as I thought they would be really important for industrial applications and I’d just been bitching at Dirk from Layar about this. I’m very pleased to see that relative location made it into the spec.
I also think it’s important to have simultaneous, intermixed display – this fits with the (usually) urban environment these things are used in, and map overlays are a powerful pattern. This brings problems, though.
The ARML spec
I was a strong advocate for markup, but I’m really not sure about this. It’s a bit of a mess of HTML and object-oriented concepts. I challenge anyone to try explaining a “trackable” to an intelligent programmer, and recommend you don’t try explaining it to your dad.
Reading through, I jotted various points.
What should the internal context of the browser, equivalent to DOM, be like?
Are we representing the AR content internally as several independent objects (like browser tabs) containing placemarks, or are we representing all the placemarks currently loaded in a common context, or are we keeping them in a store and then lazyloading them into the display?
Is the current composed scene several “pages” overlaid, or one metaclass of objects from multiple sources?
“Composed scene” is the AR term-of-art for the subset of all the currently loaded content that’s visible in the current view. This led me to the next point:
HTML thinks of everything as a document
We definitely don’t look at documents, or Web pages, like this. You don’t have multiple tabs overlaid on each other, with elements selectively hidden. HTML is a fundamentally textual medium; AR things just…aren’t. This raises an interesting point.
Should different AR things be able to see each other?
OK, what if I wanted to show some object, depending on the presence of another object? Or if I wanted to track an AR object and use that as a reference to place another? I can’t think of any reason why I shouldn’t be able to do that within my own page/application/whatever one of these is called. It might be cool to be able to do things based on the content of other pages, but I can see why having a same-domain policy would be valuable (you don’t want RandomApp smearing dogshit all over FindYourHSBC).
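To make the same-domain idea concrete, here is a minimal sketch in Python. Nothing in it comes from the ARML spec: `SceneObject`, `Scene`, the `origin` field and the `.example` hostnames are all hypothetical names made up for illustration. It only shows the rule that objects in a shared scene can look each other up when they come from the same source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SceneObject:
    name: str
    origin: str  # hypothetical: the page/app that supplied this object


class Scene:
    """All currently loaded objects, from every source, in one shared context."""

    def __init__(self) -> None:
        self._objects: List[SceneObject] = []

    def add(self, obj: SceneObject) -> None:
        self._objects.append(obj)

    def find(self, name: str, requester_origin: str) -> Optional[SceneObject]:
        # Same-origin rule: a requester only ever sees its own objects.
        for obj in self._objects:
            if obj.name == name and obj.origin == requester_origin:
                return obj
        return None


scene = Scene()
scene.add(SceneObject("branch-42", "findyourhsbc.example"))
scene.add(SceneObject("graffiti", "randomapp.example"))

# The bank's own page can use its placemark as a reference for another object.
own = scene.find("branch-42", "findyourhsbc.example")
# RandomApp cannot even discover it, let alone draw on it.
foreign = scene.find("branch-42", "randomapp.example")
```

Whether cross-source lookups should be refused outright, as here, or allowed on an opt-in basis is exactly the kind of policy question the spec would have to settle.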
Should objects change state because some sort of conditional was passed to the browser when they were created, or because scripts tell them?
At the moment, you can have an “enabled=” property on an ARML object that determines if it is shown or not. You can also have a script show or hide objects, or even create them ex novo, based on its program logic. I think we ought to decide how much logic should be in the supposedly structural ARML and the browser, and how much in scripting.
What about audio, and, and…
The spec sort-of deals with the possibility that you might want to show things based on sound, or on conditions that aren’t locations in general. But I don’t think it provides a convincing starting point for this.
Probably best not to lazyload data
The spec often suggests that content ought to be pulled over the network as late as possible. But I think this is wrong. We’ve got lots of RAM! We’ve got fast radio air interfaces! But our radio networks like interactions that crank up to linerate, and are then finished. Mobile networks are stressed more by signalling than by traffic, and that’s driven by setup/teardown. Chatty is the enemy. Further, the other big constraint in mobility is battery, and that’s driven by how long the radio is active.
If you must be chatty, hold open a data connection. Don’t do endless setup/teardowns.
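The difference between the two loading styles can be sketched with a toy model. This is not a real network stack: `SimulatedRadio` is a hypothetical stand-in that does nothing except count how many setup/teardown cycles each strategy would trigger, which is the quantity the argument above cares about.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class SimulatedRadio:
    """Toy data link (an assumption for illustration); counts connection setups."""
    store: Dict[str, str]
    setups: int = 0

    def fetch(self, ids: List[str]) -> Dict[str, str]:
        # Each call models one setup / transfer / teardown cycle.
        self.setups += 1
        return {i: self.store[i] for i in ids}


def load_lazily(radio: SimulatedRadio, ids: Iterable[str]) -> Dict[str, str]:
    # One request per placemark, issued as each one comes into view.
    loaded: Dict[str, str] = {}
    for i in ids:
        loaded.update(radio.fetch([i]))
    return loaded


def load_in_one_burst(radio: SimulatedRadio, ids: Iterable[str]) -> Dict[str, str]:
    # Everything for the area in a single interaction, then the radio can idle.
    return radio.fetch(list(ids))


placemarks = {f"poi-{n}": f"content {n}" for n in range(5)}

lazy_radio = SimulatedRadio(placemarks)
lazy_result = load_lazily(lazy_radio, placemarks)

burst_radio = SimulatedRadio(placemarks)
burst_result = load_in_one_burst(burst_radio, placemarks)
```

Both strategies end up with the same content in memory; the lazy one just pays for five setups where the burst pays for one.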
Rather than hiding stuff, creating various kinds of basic registration objects, etc, what about this simple point? Registration is a conditional statement. IF we are within LoD range of point(x,y,z) THEN show this. Therefore, all the ARML anchors are conditionals. It doesn’t matter if it’s a WGS84, a range/bearing/elevation, a noise, a camera recognition, a QR, a smell…or importantly a combination of these.
We might want to work with other objects in the app. We might want to work with objects from other apps in the current scene. We are very likely to want to show content on the basis of program logic, though, and of course, even a simple placemark is shown because a conditional statement is evaluated and an event fired. So all an anchor should be is a named conditional. Rather than “trackables” and such, we have anchors which bind an object to an event, covering all kinds of location, audio, etc.
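Here is a rough sketch of what “an anchor is a named conditional” could look like, in Python. It is my illustration, not ARML syntax: `Anchor`, `within_range`, `sees_marker`, `all_of` and the keys of the context dictionary are all assumed names. Plain Cartesian distance stands in for whatever LoD range test a real browser would use.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical snapshot of what the device currently senses.
Context = Dict[str, Any]
Condition = Callable[[Context], bool]


@dataclass
class Anchor:
    name: str             # the "named" part
    condition: Condition  # the "conditional" part
    content: str          # what to show when the condition holds


def within_range(x: float, y: float, z: float, radius: float) -> Condition:
    """IF we are within `radius` of point (x, y, z)."""
    def check(ctx: Context) -> bool:
        return math.dist(ctx["position"], (x, y, z)) <= radius
    return check


def sees_marker(marker_id: str) -> Condition:
    """IF the camera has recognised this marker (QR code, image, ...)."""
    def check(ctx: Context) -> bool:
        return marker_id in ctx.get("markers", ())
    return check


def all_of(*conditions: Condition) -> Condition:
    """Combine conditions: every one must hold."""
    return lambda ctx: all(c(ctx) for c in conditions)


def visible(anchors: List[Anchor], ctx: Context) -> List[str]:
    """THEN show this: content of every anchor whose condition is true."""
    return [a.content for a in anchors if a.condition(ctx)]


anchors = [
    Anchor("valve-7",
           all_of(within_range(0, 0, 0, 10), sees_marker("qr:valve-7")),
           "Valve 7 maintenance notes"),
    Anchor("lobby", within_range(100, 0, 0, 5), "Lobby map"),
]
ctx = {"position": (3.0, 4.0, 0.0), "markers": {"qr:valve-7"}}
shown = visible(anchors, ctx)
```

A sound, a bearing or a smell would just be another function returning a `Condition`, and combinations fall out of `all_of` without any new object type.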
No, this isn’t terribly webby, but then most advanced Web apps aren’t either. Google Maps isn’t much like rfceditor.org. But the wonderfulness of the www comes from links, view source, REST, openness. There’s no reason we can’t have those! But we don’t need to have hellish app syntax either.
And I think the big meta-question here is “is it an app or a page?”