We navigate the internet by typing phrases into our browsers and invoking our favorite search engine. But more and more, we type in commands, not search items. All the major search engines now allow commands to be typed that directly yield answers without the need to go to an intermediate webpage. Consider these three examples, each for a different search engine.
- Google: the phrase 'define:embodiment' directly provides the definition.
- Yahoo!: the phrase 'time in Nagoya' directly provides the time (3:13 AM Friday, when I tried it last).
- Live.com: the phrase 'cars in China' returns with '15 per 1000 people'.
Even though these three services are called "search engines," in fact they are becoming "answer services" controlled through their command line interfaces.
I've been wanting to write an essay myself about command line UIs because it's an idea I've been working with in the mobile software context for a few months now and I'm quite excited about it. Frankly, I think Don's focus on "answer services" doesn't scratch the surface of the potential. That's not his fault; I think that's mainly because he's responding to what he's seeing happening with the PC user interface right now, whereas the bigger (and as yet uncaptured) opportunity for the command line is on smart mobile devices.
Let me try to explain why.
There are three UI patterns in the mobile interface that are seriously in need of reconsideration: hierarchical menus, the "application icon checkerboard", and the form. All were borrowed from the PC GUI and transferred without very much thought onto mobile devices. "What's wrong with menus, icons and forms?" you ask. "They're familiar! They're intuitive!" Perhaps, but they're intuitive in the context of the PC desktop, at which people have long, immersive sessions, not the mobile context where people have a task that comes to mind (perhaps prompted by the device itself) and need to get it done with minimal time, effort and attention. Mobile users are out and about doing things, interacting in the real world, and only briefly dipping into the digital world. This suggests a different approach to the mobile UI to support the different mode of interaction.
The PC GUI is designed around the concept of data silos and applications. A data silo (like a file format, a directory, or a database engine) is meant to separate different types of data, while applications are meant to unlock one of the silos so you can get in and do some work on what's inside. Neither data silos nor applications (in the PC sense of the word) are a good thing on a mobile device and it's high time we figure out how to get rid of them. They create indirection and fragmentation of the user experience, which are tolerable while you're seated at your desk but become major stumbling blocks on the mobile. The mobile UI needs to be as direct as possible. Even the best that we have today (Palm OS comes to mind) doesn't come close to being as direct as it should.
You know the drill. You want to send a quick email. You click a button to flip from your "phone" screen to "applications," traverse a field of application icons to get to the one you want to drill into (Email), then drill into a menu and traverse its items to "New Message," or traverse a form to a button that does the same thing, then click a couple of things to bring up a pick list that you sift through to find the contact you want, click OK, navigate down through the email form (past the "CC" and "BCC" fields: click, click, click) ... we've clicked 15 or 20 times and we haven't even started entering the message yet.
Some operating systems (like Palm OS and Windows Mobile) have support for touchscreens so you can go more directly to the applications or screen elements that you want to interact with, but it's a weak and partial solution and it comes at a price. Now the phone is a two-handed device like your PC, maybe even one that requires you to pull out a fidgety little stylus and put it away again once you go back to using the keypad. Devices like the Treo and BlackBerry have honed and touted their one-handed navigation abilities, which hints that this is the preferred way to use a handset. Either way, we are fighting bad UI and losing—losing the mass market customers if we are selling smartphones or software for them.
My contention is that with the right user interface 70% of what you want to do with a smartphone can be done in three clicks or less, not counting entry of text content if any is required. Almost everything you want to do on a mobile device can be initiated with a verb and a noun: "call Jane," "check mail," "make appointment," "text Bob," "play REM," "read Pikesoft," "price stock," "time Tokyo." A well designed mobile device knows all the verbs it can perform and knows most if not all of the nouns it can perform them on. Furthermore, it knows them the same way we do: by name. Best of all, it knows how to auto-complete these verbs and nouns:
You: "cj"
Device: "Call Jane Doe or Call Jim Beam?"
You: "a" (second letter of Jane's name) or "d" (last name initial)
Device: "Calling Jane Doe... 719-555-1234"
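The exchange above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code running on any device: the `Command` class, the sample `COMMANDS` list and the matching rule (each typed letter either extends the current word or starts the next one) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    verb: str
    noun: str

    def label(self):
        return f"{self.verb} {self.noun}"


# Hypothetical sample data; a real device would build this list from
# its own verbs and its address book.
COMMANDS = [
    Command("Call", "Jane Doe"),
    Command("Call", "Jim Beam"),
    Command("Check", "Mail"),
    Command("Text", "Bob Smith"),
]


def abbreviates(typed, words):
    """True if `typed` is made of non-empty prefixes of the leading words, in order."""
    if not typed:
        return True
    if not words:
        return False
    first, rest = words[0], words[1:]
    # Try every prefix of the first word that agrees with what was typed.
    for n in range(1, len(first) + 1):
        if typed[:n] != first[:n]:
            break
        if abbreviates(typed[n:], rest):
            return True
    return False


def complete(typed):
    """Return the labels of all commands that `typed` could abbreviate."""
    typed = typed.lower()
    return [c.label() for c in COMMANDS
            if abbreviates(typed, c.label().lower().split())]


print(complete("cj"))   # both "Call J..." candidates
print(complete("cja"))  # "a" extends "Jane"
print(complete("cjd"))  # "d" starts "Doe"
```

With this rule, "cj" leaves two candidates, and either "a" or "d" is enough to settle on Jane Doe, just as in the dialogue.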
This wouldn't be a secret language. The screen from which you launch every task expects you will enter a verb first and should display a list of available verbs. One or occasionally two letters is enough for the device to know the verb you want from the list. Then it expects a noun. Nouns on this ideal device aren't trapped in silos like on a PC. When they signed up for duty on the phone they broadcast their availability to all the verbs that know how to act on them, possibly adding a new verb or two that they didn't find already in the system's dictionary. "Play" might prompt the user with nouns like "Doom" or "Bejeweled" and if you had an MP3 player in there you'd also see "Favorites," "Clapton & Cale/It's so Easy," and a list of other recent songs you are likely to want to play again, then "Title search" and "Performer search." You don't care that Doom and Music Player are different applications, and a good mobile UI should be just as indifferent when offering you options of what to "play."
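One way to picture nouns "signing up for duty" is a shared registry that every application writes into. The sketch below is hypothetical: `CommandRegistry`, its methods and the provider names are invented for illustration and don't describe any existing mobile OS API.

```python
class CommandRegistry:
    """Maps each verb to the nouns that have registered for it."""

    def __init__(self):
        self._nouns_by_verb = {}

    def register(self, noun, verbs, provider):
        # A noun announces itself to every verb it can be used with.
        # An unknown verb is simply added to the dictionary.
        for verb in verbs:
            self._nouns_by_verb.setdefault(verb.lower(), []).append((noun, provider))

    def verbs(self):
        return sorted(self._nouns_by_verb)

    def nouns_for(self, verb):
        # The caller never sees which application supplied a noun.
        return [noun for noun, _ in self._nouns_by_verb.get(verb.lower(), [])]


registry = CommandRegistry()
registry.register("Doom", ["play"], provider="Doom")
registry.register("Bejeweled", ["play"], provider="Bejeweled")
registry.register("Favorites", ["play"], provider="Music Player")
registry.register("Jane Doe", ["call", "text"], provider="Contacts")

print(registry.verbs())
print(registry.nouns_for("play"))
```

The point of the design is that "play" returns games and playlists side by side; the provider is kept only so the device knows whom to hand the task to.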
"Adjectives," "adverbs" and "prepositions" would be added to the command for more complex operations. In these cases where an action involves more information the UI would show in natural language the status of the task you are setting up and prompt you for required information or ways you could modify it:
To make an appointment:
You: "ma"
Date and time are required to make an appointment so you're prompted immediately:
Device: (status: "Making appointment") "Tomorrow, This [day of week], Next [day of week], Enter Date"
You: "n" (you're going for "next Wednesday" here)
Device: (status: "Making appointment for next...") "Sun, Mon, Tue, Wed, Thu, Fri, Sat"
You: "w"
Device: (status: "Making appointment for next Weds at...") "Time?"
You: "11a"
etc.
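The prompting above amounts to a small state machine: each step offers a list of options, takes a few characters, and extends the natural-language status line. Here is a minimal Python sketch of just the steps shown. The class name, the step names and the time format are my assumptions for illustration, and explicit date entry is left out.

```python
import re

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def pick(options, typed):
    """Return the single option that starts with `typed`, else raise ValueError."""
    hits = [o for o in options if o.lower().startswith(typed.lower())]
    if len(hits) != 1:
        raise ValueError(f"{typed!r} matches {hits or 'nothing'}; keep typing")
    return hits[0]


class AppointmentBuilder:
    def __init__(self):
        self.words = ["Making appointment"]
        self.step = "when"

    def options(self):
        """What the device would display at the current step."""
        return {
            "when": ["Tomorrow", "This", "Next", "Enter Date"],
            "day": DAYS,
            "time": ["Time?"],
            "done": [],
        }[self.step]

    def status(self):
        text = " ".join(self.words)
        return text if self.step == "done" else text + "..."

    def feed(self, typed):
        if self.step == "when":
            choice = pick(self.options(), typed)
            if choice == "Tomorrow":
                self.words.append("for tomorrow at")
                self.step = "time"
            elif choice in ("This", "Next"):
                self.words.append(f"for {choice.lower()}")
                self.step = "day"
            else:
                raise NotImplementedError("date entry is omitted from this sketch")
        elif self.step == "day":
            self.words.append(f"{pick(DAYS, typed)} at")
            self.step = "time"
        elif self.step == "time":
            # Accepts input such as "11a" or "3p" only.
            m = re.fullmatch(r"(\d{1,2})([ap])", typed.lower())
            if not m:
                raise ValueError(f"can't read {typed!r} as a time")
            self.words.append(f"{m.group(1)}{m.group(2)}m")
            self.step = "done"
        else:
            raise ValueError("appointment is already complete")
        return self.status()


appt = AppointmentBuilder()
for keys in ("n", "w", "11a"):
    print(appt.feed(keys))
```

Note that a single "t" would be ambiguous at the first step (Tomorrow, This) and at the day step (Tue, Thu); in this sketch `pick` raises an error where a real UI would just narrow the list and wait for another letter.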
Seeing it all spread out on the page like that, it may not look obviously faster, and it doesn't give a good impression of the visualizations that are possible, either. (Sorry I don't have animated screenshots that I am ready to show yet.) But the full extent of the input for an 11am appointment with Ed Colligan next Wednesday in conference room 1234 with a notification 15 mins before the appointment might look like this: "manw11awecir1234n15m". See if you can guess what natural language prompts and options were presented to guide you in entering this complex appointment to your calendar. It's not hard to guess when you take it one character at a time and think in natural language.
What happens when you enter a word that the system doesn't understand? It searches, based on the context of the parts of the command it did understand. But it doesn't just return a list of search results; it uses the results to form actual courses of action that it can directly perform. "btm" might be understood as "buy ticket for movie" so the unrecognized movie name that followed would become part of the query to a web service that sells movie tickets for a theater in your area. "tofu junction" would be completely unrecognized, so the software might search first for these characters in local storage (like the wonderful Palm OS "find" feature, the original Google Desktop). Maybe it turns up the result of a Google Maps query you made a few months ago when you last thought about going to Tofu Junction. If not, it prompts you to try online services like Google, Amazon, eBay or Flickr.
As Don Norman points out, command line interfaces can degrade gracefully when they don't understand something completely. It's actually not so much a "degrading" as an extending of the command language so it can perform queries against a local database or services on the web. The impulse to consider the command line was prompted by my frustration with the tedious GUI of today's smart devices, but it's become apparent to me that the big win is the way a text-based interface naturally extends the capabilities of the device to work with remote data and services. Ordinary graphical user interfaces tend to be far less flexible and powerful because of their explicitness, their emphasis on drill-down, and their insistence that everything must be done by first launching an application. "You want to buy a ticket online? Ok, so first you find your browser application and launch it, then drill down into a menu to find the link to Fandango in your bookmarks, then find the place on the page where you enter the movie you want to see (getting past all the ads and suggestions about other tasks or services you don't care about)..."
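The fallback order described above (known command first, then local storage, then online services) could be sketched like this in Python. Everything here is a stand-in: the tables are hypothetical sample data, the movie title in the example is made up, and no real web service is called; the function only builds the list of actions the device would offer.

```python
from dataclasses import dataclass


@dataclass
class Action:
    label: str   # what the device offers to do
    source: str  # "command", "local" or "online"


# Hypothetical sample data standing in for the device's own tables.
COMMANDS = {"btm": "Buy ticket for movie"}
LOCAL_ITEMS = ["Google Maps: Tofu Junction", "Memo: grocery list"]
ONLINE_SERVICES = ["Google", "Amazon", "eBay", "Flickr"]


def resolve(text):
    """Turn typed text into actions, degrading from command to local to online."""
    head, _, rest = text.partition(" ")
    # 1. A recognized command: the unrecognized remainder becomes its query.
    if head.lower() in COMMANDS and rest:
        return [Action(f"{COMMANDS[head.lower()]}: {rest}", "command")]
    # 2. Nothing recognized: look for the text in local storage.
    hits = [item for item in LOCAL_ITEMS if text.lower() in item.lower()]
    if hits:
        return [Action(f"Open {item}", "local") for item in hits]
    # 3. Still nothing: offer to hand the text to online services.
    return [Action(f"Search {s} for '{text}'", "online") for s in ONLINE_SERVICES]


for typed in ("btm Casablanca", "tofu junction", "llama"):
    print(typed, "->", [a.label for a in resolve(typed)])
```

Each branch returns things the device can do rather than a page of raw results, which is the difference between this and an ordinary search box.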
There's an obvious objection that probably formed in your mind a few paragraphs ago. It takes form in words like "command lines are ugly," "why have a big color screen if your whole user interface is a textbox and some words on the screen?" or "people want iPhone, not DOS prompt."
Well, yeah.
But who says this has to look like a DOS console? The command line I have in mind sits atop gorgeous dynamic visualizations of tasks being assembled to take the place of tired forms with grids, droplists and buttons. With a simple text box at the top of the screen, the rest of the screen is a wide open canvas for artistic renderings of the state of your interaction with the device, the presentation of requested information, or ambient information about things that are going on in the background that you care about. Significantly, the screen need no longer be owned by an application in this system and limited in the information it presents by the imagination of a single software developer. (Things like browsers and video players are obvious exceptions.) This means the screen background can be open to creative visualizations of anything that you are interested in even as you go about performing a task: a glow in one part of the screen, the intensity and color of which indicate the number and importance of messages in your inbox, or the ghosted icons of two favorite blogs softly pulsing in another area to let you know that there are new posts to read. (And yes, there could be some familiar GUI elements to facilitate certain kinds of interaction.) I'm not a great designer or artist, so these examples aren't necessarily wonderful. But I'm convinced that there's a huge opportunity for deep, unified, visually stunning personalization of the user experience that is difficult to capture in the chopped up, application-centric UI of present-day smart devices.
Initially, the environment I'm developing is just for my own use and doesn't offer much in the way of beautiful visual effects. It runs cross-platform in Palm OS, Windows Mobile, and S60. In my first pass I'm implementing PIM applications (tuned for use with David Allen's "Getting Things Done" method) with integrated calling and sending of email. I'm excited about what I've got so far but sometimes wonder if my judgment isn't impaired by my heavy usage of command interfaces as a developer. When it's a little more complete and polished I will share it with users and start collecting feedback. If the response is as good as I hope I will open up an API for developers to create their own plug-in extensions to the command language.
But if I'm right about this, where it belongs is not in an application that runs on a smart device, but at the ground floor of the mobile user interface. It's a replacement for the whole idea of menus, applications and forms, and to be the most beneficial it should be the way you access every feature of your device. Here's hoping that Don Norman's command line idea makes the impression I think it should on the folks who are producing the next great mobile UI.
(Thanks to mobiface for the link to the Don Norman piece, and for the opposing (but also retro) perspective: that the mobile user interface will be heading back in the direction of Microsoft Bob.)
Posted by cervezas at 07:20:00. Filed under: Mobile User Interface