Friday, March 09, 2007

Don Norman has a suggestive and thought-provoking essay about the latest breakthrough in the user interface for computers. It's not virtual reality or heads-up display eye-glasses with retinal I/O. It's the command line interface. He uses the evolution of Internet search engines to make his point:
We navigate the internet by typing phrases into our browsers and invoking our favorite search engine. But more and more, we type in commands, not search items. All the major search engines now allow commands to be typed that directly yield answers without the need to go to an intermediate webpage. Consider these three examples, each for a different search engine.

  • Google: the phrase 'define:embodiment' directly provides the definition.
  • Yahoo!: the phrase 'time in Nagoya' directly provides the time (3:13 AM Friday, when I tried it last).
  • Live.com: the phrase 'cars in China' returns with '15 per 1000 people'.

Even though these three services are called "search engines," in fact they are becoming "answer services" controlled through their command line interfaces.

I've been wanting to write an essay of my own about command line UIs, because it's an idea I've been working with in the mobile software context for a few months now and I'm quite excited about it. Frankly, I think Don's focus on "answer services" barely scratches the surface of the potential. That's not his fault; it's mainly because he's responding to what he sees happening with the PC user interface right now, whereas the bigger (and as yet uncaptured) opportunity for the command line is on smart mobile devices.

Let me try to explain why.

There are three UI patterns in the mobile interface that are seriously in need of reconsideration: hierarchical menus, the "application icon checkerboard", and the form. All were borrowed from the PC GUI and transferred without very much thought onto mobile devices. "What's wrong with menus, icons and forms?" you ask. "They're familiar! They're intuitive!" Perhaps, but they're intuitive in the context of the PC desktop, where people have long, immersive sessions, not in the mobile context, where people have a task that comes to mind (perhaps prompted by the device itself) and need to get it done with minimal time, effort and attention. Mobile users are out and about doing things, interacting in the real world, and only briefly dipping into the digital world. That different mode of interaction calls for a different approach to the mobile UI.

The PC GUI is designed around the concept of data silos and applications. A data silo (like a file format, a directory, or a database engine) is meant to separate different types of data, while applications are meant to unlock one of the silos so you can get in and do some work on what's inside. Neither data silos nor applications (in the PC sense of the word) are a good thing on a mobile device, and it's high time we figured out how to get rid of them. They create indirection and fragmentation of the user experience, which are tolerable while you're seated at your desk but become major stumbling blocks on a mobile device. The mobile UI needs to be as direct as possible. Even the best that we have today (Palm OS comes to mind) doesn't come close to being as direct as it should be.

You know the drill. You want to send a quick email. You click a button to flip from your "phone" screen to "applications," traverse a field of application icons to reach the one you want (Email), drill into a menu and traverse its items to "New Message" (or traverse a form to a button that does the same thing), then click a couple of things to bring up a pick list that you sift through to find the contact you want, click OK, and navigate down through the email form (past the "CC" and "BCC" fields: click, click, click)... By now we've clicked 15 or 20 times and we haven't even started entering the message.

Some operating systems (like Palm OS and Windows Mobile) have support for touchscreens so you can go more directly to the applications or screen elements that you want to interact with, but it's a weak and partial solution and it comes at a price. Now the phone is a two-handed device like your PC, maybe even one that requires you to pull out a fidgety little stylus and put it away again once you go back to using the keypad. Devices like the Treo and BlackBerry have honed and touted their one-handed navigation abilities, which hints that this is the preferred way to use a handset. Either way, we are fighting bad UI and losing—losing the mass market customers if we are selling smartphones or software for them.

My contention is that with the right user interface 70% of what you want to do with a smartphone can be done in three clicks or less, not counting entry of text content if any is required. Almost everything you want to do on a mobile device can be initiated with a verb and a noun: "call Jane," "check mail," "make appointment," "text Bob," "play REM," "read Pikesoft," "price stock," "time Tokyo." A well designed mobile device knows all the verbs it can perform and knows most if not all of the nouns it can perform them on. Furthermore, it knows them the same way we do: by name. Best of all, it knows how to auto-complete these verbs and nouns:

You: "cj"
Device: "Call Jane Doe or Call Jim Beam?"

You: "a" (second letter of Jane's name) or "d" (last name initial)
Device: "Calling Jane Doe... 719-555-1234"
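To make the abbreviation idea concrete, here is a minimal Python sketch of one way that kind of matching could work. It is an illustration under stated assumptions, not the implementation described in this post: each typed letter either continues the current word of a "verb noun" phrase or starts the next word, and `abbrev_match`, `complete` and the `COMMANDS` list are hypothetical names.

```python
def abbrev_match(query: str, words: list[str]) -> bool:
    """True if `query` can be typed as a prefix of the first word, optionally
    followed by prefixes of the words after it (e.g. "cjd" -> call jane doe)."""
    if not query:
        return True
    if not words:
        return False
    word, rest = words[0], words[1:]
    # Consume 1..n leading characters of this word, then try to match the
    # remaining letters against the following words.
    for k in range(1, min(len(word), len(query)) + 1):
        if query[:k] != word[:k]:
            break
        if abbrev_match(query[k:], rest):
            return True
    return False


def complete(query: str, commands: list[str]) -> list[str]:
    """Return every verb-noun command the typed letters could stand for."""
    q = query.lower()
    return [c for c in commands if abbrev_match(q, c.lower().split())]


# Hypothetical command list; on a real device the nouns would come from
# contacts, installed features, and so on.
COMMANDS = [
    "Call Jane Doe",
    "Call Jim Beam",
    "Check mail",
    "Make appointment",
    "Text Bob",
    "Time Tokyo",
]

print(complete("cj", COMMANDS))   # ['Call Jane Doe', 'Call Jim Beam']
print(complete("cja", COMMANDS))  # ['Call Jane Doe']
print(complete("cjd", COMMANDS))  # ['Call Jane Doe']
```

Both "cja" (second letter of Jane) and "cjd" (last name initial) narrow the list to one candidate, which is the behavior the exchange above describes.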


This wouldn't be a secret language. The screen from which you launch every task expects you to enter a verb first and should display a list of available verbs. One or occasionally two letters are enough for the device to know which verb you want from the list. Then it expects a noun. Nouns on this ideal device aren't trapped in silos like on a PC. When they signed up for duty on the phone, they broadcast their availability to every verb that applies to them, possibly adding a new verb or two that the system's dictionary didn't already contain. "Play" might prompt the user with nouns like "Doom" or "Bejeweled," and if you had an MP3 player in there you'd also see "Favorites," "Clapton & Cale/It's so Easy," and a list of other recent songs you are likely to want to play again, then "Title search" and "Performer search." You don't care that Doom and Music Player are different applications, and a good mobile UI should be just as indifferent when offering you options of what to "play."
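One way to picture nouns "signing up for duty" is a shared dictionary keyed by verb rather than by application. The Python sketch below is only an assumption about how that could be organized: `CommandDictionary`, `Noun`, `announce` and the `handler` field are hypothetical names, and the handler strings stand in for whatever plug-in would actually perform the action.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    name: str
    handler: str  # hypothetical plug-in that actually performs the action


class CommandDictionary:
    """Maps each verb to every noun that has announced it works with it."""

    def __init__(self) -> None:
        self._nouns_by_verb: dict[str, list[Noun]] = defaultdict(list)

    def announce(self, noun: Noun, verbs: list[str]) -> None:
        # A verb the dictionary hasn't seen before is simply added.
        for verb in verbs:
            self._nouns_by_verb[verb.lower()].append(noun)

    def verbs(self) -> list[str]:
        return sorted(self._nouns_by_verb)

    def nouns_for(self, verb: str) -> list[str]:
        # .get() avoids creating an empty entry for an unknown verb.
        return [n.name for n in self._nouns_by_verb.get(verb.lower(), [])]


d = CommandDictionary()
d.announce(Noun("Doom", "games"), ["play"])
d.announce(Noun("Bejeweled", "games"), ["play"])
d.announce(Noun("Favorites", "music player"), ["play", "shuffle"])
d.announce(Noun("Jane Doe", "contacts"), ["call", "text", "email"])

print(d.nouns_for("play"))  # ['Doom', 'Bejeweled', 'Favorites']
print(d.verbs())            # ['call', 'email', 'play', 'shuffle', 'text']
```

Because the lookup is by verb, "play" lists games and music side by side without the user ever choosing an application first.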

"Adjectives," "adverbs" and "prepositions" would be added to the command for more complex operations. In these cases where an action involves more information the UI would show in natural language the status of the task you are setting up and prompt you for required information or ways you could modify it:

To make an appointment:

You: "ma"

Date and time are required to make an appointment so you're prompted immediately:
Device: (status: "Making appointment") "Tomorrow, This [day of week], Next [day of week], Enter Date"

You: "n" (you're going for "next Wednesday" here)
Device: (status: "Making appointment for next...") "Sun, Mon, Tue, Wed, Thu, Fri, Sat"

You: "w"
Device: (status: "Making appointment for next Weds at...") "Time?"

You: "11a"

etc.

Laid out on the page like that, it may not look obviously faster, and it doesn't give a good impression of the visualizations that are possible, either. (Sorry, I don't have animated screenshots ready to show yet.) But the full input for an 11am appointment with Ed Colligan next Wednesday in conference room 1234, with a notification 15 minutes before the appointment, might look like this: "manw11awecir1234n15m". See if you can guess what natural language prompts and options were presented to guide you in entering this complex appointment into your calendar. It's not hard to guess when you take it one character at a time and think in natural language.
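The step-by-step appointment prompts amount to filling required slots in a fixed order, where each keystroke picks the one option that starts with the typed letters. Here is a minimal Python sketch of that slot-filling loop, covering only the date and time slots from the walkthrough. The option lists, `pick` and `make_appointment` are hypothetical, and the time is taken verbatim rather than parsed.

```python
DATE_CHOICES = ["Tomorrow", "This", "Next", "Enter Date"]
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def pick(choices: list[str], typed: str) -> str:
    """Select the single choice that starts with the typed letters."""
    hits = [c for c in choices if c.lower().startswith(typed.lower())]
    if len(hits) != 1:
        raise ValueError(f"{typed!r} matches {hits}; keep typing")
    return hits[0]


def make_appointment(keys: list[str]) -> list[str]:
    """Fill the required date and time slots in order, returning the status
    line that would be shown after each step."""
    it = iter(keys)
    status = "Making appointment"
    shown = [status]

    when = pick(DATE_CHOICES, next(it))
    if when in ("This", "Next"):
        status += f" for {when.lower()} {pick(DAYS, next(it))}"
    elif when == "Tomorrow":
        status += " for tomorrow"
    else:  # "Enter Date": take the next entry as a literal date
        status += f" on {next(it)}"
    shown.append(status)

    status += f" at {next(it)}"  # time kept as typed, e.g. "11a"
    shown.append(status)
    return shown


for line in make_appointment(["n", "w", "11a"]):
    print(line)
```

Note that a single "t" is ambiguous here (Tomorrow/This, Tue/Thu), so `pick` asks for another letter; that is the "occasionally two letters" case.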

What happens when you enter a word that the system doesn't understand? It searches, based on the context of the parts of the command it did understand. But it doesn't just return a list of search results: it uses the results to form actual courses of action that it can directly perform. "btm" might be understood as "buy ticket for movie," so the unrecognized movie name that followed would become part of the query to a web service that sells movie tickets for a theater in your area. "tofu junction" would be completely unrecognized, so the software might search first for these characters in local storage (like the wonderful Palm OS "find" feature, the original Google Desktop). Maybe it turns up the result of a Google Maps query you made a few months ago when you last thought about going to Tofu Junction. If not, it prompts you to try online services like Google, Amazon, eBay or Flickr.

As Don Norman points out, command line interfaces can degrade gracefully when they don't understand something completely. It's actually less a "degrading" than an extension of the command language, so that it can perform queries against a local database or services on the web. My impulse to consider the command line came from frustration with the tedious GUI of today's smart devices, but it's become apparent to me that the big win is the way a text-based interface naturally extends the capabilities of the device to work with remote data and services. Ordinary graphical user interfaces tend to be far less flexible and powerful because of their explicitness, their emphasis on drill-down, and their insistence that everything must be done by first launching an application. "You want to buy a ticket online? OK, so first you find your browser application and launch it, then drill down into a menu to find the link to Fandango in your bookmarks, then find the place on the page where you enter the movie you want to see (getting past all the ads and suggestions about other tasks or services you don't care about)..."
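The fallback order described for unrecognized input (known command, then local storage, then online services) can be sketched as a short chain. This Python version is a simplified assumption: it skips the context-aware case like "btm" and uses plain substring search locally; `resolve` and the sample data are hypothetical, and the returned strings stand in for actions the device would offer.

```python
def resolve(text: str, commands: dict[str, str],
            local_items: list[str], web_services: list[str]) -> list[str]:
    """Turn typed text into offered actions, falling back from known
    commands to a local search and finally to online services."""
    query = text.strip().lower()

    # 1. A phrase the command dictionary already understands.
    if query in commands:
        return [commands[query]]

    # 2. Anything stored on the device that mentions the text.
    hits = [item for item in local_items if query in item.lower()]
    if hits:
        return [f"Open: {item}" for item in hits]

    # 3. Nothing local: offer to send the query to web services.
    return [f"Search {service} for '{text.strip()}'" for service in web_services]


commands = {"check mail": "Open inbox"}
local_items = ["Google Maps: Tofu Junction", "Note: buy milk"]
web_services = ["Google", "Amazon", "eBay", "Flickr"]

print(resolve("tofu junction", commands, local_items, web_services))
# ['Open: Google Maps: Tofu Junction']
```

Each stage returns actions rather than raw search results, which is the point being made above: an unknown word extends the command instead of ending it.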

There's an obvious objection that probably formed in your mind a few paragraphs ago. It takes form in words like "command lines are ugly," "why have a big color screen if your whole user interface is a textbox and some words on the screen?" or "people want iPhone, not DOS prompt."

Well, yeah.

But who says this has to look like a DOS console? The command line I have in mind sits atop gorgeous dynamic visualizations of tasks being assembled, taking the place of tired forms with grids, droplists and buttons. With a simple text box at the top of the screen, the rest of the screen is a wide open canvas for artistic renderings of the state of your interaction with the device, the presentation of requested information, or ambient information about things going on in the background that you care about. Significantly, in this system the screen need no longer be owned by an application and limited in the information it presents by the imagination of a single software developer. (Things like browsers and video players are obvious exceptions.) This means the screen background can be open to creative visualizations of anything the user is interested in, even while they go about performing a task: a glow in one part of the screen whose intensity and color indicate the number and importance of messages in the inbox, or the ghosted icons of two favorite blogs softly pulsing in another area to let you know that there are new posts to read. (And yes, there could be some familiar GUI elements to facilitate certain kinds of interaction.) I'm not a great designer or artist, so these examples aren't necessarily wonderful. But I'm convinced that there's a huge opportunity for deep, unified, visually stunning personalization of the user experience that is difficult to capture in the chopped-up, application-centric UI of present-day smart devices.

Initially, the environment I'm developing is just for my own use and doesn't offer much in the way of beautiful visual effects. It runs cross-platform on Palm OS, Windows Mobile, and S60. In my first pass I'm implementing PIM applications (tuned for use with David Allen's "Getting Things Done" method) with integrated calling and sending of email. I'm excited about what I've got so far, but I sometimes wonder whether my judgment is skewed by my heavy use of command interfaces as a developer. When it's a little more complete and polished I will share it with users and start collecting feedback. If the response is as good as I hope, I will open up an API for developers to create their own plug-in extensions to the command language.

But if I'm right about this, where it belongs is not in an application that runs on a smart device, but at the ground floor of the mobile user interface. It's a replacement for the whole idea of menus, applications and forms, and to be the most beneficial it should be the way you access every feature of your device. Here's hoping that Don Norman's command line idea makes the impression I think it should on the folks who are producing the next great mobile UI.

(Thanks to mobiface for the link to the Don Norman piece, and for the opposing (but also retro) perspective: that the mobile user interface will be heading back in the direction of Microsoft Bob.)

Comments

Awesome post David. I really think you and Don are onto something here. I'm a Quicksilver (Mac) user, and it has changed the way I launch programs, find files on my PowerBook, and interact with contacts. After using it for a while, it feels like it learns my usage and becomes even more efficient. Control+Space plus a few typed keystrokes is much faster and more efficient than drilling down through menus and Finder folders.

I highly recommend checking it out if you aren't familiar with it. Enso Launcher (Windows) uses a similar command line interface. Both Enso and Quicksilver have a slick user interface.

You're completely right: Hierarchical menus and folders on mobiles are poor interface analogies. Contextually aware apps combined with a command line launcher/interface are the way to go.

Your appointment example also reminds me of the Newton Intelligence Architecture and the Newton Assist on my MessagePad 2100. The Newton understood names and actions, and could turn a command line "meet david for lunch on Friday" into an appointment slip for Lunch at 12pm with David Beers (or prompt to select from a list of Davids in my contacts) on the next Friday.

Where do I sign up as a beta tester for your app ;-) ?

Posted by Brian at Saturday, March 10, 2007 07:00:18

By the way, it's very encouraging to hear that Paul Mercer is now working at Palm. He's the guy behind the "poof cloud" that evaporates when you erase something on a Newton by scratching it out!

http://www.designinginterac...

He's also been very outspoken about more open cell phone platforms:

http://softwarecommunity.in...

Posted by Brian at Saturday, March 10, 2007 09:39:44

Very good article,

Have a beer to my health.

Posted by eduardocruz at Saturday, March 10, 2007 18:16:18

Wonderful article - distills a bunch of stuff I've been thinking about myself ever since I discovered Quicksilver on the Mac. Add me to the list of folks who would love to try this app out!

Posted by DieterBohn at Monday, March 12, 2007 09:20:19

I spent a great deal of my childhood making people think I was smarter than I really am, by learning all the arcane commands DOS had to offer. I love the idea of command-line interfaces, not least because clacking keys are infinitely more satisfying than point-and-click interfaces. As you mention, the UI hangovers from desktop PCs get in the way of efficient mobile device use - hence why the Treo's combo of exposed keyboard, touchscreen and 5-way gets me all quivery. But even then there's plenty of room for a complete rethink of how we go about using these machines.

But.

While I love the idea, it could be a problem for your everyday user, who has trouble learning how to use the SMS on their phone, let alone what essentially amounts to a new language. I'm not certain such an interface could ever completely replace a GUI; after all, there was a reason they were developed in the first place. Apple have the right idea with the iPhone; lots of gesture-based touch commands, which are much more intuitive than drilling down through menus to find the command/function you want to use.

But enough of my wet-blanketing. When can I try this out? And on what? :)

Posted by freakout at Tuesday, March 13, 2007 20:48:35

It'll run great on your Treo 680, Tim, but you'll have to wait while I get it to a point where I'm ready to share it. And I hope you'll find it's all about eliminating drilldown and making smartphones easier for new and power users alike. If I haven't made it so my mother can see how to use it right away I won't consider it a success.

Posted by cervezas at Tuesday, March 13, 2007 23:26:40

Hmmm. I just read this entry, and I just started to consider it, but I already have a concern that I'm trying to think through. A commandline interface is much more language dependent than the currently popular point-and-click interface. How does this fit with character based languages (Chinese)? Abbreviation works in English, but it isn't the same in a character based language. Since I do not actually operate my "computers" (mobiles and otherwise) in Chinese, I'm probably not the best one to think it through. But I can envision a serious barrier to a commandline interface for character based languages.

Posted by twrock at Thursday, March 15, 2007 17:33:01

Great post, David! Seriously thought-provoking.

A command line interface may definitely reduce the number of clicks but will increase the learning time required to use the system efficiently. A GUI comes with inherent advantages like 'See what you do' and intuitiveness.

A new user with a GUI can still find his way when he does not know how to reach a particular application. A command line alone would overly complicate things for a new user, who may be completely lost with the device.

Posted by erohit at Friday, March 16, 2007 05:20:02

twrock wrote: "I can envision a serious barrier to a commandline interface for character based languages."

You may be right about that, Ron. I'm not claiming the command line is the best single user interface. I'm also not claiming the GUI should go away. The Serenity interface (that's what I call it) is very graphical in the way it presents the options you have and the task you are performing. Because of the way it prompts you at all times it can even be operated with your fingertip or stylus on a touchscreen device. (Hmmm... I'm already regretting that I've used the term "command line" to describe this, but the intent is to make text entry the easiest, fastest way to navigate, like the CLI, and Don Norman's CLI article was what prompted me to finally write about it.)

Anyway, I suspect a more stylus-optimized interface might be best for character-based languages. I experimented with a gestural interface a few months ago that might make more sense for the Asian market: http://www.pikesoft.com/blo... It never got out of early prototypes, but it was intended to address some of the same issues that Serenity does, like how you make the software discoverable (not hidden in menus) and how you enable quicker access to common tasks.

Posted by cervezas at Friday, March 16, 2007 06:12:45

Rohit, thanks for the comment. I think the user is *already* completely lost when it comes to using a new phone today. Borrowing elements of the PC GUI has not helped because many of these just don't work on the small screen or in the mobile context where you have many quick tasks that you perform all the time. I'm hoping you'll soon see that having text-driven navigation doesn't mean you have to give up on intuitiveness, discoverability, or even pretty graphical renderings. If you are a Palm OS user you know that the fastest way to get to an application isn't by searching for the icon in the launcher, it's by entering the first letter of the application's name. I'm just taking that idea and kicking it up a couple of notches! :) Hopefully making it a bit more obvious to a newbie in the process.

Posted by cervezas at Friday, March 16, 2007 06:33:58
