The Smartphone as Application Server

I’ve been working on a small proof of concept over the last couple of months. It’s an architecture for building smartphone apps that seamlessly “serve themselves” to devices around you (such as a smart-TV), using entirely open protocols and technologies.

It can loosely be viewed as a variation on Chromecast, except that instead of sending raw data over the wire to a smart-device, a smartphone app becomes a webapp server that treats the smart-device as a web-client. In this prototype, the smart-device is a Raspberry Pi 2:

Smartphone interacting with smart-device.

A Small Demo

The prototype is built entirely on open protocols and web technologies. The “app server” is just a standard smartphone app that runs its own webserver. The “app client” is a webpage, rendered by a browser running on the “smart endpoint” (in this demo a TV, or more specifically a Raspberry Pi 2 connected to a TV).

Here I’m demoing an Android app I wrote (called FlyPic) that displays a photo on the TV screen, and then lets the user control the photo by swiping on the phone. The Android app implements a “webapp architecture” where the server component runs on the phone and the client component runs on the Pi.

https://www.youtube.com/watch?v=g-zHE33xKng

How it Works

The overall architecture is pretty simple:

App discovers smart-device.

Step 1 – The FlyPic smartphone app uses service discovery on the local network to find the Pi. The Pi is running Linux, and uses the Avahi mDNS responder to broadcast a service under the name “browsercast”. The service points to a “handshake server”, which is a small HTTP server that is used to initiate the app session. After this step, FlyPic knows about the Raspberry Pi and how to connect to the handshake server.

FlyPic displays the list of discovered endpoints to the user.
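
The Pi side of this step is implemented with Avahi, but the idea is easy to sketch in Python with the third-party zeroconf package standing in for Avahi. The service type, port, and address below are illustrative assumptions, not the actual browsercast values:

    # Pi-side sketch: advertise a "browsercast" service over mDNS, using the
    # python-zeroconf package as a stand-in for Avahi. Type, port, and address
    # are assumptions for illustration.
    import socket
    from zeroconf import ServiceInfo, Zeroconf

    info = ServiceInfo(
        "_http._tcp.local.",
        "browsercast._http._tcp.local.",
        addresses=[socket.inet_aton("192.168.1.20")],  # the Pi's LAN address
        port=8080,                                     # handshake server port
    )

    zc = Zeroconf()
    zc.register_service(info)  # phones on the LAN can now discover the Pi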

App notifies handshake server.

Step 2 – When an endpoint is selected, FlyPic starts up a small embedded HTTP server, and a WebSocket server to stream events. It then connects to the handshake server on the Pi and asks it to start a web session, telling it (using GET parameters) the address of the app webserver it just started.
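
On the Pi, the handshake server only needs to accept that one GET request. Here is a minimal sketch using only the Python standard library; the request path and the “url” parameter name are my assumptions, not necessarily what browsercast uses:

    # Pi-side sketch: a tiny handshake server that accepts the app-server
    # address as a GET parameter. Path and parameter names are assumptions.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    class HandshakeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            app_url = params.get("url", [None])[0]  # phone's app-server URL
            if app_url:
                launch_browser(app_url)  # see the sketch after Step 3
                self.send_response(200)
            else:
                self.send_response(400)
            self.end_headers()

    HTTPServer(("0.0.0.0", 8080), HandshakeHandler).serve_forever()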

Smart-device starts browser.

Step 3 – The handshake server on the Pi starts up a browser (fullscreen, displayed on the TV), giving it the address of the FlyPic HTTP server (on the phone) to connect to.
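
Launching the browser is essentially a one-liner on a typical Linux setup. Chromium’s --kiosk flag gives a fullscreen, chromeless window; the binary name varies across distributions:

    # Pi-side sketch: start a fullscreen browser pointed at the phone's
    # app server. The binary name varies by distribution.
    import subprocess

    def launch_browser(app_url):
        subprocess.Popen(["chromium-browser", "--kiosk", app_url])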

Smart-device browser connects to app server.

Step 4 – The browser, after starting up, connects to the FlyPic HTTP server URL it was given.

Smart-device shows app page.

Step 5 – The browser gets the page and displays it. FlyPic serves up a page containing an image, plus some JavaScript code that establishes a WebSocket connection back to FlyPic’s WebSocket server (to receive pushed touch events without having to poll the phone).
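
A toy version of such a page, shown here as the string the phone-side server would return (the element names, WebSocket port, and message shape are mine, not FlyPic’s):

    # A toy version of the page the phone could serve. The WebSocket port and
    # the {"dx": ..., "dy": ...} message shape are illustrative assumptions.
    PAGE = """
    <html>
      <body style="margin:0; background:black">
        <img id="photo" src="/photo.jpg" style="position:absolute">
        <script>
          var x = 0, y = 0;
          var img = document.getElementById("photo");
          // Connect back to the phone for pushed touch events.
          var ws = new WebSocket("ws://" + location.hostname + ":9000/");
          ws.onmessage = function (ev) {
            var msg = JSON.parse(ev.data);  // e.g. {"dx": 12, "dy": -3}
            x += msg.dx; y += msg.dy;
            img.style.transform = "translate(" + x + "px," + y + "px)";
          };
        </script>
      </body>
    </html>
    """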

App receives touch events.

Step 6 – Since FlyPic is otherwise a regular Android app (which just happens to be running an HTTP and WebSocket server), it can listen for touch events on the phone.

App forwards touch events to web-client.

Step 7 – When FlyPic gets touch events, it sends them to the page via the WebSocket connection that was established earlier. On the Pi, the webpage receives the touch events and moves the displayed picture back and forth to match what is happening on the phone.
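
FlyPic does this in Java, through NanoHTTPD’s WebSocket support. To make the data flow concrete without reproducing the Java, here is the same push logic sketched in Python using the third-party websockets package (version 10.1 or newer), with a timer standing in for Android’s touch listener:

    # Sketch of the phone-side push logic, in Python rather than the app's
    # actual Java. Requires the third-party "websockets" package (>= 10.1).
    # A timer stands in for Android's touch listener as the event source.
    import asyncio, json
    import websockets

    clients = set()  # connected browsers (the Pi's page)

    async def handler(ws):
        clients.add(ws)
        try:
            await ws.wait_closed()
        finally:
            clients.discard(ws)

    async def main():
        async with websockets.serve(handler, "0.0.0.0", 9000):
            while True:
                # Stand-in for a touch event arriving from the UI thread.
                websockets.broadcast(clients, json.dumps({"dx": 5, "dy": 0}))
                await asyncio.sleep(0.05)

    asyncio.run(main())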

And voilà! I’m using my phone as an app server, my Raspberry Pi as an app client, and “controlling” my app on the TV using my phone. At this point, I can pretty much do anything I want. I can make my app take voice input, or gyro input, or whatever. I can put arbitrarily complex logic in my app, and have it execute on the remote endpoint instead of on my phone.

What can we do with this?

I’m using a smartphone to leverage a remote CPU connected to the TV, to build arbitrary apps that run partly on the “endpoint” (TV) and partly on the “controller” (smartphone).

By leveraging web technologies, and pre-existing standard protocols, we can build rich applications that “send themselves” to devices around them, and then establish a rich interaction between the phone as a controller device, and the endpoint as a client device.

Use Cases?

I have some fun follow-up ideas for this application architecture.

One idea is presentations. No more hooking your laptop to your projector and getting screen sharing working. If your projector box acted as an endpoint device, your presentation app could effectively be written in HTML and CSS and JS, and run on the projector itself – animating all slide transitions using the CPU on the projector – but still take input from your phone (for swiping to the next and previous slides).

Another use case is games. Imagine a game-console that wouldn’t need to have games directly installed on it. You carry your games around with you, on your phone. High-performance games can be written in C++ and compiled to asm.js, using WebGL for graphics. When you’re near a console and want to use it to play, your game effectively “beams itself” to your game console, and you play it with your phone as a controller. When finished, you disconnect and walk off. Every game console can now play the games that you own.

Games can be rich and high-resolution, because they run directly on the game console, which has direct, low-latency access to the display. All the phone needs to do is shuttle the input events to the endpoint.

The phone’s battery life is preserved, since all the game computation is offloaded onto the endpoint. Network bandwidth is preserved, since we’re only sending events back and forth. And graphics can be amped up, since the game logic runs directly on the display device.

Games could also take advantage of the fact that the phone is a rich input device, and incorporate motion input, touch input, voice input, and camera input as desired. The game developer would be able to choose how much processing to do on the phone and how much to do on the console, and tailor that split to keep input from the phone to the game appropriately low-latency.

The Code

The source code to get this going on your own Pi and Android device is available at: https://github.com/kannanvijayan/browsercast.

Please note, this is a VERY rough implementation. Error handling is barebones, almost nonexistent. The Android app will crash outside of the simplest usage behaviour (e.g. when the network connection times out), because I have very little experience with Android programming, and I learned just enough to implement the demo. The Python scripts for the service discovery and the handshake HTTP server are a bit better put together, but still pretty rough.

Lastly, the HTTP server implementation I’m using is called NanoHTTPD (slightly modified), the original source for which is available here: https://github.com/NanoHttpd/nanohttpd

I’ve embedded the source for NanoHTTPD in my sources because I had to make some changes to get it to work. NanoHTTPD is definitely not a production webserver, and its WebSocket implementation is a bit unpolished. I had to hack it a bit to get it working correctly for my purposes.

Final Thoughts

It really seems like this design has some fundamental advantages compared to the usual “control remote devices with your smartphone” approach. I was able to put together a prototype with little more than a Raspberry Pi and some time. Every single part of this design is open:

1. Any web browser can be used for display. I’ve tested this with both Chrome and Firefox. Any browser that is reasonably modern will serve fine as a client platform.

2. Any hardware platform can be used as the endpoint device. You could probably take the Python scripts above and run them on any modern Linux-based device (including laptops). Writing the equivalent for Windows or OS X wouldn’t be hard either.

3. Nearly any smartphone can act as a controller. If a phone supports service discovery and creating an HTTP server on the local network, that phone can work with this design.

4. You can develop your app in any “server-side” language you want. My test app is written in Java for Android. I see no reason one wouldn’t be able to write an iOS app that did the same thing, in Objective-C or Swift.

5. You can leverage the massive toolchest of web display technologies, and web-app frameworks, to build your app’s client-side experience.

I’d like to see what other people can build with this approach. My smartphone app skills are pretty limited. I wonder what a skilled smartphone developer could do.