Take your phone, or, in the near future, contact lenses or some other augmenting lens, and look at other people. Tell me what you see. What do you want to see?
The initial pitch was simple. You pick your avatar, your digital outfit, and when other people look at you through their phones (or other lenses), they will be able to experience your true self. An augmented, full-body experience.
It took 30 seconds to pitch and triggered the imagination of most people we talked to. A world where we are able to look at each other differently. A richer interpersonal experience that will change the way we interact with one another. Dreamy.
On the subject of dreams, the original dream was to disrupt the fashion industry. Slaves no more to supply chains that pollute our rivers and oceans and force us to consume in order to fit in. Fast fashion will be replaced by augmented fashion. More fluid. Infinitely richer and more connected to our identities. Reactive to others. A new type of experience befitting a more interesting and balanced planet. Note that every year we produce more than 80 billion items of clothing, and that a single cotton shirt needs around 700 gallons of water to produce.
How do we create Glimpse? In order to fulfil this vision we needed the computer to see the human body. To tell it apart from the rest. If you want to sculpt an elephant out of a rock, you chip away everything that is not the elephant. We had to accurately tell human bodies from their surroundings. Only then would we be able to overlay avatars and fashion and dreaminess.
At Impossible we had been working on Google Tango for a while, and when that project faded away we saw an opportunity to try and solve the technical challenges needed to fulfil Glimpse's original vision.

Body pose estimation
It took a while to get a model that could accurately estimate a person's body pose and overlay a skeleton in real time. The code for it can be found here. Chris and Rob and Pawel did a great job. Yes, it uses machine learning. No, it does not use convolutional neural networks. It uses the depth map afforded by the iPhone's front-facing camera to help accelerate the decision tree model.
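To give a flavour of why a depth map accelerates this kind of model: the classic depth-based approach (Shotton et al.'s Kinect work, which decision-tree body pose systems typically build on) classifies pixels using simple depth-difference features whose probe offsets are scaled by the depth at each pixel, making them cheap to evaluate and invariant to distance. The sketch below is illustrative only; the function and the toy "tree" are my own assumptions, not Glimpse's actual code.

```python
import numpy as np

def depth_feature(depth, x, y, u, v):
    """Depth-difference feature in the style of Shotton et al.:
    probe offsets u and v are scaled by 1/depth at (x, y), so the
    feature responds the same way whether the person is near or far."""
    d = depth[y, x]
    ux, uy = int(x + u[0] / d), int(y + u[1] / d)
    vx, vy = int(x + v[0] / d), int(y + v[1] / d)
    h, w = depth.shape
    # Probes that fall outside the frame are treated as far background.
    du = depth[uy, ux] if 0 <= ux < w and 0 <= uy < h else 1e6
    dv = depth[vy, vx] if 0 <= vx < w and 0 <= vy < h else 1e6
    return du - dv

# Toy depth map: a "person" at 2 m standing in front of a wall at 5 m.
depth = np.full((8, 8), 5.0)
depth[2:7, 3:6] = 2.0

# A single decision stump standing in for a tree: a large positive
# response means the left-hand probe landed on background, i.e. this
# pixel sits near the person's silhouette edge.
f = depth_feature(depth, x=3, y=4, u=(-4.0, 0.0), v=(0.0, 0.0))
label = "edge" if f > 1.0 else "interior"
print(f, label)  # → 3.0 edge
```

A real system evaluates a forest of such trees per pixel to assign body-part labels, then infers joint positions from the labelled regions, which is far cheaper than running a convolutional network on every frame.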
Because the process took so long (almost two years, part of which was lost in Tango land), we started getting jittery. Is there a product we can launch, we thought, that will just prove the technology and solve a problem? Is this avatar idea too far-fetched? Who are we to disrupt the fashion industry? Let's focus on a real problem, we thought.
As we saw it, this tech can and will be used primarily in three sectors: entertainment, health and security.
The entertainment aspect of it is easily imagined when everyone's an avatar. We imagined tribes that fight for geo supremacy in a Niantic-style game. We looked at augmenting our bodies to help with dating. Virtual try-ons to help automate and enrich the shopping experience, one where you could produce and purchase the virtual garment for a fraction of the price of the physical one.
Security felt like an inevitable outlet for this type of tech. Body pose estimation is an obvious companion to face tracking. There are certain poses that arouse suspicion in given, normalised situations. The way your body moves is, after all, a reaction to a mental state. Maybe you've just stolen something from the candy aisle, maybe you're feeling depressed, in love… we will be able to track your inner world of emotions through a combination of face and body pose estimation. We don't really want to, but we will.
As a company we had been deepening our product dev expertise in health, and when we started to get jittery with how long we had been working on the body pose estimation tech (in hindsight a mistake), we pivoted to a health product. By the way, first lesson to take away: it's not considered a pivot if you pull out way before failing. It's more aptly called a waste of energy and bad planning. Regardless, a health pivot felt more aligned with our thinking at the time, and also with the state of our tech. Being useful, human centric… We imagined building a business that uses computer vision to guide people through a set of exercises in various disciplines: yoga, tai chi… physical rehabilitation. A personal trainer, Alexa style. After all, why look at a screen when you can have a voice look at you, estimate your body pose against a database of masters, and tell you what pose to strike next? We called it Stand. [Screengrab of the site, now offline].
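The core mechanic of guiding someone against a database of master poses can be sketched simply: compare joint angles rather than raw keypoint coordinates, so the score doesn't depend on where the person stands or how large they appear in the frame. This is a hypothetical sketch of one such comparison, not Stand's actual scoring code; the joint set and threshold are illustrative assumptions.

```python
import numpy as np

def angle(a, b, c):
    """Angle at joint b, in radians, between segments b->a and b->c."""
    ba, bc = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosine = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.arccos(np.clip(cosine, -1.0, 1.0))

def pose_score(user, ref):
    """Absolute joint-angle error between two hip-knee-ankle triples;
    0 means the user's leg matches the reference pose exactly."""
    return abs(angle(*user) - angle(*ref))

# 2D keypoints (hip, knee, ankle) for one leg, for brevity.
straight_leg = [(0, 0), (0, 1), (0, 2)]  # joints in a line
bent_leg = [(0, 0), (0, 1), (1, 1)]      # knee bent 90 degrees

print(pose_score(straight_leg, straight_leg))            # → 0.0
print(round(pose_score(straight_leg, bent_leg), 2))      # → 1.57 (pi/2)
```

A voice interface would sit on top of scores like this one, computed per joint, telling the user which joint to adjust and by roughly how much before moving to the next pose.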
As a venture builder, most of the ventures we create will be nothing but learning exercises. Failing fast and sometimes not so fast. Expensive learning exercises at best. From a vision perspective it’s great to now see similar companies and concurrent goals come to life. It proves that our instinct is correct. You’re never alone with a good idea. Timing and luck play an important role. We tend to be early.
At a deeper level, what is at stake is a more meaningful interface between computers and humans. Ten years ago computers did not understand our voice, and they could not recognise the world around them through imagery. Today they do, and in many cases more accurately than us. The combination of more powerful machine learning models and the availability of data (our voice and imagery) has made computers better than us at specific tasks. Mechanisation created a new reality in the 18th and 19th centuries. Humans went from machines to machine minders. Mechanisation gave way to automation. A new age for mankind and the planet. Humans can just focus on being creatively human. As computers begin to see and hear more accurately than us, we will be able to shed layers of vulgar style, brand and performance and go straight to the core of what makes us unique: our emotional and health identity and our ability to dream.
The future we have been working on, and desire, is one where visions like those of Glimpse and Stand become a reality. One to disrupt the fashion industry, the other a full-body health companion.
A surprisingly more planet centric reality, just around the corner.