Long Cycle Innovation: Enabling New Ways To Search and Browse
"CRAIG MUNDIE: What I'd like to do now is have Blaise Aguera y Arcas come out. He's an architect who joined our Live Labs Group through an acquisition earlier this year. Blaise, great to see you.
BLAISE AGUERA Y ARCAS: Thanks, Craig.
CRAIG MUNDIE: And what I'm going to show you is that, while we do long-cycle innovation, we really are also focused on how we can take great technologies and blend them together with these things we developed over a much longer period of time through our research assets to develop very compelling products, and as part of the live capability be able to accelerate their availability in the marketplace -- so that they complement the long-cycle delivery platforms we've got in the case of the PC or Office or the basic Windows Mobile technologies, or even television technologies. It allows us to add value on top of those things on a much more accelerated basis.
So, Blaise, let's talk about what you've got going here. What is that big dot they see?
BLAISE AGUERA Y ARCAS: Up on the screen now is our dowry. So this is some technology that I brought to Microsoft from my startup, just acquired at the end of 2005. So this is Seadragon technology. It's a method for interacting with very large volumes of visual information very rapidly. So these are mostly cell phone pictures, although we have a few here that are really large, like this map that is in the 100-megapixel range. This is an experience that one can have over an ordinary broadband or even narrowband connection. A thin DSL pipe can do this.
CRAIG MUNDIE: So this is some technology that allows us to take essentially pictures of any size from something you take as a low-res picture on your cell phone to something that is in fact entire documents that are represented as a single image of ultimately the resolution necessary to read. I think you're going to show them one of those now?
BLAISE AGUERA Y ARCAS: Right. So -- well, this is actually another document type now, which is all of "Bleak House," the entire book. Every column is a chapter. And you can see this is not an image. This is real text. So this is the kind of technology that we expect is going to be really changing quite a number of things at Microsoft in the coming years. We've actually done this on cell phones as well.
CRAIG MUNDIE: So this is a new model of navigation. You just zoom around in a two-dimensional space in this case.
So what else do you think we can do with this?
BLAISE AGUERA Y ARCAS: Well, so, a couple of months after the acquisition -- this is now only about four months ago -- we were -- our acquisition I should mention was driven by technical fellow Gary Flake, who founded Live Labs, the idea of Live Labs being to really shorten the innovation cycle dramatically and to bring a lot of the interesting things happening in Microsoft Research very quickly to prototype and to market.
So a couple of months after that acquisition I saw an amazing demo of some research at Microsoft Research at TechFest, which is the fair for those things. Can we go to video?
What these guys had done is develop a system that allows you to take a bunch of images -- in this case these are images tagged "Trevi Fountain." Here they're mined from Flickr, so there are lots of cameras, lots of times of day, times of year.
CRAIG MUNDIE: So Flickr is a Web site -- has nothing to do with Microsoft -- where people put their photos up there, and then the community can tag them. So you can put it in and say it's Trevi Fountain. Somebody else comes along, says, “Oh, I know what that is -- that's Trevi Fountain.” And they're adding tags to these pictures. So here they basically took them. And --
BLAISE AGUERA Y ARCAS: Well, what they're able to do is figure out from those images alone what the three-dimensional model was of the Trevi Fountain. That's what's being shown on the screen now. Each of those triangles is the location of a particular camera. So you can simultaneously solve for the geometry of what you're looking at, as well as where the camera was.
CRAIG MUNDIE: So even though none of those people knew each other, none of the pictures were taken at the same time or from the same place --
BLAISE AGUERA Y ARCAS: Right, they're cell phone pictures.
CRAIG MUNDIE: They were able to take them all up and make a 3-D model of that fountain.
BLAISE AGUERA Y ARCAS: Exactly.
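[Illustration, not part of the spoken remarks. The idea of solving for the scene geometry and the camera locations at the same time can be sketched as a single error measure that depends on both sets of unknowns. The Python below is a toy under stated assumptions, not the researchers' actual method: cameras differ only by position (no rotation), share one focal length, and all names such as project and reprojection_error are hypothetical. It only shows that the error is zero when both the points and the cameras are right, and grows when either is wrong.]

```python
# Toy setup (assumed, not from the talk): every camera looks along +z,
# has no rotation, and uses the same focal length.
FOCAL = 1.0

def project(point, camera):
    """Pinhole projection of a 3-D point into a camera located at `camera`."""
    x, y, z = (p - c for p, c in zip(point, camera))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (FOCAL * x / z, FOCAL * y / z)

def reprojection_error(points, cameras, observations):
    """Sum of squared image-plane differences over all sightings.

    `observations` maps (camera_index, point_index) -> observed (u, v).
    Both `points` and `cameras` are unknowns in the real problem.
    """
    total = 0.0
    for (ci, pi), (u_obs, v_obs) in observations.items():
        u, v = project(points[pi], cameras[ci])
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total

# Made-up ground truth, used only to generate synthetic observations.
true_points = [(0.0, 0.0, 5.0), (1.0, 0.5, 6.0), (-1.0, 1.0, 4.0)]
true_cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]

observations = {
    (ci, pi): project(p, c)
    for ci, c in enumerate(true_cameras)
    for pi, p in enumerate(true_points)
}

# Correct geometry and correct camera positions: the error vanishes.
error_at_truth = reprojection_error(true_points, true_cameras, observations)

# Misplace one camera: the same observations no longer fit.
wrong_cameras = [true_cameras[0], (1.5, 0.0, 0.0), true_cameras[2]]
error_wrong_camera = reprojection_error(true_points, wrong_cameras, observations)

# Misplace one scene point: the error also becomes nonzero.
wrong_points = [(0.0, 0.0, 7.0)] + true_points[1:]
error_wrong_point = reprojection_error(wrong_points, true_cameras, observations)
```

[In this toy, the misplaced point still projects to the same spot in the first camera, because it was moved along that camera's line of sight. Only the other cameras reveal the mistake, which is one way to see why many photos from different positions help.]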
CRAIG MUNDIE: So what did that inspire you to do?
BLAISE AGUERA Y ARCAS: Well, of course when I first saw this one of the first things I wanted to do was put it together with our stuff. And this is the result. This is only after about four months of work, so you'll excuse me if it crashes. But this is a collection of images that have been synthesized together using that technology. These are a few hundred images of St. Peter's Basilica in Rome, taken by one of our guys in Italy a few weeks ago. And --
CRAIG MUNDIE: So he just wandered around St. Peter's and took a bunch of pictures?
BLAISE AGUERA Y ARCAS: He wandered around, took a bunch of these pictures. These white boxes are where those pictures were taken. And so let's zoom around. He went up to the top of the cupola. And here's that picture from above. He took these images from the top.
And you can see what's happening here is that all these images are being registered together in 3-D, and they give you an experience that's almost game-like of moving around in the space. All these are places where we stood and took shots in the center, and what that shot was taken of. We can move around from image to image in this way. So it's sort of halfway between a game and a slide show.
CRAIG MUNDIE: So, the 3-D model, which was synthesized by the machine from all the pictures, produces the navigation metaphor. And the Seadragon technology allows us to stream the entire collection of photos to you, just hooking them all together seamlessly, so you're operating within that 3-D space.
BLAISE AGUERA Y ARCAS: Exactly. So this is pretty interesting technology just for collections of images like this. But where we really think this comes into its own is when one thinks about what can happen when we take this technology and deploy it at Web scale, which we'll be doing. So this technology with canned environments we'll be releasing as a technical preview in the fall. And the scaling up to the Web we'll see. But the idea is if we can incorporate this into the Web crawler for images, then we can build up organically a three-dimensional model of the entire world. And that model is built entirely out of those photos in an unstructured way. It can incorporate everything from satellite and aerial photographs that give you a sense of what cities look like from high above down to street-level shots and down to close-ups. And it's one of those really revolutionary kind of paradigm-shifting things. We believe that this can give you a new way of interacting not only with images, but with the information behind it.
CRAIG MUNDIE: And a new way to get information and to discover things. That's fabulous. Thanks a lot. Thanks for sharing with us.
BLAISE AGUERA Y ARCAS: Thanks so much, Craig.