
    Sunday, May 10, 2009

    Google - The 2008 Founders' Letter



    Introduction

    Since 2004, when Google began publishing annual reports, Larry and I have taken turns writing an annual letter. I never imagined I would be writing one in the midst of an economic crisis unlike any we have seen in decades. As I write this, search queries are reflecting economic hardship, the major market indexes stand at half of what they were less than 18 months ago, and unemployment is at record levels.

    Nonetheless, I am optimistic about the future, because I believe scarcity breeds clarity: it focuses minds, forcing people to think creatively and rise to the challenge. While much smaller in scale than today's global collapse, the dot-com bust of 2000-2002 pushed Google and others in the industry to make some tough decisions, and we all emerged stronger as a result.

    This new crisis punctuates the end of our first decade as a company, a decade that has brought great change to Google, the web, and the Internet as a whole. As I reflect on this short period, on both our accomplishments and our shortcomings, I am very excited about what the next ten years may bring.

    But let me start a little farther back. In 1990, the very first web page was created at http://info.cern.ch/. By late 1992, there were only 26 websites in the world, so there was not much need for a search engine. When NCSA Mosaic (the first widely used web browser) came out in 1993, new websites were listed on its "What's New" page at a rate of about one a day: http://www.dejavu.org/prep_whatsnew.htm. Just five years later, in 1998, web pages numbered in the tens of millions, and search became crucial. At that point, Google was a small research project at Stanford; later that year it became a tiny startup. The search index sat on a small number of disk drives enclosed within Lego-like blocks. Perhaps a few thousand people, mostly academics, used the service.

    Fast-forward to today, and the changes in scale are striking. The web itself has grown by about a factor of 10,000, as has our search index. The number of people who use Google's services every day is now in the hundreds of millions. More importantly, billions of people now have access to the Internet via computers and mobile phones. Like many other web companies, we make the vast majority of our services available worldwide and free to users because they are supported by ads. So a child in an Internet cafe in a developing nation can use the same online tools as the wealthiest person in the world. I am proud of the small role Google has played in the democratization of information, but there is much more left to do.

    Search

    Search remains at the very core of what we do at Google, just as it has been since our earliest days. Just as the scale has changed dramatically over the years, the presentation and quality of our search results have also undergone many changes since 1998. In the past year alone we made 359 changes to our web search, nearly one per day. Some are not easy to spot, such as changes in ranking based on personalization (launched broadly in 2005), but they are important in getting the most relevant search results. Others are easy to see and clearly improve search efficiency, such as spelling correction, annotations, and suggestions.

    While I am proud of what has been accomplished in search over the past decade, there are important areas in which I wish we had made more progress. Perfect search requires human-level artificial intelligence, which many of us believe is still quite distant. However, I think it will soon be possible to have a search engine that "understands" more of the queries and documents than we do today. Others claim to have accomplished this, and Google's systems have more smarts behind the curtains than may be apparent from the outside, but the field as a whole is still shy of where I would have expected it to be. Part of the reason is the dramatic growth of the web: for any particular query, it is likely there are many documents on the topic using the exact same vocabulary. And as the web grows, so do the breadth and depth of the curiosity of those searching. I expect our search engine to become much "smarter" in the coming decade.

    So too will the interfaces by which users look for and receive information. While many things have changed, the basic structure of Google search results today is fairly similar to how it was ten years ago. This is partly because of the benefits of simplicity; in fact, the Google homepage has become increasingly simple over the years: http://blogoscoped.com/archive/2006-04-21-n63.html. But we are starting to see more significant changes in search interfaces. Today you can search from your cell phone just by speaking into it, and Google Reader can suggest interesting blogs without any query at all. I expect that in the next decade our searches and results will look very different from how they look today.

    One of the most striking changes that has happened in the past few years is that search results are no longer just web pages. They include images, videos, books, maps, and more. From the outset, we realized that to have comprehensive search we would have to venture beyond web pages. In 2001, we launched Google Image Search and via Google Groups we made available and searchable the most comprehensive archive of Usenet postings ever assembled (800 million messages dating back to 1981).

    Just this past fall we expanded Image Search to include the LIFE Magazine photo archive. This is a collection of 10 million photos, more than 95 percent of which have never been seen before, and includes historical pictures such as the Skylab space station orbiting above Earth and Neil Armstrong landing on the moon. Integrating images into search remains a challenge, primarily because we are so reliant on the surrounding text to gauge a picture's relevance. In the future, using enhanced computer vision technology, we hope to be able to understand what's depicted in the image itself.

    YouTube

    Video is often thought of as an entertainment medium, but it is also a very important source of high-quality information. Some queries seem like natural choices to show video results, such as for sports and travel destinations. Yet videos are also great resources for topics such as computer hardware and software (I bought my last RAID based on a video review), scientific experiments, and educational material such as courses on quantum mechanics.

    Google Video was first launched in 2005 as a search service for television content, because TV closed captioning made search possible and user-generated video had yet to take off. But it subsequently evolved into a site where individuals and corporations alike could post their own videos. Today Google Video searches many different video hosting sites, the largest of which is YouTube, which we acquired in 2006.

    Every minute, 15 hours' worth of video is uploaded to YouTube: the equivalent of 86,000 new full-length movies every week. YouTube channels now include world leaders (the President of the United States and the prime ministers of Japan, the UK, and Australia), royalty (the Queen of England and Queen Rania of Jordan), religious leaders (the Pope), and those seeking free expression (when the Venezuelan broadcaster El Observador was shut down by the government, it started broadcasting on YouTube).
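
    As a rough check on this comparison, assuming a 168-hour week and treating "full-length movie" simply as a unit of running time, the upload rate converts as follows. The implied running time of roughly 105 minutes per movie is an inference from the two figures above, not a number stated in the letter.

    ```latex
    15\,\frac{\text{h of video}}{\text{min}} \times 60\,\frac{\text{min}}{\text{h}} \times 168\,\frac{\text{h}}{\text{week}}
      = 151{,}200\,\frac{\text{h of video}}{\text{week}},
    \qquad
    \frac{151{,}200\ \text{h}}{86{,}000\ \text{movies}} \approx 1.76\ \text{h} \approx 105\ \text{min per movie}
    ```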

    When it began, online video was associated with small fuzzy images. Today, many of our uploads are in HD quality (720 rows of pixels or more) and can be streamed to computers, televisions, and mobile phones with increasing fidelity (thanks to improvements in video compression). In the future, vast libraries of movie-theater-quality video (4,000 or more columns of pixels) will be available instantly on any device.

    Books

    Books are one of the greatest sources of information in the world, and from the earliest days of Google we hoped to eventually incorporate them into our search corpus. Within a couple of years, Larry was experimenting with digitizing books using a jury-rigged contraption in our office. By 2003, we had launched Google Print, now called Google Book Search. Today, we are able to search the full text of almost 10 million books. Moreover, in October we reached a landmark agreement with a broad class of authors and publishers, including the Authors Guild and the Association of American Publishers. If approved by the Court, this deal will make millions of in-copyright, out-of-print books available for U.S. readers to search, preview, and buy online, something that has simply been unavailable to date. Many of these books are difficult, if not impossible, to find because they are not sold through bookstores or held on most library shelves; yet they make up the vast majority of books in existence. The agreement also provides other important public benefits, including increased access for users with disabilities, the creation of a non-profit registry to help others license these books, the creation of a corpus to promote basic research, and free access to full texts at a kiosk in every public library in the United States.

    Geo

    While digitizing all the world's books is an ambitious project, digitizing the world is even more challenging. Beginning with our acquisition of Keyhole (the basis of Google Earth) in October 2004, it has been our goal to provide high-quality information for geographic needs. By offering both Google Earth and Google Maps, we aim to provide a comprehensive world model encompassing all geographic information, including imagery, topography, roads, buildings, and annotations. Today we stitch together images from satellites, airplanes, cars, and user uploads, and we collect important data, such as roads, from numerous sources, including governments, companies, and users themselves. After the launch of Google Map Maker in Pakistan, users mapped 25,000 kilometers of uncharted road in just two months.

    Ads

    We have always believed that we could have an advertising system that would add value not only to our bottom line but also to the quality of our search result pages. Rather than relying on distracting flashy ads, we developed relevant, clearly marked text-based ads above and to the right of our search results. After a number of early experiments, our first self-service system, AdWords, launched in 2000 with 350 advertisers. Although these ads earned little compared with banner ads at the time, the system became our life preserver when the dot-com bubble burst. As we syndicated it to EarthLink and then AOL, it became an important source of revenue for other companies as well.

    Today, AdWords has grown beyond just being a feature of Google. It is a vast ecosystem that provides valuable traffic and leads to hundreds of thousands of businesses; indeed, in many ways it has helped democratize access to advertising by creating an open marketplace where small businesses and start-ups can compete with well-established, well-funded companies. AdWords is also an important source of revenue for the websites that create the content we all search. Last year, AdSense (our publisher-facing program) generated more than $5 billion of revenue for our many publishing partners.

    Also in the past year, we ventured further into other advertising formats with the acquisition of DoubleClick. This may seem at odds with the value we place on relevant text-based ads. However, we have found that richer ad formats have their place, such as video ads within YouTube and dynamic ads on game websites. In fact, we now also serve video ads on television with our AdSense for TV product. Our goal is to match advertisers and publishers using the formats and media most appropriate to their goals and audiences.

    Despite the progress in our advertising systems and the growth of our base of advertisers, I believe there are significant improvements still to be made. While our ad system has powerful features, it is also complex, and it can confuse many small and local advertisers whose products and services could be very useful to our users. Furthermore, our ad formats are not an ideal way to browse through large numbers of products. In the next decade, I hope we can more effectively incorporate commercial offerings from the tens of millions of businesses worldwide and present them to consumers when and where they are most useful.

    Apps

    Within a couple of years of our founding, a number of colleagues and I were starting to hit the limitations of our traditional email clients. Our mailboxes were too big for them to handle quickly and reliably. It was difficult or impossible to keep email available and synchronized when switching between different computers and platforms. Furthermore, remote email access required a VPN (virtual private network), so everyone was constantly connected over VPN, which created extra security risks. Searching mail was slow, awkward, and cumbersome.

    By the end of 2001 we had a prototype of Gmail that was used internally. Like several existing services at the time, it was web-based. But unlike those services, it was designed for power users with high volumes of email. While our initial focus was on internal usage, it soon became clear we had something of value for the whole world. When Gmail was launched externally in 2004, other top webmail sites offered 2MB and 4MB mailboxes, less than the size of a single attachment I might find in a message today. Gmail offered 1 gigabyte of storage at launch, along with full-text search and a host of other features not previously found in webmail. Since then, Gmail has continued to push the envelope of email systems, adding functionality such as instant messaging, video-conferencing, and offline access (launched in Gmail Labs this past January). Today some Googlers have more than 25 gigabytes of email going back nearly 10 years that they can search through in seconds. By the time you read this, you should be able to receive emails written in French and read them in English.

    The benefits of web-based services, also known as cloud computing, are clear. There is no installation. All data is stored safely in a data center (no worries if your hard drive crashes). It can be accessed anytime, anywhere there is a working web browser and Internet connection (and sometimes even if there is not one — see below).

    Perhaps even more importantly, new forms of communication and collaboration become possible. I am writing this letter using Google Docs. Several other people are helping me edit it simultaneously. Moments ago I stepped away and worked on it on a laptop. Without my having to hit save or manage any synchronization, all the changes appeared within seconds on the desktop I am now using again. In fact, today I have worked on this document using three different operating systems and two different web browsers, all without any special software or complex logistics.

    In addition to Gmail and Google Docs, the Google Apps suite of products now includes Spreadsheets, Calendar, Sites, and more. It is also now available to companies, universities, and other organizations. In fact, more than 1 million organizations use Google Apps today, including Genentech, the Washington D.C. city government, the University of Arizona, and Gothenburg University in Sweden.

    Because tens of millions of consumers already use our products, it is easy for organizations, from businesses to non-profits, to adopt them. Very little training is required, and the passionate Google users already in these organizations are usually excited to help those who need a hand. In many ways, Google Apps are even more powerful in a business or group than they are for individuals, because Apps can change the way businesses operate and the speed at which they move. For example, with Google Apps Web Forms we innovated by addressing the key problem of distributed data collection, making it incredibly simple to collect survey data from within the enterprise, a critical feature for collecting the internal feedback we use extensively when "dogfooding" all of our products.

    There are a number of things we could improve about these web services. For example, because they arose from different groups and acquisitions, there is less uniformity across them than there should be; they can have different sharing models and chat capabilities. We are working to shift all of our applications to a common infrastructure. I believe we will achieve this soon, creating greater uniformity and capability across all of them.

    Chrome

    We have found the web-based service model to have significant advantages. But it also comes with its own set of challenges, primarily related to web browsers, which can be slow, unreliable, and unable to function offline. Rather than accept these shortcomings, we have sought to remedy them in a number of ways. We have contributed code and generated revenue for several existing web browsers like Mozilla Firefox, enabling them to invest more in their software. We have also developed extensions such as Google Gears, which allows a browser to function offline.

    In the past couple of years, however, we decided that we wanted to make some substantial architectural changes to how web browsers work. For example, we felt that different tabs should be segregated into separate sandboxes so that one poorly functioning website does not take down the whole browser. We also felt that, to continue building great web services, we needed much faster JavaScript performance than current browsers offered.

    To address these issues we have created a new browser, called Google Chrome. It has a multiprocess model and a very fast JavaScript engine we call V8. There are many other notable features, so I invite you to try it out for yourself. Chrome is not yet available on Mac and Linux, so many of us, myself included, are not able to use it on a regular basis. If all goes well, this should be addressed later this year. Of course, this is just the start, and Chrome will continue to evolve. Furthermore, Chrome has spurred other web browsers to improve in areas such as JavaScript performance, making everyone better off.

    Android

    We first created mobile search for Google back in 2000 and then we started to create progressively more tailored and complex mobile offerings. Today, the phone I carry in my pocket is more powerful than the desktop computer I used in 1998. It is possible that this year, more Internet-capable smartphones will ship than desktop PCs. In fact, your most "personal" computer, the one that you carry with you in your pocket, is the smartphone. Today, almost a third of all Google searches in Japan are coming from mobile devices — a leading indicator of where the rest of the world will soon be.

    However, mobile software development has been challenging. There are many different mobile platforms, each customized for a particular combination of device and carrier. Furthermore, deploying mobile applications can require separate business arrangements with individual carriers and manufacturers. While the rise of app stores from Apple, Nokia, RIM, Microsoft, and others, as well as the adoption of HTML5 on mobile platforms, have helped, it is still very difficult to provide a service to the largest group of network-connected people in the world.

    We acquired the startup Android in 2005 and set out to achieve the ambitious goal of creating a new mobile operating system that would allow open interoperation across carriers and manufacturers. Last year, after a lot of hard work, we released Android to the world. Because it is open source, anyone is free to use and modify it. We look forward to seeing how this open platform will spur greater innovation. Furthermore, Android makes it easy to create applications that can be deployed on any Android device. To date, more than 1000 apps have been uploaded to the Android Market, including Shop Savvy (which reads bar codes and then compares prices), our own Latitude, and Guitar Hero World Tour.

    AI

    The past decade has seen tremendous changes in computing power amplified by the continued growth of Google's data centers. It has enabled the growth and processing of increasingly large data sets such as the web, the world's books, and video. This in turn has allowed problems once considered to be in the fantasy realm of artificial intelligence to come closer to reality.

    Google Translate supports automatic machine translation between 1640 language pairs. This is made possible by large computer clusters and vast repositories of monolingual and multilingual texts: http://www.google.com/intl/en/help/faq_translation.html. This technology also allows us to support translated search where the query gets translated to another language and the results get translated back.
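
    One way to read the 1640 figure, assuming (the letter does not say so) that each direction counts as a separate pair, so English to French and French to English are two pairs, and that every supported language translates into every other: n languages then give n(n-1) ordered pairs, and solving for n yields 41 languages.

    ```latex
    n(n-1) = 1640
    \;\Longrightarrow\; n^2 - n - 1640 = 0
    \;\Longrightarrow\; n = \frac{1 + \sqrt{1 + 4 \cdot 1640}}{2} = \frac{1 + \sqrt{6561}}{2} = \frac{1 + 81}{2} = 41
    ```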

    While the earliest Google Voice Search ran as a crude demo in 2001, today our own speech recognition technology powers GOOG-411, the voice search feature of the Google Mobile App, and Google Voice. It, too, takes advantage of large training sets and significant computing capability. Last year, PicasaWeb, our photo hosting site, released face recognition, bringing a technology that is on the cutting edge of computer science to a consumer web service.

    Just a few months ago we released Google Flu Trends, a service that uses our logs data (without revealing personally identifiable information) to predict flu incidence weeks ahead of estimates by the Centers for Disease Control (CDC). It is amazing how an existing data set typically used for improving search quality can be brought to bear on a seemingly unrelated issue and can help to save lives. I believe this sort of approach can do even more, going beyond monitoring to inferring potential causes and cures of disease. This is just one example of how large data sets such as search logs, coupled with powerful data mining, can improve the world while safeguarding privacy.

    Conclusion

    Given the tremendous pace of technology, it is impossible to predict far into the future. However, I think the past decade tells us some things to expect in the next. Computers will be 100 times faster still and storage will be 100 times cheaper. Many of the problems that we call artificial intelligence today will become accepted as standard computational capabilities, including image processing, speech recognition, and natural language processing. New and amazing computational capabilities will be born that we cannot even imagine today.
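
    To put the hundredfold projection in yearly terms, here is a sketch that assumes steady compound growth over exactly ten years (an idealization, not a claim made in the letter). The implied annual improvement factor r and the corresponding doubling time T work out to about 58% per year and roughly 18 months:

    ```latex
    r = 100^{1/10} = 10^{0.2} \approx 1.585,
    \qquad
    T = \frac{\ln 2}{\ln r} = \frac{10 \ln 2}{\ln 100} \approx \frac{6.931}{4.605} \approx 1.5\ \text{years}
    ```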

    While about half the people in the world are online today via computers and mobile phones, the Internet will reach billions more in the coming decade. I expect that by using simple yet powerful models of computing such as web services, everyone will be more productive. These tools enable individuals, small groups, and small businesses to accomplish tasks that only large corporations could achieve before, whether it is making and releasing a movie, marketing a product, or reporting on a war.

    When I was a child, researching anything involved a long trip to the local library and a good deal of luck that one of the books there would be about the subject of interest. I could not have imagined that today anyone would be able to research any topic in seconds. The dark clouds currently looming over the world economy are a hardship for us all, but by the time today's children grow up, this recession will be a footnote in history. Yet the technologies that we create between now and then will define their way of life.
