Google Compute Engine: An overview
(Originally posted on July 4, 2012)
Google Compute Engine (hereafter GCE) is the Infrastructure as a Service (IaaS) offering by Google. As such, it compares, and competes, directly with the IaaS offered by Amazon (AWS). Imitation is the sincerest form of flattery, as the saying goes, but I am fairly confident that comforting thoughts of adulation are not what preoccupy the minds of Amazon executives right now.
One can only assume that Google is trying to enter a field that Amazon dominates because there is money in it. In order to take a fair share of the market, they must offer a better product than AWS (or the other incumbents) and do so with serious incentives. So, can they do that?
We do not have the answer to that question, of course. What is publicly available today is fairly limited for use in production. However, given the engineering potency of Google, it is worth having an early and careful look at GCE.
The GCE offering breaks down into the following broad categories: computing, networking, storage, and tooling.
The first three are the fundamental blocks of any data center. The last is crucial for gaining adoption: even if computing, networking, and storage are less expensive or work better on GCE than, say, on AWS, users still need to be able to set up an environment, operate it, and integrate it with some in-house infrastructure. Moreover, just offering these capabilities will not be enough for Google; the tools must be user friendly and integrate well with existing development environments as well as data center environments (e.g. vSphere from VMware).
It remains to be seen whether the speed, scale, and global footprint that Google has achieved for its own operations will hold up when other customers jump on the same data centers. For example, right now, it seems that GCE offers only two “zones” (us-central1-a and us-east1-a) and only 8 types of instances; the former limitation renders GCE more “local” than “global”, and the latter indicates fewer options for setting up exactly the cloud infrastructure that you need. Compare that situation with Amazon’s eight regions (each of which has two or more availability zones), which include US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), and South America (Sao Paulo). Amazon has proved that they can do that and do it well. Google might be ranked first based on traffic, but most of that traffic is on the search site, which is fairly simplistic compared to, say, the Amazon retail site. Lastly, it should be noted that, so far, the overall Google infrastructure has been running only Google products.
The equivalent of the AWS web console is the Google APIs Console, which is the “one-stop shop” for creating and managing multiple API projects — a project is the concept used by Google for grouping GCE resources, defining team membership to these resources, defining group ownerships, and billing for the use of these resources.
There is also a command line utility (the gcutil tool) that runs on Linux distros and Mac OS X, provided that Python 2.6.x or 2.7.x is installed; currently, the gcutil tool does not support Python 3.x. On MS Windows, the gcutil tool is supported if you install Cygwin.
Authentication and authorization for GCE are based on the OAuth 2.0 protocol. The SSH connection to a running instance seems to be established by the following command:
ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /Users/<someuser>/.ssh/google_compute_engine -A -p 22 <someuser>@<IP>
At this time, it is clear neither how the SSH key pairs are created and maintained nor which formats or key lengths are accepted. For comparison, AWS does not accept DSA keys, and the allowed key lengths are 1024, 2048, and 4096 bits.
The default instances seem to be Ubuntu based, and the name of the host is set by the name of the instance via the Dynamic Host Configuration Protocol (DHCP). The API is JSON over HTTP and it is accessible through three main channels: the gcutil tool (and its siblings, I suppose), the UI console, and code (through open source libraries and third parties).
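To make the structure of the SSH command above easier to see, here is a minimal Python sketch that assembles the same argument list. The function name `build_ssh_command` and the sample user, IP, and key path are hypothetical placeholders of mine, not anything GCE provides; the options themselves are copied from the command as quoted. Note that these options turn off host key verification, which is convenient for short-lived instances but means that the identity of the remote host is not checked.

```python
import shlex


def build_ssh_command(user, ip, key_path, port=22):
    """Return the argv list for the SSH invocation quoted in the post.

    Hypothetical helper: it only assembles arguments, it does not connect.
    """
    return [
        "ssh",
        "-o", "UserKnownHostsFile=/dev/null",   # do not record the host key
        "-o", "CheckHostIP=no",                 # skip the IP check against known hosts
        "-o", "StrictHostKeyChecking=no",       # accept unknown host keys
        "-i", key_path,                         # private key used for the login
        "-A",                                   # forward the authentication agent
        "-p", str(port),
        f"{user}@{ip}",
    ]


# Placeholder values (203.0.113.10 is a documentation-only address).
cmd = build_ssh_command(
    "alice", "203.0.113.10", "/Users/alice/.ssh/google_compute_engine"
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would start the session; printing it first is a cheap way to check what is about to be executed.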
From a hardware perspective, the infrastructure seems to be built on Intel Sandy Bridge. The CPU utilization is based on the concept of a GCEU (Google Compute Engine Unit), which is a unit of CPU capacity that describes the compute power of GCE instance types. The minimum power of one logical core (a hardware hyper-thread) on the GCE Sandy Bridge platform is considered equivalent to 2.75 GCEU. The number of virtual cores can be 1, 2, 4, or 8 depending on the machine (instance) type. The number of GCEUs is proportional (x 2.75) to these numbers, of course. You can have ephemeral as well as persistent storage; the latter ranges between 128 GB and 1024 GB. The amount of memory (RAM) per virtual core is 3.75 GB; a table with all the details about the various GCE machine types can be found here.
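The arithmetic in the paragraph above can be sketched in a few lines of Python. The per-core figures (2.75 GCEU and 3.75 GB of RAM) and the core counts are taken from the text; the function name `machine_capacity` is my own, and applying the per-core figures uniformly to every machine type is an assumption that the actual machine type table may not bear out.

```python
GCEU_PER_CORE = 2.75       # minimum GCEU per logical core, as stated in the post
RAM_GB_PER_CORE = 3.75     # GB of RAM per virtual core, as stated in the post
CORE_COUNTS = (1, 2, 4, 8)  # virtual core counts mentioned in the post


def machine_capacity(cores):
    """Return the GCEU and RAM implied by a virtual core count (hypothetical helper)."""
    if cores not in CORE_COUNTS:
        raise ValueError(f"unsupported core count: {cores}")
    return {
        "cores": cores,
        "gceu": cores * GCEU_PER_CORE,
        "ram_gb": cores * RAM_GB_PER_CORE,
    }


for cores in CORE_COUNTS:
    cap = machine_capacity(cores)
    print(f"{cap['cores']} core(s): {cap['gceu']} GCEU, {cap['ram_gb']} GB RAM")
```

Under these assumptions, the largest instance (8 virtual cores) works out to 22 GCEU and 30 GB of RAM.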
Google’s virtualization technology is based on the Kernel-based Virtual Machine (KVM), a full virtualization solution for Linux on x86 hardware with virtualization extensions (Intel VT or AMD-V), and on Linux cgroups, a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups.
Each project is assigned its own Private Virtual Network (PVN), a concept that probably corresponds to the AWS VPC, and that virtual network seamlessly extends across geographic locations (regions). From a security perspective, it is important to note that the traffic within your virtual network never goes through the public Internet infrastructure; that makes sense, since packets with a private destination address are ignored by all public routers. The private IPv4 space is based on RFC 1918, so the addresses ought to fall within one of its three reserved blocks: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. In terms of external networking capabilities, GCE offers external IPs (types supported are “reserved”, “ephemeral”, “none”) with dynamic attachment to an instance, 1-to-1 NAT, and (of course) built-in firewalls. Some limitations include the blocking of outgoing SMTP traffic (presumably to avoid abuse) and the restriction to UDP, TCP, and ICMP for network traffic between your instances and the (public) Internet.
The above is a summary of the information that I have obtained from publicly available sites (primarily Google I/O 2012 content). I have not covered storage sufficiently here, but I have requested access to the service for a fair and balanced review. If and when I get it, I intend to update this post.
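As a quick way to check whether an address belongs to the private space, here is a small Python sketch using the standard `ipaddress` module. RFC 1918 reserves three blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16); which of them GCE actually hands out to instances is not something I can confirm, so the sample addresses below are arbitrary and the helper name `rfc1918_block` is my own.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = tuple(
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def rfc1918_block(addr):
    """Return the RFC 1918 block containing addr, or None if it is not private."""
    ip = ipaddress.ip_address(addr)
    for block in RFC1918_BLOCKS:
        if ip in block:
            return block
    return None


# Arbitrary sample addresses, chosen only to exercise each case.
for sample in ("10.1.2.3", "172.16.5.4", "172.32.0.1", "192.168.1.1", "8.8.8.8"):
    print(sample, "->", rfc1918_block(sample))
```

The 172.16.0.0/12 block ends at 172.31.255.255, which is why 172.32.0.1 is reported as not private.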