Google offers about 50 different products in its Cloud solution, from storage and computing infrastructure to Machine Learning, including massive data analysis and transformation tools. Most of these solutions are quick to set up (around 10 minutes or less) and cheap compared to standard on-premises software.
You can find turnkey solutions following the SaaS (Software as a Service) model, such as Machine Learning APIs (facial recognition or emotion detection), as well as low-level applications following the IaaS (Infrastructure as a Service) model, such as storage or Cloud Computing. Products halfway between infrastructure and service are also available and allow you to deploy your own applications: this is the PaaS (Platform as a Service) model.
Three different kinds of services: from infrastructure services (IaaS) to turnkey services (SaaS)
In the era of serverless
Most Google Cloud products operate in a “serverless” architecture. The servers are somewhere in Google Cloud, but they are invisible to the user, who no longer needs to worry about infrastructure sizing. In other words, the system automatically scales up or down depending on the user’s storage and computing needs.
Google App Engine allows you to deploy your own application securely, following the PaaS model, with easy version management. Moreover, you can implement A/B testing by deploying two different versions of the same application and splitting traffic between them. This solution is able to scale up to a billion users.
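As a sketch, an App Engine deployment starts from an `app.yaml` configuration file like the one below (runtime, service name, and scaling values here are illustrative assumptions, not taken from the article):

```yaml
# app.yaml — minimal App Engine configuration (illustrative values)
runtime: python39        # managed runtime; the version is an example
service: default

automatic_scaling:
  min_instances: 0       # scale to zero when idle
  max_instances: 10      # cap the automatic scale-up
```

After deploying two versions (e.g. `gcloud app deploy --version v1`, then `--version v2 --no-promote`), the A/B test mentioned above can be set up by splitting traffic, for instance with `gcloud app services set-traffic default --splits v1=.5,v2=.5`.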
Google Cloud offers numerous storage solutions adapted to different needs. Only a few minutes are necessary to set up a cloud storage instance.
The most basic one is flat-file storage: objects stored in the Cloud are immutable, and you pay according to the volume of files held in an instance (called a bucket). You can choose the storage class of a bucket depending on how you use it: classes intended for fast, regular access offer lower latency, while classes intended for rare access cost less.
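Creating such a bucket takes a couple of commands; here is an illustrative sketch with the `gsutil` tool (the bucket name and region are placeholders):

```shell
# Create a bucket in the cheaper "nearline" class, for rarely accessed data:
gsutil mb -c nearline -l europe-west1 gs://my-example-bucket

# Upload a file; objects are immutable once written:
gsutil cp report.csv gs://my-example-bucket/
```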
Some products, such as Bigtable or Datastore, allow you to store data as structured records at high speed and very low latency (on the order of 5 ms to access 1 TB). They are ideal for real-time processing.
It’s very easy (less than one minute) to mount a machine through either the Google Cloud Platform Console or the gcloud command-line tool. You choose the number of cores (up to 64!), the RAM, and the drive type (HDD or SSD). You can also mount a network of machines in order to balance the computing load between your different resources; this process takes around 10 minutes. Computing power is charged in proportion to the machine’s uptime (a turned-off machine costs $0).
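The command-line route can be sketched as follows (the machine name, zone, and sizes are placeholder assumptions):

```shell
# Create a 4-vCPU, 16 GB machine with a 100 GB SSD boot disk:
gcloud compute instances create demo-vm \
    --zone=europe-west1-b \
    --custom-cpu=4 --custom-memory=16GB \
    --boot-disk-type=pd-ssd --boot-disk-size=100GB

# Stop it when idle: a stopped machine no longer incurs compute charges.
gcloud compute instances stop demo-vm
```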
Several tools allow you to process very large amounts of data. With Dataflow, you can run batch or streaming ETL pipelines. Provisioning is automated, and its API lets you, for example, compute an average over a sliding window in a single line of code.
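The sliding-window average that Dataflow expresses in one line can be sketched conceptually in plain Python (this is an illustration of the idea, not the Dataflow/Beam API itself):

```python
from collections import deque

def sliding_averages(stream, window_size):
    """Yield the mean of the last `window_size` values for each new value."""
    window = deque(maxlen=window_size)  # old values fall off automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

# Averages over a window of 3 values:
print(list(sliding_averages([2, 4, 6, 8], 3)))  # [2.0, 3.0, 4.0, 6.0]
```

In Dataflow, the windowing, the incremental computation, and the distribution over workers are all handled by the platform; the user only declares the window and the aggregation.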
BigQuery offers a relational (but not transactional) system in serverless mode, allowing you to analyze relational databases in SQL. With its computing power, you can process a billion rows in about two seconds for a count query with a group by. It’s difficult to match this kind of performance on a self-managed infrastructure.
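The count-with-group-by query mentioned above is plain SQL; the sketch below illustrates its shape using Python’s built-in sqlite3 on a small made-up table (BigQuery would run the same query shape, through its own client, on billions of rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("FR", 1), ("FR", 2), ("US", 3), ("US", 4), ("US", 5)],
)

# Count rows per group — the same query shape BigQuery parallelizes at scale:
rows = conn.execute(
    "SELECT country, COUNT(*) AS n FROM events GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('FR', 2), ('US', 3)]
```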
Finally, it’s possible to mount a Hadoop cluster (with Pig/Hive/Spark components) in less than 2 minutes through the Cloud Console or the command-line tool. In your cluster, you can use preemptible machines, which are cheaper but can be reclaimed by Google at any time. Moreover, it’s possible to add or remove a node even while a job is in progress.
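The command-line route can be sketched as follows (cluster name, region, and node counts are placeholder assumptions):

```shell
# A cluster with 2 regular workers plus 4 cheaper preemptible workers:
gcloud dataproc clusters create demo-cluster \
    --region=europe-west1 \
    --num-workers=2 \
    --num-preemptible-workers=4

# Resize later, even while a job is running:
gcloud dataproc clusters update demo-cluster \
    --region=europe-west1 --num-workers=4
```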
Machine Learning/Deep Learning
In terms of Machine Learning, Google offers APIs based on TensorFlow, allowing the user to benefit from neural networks. An NLP (Natural Language Processing) API is able to identify entities in a text and their characteristics (organizations, people, celebrities) and to extract a sentiment and its intensity.
With the Google Vision API, the user is able to detect faces in an image, along with the related emotions, each with a confidence level.
It’s also possible to apply vision recognition to videos, listing the objects visible on screen at different times in the video.
It’s interesting to note that the infrastructure behind these APIs doesn’t use ordinary processors such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). Google designed a new kind of processor called the TPU (Tensor Processing Unit), optimized for the matrix computations needed by neural networks, which explains the negligible computation time. TPUs are also available on Cloud Compute.
For Data Scientists, a Cloud Datalab instance allows you to handle data and build Machine or Deep Learning models (with TensorFlow) through a Python notebook very close to Jupyter, equipped with Data Science libraries (numpy, pandas, scikit-learn, nltk for text mining, matplotlib and seaborn for data visualization, and tensorflow). However, this component still shows some instability (especially SSH connectivity issues).
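A typical exploratory cell in such a notebook looks like the sketch below (assuming numpy and pandas are available, as they are preinstalled in Datalab; the data is made up):

```python
import numpy as np
import pandas as pd

# A typical exploratory cell: build a small dataset, summarize, fit a line.
df = pd.DataFrame({"x": np.arange(10), "y": 2 * np.arange(10) + 1.0})
print(df.describe())  # quick summary statistics

# Least-squares fit of y = slope * x + intercept:
slope, intercept = np.polyfit(df["x"], df["y"], 1)
print(round(slope, 2), round(intercept, 2))  # 2.0 1.0
```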
One last interesting possibility, halfway between a turnkey API and a “from scratch” model: Cloud AutoML. With this feature, you can train Google’s models on your own data to fit the context of the problem being addressed.
The Google Cloud Platform solution allows the user to quickly create applications and host them in serverless environments, with billing based on per-second consumption of the underlying infrastructure. For Data Science, Google supplies a complete suite of technical components and solutions that quickly provide significant capabilities for exploring and exploiting data.