IoT and the Autonomous Vehicle in the Clouds: Spark Summit East talk by Jay White Bear
Spark Summit 2016 – Key Highlights
I recently had the chance to meet many interesting people at Spark Summit 2016, and I learned quite a bit about technology updates and where the industry is heading. In this post, I would like to summarize my key takeaways from the event. The Spark Summit 2016 keynote focused heavily on Deep Learning (DL). Jeff Dean of Google's TensorFlow project showcased how Google uses DL in most of its products: instant replies in the Inbox app, the Google Photos app suggesting text related to photos, real-time language translation from images, and solar panel suggestions for your home based on an analysis of your rooftop. Google has also provided APIs so that the community can use these DL models to solve critical business problems without spending time reinventing the wheel.
Here are some great links if you would like to delve deeper into some of Google's DL products:
Project Sunroof
Vision API – Image Content Analysis
Cloud Machine Learning – Predictive Analytics
DL keeps showing up in more day-to-day products, and those products keep getting better. Jeff even claimed that 60% of the replies in the Inbox mail app currently come from smart replies, which rely extensively on Deep Learning. Isn't that amazing? It is not just Google: Andrew Ng, Chief Scientist at Baidu and co-founder of Coursera, also shared a lot of impressive data products he is building that make extensive use of DL. Needless to say, AI is going to revolutionize many industries, including healthcare, industrial, manufacturing, and transportation.
New features in Spark 2.0 & MLlib 2.0
Structured streaming which combines streaming and interactive analysis
Tungsten phase 2, with speedups of 5-20x
Unification of Datasets and DataFrames
The DataFrame API will become primary, but the RDD-based API will still exist in maintenance mode
Expansion of Python/R API
Model persistence
MLlib for exploratory data analysis
The following new algorithms have made it into 2.0:
- Generalized Linear Models
- Approximate counting of distinct elements
- Approximate quantiles
Customizing ML pipelines:
- 29 feature transformers (Tokenizer, Word2Vec)
- 21 models (for classification, regression, clustering)
- Model tuning & evaluation
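To give a feel for what "approximate counting of distinct elements" means, here is a minimal pure-Python sketch of the HyperLogLog idea. It is not Spark's implementation or API: the helper name `approx_count_distinct`, the SHA-1 based hash, and the precision parameter `p` are all assumptions made for illustration.

```python
import hashlib
import math


def _hash64(value):
    """Map any value to a deterministic 64-bit integer (illustrative hash choice)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def approx_count_distinct(values, p=12):
    """Estimate the number of distinct items using 2**p small registers.

    Hypothetical helper for illustration; p should be at least 7 so that
    the bias constant below is valid.
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - p)                  # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank

    alpha = 0.7213 / (1 + 1.079 / m)         # bias correction for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        estimate = m * math.log(m / zeros)
    return int(round(estimate))


if __name__ == "__main__":
    data = [i % 10000 for i in range(50000)]  # 10,000 distinct values
    print(approx_count_distinct(data))
```

The appeal is that memory stays fixed at `2**p` registers no matter how many rows are scanned, at the cost of a relative error of roughly `1.04 / sqrt(2**p)`.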
Other interesting talks related to Data Science
Huohua: distributed time series analysis by Two Sigma:
- Time series RDD in Huohua
- Temporal joins
- Group functions on time series data
Elasticsearch-hadoop project
Apache SystemML project is going strong
Baidu has built the Parallel Asynchronous Distributed Deep Learning Engine (PADDLE), with CPU and GPU support, to run vision, speech, and NLP workloads at scale
A talk on automatic feature generation and model training on Spark using a Bayesian approach showed a lot of interesting optimization opportunities in hyperparameter tuning
The Red Hat team showed how they analyze log data to find anomalies and reduce false alarms using techniques such as ensembles of decision trees and self-organizing maps
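The temporal joins mentioned in the Two Sigma talk can be illustrated with a generic "as-of" join: each row on the left is matched with the most recent row on the right whose timestamp is not later. The sketch below is plain Python and does not reproduce Huohua's API; the function name `asof_join`, the `(timestamp, value)` tuple layout, and the `tolerance` parameter are assumptions for illustration.

```python
from bisect import bisect_right


def asof_join(left, right, tolerance=None):
    """Join two time series given as lists of (timestamp, value) tuples.

    Both inputs must be sorted by timestamp. For each left row, attach the
    latest right value with timestamp <= the left timestamp, or None if no
    such row exists (or it is older than `tolerance`).
    Hypothetical helper for illustration only.
    """
    right_times = [t for t, _ in right]
    joined = []
    for t, left_value in left:
        i = bisect_right(right_times, t) - 1  # last right row at or before t
        if i >= 0 and (tolerance is None or t - right_times[i] <= tolerance):
            joined.append((t, left_value, right[i][1]))
        else:
            joined.append((t, left_value, None))
    return joined


if __name__ == "__main__":
    trades = [(1, "a"), (5, "b"), (10, "c")]
    quotes = [(2, 100.0), (4, 101.0), (9, 102.0)]
    for row in asof_join(trades, quotes):
        print(row)
```

A distributed version would presumably also need to partition both series by time range so that matching rows end up on the same worker, which is the kind of problem a time series RDD is meant to address.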
Overall, the summit offered great exposure to the diverse initiatives under way in Spark and to how companies are positioning themselves amid the Industrial Internet boom. The coming months will be truly interesting as we watch the use cases that data science will open up for users.
Important links
Apache Spark MLlib 2.0 Preview: Data Science and Production
Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles
Spark 2.0 will offer Interactive Querying of Live data
Bayesian optimization for Hyperparameter tuning