A quick introduction to Apache Spark
(via https://www.youtube.com/watch?v=7vky0HWxMiE)
This is a video that our friend Kathleen Hayes at Typesafe helped to make. It summarizes the ideas of Open Genomics clearly and concisely. The folks from Driver explain the biology side brilliantly, and Matt Massie and Frank Nothaft describe how AMPLab OSS approaches it. David Patterson, Matei Zaharia, and Martin Odersky outline the key opportunity: how OSS developers can help. Alexy Khrabrov invites developers to join Open Genomics to learn more about the space and find ways to contribute.
Psychology Researchers Study Overclaiming
Stav Atir, Emily Rosenzweig and David Dunning at the Attention, Memory, and Perception Lab in the Psychology Department at Cornell have been testing their theory about over-claiming in a series of recent studies.
The researchers found that over-claiming is “domain-specific.” According to the Huffington Post, their research “suggests, importantly, that self-perceived knowledge prompts mistaken claims of impossible expertise--not dishonest claims.”
The findings of these studies will be published in a forthcoming issue of Psychological Science.
How to build Spark on Windows
It's great to play around with Spark in local mode. Here is how you can build it on Windows.
Get the latest source from http://spark-project.org/downloads/. I built Spark 0.6.2 on my Windows 7 laptop.
I had the latest Scala version, 2.10.0, installed on my machine, but Spark 0.6.2 is built with Scala 2.9.2. Ideally this should not cause a problem, but it did not work for me. So I uninstalled Scala and installed Scala 2.9.2 from scala-lang.org.
When installing Scala, choose a path that does not contain any spaces. For example, C:\Program Files\scala will NOT work because it contains a space. Install it somewhere like C:\software\scala instead.
Add this Scala location's bin directory to the Windows PATH environment variable.
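The no-spaces rule is easy to check before you install. Below is a minimal Scala sketch of such a check. `InstallPathCheck` and `isSafeInstallPath` are hypothetical names, not part of Scala or Spark, and the check simply rejects any path that contains whitespace.

```scala
object InstallPathCheck {
  // Hypothetical helper (not part of Scala or Spark): treat a path as safe
  // for this Windows build only if it is non-empty and has no whitespace.
  def isSafeInstallPath(path: String): Boolean =
    path.nonEmpty && !path.exists(_.isWhitespace)

  def main(args: Array[String]): Unit = {
    // The two example locations from the text; raw triple-quoted strings
    // keep the backslashes literal.
    val candidates = Seq("""C:\Program Files\scala""", """C:\software\scala""")
    for (p <- candidates) {
      val verdict = if (isSafeInstallPath(p)) "ok" else "contains whitespace, pick another path"
      println(s"$p -> $verdict")
    }
  }
}
```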
Spark also requires sbt (Simple Build Tool), which comes bundled with the Spark source. But when I tried to build with the bundled sbt, I got the following error:
###########
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
###########
Instead of fixing the error, I downloaded the sbt MSI installer from the sbt site, installed it, and added c:\Program Files\sbt\bin to the Windows PATH.
I ran "sbt package" from the spark-0.6.2 directory and the build compiled successfully.
Before building: if you are working with Hadoop, identify which Hadoop version you are using and specify it in the SparkBuild.scala file under the project folder.
--------------
// Hadoop version to build against. For example, "0.20.2", "0.20.205.0", or
// "1.0.3" for Apache releases, or "0.20.2-cdh3u5" for Cloudera Hadoop.
val HADOOP_VERSION = "0.20.205.0"
val HADOOP_MAJOR_VERSION = "1"
// For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the HADOOP_MAJOR_VERSION to "2"
//val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
//val HADOOP_MAJOR_VERSION = "2"
---------------------------------
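The comments in that snippet imply a simple rule: Hadoop 2 versions use a HADOOP_MAJOR_VERSION of "2", and the other listed versions use "1". Below is a minimal Scala sketch of that rule. `HadoopVersionHint` and `majorVersionFor` are hypothetical helpers, not something SparkBuild.scala defines. The sketch also assumes that every Hadoop 2 version string starts with "2.", as "2.0.0-mr1-cdh4.1.1" does.

```scala
object HadoopVersionHint {
  // Hypothetical helper, not part of SparkBuild.scala. It mirrors the rule in
  // the build file's comments: Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1"
  // take "2"; Apache releases like "0.20.205.0" or "1.0.3" and CDH3 builds
  // like "0.20.2-cdh3u5" take "1". Assumes Hadoop 2 strings begin with "2.".
  def majorVersionFor(hadoopVersion: String): String =
    if (hadoopVersion.startsWith("2.")) "2" else "1"

  def main(args: Array[String]): Unit = {
    val versions = Seq("0.20.2", "0.20.205.0", "1.0.3", "0.20.2-cdh3u5", "2.0.0-mr1-cdh4.1.1")
    for (v <- versions)
      println(s"$v -> HADOOP_MAJOR_VERSION ${majorVersionFor(v)}")
  }
}
```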
After the build succeeds, create spark-env.cmd in the conf directory and set the following environment variables:
-------------------
set SCALA_HOME=<SCALA PATH> (Example: C:\software\scala\scala\bin)
set SPARK_CLASSPATH=C:\......\SPARK\source\spark-0.6.2-sources\spark-0.6.2\core\target\scala-2.9.2\spark-core_2.9.2-0.6.2.jar;C:\......\scala\scala\lib\scala-library.jar;C:\......\scala\scala\lib\scala-compiler.jar;
--------------------
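The SPARK_CLASSPATH value is three jar paths joined with semicolons, the Windows classpath separator. Below is a minimal Scala sketch that assembles that line from two directories you pass in. `SparkEnvCmd` and `classpathLine` are hypothetical names. The relative jar locations are copied from the example above, so they assume Spark 0.6.2 built with Scala 2.9.2.

```scala
object SparkEnvCmd {
  // Windows path separator (a single backslash) and classpath separator.
  private val Backslash = "\\"
  private val ClasspathSep = ";"

  // Joins path segments with backslashes, e.g. win("a", "b") is a\b.
  private def win(parts: String*): String = parts.mkString(Backslash)

  // Hypothetical helper: builds the SPARK_CLASSPATH line for conf\spark-env.cmd.
  // Jar locations follow the Spark 0.6.2 / Scala 2.9.2 example; adjust for
  // other versions.
  def classpathLine(sparkDir: String, scalaHome: String): String = {
    val jars = Seq(
      win(sparkDir, "core", "target", "scala-2.9.2", "spark-core_2.9.2-0.6.2.jar"),
      win(scalaHome, "lib", "scala-library.jar"),
      win(scalaHome, "lib", "scala-compiler.jar")
    )
    // The example in the text ends with a trailing ';', kept here as well.
    "set SPARK_CLASSPATH=" + jars.mkString(ClasspathSep) + ClasspathSep
  }

  def main(args: Array[String]): Unit =
    if (args.length == 2) println(classpathLine(args(0), args(1)))
    else Console.err.println("usage: SparkEnvCmd <spark-0.6.2 dir> <scala home>")
}
```

Pass your Spark source directory and your Scala installation directory as the two arguments. Then paste the printed line into conf\spark-env.cmd.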
Now you should be able to start the REPL with spark-shell.cmd.
If you want to run the master in standalone mode, run "run spark.deploy.master.Master".
By default, the master's web UI is available on port 8080. See the Spark standalone mode guide for more information.
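Once the master is running, you can confirm that the web UI is up by requesting it over HTTP. Below is a minimal Scala sketch that uses only the JDK's `HttpURLConnection`. `MasterUiCheck`, `masterUiUrl`, and `probe` are hypothetical names. The sketch assumes the master runs on localhost with the default port 8080, unless you pass a host as the first argument.

```scala
import java.net.{HttpURLConnection, URL}
import scala.util.Try

object MasterUiCheck {
  // Default port of the master web UI, as stated in the text.
  val DefaultUiPort = 8080

  def masterUiUrl(host: String, port: Int = DefaultUiPort): String =
    s"http://$host:$port/"

  // Returns Some(HTTP status code) if something answered, or None if the
  // connection failed (for example, because no master is running).
  def probe(url: String, timeoutMs: Int = 2000): Option[Int] =
    Try {
      val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
      conn.setConnectTimeout(timeoutMs)
      conn.setReadTimeout(timeoutMs)
      try conn.getResponseCode finally conn.disconnect()
    }.toOption

  def main(args: Array[String]): Unit = {
    val host = if (args.nonEmpty) args(0) else "localhost"
    val url = masterUiUrl(host)
    probe(url) match {
      case Some(code) => println(s"$url answered with HTTP $code")
      case None       => println(s"No response from $url; is the master running?")
    }
  }
}
```

This check only shows that something answers on that port. Open the URL in a browser to see the master's status page.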