Recipe 2: Install/Start Spark/Scala REPL
Scala has an interactive shell, also called a prompt, an interpreter, or, more formally, a REPL (Read-Eval-Print Loop), a venerable term descending from Lisp, the functional ancestor of Scala.
Both the standard Scala and Spark distributions come with driver programs wrapping the REPL invocation. For Scala it is called scala, and for Spark it is spark-shell.
You can download both Scala and Spark as tarballs and have a REPL ready to go. We'll focus on the Spark REPL as it is in fact a Scala REPL with the Spark libraries made available and SparkContext defined.
Instead of downloading binary JVM jars by hand, it is more general and wiser to use a build tool like Maven or SBT, which keeps all of the jars in the same local repository. Maven is a general-purpose build tool for the JVM based on XML configuration files. SBT started out as the Simple Build Tool and has become a powerful, if certainly not so simple, standard for Scala projects. It has a DSL (domain-specific language) for defining builds, which can contain both Scala and Java dependencies.
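To make the point about mixing Scala and Java dependencies concrete, here is a minimal sketch of two dependency lines in the SBT DSL. The joda-time entry is only an illustrative Java library chosen for this example; it is not part of the recipe.

```scala
// Scala library: %% appends the Scala binary version (for example _2.10)
// to the artifact name, so SBT resolves spark-core_2.10 here.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

// Java library (illustrative, not used in this recipe): a single % keeps
// the artifact name exactly as written.
libraryDependencies += "joda-time" % "joda-time" % "2.3"
```

The only difference between the two lines is %% versus %, which is how the same build file can pull in both Scala and plain Java artifacts.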
Let’s get SBT first. It has its own home at scala-sbt.org, and there are several options listed there. In essence SBT is just a jar and a driver shell script. If you have brew you can just do brew install sbt, or get the jar and a shell script from the Manual Installation link. SBT has its own versions, which use their own version of Scala, so if you ever need excruciatingly fine-grained control over versions of Java, Scala, and SBT from the very first SBT invocation, you can get Paul Phillips’ excellent sbt-extras driver version instead of the vanilla one. In that case make sure it’s on your shell PATH first.
Now that we have the shiny new sbt available on the command line, let’s write a build.sbt file. This file is the build definition for a Scala project, and at a minimum it defines several names and optional dependencies. Here’s what our first build.sbt looks like:

    name := "first-spark"

    organization := "spark.recipes"

    version := "0.1"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

    initialCommands in console := """val sc = new org.apache.spark.SparkContext("local[4]", "demo")"""
Now run sbt from the project directory, type console at the sbt prompt, and you are at the Spark REPL prompt, with sc already defined by the initialCommands setting:

    scala> sc
    res5: org.apache.spark.SparkContext = org.apache.spark.SparkContext@798ddf93
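With sc in hand, a quick sanity check is to run a small job. The following is a minimal sketch meant to be pasted at the scala> prompt; it assumes sc is the SparkContext created by the console's initialCommands, and the value names (numbers, evenCount, sumOfSquares) are made up for this example.

```scala
// Assumes `sc` is the SparkContext already defined in the console session.

// Distribute the integers 1 to 100 across the local worker threads.
val numbers = sc.parallelize(1 to 100)

// Transformations such as filter are lazy; count() is the action that runs the job.
val evenCount = numbers.filter(_ % 2 == 0).count()

// map followed by reduce: sum of the squares of 1 to 100.
val sumOfSquares = numbers.map(x => x * x).reduce(_ + _)

println(s"evens: $evenCount, sum of squares: $sumOfSquares")
```

If the context is working, evenCount is 50 and sumOfSquares is 338350. The job runs on four local threads, as requested by the "local[4]" master setting.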











