Apache Spark flatMap transformation
In our previous post, we talked about the Map transformation in Spark. In this post we will learn the flatMap transformation.
As per the Apache Spark documentation, flatMap(func) is similar to map, but each input item can be mapped to 0 or more output items. That means func should return a scala.collection.Seq rather than a single item.
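The "0 or more output items" idea is not specific to Spark. As a rough analogy only (this is Swift's standard library, not Spark code), the sketch below contrasts map and flatMap on a plain array. The helper `words(in:)` and the sample lines are made up for illustration.

```swift
// Hypothetical helper: splits a line into words (zero words for an empty line).
func words(in line: String) -> [String] {
    return line.split(separator: " ").map { String($0) }
}

let lines = ["to be or", "", "not to be"]

// map keeps one output per input, so the result is an array of arrays.
let perLine = lines.map { words(in: $0) }

// flatMap lets each input contribute 0 or more items to one flat array.
let allWords = lines.flatMap { words(in: $0) }

print(perLine)   // [["to", "be", "or"], [], ["not", "to", "be"]]
print(allWords)  // ["to", "be", "or", "not", "to", "be"]
```

The empty line maps to an empty array, so it contributes nothing to the flattened result.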
Let’s…
Made a top view of the proposed scene, with a half-buried hedron in the middle and a broken one on the side. Squares represent cabins and circles represent trees. A circle with a cross represents a more lifeless tree, perhaps burned and snow covered? The sky should also contain some hedrons floating in the distance. Hedrons should be futuristic but with exposed components representing damage perhaps? Or can go for a more steampunk vibe with pistons outside? Cabins need to be primitive, maybe even collapsed. The whole scene needs to be covered in snow. Have not yet decided if human figures will be included. Could add some smoke indicating a campfire.
This article is about FlatMap in Angular. Let’s say we wanted to implement an AJAX search feature in which every keypress in a text field automatically performs a search and updates the page with the results. How would this look? Well, we would have an Observable subscribed to events coming from an input field, and on every change of input we want to perform some HTTP…
If you read our blog, you may have noticed that the Node.js community is growing extremely fast. In this post, we are happy to announce wonderful news: Node 12.6.0 and 12.5.0 are released! There are many changes in the new versions of Node. Here we gathered the most important ones: build: su...
We continue with the series Aprende apache spark con Java (Learn Apache Spark with Java). Apache Spark 2: Flat map with Java #java #apachespark #transformations #flatmap #devs4j
Unlike Map, FlatMap lets us generate a list of elements from a single one. With Map we had one input element and one output element; with FlatMap we have one input element and generate a set of output elements from it.
In the post Apache Spark 2 : Conceptos básicos we used flatMap to perform a word count; in this post we will do a…
In this blog, we look at how the map() and flatMap() operations work with Scala's Option and Future. Both are very effective features of Scala: a Future lets us obtain a value from a task running on a different thread, and an Option saves us from Java's null, since using null in Scala is considered a very bad approach in functional programming.
We’ve been using Rx for a while now and across a variety of projects. Yet we continue to learn new things - in this case, a clear understanding of FlatMap.
My team and I recently discovered a bug in one of our projects, and the culprit turned out to be the FlatMap operator—or rather, our misuse of it. I don’t know why, but FlatMap was a recurring source of confusion. (And the official Rx docs didn’t really shed much light on the subject.) Fortunately for us, we eventually gained clarity around the FlatMap operator using a real-life analogy that I thought would be worth sharing:
FlatMap is like the combining or flattening of commits pushed by a growing team of developers on a project.
Think of a team of developers on a project that uses a Continuous Integration (CI) service to build each pushed commit. The CI is interested in all of the commits pushed by all of the developers, including newly added ones, on the project.
FlatMap Analogy. Animation by Deyu Wang
We can dig into this more with code—in this case, Swift (and RxSwift). First, let’s define the objects we’ll need.
    import RxSwift

    class Project {
        private let developerSubject = PublishSubject<Developer>()

        var developerStream: Observable<Developer> {
            return developerSubject.asObservable()
        }

        func addDeveloper(_ developer: Developer) {
            developerSubject.onNext(developer)
        }

        func stop() {
            developerSubject.onCompleted()
        }
    }

    class Developer {
        private let commitSubject = PublishSubject<Commit>()
        let name: String

        init(_ name: String) {
            self.name = name
        }

        func startCoding() -> Observable<Commit> {
            return commitSubject.asObservable()
        }

        func stopCoding() {
            commitSubject.onCompleted()
        }

        // Helper to externally simulate coding activity.
        func pushCommit(_ hash: String) {
            commitSubject.onNext(Commit(author: name, hash: hash))
        }
    }

    struct Commit {
        let author: String
        let hash: String
    }

    class CI: ObserverType {
        func on(_ event: Event<Commit>) {
            switch event {
            case .next(let commit):
                print("CI is building \(commit).")
            case .completed:
                print("CI stopped.")
            case .error(let error):
                print("CI errored: \(error).")
            }
        }
    }
Type Declarations
Now let’s instantiate things and hook it all up.
    let project = Project()
    let jim = Developer("Jim")
    let anna = Developer("Anna")
    let bob = Developer("Bob")
    let ci = CI()

    project.developerStream
        .flatMap { developer -> Observable<Commit> in
            print("\(developer.name) started coding...")
            return developer.startCoding()
        }
        .subscribe(ci)
Setup Objects
We flatMap the developerStream onto each developer’s commits. The resulting stream is subscribed to by the CI.
Looking closer, in the flatMap closure, for each developer emitted on developerStream, we return the developer’s stream of commits via the startCoding() function. This allows the flatMap to observe the commits emitted by all developers and then flatten them into a single output stream. The CI subscribes to this output stream so it can build each commit.
Let’s play with this to see what happens when we start adding developers and pushing commits. The trailing comments are the output from the print()s.
    project.addDeveloper(jim)   // Jim started coding...
    jim.pushCommit("1")         // CI is building Commit(author: "Jim", hash: "1").
    project.addDeveloper(anna)  // Anna started coding...
    jim.pushCommit("2")         // CI is building Commit(author: "Jim", hash: "2").
    anna.pushCommit("3")        // CI is building Commit(author: "Anna", hash: "3").
    jim.pushCommit("4")         // CI is building Commit(author: "Jim", hash: "4").
Normal Operation
Notice how even after Anna is added to the project, Jim remains “active” and the CI continues to see Jim’s additional commits. Jim’s commit stream returned from startCoding() doesn’t get replaced by Anna’s. Further, the order in which the developers are added doesn’t matter. The CI only sees the commits in the sequence they are pushed. The CI doesn’t care about the developers themselves; it only cares about their commits.
Taking a closer look, our flatMap subscribes to the commits from each developer received from developerStream. Even when new developers are received, those subscriptions continue to live on. This enables the commits from both Jim and Anna to be combined and flattened into a single output stream.
Wait, how is this different from Merge?
Instead of FlatMap, we could of course use the Merge operator to combine commits from multiple developers into a single stream. If the project’s developers are known and fixed, this would be fine, as Merge takes a static list of observables.
However, if Bob came along to join the project at a later point, we’d have to handle that somehow. With FlatMap, since we subscribe to new developers being added, it’s handled for us already. Bob’s commits would automatically be taken into account and added to the flatMap’s output stream. Thus, we could say that Merge is for a static list of observables, while FlatMap is for a dynamic list of observables.
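To make the static side of that contrast concrete, here is a minimal sketch of the Merge alternative. It assumes RxSwift is available and uses bare PublishSubject<String> values (`jimCommits`, `annaCommits`, `bobCommits`, all hypothetical names) in place of the Developer class, with commit hashes as plain strings.

```swift
import RxSwift

// Hypothetical stand-ins for each developer's commit stream.
let jimCommits = PublishSubject<String>()
let annaCommits = PublishSubject<String>()
let bobCommits = PublishSubject<String>()   // Bob joins later

var built: [String] = []

// Merge takes a fixed list of observables at the time it is created.
let subscription = Observable
    .merge(jimCommits.asObservable(), annaCommits.asObservable())
    .subscribe(onNext: { built.append($0) })

jimCommits.onNext("1")
annaCommits.onNext("2")
bobCommits.onNext("3")   // never seen: Bob was not in the merged list

print(built)   // ["1", "2"]
```

Picking up Bob here would mean rebuilding the merged stream and resubscribing, which is exactly the bookkeeping FlatMap does for us.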
So, that’s most of it. But let’s get a more complete (Rx joke) understanding of how FlatMap works by exploring complete and error events.
Completions
It’s easy to start things, hard to complete them.
This saying is very true for FlatMap. In a nutshell, the CI will receive the completed event once all “active” observables in the chain have completed.
Let’s look at an example where, after some normal operation, only the project stops. Will the CI stop?
    project.addDeveloper(jim)   // Jim started coding...
    project.stop()
    project.addDeveloper(bob)
    bob.pushCommit("1")
    jim.pushCommit("2")         // CI is building Commit(author: "Jim", hash: "2").
Completing the Project (Parent Observable)
No, the CI doesn’t stop. After the project stops, adding Bob has no effect because developerStream has already completed so he’s never “activated” in the flatMap. Thus, when Bob pushes a commit, the CI doesn’t see it. However, existing project members like Jim are free to continue pounding away and pushing commits which do get built by the CI.
Now let’s examine when, instead of the project stopping, every developer on the project stops. Will the CI stop?
    project.addDeveloper(jim)   // Jim started coding...
    project.addDeveloper(anna)  // Anna started coding...
    jim.stopCoding()
    anna.stopCoding()
    jim.pushCommit("1")
    project.addDeveloper(bob)   // Bob started coding...
    bob.pushCommit("2")         // CI is building Commit(author: "Bob", hash: "2").
Completing the Developers (Child Observables)
No, the CI doesn’t stop. As shown, when all the project’s developers stop, it affects neither the CI nor the project. Bob is still able to join the project afterwards and push commits which the CI builds.
The only way to stop the CI’s subscription is to stop the project and all existing developers:
    project.addDeveloper(jim)   // Jim started coding...
    project.addDeveloper(anna)  // Anna started coding...
    project.stop()
    jim.stopCoding()
    anna.stopCoding()           // CI stopped.
    project.addDeveloper(bob)
    bob.pushCommit("1")
    jim.pushCommit("2")
Completing the CI’s Subscription
It’s worth noting here that if no developers are ever added to the project, calling project.stop() would also stop the CI.
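The empty-project case can be checked with a small sketch. It assumes RxSwift is available and, for brevity, uses bare subjects (hypothetical names `emptyProject`, `busyProject`, `jimCommits`) rather than the Project and Developer classes.

```swift
import RxSwift

// Case 1: a project that never receives a developer.
let emptyProject = PublishSubject<PublishSubject<String>>()
var emptyEvents: [String] = []
_ = emptyProject
    .flatMap { $0 }
    .subscribe(onCompleted: { emptyEvents.append("completed") })

emptyProject.onCompleted()   // no active inner streams, so completion passes through

// Case 2: a project with one developer who is still active.
let busyProject = PublishSubject<PublishSubject<String>>()
let jimCommits = PublishSubject<String>()
var busyEvents: [String] = []
_ = busyProject
    .flatMap { $0 }
    .subscribe(onCompleted: { busyEvents.append("completed") })

busyProject.onNext(jimCommits)
busyProject.onCompleted()    // held back: Jim's stream is still active

print(emptyEvents)   // ["completed"]
print(busyEvents)    // []
```

Completing `jimCommits` afterwards would deliver the completed event in the second case as well.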
Errors
error events work as expected. When either the project or any added developer errors, the CI will receive the error event and the whole chain stops working.
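As a minimal sketch of the error case, assuming RxSwift is available: the subjects below stand in for the project and its developers, and `BuildError` is a hypothetical error type introduced only for this example.

```swift
import RxSwift

struct BuildError: Error {}   // hypothetical error type

let developers = PublishSubject<PublishSubject<String>>()
let jimCommits = PublishSubject<String>()
let annaCommits = PublishSubject<String>()
var log: [String] = []

_ = developers
    .flatMap { $0 }
    .subscribe(
        onNext: { log.append("build \($0)") },
        onError: { _ in log.append("error") },
        onCompleted: { log.append("completed") }
    )

developers.onNext(jimCommits)
developers.onNext(annaCommits)
jimCommits.onNext("1")
annaCommits.onError(BuildError())   // one developer errors...
jimCommits.onNext("2")              // ...and Jim's later commit is no longer built

print(log)   // ["build 1", "error"]
```

An error on the outer `developers` subject would terminate the chain in the same way.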
Conclusion
Now that we know how FlatMap actually behaves, we’re able to use it with more confidence and in the right places.
The easiest way to play with this code (or any RxSwift code) on your own is to use the RxSwift repo’s playground. To get a closer look at things, including when the isDisposed event is being emitted, you may want to add this code to the playground along with some debug operators.
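One possible way to wire in the debug operators is sketched below, assuming RxSwift is available. It uses bare subjects instead of the article's classes, and the identifiers "developers" and "commits" are arbitrary labels chosen for this example. The exact log format depends on the RxSwift version, so no output is shown for the debug lines.

```swift
import RxSwift

let developers = PublishSubject<PublishSubject<String>>()
let jimCommits = PublishSubject<String>()
var built: [String] = []

let subscription = developers
    .debug("developers")   // logs events on the outer (project) stream
    .flatMap { $0 }
    .debug("commits")      // logs events on the flattened output stream
    .subscribe(onNext: { built.append($0) })

developers.onNext(jimCommits)
jimCommits.onNext("1")
subscription.dispose()     // the debug log should report the disposal here
jimCommits.onNext("2")     // not delivered: the subscription has been disposed

print(built)   // ["1"]
```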
Acknowledgements
I’d like to thank Eli Burnstein for helping shape and edit this article, Deyu Wang for creating the visual animation, and everyone on my project team for helping me better understand FlatMap.