Hadoop Ecosystem - Nine Components You Need to Know!
Summary: Hadoop gets a lot of attention these days, but many people in the IT industry still do not know the key components of the Hadoop ecosystem. This article describes its nine key components.
Hadoop Distributed File System (HDFS) - It provides redundant storage for massive amounts of data. Data is split into blocks and distributed across many machines. Think of a file that contains the names of everyone in the world; the people whose first name starts with A might be held on server 1, B on server 2, and so on. In this way the entire data set is distributed across many machines.
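The names analogy above can be sketched in a few lines of Python. This is only an illustration of the idea, not how HDFS is implemented: real HDFS splits files into fixed-size blocks by byte offset rather than by content, and the function name `distribute_names`, the `replicas` parameter, and the zero-based server numbering are assumptions made for this example.

```python
from collections import defaultdict


def distribute_names(names, num_servers, replicas=2):
    """Assign each name to `replicas` servers based on its first letter.

    Hypothetical helper for illustration only; not part of any Hadoop API.
    """
    servers = defaultdict(list)
    for name in names:
        # Letter A maps to server index 0, B to index 1, and so on,
        # wrapping around when there are fewer servers than letters.
        primary = (ord(name[0].upper()) - ord("A")) % num_servers
        for offset in range(replicas):
            # Extra copies go to the following servers for redundancy.
            servers[(primary + offset) % num_servers].append(name)
    return dict(servers)


placement = distribute_names(["Alice", "Bob", "Anna", "Carol"], num_servers=3)
for server, stored in sorted(placement.items()):
    print(server, stored)
```

Because every name is stored on two servers here, losing any single server still leaves one copy of each name, which is the point of redundant storage.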
Map Reduce Framework - This is the keystone of the Hadoop ecosystem. It is a processing engine in which the raw data stored in HDFS is analyzed. The map task converts the data into key/value pairs. For example, if the input for the map task is "the cat sat on the mat", the output from the map task is "(the, 1), (cat, 1), (sat, 1), (on, 1), (the, 1), (mat, 1)". The reduce task takes the output from the map task and boils it down into a single key/value pair for each distinct key. At this stage, the output from the reduce task is "(the, 2), (cat, 1), (sat, 1), (on, 1), (mat, 1)". Quite simple, right!
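The word-count example above can be reproduced with a minimal single-process sketch in Python. It only imitates the two phases; it does not use the Hadoop API, and the function names `map_task` and `reduce_task` are chosen for this illustration. In a real cluster many map and reduce tasks run in parallel on different machines.

```python
from collections import defaultdict


def map_task(line):
    """Emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]


def reduce_task(pairs):
    """Sum the counts of all pairs that share the same key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    # Keys come out in the order in which they were first seen.
    return list(totals.items())


mapped = map_task("the cat sat on the mat")
reduced = reduce_task(mapped)
print(mapped)
print(reduced)
```

The first line printed lists six pairs, with "the" appearing twice; the second lists five pairs, with the two "the" entries merged into ("the", 2).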
HBase - A column-oriented database where massive amounts of data can be stored. It is the Hadoop database used for fast read/write access to large amounts of data.
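As a rough picture of what column-oriented means here, the sketch below models a table as rows addressed by a row key, with each value stored under a "family:qualifier" column name. It is an in-memory toy under stated assumptions, not the HBase client API: the `put` and `get` helpers, the row keys, and the column names are all hypothetical.

```python
from collections import defaultdict

# Toy table: row key -> {"family:qualifier": value}
table = defaultdict(dict)


def put(row_key, column, value):
    """Store a value in the given row and column (hypothetical helper)."""
    table[row_key][column] = value


def get(row_key, column):
    """Return the stored value, or None if the cell does not exist."""
    return table.get(row_key, {}).get(column)


put("user1", "info:name", "Alice")
put("user1", "info:city", "Paris")
put("user2", "info:name", "Bob")

print(get("user1", "info:city"))
print(get("user2", "info:city"))
```

Rows do not need to share the same set of columns: "user2" simply has no "info:city" cell, so nothing is stored for it.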
HIVE - It is a SQL-like interface in Hadoop. The data stored in HBase can be accessed via Hive. It enables developers not familiar with Map Reduce to write data queries that are translated into Map Reduce jobs in Hadoop.
Pig - Similar to Hive, Pig enables developers not familiar with Map Reduce to write programs in Hadoop.
Oozie - It coordinates Map Reduce tasks.
ZooKeeper - It is Hadoop's distributed coordination service, designed to run over a cluster of machines. It is a highly available service used for the management of Hadoop operations, and many components of Hadoop depend on it.
Sqoop - It is a connectivity tool for moving data between Hadoop and relational databases or data warehouses.
Flume - It is a distributed, reliable and highly available service for efficiently collecting, aggregating, and moving large amounts of data from individual machines to HDFS.









