What Is Informatica HParser for Hadoop?
Sifting through the PR-heavy announcements about Informatica HParser, here is what I've figured out so far:
it is the T (transform) in ETL
a visual tool for creating parsing definitions for formats such as web logs, XML, JSON, FIX, SWIFT, HL7, CDR, Word, PDF, and XLS
transformations can be accessed from Hadoop MapReduce, Hive, or Pig
the main benefit of using HParser is that the same parsing definitions/transformations can be shared and reused across the Hadoop distributed environment
HParser tries to provide an optimal transformation solution when streaming, splitting, and processing large files
HParser is available under two licenses: community and commercial
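To make "the T in ETL" a bit more concrete, here is a minimal sketch of the kind of hand-written parsing step that a reusable parsing definition is meant to replace. This is not HParser's API, which the announcements don't show. It is plain Python, it assumes the input is web logs in the Apache Common Log Format, and the names `parse_log_line` and `map_lines` are hypothetical.

```python
import json
import re

# Assumed input: Apache Common Log Format
#   host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Turn one raw log line into a dict, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def map_lines(lines):
    """Map step: yield one JSON record per parsable line, skip the rest."""
    for line in lines:
        record = parse_log_line(line.rstrip("\n"))
        if record is not None:
            yield json.dumps(record, sort_keys=True)
```

In a Hadoop Streaming job, a mapper script could print each string yielded by `map_lines(sys.stdin)`. Every job that reads the same format would need its own copy of this logic, which is the duplication that shared parsing definitions are supposed to avoid.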
Original title and link: What Is Informatica HParser for Hadoop? (NoSQL database©myNoSQL)







