Discover Top Posts Tagged with #datamatics

https://www.ryojiikeda.com/project/datamatics/

#ryoji ikeda #data visualization #datamatics #computer graphics #digital art

#AISolutions #Copilots #Datamatics #Microsoft #stocknewsindia

Datamatics | Trainee | Mumbai | jobs

Datamatics recognized in Gartner Hype Cycle for NLT

Mumbai, August 12, 2020: Datamatics Global Services Ltd. announced that it has been recognized in the Gartner Hype Cycle for Natural Language Technologies, 2020. The report is authored by analysts Bern Elliot, Anthony Mullen, Adrian Lee, and Stephen Emmott.

It is the first year that Gartner is publishing a Hype Cycle for Natural Language Technologies (NLT). According to the report, “Recent advances in…

#artificial intelligence #BPM #Cognitive Sciences solution #data lakes #Datamatics #Natural Language Technologies #NLT #TruAI

Build modern data platform with Apache Hadoop (data lakes)

Over the last few years, data types and data quality have varied widely, and data volumes have grown exponentially. As a result, traditional data warehouses, which are designed to process only structured data, find it increasingly difficult to store, process, and analyze data at scale, leading to bottlenecks and task failures. A data lake built using Apache Hadoop, on-premises or in the cloud, is fast becoming the new-age solution for storing, processing, and analyzing multivariate, high-volume data in real time.

Benefits of data lakes over data warehouses:

Data lakes integrate seamlessly with traditional database systems, analytics tools, and query engines used for business reporting. They can systematically extend a traditional data warehouse. At a broad level, data lakes provide the following benefits:

Distributed file system: It offers highly scalable, fault-tolerant, distributed storage. It works concurrently with a number of data access applications through the YARN service.

YARN: It allows multiple data processing engines to run simultaneously and provide analytics at scale, for example interactive SQL, real-time streaming, data science, analytics workbench, and batch processing.

Support for Apache Spark: Spark offers lightning-fast unified analytics for large-scale data processing. It allows writing parallel applications in Java, Scala, Python, R, and SQL.

Reduced TCO of data analysis: Data lakes reduce the cost of data management and analytics. Organizations see shorter time-to-insight, and deployment time drops from days to minutes.

Unified storage platform: It reduces the number of data marts, employs business-centric rules, and helps apply access policies within the same storage platform.

New insight creation: With a wide range of analytics tools, the platform helps answer queries quickly and reduces the time needed for insights and complex analytics.

Business Impact:

Data lakes offer a centralized repository that scales across a large number of machines.

They allow ingesting, storing, processing, and transforming both structured and unstructured data, unlike data warehouses.

They empower a business with multi-functional tools such as data discovery, reporting, advanced analytics, and visual reporting on stored data irrespective of the native format.

They allow drawing insights in real time and accessing them on demand.
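The point about storing data "irrespective of the native format" is often described as schema-on-read: data is stored as it arrives and is only parsed when someone queries it. The following is a minimal sketch of that idea in plain Python, not a Hadoop implementation. The in-memory `lake` list, the `ingest` and `read_records` helpers, and the sample payloads are all hypothetical and exist only for illustration.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": each entry keeps the raw payload untouched,
# plus a format tag recorded at ingestion time.
lake = []

def ingest(raw: str, fmt: str) -> None:
    """Store the payload as-is; no schema is enforced on write."""
    lake.append({"format": fmt, "raw": raw})

def read_records(entry: dict) -> list:
    """Apply a schema only when the data is read."""
    if entry["format"] == "json":
        # One JSON object per line.
        return [json.loads(line) for line in entry["raw"].splitlines() if line.strip()]
    if entry["format"] == "csv":
        return list(csv.DictReader(io.StringIO(entry["raw"])))
    # Unstructured text is returned as a single free-text record.
    return [{"text": entry["raw"]}]

ingest('{"device": "sensor-1", "temp": 21.5}\n{"device": "sensor-2", "temp": 19.0}', "json")
ingest("device,temp\nsensor-3,22.1\n", "csv")
ingest("Customer called about a late delivery.", "text")

# Query across all formats: list every device that reported a reading.
all_records = [rec for entry in lake for rec in read_records(entry)]
devices = [rec["device"] for rec in all_records if "device" in rec]
print(devices)
```

A warehouse would typically reject the free-text entry at load time; here it is stored anyway and simply does not match the device query.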

Best Practices:

Here are some best practices for building a data lake solution as a new initiative or as a re-architecture of a data warehouse:

Configure data lakes to be flexible and scalable for aggregating and storing all types of data.

Include Big Data Analytics components, which support data encryption, search, complex analysis, interactive analytics, and querying.

Implement access control policies and data security mechanisms to protect the stored data.

Provide data search mechanisms for quick and easy search and retrieval to support 360-degree analysis.

Provide data movement mechanisms that allow importing any amount of data in its native format from disparate sources into unified storage.

Securely store, index, and catalog data so that data streaming from mobile apps, IoT devices, and social media is easy to understand and search.

Perform comprehensive analytics using popular Big Data frameworks, such as Apache Hadoop and Spark without moving the data to a separate analytics system.

Use Machine Learning to derive valuable insights, build self-learning models, predict outcomes, and suggest actions for achieving optimal results.

Use BI tools, which seamlessly integrate with the data lake platform, to provide faster business analytics, dashboards and visualizations that are accessible from any browser and mobile device.

Strategies to extend a traditional data warehouse using a data lake:

Retain the frequently used data in the warehouse and offload the unused data and ETL workload to the data lake repository. Use a Big Data Analytics framework such as Apache Spark to perform fast in-memory analytics and maintain business continuity.

Migrate data in batches by using Network File System (NFS) or Apache Sqoop, or in real time by using methods such as Kafka Connect. Subsequently, store the data in Hive tables or in Parquet or Avro files.

Use unified SQL engines to deliver data to Business Intelligence teams. Leverage stored data in tables using BI tools. BI teams can query the offloaded data using SQL whereas the Data Science teams can analyze the newly sourced data using Analytics workbench.
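The first strategy above, keeping frequently used data in the warehouse and offloading the rest, comes down to a rule for deciding which tables are "unused". The sketch below shows one possible rule in plain Python, based on how long ago a table was last queried. The table names, dates, the `plan_offload` helper, and the 180-day threshold are hypothetical assumptions, not values from any real deployment.

```python
from datetime import date, timedelta

# Hypothetical table metadata: name and the date the table was last queried.
tables = [
    {"name": "sales_current", "last_queried": date(2020, 8, 1)},
    {"name": "sales_2012", "last_queried": date(2018, 3, 15)},
    {"name": "web_logs_raw", "last_queried": date(2019, 1, 10)},
]

def plan_offload(tables, today, max_idle_days=180):
    """Split tables into those kept in the warehouse and those offloaded.

    The 180-day idle threshold is an assumed policy for illustration only.
    """
    cutoff = today - timedelta(days=max_idle_days)
    keep = [t["name"] for t in tables if t["last_queried"] >= cutoff]
    offload = [t["name"] for t in tables if t["last_queried"] < cutoff]
    return keep, offload

keep, offload = plan_offload(tables, today=date(2020, 8, 12))
print("warehouse:", keep)
print("data lake:", offload)
```

In practice the offloaded tables would then be moved with a tool such as Apache Sqoop and stored as Hive tables or Parquet files, as described in the second strategy; that step is not shown here.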

Data lakes on cloud:

Data lakes augment data storage, intelligent processing, and complex analytics, especially on cloud platforms such as Infrastructure as a Service (IaaS). Data lake usage can also be extended through a Data Analytics as a Service (DAaaS) platform:

Data lake analytics: The goal is achieved by scaling data storage and processing over an Infrastructure as a Service (IaaS) platform, such as those provided by AWS and Microsoft Azure. With IaaS, organizations can develop and run massively parallel data transformation and processing programs without business overheads.

On-premise or on-cloud Apache Spark and Hadoop services: Amazon EMR and Azure HDInsight provide fully managed, cloud-based Hadoop clusters with analytics capabilities and extended support for Machine Learning libraries. They help spin up a cluster quickly on demand and scale it up or down based on the organization’s requirements.

Data lake storage to power Big Data Analytics: Data lake solutions powered by Amazon S3 provide massively scalable and secure storage. They are highly available, designed to deliver 99.999999999% durability, and store data for millions of applications. They also provide ‘query in place’ functionality, which allows running a query on the data set at rest. Amazon S3 is supported by a large community of third-party applications and AWS services.
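To put the 99.999999999% durability figure in perspective, the short calculation below converts it into an expected number of lost objects. It assumes the figure is read as a per-object durability over one year, and the count of 10 million stored objects is a hypothetical example. `Decimal` is used because ordinary floating-point arithmetic cannot represent eleven nines exactly.

```python
from decimal import Decimal

# Eleven nines, as quoted above; assumed to be per object, per year.
durability = Decimal("0.99999999999")
annual_loss_probability = Decimal(1) - durability

# Hypothetical data lake size.
objects_stored = 10_000_000

expected_losses_per_year = annual_loss_probability * objects_stored
years_per_lost_object = 1 / expected_losses_per_year

print("Chance of losing a given object in a year:", annual_loss_probability)
print("Expected objects lost per year:", expected_losses_per_year)
print("Average years between single-object losses:", years_per_lost_object)
```

Under these assumptions, a store of 10 million objects would on average lose a single object about once every 10,000 years. This is a design target, not a measured guarantee.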

In Summary:

Data lakes solve challenges related to business intelligence and analytics. However, business needs are constantly evolving. Future-proofing data lake implementations so that they evolve with the organization’s business needs is the way ahead. Data lakes built on the Hadoop platform empower businesses to grow around existing and new data assets and easily derive business insights without limitations.

#datalakes #hadoop #datamatics #analytics #aws

Simplify product review analysis using Deep Learning and Natural Language Processing

A plethora of products are available in a highly consumerized world. As it is a ‘buyer’s market’, people prefer to read online product reviews before making a purchase. Product review analysis also helps companies get consumer feedback and add value to their products. In both cases, however, reading massive amounts of consumer input in an unstructured format is an extremely lengthy and cumbersome process. Artificial Intelligence (AI) technologies such as Natural Language Processing (NLP) and Deep Learning help to analyze vast amounts of product review data and decipher consumer sentiment as positive, negative, or neutral.

A modular approach:

The solution to this business challenge comprises four modules, which greatly simplify the review task:

Review classification: By using Word Embedding and Deep Learning, the module classifies each product review into one of four categories: Query, Complaint, Praise, and Suggestion. Word Embedding uses the Word2Vec and GloVe mechanisms.

Feature capture: By using Advanced NLP and Topic Modeling, the module extracts the product features mentioned in the review.

Topic capture: By using Latent Dirichlet Allocation (LDA), the module extracts the topic from the review.

Sentiment analysis: By using Deep Learning methods such as Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM), the module captures the sentiment including sarcasm.
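To show how such modules might fit together, here is a minimal sketch of the first and last modules (review classification and sentiment analysis) as plain Python functions. It deliberately swaps the word embeddings and RNN/LSTM models described above for toy keyword rules, so it illustrates only the module interfaces and the output labels, not the actual technique. Every keyword list and function name is a hypothetical placeholder, and the feature and topic modules are omitted.

```python
import re

# Toy keyword rules standing in for the trained models described above.
CATEGORY_CUES = {
    "Query": ("?", "how do", "does it"),
    "Complaint": ("broken", "refund", "stopped working"),
    "Praise": ("love", "great", "excellent"),
    "Suggestion": ("should", "would be nice", "please add"),
}
POSITIVE_WORDS = {"love", "great", "excellent"}
NEGATIVE_WORDS = {"broken", "refund", "terrible"}

def classify_review(text: str) -> str:
    """Return the first category whose cue appears in the review."""
    lowered = text.lower()
    for category, cues in CATEGORY_CUES.items():
        if any(cue in lowered for cue in cues):
            return category
    return "Unclassified"

def review_sentiment(text: str) -> str:
    """Count positive and negative words; ties are neutral."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def analyze(text: str) -> dict:
    """Run both modules and collect their labels."""
    return {"category": classify_review(text), "sentiment": review_sentiment(text)}

reviews = [
    "I love the battery life, great phone!",
    "The screen arrived broken and I want a refund.",
    "Does it support fast charging?",
]
results = [analyze(r) for r in reviews]
for review, result in zip(reviews, results):
    print(result, "<-", review)
```

Keyword rules like these cannot detect sarcasm or unseen phrasing, which is the gap the Deep Learning models in the described solution are meant to close.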

#natural language process #nlp #deep learning #datamatics

Datamatics Global Services Limited (DGSL) and AEP Ticketing solutions (AEP) from Italy have won the contract for the automated fare collection system for 52 metro stations of the Mumbai Metro.

#Mumbai metro #AEP #Datamatics

Week 4 Visualizing the Blockchain

For the Assignment 1 project an aspect of computation I want to explore is Blockchain. The Wikipedia definition of blockchain is a ‘growing list of records, called blocks, which are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data. By design, a blockchain is resistant to modification of the data.’
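To make that definition concrete for myself, here is a minimal sketch of a hash-linked chain in Python. It only covers the part of the definition quoted above: each block stores the hash of the previous block, a timestamp, and transaction data. It leaves out everything else a real blockchain needs, such as a network, consensus, and digital signatures. The function names and the sample transactions are my own placeholders.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """SHA-256 hash of a block's contents, serialized in a stable key order."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def add_block(chain: list, transactions: list) -> None:
    """Append a block that points back to the hash of the previous block."""
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "previous_hash": previous_hash,
        "timestamp": time.time(),
        "transactions": transactions,
    })

def is_valid(chain: list) -> bool:
    """Check that every block still matches the hash stored in its successor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
add_block(chain, ["genesis"])
add_block(chain, ["alice pays bob 5"])
add_block(chain, ["bob pays carol 2"])
valid_before = is_valid(chain)

# Tamper with an earlier block: the next block's stored hash no longer matches.
chain[1]["transactions"] = ["alice pays bob 500"]
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

Changing any earlier block breaks the link stored in the block after it, which is what ‘resistant to modification’ means in the quoted definition.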

This technology still seems elusive to me, despite the Transmediale lectures I attended, so my goal is to explore this topic from a theoretical, computational as well as artistic perspective. Something that personally concerns me is the preservation and commodification of new media art, as it seems to me that most institutions have not yet come to terms with this new discourse. Blockchain offers a digital signature as a sign of authenticity and encapsulates decentralized transactions based upon trust as well as online anonymity. With a similar democratic approach to the Open Source movement, this could potentially revolutionize buying and selling computational artworks. With the modern digitization of every aspect of culture, I feel now is a good time to research this contemporary topic and explore what artistic potential it may have.

In our group session we discussed each other’s projects and it seems that I am not the only person somewhat confused by the concept of blockchain, so the first step would be to uncover this mystery. To begin this process, I started reading an article by Artnome which explores the potential of blockchain within the art world. The text serves as a brilliant introduction to the topic as it briefly covers areas such as entrepreneurial approaches to the technology, the problems with blockchain and its applications within computational art.

For more in-depth reading, it came to my attention that Furtherfield recently published a new book entitled ‘Artists Re:thinking the Blockchain.’ This text would be an absolute necessity for my speculative research project. To tie in with the release of this publication they hosted many recent events, which unfortunately I missed. However, I plan to keep my eyes peeled for any future Furtherfield events or exhibitions and, as primary research, I might also contact them for discussion on the topic.

Ultimately this is for a final research artifact, for which I believe a Ryoji Ikeda style data visualization would be most suitable. From a theoretical perspective, this exercise would help to visually communicate what the blockchain is using practical skills from Lior’s Programming for Artists module. For inspiration, I found another article on visualizations of bitcoin data which are both informative and impressive to behold. Some of these even come with recipes and links to APIs so this would help me to begin the process of Visualizing the Blockchain.

References
• https://en.wikipedia.org/wiki/Blockchain
• https://www.artnome.com/news/2018/7/21/art-world-meet-blockchain
• Catlow, R., Garrett, M., Jones, N., & Skinner, S. (2017). Artists re:thinking the blockchain.
• https://datalion.com/visualizing-blockchain-7-beautiful-informative-bitcoin-visualizations/

Image from Visualizing blockchain 7 beautiful informative bitcoin visualizations by: Data Lion https://datalion.com/visualizing-blockchain-7-beautiful-informative-bitcoin-visualizations/

Image of datamatics [prototype-ver.2.0] by: Ryoji Ikeda http://www.ryojiikeda.com/project/datamatics/#datamatics

#theory #blockchain #furtherfield #ryoji ikeda #datamatics #data visualization #bitcoin