Data, Data Everywhere: How Machine Learning Tools Capture the Most Valuable Insights from Your Massive Data Collection
Big Data. Data Lakes. Analytics. Machine Learning. Are these tools only meant for tech giants like Google or Facebook? Or, can these tools meaningfully assist any business and help it respond to the ever-changing needs of the market? As the software industry continues to move toward consuming and delivering cloud-based solutions, many companies now realize they sit on a gold mine of data. Although many understand the value of data, few understand how to unlock that value.
Every time one of your customers clicks on a link, completes a transaction, or views a page, it can be logged and tracked. Some customers realize the value of this data and set up a data lake to store it. To put it simply, a data lake is a database that contains huge amounts of raw transactional data. Data lakes often get confused with data warehouses because they are similar, but data lakes are much more versatile and can grow much larger than data warehouses. With SaaS or connected on-premise solutions, data can be exported on a regular basis from the solution to a data lake, where the data is normalized. This enables you to apply analytics at a later date. If you need help choosing the right data lake, we suggest Hortonworks Data Platform. (Full disclosure: OFS partners with Hortonworks.)
Data lakes collect endless amounts of raw transactional data, but it is just that: raw data. Without applying analytics to that data, obtaining any useful information or action items from it is very difficult. Everyone knows what analytics is, but most only scratch the surface of what analytics can do for their business. For example, how many people look at a data lake and ask, “How many people logged in?” or “How many people clicked on a link after we deployed this new product/feature?” or “How many widgets were purchased after this marketing campaign?” It’s good to have this information when answering one-off tactical questions, but it does not tell a story about how your business is doing. It does not provide insights into your customers’ and users’ activities and trends.
What makes things more difficult is analyzing data from disparate systems. Think about a typical e-commerce shop. It has multiple, different backend systems all connected to create a complete solution:
Customer-facing web portals
ERP/OMS/WMS/other order and fulfillment systems
Sales/contact management systems
Each one of these systems comes from a different provider, has its own database, methods, naming conventions and APIs. The data lake ingests all data from each system, then uses the data for various analytics. Readmore