Data has become ‘the’ asset of the digital age. Organizations that have access to it, along with the ideas, ability and technology to process it, can transform themselves and disrupt existing business models, while those without it are at the receiving end of this wave. However, leveraging the mammoth volumes of data at one’s disposal can be daunting, given the multitude of challenges it poses around people, process and technology.
We spoke to Rao Yendluri, Chief Technology Officer at Innominds, to gather his views on the subject and to get a detailed overview of how their Advanced Analytics Solutions Platform, “iFusion Analytics”, solves these bottlenecks.
Zinnov: Big Data as a phenomenon has been around for quite some time now. It was a buzzword a few years back, but increasingly we see enterprises everywhere adopting it as a mainstream technology. In general, how has the adoption of Big Data in industry changed in the last few years?
Rao Yendluri: If we take a step back, we can observe that technologies using Big Data were in practice before the term “Big Data” was actually coined. In the initial days, it was all about mining historical data to get insights for driving future strategies. The data volumes already ran into terabytes and petabytes. But recent years have pushed forward the need to collect data from new sources (variety) and analyse it for optimal usage. Take e-commerce, for example, where the large amount of transaction data can be used to gain real-time insights to win more customers and provide a better experience. As people became more familiar with getting insights from varied sources of data, prescriptive and predictive analytics came to the forefront to predict the future and take the right actions.
The transformation of Big Data has taken off in the last 12 years. Over this period, several challenges were addressed, which led to the evolution of Big Data solutions and platforms. One challenge was the large amount of time taken to process data in batch-oriented solutions. Google’s MapReduce was an early solution to this, after which we saw Yahoo building their original initiative, resulting in the offshoot of Hortonworks and Cloudera, ultimately giving birth to the Hadoop world.
Apart from the processing time, another major concern of enterprises was the security of their data, which prevented them from moving to such platforms. Inadequate support for SQL also limited adoption: many customers were already using SQL and were reluctant to move completely to a new platform. When Hadoop started supporting SQL so that a distributed cluster could be mined, processing was slow. This forced Hadoop to extend its support and optimize the platform for third-party solutions to ensure it did not end up as a research product. Also, such platforms initially supported only sequential job tasks. As data across enterprises grew, the need to run multiple job tasks at the same time emerged. This led to the birth of Impala by Cloudera and YARN in Hadoop, which allowed more query capabilities with higher speed, ultimately resulting in pipelining and parallelism capabilities. When the need for real-time processing arose, Spark came to the rescue, showcasing near-real-time processing capabilities. Adoption grew further as Spark on top of Hadoop produced better results, leading more customers to embrace it. Security in Hadoop was enhanced too, with technologies emerging around identity and access, masking, and securing data in flight and at rest.
Zinnov: What would you say are the major concerns of enterprises today for Big Data implementations? Also, how do enterprises make the right choice among the different Big Data offerings available today?
Rao Yendluri: The biggest pain point today is that there is no single tool that addresses all Big Data implementation requirements. In the golden era of typical data warehouses, there was a one-stop solution for enterprise data mining or data warehouse problems. However, the new Big Data solutions require enterprises to identify the right set of tools to collect the data, clean it and prepare it for various types of analysis. They need experts for these activities, which becomes a costly, time-consuming affair. The expert not only needs to identify the right tool for the activity, but also needs to know the tool very well and be conversant with its installation, configuration and effective usage.
Another problem is that of migration. Enterprises have data lying in heterogeneous stores, and migrating all of this data to Big Data lakes is time-consuming and expensive. The problem does not stop there. Even after the data collection process is streamlined and the right tool is selected, another important concern today is finding the data scientists needed for the job. Data scientists are versatile people with knowledge of statistics, but those who also have the relevant business domain understanding and programming skills are in short supply. Finding the right data scientist for a company is a hard task indeed. Even after the required algorithms have been built, with all the delays and IT overheads that involves, there can be cases where a data scientist comes back to say that we do not have all the fields required to run an effective algorithm. All these challenges make the process extremely time-consuming and delay insights into the data.
Zinnov: With this background, can you talk about your platform, iFusion?
Rao Yendluri: iFusion is a self-service platform for analytics that simplifies and combines the various packages needed for analytics. iFusion helps enterprises make the appropriate choices in identifying the tools required, and streamlines installation and configuration. It also addresses security concerns: iFusion comes with LDAP and Active Directory integration, including the CA Security stack. After installation, one only needs to decide which sources data should be collected from, and configuring this is again made easier by iFusion’s simplified interface. Depending on the source of data, the right tool for the job, such as Sqoop or Kafka, is selected automatically.
In short, our iFusion platform eases the following tasks for users:
Zinnov: Can you talk about how iFusion is solving the Big Data problems you mentioned earlier?
Rao Yendluri: Data in an enterprise can range anywhere from gigabytes to petabytes. Many of the tools that work on top of Hadoop can solve such data volume issues. But if data goes beyond terabytes, it becomes a challenge to store, manage and use the data at optimal cost. With iFusion, you do not have to copy all the data; you extract only what is needed from the existing operational system and copy just those parts. The data can then be directly curated and consumed by data scientists to execute the necessary algorithms based on the objective. Another key objective for the iFusion solution is support for collecting data from heterogeneous sources. We also allow effective collection of data from federated data stores, and with our data virtualization server, users can see a unified view of the data so that work can start quickly. With federated data store support, you can connect iFusion to existing data stores, which could hold data from different sources such as SQL, Oracle, web logs and XML. After data collection, only the relevant subset of data needs to be retrieved for consumption.
We have also created a workflow that allows the sometimes repetitive work to be done by a data analyst or a business analyst. Typically, data scientists must invest time in multiple steps of data preparation before applying the final set of algorithms. Algorithms are the core skill of a data scientist. iFusion enables the data scientist to focus only on algorithm development or application instead of spending precious hours on repetitive work. In general, 60% of a data scientist’s time is spent on data preparation. With iFusion, we intend to bring that down to 20%. This would result in top management or a decision maker getting results faster. Getting insights into the data faster could yield cost savings and/or better profits.
iFusion makes it easier to do ETL and ELT tasks. We do more ELT scenarios, where we load the data and users can then transform it any way they like. On the other hand, if a user requires an ETL tool, our workflow allows them to easily integrate any third-party ETL tool into the iFusion runtime context.
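The difference between the two orderings can be illustrated with a minimal Python sketch. It does not show iFusion itself: an in-memory SQLite database stands in for the data store, and the table names, sample rows and the functions `etl`, `elt` and `totals` are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records: amounts arrive as untidy strings.
RAW_ROWS = [("alice", " 120.50 "), ("bob", "80"), ("alice", "30.25")]


def etl(rows):
    """ETL: transform in application code first, then load the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    cleaned = [(name, float(amount.strip())) for name, amount in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return conn


def elt(rows):
    """ELT: load the raw rows untouched, then transform inside the store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # The transformation runs as SQL after loading, so the raw table stays
    # available for any other transformation a user may want later.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT customer, CAST(TRIM(amount) AS REAL) AS amount FROM raw_sales"
    )
    return conn


def totals(conn):
    """Total amount per customer, sorted by customer name."""
    query = (
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(totals(etl(RAW_ROWS)))
    print(totals(elt(RAW_ROWS)))
```

Both paths give the same totals. What differs is where the cleaning happens, and in the ELT path the untouched raw table remains in the store for later reuse.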
Zinnov: How is iFusion superior to some of the competing offerings in the market? What are its core differentiators?
Rao Yendluri: There are several differentiators compared to other platforms. iFusion can read through the operational data, take only the needed subset and process that, instead of reading through the entire data set. Users hardly have to worry about things such as which tools to use, security or the data life cycle. Because iFusion integrates seamlessly into the user’s environment, including their current LDAP or Active Directory, they can maintain one central system for security instead of having to manage two separate systems.
Another differentiating factor is the drastic reduction in project development cost and delivery time: thanks to the self-learning capabilities that iFusion offers, you won’t need many people to work on such development activities. We expect users to be able to reduce project cost and delivery time by nearly 50% and gain insights into the data much faster.
Zinnov: What are the top use cases that iFusion is most relevant to?
Rao Yendluri: This platform gives you the flexibility to build as many solutions as you have ideas for. You can prototype a solution and, if it looks good, use the tool to put it into the production environment. We have already worked on three innovative use cases and are working on a few more as we speak.
Zinnov: Are there any actual implementations that you would like to highlight?
Rao Yendluri: We are doing a number of projects to build solutions for our customers using iFusion. We have version V1.0 available for customers. We are planning to deploy at two telecom companies in EMEA soon. We are also actively exploring multiple use cases across industries such as Banking, Healthcare, Insurance and Retail.
Zinnov: One last question: what is the way ahead for this product? Are there any instances of using advanced technology with the platform?
Rao Yendluri: There is a lot to be added to the platform. It will be driven primarily by customer requirements, and it will keep evolving based on them. We already use machine learning, and going forward we may also incorporate other cognitive computing technologies such as NLP and image/video processing. We will wait and see how customer requirements dictate the road map for 2018.