Certified Datasets as Drivers for Data Democratization

We started our journey at Infostrux in November 2020, two months before we officially launched the business publicly in January. We came up with many theories and hypotheses about the market and customer needs, and we felt strongly that data engineering is a challenging problem for many businesses. We correctly predicted that there would be great interest in such expertise. What we didn’t anticipate is what we have learned in the four months since January.

The punchline is that customers want to make sense of their data fast! The engineering effort to bring the data together under one platform is essential, and the expertise to do that reliably is hard to find. However, many technologies exist to make that work more manageable, and patterns and best practices have been developed to automate the process using data lakes and pipelines. In the cloud, customers are used to the advantages of virtually infinite scalability and performance, and they expect the same with data. Spending months building the infrastructure for ingesting data from multiple sources, writing and deploying ETL scripts, and configuring data lakes and warehouses is not what they want to see. They want to get to value quickly, but one big hurdle stands in the way.

The next step, integrating and modeling the data, is a real challenge with no easy shortcuts. It requires a thorough understanding of the data from the various sources the organization uses, along with a deeper understanding of the business operations and how multiple departments and stakeholders use data. It is a collaborative effort that requires coordinating various groups of people. Effective communication, the use of precise language, and the ability to cut through organizational divides are some of the softer skills needed to bridge the gap between two things: a seemingly clean but random pile of data that only a few can use effectively, and carefully curated datasets that democratize access to information. Such datasets lower the barrier so that anyone with decent knowledge of SQL or a standard BI and analytics tool can start gaining insights from that information.

At Infostrux, we come to data from the bottom up. We like solving the plumbing problems that typically plague BI and analytics projects. We want to remove undifferentiated work for our customers by letting automation do the heavy lifting. We realize we’re uniquely positioned because we work with the familiar data sources that many businesses commonly use. This allows us to build an intimate knowledge of the data in those sources and to understand how most customers use them. What we didn’t expect was that virtually every customer we talk to would ask for help with integrating and modeling their data and with giving different parts of the organization direct access to reliable datasets.

By accelerating the process of ingesting data from multiple sources and bringing it all together under one platform (in our case, Snowflake), we’re giving many of our customers the opportunity, for the very first time, to access all of their data at once. This is driving the appetite for making sense of that data right where it is hosted, so it can be shared directly with analysts, who can run their own processing and generate insights without the data models being locked inside reports and dashboards. Curating reliable datasets that anyone can work with unlocks a lot of value and creates many data power users within the organization. Data models are becoming the interface through which data is democratized and through which investments in new capabilities, like advanced analytics or data science, are enabled.

As a business, we focus on solving problems. The lack of access to reliable, curated, and trusted datasets is a common problem that holds organizations back from becoming data-driven. We’re happy to be able to help remove that problem for our customers.

– Goran Kimovski