When we think about data in computer technology terms, we think of the bits stored in databases or files. Data can be structured in tables with rows and columns, semi-structured in formats like JSON or XML, or unstructured and stored in various formats. Data can be text or binary, such as photos or videos.
Those bits turn into information when retrieved and examined to seek patterns or create a higher understanding of the data they represent. Information is, effectively, the first level of semantic organization, processing, and categorization of data. For example, knowing the total number of orders in the previous month is helpful information for an organization selling products.
When we compare the newfound understanding brought by this information with other available information, we create knowledge. This is necessary for any kind of informed action. To continue our example, knowing whether the number of orders last month has grown compared to the same month a year ago is valuable knowledge for an organization trying to increase its sales.
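To make the data, information, and knowledge steps concrete, here is a minimal sketch of the orders example. The order dates and the function names `orders_in_month` and `year_over_year_change` are made up for illustration; a real system would query a database or warehouse rather than an in-memory list.

```python
from datetime import date

# Data: raw order dates (hypothetical sample values, not real figures).
orders = [
    date(2022, 3, 4), date(2022, 3, 18),
    date(2023, 3, 2), date(2023, 3, 9), date(2023, 3, 27),
    date(2023, 4, 1),
]


def orders_in_month(order_dates, year, month):
    """Information: total number of orders placed in a given month."""
    return sum(1 for d in order_dates if d.year == year and d.month == month)


def year_over_year_change(order_dates, year, month):
    """Knowledge: relative growth versus the same month one year earlier."""
    current = orders_in_month(order_dates, year, month)
    previous = orders_in_month(order_dates, year - 1, month)
    if previous == 0:
        return None  # no baseline to compare against
    return (current - previous) / previous


march_total = orders_in_month(orders, 2023, 3)
march_growth = year_over_year_change(orders, 2023, 3)
print(f"Orders in March 2023: {march_total}")
print(f"Change vs. March 2022: {march_growth:.0%}")
```

The count on its own is information; it only becomes knowledge once it is set against the earlier period, which is why the second function needs both months of data to be available in the same place.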
We can all probably agree that there is no shortage of data being created all the time. There are many technologies available for examining and processing the data and turning it into information. Snowflake’s Data Cloud is one we’re heavily invested in at Infostrux.
What strikes me is that if the creation of data and information is abundant, then the rate-limiting step for most organizations must be the creation of knowledge.
A lack of knowledge, and the time it takes to acquire that knowledge, can significantly impact a business’s ability to make quick and vital decisions that can influence growth, market position, customer satisfaction, and more.
Companies obsess over KPIs, OKRs, and SMART goals, yet they don’t have reliable ways to measure the metrics they identified as key.
Without a reliable way to measure the effectiveness of those goals and initiatives, companies are left to rely on intuition, anecdotes, or past experiences.
We do know that companies that have implemented data-driven decision-making cultures and processes are generally more successful. Still, there is no silver bullet that guarantees success, and an improperly designed or poorly implemented data platform can lead to “high confidence, bad answers.”
The path towards a solution starts with centralizing the company’s knowledge and democratizing access to it. To that end, we need data and information to be brought together in one central place, along with any previously gained knowledge.
While we know this, most businesses seem to create silos and hoard data at the department level instead of making it easier to share across departments or with their partners.
In many instances, this is driven by fear or laziness: “Security requires that we prevent unauthorized access to financial data, so we need to segregate it into a data mart that only the finance team uses for financial reporting.” I have also often seen silos created by protectionism disguised as a policy excuse: “IT lacks a solid data governance policy, so we have no proper way to share our data outside my department.”
Then there are technical challenges too: “Loading more data into our EDW will slow down our analytics, so we have to do the processing outside first and only load the aggregated data. This takes a lot of developer time, which we don’t have.”
Data silos are an old problem. Many have tried to solve it on the product side for a long time. I was lucky to have participated in some of those attempts in the second half of the 2000s and in the early 2010s. I have been reflecting a lot lately on those experiences, and this post is an attempt to share some of those reflections.
Enterprise culture is part of the problem and, in my experience, shifting it takes a lot more than technology. Part of the success of our previous business, TriNimbus, in the cloud infrastructure space had to do with education and ongoing support to help companies shift their mindset from the traditional IT vs Dev silos to a more collaborative Agile/DevOps/DevSecOps model.
Of course, we built great solutions uniquely enabled by the cloud and automated many processes using principles like infrastructure as code, CI/CD, policy as code, etc. Ultimately, though, the value of the technology was limited without the cultural change.
For this reason, I am motivated to drive a similar cultural change with Infostrux and our partner, Snowflake. We’re making progress towards incentivizing organizations to break data silos by making it possible to get all of the data in one place and removing performance barriers for processing and analyzing that data.
Concepts like governed data sharing, which makes it easy for business units and vendors in a supply chain to access and integrate their data, might provide more incentive for businesses to break their data silos.
The proliferation of cloud platforms and SaaS products used by organizations creates integration challenges. Businesses find themselves dealing with a lot more sources of data than before, and getting data from some of those sources is challenging.
To give a specific example from e-commerce: businesses pursuing omnichannel strategies offer their products through their own digital properties as well as channels like Amazon Marketplace, Facebook, and Instagram. They find it hard to understand the performance of their investments because the various platforms are not incentivized to give their customers full access to their data.
Unfortunately, these platforms’ business models are built around obtaining and retaining a lot of data about their users and suppliers. Transparently sharing that data with their customers is not part of the service.
Digital transformation puts a lot of pressure on organizations to deploy more technology, adopt more software, instrument more of their products, etc. Suddenly, organizations that previously ran their business on the data from a limited set of systems like ERP, CRM, and accounting find themselves collecting a plethora of data across many systems and measuring all kinds of aspects of their business, products, users, etc.
Bringing that data together and creating useful insights has given rise to a whole new industry, along with a set of technologies and approaches that we know as Big Data, Data Analytics, Data Science, etc.
All of this increases the volume of data and puts greater demand on organizations to drive more data analysis and make “data-driven decisions.” Organizations often find out that they’ve fallen into the “high confidence, bad answers” trap by building solutions that don’t work well enough due to missing or unreliable data.
I think this will be an ongoing race where the goalposts keep moving. Product innovation, new models for analyzing data, new tools for integrating data, and efforts from businesses like ours will all be needed to try and stay ahead of the curve.
At Infostrux, we help you deal with the undifferentiated heavy lifting required to improve data reliability and break data silos.