Data Sharing and the Future of Data Warehousing in the Cloud

Recently, I spoke with a product leader at a SaaS company who told me that many of their customers don't use the analytics built into the product and instead ask for direct access to their data. Those customers prefer to load the data into their own data warehouse or analytics platform and process and analyze it there. I've heard this same scenario more than a dozen times in recent months, so it no longer surprises me.

Such a request would not be unusual if only their largest enterprise customers were making it; I have seen enterprises ask for this regularly for over a decade.

What is unusual is how commonplace it is becoming for SaaS vendors to export their data for, or give direct access to, customers of every size. Further, many of the SaaS vendors I've talked to in the past six months tell me their customers are asking whether the vendor can share data with them directly through Snowflake. These customers have adopted Snowflake themselves and would rather use its data sharing capabilities than import the data with ETL tools.

Let me preface this post by saying that I love Snowflake’s private data exchange capabilities and the growing public data marketplace. It was one of the reasons why I got excited to focus Infostrux exclusively on Snowflake as the main data platform for our services. 

Recently, I had a fireside chat with S. Somasegar, one of Snowflake's early investors and board members (in full transparency, Soma is also an investor in and advisor to Infostrux), who told me he thinks data sharing in the cloud will be bigger than data warehousing. We're both biased, of course, and love the data cloud Kool-Aid, so I always look for evidence that challenges this view to keep from becoming blind to my biases. So far, I haven't found any.

It is validating that the organizations I talk to are raising the priority of data sharing and looking for ways to do it safely and efficiently.

What does data sharing in the cloud look like?

Traditional approaches to data sharing typically use one of two methods:

  • The most common, by far, involves some kind of export or backup mechanism on the provider side. It writes the data to be shared into one or more files and stores them on a secure FTP server or in cloud storage. The consumer then has to import or restore the data into a data lake or data warehouse before it can be processed and integrated with the rest of the organization's data.
  • Larger enterprises may alternatively leverage their existing enterprise service bus or develop web services to facilitate the direct exchange of data with their customers, suppliers or third-party partners without the need for export and import processes.

Advanced architectures involving streaming services, big data technologies, microservices, etc. have also been applied to the problem, but they’re typically not suitable for organizations that don’t have sizable R&D capacity and know-how on both sides of the exchange.

Snowflake showed that data sharing can be simplified by giving the consumer direct SQL access to the provider's data, without copying it, while keeping the exchange between provider and consumer fully governed, secure, and tightly controlled.

This novel approach makes it possible for anyone with a Snowflake account to connect to the data shared by their SaaS vendors or third-party partners. As long as both parties already use Snowflake, the consumer can connect a BI or analytics tool to the shared data and start gaining insights from it. There is no need to build a data pipeline or import the data into a data warehouse of their own.

Of course, consumers quickly discover that they can enrich that data with additional data from the public marketplace. They may even start bringing in data from their own internal systems and, over time, build a data warehouse or analytics solution of their own, in which the original share that started the whole initiative is just one of many shares and sources feeding the solution.
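To make the provider/consumer flow described above more concrete, here is a minimal Python sketch that only assembles the SQL each side might run; it does not connect to Snowflake. It assumes Snowflake's documented `CREATE SHARE`, `GRANT ... TO SHARE`, `ALTER SHARE ... ADD ACCOUNTS`, `CREATE DATABASE ... FROM SHARE`, and `GRANT IMPORTED PRIVILEGES` commands. The helper functions and every share, database, schema, role, and account name are hypothetical.

```python
"""Sketch: SQL a provider and a consumer might run to set up a Snowflake share.

The helper functions and all share, database, schema, role, and account
names are hypothetical placeholders; this only builds SQL strings.
"""
from __future__ import annotations


def provider_share_sql(share: str, database: str, schema: str,
                       consumer_account: str) -> list[str]:
    """Provider side: create a share, grant read access to one schema's
    tables, and add the consumer's account to the share."""
    return [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {database}.{schema} TO SHARE {share};",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};",
    ]


def consumer_mount_sql(local_db: str, provider_account: str, share: str,
                       role: str) -> list[str]:
    """Consumer side: mount the share as a read-only database (no data is
    copied) and let a role, such as the one a BI tool uses, query it."""
    return [
        f"CREATE DATABASE {local_db} FROM SHARE {provider_account}.{share};",
        f"GRANT IMPORTED PRIVILEGES ON DATABASE {local_db} TO ROLE {role};",
    ]


if __name__ == "__main__":
    provider = provider_share_sql("customer_share", "saas_prod", "reporting",
                                  "acme_org.analytics")
    consumer = consumer_mount_sql("vendor_data", "saas_org.prod",
                                  "customer_share", "bi_reader")
    print("\n".join(provider + consumer))
```

Revoking access is the mirror image (`ALTER SHARE ... REMOVE ACCOUNTS = ...`), which is part of what keeps the exchange under the provider's control.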

Architect your data solution with data sharing as a core approach

My intent is not to write about Snowflake's data sharing capabilities, as those are well documented on Snowflake's website. Instead, I want to focus on the value of designing your architecture for data sharing within your organization, even if you have no plans yet to give anyone outside your organization access to your data.

If you're not new to the public cloud, you may already be familiar with using a multi-account cloud architecture to achieve effective separation of concerns, mapping each cloud account to your organizational structure, your security and compliance requirements, or some other design pattern that best meets your organization's needs. The same approach applies to your data architecture in Snowflake's data cloud: use multiple Snowflake accounts, with data sharing as the interface between them.

Note: even if you're not using Snowflake today and are building on another cloud platform, you should consider designing your architecture to support data sharing, as many other platforms are adding data sharing capabilities.

The simplest way to put this approach into practice is to create a second, sandbox account for data science. Using data sharing, you can give your data science team direct read-only access to your raw ("data lake") schema as well as your processed ("data warehouse") and modelled ("data mart") schemas. They can import whatever other data they want and create any datasets they need for their models or ML training directly inside their sandbox.

This approach simplifies the security controls needed for the two groups to work together on the same platform, and it lets each group keep improving its data and solutions without spending a lot of time writing ETL to move data across different technology stacks.
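The sandbox arrangement above can be sketched the same way: one share from the production account covering all three layers, and a sandbox that mounts it next to a writable database of its own. This is a minimal Python sketch that only builds SQL strings, assuming Snowflake's documented share commands; the database, schema, share, and account names (including `prod_shared` and `ds_work`) are hypothetical.

```python
"""Sketch: share the raw, warehouse, and mart layers read-only with a
data science sandbox account, which keeps its own writable database.

All names (database, schemas, share, accounts) are hypothetical.
"""
from __future__ import annotations

# Layers from the text: raw ("data lake"), processed ("data warehouse"),
# and modelled ("data mart").
LAYERS = ("raw", "warehouse", "mart")


def sandbox_share_sql(share: str, database: str, schemas: tuple[str, ...],
                      sandbox_account: str) -> list[str]:
    """Production account: one share covering every layer, read-only."""
    stmts = [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
    ]
    for schema in schemas:
        stmts.append(f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};")
        # ALL TABLES covers only tables that exist when this runs;
        # re-run it after new tables are added to the schema.
        stmts.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {database}.{schema} TO SHARE {share};")
    stmts.append(f"ALTER SHARE {share} ADD ACCOUNTS = {sandbox_account};")
    return stmts


def sandbox_setup_sql(prod_account: str, share: str) -> list[str]:
    """Sandbox account: mount the shared layers (read-only) next to a
    writable database (hypothetical name ds_work) for the team's own work."""
    return [
        f"CREATE DATABASE prod_shared FROM SHARE {prod_account}.{share};",
        "CREATE DATABASE ds_work;",
    ]


if __name__ == "__main__":
    print("\n".join(sandbox_share_sql("ds_share", "analytics", LAYERS,
                                      "myorg.ds_sandbox")))
    print("\n".join(sandbox_setup_sql("myorg.prod", "ds_share")))
```

Because shared objects are read-only in the consumer account, the data science team can experiment freely in its own database without any risk of modifying production tables.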

Next, you can add accounts to support your development and testing efforts. This approach takes advantage of the ability to define how sensitive data is handled at the point of exchange: protected data can be classified, masked, anonymized, tokenized, or filtered before it is shared. Users in the development and testing accounts can then use functionality like zero-copy cloning to work safely and directly with production data, without spending any effort cleaning it first.
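One way to handle sensitive data at the point of exchange is to share a secure view instead of the underlying table (Snowflake only allows secure views, not standard views, to be added to a share). The Python sketch below only assembles such a view from per-column rules. The rule names, the use of `SHA2` for tokenizing, the `'***'` placeholder, the row filter, and all table, column, and view names are assumptions for illustration.

```python
"""Sketch: build a secure view that keeps, tokenizes, masks, or drops each
column before a table is shared with a development or testing account.

Rule names, tokenization via SHA2, and all object names are hypothetical.
"""
from __future__ import annotations

RULES = {
    "customer_id": "keep",
    "email": "tokenize",  # deterministic hash: still joinable, raw value hidden
    "phone": "mask",      # replaced with a constant placeholder
    "ssn": "drop",        # not exposed at all
    "country": "keep",
}


def column_expr(column: str, rule: str) -> str | None:
    """SQL select-list expression for one column, or None to omit it."""
    if rule == "keep":
        return column
    if rule == "tokenize":
        return f"SHA2({column}) AS {column}"
    if rule == "mask":
        return f"'***' AS {column}"
    if rule == "drop":
        return None
    raise ValueError(f"unknown rule {rule!r} for column {column!r}")


def masked_view_sql(source: str, view: str, rules: dict[str, str],
                    row_filter: str | None = None) -> str:
    """CREATE SECURE VIEW statement exposing only the permitted data."""
    exprs = []
    for column, rule in rules.items():
        expr = column_expr(column, rule)
        if expr is not None:
            exprs.append(expr)
    sql = (f"CREATE OR REPLACE SECURE VIEW {view} AS "
           f"SELECT {', '.join(exprs)} FROM {source}")
    if row_filter:  # e.g. exclude rows that must never leave production
        sql += f" WHERE {row_filter}"
    return sql + ";"


if __name__ == "__main__":
    print(masked_view_sql("prod.crm.customers", "prod.shared.customers_dev",
                          RULES, row_filter="consent_to_share = TRUE"))
```

The view would then be granted to the development share with `GRANT SELECT ON VIEW ... TO SHARE ...`, along with usage on its database and schema. An unsalted hash of low-entropy values such as phone numbers can be reversed by brute force, which is why this sketch masks `phone` outright; the actual rules would come from your own data classification.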

Once you gain enough experience empowering different functional groups within your data teams to work with your data, you can extend the same approach by creating dedicated analytics accounts for different organizational units (OUs), such as marketing, finance, and sales.

Too often, data teams have to field requests for new reports and dashboards in a centralized BI or analytics tool, or add data sources to a central data warehouse that only some OUs need or use. Instead, you can maintain a core centralized data repository for the entire organization and let each group work off that repository in its own account. Once this is in place, each group can run its own reporting or enrich the data with additional datasets imported into its own account and bring that data into its reports.
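The per-OU setup above can be described as data: a mapping from each OU to the core schemas it needs, turned into one share per OU. This is a minimal Python sketch that only generates SQL strings, assuming Snowflake's documented share commands; the OU names, schema names, the `core` database, and the account identifiers are hypothetical.

```python
"""Sketch: one share per organizational unit (OU), each exposing only the
core schemas that OU needs from the central repository.

OU names, schema names, the database name, and account identifiers are
hypothetical.
"""
from __future__ import annotations

CORE_DB = "core"

# Which core schemas each OU's analytics account receives.
OU_SCHEMAS = {
    "marketing": ["customers", "campaigns"],
    "finance": ["customers", "billing"],
    "sales": ["customers", "pipeline"],
}

# Each OU works in its own account within the same organization.
OU_ACCOUNTS = {
    "marketing": "myorg.marketing_analytics",
    "finance": "myorg.finance_analytics",
    "sales": "myorg.sales_analytics",
}


def ou_share_sql(ou: str, schemas: list[str], account: str,
                 database: str = CORE_DB) -> list[str]:
    """Statements that publish the selected core schemas to one OU account."""
    share = f"{ou}_share"
    stmts = [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
    ]
    for schema in schemas:
        stmts.append(f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};")
        stmts.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {database}.{schema} TO SHARE {share};")
    stmts.append(f"ALTER SHARE {share} ADD ACCOUNTS = {account};")
    return stmts


def all_ou_shares() -> dict[str, list[str]]:
    """One statement list per OU, built from the two mappings above."""
    return {ou: ou_share_sql(ou, schemas, OU_ACCOUNTS[ou])
            for ou, schemas in OU_SCHEMAS.items()}


if __name__ == "__main__":
    for ou, stmts in all_ou_shares().items():
        print(f"-- {ou}")
        print("\n".join(stmts))
```

With this layout, giving finance access to another core dataset is a one-line change to `OU_SCHEMAS` rather than a new pipeline. The core repository stays the single source every OU reads from, while anything an OU imports for itself lives only in its own account.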

From Data Sharing to Data Networks

Once you have been down the path of using data sharing as a way to architect a flexible, scalable, and extensible data platform for your organization, it is a very small step to extend that to your external partners, customers, etc. 

This has the side effect of inviting those organizations to invest in their own data platforms and enable data sharing in turn, which opens up two-way sharing opportunities for you too.

With this approach, data warehousing becomes a component in a much wider data network solution where each “node” in that network can use a different data architecture for integrating, modelling, and preparing the data for the specific uses it is meant to serve.
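The idea of a data network can be modelled as a directed graph whose nodes are accounts and whose edges are shares. The Python sketch below is a conceptual model only, not a Snowflake API. The node and dataset names are hypothetical, and it assumes a node can read only what is shared with it directly (it does not model re-sharing).

```python
"""Sketch: a data network as a directed graph of shares between accounts.

Node and dataset names are hypothetical. Assumption: a node can read only
what is shared with it directly; re-sharing is not modelled.
"""
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataNetwork:
    # (provider, consumer) -> set of dataset names shared along that edge
    edges: dict[tuple[str, str], set[str]] = field(
        default_factory=lambda: defaultdict(set))

    def share(self, provider: str, consumer: str, dataset: str) -> None:
        """Record that provider shares dataset with consumer."""
        self.edges[(provider, consumer)].add(dataset)

    def readable_by(self, node: str) -> dict[str, set[str]]:
        """Datasets a node can query, grouped by the provider sharing them."""
        return {p: set(ds) for (p, c), ds in self.edges.items() if c == node}

    def is_two_way(self, a: str, b: str) -> bool:
        """True when a and b each share at least one dataset with the other."""
        return bool(self.edges.get((a, b))) and bool(self.edges.get((b, a)))


if __name__ == "__main__":
    net = DataNetwork()
    net.share("saas_vendor", "retailer", "orders")
    net.share("retailer", "saas_vendor", "inventory_levels")
    net.share("retailer", "logistics_partner", "shipments")
    print(net.readable_by("retailer"))                      # {'saas_vendor': {'orders'}}
    print(net.is_two_way("saas_vendor", "retailer"))        # True
    print(net.is_two_way("retailer", "logistics_partner"))  # False
```

Each node stays free to choose its own internal architecture; the graph only records what crosses the boundaries, which is the part a supply-chain-wide network would need to govern.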

It is not a big stretch to suggest that organizations may soon run their entire supply chain on a data cloud solution like Snowflake's. It is certainly a compelling opportunity for anyone who has had to enable data sharing and data exchanges in the past and understands how difficult, complex, and often brittle the traditional solutions are.

With cloud-based data sharing, Snowflake is enabling new data architectures that are faster to evolve and easier to maintain. I am certainly excited about this opportunity and couldn't agree more with Soma that data sharing will be much bigger than data warehousing!
