Instead of a serious business or technical post, let's start our journey through The Dataland Zoo with a fun and hopefully educational topic. In time, we will share our views on data engineering, architecture, analytics, and more.
Welcome to Infostrux!
It seems appropriate to strike a lighter note at the start and take a moment to enjoy how far we've evolved from data structures, hierarchical models, and entity-relationship representations of data. Let’s explore the wonderful and exotic world of data species found in every part of the modern technology zoo (ahem, stack).
The Early Exploration of Data
Early in my software development career, I did some reporting and my fair share of database and SQL work. Around 2004/2005, I worked with Crystal Reports and SQL Server Reporting Services on customer projects. This was when I started getting interested in data for its business intelligence value.
In 2006, I fully immersed myself in the world of BI when I joined Business Objects (later acquired by SAP). There I spent time with customers and product folks, focusing on operational BI, predictive analytics, actionable reporting, and so on.
We predicted that data would be the engine that would power businesses and drive many well-informed decisions. This was reflected in the names used in our technology stack – BI Platform, Universe Model, and Web Intelligence.
When I was inside the BI bubble, it was difficult to predict that only a tiny fraction of the overall data that businesses generated (and, more importantly, are interested in) would be well-structured, neatly organized relational data stored in database tables and ready to be analyzed with SQL queries.
The so-called Big Data revolution was driven by ever-growing SQL databases and by the explosion of all kinds of data stored in all kinds of formats, organized in all kinds of structures and models. The infrastructure around data also evolved to enable the processing of a plethora of data formats, structures, and volumes.
These technical innovations demanded a new skill set that did not exist in the earlier database and IT world. On the business end, all kinds of stakeholders started using more and more software (often SaaS). As a result, businesses generated increasing volumes of data across their business units and functions. Companies began to face the challenge of controlling and accessing their own data for analysis and decision-making.
These factors created strong demand for a new unicorn skill set wrapped in one term: Data Engineering. Data engineering loosely refers to the work of people who can handle anything from low-level infrastructure to deploying data technologies.
Data engineers connect data sources to data warehouses and lakes using automated pipelines. From there, they write the code inside those pipelines and platforms that cleans, transforms, integrates, and models the data.
These steps enable the teams of business analysts and data scientists to start developing their reports and dashboards. Teams can then uncover valuable business insights or train their machine-learning models.
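To make the pipeline steps above a little more concrete, here is a minimal Python sketch of the four stages: cleaning, transforming, integrating, and modeling. Everything in it is an assumption made for illustration: the two source systems (`crm_customers`, `billing_orders`), their fields, the sample values, and the function names are hypothetical, and a real pipeline would read from actual sources and load into a warehouse or lake rather than work on in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw records from two source systems (illustrative data only).
crm_customers = [
    {"id": "1", "name": "  Acme Corp ", "country": "ca"},
    {"id": "2", "name": "Globex", "country": "US"},
    {"id": "2", "name": "Globex", "country": "US"},  # duplicate row
]
billing_orders = [
    {"customer_id": "1", "amount": "120.50"},
    {"customer_id": "2", "amount": "80"},
    {"customer_id": "1", "amount": "n/a"},  # amount that cannot be parsed
    {"customer_id": "2", "amount": "19.50"},
]


def clean_customers(rows):
    """Clean: trim whitespace, normalize country codes, drop duplicate ids."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": row["id"],
            "name": row["name"].strip(),
            "country": row["country"].upper(),
        })
    return cleaned


def transform_orders(rows):
    """Transform: cast amounts to float and skip rows that fail to parse."""
    transformed = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        transformed.append({"customer_id": row["customer_id"], "amount": amount})
    return transformed


def integrate(customers, orders):
    """Integrate: join each order to its customer on the customer id."""
    by_id = {c["id"]: c for c in customers}
    return [
        {**by_id[o["customer_id"]], "amount": o["amount"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


def model_revenue(joined):
    """Model: aggregate the joined rows into revenue per customer."""
    totals = defaultdict(float)
    for row in joined:
        totals[row["name"]] += row["amount"]
    return dict(totals)


def run_pipeline(customers, orders):
    """Run the four stages in order and return the modeled result."""
    joined = integrate(clean_customers(customers), transform_orders(orders))
    return model_revenue(joined)


if __name__ == "__main__":
    print(run_pipeline(crm_customers, billing_orders))
```

The output of `run_pipeline` is the kind of tidy, analysis-ready table that analysts and data scientists would pick up for reports, dashboards, or model training.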
To complicate matters further, many analysts and scientists are hired to build reports and develop analytics without the proper investment behind them. Investment in proper data platforms and in the data engineering effort is required to make their work successful.
When teams have to learn how to do that themselves, without proper support, guidance, or knowledge of best practices, they get lost in The Dataland Zoo of technologies and approaches.
The Future of The Dataland Zoo
We're at an exciting stage in the evolution of the relationship between data and business, and in the collision of the IT and software worlds. It started with the advent of cloud, DevOps, and ‘as code’ approaches to building everything, and we see it continuing to reshape the data layer too.
Concepts such as Snowflake's Data Cloud and the open approach to data architecture – embracing all forms of data and expanding the use cases for data beyond traditional BI or data analytics – are bound to drive more innovation, further expanding The Dataland Zoo.
I am very excited to see where these trends take us and the journey we’re embarking on with our new business.
At Infostrux, we feel very strongly about data engineering as a valuable practice that should be invested in. Many businesses will start reliably controlling their data when they embrace ‘as code’ principles of disciplined automation based on solid software development practices and cloud-native architecture.
Beyond that, businesses can finally work with trusted analytics to power their decision-making. Teams can build truly innovative data products that differentiate them in their markets.
Onwards and upwards. Looking forward to the next evolution of the cloud, with data at the center of everything!