Welcome to Infostrux!
Instead of a serious business or technical post, let’s start our journey through The Dataland Zoo with a topic that is fun and hopefully educational. In time, we will share our views on topics like data engineering, data architecture, and data analytics. For now, it seems appropriate to strike a lighter note and take a moment to appreciate how far we’ve come from data structures, hierarchical models, and entity-relationship representations of data. Let’s explore the wonderful and exotic world of data species that live in every part of the modern technology zoo (ahem, stack).
The Early Exploration of Data
Early in my software development career, I did a bit of reporting work and my fair share of database and SQL work. Around 2004/2005, I was working with Crystal Reports and SQL Server Reporting Services on customer projects. This is when I started getting really interested in data in terms of its business intelligence value. In 2006, I fully immersed myself in the world of BI when I joined Business Objects (later acquired by SAP). There I spent time with customers and product folks, focusing on topics like operational BI, predictive analytics, and actionable reporting. We predicted that data would be the engine that would power businesses and drive many well-informed decisions. This was reflected in the names used in our technology stack: BI Platform, Universe Model, and Web Intelligence.
When I was inside the BI bubble, it was difficult to predict that only a small fraction of the overall data that businesses generate (and, more importantly, are interested in) would be well-structured, neatly organized relational data, stored in database tables and ready to be analyzed with SQL queries. The so-called Big Data revolution was driven not only by ever-growing SQL databases but also by the explosion of all kinds of data stored in all kinds of formats, organized in all kinds of structures and models. The infrastructure around data also evolved to enable the processing of this plethora of data formats, structures, and volumes.
The technical innovations demanded a new type of skill set that was not available in the previous database and IT world. On the business end, all kinds of stakeholders started using more and more software (often SaaS). As a result, businesses generated increasing volumes of data across their business units and functions. Companies began to face the challenge of controlling and accessing all of their data for analysis and decision making.
All of these factors created a strong demand for a new unicorn skill set wrapped in one term: Data Engineering. The term is loosely used to refer to people who can handle anything from low-level infrastructure work to deploying data technologies. Using automated pipelines, data engineers connect data sources to data warehouses and data lakes. From there, they write the code inside those pipelines and platforms that cleans, transforms, integrates, and models the data. All of these steps enable teams of business analysts and data scientists to start developing their reports and dashboards. Teams can then uncover useful business insights or start training their machine learning models.
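For readers who like to see things in code, here is a minimal sketch of what such a pipeline can look like. It is only an illustration under stated assumptions: the sample data, the `orders` table, and the function names (`extract`, `clean`, `load`, `run_pipeline`) are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (for example, a SaaS tool).
RAW_ORDERS_CSV = """order_id,customer,amount
1001, Acme Corp ,250.00
1002,globex,
1003,Acme Corp,99.50
"""


def extract(raw_text):
    """Read raw CSV rows from the source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Drop rows with a missing amount, trim whitespace, and cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows, conn):
    """Write cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :amount)", rows
    )
    conn.commit()


def run_pipeline(raw_text, conn):
    """Chain the steps: extract, then clean, then load."""
    load(clean(extract(raw_text)), conn)


# In-memory SQLite is an assumption standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
run_pipeline(RAW_ORDERS_CSV, conn)

# An analyst can now query trusted, cleaned data with plain SQL.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

A real pipeline would add scheduling, monitoring, and tests, but the shape is the same: each step is a small, reviewable piece of code rather than a manual task.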
To complicate matters further, many analysts and scientists are hired to build reports and develop analytics without the investment in data platforms and data engineering that is required to make their work successful. When teams have to learn how to do that themselves, they get lost in The Dataland Zoo of technologies and approaches with no proper support, guidance, or knowledge of best practices.
The Future of The Dataland Zoo
We’re at an interesting stage in the evolution of the relationship between data and business and in the collision of the IT and software worlds. It started with the advent of cloud, DevOps, and ‘as code’ approaches to building everything, and we see that it continues to have an impact at the data layer too. Concepts like Snowflake’s Data Cloud and the open approach to data architecture (embracing all forms of data and expanding the use cases for data beyond traditional BI or data analytics) are bound to drive more innovation, further expanding The Dataland Zoo. I am very excited to see where these trends take us and about the journey we’re embarking on with our new business.
At Infostrux, we feel very strongly that data engineering is a valuable practice that deserves investment. Many businesses will start to reliably control their data when they embrace ‘as code’ principles and disciplined automation based on solid software development practices and cloud-native architecture. Beyond that, businesses can finally work with trusted analytics to power their decision-making, and teams can build truly innovative data products that will differentiate them in their markets.
Onwards and upwards. Looking forward to the next evolution of the cloud, with data at the center of everything!