Stop Overprocessing Data
“We’re starving for data!”
This is something I regularly hear as I meet with stakeholders from companies large and small. Perhaps the most painful message came from the Director of Marketing of a large restaurant chain. We were on our second conference call when she said, with a sigh of frustration I’ll never forget, “Does this mean my team will be able to get current item data to analyze without having to request it from IT each time?”
My professional passion is being a data evangelist for people who want to make business more efficient. In today’s fast-paced world, hearing comments like that is a kick in the gut for me.
I’ve been in the data business for years. I have watched architectural patterns evolve from enterprise data warehousing to operational data stores/integration hubs, to logical data marts/data virtualization, to big data stores and, most recently, to ETL, ELT and reverse ETL.
Undoubtedly, these have been exciting times in the data business, because each of those five data management patterns has merit in the right context. However, they fail to account for three things that often force BI and data leaders to extend their data pipelines through successive data stores, overprocessing data and increasing the time needed to get it to consumers.
1 — Modern platforms are different
The world has seen a rapid rise in Software as a Service (SaaS) platforms, such as Dynamics, Workday and Salesforce, all of which have far simpler data schemas than the complex line-of-business and custom applications of old. Older systems stored data in databases optimized for writing efficiently. Reading data, on the other hand, required numerous steps: extracting data from tables, performing joins, flattening structures and converting oddly named columns into something humans could consume.
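To make that concrete, here is a rough sketch of the read-side query those write-optimized schemas used to demand before anyone could analyze a thing. The table and column names are hypothetical, not drawn from any particular system:

```sql
-- Hypothetical example: flattening a write-optimized order schema for analysts.
-- Table and column names (ORD_HDR_T, ORD_LN_T, etc.) are illustrative only.
SELECT
    h.ord_id                        AS order_id,
    h.ord_dt                        AS order_date,
    c.cust_nm                       AS customer_name,
    i.itm_desc_txt                  AS item_description,
    l.qty_ord_nbr                   AS quantity_ordered,
    l.unit_prc_amt * l.qty_ord_nbr  AS extended_price
FROM ORD_HDR_T h
JOIN ORD_LN_T   l ON l.ord_id  = h.ord_id
JOIN ITM_MST_T  i ON i.itm_id  = l.itm_id
JOIN CUST_MST_T c ON c.cust_id = h.cust_id
WHERE h.ord_dt >= DATE '2020-01-01';
```

Every one of those joins and renames was a processing step standing between the data and the person asking the question.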
Modern platforms require little processing to support analytics and data discovery. Indeed, many digitally native platforms have out-of-the-box integrations with leading cloud data warehouse vendors ready to deliver data directly to users.
Blueprint Technologies advocates a fast-lane approach that pipes data from these services into a cloud-based data warehouse. The speed at which these data are delivered to analysts correlates directly with value, so a data architecture should give an organization as direct a path as possible from source to analyst.
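As one hedged illustration of that fast lane, the sketch below uses Snowflake-flavored SQL to land SaaS extracts from cloud storage directly in a raw table; the stage, table and file layout are assumptions for the example, not a prescribed design:

```sql
-- Hypothetical fast-lane load: SaaS extracts staged in cloud storage land
-- directly in a raw table, with no intermediate marts or transformation hops.
-- The stage (@saas_stage) and table names are illustrative assumptions.
CREATE TABLE IF NOT EXISTS raw.salesforce_opportunities (payload VARIANT);

COPY INTO raw.salesforce_opportunities
FROM @saas_stage/salesforce/opportunities/
FILE_FORMAT = (TYPE = 'JSON');
```

Analysts can query the landed records directly, or flatten them in views later, rather than waiting for a downstream pipeline to finish.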
2 — Modern data warehouses can scale up and down
In the past, the performance of a data warehouse depended on how much a company spent on compute nodes. Data warehouse and BI managers had to request and allocate capital to purchase additional compute and storage resources — often over yearly budgeting cycles. This meant there was an intense focus on curtailing query loads. One strategy was to — yet again — process data from the warehouse into data marts. These data marts held less data and were optimized for ad hoc queries.
While this protected the warehouse from uninformed users running baseline analysis that could become costly, the data sets eventually provided to users were so overprocessed, and often covered such a short timeframe, that they left little room for exploring data in any innovative way.
For cloud-based data warehouses, however, users exploring data in ad hoc or unprescribed ways have far less of an impact on the system. Cloud data warehouses can scale dynamically using various techniques. There is often no need to create these secondary data marts because the compute behind the warehouse can scale up and down automatically with the query load.
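As a concrete, hedged example, here is what that elasticity looks like in Snowflake’s warehouse DDL (the warehouse name and sizing values are illustrative assumptions; other cloud warehouses expose similar controls):

```sql
-- Hypothetical ad hoc warehouse: scales out under concurrent analyst load
-- and suspends itself when idle, so exploration doesn't require a data mart.
CREATE WAREHOUSE analyst_adhoc_wh
  WAREHOUSE_SIZE    = 'XSMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4          -- add clusters as concurrent queries queue up
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 60         -- suspend after 60 seconds of inactivity
  AUTO_RESUME       = TRUE;      -- wake automatically when a query arrives
```

The point is not the specific vendor syntax; it is that compute for exploratory work can now be rented by the second instead of bought by the budget cycle.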
Blueprint believes the modern data estate has at its heart a cloud data warehouse. As data inform decisions (or should be informing them) at a much greater rate, Blueprint advises clients to seriously consider the reasons driving additional data processing in a data pipeline.
3 — End users are more data savvy than you think
Historically, a single major contributor drove the need to overprocess data into secondary data marts and data cubes: BI tools (Excel PivotTables, Power BI and other enterprise reporting and analytical platforms) relied on data being in a specific format to support a click-and-drag user experience.
While that may be appropriate for a certain class of users, data managers have quickly found that employees today have skills, and access to data exploration tools, that often negate the need to create these tailored data marts. Ten to 15 years ago, data needed to be overprocessed because end users knew little more than fundamental report navigation: dumping data into Excel and making pivot tables. Today, whether someone earned a degree in finance, MIS or engineering, they often understand statistics, possess SQL skills and are quite savvy with modern data tools.
Blueprint is passionate about unleashing the power of data. Overprocessing data reduces the degrees of freedom these savvy analysts have to explore insights and outliers for the business, and that is an ever-greater risk to improving operations. Carefully consider the risk/reward tradeoff of overprocessing data and delaying access to it.
Aim important data pipelines & processing rituals at the right areas
Certainly, there are valid reasons to process data thoroughly. Formal financial statements, SEC filings and other scenarios that demand transactional auditing and data precision within the confines of a financial or planning data model are high on that list.
But when it comes to connecting data from many other business areas, companies should question the need to overprocess. When exploring trends in tickets from the field, it doesn’t materially change anything if the numbers are slightly off. Companies need to get over their fears of handing raw data to their BI engineers and analysts.
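To illustrate, a savvy analyst can answer that kind of trend question against the raw tables directly. A hedged sketch, with purely hypothetical table and column names:

```sql
-- Hypothetical exploration of raw field-service tickets, no data mart needed.
-- Table and column names are illustrative assumptions.
SELECT
    region,
    DATE_TRUNC('week', opened_at)  AS week,
    COUNT(*)                       AS tickets_opened,
    AVG(resolution_hours)          AS avg_resolution_hours
FROM raw.field_service_tickets
WHERE opened_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY region, DATE_TRUNC('week', opened_at)
ORDER BY region, week;
```

A query like this is good enough to spot a trend, and it runs against data that arrived today, not data that cleared a pipeline last month.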
Let’s start a conversation about how Blueprint can help you establish a cloud-based data estate that prioritizes getting data to users quickly, enabling data-driven business decisions in a competitive landscape.