In recent years, the hype surrounding the value of data has grown continuously, and a multitude of concepts and methods have emerged on how companies can become 'data-driven'. From strategic top management to detail-oriented data analysts, attempts are being made to place data at the heart of value creation. The resulting initiatives regularly oscillate between a holistic data strategy (“we need common rules before we start working with data”) and exploratory data analysis (“we have so much data, there must be something valuable in it”). As a result, however, only a few initiatives produce real progress on the way to becoming a data-driven company. This article presents Applied Data Products, an integrative approach to data products with concrete steps that aid in this process.
A core problem on the way to becoming a data-driven company is too much emphasis on one of two poles: Business or technology. While governance, regulation, and use cases are on one side, technology with concepts and techniques like APIs, data mesh, and UML diagrams is on the other. Very few people in a company manage to combine both sides effectively, which makes achieving true alignment a challenge.
In order to better bundle the business- and technology-oriented forces in our customers’ companies, we have combined proven and well-known frameworks into a consulting approach to data products that specifically addresses the connection between the two aforementioned poles. We call our approach Applied Data Products and, true to the motto “Learn first, scale second”, it focuses on creating the greatest possible added value in order to develop an overall strategy for becoming data-driven based on successful individual cases.
Increase Efficiency with Simple Data Products
Let's make it very concrete: If New York's authorities want to track down tax evaders with a holistic data strategy, the project is lengthy, risky, and has a high probability of failing. The key question is: How can tax inspectors be sent to the right companies with the support of data? If you start with the hospitality industry, the use case is almost obvious: Because a lot of waste means a lot of turnover, it should be enough to systematically record waste volumes and compare them with reported turnover. Where the discrepancies are large, a more detailed check for potential tax evasion should be carried out. In fact, this procedure was used successfully a few years ago (see this German article). The recording of waste and its subsequent comparison with submitted turnover data led to a significant increase in detected cases of tax evasion.
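To make the underlying analysis tangible, here is a minimal sketch, assuming that waste volumes and reported turnover are available as two tables with hypothetical column names. It flags companies whose reported turnover looks implausibly low for their waste volume:

```python
import pandas as pd

# Hypothetical input data: waste volumes and reported turnover per company.
waste = pd.DataFrame({
    "company_id": [1, 2, 3, 4],
    "waste_tons": [12.0, 3.5, 20.0, 8.0],
})
turnover = pd.DataFrame({
    "company_id": [1, 2, 3, 4],
    "reported_turnover": [480_000, 150_000, 210_000, 320_000],
})

# Join both sources and relate reported turnover to waste volume.
merged = waste.merge(turnover, on="company_id")
merged["turnover_per_ton"] = merged["reported_turnover"] / merged["waste_tons"]

# Flag companies whose turnover per ton of waste is far below the median:
# candidates for a closer, manual tax inspection.
threshold = 0.5 * merged["turnover_per_ton"].median()
suspicious = merged[merged["turnover_per_ton"] < threshold]
print(suspicious[["company_id", "turnover_per_ton"]])
```

The column names, the threshold, and the rule itself are assumptions for illustration; the point is how little machinery is needed once the data product is clearly framed.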
When introducing Applied Data Products, the first step is to clearly define one or more specific data products following the principle of “quality over quantity”. It is better to have a handful of well-considered data products on the way to becoming a data-driven company than to attempt too much at the same time. Besides defining actual data products, another goal of this step is to learn about your own organization: Which aspects of data-driven action are already working, and in which specific areas is there still a need for action? While it doesn't matter whether a data product is a local Excel sheet or a data warehouse with a connected self-service BI dashboard, learning to think in terms of data is key here.
We have developed the Data Product Canvas (DPC) for the definition of data products, and evaluated its usefulness in various industries and scenarios. Based on the Business Model Canvas, the DPC serves to provide all stakeholders with a quick and comprehensive overview of a data product. It is not intended to be a substitute for specialist or technical documentation. Rather, we position it as a conversation starter. That is, instead of addressing all details clearly and unambiguously, it primarily serves as a basis for discussion, including questions such as:
- Which source systems do we need to integrate? Who is responsible for these systems (and therefore the quality of the data supplied)?
- What is the added value of our product? What processing steps do we need to carry out on the source data?
- What actions should our data product enable, directly and indirectly?
- Where do costs arise in the provision of the product?
- How can success be measured?
The Structure of the DPC
Our DPC is divided into five areas with a total of 12 fields in order to capture all essential information about a data product. At the very top sits a management summary that describes the core idea of the data product in clearly understandable terms, along with the party that owns and is responsible for it.
The main part of the DPC consists of three blocks:
- Sources, Storage & Processing
- Value Proposition
- Discoverability, Distribution & Usage
At its bottom, the canvas gathers auxiliary information that is not at the core of a data product, but nevertheless plays a role in the product’s design and development.
The Contents of the DPC
Sometimes information about a data product cannot be clearly assigned to a field of the DPC. However, since the canvas is a conversation starter, what matters is not that a topic ends up in the right field, but that it is captured and taken into account at all.
From left to right and top to bottom, the DPC’s fields capture the following information.
Data Providers
Organizational and technical data sources are at the beginning of the value chain. Depending on the degree of abstraction, departments or specific systems can be named here. The information in the Data Providers field addresses aspects such as:
- What content does the data product draw from which source systems?
- Is it internal company data or external data?
- Which departments are data suppliers and who is responsible for defining this data?
Data Processing
As soon as data is available, actions are carried out with it. In the simplest case, data is only aggregated, but further preparation and processing steps are usually required as well. The documentation of data processing within the DPC can remain abstract in line with anticipated implementation phases, or become specific by naming tools. The information in the Data Processing field addresses aspects such as:
- What steps are carried out with the data before it is made available?
- What steps are taken for data cleansing?
- How are data quality and redundancy addressed?
- Are summarization steps carried out via ETL or machine learning?
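What such a step looks like in practice depends entirely on the product. As a minimal, purely illustrative sketch with assumed column names and rules, a documented cleansing and aggregation step might translate into the following:

```python
import pandas as pd

# Hypothetical raw source extract with duplicates and missing values.
raw = pd.DataFrame({
    "order_id": [101, 101, 102, 103],
    "customer": ["A", "A", "B", None],
    "turnover": [250.0, 250.0, 99.0, 40.0],
    "month":    ["2024-01", "2024-01", "2024-01", "2024-02"],
})

# Data cleansing: remove redundant records and rows without a customer.
cleaned = raw.drop_duplicates(subset="order_id").dropna(subset=["customer"])

# Aggregation: summarize turnover per customer and month before publishing.
summary = cleaned.groupby(["customer", "month"], as_index=False)["turnover"].sum()
print(summary)
```

At canvas level it is enough to note that deduplication and monthly aggregation happen at all; the concrete tooling can be decided later.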
Frequency and Storage
In order to make data available within the product, it must be stored. Whether and when data is stored long-term or only calculated and distributed ad hoc is the core content of the DPC field Frequency and Storage. There is also the question of how often the (summarized) data changes and is made available to consumers, e.g., monthly, daily, hourly, or in real time.
Value Proposition(s)
At the heart of the DPC is the value proposition, i.e., the actual added value of a data product. The corresponding field thus addresses the central question: What value does (the use of) the data provide for a department or the entire company? The focus here is on the impact or outcomes, not the small-scale outputs associated with a data product.
Distribution and Access
The provision of data and appropriate regulation of access is essential for successful data products. Whether the data is sent weekly by email or made available at any time via APIs, it must be clear to everyone involved who actually controls access to the data product. Can individual data records be released granularly, or does access to a data product mean that all available information can also be viewed? Technical and organizational information thus comes together in the Distribution and Access field.
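Whether access control is implemented via email distribution lists or an API gateway is secondary at the canvas level; what matters is that the decision is made explicit. As a purely illustrative sketch (consumer names and rules are assumptions), record-level versus full-product access could look like this:

```python
# Hypothetical access rules for a data product: some consumers may see the
# whole product, others only the records of their own region.
ACCESS_RULES = {
    "controlling": {"scope": "full"},
    "sales_north": {"scope": "region", "region": "north"},
}

def visible_records(consumer: str, records: list[dict]) -> list[dict]:
    """Return only the records the given consumer is allowed to see."""
    rule = ACCESS_RULES.get(consumer)
    if rule is None:
        return []  # unknown consumers get nothing
    if rule["scope"] == "full":
        return records
    return [r for r in records if r.get("region") == rule["region"]]

records = [
    {"region": "north", "turnover": 120_000},
    {"region": "south", "turnover": 90_000},
]
print(visible_records("sales_north", records))  # only the northern record
```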
Discoverability and Semantics
Related to the topic of Distribution and Access is the aspect of discoverability. How do departments and users know that a data product exists? Is there a data catalog, or are users specifically invited? There is also the question of how existing data is to be interpreted and what its semantics are: what is the definition of “turnover” or “customer”, for example? Who is responsible for documenting the semantics, which are already relevant when accessing data sources?
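Where this documentation lives is up to the organization (data catalog, wiki, or plain files). As an illustrative sketch with assumed names, a minimal catalog entry could combine discoverability with semantic definitions:

```python
# Hypothetical catalog entry for a data product, combining discoverability
# (where to find it, whom to ask) with semantics (what the terms mean).
catalog_entry = {
    "product": "monthly_turnover_report",
    "owner": "Controlling",
    "access_via": "company data catalog",
    "glossary": {
        "turnover": {
            "definition": "Net revenue per customer and calendar month, excluding VAT.",
            "responsible": "Finance",
        },
        "customer": {
            "definition": "Contract partner with at least one billed order.",
            "responsible": "Sales",
        },
    },
}

print(catalog_entry["glossary"]["turnover"]["definition"])
```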
Target Users
In the context of products, it is always important to consider for whom they are being implemented. The target users are those for whom, conceptually, value is to be created. The fact that other people inside or outside a company may also be able to use a data product is not the focus of the DPC’s Target Users field; its main purpose is to define for whom the product is being optimized. This usually results in use cases, which are captured in the following field.
Use Cases
Once Target Users have been defined, the DPC’s Use Cases field documents what is to be done with the data. Will existing use cases be better supported, or will new use cases be made possible in the first place? Is the data product under consideration just an intermediate product within a longer value chain? A clear understanding of which actions take place on the basis of the captured data product helps both producers and consumers to manage expectations.
Cost Structure
At the DPC’s bottom, the Cost Structure field addresses the biggest cost drivers for a data product. Note that the field is not about detailed cost planning, but about gaining clarity as to whether storage, processing, or even the acquisition of external data are among the largest cost blocks.
Success Measurement
As experience has shown that many data products are initially created with a focus on internal departments, a cost structure cannot always be compared with a revenue structure. For this reason, the DPC’s Success Measurement field contains criteria that are intended to make the success of a data product measurable. Success can, for example, be defined in the form of revenue or profit, but also as the number of API calls or a reduction in the number of accesses to an Excel list in SharePoint. It is essential to establish measurable parameters in order to actually be able to learn on the way to becoming a data-driven company.
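Teams that like to version their canvases alongside other project artifacts can also keep the fields described above in a lightweight, machine-readable form. The following sketch mirrors our reading of the DPC’s fields; it is an illustration, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class DataProductCanvas:
    """Lightweight, illustrative template mirroring the DPC's fields."""
    # Management summary
    core_idea: str
    owner: str
    # Sources, Storage & Processing
    data_providers: list[str] = field(default_factory=list)
    data_processing: list[str] = field(default_factory=list)
    frequency_and_storage: str = ""
    # Value Proposition
    value_propositions: list[str] = field(default_factory=list)
    # Discoverability, Distribution & Usage
    distribution_and_access: str = ""
    discoverability_and_semantics: str = ""
    target_users: list[str] = field(default_factory=list)
    use_cases: list[str] = field(default_factory=list)
    # Auxiliary information
    cost_structure: list[str] = field(default_factory=list)
    success_measurement: list[str] = field(default_factory=list)

# Example filled in with the waste-vs-turnover scenario from above.
canvas = DataProductCanvas(
    core_idea="Compare recorded waste volumes with reported turnover",
    owner="Tax inspection department",
    data_providers=["Waste disposal records", "Tax declarations"],
    value_propositions=["Send inspectors to companies with suspicious discrepancies"],
    success_measurement=["Number of confirmed tax evasion cases per inspection"],
)
print(canvas.core_idea)
```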
Conclusion
A proven approach to enable companies and employees to make sustainable progress towards becoming (parts of) data-driven organizations is the definition of data products, which also requires a change in mentality to think in terms of such products. In general, it doesn't matter whether data products are locally maintained Excel spreadsheets or highly complex machine learning pipelines behind API gateways. What is crucial is that all stakeholders start to understand the value of data for business decisions and translate it into data products. The presented Data Product Canvas (DPC) is just one aspect on the way to Applied Data Products and is supplemented by other tools such as stakeholder maps. The DPC’s structure keeps the focus on the added value of a data product as much as possible. Subsequently, appropriate rules and specifications can be derived from successful data product definitions. Too much abstraction and too many specifications in the early phases usually lead to lower learning effects and significantly slower progress on the path to becoming data-driven.