When Business Meets Technology: From Data Product to Data Architecture with Domain-Driven Design

6.8.2024 | 24 minutes of reading time

Abstract

The Data Product Canvas (DPC) is a tool for the lightweight and iterative definition of data products. It increases the efficiency of product definition by clearly presenting the key impact areas on data products. Additionally, the DPC motivates discussions about these areas through specific questions, in which stakeholders from management, specialist and technical departments can participate equally. For this reason, however, the DPC deliberately leaves one question unanswered: How can conceptual data products defined with the DPC be transformed or integrated into technical data architectures? This article presents a systematic approach for an essential step from a data product to a data architecture. Specifically, Domain-Driven Design (DDD) is used to translate DPC elements into domain models, which capture the underlying data structures of a data product as well as the required interfaces to other systems. These models can then be transferred into executable software systems leveraging established DDD techniques or they can be used for downstream analysis and documentation purposes.

Introduction

If you're on LinkedIn and the like, you don't have to look far: Success stories about the establishment of large language models, proprietary machine learning models, or data-driven decisions are abundant. This contrasts with a representative survey by Bitkom, which found that only six percent of the 600 German companies surveyed are convinced that they are fully exploiting the potential of their data (see this German article). Accordingly, only seven percent consider themselves to be pioneers in data-driven business models. In their article "Becoming a Data-Driven Company with Applied Data Products", Stephan Hochhaus and Florian Rademacher discuss why this is the case and, in response, present the Data Product Canvas (DPC): a tool that enables the effective, lightweight, and iterative design of data products without the risk of getting lost in technical details and without requiring a data strategy for the entire company to already be in place.

The DPC is a tool with which stakeholders from management, specialist and technical departments can collaboratively define data products in a short period of time, with a low entry barrier and low resource expenditure. However, it deliberately leaves one question unanswered: How can defined conceptual data products be transferred or integrated into technical data architectures while taking existing system landscapes into account? This article presents a systematic approach for an essential step from a data product to a data architecture, i.e., the translation of DPC elements into domain models. These models capture the underlying data structures of a data product as well as the required interfaces to other systems. They are created using Domain-Driven Design (DDD), a methodology that has many similarities with the DPC, such as (i) a focus on collaboration between business and technology; (ii) a low entry barrier if you limit yourself to selected modeling elements; and (iii) a visual representation of results. In contrast to the DPC, however, DDD anticipates the technical refinement of domain models. Consequently, they can serve as a blueprint for software design or, with a DPC as a starting point, the domain-driven design of data architectures. Another advantage of DDD is the availability of an established knowledge base on potential usages of domain models, including the derivation of interfaces, the design of event-driven software architectures, or the establishment of a basis for increasing systems’ maintainability.

This article begins with a brief summary of the required DDD knowledge, then illustrates the transformation of DPCs into DDD-based domain models using the example use case “assortment benchmarking” from the Retail domain, gives a summarizing overview of the transformation steps in a reusable form, and finally concludes with an outlook.

Background: Domain-Driven Design

DDD goes back to the book of the same name by Eric Evans, published in 2003. The core of the DDD methodology is the collaboration between domain experts and stakeholders with a technical background, such as software developers or architects. Through this collaboration, a deeper understanding of the concepts and processes of a business domain can be gained to enable their realization in software. This is achieved by capturing business concepts and processes in domain models, which initially enable a discussion at domain level, but can also be refined into more technology-aligned artifacts using specialized DDD patterns. The implementation of sufficiently refined domain models then leads to software which naturally embodies business concepts and processes in technical artifacts like source code, and thus meets business requirements better than alternative approaches to software design.

Evans divides DDD into two design phases, Strategic Design and Tactical Design, which are presented in the following sections to the extent necessary for the further understanding of this article.

Strategic Design

Following the divide-and-conquer approach, DDD’s Strategic Design phase decomposes an existing or envisioned software system into smaller, more manageable units. To this end, DDD defines the Bounded Context pattern: a bounded context makes the limits of the validity and applicability of a domain model and its elements explicit. Bounded contexts can represent, e.g., organizational units in a company, other systems, or software modules. In the Retail domain, typical examples of bounded contexts in E-Commerce systems include “Invoicing”, “Shopping Cart”, and “Customer Management”.

Restricting the validity and applicability of a domain model to a bounded context means that each bounded context always models its own domain language, which DDD refers to as Ubiquitous Language. In addition to more targeted communication between domain experts and technical stakeholders, this also has the advantage that the meaning of identically named domain concepts in different bounded contexts can be clearly defined by the Ubiquitous Languages of these contexts. For example, “Invoicing” understands the concept “Customer” as unchangeable information that includes the name and billing address of customers, whereas “Customer Management” focuses primarily on customer satisfaction rather than invoice dispatch.
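The “Customer” example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two classes `Invoicing` and `CustomerManagement` merely act as namespaces for the bounded contexts, and all field and method names (`billing_address`, `satisfaction_scores`, `record_feedback`, ...) are hypothetical.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field
from typing import Optional


class Invoicing:
    """Namespace standing in for the bounded context "Invoicing"."""

    @dataclass(frozen=True)
    class Customer:
        # In this context a customer is unchangeable billing information.
        name: str
        billing_address: str


class CustomerManagement:
    """Namespace standing in for the bounded context "Customer Management"."""

    @dataclass
    class Customer:
        # In this context a customer is a mutable record about satisfaction.
        name: str
        satisfaction_scores: list[int] = field(default_factory=list)

        def record_feedback(self, score: int) -> None:
            self.satisfaction_scores.append(score)

        def average_satisfaction(self) -> Optional[float]:
            if not self.satisfaction_scores:
                return None
            return sum(self.satisfaction_scores) / len(self.satisfaction_scores)
```

Both classes are called `Customer`, yet they are unrelated types: which one is meant is decided solely by the context in which the name is used.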

In order for complex software systems to fulfill their functionality and remain maintainable, different system parts must work together. In the context of E-Commerce systems, for example, an invoice typically has to be generated from the shopping cart after a customer has completed a purchase transaction. This process would therefore involve the interaction between the bounded contexts “Invoicing” and “Shopping Cart”. DDD provides specific modeling means to capture such collaboration relationships that model the exchange of domain concepts between the collaborating bounded contexts. “Invoicing” and “Shopping Cart” would, for example, collaborate on information about shopping cart content that is exchanged with invoice issuers, or the assignment of anonymous shopping carts to customers, who were not logged in as they filled their shopping carts. DDD recognizes seven types of bounded context relationships as Interaction patterns. However, only the following three are relevant for the further understanding of this article:

  • Customer/Supplier: This Interaction pattern considers one of the collaborating bounded contexts as the customer and the other as the supplier. The customer is dependent on some of the domain concepts or functionalities of the supplier. The pattern thus makes the customer context’s requirements for the supplier context explicit. However, whether and when the supplier fulfills these requirements is not predetermined by the pattern and can depend on many factors in the overall system, existing team and communication structures, or the requirements and their prioritization.
  • Anticorruption Layer (ACL): Analogous to the Customer/Supplier pattern, the ACL pattern also addresses situations in which one bounded context is dependent on the concepts of another. In contrast to the Customer/Supplier pattern, however, an ACL models an explicit translation layer for the dependent bounded context. This layer ensures that the dependent bounded context integrates the concepts of the independent bounded context into its own domain model. Compared to the Customer/Supplier pattern, the coupling of the bounded contexts in an ACL relationship is therefore looser in the sense that the independent bounded context ignores the requirements of the dependent bounded context. The latter must serve them itself as part of its implementation of the modeled ACL.
  • Open Host Service (OHS): With an OHS, a bounded context specifies a communication protocol for the use of some of its domain concepts and functionalities. Furthermore, it makes this protocol publicly available so that it can be used by other dependent bounded contexts to interact with the independent bounded context. OHSs are often realized through module interfaces. The OHS pattern implies a coupling between two bounded contexts that goes beyond the extent of the coupling of a customer/supplier relationship: dependent bounded contexts use the interface provided by the independent bounded context in the form offered. Requirements of individual dependent bounded contexts are typically ignored.
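Of the three patterns, the ACL translates most directly into code. The following Python sketch assumes that “Invoicing” depends on “Shopping Cart” and protects its own model with a translation layer. The classes `CartItem`, `InvoiceLine`, and `ShoppingCartAcl` and their fields are hypothetical and serve illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


# Model of the independent context "Shopping Cart" (assumed shape).
@dataclass(frozen=True)
class CartItem:
    sku: str
    qty: int
    unit_price_cents: int


# Model of the dependent context "Invoicing" (assumed shape).
@dataclass(frozen=True)
class InvoiceLine:
    article_number: str
    quantity: int
    total_cents: int


class ShoppingCartAcl:
    """Anticorruption Layer, implemented and maintained by "Invoicing"."""

    def to_invoice_lines(self, items: list[CartItem]) -> list[InvoiceLine]:
        # Translate foreign concepts into the Ubiquitous Language of Invoicing.
        return [
            InvoiceLine(
                article_number=item.sku,
                quantity=item.qty,
                total_cents=item.qty * item.unit_price_cents,
            )
            for item in items
        ]
```

If the shopping cart model changes, only `ShoppingCartAcl` has to be adapted; the rest of the invoicing model keeps working with `InvoiceLine`.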

The modeling of Interaction patterns between bounded contexts results in a coherent context map that provides information about conceptual and technical dependencies between different system modules. Since DDD always assigns responsibility for a bounded context to exactly one team, context maps also always show team relationships and communication structures.

Tactical Design

In its Tactical Design phase, DDD focuses on the design of the domain model of a bounded context, i.e., the modeling of the domain concepts and associated functionalities encapsulated by a specific context. Analogous to Strategic Design, DDD also specifies certain patterns for this design phase. These patterns help determine the semantics of model elements and, in some cases, already anticipate certain decisions for their technical implementation.

In the following, these Tactical Design patterns are relevant:

  • Entity: Entities are domain concepts whose instances must be clearly distinguishable from one another. In this sense, an entity defines a domain-related identity constraint. For example, the “Order” domain concept of an E-Commerce system distinguishes two orders on the basis of an “Order Number” property. Properties of domain concepts that distinguish two concept instances are usually immutable once initially assigned. All other domain concept properties determine the status of a concept instance and can also be adjusted up to a certain point after the instance has been created. For example, the order number of an order often remains the same, whereas information such as shipping status or delivery updates can be subject to several changes.
  • Aggregate: Aggregates are domain concepts that bundle other concepts and define validity criteria and a life cycle for these concepts. In this sense, an aggregate enables a set of domain concepts to be treated as a unit. In the context of an E-Commerce system, the “Customer” domain concept can be modeled as an aggregate that encapsulates orders and addresses. The aggregate could then specify that customers are only valid if they each have at least one delivery and one billing address.
  • Service: Services are used to model business logic that cannot be clearly assigned to an entity or aggregate. For example, the domain model of an E-Commerce system might specify a “Shipping Service” for calculating the shipping costs of an order, taking into account customer properties like distance to delivery location or participation in loyalty programs.
  • Repository: Repositories model interfaces for the data storage of entities or aggregates. These interfaces usually provide functions for persisting, reading, and deleting domain concept instances, and indicate that a database connection is required for the technical implementation of the respective domain model.
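The four patterns can be illustrated with a compact Python sketch based on the E-Commerce examples above. It is a minimal sketch, not a reference implementation: all class, field, and method names are assumptions, and the shipping tariff is an arbitrary placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass(eq=False)
class Order:
    """Entity: identified by its order number, all other fields are state."""
    order_number: str
    shipping_status: str = "created"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Order) and other.order_number == self.order_number

    def __hash__(self) -> int:
        return hash(self.order_number)


@dataclass
class Customer:
    """Aggregate: bundles orders and addresses and defines their validity."""
    customer_id: str
    delivery_addresses: list[str] = field(default_factory=list)
    billing_addresses: list[str] = field(default_factory=list)
    orders: list[Order] = field(default_factory=list)

    def is_valid(self) -> bool:
        # Valid only with at least one delivery and one billing address.
        return bool(self.delivery_addresses) and bool(self.billing_addresses)


class ShippingService:
    """Service: logic that belongs to no single entity or aggregate."""

    def shipping_cost_cents(self, distance_km: float, loyalty_member: bool) -> int:
        if loyalty_member:
            return 0  # placeholder rule: free shipping for loyalty members
        return 500 + int(distance_km * 10)  # placeholder tariff


class CustomerRepository(Protocol):
    """Repository: storage interface for the aggregate."""

    def save(self, customer: Customer) -> None: ...
    def find(self, customer_id: str) -> Optional[Customer]: ...
    def delete(self, customer_id: str) -> None: ...


class InMemoryCustomerRepository:
    """Stand-in for a database-backed implementation."""

    def __init__(self) -> None:
        self._customers: dict[str, Customer] = {}

    def save(self, customer: Customer) -> None:
        self._customers[customer.customer_id] = customer

    def find(self, customer_id: str) -> Optional[Customer]:
        return self._customers.get(customer_id)

    def delete(self, customer_id: str) -> None:
        self._customers.pop(customer_id, None)
```

Two `Order` instances with the same order number compare as equal even if their shipping status differs, which mirrors the identity constraint of the Entity pattern.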

From Data Product Canvas to Domain Model

This section explains the step-by-step transformation of a DPC into a DDD-based domain model. The starting point is DDD’s Strategic Design, in which bounded contexts are identified and their relationships classified to create a context map. In the next step, the context map is refined in line with DDD’s Tactical Design, focusing on the bounded context of the data product under consideration. This step eventually results in a domain model for the data product and subsequently allows the derivation of a software design for the product’s implementation.

The Example Data Product “Assortment Benchmarking”

To illustrate the transformation of DPCs into DDD-based domain models, the article draws on an example from the Retail domain: The data product “Assortment Benchmarking” enables companies to determine their own market position in comparison to competitors based on the products offered, their categories, and other characteristics such as price or quality.

The following figure shows the DPC of the example data product.

Data Product Canvas for the example product Assortment Benchmarking

The example data product “Assortment Benchmarking” uses various Data Providers that (i) automatically extract data on competing retail products from competitors' online catalogs (web crawlers); or (ii) gather insights from on-site visits to competitors' stores by mystery shoppers (on-site data collection). The data available in this way is processed using specialized Data Processing steps, for example to resolve conflicts in the categorization of products or to assign assortments manually. The Frequency at which input data is updated varies between daily and weekly cycles, depending on the data provider. Both a dedicated data platform and an off-the-shelf Lakehouse are used for storage to persist input and generated data. The majority of application scenarios for the Target Users addressed are however implemented via the data platform. The DPC’s Distribution and Access field captures the information that Analysts can query competitor product portfolios in the form of standard reports via a data platform self-service. In addition, the data product offers an interface for assortment assignment by Category Managers, while Purchasers can access company-specific reports for strategic decisions.

Identification of Bounded Contexts

The transformation of the example DPC “Assortment Benchmarking” into a DDD-based domain model begins with the identification of bounded contexts from the DPC using Strategic Design. Bounded contexts that are not directly part of the data product but describe external systems are also explicitly taken into account here. The following figure shows the bounded contexts identified from the example DPC. Rounded rectangles marked with a stereotype represent bounded contexts, and gray rectangles with dashed lines indicate the DPC field from which the bounded contexts shown in the rectangle originate.

Bounded contexts for assortment benchmarking, division into data providers, data product, storage and target users

Each data provider of the example DPC is mapped to its own bounded context, as each provider represents a system, e.g., a “Web Crawler”, or an organizational unit, e.g., a service company for “On-Site Data Collection”. In this sense, data providers are components that are external to the data product but on which the data product depends. This also includes the company's “Own Product Data Catalog”, as it is assumed for the example DPC that this catalog is developed, maintained, and provided by a dedicated team. Depending on the working environment, however, it could also be part of the bounded context of the data product if, for example, the maintenance of product data is also the responsibility of the assortment benchmarking team. This circumstance demonstrates a strength of DDD-based domain models: thanks to their comparatively high level of abstraction, they make knowledge about conceived or already existing organizational or technical contexts explicit and thus enable targeted discussions about the meaningfulness of these contexts and their boundaries.

The actual data product “Assortment Benchmarking” is represented by the bounded context of the same name. This one-to-one mapping of data products to bounded contexts initially makes sense for every DPC, as the recorded data product represents a new or existing system that interacts with other components. However, if this mapping appears to be debatable for a specific DPC, this could indicate that the data product is coupled too tightly with other components. Such a finding usually emerges from the modeling of existing system landscapes with DDD and again underlines the strength of DDD as an analysis tool. On the other hand, the compatibility of DPC and DDD is also evident here, as the DPC can both be used for the realization of new data products as well as for the collection and analysis of existing ones.

Similar to the data product, the “Own Data Platform with Self-Service” from the Storage field of the example DPC is also encapsulated in its own bounded context. That is because the platform functions as an independent system that grants the target users application-specific access to the data product. However, data products are often directly integrated into data platforms, as only the platform is understood as a product, not the data. Due to its inherent focus on data products, the DPC can reveal such strong couplings and, by transferring them to DDD-based domain models, also contribute to their resolution if a corresponding bounded context is split in such a way that platform and data concepts are modeled in new, specialized bounded contexts. The separation of these concepts is a prerequisite for understanding data as products, as data products are generally subject to different development approaches, life cycles, and quality requirements than software platforms.

Each type of target user in the example DPC is mapped to a bounded context, as the users of assortment benchmarking are assigned to different departments or teams within the company. The heterogeneity of their requirements therefore already becomes clear in the domain model, for example when the relationships between the bounded contexts of the target users and that of the data product are each classified differently.
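The identification step can be sketched as a simple mapping from DPC fields to bounded contexts. The following Python sketch rests on assumptions: the dictionary layout of `EXAMPLE_DPC` and the function `identify_bounded_contexts` are hypothetical, and the Storage field lists only storage that is modeled as external to the data product. The context names are those used in this article's example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedContext:
    name: str
    origin: str  # DPC field from which the context was derived


# Hypothetical, simplified encoding of the example DPC.
EXAMPLE_DPC = {
    "name": "Assortment Benchmarking",
    "Data Providers": [
        "Web Crawler per Competitor",
        "Third Party Data Provider",
        "On-Site Data Collection",
        "Advertising Brochures",
        "Own Product Data Catalog",
    ],
    # Only storage that is not integrated into the data product itself.
    "Storage": ["Own Data Platform with Self-Service"],
    "Target Users": ["Analytics", "Category Management", "Purchasing"],
}


def identify_bounded_contexts(dpc: dict) -> list[BoundedContext]:
    # One bounded context for the data product itself ...
    contexts = [BoundedContext(dpc["name"], "Data Product")]
    # ... and one per data provider, external storage, and target user type.
    for dpc_field in ("Data Providers", "Storage", "Target Users"):
        for entry in dpc.get(dpc_field, []):
            contexts.append(BoundedContext(entry, dpc_field))
    return contexts
```

Whether an entry such as the product data catalog belongs in its own context or inside the data product remains a modeling decision that the function cannot take; it only makes the outcome of that decision explicit.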

Modeling and Classification of Context Relationships

This step models and classifies relationships between the bounded contexts of the example DPC. It leads to the following context map, in which gray rectangles with dashed lines indicate the semantics of two or more relationships between the same bounded contexts.

Relationships between Bounded Contexts, Data Sources are connected by Data Product via OHS and ACL, Target User via OHS

The data provider bounded contexts “Web Crawler per Competitor” and “Third Party Data Provider” are connected to the data product bounded context “Assortment Benchmarking” via an ACL. This comparatively weak relationship indicates that the data product itself is responsible for integrating the data supplied by the two external systems into its own domain model. In this sense, the two external systems do not take the requirements of the data product into consideration. However, this relationship means increased responsibility and additional effort for the data product team, as the ACL for the transformation of the external data must be designed, implemented, and adapted if the domain models and therefore the data structures of the external systems change. Thanks to the context map, this circumstance can be recognized at an early stage and planned for accordingly.

“Assortment Benchmarking” offers an OHS for the integration of the data provider bounded contexts “On-Site Data Collection” and “Advertising Brochures”. OHSs are often implemented as synchronous interfaces on an HTTP basis. An OHS relationship models a strong coupling of two bounded contexts: on the one hand, the bounded context offering the OHS (here “Assortment Benchmarking”) must implement and maintain the interface; on the other hand, bounded contexts using the OHS (here “On-Site Data Collection” and “Advertising Brochures”) must continuously consider the requirements of the interface in terms of syntax and semantics in order to ensure correct data exchange. The teams of the bounded contexts connected in this way must communicate actively in order to coordinate requirements and resulting implementation decisions concerning the OHS. In the context of the example DPC, this type of relationship is necessary because the mystery shoppers transmit their findings from the on-site data collection to the data product via a dedicated interface so that they are immediately available for analysis. In addition, the OHS for the bounded context “Advertising Brochures” enables employees of the data product’s company to upload advertising materials from competing companies for downstream analyses.
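What it means for OHS users to respect the syntax and semantics of the interface can be illustrated with a small Python sketch. It deliberately leaves out HTTP and models only the published protocol. The payload fields `store_id`, `product_name`, and `price_cents` as well as the classes `OnSiteFinding`, `ProtocolViolation`, and `OnSiteDataCollectionOhs` are assumptions made for illustration and are not part of the example DPC.

```python
from __future__ import annotations

from dataclasses import dataclass

# Published protocol of the OHS: required fields and their types (assumed).
REQUIRED_FIELDS = {"store_id": str, "product_name": str, "price_cents": int}


class ProtocolViolation(ValueError):
    """Raised when a payload does not conform to the published protocol."""


@dataclass(frozen=True)
class OnSiteFinding:
    store_id: str
    product_name: str
    price_cents: int


class OnSiteDataCollectionOhs:
    """Interface offered by the data product to mystery shoppers' tooling."""

    def __init__(self) -> None:
        self.received: list[OnSiteFinding] = []

    def submit(self, payload: dict) -> OnSiteFinding:
        for name, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(payload.get(name), expected_type):
                raise ProtocolViolation(
                    f"field {name!r} must be of type {expected_type.__name__}"
                )
        # Fields outside the protocol are ignored rather than interpreted.
        finding = OnSiteFinding(**{name: payload[name] for name in REQUIRED_FIELDS})
        self.received.append(finding)
        return finding
```

The sketch shows both sides of the coupling: the data product team has to maintain `REQUIRED_FIELDS`, and every caller has to follow it, because non-conforming payloads are rejected.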

The data product also integrates with the company's own product data catalog, whereby, in contrast to the previous paragraph, the OHS is provided by the data provider bounded context “Own Product Data Catalog” and not the data product bounded context “Assortment Benchmarking”. The background to this modeling is the fact that the product data catalog is implemented by a software system that was purchased by the data product’s company. While the catalog is maintained by the company's own team, the development and operation of the underlying software system is the responsibility of an external service provider. This service provider offers an interface for querying cataloged product data so that the data product can process it automatically.

The data product uses the company's own data platform with self-service for storage. The context map depicts this dependency through the customer/supplier relationship (C/S relationship) between the contexts “Own Data Platform with Self-Service” and “Assortment Benchmarking”. This relationship indicates that the data product bounded context “Assortment Benchmarking” provides domain concepts and data that are required by the storage bounded context “Own Data Platform with Self-Service”, whereby the latter places certain requirements on the provision of the concepts and data. However, it does not yet make any statements about the concrete realization of the data exchange. For example, in the case of the example data product, continuous streaming of data would make sense in order to be able to carry out analyses based on the most up-to-date data possible. However, this implementation aspect must be discussed in more detail between the data product and platform team, as it places high technical demands on the scalability, processing speed, and fault tolerance of the data platform. These demands should be taken into account by the data product team as a supplier of the C/S relationship.

At this point, it becomes clear that DPCs and derived DDD-based domain models do not capture all the information required to implement a data product. Rather, both types of model enable quick documentation, including an overview of certain findings about the product and the relationships between its business significance and software design. Due to the high degree of abstraction, DPCs and domain models are also very well suited for discussions between different stakeholders w.r.t. product implementation. However, relevant additional information, e.g., on the organizational structure of companies, the existing system environment in which the data product must be embedded, or the exact meaning of domain concepts, should be recorded in separate artifacts such as organization charts, architecture documentation, or glossaries.

For the target user bounded contexts “Analytics”, “Category Management” and “Purchasing”, the data product bounded context “Assortment Benchmarking” provides two separate OHSs. The first supports the category management team in exporting metadata about the data provided by the data product. This metadata includes, for example, field names, types, and relationships, or the origin, timeliness, and query intervals of data. Furthermore, a second OHS provides the category management and purchasing teams with an interface for the manual assignment of company products to assortments, including the elimination of potential conflicts. All target user bounded contexts have a C/S relationship with the “Own Data Platform with Self-Service” context: Target Users have the option of querying data for their analyses via the platform. This includes, in particular, the interaction of employees with the data product itself.
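The context map described in this section can also be recorded as plain data, which makes simple analyses possible. The following Python sketch encodes the relationships named in the text. The `Relationship` structure, the helper functions, and the `purpose` labels are hypothetical. In addition, the direction of the C/S relationships between the platform and the target users (platform as supplier) is an assumption, as the text does not state it explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass

AB = "Assortment Benchmarking"
PLATFORM = "Own Data Platform with Self-Service"


@dataclass(frozen=True)
class Relationship:
    provider: str   # independent context: ACL source, OHS host, or supplier
    dependent: str  # context that depends on the provider
    pattern: str    # "ACL", "OHS", or "C/S"
    purpose: str = ""


CONTEXT_MAP = [
    Relationship("Web Crawler per Competitor", AB, "ACL"),
    Relationship("Third Party Data Provider", AB, "ACL"),
    Relationship(AB, "On-Site Data Collection", "OHS", "upload of findings"),
    Relationship(AB, "Advertising Brochures", "OHS", "upload of brochures"),
    Relationship("Own Product Data Catalog", AB, "OHS", "query of product data"),
    Relationship(AB, PLATFORM, "C/S", "storage"),
    Relationship(AB, "Category Management", "OHS", "metadata export"),
    Relationship(AB, "Category Management", "OHS", "assortment assignment"),
    Relationship(AB, "Purchasing", "OHS", "assortment assignment"),
    # Assumption: the platform acts as supplier towards the target users.
    Relationship(PLATFORM, "Analytics", "C/S", "data queries"),
    Relationship(PLATFORM, "Category Management", "C/S", "data queries"),
    Relationship(PLATFORM, "Purchasing", "C/S", "data queries"),
]


def acl_owners(context_map: list[Relationship]) -> set[str]:
    """Contexts that have to design and maintain an ACL themselves."""
    return {r.dependent for r in context_map if r.pattern == "ACL"}


def counterparts(context_map: list[Relationship], context: str) -> set[str]:
    """Contexts, and thus teams, that a given context is related to."""
    names = set()
    for r in context_map:
        if r.provider == context:
            names.add(r.dependent)
        elif r.dependent == context:
            names.add(r.provider)
    return names
```

For example, `counterparts(CONTEXT_MAP, AB)` lists every context whose team the data product team has to coordinate with, which reflects the observation that context maps also show communication structures.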

Domain Modeling of the Data Product

As part of DDD’s Tactical Design phase, the domain model of the data product can now be created on the basis of the context map. The model’s concrete design results, among other things, from the relationships of the data product bounded context “Assortment Benchmarking”. Furthermore, it also incorporates the domain concepts already known at the time, for example for data storage or structuring. The semantics of these concepts can be specified with the help of special Tactical Design patterns, whereby certain decisions for the later transfer of the domain model into an executable implementation are already anticipated to a certain extent. Alongside the context map, the domain model is the central artifact in the application of DDD to DPCs. It represents a lightweight documentation and communication medium between the business and technical perspectives on a data product.

The following figure shows the domain model of the data product bounded context after Tactical Design has been applied.

Data sources integrated per service, assortment data sets are aggregates of metadata and item data, services for target users

The domain model integrates a service for each data provider bounded context connected via an ACL or OHS to the data product bounded context “Assortment Benchmarking”. The converter services “HTML Assortment Data Set (ADS) Converter” and “Third Party ADS Converter” transfer HTML pages from competitors and market data from specialized service providers into the data product’s domain model. The “Shelf Photo Analysis” and “PDF Analysis” services implement analogous functions with which data uploaded via the interfaces for on-site data collection and advertising brochures can be prepared for usage through the data product. The “SAP Connector” service queries and prepares the company's own product data catalog.

The data product’s domain model is defined by the modeled DDD aggregates and entities, whereby a distinction is made between data structures for the harmonization of incoming data (product-internal focus) and data structures for the use of the product by other components (product-external focus). The aggregate “Assortment Data Set” represents the central domain concept for the product-internal focus. It encapsulates data structures that can be used to map assortments. The “Metadata” entity stores information that describes an assortment data set, e.g., when it was created or last updated, or whether it describes the company's own assortment or that of a competitor. This domain concept therefore plays a key role in benchmarking the company's own assortment against competing ones. The “Item Data” entity records the actual data of assortment products.

All the services described above must transfer data to the “Assortment Data Set” aggregate. Like the entities’ fields, this information is not yet included in the data product’s domain model. However, the model can be supplemented accordingly with DDD, for example by adding method signatures to services, which can be used to identify which domain concepts they process, or by extending entities with attributes that model entity fields. Nonetheless, for domain models whose creation involves stakeholders with different levels of knowledge about DDD, it may initially make sense to record the structures of domain concepts in a separate glossary in natural language and only integrate them into the created models as confidence in the application of DDD increases.

The repository “Assortment Data” models the storage required for the aggregate “Assortment Data Set” and thus indicates that the data product persists instances of the aggregate over a longer period of time.
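How the aggregate, its entities, a converter service, and the repository could fit together is shown in the following Python sketch. Since the domain model intentionally contains neither entity fields nor method signatures, everything beyond the pattern names is an assumption: the fields of `Metadata` and `ItemData`, the input format with the keys `title` and `price_eur`, and the in-memory repository are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Metadata:
    """Entity describing an assortment data set (assumed fields)."""
    created_on: date
    last_updated_on: date
    own_assortment: bool  # True: own assortment, False: a competitor's


@dataclass
class ItemData:
    """Entity holding the actual data of an assortment product (assumed fields)."""
    product_name: str
    price_cents: int


@dataclass
class AssortmentDataSet:
    """Aggregate: treats metadata and item data as one unit."""
    data_set_id: str
    metadata: Metadata
    items: list[ItemData] = field(default_factory=list)

    def add_item(self, item: ItemData, today: date) -> None:
        self.items.append(item)
        self.metadata.last_updated_on = today  # keep the aggregate consistent


class ThirdPartyAdsConverter:
    """Service realizing the ACL towards a third-party data provider."""

    def convert(self, data_set_id: str, records: list[dict], today: date) -> AssortmentDataSet:
        data_set = AssortmentDataSet(data_set_id, Metadata(today, today, own_assortment=False))
        for record in records:
            price_cents = round(record["price_eur"] * 100)
            data_set.add_item(ItemData(record["title"], price_cents), today)
        return data_set


class InMemoryAssortmentDataRepository:
    """Stand-in for the repository "Assortment Data"."""

    def __init__(self) -> None:
        self._data_sets: dict[str, AssortmentDataSet] = {}

    def save(self, data_set: AssortmentDataSet) -> None:
        self._data_sets[data_set.data_set_id] = data_set

    def find(self, data_set_id: str) -> Optional[AssortmentDataSet]:
        return self._data_sets.get(data_set_id)

    def delete(self, data_set_id: str) -> None:
        self._data_sets.pop(data_set_id, None)
```

The other services, such as “Shelf Photo Analysis” or the “SAP Connector”, would follow the same shape: each accepts its source-specific input and returns an `AssortmentDataSet`.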

The “Product Categorization” entity and the “Categorized Products” repository are used to enrich the data product with product categories, on the basis of which assortment benchmarking is ultimately carried out. The “Category Matcher” service combines products into categories, e.g., clothing, household appliances, or sweets, whereby the domain model does not provide any information on how the matching is realized. The use of a specially trained machine learning model would be conceivable here. In addition, the service could give the category management and purchasing teams the option of manually adding or subsequently adjusting matchings, for example with the help of the “Conflict Resolver” service, which identifies matching conflicts and offers suggestions for resolving them.
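The interplay of “Category Matcher” and “Conflict Resolver” can be sketched in Python as follows. Because the domain model leaves the matching technique open, the sketch uses a naive keyword lookup purely as a stand-in; it is not the method of the data product. The keyword table and the function names `match_categories` and `find_conflicts` are hypothetical.

```python
from __future__ import annotations

# Hypothetical keyword table; a trained model could take its place.
CATEGORY_KEYWORDS = {
    "clothing": {"shirt", "jacket"},
    "household appliances": {"toaster", "kettle"},
    "sweets": {"chocolate", "candy"},
}


def match_categories(product_name: str) -> set[str]:
    """Category Matcher stand-in: all categories whose keywords occur."""
    words = set(product_name.lower().split())
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    }


def find_conflicts(product_names: list[str]) -> dict[str, list[str]]:
    """Conflict Resolver stand-in: products matching several categories.

    The sorted candidate categories act as suggestions for manual resolution.
    """
    conflicts: dict[str, list[str]] = {}
    for name in product_names:
        categories = match_categories(name)
        if len(categories) > 1:
            conflicts[name] = sorted(categories)
    return conflicts
```

A product that matches exactly one category needs no manual work, whereas entries returned by `find_conflicts` would be presented to the category management or purchasing team for a decision.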

Finally, the category management team can use the “Metadata Exporter” service to retrieve metadata about the data product.

Overview of the Transformation Steps from Data Product Canvases to Domain Models

Based on the above example, the following list provides a condensed overview of the steps involved in converting DPCs into DDD-based domain models. The general formulation of the steps enables application to DPCs other than “Assortment Benchmarking”.

  • General sequence of steps:
    • Strategic Design: Identification of bounded contexts
    • Strategic Design: Creation of context map through modeling and classification of context relationships
    • Tactical Design: Creation of the data product’s domain model
  • Identification of bounded contexts:
    • One bounded context for the data product, with the possibility of later refinement in line with DDD's iterative approach
    • One bounded context for each external system
    • One bounded context for each data provider
    • One bounded context for each storage that should not or cannot be integrated into the data product like generic data platforms with self-service
    • One bounded context for each type of target user
  • Creation of context map:
    • Modeling of relationships based on the strength of the coupling between bounded contexts
    • ACL, if the data product has to transfer external data into its own structures
    • OHS, if persons or systems initiate the data exchange with the data product themselves
    • C/S, if the specific design of the data exchange is not yet clear but it is known whether the data product or the other system specifies the requirements for the exchange
  • Creation of data product’s domain model:
    • One service for each ACL or OHS of the data product
    • Differentiation between data structures for the harmonization of incoming data (product-internal focus) and data structures for the use of the product by other components (product-external focus)
    • Initial recording of additional information such as structures of domain concepts or their use by services in separate artifacts (e.g., glossaries) and subsequent integration into domain models if necessary
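The classification rules for the context map can be condensed into a small decision function. The following Python sketch is one possible reading of the rules above: the parameter names are hypothetical, and the order in which the conditions are checked is an assumption, since the list does not define a precedence.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Pattern(Enum):
    ACL = "Anticorruption Layer"
    OHS = "Open Host Service"
    CS = "Customer/Supplier"


def classify_relationship(
    *,
    translates_external_data: bool,
    counterpart_initiates_exchange: bool,
    requirements_owner_known: bool,
) -> Optional[Pattern]:
    # ACL: the data product has to transfer external data into its own structures.
    if translates_external_data:
        return Pattern.ACL
    # OHS: persons or systems initiate the exchange with the data product.
    if counterpart_initiates_exchange:
        return Pattern.OHS
    # C/S: exchange design still open, but the owner of the requirements is known.
    if requirements_owner_known:
        return Pattern.CS
    return None  # not enough information yet; revisit in a later iteration


def needs_service(pattern: Optional[Pattern]) -> bool:
    """One service in the domain model for each ACL or OHS of the data product."""
    return pattern in (Pattern.ACL, Pattern.OHS)
```

For instance, a web crawler whose output must be converted would be classified as `Pattern.ACL` and therefore lead to a converter service in the data product's domain model.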

Conclusion and Outlook

This article presented a systematic approach for the transfer of data product canvases (DPCs) into domain models as an essential preliminary stage of technical data architectures. Using Domain-Driven Design (DDD), and its two foundational phases of Strategic Design and Tactical Design with their associated patterns, DPC elements can be mapped to DDD elements. This results in insights that are essential for data product implementation, such as (i) product decomposition into specialized modules according to the divide-and-conquer principle; (ii) the dependency structure of these modules; (iii) teams’ communication relationships and responsibilities; and (iv) a cost-effective blueprint for a data product’s software design, which can be iteratively adapted using established DDD techniques.
