Practical Data: Taxonomy of Data (Some Definitions)

Continuing the Practical Data Management series: when starting an engagement with a new client, I often find the first priority is to listen to the terminology and language they use to describe their data content, data challenges, current ecosystem, and opportunities.

As with many things in life, people often use different terms to describe similar concepts. While I mostly find myself adapting to their language, I've listed below some terms and definitions that I tend to carry in my back pocket and refer to when discussing topics, in an attempt to maintain consistency. I don't claim these definitions are "right"; what I do believe is critical is that you write them down (and keep updating them). The list below is taken from our Data Policy and Definition work at M&A Operating System and is often included in the Appendix of our template Data Governance Policy for easy reference.

As always, I would love to hear feedback on terms you think I have missed or definitions you think I have got "wrong".

Data Modelling Definitions

Data Domain – A data domain groups data sets by the subject that each record represents. A data domain describes the data records themselves, as well as the data concepts and attributes that comprise the data model for that domain. Examples of data domains include Legal Entities, Securities, Transactions, People, Documents, etc.

Data Model – A data model provides comprehensive details of the data content that comprises a data domain, including the types of records, data concepts, data attributes, relationships, and other structural information. Data models can take various forms, depending on the purpose and characteristics of the data model, as well as the data storage systems in use.

Data Concept – A data concept describes a group of attributes that work together to describe a specific aspect of a data record. Examples of data concepts include Ratings, Identifiers, and Exchange Listings for Securities, and Alternate Names, Operating Locations, and Ratings for Legal Entities, among others.

Data Attribute – A data attribute is an individual data element or field that represents the informational value stored about a data record.
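
To make the hierarchy concrete, here is a minimal sketch of how a domain, its concepts, and their attributes might nest. The class names, the `get` helper, and the sample values (including the made-up ISIN and rating agency) are illustrative assumptions, not part of any particular data model.

```python
from dataclasses import dataclass, field


@dataclass
class DataAttribute:
    """A single field holding one informational value about a record."""
    name: str
    value: object


@dataclass
class DataConcept:
    """A group of attributes that together describe one aspect of a record."""
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class DataRecord:
    """One record within a domain, described by one or more concepts."""
    domain: str
    concepts: list = field(default_factory=list)

    def get(self, concept_name, attribute_name):
        """Return the value of an attribute inside a named concept, or None."""
        for concept in self.concepts:
            if concept.name == concept_name:
                for attribute in concept.attributes:
                    if attribute.name == attribute_name:
                        return attribute.value
        return None


# Hypothetical record in the "Securities" domain with two concepts.
security = DataRecord(
    domain="Securities",
    concepts=[
        DataConcept("Identifiers", [DataAttribute("isin", "XS0000000000")]),
        DataConcept(
            "Ratings",
            [DataAttribute("agency", "ExampleAgency"), DataAttribute("rating", "AA")],
        ),
    ],
)

print(security.get("Ratings", "rating"))
```

In practice the form of the model depends on the storage system in use; the same nesting could equally be expressed as relational tables or a document schema.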

Data Processes Definitions

Data Producer / Source – A Data Producer or Data Source is any system, process, group, or individual that generates data content of value, often by creating or maintaining records. Data Producers may be internal or external. Data Producers may consume data from one domain to aid the creation of data in another.

Data Consumer/User/Processor – A Data Consumer or Data User is any system, process, group, or individual that consumes data from one or more data domains to create value for the organization.

Data Consolidator – A Data Consolidator is any system, process, group, or individual that consumes data from multiple Data Sources to create a single, consolidated dataset, often within the same domain. Consolidation can happen vertically (i.e., consolidating different records from different sources) or horizontally (i.e., consolidating different attributes about the same records from different sources).
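
The difference between vertical consolidation (more records) and horizontal consolidation (more attributes per record) can be sketched as below. The function names, the "first value wins" rule, and the sample vendors are assumptions made for illustration; real consolidators usually apply richer matching and precedence logic.

```python
def consolidate_vertically(*sources):
    """Union records across sources: each source contributes different records."""
    combined = {}
    for source in sources:
        for record_id, record in source.items():
            # First source to supply a record wins; later duplicates are ignored.
            combined.setdefault(record_id, dict(record))
    return combined


def consolidate_horizontally(*sources):
    """Merge attributes across sources for records that share an identifier."""
    combined = {}
    for source in sources:
        for record_id, record in source.items():
            merged = combined.setdefault(record_id, {})
            for attribute, value in record.items():
                # Keep the first value seen for each attribute.
                merged.setdefault(attribute, value)
    return combined


# Hypothetical Legal Entity data from three sources, keyed by an internal id.
vendor_a = {"E1": {"name": "Acme Ltd"}, "E2": {"name": "Globex Inc"}}
vendor_b = {"E3": {"name": "Initech LLC"}}
vendor_c = {"E1": {"country": "GB"}, "E2": {"country": "US"}}

all_entities = consolidate_vertically(vendor_a, vendor_b)      # three records
wider_entities = consolidate_horizontally(vendor_a, vendor_c)  # two records, more attributes
```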

Data Enricher – A Data Enricher is any system, process, group, or individual that enhances the data it consumes as part of a data pipeline, often adding or updating attributes on the same record, usually in the same domain. It then makes that data available again for reuse by further consumers. Data Custodian processes are examples of data enrichment functions.

Master Data Management (MDM) – Master Data Management is a collection of processes and activities that result in well-crafted, valuable master records of data that can be considered as a “trusted version of truth” for the firm and leveraged across multiple business processes to ensure quality, accuracy and consistency. Master Data Management often includes elements of data model design, data source selection, data transformation, data matching/concordance, data quality and data cleansing. Master Data Management usually also has a series of governance and oversight functions associated with it.

Data Functional System Definitions

Operational Data Stores (ODS), Product Processors (PP), and Business Applications – An Operational Data Store provides the core business value-added functionality required to perform and/or manage the firm’s business activities. At the same time, these platforms often consume data to support their functions and, as Data Producers, generate data related to primary business transactions.

Master Data System (MDS) – A Master Data System provides a comprehensive set of data management capabilities to consolidate data from multiple sources into a single, consistent, and clean dataset, ready for use by numerous business processes that each require consistent, cleansed, and validated data. These solutions often follow a consistent data pipeline approach: sourcing, normalizing, consolidating, and cleansing data, with embedded rules engines, workflows, and data validation capabilities.

Master Data Systems are often used to perform both horizontal (attribute) and vertical (source/record) data consolidation, as well as data enrichment, within a single platform.

Data Mastering solutions often include (manual) data operations capabilities to allow for custodian functions.
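
As a rough sketch of the pipeline shape described above (source, normalize, consolidate, validate), with failures routed to an exceptions queue for manual custodian review. The field names, the two validation rules, and the sample rows are all hypothetical; commercial mastering platforms implement these stages with far more sophistication.

```python
def normalize(raw):
    """Map a raw source row onto common field names and formats."""
    return {
        "id": str(raw["id"]).strip(),
        "name": raw.get("name", "").strip().title(),
        "country": raw.get("country", "").strip().upper(),
    }


def consolidate(rows):
    """Merge rows sharing an id; the first non-empty value per field wins."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row["id"], {})
        for key, value in row.items():
            if value and not target.get(key):
                target[key] = value
    return merged


# Illustrative validation rules: each returns True when the record passes.
RULES = {
    "name is present": lambda r: bool(r.get("name")),
    "country is a 2-letter code": lambda r: len(r.get("country", "")) == 2,
}


def validate(record):
    """Return the names of the rules that the record fails."""
    return [name for name, rule in RULES.items() if not rule(record)]


def run_pipeline(*sources):
    """Normalize and consolidate all sources, then split clean records from exceptions."""
    rows = [normalize(raw) for source in sources for raw in source]
    clean, exceptions = {}, {}
    for record_id, record in consolidate(rows).items():
        failures = validate(record)
        if failures:
            exceptions[record_id] = failures  # queue for manual custodian review
        else:
            clean[record_id] = record
    return clean, exceptions


source_a = [
    {"id": 1, "name": " acme ltd ", "country": "gb"},
    {"id": 2, "name": "globex inc"},
]
source_b = [{"id": 1, "name": "ACME LIMITED", "country": "GB"}]

clean, exceptions = run_pipeline(source_a, source_b)
```

Here record 1 passes both rules and lands in `clean`, while record 2 has no country and lands in `exceptions`, which is where the manual data operations capability comes in.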

Data Warehouse – A Data Warehouse system provides a way to centrally store large quantities of data from multiple providers, aiming to bring consistency to data content and structure, thereby improving data consumption. Data Warehouses are often optimized for fast and efficient consumption of data.

Data Warehouse solutions generally do not include (manual) data operations capabilities to allow for data custodian functions.

Data Lake – A Data Lake system enables the centralization of large quantities of data from multiple data providers in their native, original form. Data Lakes are frequently optimized for fast and efficient data storage.

Data Lake solutions generally do not include manual data operations capabilities to support data custodian functions.

Data Record Definitions

Source Record – We use the term "Source Record" to refer to an individual record of data obtained from an authoritative source of information for that record set.

Golden Master Records – We use the term Golden Master Record to describe an individual record of data that is recognized as the single version of truth for information used across a wide variety of business processes and use cases. A golden master record is often established by combining data from one or more source records into a well-defined data model for a specific domain and validated through a well-defined set of data quality rules.

Golden Copy Records – We use the term Golden Copy Record to describe a validated, identical copy of all or part of a golden master record that has been copied into an alternative data store for convenience.
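
The relationship between source records, a golden master record, and a golden copy can be illustrated with a small sketch. The source names, the precedence order, and the helper functions are assumptions for the example only; how precedence and validation are actually defined will vary by firm and domain.

```python
import copy

# Hypothetical source precedence: earlier sources are trusted more.
SOURCE_PRIORITY = ["registry", "vendor_a", "vendor_b"]


def build_golden_master(source_records):
    """Take each attribute from the most trusted source that supplies a value."""
    master = {}
    for source in SOURCE_PRIORITY:
        for attribute, value in source_records.get(source, {}).items():
            if value is not None:
                master.setdefault(attribute, value)
    return master


def make_golden_copy(master, attributes=None):
    """Copy the master in whole, or in part when specific attributes are named."""
    selected = master if attributes is None else {k: master[k] for k in attributes}
    return copy.deepcopy(selected)


def is_valid_copy(master, golden_copy):
    """A golden copy must match the master on every attribute it carries."""
    return all(k in master and master[k] == v for k, v in golden_copy.items())


source_records = {
    "vendor_a": {"name": "Acme Ltd", "country": "GB", "sector": "Industrials"},
    "registry": {"name": "ACME LIMITED", "country": None},
}

master = build_golden_master(source_records)
partial_copy = make_golden_copy(master, ["name", "country"])
```

A copy that drifts from the master (for example, if `country` were edited locally in the downstream store) would fail `is_valid_copy`, which is the point at which it stops being a golden copy.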

Data Content Styles Definitions

Master Data – We use the term Master Data to define critical data that is centralized and managed within the organization, and is often used by multiple business functions and use cases where data accuracy and consistency are vital.

Reference Data – We use the term "Reference Data" to describe a subset of master data that is often used for classification and identification purposes, where consistency is essential. Reference data can include business records (e.g., tradable securities) and definition records (e.g., country codes).

Market Data – We use the term "Market Data" to describe externally available industry data on a specific topic, often sourced from Market Data vendors such as Bloomberg and FactSet. Market Data vendors can provide reference data, master data, pricing data, and other relevant information.

Transactional/Business Data – We use the term "Transactional data" to represent point-in-time business events or transactions that occur throughout the natural business lifecycle.