data management

satya - Wednesday, June 18, 2008 10:07:58 AM

what is MDM?

Master Data Management. A way to gather data into a central data hub for a) a single point of truth b) summarized data c) historical analysis

satya - Wednesday, June 18, 2008 10:09:18 AM

what is CDI?

Customer Data Integration. Includes a) uniquely identifying a customer b) arbitrary grouping of customers c) deciphering their relationships to other customers and groups

satya - Wednesday, June 18, 2008 10:15:49 AM

General issues with multiple operational data stores

No common primary keys across stores

Differences in attribute definitions: the same attribute is not defined the same way from store to store

Differences in parent-child attributes: relationships are not modeled the same way from store to store

satya - Wednesday, June 18, 2008 10:18:28 AM

ETL tools: near real time transformation

Although ETL tools typically run in batch mode, some can do near real time transformation. They keep the data hub in sync with the operational stores.

satya - Wednesday, June 18, 2008 10:19:32 AM

Is there a difference between a data hub and EDW?

satya - Wednesday, June 18, 2008 10:21:56 AM

book: MDM and CDI for a Global Enterprise by Berson/Dubov

satya - Wednesday, June 18, 2008 10:22:51 AM

what is a good way to use ODS and EDW effectively?

satya - Wednesday, June 18, 2008 10:24:51 AM

EII: Enterprise Information Integration

Provides a virtualized view of a customer without creating a persistent physical image of the aggregation, perhaps using SOA or ETL

satya - Wednesday, June 18, 2008 10:28:46 AM

Going from account management to customer centric

A customer may operate outside of an account or accounts. This forces identifiers that are tied to the customer, independent of the unique account numbers. This needs to be thought through for customer interactions.

What strategies would you use to expose numbers to customers? Will that be a customer number or account numbers?

satya - Wednesday, June 18, 2008 10:40:44 AM

Why is "Matching and Linking" so central to MDM or CDI?

Find a primary key based on partial or full attributes

Discover who else is related to a given customer, similar to Google picking up relevant ads for a given email or content

Being able to generate a unique key based on attributes
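
A minimal sketch of generating a key from attributes, assuming a customer is described by name, birth date, and postal code. Those fields and all function names are hypothetical, not from the book. It only matches records that are identical after normalization; real matching engines also do fuzzy or probabilistic matching, which this does not attempt.

```python
import hashlib
import re


def normalize(value: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def match_key(name: str, birth_date: str, postal_code: str) -> str:
    """Derive a deterministic key from normalized attributes (assumed field choice)."""
    parts = [normalize(name), normalize(birth_date), normalize(postal_code)]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


def link_records(records):
    """Group source record ids that share the same match key."""
    groups = {}
    for rec in records:
        key = match_key(rec["name"], rec["birth_date"], rec["postal_code"])
        groups.setdefault(key, []).append(rec["source_id"])
    return groups
```

Records from two operational stores that differ only in case, spacing, or punctuation land in the same group, which is the "linking" half of the problem.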

satya - Wednesday, June 18, 2008 10:43:09 AM


could be the full transactional data hub and a completely self contained master of the information it manages...

why not stick to one database then???

satya - Wednesday, June 18, 2008 10:48:53 AM

Why a key generation service?

why not use database generated keys?

satya - Wednesday, June 18, 2008 10:59:51 AM

The nature of the data hub

It may have partial data for a client, which means the rest must come from somewhere else.

It can be updated by clients, not just read. It may have to propagate those updates back to the sources the data was originally loaded from (hub-to-source integration).

Data may also be updated in an ODS, requiring a sync to the data hub.

satya - Wednesday, June 18, 2008 11:01:47 AM

what is subject area in the context of a metadata repository?

satya - Wednesday, June 18, 2008 11:05:14 AM

what is a record locator service?

This is a metadata table in which every record in the MDM hub is linked to its dependent ODS records through foreign keys. Transactional safety is important.
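
A rough sketch of such a locator table, using an in-memory SQLite database. The table and column names (`master_record`, `record_locator`, `ods_name`, `ods_key`) are made up for illustration. The master row and its locator rows are written in one transaction, so a failed link does not leave an orphaned master record behind.

```python
import sqlite3


def create_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE master_record (master_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE record_locator (
               master_id INTEGER NOT NULL REFERENCES master_record(master_id),
               ods_name  TEXT NOT NULL,
               ods_key   TEXT NOT NULL,
               PRIMARY KEY (ods_name, ods_key))"""
    )


def register(conn, name, ods_refs):
    """Insert a master record and its locator rows in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute("INSERT INTO master_record (name) VALUES (?)", (name,))
        master_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO record_locator VALUES (?, ?, ?)",
            [(master_id, ods, key) for ods, key in ods_refs],
        )
    return master_id


def locate(conn, master_id):
    """Return the (ods_name, ods_key) pairs linked to a master record."""
    rows = conn.execute(
        "SELECT ods_name, ods_key FROM record_locator "
        "WHERE master_id = ? ORDER BY ods_name",
        (master_id,),
    )
    return rows.fetchall()
```

The primary key on (`ods_name`, `ods_key`) is a design assumption: it says one ODS record can belong to only one master record.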

satya - Wednesday, June 18, 2008 1:27:56 PM

Registry Style Hub: No data ownership

The hub is not the source or owner of any entities or attributes. It just holds references to other ODSes.

satya - Wednesday, June 18, 2008 1:28:40 PM

Reconciliation Style Hub: Partial Ownership

Data hub owns part of the data and changes to that data should be synchronized.

satya - Wednesday, June 18, 2008 1:30:17 PM

The transaction hub: full ownership

It owns all data attributes, becoming the true master in that space, and propagates data up and down.

satya - Wednesday, June 18, 2008 1:30:38 PM

initial loads and delta loads are common strategies
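
A small sketch of the delta idea, assuming each load is a snapshot keyed by record id (a simplification; real ETL tools often detect changes from logs or timestamps instead). The function names are hypothetical.

```python
def compute_delta(previous, current):
    """Compare two snapshots keyed by record id; return inserts, updates, deletes."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes


def apply_delta(hub, inserts, updates, deletes):
    """Bring the hub copy up to date using only the changed records."""
    hub.update(inserts)
    hub.update(updates)
    for key in deletes:
        hub.pop(key, None)
    return hub
```

The initial load is then just a full copy of the first snapshot, and every later run applies only the delta.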

satya - Wednesday, June 18, 2008 1:33:19 PM

unidirectional syncing may be preferable..

You may want to implement unidirectional synching as opposed to bidirectional.

satya - Wednesday, June 18, 2008 1:33:37 PM

Compensating transactions may be necessary

satya - Wednesday, June 18, 2008 1:37:49 PM

peer-to-peer data sharing...

Master/slave relationships may be better for defining ownership of attributes or entities. Without that, bidirectional synching could get hairy.

Single ownership on a single data attribute is preferable.
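
One way to picture single ownership, as a sketch only: a map from attribute to its one owning system, with writes rejected from anyone else and sync flowing one way from the owner. The attribute and system names in `OWNERS` are invented for the example.

```python
# Hypothetical ownership map: each attribute has exactly one owning system.
OWNERS = {
    "mailing_address": "crm",
    "credit_limit": "billing",
    "household_id": "hub",
}


def apply_update(store, system, attribute, value):
    """Accept a write only from the attribute's single owner."""
    owner = OWNERS.get(attribute)
    if owner != system:
        raise PermissionError(f"{system} does not own {attribute} (owner: {owner})")
    store[attribute] = value


def propagate(source_store, system, targets):
    """Push only the attributes owned by `system` to each target store (one direction)."""
    owned = {a: v for a, v in source_store.items() if OWNERS.get(a) == system}
    for target in targets:
        target.update(owned)
    return owned
```

Because an attribute is only ever pushed from its owner, two systems never overwrite each other's value for the same attribute.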

satya - Wednesday, June 18, 2008 1:40:23 PM

How do transactional, summary, and historic elements work together in MDM?

satya - Wednesday, June 18, 2008 1:43:16 PM

Multiple owners...

In some extreme cases a data attribute has many masters. The masters of an attribute may then be looked up through an attribute location service.
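
A bare-bones sketch of what an attribute location lookup could look like. The class and method names are hypothetical; the notes do not describe an actual interface.

```python
from collections import defaultdict


class AttributeLocator:
    """Hypothetical registry of which systems master which attribute."""

    def __init__(self):
        self._masters = defaultdict(list)

    def register(self, attribute, system):
        """Record that `system` masters `attribute` (ignores repeats)."""
        if system not in self._masters[attribute]:
            self._masters[attribute].append(system)

    def masters_of(self, attribute):
        """Return every system that masters the attribute, in registration order."""
        return list(self._masters.get(attribute, []))

    def has_multiple_masters(self, attribute):
        return len(self.masters_of(attribute)) > 1
```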

satya - Wednesday, June 18, 2008 1:47:19 PM

Metadata: recognize and address the challenge of semantic integration

satya - Wednesday, June 18, 2008 2:48:32 PM

why CDI?

Business processes need to be defined at a level of detail sufficient to define the logical data model of the CDI solution.

satya - Wednesday, June 18, 2008 2:50:20 PM

CDI and 360 degrees: a common goal


Deliver an authoritative system of record for customer data that includes a complete, 360-degree view of customer data including the totality of the relationships the customer has with the organization.

satya - Thursday, June 19, 2008 7:58:26 AM

Should the data hub and the EDW be the same effort?

The authors of the above book seem to think otherwise. They state:

If the data warehouse is not available yet, we do not recommend mixing the MDM-CDI data hub project and a data warehousing effort, even though interdependencies between the two efforts should be well understood

They go on to say that CDI data hub will feed the EDW when one is in place.

satya - Thursday, June 19, 2008 2:20:28 PM

books: Look for common data models

The Data Model Resource Book

Vol 1 A library of Universal Data Models for All Enterprises

Vol 2 A library of data models for specific industries

by Len Silverston

satya - Thursday, June 19, 2008 2:20:42 PM

Len Silverston data models

satya - Thursday, June 19, 2008 2:22:19 PM

Evolving the CDI data model over multiple releases needs consideration

satya - Wednesday, July 02, 2008 9:15:10 AM

Industry data model standards


satya - Wednesday, July 02, 2008 9:15:37 AM

Contrasting MDM and Warehousing again

satya - Wednesday, July 02, 2008 9:16:58 AM


a) Data is cleansed during extract, transform, and load b) Sources data to build a well-defined data subject area or domain to serve a particular set of applications

satya - Wednesday, July 02, 2008 9:17:38 AM


a) sales data warehouse b) financial data warehouse c) product data warehouse

satya - Wednesday, July 02, 2008 9:17:58 AM

What do you do with changing data in a data warehouse, then?

satya - Wednesday, July 02, 2008 9:18:54 AM

CDI however...

solving data quality issues in an integrated fashion across the enterprise...

satya - Wednesday, July 02, 2008 9:19:04 AM

what on earth does that mean??

satya - Wednesday, July 02, 2008 9:20:04 AM


In CDI, data quality is addressed not just during the load but also in the process of matching and linking, identification and aggregation, and data synching and reconciliation.

satya - Wednesday, July 02, 2008 9:21:45 AM

For instance, when

the data hub is "referential", data quality is focused on maintaining references as opposed to the actual content.

satya - Wednesday, July 02, 2008 9:22:20 AM

A data warehouse cannot be referential like a data hub could be. Is that a true statement?

satya - Wednesday, July 02, 2008 9:35:58 AM

A data warehouse is unidirectional whereas CDI is bidirectional. What do they mean by that?

satya - Wednesday, July 02, 2008 9:39:27 AM

Who is Larry English and what are information quality principles

satya - Wednesday, July 02, 2008 10:04:14 AM

How often does a data warehouse get updated?

Once a day is the norm

satya - Wednesday, July 02, 2008 10:05:01 AM

Instead they want customer information available in real time..

satya - Wednesday, July 02, 2008 10:07:37 AM

is that the only difference?

Is that the only difference? What if I make the data warehouse real time? Can I? Why not?

satya - Wednesday, July 02, 2008 10:53:27 AM

DataExtend another product

satya - Wednesday, July 02, 2008 10:54:51 AM

They seem to suggest a referential hub for legacy corporations

satya - Wednesday, July 02, 2008 10:55:23 AM

however for new startups ....

They do not come out and recommend a "transactional hub" outright. Not sure why.

satya - Wednesday, July 02, 2008 10:59:12 AM

Roles of a transaction manager

a) record b) exception processing c) compensating transactions d) composite and complex transactions
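
To make the compensating transaction role concrete, here is a minimal sketch, assuming each step of a composite transaction comes with its own undo action. The `run_composite` name and the (action, compensation) pairing are my assumptions, not something the book spells out.

```python
def run_composite(steps):
    """Run (action, compensation) pairs; on failure undo completed steps in reverse.

    Returns True if every action succeeded, False if the composite was rolled back.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        # The failed step never completed, so only earlier steps are compensated.
        for compensate in reversed(done):
            compensate()
        return False
    return True
```

This is what stands in for a rollback when the hub and the operational stores cannot share one database transaction.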

satya - Wednesday, July 02, 2008 11:08:21 AM

There is a thought that Data Hub vendors are still maturing...

and that it is still a young technology

satya - Wednesday, July 02, 2008 11:13:59 AM

Customer recognition and identification capabilities...

have to be adaptable to the business model, rules, and semantics of a given industry.

satya - Wednesday, July 02, 2008 11:15:26 AM

Another worthy note...

MDM solutions are optimized for real-time operations and not for batch reporting. An important distinction.

satya - Wednesday, July 02, 2008 11:17:57 AM

A datamart is being recommended for reporting needs

satya - Wednesday, July 02, 2008 2:31:32 PM

CDI Data model

A proven, business-domain-specific data model should be at the top of the CDI data hub criteria.

Some solutions, however, let you realize any data model using meta models and then generate some base-level services.

satya - Wednesday, July 02, 2008 2:33:43 PM

Parts of a synchronization machine

Enterprise Message bus - canonical message transport
Transaction Manager - record, exception management
Identity resolver
Record locator
Attribute locator
Distributed Query Constructor

satya - Wednesday, July 02, 2008 2:35:38 PM

A query approach from a CDI hub...

1. Have transaction manager receive and persist

2. Identify the keys

3. Do a distributed query after locating systems and their access paths
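
The three steps above can be sketched with in-memory stand-ins. Everything here is hypothetical: the request log plays the transaction manager, `identity_index` plays the identity resolver, `locator` plays the record locator, and `systems` plays the operational stores.

```python
def query_customer(request, request_log, identity_index, locator, systems):
    """Walk through the three steps using plain dicts and lists as stand-ins."""
    # 1. Transaction manager receives and persists the request.
    request_log.append(request)

    # 2. Identify the master key from the attributes supplied.
    master_id = identity_index.get((request["name"], request["postal_code"]))
    if master_id is None:
        return {}

    # 3. Locate the systems holding the record, then query each one.
    merged = {}
    for system_name, local_key in locator.get(master_id, []):
        merged.update(systems[system_name].get(local_key, {}))
    return merged
```

In a real hub step 3 would be a distributed query over message transport rather than dictionary lookups.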

satya - Wednesday, July 02, 2008 2:36:50 PM

The suggested approach is highly message oriented ....

SOA doesn't show up much

satya - Wednesday, July 02, 2008 2:46:43 PM

Websphere Customer Center (IBM WCC)

a) Emerging leader in the CDI data hub space b) customer specific customizable data structures c) a few hundred basic web services d) real time services to change address, roles, relationships, grouping, alerts, matching, duplicate suspect processing e) composite transactions using web services f) pub-sub g) business object model h) interfaces i) batch framework j) originally developed for insurance industry k) currently used in financial and others

satya - Wednesday, July 02, 2008 2:50:09 PM


a) meta-data driven b) works with any data model c) batch and real time synching with metadata d) Hierarchy manager for relationships e) cleanse and match f) can use external tools such as Trillium g) MetaMatrix data services registry style h) originally for pharmaceutical industry

satya - Wednesday, July 02, 2008 2:53:48 PM


a) reference-hub b) meta-data driven federation c) real time d) organizational hierarchies e) web services f) flow-invocation g) federated data retrieval h) view records across lines of business i) auditor roles j) reporting database k) originally from medical records

satya - Wednesday, July 02, 2008 2:55:01 PM

Siebel Universal Application Network

satya - Wednesday, July 02, 2008 2:55:59 PM

Siebel Universal Customer Master

satya - Wednesday, July 02, 2008 2:56:45 PM

From Oracle

a) Customer data hub b) financial consolidation hub c) Product hub

satya - Wednesday, July 02, 2008 2:58:57 PM


a) analyzing customers for relationships b) Correlation Engine c) exception management d) web services

How does Purisma support bidirectional synching?

satya - Wednesday, July 02, 2008 3:36:15 PM

what is netweaver SAP?

satya - Wednesday, July 02, 2008 3:39:47 PM

SAS DataFlux

a) master customer reference database b) data quality ui and exception ui c) batch and real time sync d) rules in metadata

satya - Wednesday, July 02, 2008 3:40:38 PM


MultiVue

satya - Wednesday, July 02, 2008 3:43:42 PM

Object River

a) model driven b) enables any data model based on meta model definition c) generates all CRUD stuff d) soa e) generates basic web portal f) pub/sub on data changes

satya - Wednesday, July 02, 2008 3:47:25 PM

what is data profiling informatica

satya - Wednesday, July 02, 2008 3:47:37 PM

Similarity Systems

satya - Wednesday, July 02, 2008 3:50:41 PM

Acxiom an address database

satya - Wednesday, July 02, 2008 3:52:27 PM

Experian people database

satya - Wednesday, July 02, 2008 3:55:32 PM

Data Delta

satya - Wednesday, July 02, 2008 3:55:38 PM


Netrics

satya - Wednesday, July 02, 2008 3:55:47 PM


Exeros

satya - Wednesday, July 02, 2008 4:01:04 PM

CDI may continue to be vertical
