data management
satya - Wednesday, June 18, 2008 10:07:58 AM
what is MDM?
Master Data Management. A way to gather data into a central data hub for a) a single point of truth b) summarized data c) historical analysis
satya - Wednesday, June 18, 2008 10:09:18 AM
what is CDI?
Customer Data Integration. Includes a) uniquely identifying a customer b) arbitrary grouping of customers c) deciphering their relationships to other customers and groups
satya - Wednesday, June 18, 2008 10:15:49 AM
General issues with multiple operational data stores
No primary keys shared across the stores
Differences in attribute definitions: the same attribute may not be defined the same way in each store
Differences in parent-child attributes: the same relationships may be modeled differently in each store
satya - Wednesday, June 18, 2008 10:18:28 AM
ETL tools: near real time transformation
Although ETL tools usually run in batch mode, some can do near-real-time transformation, which keeps the data hub in sync with the operational stores.
satya - Wednesday, June 18, 2008 10:19:32 AM
Is there a difference between a data hub and EDW?
satya - Wednesday, June 18, 2008 10:21:56 AM
book: MDM and CDI for a Global Enterprise by Berson/Dubov
satya - Wednesday, June 18, 2008 10:22:51 AM
what is a good way to use ODS and EDW effectively?
satya - Wednesday, June 18, 2008 10:24:51 AM
EII: Enterprise Information Integration
Provides a virtualized view of a customer without creating a persistent physical image of the aggregation, perhaps using SOA or ETL
satya - Wednesday, June 18, 2008 10:28:46 AM
Going from account management to customer centric
A customer may operate outside of an account, or across several accounts. This forces identifiers tied to the customer, independent of unique account numbers, and it needs to be thought through for customer interactions.
What strategies would you use to expose numbers to customers? Will that be a customer number or account numbers?
satya - Wednesday, June 18, 2008 10:40:44 AM
Why is "Matching and Linking" so central to MDM or CDI?
Find a primary key based on partial or full attributes
Discover who else is related to a given customer, similar to Google picking up relevant ads for a given email or content
Being able to generate a unique key based on attributes
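A minimal sketch of the three points above, assuming each source record carries a name, a date of birth, and a ZIP code. The weights, the 0.85 threshold, and the names `SourceRecord`, `match_score`, `master_key`, and `link` are hypothetical; Python's `difflib` similarity ratio stands in for the far richer matching a real CDI product would use.

```python
import hashlib
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class SourceRecord:
    system: str     # hypothetical source system name
    local_id: str   # key of the record inside that source system
    name: str
    dob: str        # ISO date, e.g. "1970-01-31"
    zip_code: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not block a match."""
    return " ".join(text.lower().split())


def match_score(a: SourceRecord, b: SourceRecord) -> float:
    """Weighted score: fuzzy name similarity plus exact DOB and ZIP. Weights are illustrative."""
    name_sim = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    return 0.5 * name_sim + 0.3 * float(a.dob == b.dob) + 0.2 * float(a.zip_code == b.zip_code)


def master_key(r: SourceRecord) -> str:
    """Deterministic key generated from normalized attributes: same attributes, same key."""
    basis = "|".join([normalize(r.name), r.dob, r.zip_code])
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]


def link(records, threshold=0.85):
    """Put each record into the first cluster whose head it matches, else start a new cluster."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if match_score(cluster[0], rec) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters
```

A hash-based key only stays stable while the attributes do: a corrected name or a new ZIP code yields a different key, which is one argument for a separate key generation service.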
satya - Wednesday, June 18, 2008 10:43:09 AM
Ultimately....
the hub could become the full transactional data hub and a completely self-contained master of the information it manages...
Why not stick to one database then?
satya - Wednesday, June 18, 2008 10:48:53 AM
Why a key generation service?
why not use database generated keys?
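One way to read the question: database-generated keys are unique only inside one database, so two ODSes can both hand out customer 42. Below is a sketch of a central key service, assuming a single in-process counter (a real service would persist its state and survive restarts). The class name, the `CUST-` prefix, and the method names are hypothetical.

```python
import itertools
import threading


class KeyGenerationService:
    """Hypothetical central service issuing enterprise-wide customer keys.

    One key space covers every operational store, and lookups are idempotent:
    asking again for the same (system, local_id) pair returns the key already issued.
    """

    def __init__(self, prefix: str = "CUST"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._issued: dict[tuple[str, str], str] = {}
        self._lock = threading.Lock()  # several loaders may request keys concurrently

    def key_for(self, system: str, local_id: str) -> str:
        """Return the enterprise key for a source record, issuing a new one if needed."""
        with self._lock:
            ref = (system, local_id)
            if ref not in self._issued:
                self._issued[ref] = f"{self._prefix}-{next(self._counter):08d}"
            return self._issued[ref]

    def alias(self, system: str, local_id: str, enterprise_key: str) -> None:
        """Attach another source record to an existing key, e.g. after matching."""
        with self._lock:
            self._issued[(system, local_id)] = enterprise_key
```

Here `key_for("crm", "42")` and `key_for("billing", "42")` get different keys even though both sources use local id 42, and `alias` lets a matched record share an existing key.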
satya - Wednesday, June 18, 2008 10:59:51 AM
The nature of the data hub
It may hold only partial data for a client, which means the rest must come from somewhere else.
It can be updated by clients, not just read. It may then have to propagate those changes back to the sources the data was originally loaded from (hub-to-source integration).
Data may be updated in an ODS, requiring a sync to the data hub.
satya - Wednesday, June 18, 2008 11:01:47 AM
what is subject area in the context of a metadata repository?
satya - Wednesday, June 18, 2008 11:05:14 AM
what is a record locator service?
This is a metadata table in which every record in the MDM hub is linked to its dependent ODS records through foreign keys. Transactional safety is important.
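A sketch of the record locator as a table, using Python's built-in `sqlite3` in memory in place of the hub database. Table and column names are hypothetical. The foreign key ties each locator row to a master row, and the `with conn:` block makes registration all-or-nothing, which is one way to get the transactional safety mentioned above.

```python
import sqlite3

# Hypothetical schema: one master row, many locator rows pointing at source records.
SCHEMA = """
CREATE TABLE master_customer (
    master_key TEXT PRIMARY KEY
);
CREATE TABLE record_locator (
    master_key    TEXT NOT NULL REFERENCES master_customer(master_key),
    source_system TEXT NOT NULL,
    source_key    TEXT NOT NULL,
    PRIMARY KEY (source_system, source_key)
);
"""


def open_hub() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


def register(conn, master_key, links):
    """Insert a master record and all of its source links atomically.

    `with conn:` commits on success and rolls back on any exception,
    so a master row is never left half-linked.
    """
    with conn:
        conn.execute("INSERT INTO master_customer VALUES (?)", (master_key,))
        conn.executemany(
            "INSERT INTO record_locator VALUES (?, ?, ?)",
            [(master_key, system, key) for system, key in links],
        )


def locate(conn, master_key):
    """Return (source_system, source_key) pairs for one master record."""
    rows = conn.execute(
        "SELECT source_system, source_key FROM record_locator "
        "WHERE master_key = ? ORDER BY source_system",
        (master_key,),
    )
    return rows.fetchall()
```

If any link fails (for example, a source record already claimed by another master), the new master row is rolled back together with its links.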
satya - Wednesday, June 18, 2008 1:27:56 PM
Registry Style Hub: No data ownership
The hub is not the source or owner of any entities or attributes. It just holds references to other ODSes.
satya - Wednesday, June 18, 2008 1:28:40 PM
Reconciliation Style Hub: Partial Ownership
Data hub owns part of the data and changes to that data should be synchronized.
satya - Wednesday, June 18, 2008 1:30:17 PM
The transaction hub: full ownership
It owns all data attributes, becomes the true master in that space, and propagates data up and down.
satya - Wednesday, June 18, 2008 1:30:38 PM
initial loads and delta loads are common strategies
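A sketch of the difference between the two, assuming each source can hand over a full snapshot keyed by its own record key. Diffing snapshots is only one way to find the delta (timestamps or change data capture are common alternatives), and the function names are hypothetical.

```python
from typing import Dict

Row = Dict[str, str]


def initial_load(snapshot: Dict[str, Row]) -> Dict[str, Row]:
    """Full load: copy every source row into an empty hub."""
    return {key: dict(row) for key, row in snapshot.items()}


def compute_delta(previous: Dict[str, Row], current: Dict[str, Row]):
    """Compare two snapshots keyed by source key; return (inserts, updates, deletes)."""
    inserts = {k: current[k] for k in current.keys() - previous.keys()}
    deletes = sorted(previous.keys() - current.keys())
    updates = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return inserts, updates, deletes


def apply_delta(hub: Dict[str, Row], inserts, updates, deletes) -> Dict[str, Row]:
    """Delta load: touch only the rows that changed since the previous snapshot."""
    hub.update(inserts)
    hub.update(updates)
    for k in deletes:
        hub.pop(k, None)
    return hub
```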
satya - Wednesday, June 18, 2008 1:33:19 PM
unidirectional syncing may be preferable..
You may want to implement unidirectional syncing as opposed to bidirectional.
satya - Wednesday, June 18, 2008 1:33:37 PM
Compensating transactions may be necessary
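A sketch of compensation for a change (say, an address update) that has to land in several systems that cannot share one transaction. Each step commits on its own, so on failure the already-applied steps are undone in reverse order by issuing opposite updates. The step structure and the function name are hypothetical.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); both callables are hypothetical per-system updates.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    """Apply each step; on failure, compensate already-applied steps in reverse order.

    There is no distributed rollback here: each step has already committed in its
    own system, so undoing it means running a new, opposite update.
    """
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"applied {name}")
            done.append(step)
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated {prev_name}")
            return False
    return True
```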
satya - Wednesday, June 18, 2008 1:37:49 PM
peer-to-peer data sharing...
Master/slave relationships may be better for defining ownership of attributes or entities. Without them, bidirectional syncing could get hairy.
Single ownership on a single data attribute is preferable.
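A sketch of single ownership per attribute, assuming the hub keeps an attribute-to-owner map. Only the owning system may change an attribute, and the change flows out one way to the other systems, so there is nothing to reconcile in the reverse direction. Class, method, and system names are hypothetical; the `outbox` list stands in for a message bus.

```python
class OwnershipError(Exception):
    pass


class AttributeOwnership:
    """Hypothetical hub-side rule: each attribute has exactly one master system."""

    def __init__(self, owners: dict, subscribers: list):
        self.owners = owners            # attribute -> owning system
        self.subscribers = subscribers  # systems that receive copies
        self.values = {}
        self.outbox = []                # (target_system, attribute, value) messages

    def update(self, system: str, attribute: str, value) -> None:
        """Accept a change only from the owner, then fan it out one way (master -> slaves)."""
        owner = self.owners.get(attribute)
        if owner != system:
            raise OwnershipError(f"{system} does not own {attribute!r} (owner: {owner})")
        self.values[attribute] = value
        for target in self.subscribers:
            if target != system:
                self.outbox.append((target, attribute, value))
```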
satya - Wednesday, June 18, 2008 1:40:23 PM
How do transactional, summary, and historic elements work together in MDM?
satya - Wednesday, June 18, 2008 1:43:16 PM
Multiple owners...
In some extreme cases a data attribute has many masters. Which system masters it can then be queried from an attribute locator service.
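A sketch of an attribute locator for the multi-master case. It assumes, purely for illustration, that mastership is split by a customer segment such as line of business; the segment concept and all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class AttributeLocator:
    """Hypothetical registry mapping (attribute, segment) to the master system."""

    def __init__(self) -> None:
        # attribute -> {segment -> master system}
        self._owners: Dict[str, Dict[str, str]] = defaultdict(dict)

    def register(self, attribute: str, segment: str, system: str) -> None:
        self._owners[attribute][segment] = system

    def master_for(self, attribute: str, segment: str) -> Optional[str]:
        """Which system masters this attribute for this segment (None if unknown)."""
        return self._owners.get(attribute, {}).get(segment)

    def masters_of(self, attribute: str) -> List[str]:
        """All systems that master this attribute for some segment."""
        return sorted(set(self._owners.get(attribute, {}).values()))
```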
satya - Wednesday, June 18, 2008 1:47:19 PM
Metadata: recognize and address the challenge of semantic integration
satya - Wednesday, June 18, 2008 2:48:32 PM
why CDI?
The required business process granularity to be defined should be at the level of detail that is sufficient to define the logical data model of the CDI solution.
satya - Wednesday, June 18, 2008 2:50:20 PM
CDI and 360 degrees: a common goal
quote
Deliver an authoritative system of record for customer data that includes a complete, 360-degree view of customer data including the totality of the relationships the customer has with the organization.
satya - Thursday, June 19, 2008 7:58:26 AM
Should the data hub and the EDW be the same effort?
The authors of the above book seem to think not. They state
If the data warehouse is not available yet, we do not recommend mixing the MDM-CDI data hub project and a data warehousing effort, even though interdependencies between the two efforts should be well understood
They go on to say that the CDI data hub will feed the EDW once one is in place.
satya - Thursday, June 19, 2008 2:20:28 PM
books: Look for common data models
The Data Model Resource Book
Vol 1 A library of Universal Data Models for All Enterprises
Vol 2 A library of data models for specific industries
by Len Silverston
satya - Thursday, June 19, 2008 2:20:42 PM
Len Silverston data models
satya - Thursday, June 19, 2008 2:22:19 PM
Evolving the CDI data model over multiple releases needs consideration
satya - Wednesday, July 02, 2008 9:15:10 AM
Industry data model standards
OASIS xCRL, HL7
satya - Wednesday, July 02, 2008 9:15:37 AM
Contrasting MDM and Warehousing again
satya - Wednesday, July 02, 2008 9:16:58 AM
warehouse...
a) data is cleansed during extract, transform, and load (ETL) b) data is sourced to build a well-defined subject area or domain that serves a particular set of applications
satya - Wednesday, July 02, 2008 9:17:38 AM
examples...
a) sales data warehouse b) financial data warehouse c) product data warehouse
satya - Wednesday, July 02, 2008 9:17:58 AM
So what do you do with changing data in a data warehouse?
satya - Wednesday, July 02, 2008 9:18:54 AM
CDI however...
solving data quality issues in an integrated fashion across the enterprise...
satya - Wednesday, July 02, 2008 9:19:04 AM
what on earth does that mean??
satya - Wednesday, July 02, 2008 9:20:04 AM
further...
In CDI, data quality is addressed not just during the load but also in matching and linking, identification and aggregation, and data syncing and reconciliation.
satya - Wednesday, July 02, 2008 9:21:45 AM
for instance when
the data hub is "referential", data quality focuses on maintaining references as opposed to the actual content.
satya - Wednesday, July 02, 2008 9:22:20 AM
A data warehouse cannot be referential the way a data hub can. Is that a true statement?
satya - Wednesday, July 02, 2008 9:35:58 AM
A data warehouse is unidirectional whereas CDI is bidirectional. What do they mean by that?
satya - Wednesday, July 02, 2008 9:39:27 AM
Who is Larry English and what are his information quality principles?
satya - Wednesday, July 02, 2008 10:04:14 AM
How often does a data warehouse get updated?
Once a day is the norm.
satya - Wednesday, July 02, 2008 10:05:01 AM
Instead they want customer information available in real time..
satya - Wednesday, July 02, 2008 10:07:37 AM
is that the only difference?
Is that the only difference? What if I make the data warehouse real time? Can I? Why not?
satya - Wednesday, July 02, 2008 10:54:51 AM
They seem to suggest a referential hub for legacy corporations
satya - Wednesday, July 02, 2008 10:55:23 AM
however for new startups ....
They stop short of outright suggesting a "transactional hub". Not sure why.
satya - Wednesday, July 02, 2008 10:59:12 AM
Roles of a transaction manager
a) record b) exception processing c) compensating transactions d) composite and complex transactions
satya - Wednesday, July 02, 2008 11:08:21 AM
There is a thought that Data Hub vendors are still maturing...
and that it is still a young technology
satya - Wednesday, July 02, 2008 11:13:59 AM
Customer recognition and identification capabilities...
have to be adaptable to the business model, rules, and semantics of a given industry.
satya - Wednesday, July 02, 2008 11:15:26 AM
Another worthy note...
MDM solutions are optimized for real-time operations and not for batch reporting. An important distinction.
satya - Wednesday, July 02, 2008 11:17:57 AM
A datamart is being recommended for reporting needs
satya - Wednesday, July 02, 2008 2:31:32 PM
CDI Data model
A proven, business-domain-specific data model should be at the top of the CDI data hub selection criteria.
Some solutions, however, let you realize any data model through meta models and then generate some base-level services.
satya - Wednesday, July 02, 2008 2:33:43 PM
Parts of a synchronization machine
a) Enterprise Message Bus: canonical message transport b) Transaction Manager: record and exception management c) Identity Resolver d) Record Locator e) Attribute Locator f) Distributed Query Constructor
satya - Wednesday, July 02, 2008 2:35:38 PM
A query approach from a CDI hub...
1. Have the transaction manager receive and persist the request
2. Identify the keys
3. Locate the systems and their access paths, then run a distributed query
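The three steps can be sketched as one function that takes its collaborators as parameters: a request log standing in for the transaction manager, an identity resolver, a record locator, and one reader per source system. All of these names are hypothetical, and the loop queries sources one after another where a real distributed query constructor would likely run them in parallel.

```python
from typing import Callable, Dict, List, Tuple


def cdi_query(
    request: Dict[str, str],
    request_log: List[Dict[str, str]],
    resolve_identity: Callable[[Dict[str, str]], str],
    locate_records: Callable[[str], List[Tuple[str, str]]],
    source_readers: Dict[str, Callable[[str], Dict[str, str]]],
) -> Dict[str, Dict[str, str]]:
    # 1. Transaction manager (hypothetical): receive and persist the request first.
    request_log.append(dict(request))
    # 2. Identify the keys: resolve the request's attributes to a master key.
    master_key = resolve_identity(request)
    # 3. Locate source records and query each system through its own access path.
    results: Dict[str, Dict[str, str]] = {}
    for system, source_key in locate_records(master_key):
        results[system] = source_readers[system](source_key)
    return results
```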
satya - Wednesday, July 02, 2008 2:36:50 PM
The suggested approach is highly message oriented ....
SOA doesn't show up much
satya - Wednesday, July 02, 2008 2:46:43 PM
WebSphere Customer Center (IBM WCC)
a) emerging leader in the CDI data hub space b) customer-specific customizable data structures c) a few hundred basic web services d) real-time services to change address, roles, relationships, grouping, alerts, matching, and duplicate suspect processing e) composite transactions using web services f) pub-sub g) business object model h) interfaces i) batch framework j) originally developed for the insurance industry k) currently used in financial services and other industries
satya - Wednesday, July 02, 2008 2:50:09 PM
Siperian
a) metadata driven b) works with any data model c) batch and real-time syncing with metadata d) Hierarchy Manager for relationships e) cleanse and match f) can use external tools such as Trillium g) MetaMatrix data services, registry style h) originally for the pharmaceutical industry
satya - Wednesday, July 02, 2008 2:53:48 PM
Initiate
a) reference hub b) metadata-driven federation c) real time d) organizational hierarchies e) web services f) flow invocation g) federated data retrieval h) view records across lines of business i) auditor roles j) reporting database k) originally from medical records
satya - Wednesday, July 02, 2008 2:55:01 PM
Siebel Universal Application Network
satya - Wednesday, July 02, 2008 2:55:59 PM
Siebel Universal Customer Master
satya - Wednesday, July 02, 2008 2:56:45 PM
From Oracle
a) Customer data hub b) financial consolidation hub c) Product hub
satya - Wednesday, July 02, 2008 2:58:57 PM
Purisma
a) analyzing customers for relationships b) Correlation Engine c) exception management d) web services
How does Purisma support bidirectional syncing?
satya - Wednesday, July 02, 2008 3:36:15 PM
What is SAP NetWeaver?
satya - Wednesday, July 02, 2008 3:39:47 PM
SAS DataFlux
a) master customer reference database b) data quality UI and exception UI c) batch and real-time sync d) rules in metadata
satya - Wednesday, July 02, 2008 3:43:42 PM
Object River
a) model driven b) enables any data model based on a meta model definition c) generates all the CRUD operations d) SOA e) generates a basic web portal f) pub/sub on data changes
satya - Wednesday, July 02, 2008 3:47:25 PM
What is data profiling (Informatica)?
satya - Wednesday, July 02, 2008 3:50:41 PM
Acxiom: an address database
satya - Wednesday, July 02, 2008 3:52:27 PM
Experian: a people database
satya - Wednesday, July 02, 2008 4:01:04 PM
CDI may continue to be vertical