If we generalize this design challenge, it
is really asking if it makes sense to use a data element for something
not originally intended in the business or on the logical data model. So
we need to first decide if these "fake" classifications are really
classifications and if they are maybe we should broaden our logical
definition (and therefore business view) to accommodate. If we
decide that these are not valid logical classifications is there a
chance that they are valid physical classifications? We know the
physical data model can at times be substantially different from the
logical to accommodate the reality of databases, reporting tools,
existing applications, etc. This can include using techniques such as
denormalization, partitioning, and indexes. Maybe there are good
functionality or performance reasons why "fake" classifications should
be stored with the rest of the classifications. If there is not a good
logical or physical reason, I think it is safe to stay that this is a
short-sighted move to save $$$ and time now (which, as we know does
happen every once in a while, :O)). In this situation, I would tactfully
present my thoughts to the project team and listen to their thoughts as
well.
Before we read our responses, I know
you're wondering which piece of paper Sadie picked up for this
challenge...drum roll please....Sadie picked up a piece of paper with
David Hay's name on it. Congratulations Dave and your prize is on the
way. Enough from me, let’s hear what our group has to say. I’ve
presented the responses in two categories, Yes and No, and within each
category the responses are listed alphabetically by author. Hope you
enjoy reading these responses as much as I did!
Yes, I would add these "fake" classifications
Pat Arellano
With the ERP tool, there are usually base tables and extension tables
attached to them to handle either one-to-one or one-to-many
relationships. One could probably use an existing column in a
one-to-many extension (1:n) table to track where the candy/asset is
being sold. This should not cause management heartburn b/c it would not
require a schema change within the tool.
Dave Hay
In a conceptual data model absolutely not.
A data model is supposed to represent the true nature of all
relationships, and the fact that a product may be sold in a region is a
fact that deserves proper representation.
Indeed, in your example, each of the kinds of classification may
call for a separate entity.
Having said that, a particular physical database installation may only
be concerned with products and their "predicates", so as a practical
matter, having a simple CLASSIFICATION and what I call CATEGORY and
CATEGORY TYPE tables may be a very good solution.
Of course, once requirements change, the designer may be compelled to
bring one or more CLASSIFICATIONS and CATEGORIES (of CATEGORY TYPE
("region", for example) into separate tables.
Is
that wishy-washy enough for you?
Pieter de Bruyn
Basically it is not a "fake classification".
If the Organisation has it's data correctly classified, the region
should be defined in a classification called Location.
If the region has not been defined or added to the Location, first
add the region and then use the relationship between Product and
Location (the table indicating the specific location for a product, that
includes the physical address) to indicate in which regions the product
can be sold. This should not
have any effect on the true classifications or the business, because it
is already defined and part of the organisations solution.
Stacey Haurin
I would prefer to keep the classification structure pure to its original
intent. Considering it is a
purchased application, one should confer with the vendor to understand
the impact to the application and associated functions, such as
reporting.
Also, it would become very messy, if not impossible, to track
attributes and other relationships associated with the "classifications
in disguise".
However, I answer yes to this question because I am a realist and
understand that in a crunch, things aren't always implemented ideally.
I would review all these limitations with the team members and if
it is absolutely needed by the business, then I would add the fake
classification. I would also
discuss short term and long term implementation of this request and
convince the decision makers to address the fake classification in a
future release.
Lavon Missell
The ingredient list of a candy ultimately affects the salability of a
product particularly as it fits the tastes preferences of a particular
geography or cultural group. Thus, I would say that the information
should be captured or tracked. For example, Hispanic and Spanish
speaking people have some preference for cinnamon flavoring in
chocolate.
The problem is that the ingredient in this example cinnamon is a
small constituent but important. Another consideration is that as people
who favor this type of chocolate move about the world, the movement may
manifest itself as geography.
I would add the row and have it act as a pointer to other model
structures that contained the detail on these multi-faceted
classifications.
Martin Richley
These classifications are how the business see their information needs,
they must be modelled at the information level.
It is up to the system and database designers to decide how they
can present the classifications to the business user in the most
efficient manner. Modelling
of classifications using tradition data modelling methods is fraught
with danger. However using
the Corporate Business Modelling Language (CBML), classifications can be
handled rigourously and precisely, attributing information to the
correct entity classes, keeping the subject entity-classes pure, while
allowing them to be classified as the business requires them.
So the true answer to your problem is of course Yes and No, since
all business classifications need to be captured, but not as part of the
data structure. If you
require more information on CBML, then please initially see the web site
cbml.info, which is still under development or contact me direct.
Graeme Simsion
Given the forced choice of "yes" or "no" I could play it safe and say
"no" for the moment - but I've said "yes" with qualifications, because I
have more arguments on that side...
The only immediate justification I can offer for standing in the
way of this (presumably relevant) "someone who came up to me" is the
statement that the classification has "more to do with the relationship
with a geographical region".
If the database already contains information about geographical
regions, then I don't want to duplicate it.
I'm
less swayed by the "you believe" and "true" vs "definitely not"
arguments...
Let's look at the problem a bit more carefully.
It seems to me that the ERP package supports a "many-to-many"
relationship between "classifications" of a particular kind and other
entities e.g. in this example, a product could be sold in many regions.
I base this on the statement that by using classifications, we avoid
adding a column to the product table - which would be the usual way of
supporting a many-to-one relationship with Region.
Of course you can enforce a many-to-one relationship using
constraints on the intersection table, but it's not the most obvious way
of doing it...
On this basis, we could use the "classification" facility to handle not
only single valued attributes, but multi-valued attributes...
And at this point, I'm starting to say "if that's the way the tool does
it physically, get with the program!"... subject of course to not making
a mess by duplicating stuff like Regional information.
Now there may be other good arguments against using the
"classification" facility, but they're not apparent in the description
of the problem. What is
apparent is an argument based on what is "true" vs "fake" rather than
what is practical and workable.
I don't like these arguments: classification and entity definition
is about usefulness, not truth.
Prima facie, "easy" and "cheap" is good!
IF, as a data modeler, you have an uneasy feeling about the
elegance of the solution, you'll need to translate that into practical
impacts on quality or cost: does it make the database less flexible,
harder to understand, harder to program against, more difficult to
enhance, less able to enforce business constraints...?
Does it introduce redundancy with associated update costs? You may
need to dig deep to answer these questions: and if the tool is
well-designed and the "classification" technique solidly embedded, you
may find that your fears are unwarranted.
More
broadly, the issue of modeling and implementing classifications is one
which is typically handled inconsistently and clumsily - even by
modelers not constrained by ERP packages.
Graham Witt and I have a good look at it and make some recommendations
(essentially model consistently, implement according to facilities
available) in our new Edition of Data Modeling Essentials (left the
commercial for last...)
No, I would not add these "fake" classifications
Charles S. Anderson
I would not add the REGION as a classification. My argument would
include the definition of a classification "A way of grouping similar
items together based on the physical attributes of the product."
followed with the comment that location where a product sold is not a
physical attribute of a product. I would also advise that the REGION or
concept of a location can have many other values and relationships such
as warehouse location, manufacture location. So by opening this short
cut you leave yourself vulnerable to many other "exceptions". I would
recommend that the data model and ERP application be modified to include
a LOCATION entity which would be much more beneficial by its flexibility
and use in reporting sales information.
In order to facilitate my explanation, I would develop a data model
diagram and use it to describe the need for separate tables. The picture
would make it clearer for someone to understand that adding the new
table provides the metadata that makes the model more understandable.
Reuse of columns for multiple purposes leads to convoluted and difficult
to understand database designs. One of the main complaints of business
users is that the database is difficult to understand for users who are
developing their own reports. I would also use this model with sample
data to illustrate the potential for data quality issues from combining
multiple non-related values in a column.
Frank Beal
First, the "real" classifications would be corrupted with non related
data and would likely add confusion to those who use the real
classifications. Reporting is complicated as now, these non related
"classifications" must be excluded from the real ones. Terminology is
now corrupted and incorrect values are listed with correct ones. The
initial meaning of the field will most likely be lost over time.
Ray Doggendorf
A region is a totally different domain with it's own set of valid
values.
To make it a classification (and populate it with the other
classification existing values) would be like adding apples and oranges
into the same basket.
Whenever I wanted fruit I could simply pull out something and I would
get an apple or an orange. I wouldn't care as long as I got a piece of
fruit.
But if I wanted only an orange, I might have to pull out a lot of
apples to get to the orange.
So, if someone wanted to list products by region, they may have to look
through a long list of categories, sizes, seasonal, and storage lists to
get to the region lists.
Questions --- If everything was in the same classification, how would
they know they had a region list (or a category list, size list,
seasonal list, or storage list) in the first place?
Eric Dome
It's important to keep the classifications standard across all parts of
a company. Otherwise the classification loses its reason for being.
Suppose you put "deer crossing" signs in places where there aren't any
deer. Eventually people will ignore the signs in areas with deer.
Causing mountains of road kill. You don't want the same chaos in your
data models. Especially if you have a large ERP system that's accessed
by many people. Records will be inconsistent, data will slow down and
costly mistakes will be made. In other words, don't cut corners. It will
come back to haunt you.
Debi Henry
By adding a new "entity/column" we would be begin defining the
geographical subject area which would support future development needs
in addressing business requirements of a geographical nature. Kinda like
a "pay now or pay later" situation as well as confusion to the business
definition of the column because of "dual function" column content.
Jeff Lawyer
I would not add these "fake classifications".
It is true the one-time cost of design, implementation, and meta
data creation occurs only once.
However, the cost of:
1) Filtering out "product
/ region" from BI and other reporting will occur many times.
2) Explaining why
"product / region" is not really a category will occur frequently.
3) One incorrect business
decision based upon this bogus and confusing classification could be
orders of magnitude times more than the cost of adding new columns.
This type of ERP
tool customization (adding a new column), properly designed, documented,
and communicated is justified in this particular case.
However, "faking" the classification is not.
Bruce McTavish
The 'region' data is probably already defined in another system, as a
Sales Region, or a Financial 'slice / view'.
In this case there is no need to pay to add it to the Product data.
The request from the user is for reporting and analyzing Sales, not for
extending the product definition.
If there is a 'very large ERP tool' at this company, then there is
likely a data warehouse / mart / reporting / analytical environment as
well.
Assemble the required pieces for reporting in this DW environment.
Raghavendra
1. There is always different column in the ERP application that would
give this information like the inventory warehouse, Sold by sales rep
attributes etc that would give this information.
2. There is a many to many relationship between product and region. It
would be difficult for the business to maintain this data. Product can
stop selling in a region or we can start selling the product in a
region.
3.Product sold in region is a transational data and not attribute of the
product. The data can be derived from transational data.
Christian A. Strasser
I would not primarily on the basis of mutual exclusivity.
That is, a classification by definition consists of a set of
choices that can only have one in each set be true.
Thus, a product cannot be both a seasonal and non-seasonal product.
Adding fields to an item also allows for future expansion based on the
fields.
A better model would have product and sales_area as separate tables
with a product-sales_area table between them.
This would allow for much more flexibility.
As a classification, if something like the region changed or expanded,
some sort of clumsy method of handling this would be needed (such as
combo codes). Further if you
wanted to track history of where a product was sold, it would not be
possible.
Clyde
Watts
As we know the entropy of the universe is increases , similary over time
the entropy of the Data Model increases . The Universe slowly goes from
a ordered energy/matter system to a ministronee soup of lower enger and
matter , so a LDM goes from a Normalised model to a denormalised model (
star schema ). To keep a model in a normalized form requires energy or
in this case BUDGET. Back to the question , altho it is easy to add the
fake classification and uses less budget , this will be the start of the
insidious decline of the model from a normalised model to a denormalised
model.
Larry Weismantel
While it would be relatively easy to accommodate the first couple of
"fake classifications", eventually the addition of more and more of
these would cause enough database anomalies that the cost of having to
write special code and/or queries for certain situations would outweigh
the benefit of being able to include the "fake classifications" by just
adding a new row. With the
"region the product sells in" classification, I feel it would be a much
better solution to add an associative table that links the region and
product tables to indicate which products are sold in which regions.
Tracy
Welty
The argument for adding these as classifications is incomplete.
Existing reports and processes using 'real' classifications would
have to be updated with logic to properly utilize/exclude the 'false'
classifications. This will
increase maintenance costs over time as specific data knowledge will be
required for any new development or changes to existing work.
Meta-data cost also rise.
Finally, the logical model will end up generalized to the point that it
will not be as useful as a tool to describe the business.
|