How to best structure product data and metadata is a challenge
facing many companies today. Take, for example, "Memory
Organization," a complex attribute found in Memory ICs. Memory
cells are organized into rows and columns, and one chip can even
have two different organizations of its cells. This makes it
possible to have two values for the attribute:
Memory Organization: 256 K x 16 bit; 512 K x 8 bit
What is the optimal architecture for a complex attribute like
the one above? The old adage of "Keep It Simple" persists and leads
many to capture "Memory Organization" in one text field.
Solution A:
Memory Organization: 256 K x 16 bit; 512 K x 8 bit
Solution A may simplify the initial process of data modeling and
capture, but over time it proves inflexible and virtually
guarantees inconsistent and unnormalized data. Manufacturers detail
specification information in different ways, and Solution A does not
provide a structure for how the information should be captured and
stored. Additionally, because all of the elements of the attribute
are captured in one field, any manipulation of the data has to take
place manually.
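To illustrate why a single text field resists automation, here is a
minimal sketch in Python. The variant strings and the names
FREE_TEXT_VARIANTS, ORG_PATTERN and parse_organizations are
hypothetical, invented for this example; they are not taken from any
real manufacturer data or product.

```python
import re

# Hypothetical free-text entries for the same attribute (illustrative only).
FREE_TEXT_VARIANTS = [
    "256 K x 16 bit; 512 K x 8 bit",
    "256Kx16, 512Kx8",
    "256K words by 16 bits",
]

# Matches "<number> K x <number>" with optional spaces and an optional "bit(s)".
ORG_PATTERN = re.compile(r"(\d+)\s*K\s*x\s*(\d+)(?:\s*bits?)?", re.IGNORECASE)


def parse_organizations(text):
    """Return (number_of_units_in_K, unit_size) pairs found in the text."""
    return [(int(units), int(size)) for units, size in ORG_PATTERN.findall(text)]


for entry in FREE_TEXT_VARIANTS:
    print(repr(entry), "->", parse_organizations(entry))
```

The pattern handles the first two spellings but finds nothing in the
third, so each new way of writing the value would need another rule
or a manual fix.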
An alternative model might look like the following:
Solution B:
Memory Organization No. of Units 1: 256 K
Memory Organization Unit Size 1: 16 bit
Memory Organization No. of Units 2: 512 K
Memory Organization Unit Size 2: 8 bit
(Concatenated) Memory Organization: 256 K x 16 bit; 512 K x 8 bit
Solution B treats the attribute as having two values, each
composed of two elements: number of units and unit size. It is the
superior solution for a number of critical reasons:
- Consistency: Name, value and unit of measure
are broken out for each element of each attribute value. Limiting
what is found in each field ensures consistency. Where appropriate,
restricted values can be defined for the value and unit of measure
fields to further promote normalized data that can be effectively
maintained over time.
- Search: Each "Memory Organization" can appear
separately in the dropdown list of a faceted search menu, reducing
the number of unique values a customer must sort through. As search
technologies advance and navigation attributes employ searchable
textboxes, a customer will later be able to search on the element
of the attribute that is of most interest. Perhaps the customer is
concerned with finding a 32-bit organization and less worried about
the number of units. Solution B enables more effective search for
both internal and external users and can easily feed increasingly
sophisticated systems.
- Flexibility: Solution B is a flexible
structure that can be manipulated in a number of additional ways.
Conversion and other mathematical operations on values and units
are possible. Elements can be added or removed as the data needs
change. Perhaps the test conditions of a particular attribute value
were captured, as in "120 V at 50 A," and it is later determined that
only the voltage is of interest. If the entire string were captured
in a single text field, there would be no simple way of deleting the
"Test Condition." If, however, the model parses out each element of
the attribute value into "Voltage" and "Test Condition," the latter
is easily deleted or altered across an entire data set.
- Integration: A data model that fully defines
all of its constituent elements and their relation to one
another can more easily integrate with other
systems both internal and external to the organization.
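The benefits listed above can be sketched in a few lines of Python.
This is only one possible shape for Solution B, not a prescribed
implementation. The class MemoryOrganization, the helper functions,
the part identifiers "PART-A" and "PART-B," and the reading of "K" as
1024 are all assumptions made for illustration.

```python
from dataclasses import dataclass

K = 1024  # assumption: "K" is read as 1024, a common convention for memory


@dataclass(frozen=True)
class MemoryOrganization:
    units_in_k: int        # "No. of Units" value, e.g. 256 (in K)
    unit_size: int         # "Unit Size" value, e.g. 16
    size_uom: str = "bit"  # unit of measure kept in its own field

    def display(self):
        return f"{self.units_in_k} K x {self.unit_size} {self.size_uom}"

    def total_bits(self):
        # Only meaningful when size_uom is "bit", as in this sketch.
        return self.units_in_k * K * self.unit_size


def concatenated(orgs):
    """Rebuild the single display string from the structured values."""
    return "; ".join(org.display() for org in orgs)


def find_by_unit_size(catalog, unit_size):
    """Search on one element only: parts offering the given unit size."""
    return [part for part, orgs in catalog.items()
            if any(org.unit_size == unit_size for org in orgs)]


def drop_element(records, element):
    """Remove one named element (e.g. "Test Condition") from every record."""
    return [{k: v for k, v in rec.items() if k != element} for rec in records]


# Hypothetical catalog and test-condition records (illustrative only).
CATALOG = {
    "PART-A": [MemoryOrganization(256, 16), MemoryOrganization(512, 8)],
    "PART-B": [MemoryOrganization(128, 32)],
}
TEST_RECORDS = [{"Voltage": "120 V", "Test Condition": "50 A"}]

print(concatenated(CATALOG["PART-A"]))
print(find_by_unit_size(CATALOG, 32))
print(drop_element(TEST_RECORDS, "Test Condition"))
```

Because the elements are stored separately, the concatenated display
string is derived rather than typed in, a search can target the unit
size alone, and arithmetic becomes possible: under the K = 1024
assumption, both organizations of the hypothetical PART-A work out to
the same 4,194,304 bits.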
Gina Bulatovic
Director of Professional Services