Opinions, News and Views on Semantic Technology and Research

Semantics Online


Semantics Online Authors: Haim Koshchitzky, Lori MacVittie, Charles Silver, Barry Morris, Maureen O'Gara

Related Topics: XML Magazine, Semantics Online

XML: Article

Semantics and Context

One of the core tenets of XML is its extensibility and flexibility

Although XML defines each data element in a given transaction (the semantics), there's no mechanism to also communicate the business context. This represents the difference between reading XML and understanding the business impact of the transaction. Namespaces, numeric values, and time stamps all create some context when looking across transactions or business entities. In this article we'll discuss the difference between semantics and context and the challenges this difference creates relative to performance and scalability.

One of the core tenets of XML is its extensibility and flexibility. XML facilitates these tenets because it's self-describing and can be accompanied by a DTD that provides the data structure necessary for reading the content of the associated document. This is the capability that sits at the core of the XML "hype versus hope" debate.

In today's increasingly dynamic business world this self-describing capability provides hope against obsolescence. By providing a mechanism for maintaining systems that can communicate changes in semantics as part of the transmission of a document, XML provides a way to reduce the maintenance cost for solutions based on changing business requirements. One of the main justifications for introducing XML into a solution is that this adaptability enables the core solution to scale and perform despite changes in requirements or exceptions to processes.

However, this same capability contributes to the hype surrounding XML. Take the assumption that if an XML document is self-describing and the tags associated with the description are clear, then the system responsible for reading the document is able to understand the implications of a change. A common mistake is to assume that this description in the XML actually describes the changes. In reality the change is only implicit - the document can't convey the intent of the change, explain the business context that required the change to occur, or tell the receiving system how a new data element should be handled.

This article looks at a specific business example, differentiates the context from the semantics, and discusses the issues around these differences with respect to namespaces, numeric values, and date/time stamps. By separating the semantics from the context, one can consider these issues with respect to maintenance costs to improve system performance and scalability.

The Problem Definition
Let's look at the communications associated with placing a request for a proposal, beginning with an enterprise seeking to purchase specific products from its suppliers. In the center of Figure 1 is the Enterprise that Betty Buyer works for. On the right, left, and bottom of Enterprise are potential suppliers of the widgets that Betty Buyer wants to purchase. Betty must submit a Request for Quote (RFQ) to each supplier before she makes the purchase. Betty's Enterprise manages the RFQ through a homegrown system.

To reach each supplier, the RFQ must be sent via the Exchange, then on to the appropriate supplier. To accomplish this, Betty, through her RFQ system, must know the name of each supplier within her namespace as well as how it resolves that name to the namespace of the Exchange. While Betty may know Sam's company as "Best Supplier," the Exchange may use "Bsupplier, Inc.," or the supplier's D&B number. Indeed, Best Supplier could participate either directly with the Enterprise, through the Exchange on the right of Figure 1, or as a participant in Enterprise's Private Exchange. In this case, Betty's RFQ system may need to maintain three different names for Best Supplier.
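The name resolution described above can be sketched as a simple alias table. This is only an illustration: the channel labels, the placeholder identifier, and the function `resolve_supplier` are assumptions made for the example, and a real RFQ system would keep such mappings in its own data store.

```python
# Hypothetical alias table: how one supplier is named in each namespace.
# The channel labels and the identifier below are invented placeholders.
SUPPLIER_ALIASES = {
    "best-supplier": {
        "enterprise": "Best Supplier",
        "exchange": "Bsupplier, Inc.",
        "private-exchange": "00-000-0000",  # placeholder, not a real D&B number
    },
}


def resolve_supplier(local_name, source, target, aliases=SUPPLIER_ALIASES):
    """Translate a supplier name from one namespace to another."""
    for names in aliases.values():
        if names.get(source) == local_name:
            if target not in names:
                raise KeyError(f"no name for {local_name!r} in {target!r}")
            return names[target]
    raise KeyError(f"{local_name!r} is unknown in {source!r}")


exchange_name = resolve_supplier("Best Supplier", "enterprise", "exchange")
print(exchange_name)
```

Every additional channel adds another column to this table, which is the maintenance burden Betty's RFQ system takes on.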

To fully understand the contextual issues, let's look at the RFQ Betty sends through an intermediary like this Exchange. Betty wants to send the information in this RFQ to three of the potentially hundreds of registered suppliers. However, she doesn't want to duplicate the effort of creating the RFQ in the Exchange's system for every one of the many RFQs she creates. If the information is in XML, the Exchange can import and convert the RFQ into its system. But Betty still must address the RFQ to the appropriate suppliers as they're referenced in the Exchange's namespace. The Exchange has to resolve the business context - the product, terms, and delivery date communicated in the RFQ.

Since every participant in this scenario has a different software solution, XML is the ideal choice for communicating and translating the semantic information in the RFQ because XML is system independent. Betty Buyer creates an RFQ on Enterprise's RFQ system, which in turn creates an XML document that's similar to the example in Listing 1. Betty Buyer sends this document to all suppliers, then waits for a response from each. The self-describing nature of the XML RFQ enables each receiving supplier to map the tagged pairs of the RFQ data to its own internal system representation so the supplier can process the RFQ and respond in kind. But this isn't as simple as it sounds. Let's see what happens with the namespace, numeric, and date information that have contextual differences between the business entities.
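To make the mapping step concrete, here is a minimal sketch of a supplier copying tagged RFQ data into its own record layout. The RFQ below is a hypothetical stand-in, not the article's Listing 1, and the tag names, internal field names, and the function `map_rfq_item` are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical RFQ; element names are illustrative, not the article's Listing 1.
RFQ_XML = """
<RFQ>
  <Buyer>Enterprise</Buyer>
  <Supplier>Best Supplier</Supplier>
  <Item>
    <PartNumber>W-100</PartNumber>
    <Description>Widget</Description>
    <Quantity>10000</Quantity>
  </Item>
</RFQ>
"""

# Hypothetical mapping from RFQ tags to one supplier's internal field names.
TAG_TO_FIELD = {
    "PartNumber": "sku",
    "Description": "item_name",
    "Quantity": "qty_requested",
}


def map_rfq_item(xml_text, tag_to_field=TAG_TO_FIELD):
    """Copy each known tag of the RFQ item into an internal record."""
    item = ET.fromstring(xml_text).find("Item")
    record = {}
    for child in item:
        field = tag_to_field.get(child.tag)
        if field is not None:  # tags the supplier doesn't know are skipped
            record[field] = (child.text or "").strip()
    return record


record = map_rfq_item(RFQ_XML)
print(record)
```

The mapping succeeds on the tags alone. Nothing in it confirms that the part number means the same product to the supplier, or what unit the quantity is counted in, which is where the contextual differences come in.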

Namespaces - What's in a Name?
Every system manages its own namespace. Names of companies, people, and items are all stored and referenced locally. When this is done by systems that must share information, the practice creates significant challenges. Let's consider how it impacts Betty Buyer's ability to buy from Sam Supplier.

To force a business context, the larger exchanges state that the transactions must be created in their environment, so all participants use the processes and business context enforced by each exchange. Consider how this works. An exchange maintains a centrally managed catalog that resolves part number, description, and pricing namespace issues. The exchange also maintains its own unique company namespace, document namespace, and all the business processes that allow two companies to coordinate the buying, selling, shipping, invoicing, and sometimes even the exchanging of funds.

But each exchange represents only a fraction of the entire market, so the practice of suppliers participating in multiple exchanges has become rather common. And despite the rapid growth in the number of exchanges that are currently available, many large enterprises believe they gain a competitive advantage by maintaining their own private exchange. A private exchange allows the enterprise to define and enforce its own unique business process and forces its suppliers to comply with that process.

While participating in multiple exchanges (both public and private) may seem to resolve the immediate business issue of gaining maximum market exposure and control of the business process, it actually defeats the purpose of having an intermediated marketplace. The intermediated marketplace was supposed to enable a wide range of suppliers to bid on an enterprise's RFQ, and the RFQ was supposed to be sent only once. If each enterprise participates in multiple marketplaces while also building its own private marketplace, then the only way to deliver on the promise of the intermediated marketplace is to enable all exchanges (both public and private) to communicate with each other. But this reintroduces the original namespace resolution issue - only an order of magnitude more complex.

One solution is to require that all B2B XML documents include data elements that are tied to a specific namespace. For example, a company may reference internal specifications by URL. In the earlier example, Betty's listing would need to tag part numbers as belonging to the Exchange's catalog so the receiving system could call that catalog. To handle this issue, a namespace specification that's based on the Uniform Resource Identifier (URI) standard has been suggested. While URIs can eliminate broken links and identify a resource universally and unambiguously, their use significantly complicates the processing of the XML document. The namespace URI itself is only an identifier, but when the receiving system has to retrieve what the reference points to, such as a catalog entry, the referenced link must be called. Following just a single Internet link while processing an XML document would introduce significant delays; when multiple links are referenced, the challenge becomes even more problematic.
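As a rough sketch of tagging part numbers with the catalog they belong to, the example below qualifies each part number with a namespace URI and groups the values by that URI. The URIs (on the reserved example.com domain), the part numbers, and the function `part_numbers_by_catalog` are assumptions. In this sketch the URI is only compared as a string; calling the catalog behind it, which is where the network delay would arise, is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; example.com is a reserved placeholder domain.
EXCHANGE_NS = "http://example.com/exchange/catalog"
PRIVATE_NS = "http://example.com/enterprise/catalog"

RFQ_XML = f"""
<RFQ xmlns:ex="{EXCHANGE_NS}" xmlns:pv="{PRIVATE_NS}">
  <Item><ex:PartNumber>EX-4471</ex:PartNumber></Item>
  <Item><pv:PartNumber>W-100</pv:PartNumber></Item>
</RFQ>
"""


def part_numbers_by_catalog(xml_text):
    """Group part numbers by the namespace URI that qualifies them."""
    grouped = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells a qualified tag as "{namespace-uri}localname".
        if elem.tag.startswith("{") and elem.tag.endswith("}PartNumber"):
            uri = elem.tag[1:].split("}", 1)[0]
            grouped.setdefault(uri, []).append(elem.text)
    return grouped


grouped = part_numbers_by_catalog(RFQ_XML)
print(grouped)
```

The receiving system now knows which catalog each number claims to come from, but it still has to look the number up in that catalog to learn what it means.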

Products - What Do You Want?
The namespace issue isn't limited to company names. It becomes an issue for every object. At times this is made even more difficult by business practices. For example, it's not unusual for a supplier to offer the same product at different prices in different markets. One may be a spot market for inventory overstock, another may be an industry-specific market operated by a consortium, while an individual enterprise might have its own contract with a preferred supplier that guarantees 20% off list price.

But a supplier is unlikely to use the same part number across every exchange. This would make it too easy for buyers and competitors to track its pricing policies. Instead, in the case of our example, Best Supplier maintains a different catalog of products for each market. Some items use the same part number, while others don't. To make sure Betty is getting the best possible deal, she must submit the RFQ to all three markets. But to do this, she must also know what product number and description are used to reference this product within each market.
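The bookkeeping this imposes on Betty can be sketched as one lookup per market. The market labels, part numbers, and the function `part_numbers_for` below are invented for illustration.

```python
# Hypothetical per-market catalogs for one supplier; all part numbers invented.
CATALOGS = {
    "spot-market": {"widget": "OVR-0042"},
    "consortium-exchange": {"widget": "W-100"},
    "private-exchange": {"widget": "W-100"},
}


def part_numbers_for(product, catalogs=CATALOGS):
    """Return {market: part number} for every market that lists the product."""
    return {
        market: items[product]
        for market, items in catalogs.items()
        if product in items
    }


widget_numbers = part_numbers_for("widget")
print(widget_numbers)
```

An RFQ sent to all three markets has to carry a different identifier in each one, even though the product is the same.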

Numeric Values - How Much Do You Want?
A more basic contextual issue involves amounts. For example, let's say Betty Buyer has requested 10,000 widgets. If you look at Table 1, the Exchange catalog has one supplier selling widgets in bulk packs of 12. How does the conversion between these units occur? What's the mechanism for communicating optional units of measure in an XML transaction? If Betty is willing to buy an overage, she can get a better price. But none of this information is represented in the semantics of the XML document.
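The conversion in this example is simple arithmetic once both sides agree on the units. A minimal sketch, using the figures above (10,000 widgets, packs of 12) and a hypothetical helper named `packs_needed`:

```python
def packs_needed(units_requested, units_per_pack):
    """Return (whole packs to order, units of overage)."""
    # Round up with integer arithmetic: a partial pack can't be ordered.
    packs = (units_requested + units_per_pack - 1) // units_per_pack
    overage = packs * units_per_pack - units_requested
    return packs, overage


packs, overage = packs_needed(10_000, 12)
print(packs, overage)  # 834 packs, 8 widgets more than requested
```

Filling the request from packs of 12 means 834 packs, or 10,008 widgets. Whether Betty accepts the 8 extra widgets is a business decision that a bare quantity in the document doesn't express.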

Again, there's a semantic approach to solving this contextual issue if standards are considered. Most specifications define units of measure as optional elements. Some even use attributes to qualify the amount that's in the document. However, these semantic capabilities are often tied to the accompanying business context. Suppliers who agree to use a standard, such as those from Open Applications Group, Inc., or RosettaNet, can rely on semantics that the standard has already mapped to the specific contextual issues. In Listing 2 we've added a UOM section to the XML that will allow for communication of the specific semantics.
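As a sketch of how an optional unit-of-measure section might be read, the fragment below is a hypothetical stand-in. It is not the article's Listing 2, and the `UOM` element, its `code` and `acceptsOverage` attributes, and the function `read_quantity` are assumptions rather than anything defined by OAGI or RosettaNet.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the UOM element and its attributes are illustrative.
ITEM_XML = """
<Item>
  <Quantity>10000</Quantity>
  <UOM code="EA" acceptsOverage="true"/>
</Item>
"""


def read_quantity(xml_text):
    """Return (quantity, unit code or None, whether overage is acceptable)."""
    item = ET.fromstring(xml_text)
    quantity = int(item.findtext("Quantity"))
    uom = item.find("UOM")
    # UOM is optional: when it's absent, report None rather than guess a unit.
    code = uom.get("code") if uom is not None else None
    accepts_overage = uom is not None and uom.get("acceptsOverage") == "true"
    return quantity, code, accepts_overage


print(read_quantity(ITEM_XML))
```

Because the section is optional, a receiver must still decide what to do when it's missing, and that decision depends on context agreed outside the document.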

Date and Time - When Do You Want It?
Finally, let's look at date and time as a function of the time zone you're operating in and the regional notation of the date format. The contractual issues associated with physical delivery create this challenge. For example, the suppliers responding to the RFQ may be communicating deliveries based on their time zone and region. Best Supplier is located on the West Coast, and their ability to deliver widgets, as stipulated in the RFQ, would require next-day shipping, so their shipping costs may be higher. By modifying the XML DTD to allow for separation of the elements of a date, we can address some of the issues associated with the date-specific semantics.
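Separating the elements of a date removes the ambiguity of regional notation (is 03/04 March 4 or April 3?) and lets the receiver convert to its own time zone. The sketch below assumes a hypothetical `DeliveryDate` layout with an explicit UTC offset, a buyer at UTC-5, and a West Coast supplier at a fixed UTC-8; none of these details come from the article's listings.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical fragment: each part of the date travels in its own element.
DELIVERY_XML = """
<DeliveryDate>
  <Year>2001</Year>
  <Month>03</Month>
  <Day>04</Day>
  <Hour>09</Hour>
  <Minute>00</Minute>
  <UTCOffsetHours>-5</UTCOffsetHours>
</DeliveryDate>
"""

# Assumed fixed offset for the supplier; daylight saving time is ignored here.
SUPPLIER_ZONE = timezone(timedelta(hours=-8))


def parse_delivery(xml_text):
    """Build a time-zone-aware datetime from the separated date elements."""
    node = ET.fromstring(xml_text)

    def field(name):
        return int(node.findtext(name))

    zone = timezone(timedelta(hours=field("UTCOffsetHours")))
    return datetime(field("Year"), field("Month"), field("Day"),
                    field("Hour"), field("Minute"), tzinfo=zone)


buyer_time = parse_delivery(DELIVERY_XML)
supplier_time = buyer_time.astimezone(SUPPLIER_ZONE)
print(buyer_time.isoformat())     # 2001-03-04T09:00:00-05:00
print(supplier_time.isoformat())  # 2001-03-04T06:00:00-08:00
```

A 9:00 a.m. deadline for the buyer is 6:00 a.m. for the supplier, so the same instant looks like a tighter delivery window on the West Coast.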

However, anything beyond simple transformations requires a date data type. The solution for date and numerical issues will be much simpler when XML schemas arrive. The draft specification will allow for multiple data formats that in turn would allow the schema-based semantics to address these issues.

Conclusions
As you can see in each of these examples, although XML allows for easy resolution of the semantic differences between the business entities, the business context presents a greater challenge. Once the contextual issue is well understood, it becomes clear that XML can solve problems only within a finite domain, one identified by a shared context. For those who believed the hype generated by XML, this limitation is disappointing. For those wrestling with how to communicate with business partners, XML continues to deliver incredible business benefits:

  • Flexibility: Look at the date format and the ability to configure time zones.
  • Extensibility: Extend semantics to allow for the communication of context through optional elements and attributes.
  • Ease of use: Changes in context can be communicated through a revised DTD without breaking the overall solution.

Just because a document is self-describing and solutions can correctly read the document, it doesn't necessarily follow that the same system can understand the context. Many exchanges know full well the issues of business context in trying to create a single integrated catalog of products. Think of the product code for widgets and how that's represented to a supplier internally versus how it's presented to a buyer bidding on widgets through an exchange that represents hundreds of suppliers.

There are B2B specifications that allow for optional information for each of these capabilities. But the introduction of a new business context makes the entire solution more difficult to maintain. Hopefully, these standards will consolidate to a few, since adhering to multiple standards drastically reduces the efficiencies promised by adopting XML into your solution.

Certainly, one key is the completion of the specifications under review by the W3C. Consider the ability of XML schemas to differentiate between numbers, dates, and text - if combined with the ability of XQuery to calculate and transform content, one could see a tool set built on these ratified specifications that would allow solutions to translate between business contexts just as XML is used to transform B2B transactions today. Additionally, initiatives like Universal Description, Discovery and Integration (UDDI), which focus on the creation of namespaces on the Internet as opposed to the proliferation of additional flavors of the same solution, present shared namespaces where context can be published and shared by business entities. If we're to avoid the cynicism and backlash from our business sponsors when the XML-hype bubble breaks, we need to drive our solutions to minimize the propagation of multiple standards and dialects.

The final article in this series will focus on parsers. As the processing engine for XML, transaction parsers are core to the use of XML. We'll look at the scalability of these engines and their ability to efficiently handle the emerging dialects of XML. We'll also summarize the series.

If you'd like to discuss a particular aspect of this or any other topic, e-mail me at kpatel@tilion.com.

Glossary

SE·MAN·TICS

  • Linguistics: The study or science of meaning in language forms.
  • Logic: The study of relationships between signs and symbols and what they represent. In this sense, also called semasiology.
  • Semantics: The meaning of a string in some language, as opposed to syntax, which describes how symbols may be combined independent of their meaning.

CON·TEXT
The part of a text or statement that surrounds a particular word or passage and determines its meaning.

The circumstances in which an event occurs; a setting.

NAME·SPACE
The unique definition of companies and people within a business entity.
