Date: Sun, July 20, 2008 04:40:09 PM
From: Robin Cover
Subject: XML Daily Newslink. Friday, 18 July 2008
XML Daily Newslink. Friday, 18 July 2008
A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS http://www.oasis-open.org
Edited by Robin Cover
====================================================
This issue of XML Daily Newslink is sponsored by
Sun Microsystems, Inc. http://sun.com
====================================================
HEADLINES:
* Experimental RFCs from Email Address Internationalization Working Group
* Information from Google: Hitting 40 Languages
* An ESB for the Web?
* NETCONF Event Notifications
* What Makes for a Successful Protocol?
* BizTalk Services Have Been Updated
* Interview with Kenton Varda: Google Open Sources Protocol Buffers
* Linus Torvalds, Geek of the Week
* Tech Giants Tackle Information Overload
----------------------------------------------------------------------
Experimental RFCs from Email Address Internationalization Working Group
Jiankang YAO, Wei MAO, and Abel Yang (eds), IETF RFCs
The IESG has approved the two specifications from the IETF Email Address
Internationalization Working Group as Experimental RFCs. An IETF
"Experimental" designation typically denotes a specification that is part
of some research or development effort subject only to editorial
considerations and to verification that there has been adequate
coordination with the standards process. A specification is designated
Experimental when the IETF may publish something based on it on the
standards track once it is known how well the experimental version
works. This IETF WG was chartered to study email address
internationalization problems and create proposed solutions. Background
and History: Mailbox names often represent the names of human users. Many
of these users throughout the world have names that are not normally
expressed with just the ASCII repertoire of characters, and would like to
use more or less their real names in their mailbox names. These users are
also likely to use non-ASCII text in their common names and subjects of
email messages, both in what they send and what they receive. This protocol
specifies UTF-8 as the encoding to represent email header field bodies.
The traditional format of email messages (RFC 2822) allows only ASCII
characters in the header fields of messages. This prevents users from
having email addresses that contain non-ASCII characters. It further forces
non-ASCII text in common names, comments, and in free text (such as in the
Subject: field) to be encoded (as required by MIME format RFC 2047). The
two experimental documents form the core specification for an extension
to SMTP and RFC 2822 that allows the use of UTF-8 in message headers
without encoding. This includes the use of non-ASCII characters in email
addresses, both on the left and right hand sides of the '@' character.
The documents have been extensively reviewed by people with mail expertise.
There have been reports of implementations, but no interoperability tests
have been reported to date. (RFC #1) The "Internationalized Email Headers"
RFC specifies an experimental variant of Internet mail that permits the
use of Unicode encoded in UTF-8, rather than ASCII, as the base form for
Internet email header field bodies. It removes the blanket ban on applying
a content-transfer-encoding to all subtypes of message/, and instead
specifies that a composite subtype may specify whether or not a
content-transfer-encoding can be used for that subtype, with "cannot be
used" as the default. The UTF-8 header form is permitted in transmission
only if authorized by an SMTP extension, as specified in an associated
specification. The specification also updates section 6.4 of IETF RFC
2045 to conform with the requirements. (RFC #2) "SMTP Extension for
Internationalized Email Address" specifies an SMTP extension for transport
and delivery of email messages with internationalized email addresses or
header information. The extension is identified with the token "UTF8SMTP".
In order to provide information that may be needed in downgrading, an
optional alternate ASCII address may be needed if an SMTP client attempts
to transfer an internationalized message and encounters a server that
does not support this extension.
See also SMTP Extension for Internationalized Email Address: http://xml.coverpages.org/draft-ietf-eai-smtpext-13.txt
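
To make the contrast in this item concrete, here is a minimal Python sketch of
the two representations of a Subject value: the ASCII-only "encoded-word" form
required under RFC 2047, and the raw UTF-8 form that the experimental extension
would permit. The sample subject string and the helper name decode_rfc2047 are
illustrative assumptions, not taken from the RFCs; only Python's standard
email.header module is used.

```python
from email.header import Header, decode_header

# Hypothetical subject text containing non-ASCII characters.
subject = "Grüße aus Köln"

# RFC 2047 style: the text is wrapped in an ASCII-only encoded-word.
encoded = Header(subject, "utf-8").encode()

# What the experimental extension would allow instead: the UTF-8 bytes as-is.
raw_utf8 = subject.encode("utf-8")


def decode_rfc2047(value: str) -> str:
    """Turn an RFC 2047 encoded header value back into readable text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


print("RFC 2047 form:", encoded)
print("Raw UTF-8 form:", raw_utf8)
print("Decoded again:", decode_rfc2047(encoded))
```

The encoded form survives ASCII-only transport but is unreadable without
decoding; the raw form is readable but only safe where every hop has agreed to
carry UTF-8, which is why the extension has to be negotiated.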
----------------------------------------------------------------------
Information from Google: Hitting 40 Languages
Mario Queiroz, Google Blog
"One of our goals is to give everyone using Google the information they
want, wherever they are, in whatever language they speak, and through
whatever device they're using. A huge part of that goal is making our
services available in as many languages as possible. And as I'm sure
you can imagine, that isn't as easy as simply translating a few lines
of text. Take Hebrew or Arabic, which are written from right to left. An
Arabic speaker may search for [example 'world cup football 2008'] where
part of the query will be written from right to left in Arabic, while
the numbers will be written left to right. Sometimes the right-to-left
difference can mean having to change the entire layout of a page, as
with Gmail. Or take Russian, where words change depending on their
placement and role in a sentence. In Russian, for example 'pizza in Moscow'
is encoded [see example] but 'pizza near Moscow' [differs markedly]. Then
there's the whole challenge of ensuring that queries are locally relevant.
While many Australians searching for 'freedom' are looking for the
Australian furniture chain, UK and US users are often looking for the
definition of the word itself. Our search results, then, have to take
into account these local differences... In 2007, we undertook a
company-wide initiative to increase the availability of our products in
multiple languages. We picked the 40 languages read by over 98% of
Internet users and got going, relying heavily on open source libraries
such as International Components for Unicode (ICU) and other
internationalization technologies to design products... Today we have
more than 30 products in more than 30 languages, up from 5 products in
30 languages just a year ago. In 2004, we had 150 local-language versions
of various products (e.g. a product local to the UK, not just the
English-speaking world), while today we're at more than 1500. From
January to March of 2008, we launched 256 local-language versions of
various products, compared to 55 in the same period of 2007. And we
have upgraded to Unicode 5.1 to make sure that we can handle any
characters people read or write in..."
http://googleblog.blogspot.com/2008/07/hitting-40-languages.html
See also Google adoption of Unicode 5.1: http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
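
The mixed-direction query described in this item can be illustrated with a
small Python sketch that labels each run of characters with its Unicode
bidirectional class. This is not an implementation of the full Unicode
Bidirectional Algorithm, only a look at the per-character classes it starts
from. The helper name bidi_runs and the sample query (an Arabic word followed
by a year) are assumptions made for illustration.

```python
import unicodedata
from itertools import groupby


def bidi_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share a Unicode bidi class."""
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]


# Hypothetical query: three Arabic letters (written as escapes), a space,
# and a year in European digits.
query = "\u0643\u0623\u0633 2008"

for bidi_class, run in bidi_runs(query):
    # "AL" = Arabic letter (right to left), "EN" = European number
    # (left to right), "WS" = whitespace.
    print(bidi_class, repr(run))
```

A layout engine has to reorder such runs for display, which is one reason
right-to-left support involves more than translating strings.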
----------------------------------------------------------------------
An ESB for the Web?
Jim Stogdill, O'Reilly Radar
Imagine my surprise when I saw the web acting a bit like the enterprise
with the launch of Gnip. [The Gnip API provides notification of activities
(events) occurring in a variety of services and, whenever possible, a
GUID that identifies the activity itself vis a vis the service it was
created on; API users have two primary roles: Publishers and Subscribers.]
As the web moves toward a network of widespread transactional APIs,
each with its own vocabulary, it is starting to look a lot like a legacy
enterprise writ large or maybe like an industry eco-system... In the
enterprise space, faced with the N-squared problem, you probably define
an enterprise vocabulary, build a bunch of services that conform to it
(or buy your applications from vendors that provide them), and then hook
them all together through your Enterprise Service Bus (ESB). ESBs by
definition support web services interfaces, provide translation services,
and process orchestration on top of a message routing backbone. They
usually come from vendors that probably used to sell Message Oriented
Middleware (MOM) (of both the store and forward and pub/sub variety),
Application Servers, Enterprise Application Integration (EAI), and even
Extract, Transform, and Load (ETL) software. There is also a growing stable
of open source versions built on standards like Java Business Integration
(JBI)... [Getting back to Gnip as the ESB for the web]: When I first saw
their drawing on the web site (RSS, REST, Comet, XMPP, Atom -- handled
via the Gnip protocol bridge, which enables you to act as though the
entire Web uses your preferred data exchange protocol) I immediately
thought "Cool, I bet they are using a JBI backbone with a service engine
for translation and a bunch of binding components to deal with XMPP, HTTP,
SIP, RSS, etc." Because I come from the enterprise space this seems
like a natural use case for an ESB, and for JBI in particular... It seems
that there is a growing trend towards the use of XMPP as a generic XML
routing bus, a role that makes it look suspiciously like an ESB... You
may be thinking "If message oriented middleware is the backbone of many
ESBs, why isn't Gnip using Amazon's SQS as the foundation for the Web's
ESB?" After all, SQS is essentially a simple web-friendly message bus.
The simple answer is latency. SQS has performance characteristics more
like store-and-forward-style MOM than like pub/sub MOM. Because of that
it is more suitable for use cases that need guaranteed delivery but that
can support average latencies on the order of one second (and may be as
high as ten seconds)...
http://radar.oreilly.com/2008/07/an-esb-for-the-web.html
See also the Gnip web site: http://www.gnipcentral.com/
----------------------------------------------------------------------
NETCONF Event Notifications
S. Chisholm and H. Trevino (eds), IETF Proposed Standard Protocol
The IETF RFC Editor Team announced the availability of a new Standards
Track Request for Comments in online RFC libraries: "NETCONF Event
Notifications." The document is now an IETF Proposed Standard Protocol,
and is a work product of the Network Configuration Working Group. The
document specifies an Internet standards track protocol for the Internet
community, and IETF requests discussion and suggestions for improvements.
The IETF NETCONF Working Group was chartered to produce a protocol
suitable for network configuration, whereas "configuration of networks
of devices has become a critical requirement for operators in today's
highly interoperable networks. Operators from large to small have
developed their own mechanisms or used vendor specific mechanisms to
transfer configuration data to and from a device, and for examining
device state information which may impact the configuration. Each of
these mechanisms may be different in various aspects, such as session
establishment, user authentication, configuration data exchange, and
error responses. The NETCONF protocol is using XML for data encoding
purposes, because XML is a widely deployed standard which is supported
by a large number of applications." The NETCONF Event Notifications
document defines mechanisms that provide an asynchronous message
notification delivery service for the NETCONF protocol. This is an
optional capability built on top of the base NETCONF definition. It
defines the capabilities and operations necessary to support the service.
Document section 4 specifies the XML Schema for Event Notifications.
http://xml.coverpages.org/draft-ietf-netconf-notification-14.txt
See also the IETF Network Configuration (NETCONF) Working Group: http://www.ietf.org/html.charters/netconf-charter.html
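
As a rough illustration of the subscription mechanism summarized in this item,
the following Python sketch builds a subscription request and reads the
timestamp from a received notification. The namespace URIs, the element names
(create-subscription, stream, notification, eventTime), and the default stream
name "NETCONF" are stated here from memory of the published notification
specification and should be checked against section 4 of the document; the
function names and the sample message are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the specification before relying on them.
BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
NOTIF_NS = "urn:ietf:params:xml:ns:netconf:notification:1.0"


def build_create_subscription(message_id: str, stream: str = "NETCONF") -> bytes:
    """Build an <rpc> that asks the server to start sending notifications."""
    rpc = ET.Element(f"{{{BASE_NS}}}rpc", {"message-id": message_id})
    sub = ET.SubElement(rpc, f"{{{NOTIF_NS}}}create-subscription")
    ET.SubElement(sub, f"{{{NOTIF_NS}}}stream").text = stream
    return ET.tostring(rpc)


def event_time(notification_xml: bytes) -> str:
    """Extract the eventTime value from a <notification> message."""
    root = ET.fromstring(notification_xml)
    return root.findtext(f"{{{NOTIF_NS}}}eventTime")


# Hypothetical notification as it might arrive asynchronously from a device.
sample = (
    b'<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">'
    b"<eventTime>2008-07-18T00:01:00Z</eventTime>"
    b"</notification>"
)

print(build_create_subscription("101").decode())
print(event_time(sample))
```

The sketch covers only message construction and parsing; session setup,
capability exchange, and transport are outside its scope.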
----------------------------------------------------------------------
What Makes for a Successful Protocol?
Dave Thaler and Bernard Aboba (eds), IETF RFC
IETF announced the availability of a new Informational Request for
Comments in the online RFC libraries. An IETF "Informational"
specification is published for the general information of the Internet
community, and does not represent an Internet community consensus or
recommendation. RFC #5218 "What Makes for a Successful Protocol?" The
document discusses "success" from several points of view, and makes
recommendations about questions that should be asked when evaluating
protocol designs. The Internet community has specified a large number
of protocols to date, and these protocols have achieved varying degrees
of success. Based on case studies, this Informational RFC document
attempts to ascertain factors that contribute to or hinder a protocol's
success. It is hoped that these observations can serve as guidance for
future protocol work... Two major dimensions on which a protocol can be
evaluated are scale and purpose. When designed, a protocol is intended
for some range of purposes and was designed for use on a particular
scale. According to these metrics, a "successful" protocol is one that
is used for its original purpose and at the originally intended scale.
A "wildly successful" protocol far exceeds its original goals, in terms
of purpose (being used in scenarios far beyond the initial design), in
terms of scale (being deployed on a scale much greater than originally
envisaged), or both. That is, it has overgrown its bounds and has
ventured out "into the wild"... The case studies described in Appendix A
of the document indicate that the most important initial success factors
are filling a real need and being incrementally deployable. When there
are competing proposals of comparable benefit and deployability, open
specifications and code become significant success factors. Open source
availability is initially more important than open specification
maintenance. In most cases, technical quality was not a primary factor
in initial success. Indeed, many successful protocols would not pass
IESG review today. Technically inferior proposals can win if they are
openly available. Factors that do not seem to be significant in determining
initial success (but may affect wild success) include good design,
security, and having an open specification maintenance process. Many of
the case studies concern protocols originally developed outside the IETF,
which the IETF played a role in improving only after initial success was
certain. While the IETF focuses on design quality, which is not a factor
in determining initial protocol success, once a protocol succeeds, a good
technical design may be key to it staying successful, or in dealing with
wild success. Allowing extensibility in an initial design enables initial
shortcomings to be addressed...
http://www.rfc-editor.org/rfc/rfc5218.txt
See also IETF Informational and Experimental Status RFCs: http://www.ietf.org/u/ietfchair/info-exp.html
----------------------------------------------------------------------
BizTalk Services Have Been Updated
Abel Avram, InfoQ
BizTalk Labs "is where Microsoft shares early access to experimental
connectivity and business process technologies in order to get feedback
from customers... whereas an Enterprise Service Bus (ESB) is a commonly
deployed set of technologies that most large organizations use to make
it easier to build and maintain complex Enterprise applications, an
Internet Service Bus (ISB) is the evolution of this approach that
leverages advances on the Internet to make it easier to connect
applications between organizations and to integrate with browsers, RSS,
and other Web technologies... An Internet Service Bus consists of a set
of integrated hosted services that includes: naming; application
messaging (including routing and publish and subscribe); identity and
access control; and workflow and business process management." Abel
Avram reports that "BizTalk Labs has updated its range of connectivity
and business process services through the BizTalk Services SDK which
offers access to the following services: Workflow, Identity, Windows
Live ID Credentials, Unauthenticated Access, TransportClientCredentials,
HTTP Connectivity Mode." The BizTalk Labs SDK works on Windows Vista,
XP, or Server 2003. Internet Explorer 7 and the .NET Framework v3.0
Runtime and SDK are necessary to use the SDK. BizTalk service summary:
(1) Workflow: BizTalk Services has added a new service for running
Workflows for service orchestration in the BizTalk Services cloud. (2)
Identity Service Scopes: The Identity Service now allows for creating
per-service access control management scopes with delegation of management
authority between users. (3) Windows Live ID Credentials: You can now
use Windows Live ID as a credential for obtaining tokens. (4)
Unauthenticated Access: For all connection modes, services can opt out
of the client authorization facility provided by the Relay and allow
unauthenticated client access. (5) TransportClientCredentials: Refactored,
WCF-aligned API for configuring/setting credentials for accessing the
Relay, replacing the 'raw' TokenProviders. (6) HTTP Connectivity Mode:
New connectivity mode allowing RelayedOneway, RelayedMulticast, and
RelayedDuplex services to listen on the Relay using HTTP (port 80).
From the web site: "Keep in mind that the technologies available at
BizTalk Labs are experimental. In many cases we have not decided on
what they will be named, whether they will become fully released
products, or how we will charge for them."
http://www.infoq.com/news/2008/07/BizTalk-Services
See also InfoWorld: http://www.infoworld.com/article/08/07/17/Microsoft-adds-workflow-to-cloud-based-SOA-platform_1.html
----------------------------------------------------------------------
Interview with Kenton Varda: Google Open Sources Protocol Buffers
Kurt Cagle, XML.com
Data messaging formats represent the life-blood of any distributed
application. The ability to pass information back and forth between
disparate systems becomes crucial for any organization, but for companies
such as Google, the challenge of setting up communications between the
thousands of different servers that host the various Google services
forced the need for a specialized format that met their needs in particular.
Recently, Kenton Varda, an engineer working on search engine
infrastructure at Google, became the point man for releasing Google's
internal messaging format (called Protocol Buffers) as an open source
project using the Apache license. Kurt Cagle spoke with Kenton about
Protocol Buffers, why they are important to Google and why the decision
was made to open source them -- and why use an internal format rather
than a format such as XML, JSON or related technology. Excerpts from
Varda in the interview: "Practically all our internal data formats, for
both RPC and storage, are based on Protocol Buffers. Many apps need
them for performance reasons, but they are also often used just because
it's the path of least resistance. Protocol Buffers are good when you
have structured data which you need to encode in a way that is both
efficient and extensible. The second point is important: a lot of people
ask why we didn't just use various existing binary formats, and the
answer is usually that those formats do not provide easy extensibility...
Of course, XML and JSON provide extensibility as well, but Protocol
Buffers have an advantage over them in efficiency -- Protocol Buffers
are both smaller and faster to parse. Furthermore, the data access
classes generated by the Protocol Buffer compiler are often more
convenient to use than typical SAX or DOM parsers. Of course, lack of
human-readability can be a serious disadvantage depending on the use
case. That said, XML is a much better solution when you need to encode
documents composed primarily of text with markup. Protocol Buffers
provide no obvious way to interleave text with structured elements.
XML and JSON are also better if you need a human-readable format --
although there is a standard way to encode Protocol Buffers in text,
it provides no real advantages over JSON... Contrary to what many
people are saying, our intent with this release is not to 'kill XML'.
We simply believe that while XML works very well in the situations for
which it was designed, it is not the ideal solution for every problem.
XML is inherently inefficient both in terms of size and parsing speed
since it is a text-based format. In many applications, these
inefficiencies don't matter, but for us they make a big difference.
Furthermore, XML, despite being a simplification of SGML, is still a
very complicated standard, and many of its features just get in the
way in a lot of cases. Protocol Buffers are designed to be very simple
conceptually."
http://news.oreilly.com/2008/07/interview-google-open-sources.html
See also the Google Protocol Buffers web site: http://code.google.com/apis/protocolbuffers/
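
The size argument made in the interview can be made tangible with a short
Python sketch that hand-encodes one integer field in the Protocol Buffers
binary wire format and compares it with a comparable XML fragment. This is a
hand-rolled illustration, not the Protocol Buffers library; real code would
use the classes generated by the Protocol Buffer compiler. The field name
"id", field number 1, and the value 150 are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field: a key (number and wire type), then the value."""
    wire_type_varint = 0
    key = (field_number << 3) | wire_type_varint
    return encode_varint(key) + encode_varint(value)


binary = encode_int_field(1, 150)   # hypothetical field "id", number 1
xml_text = b"<id>150</id>"          # comparable XML fragment

print("binary:", binary.hex(), "bytes:", len(binary))
print("xml:   ", xml_text.decode(), "bytes:", len(xml_text))
```

The binary form carries a field number rather than a name, which is also why
it is not human-readable without the message definition, the trade-off Varda
acknowledges.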
----------------------------------------------------------------------
Linus Torvalds, Geek of the Week
Richard Morris, simple-talk
Linus Torvalds is remarkable, not only for being the technical genius
who wrote Linux, but for then being able to inspire and lead an
enormous team of people to devote their free time to work on the
operating system and bring it to maturity. An acknowledged godfather
of the open-source movement, Linus Torvalds was just 21 when he changed
the world by writing Linux. Today, 17 years later, Linux powers everything
from supercomputers to mobile phones. In fact ask yourself this: if
Linux didn't exist, would Google, Facebook, PHP, Apache, or MySQL?
Excerpt on one topic (patents): Richard Morris: 'Do you think software
patents are a good idea?' Linus Torvalds: "Heh -- definitely not.
They're a disaster. The whole point (and the original idea) behind
patents in the US legal sense was to encourage innovation. If you
actually look at the state of patents in the US today, they do no such
thing. Certainly not in software, and very arguably not in many other
areas either. Quite the reverse: patents are very much used to stop
competition, which is undeniably the most powerful way to encourage
innovation. Anybody who argues for patents is basically arguing against
open markets and competition, but they never put it in those terms. So
the very original basis for the patents is certainly not being fulfilled
today, which should already tell you something. And that's probably true
in pretty much any area. But the reason patents are especially bad for
software is that software isn't some single invention where you can point
to a single new idea. Not at all. All relevant software is a hugely
complex set of very detailed rules, and there are millions of small and
mostly trivial ideas rather than some single clever idea that can be
patented. The worth of the software is not in any of those single small
decisions, but in the whole. It's also distressing to see that people
patent 'ideas'. It's not even a working 'thing'; it's just a small way of
doing things that you try to patent, just to have a weapon in an economic
fight. Sad. Patents have lost all redeeming value, if they ever had any."
Note: other 'Geeks of the Week' in the series include: Tim Berners-Lee,
CmdrTaco (slashdot founder) and Richard Hipp (SQLite creator).
http://www.simple-talk.com/opinion/geek-of-the-week/linus-torvalds,-geek-of-the-week/
See also 'Geeks of the Week' series: http://www.simple-talk.com/opinion/geek-of-the-week/
----------------------------------------------------------------------
Tech Giants Tackle Information Overload
Holly Jackson, CNET NEWS.com
Your BlackBerry buzzes with a text from your boss, snapping you out
of your Twitter-surfing trance. Your friend calls you and tells you
to check out his Facebook profile, as you respond to your spouse's
instant message about dinner plans. All the while, your in-box is
overflowing with new e-mail messages. If humans were like computers,
our screens would be frozen -- overloaded by information and too much
multitasking. The term "information overload" has floated around for
years and been the topic of much analysis, but the situation remains.
According to recent research by enterprise research firm Basex, these
distractions are now costing the American economy more than $650 billion
in lost productivity, and taking up 28 percent of workers' time. Such
numbers led Intel engineer Nathan Zeldes and other tech industry
insiders to form the new Information Overload Research Group. The
nonprofit consortium, whose members include Microsoft Research, IBM,
and Google employees, recently held its first conference in New York,
with members meeting at sessions with titles like "No Time to Think"
and "Visionary Vendors." Now that the group has had its inaugural
gathering, Zeldes, its president, said IORG will continue to recruit
members and financial sponsors from a range of business sectors. With
more minds applied to finding a solution to what IORG calls "the world's
greatest challenge to productivity," Zeldes hopes to generate innovative
ideas that can benefit both businesses and individuals. With a reported
281,000 terabytes of information created worldwide in 2007, streamlining
and compiling data with software is one way technology can wrangle the
information influx, Vanderbroek says. Of course, given that most of the
parties involved in the IORG have created hardware and software that
contributes to information overload, one might question why those same
people would want to hinder it... Companies also have to think about
balancing their employees' lives. Information overload outside of work,
like using a BlackBerry on weekends or vacations, could hinder the
work-life balance, leading to decreased worker satisfaction. Zeldes
also points out the problem is not just affecting technology companies
or large corporations.
http://news.cnet.com/8301-1001_3-9993917-92.html
----------------------------------------------------------------------
XML Daily Newslink and Cover Pages are sponsored by:
IBM Corporation http://www.ibm.com
Oracle Corporation http://www.oracle.com
Primeton http://www.primeton.com
Sun Microsystems, Inc. http://sun.com
----------------------------------------------------------------------
XML Daily Newslink: http://xml.coverpages.org/newsletter.html
Newsletter archive: http://xml.coverpages.org/newsletterArchive.html
Newsletter subscribe: newsletter-subscribe@xml.coverpages.org
Newsletter ***: newsletter-***@xml.coverpages.org
Newsletter help: newsletter-help@xml.coverpages.org
Cover Pages: http://xml.coverpages.org/
----------------------------------------------------------------------

