Note: this is a simplified version of the previous document of August 1996 (osmic.d1), which will still be available. This will all be written up more clearly at greater length.

OSMIC
(Open System for Media InterConnection)

An Integrated Universal Format,
especially for
TEXT, HYPERTEXT, ANNOTATION,
VERSIONING,
and INTERCOMPARISON

Preliminary Remarks

TODAY'S INSANE COMPUTER WORLD

Today's computer world is based on trapping users in "applications" which are controlled by individual manufacturers and get bigger and buggier all the time.

These applications typically are based on a warped idea of what users need, especially in the text areas.

Moreover, each application typically has ITS OWN methods of representing connections among data parts, and collections of data parts.

I believe we should have a common framework in which CONNECTIONS AND COLLECTIONS ARE ALL HANDLED IN THE SAME WAY, and anyone may produce visualizations and interface applets that work across a broad spectrum of materials.

THE HOPELESSNESS OF COMPUTER TEXT SYSTEMS

Today's text systems are a mess, practically worthless. "Word processing" simulates letter tiles on two-dimensional sheets, "desktop publishing" (programs for what used to be called *layout*) adds font sugar to word processing, and "outline processing" uses clumsy methods for crude structure management and item control.

THE VITAL ISSUES OF VERSION MANAGEMENT AND ITEM CONTROL

The main issues of text systems, I believe, are version management (keeping track of different versions and comparing them and knowing what's in them) and item management (keeping track of items to be included, and which items are at what locations in which versions).

The only software on the personal market for item control is the "outline processor," which assumes:

1. That you are creating a sequential document (which you may not be).

2. That the best way to create a sequential document is to create a list of items in sequence, then to change the sequence of that list of items continually.

You cannot make different piles, for instance (as in the KJ editor).

The only software for version management is so-called "CASE" software (very expensive and clumsy), and certain utilities for Unix, which keep track only of differences between *lines* of text.

NO WAY TO MAKE NOTES ON THINGS

There is no way to make notes on your writings, comparable to writing in the margin or adding sticky notes.

We should be able to make separate sticky-type notes (even be able to make parallel columns of annotation, throughout any objects, rather like spreadsheet columns, but which are only next to the parts we wish to comment on).

We need systems for such annotation, and they should have the same form as other links. Long-term and short-term annotations should have the same implementation.

HYPERTEXT

1. Editing Hypertext--

The popular forms of hypertext, especially World Wide Web and HyperCard (which is World Wide Web without the Internet), allow only drastic editorial changes that are hard to keep track of. Version management in HTML, in particular, is extremely difficult and annoying.

2. Hypertext Links--

HTML links

All these things need to be corrected, but I do not believe they can be corrected within the framework of HTML.

The World Wide Web Consortium is working on all these problems. However, I want to proceed with what I think is the best solution and demonstrate it.

THE HOPELESSNESS OF COMPUTER FILING SYSTEMS

Computer filing is still all hierarchical, even though most projects overlap and often have many versions. (And even though categories in the real world often overlap and combine, so that sharp categories are often incorrect representations of that real world.)

"File synchronization" is another very bad idea which attempts to hide various problems. The concept is to have a program that automatically overwrites the older versions of a file with the most recent version. This is based on the strange assumption that the most recent is always correct and better. In reality the problem of file management is much richer.

TO CORRECT THESE PROBLEMS--

It is to correct these problems that the Xanadu(R) system, and the software we will be working on in the OSMIC project, have been designed.

OSMIC is intended to be a rich, clean, simple compatibility standard for interconnection and change among diverse software families; to improve the Web with enhanced relational and publication structures; to turn back the tide of baroque, proprietary data structures; and to allow all parties to create interface applets in any style with compatible results. Thus it is a natural complement to Java* and the Unix* philosophy.

It offers Mix-and-Match methods and need not be adopted as a whole to provide benefits.

OSMIC is intended to be a public-domain version of the Xanadu system, dealing with these matters as simply as possible. It is intended (to quote Marlene Mallicoat) as an Integrated Universal Format for text and other media.

It continues the proud Xanadu tradition on a public-domain basis:

-----------
* trademarks of somebody.

DATA STRUCTURE-ORIENTED, NOT OBJECT-ORIENTED

OSMIC is a data structure, not an object-oriented system. (Object-oriented systems don't allow you to create new visualizations, as OSMIC is intended to do.)

This is essentially an IMPLEMENTATION-INDEPENDENT SPECIFICATION OF ALL THE XANADU STRUCTURES--

LIMITS IN THE FIRST IMPLEMENTATION.

We will consider for the time being ONLY SEQUENTIAL TEXT WITH LINKS (like HTML), in order to make implementation simpler. (Other hyperstructures, which require a structure map, will not be in the first implementation.)

Links will not be editable in the first implementation.

Files will not be purgeable in the first implementation.

DATA STRUCTURES OF OSMIC

The system consists principally of the following DATA STRUCTURES:

1. FROZEN TEXT FILES (Append-and-Read-Only), pointing to base data (which we may call Primedia). Arriving text material is appended to the end and not moved again. Most operations deal with addresses of text segments in these frozen files.

2. FROZEN EDIT FILES (Append-and-Read-Only). An edit change which a user makes in a current version is assigned an edit-op index and appended to the edit-op files.

3. CURRENT VERSION LIST(S) of documents. A version is ONE LIST OF ADDRESSES pointing to the frozen text files and links. The addresses of the text segments are the successive pieces of text in the text file, followed by the links which are contained in the document.

The sequence of the version list goes: TEXT SEGMENTS IN ORDER; LINKS IN ORDER OF THEIR IDs. (Since a link can connect to multiple spans of material, the link does not belong at a particular place inside the text, the way HTML tags do.)

4. LINKS. A link is TWO LISTS OF ADDRESSES pointing to the frozen text files or elsewhere. One list (the Left List) points to all the bytes on the left side of the link, the other list (the Right List) points to all the bytes on the right side of the link.

(5. LINK VERSIONS. If links are editable, the current version of a link is maintained like a document.)

(6. TRANSCLUSIONS are important, but they are not explicit in the data structure; a transclusion is the same element-addresses shared by two documents.)

(7. DOCUMENTS aren't really operative parts of the system. A "document" is a family of versions whose relation may be entirely in the mind of the author. However, a Version is an implementable structure.)
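
As a rough illustration of structures 1 and 3 above, here is a minimal Python sketch. It is only one possible reading, not the OSMIC implementation. The names FrozenText, Version, Span, render and address_list are hypothetical, and it assumes that an address is a (start, length) pair of byte offsets into a single frozen text file.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed address form: (start, length) in bytes within one frozen text file.
Span = Tuple[int, int]


class FrozenText:
    """Append-and-read-only store: arriving text is appended and never moved."""

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, text: str) -> Span:
        """Append text at the end and return the address of the new segment."""
        raw = text.encode("utf-8")
        start = len(self._data)
        self._data.extend(raw)
        return (start, len(raw))

    def read(self, span: Span) -> str:
        """Read back the segment at the given address."""
        start, length = span
        return self._data[start:start + length].decode("utf-8")


@dataclass
class Version:
    """One list of addresses: text segments in order, then links by ID."""
    spans: List[Span] = field(default_factory=list)
    link_ids: List[int] = field(default_factory=list)

    def address_list(self) -> list:
        """Text segments in document order, followed by link IDs in ID order."""
        return list(self.spans) + sorted(self.link_ids)

    def render(self, store: FrozenText) -> str:
        """Assemble the virtual text of this version from the frozen store."""
        return "".join(store.read(s) for s in self.spans)


store = FrozenText()
hello = store.append("Hello, ")
world = store.append("world")
there = store.append("there")

v1 = Version(spans=[hello, world], link_ids=[2, 1])
v2 = Version(spans=[hello, there])  # shares the `hello` segment with v1
```

Editing in this sketch never rewrites stored text; a new version is simply a new list of addresses, and two versions that list the same address share the same material.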

Let's look at these ideas a little more slowly.

Basic concept:
the VERSION.

The operative unit is the version, not the document.

Relationships among versions are handled the same way, regardless of whether they are versions of the same thing.

A version is typically virtual, a list of pieces of frozen material (or "Primedia"):

CONNECTIONS I:
HIGHLY GENERALIZED PLURALISTIC LINKS.

WHAT DO WE MEAN BY 'PLURALISTIC LINKS'?

We mean that everybody must be able to make first-rate links to anything.

WRONG APPROACHES: in HTML, only the author can create links. In some hypermedia packages, users are allowed to "create links" that are decidedly inferior in implementation to those provided by the author.

In the OSMIC design, THE AUTHOR'S LINKS ARE NO DIFFERENT FROM ANYONE ELSE'S IN IMPLEMENTATION (though superior in prestige and legitimacy). For instance:

[pic to transfer to GIF: "AUTHOR'S LINKS (inside the document), OTHER PEOPLE'S LINKS (logically outside the document)"]

Note that all links are implemented identically and may be followed identically.

This is what we mean by Pluralism.

A link is a two-sided connection between sets of elements, which may be scattered all over the Internet. I call the following drawing the Trouser Visualization of a link. It is a drawing of an arbitrary link. It is drawn this way to emphasize that a link has two endsets, left and right, and to illustrate that an endset may be broken into separate pieces. In this simple drawing the right endset is broken into two separate sections, making the link look somewhat like a pair of trousers.

The link is

BUTTERFLY REPRESENTATION OF A LINK.

Trousers make a nice visualization; but how do we represent the link internally?

We represent a link as two lists, a Left List and a Right List, plus address & type:

The link has a number (or address) and a type (comment, illustration, disagreement, authorship...)
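
The two-list representation might be written down as follows. This is a hedged sketch only: the class name Link, its field names, and the (start, length) address form are assumptions made for illustration, and the type is shown as a plain string.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed address form: (start, length) within a frozen text file.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Link:
    link_id: int             # the link's number (or address)
    link_type: str           # e.g. "comment", "illustration", "disagreement"
    left: Tuple[Span, ...]   # Left List: everything on the left side
    right: Tuple[Span, ...]  # Right List: everything on the right side


# A "trouser" link: one piece on the left, two separate pieces on the right.
comment = Link(
    link_id=1,
    link_type="comment",
    left=((0, 7),),
    right=((40, 12), (90, 5)),
)
```

Nothing in the record says who made the link, which fits the pluralism described earlier: an author's link and anyone else's would have the same shape.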

This, then, we may call the BUTTERFLY TROUSER LINK-- just to help us visualize the link in two ways. "Butterfly trouser link" is meant to be a term which reinforces understanding.

LINK OPERATIONS are nontrivial, requiring many address comparisons. For instance, to follow a link from Version A to Version B:

Find addresses in Version A contained in link
Find addresses in Version B contained in link
Display as appropriate.
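
The address comparisons in those steps can be sketched as interval overlaps. The sketch below assumes (start, length) addresses, assumes the link is followed from its Left List to its Right List, and uses hypothetical function names; the display step is left out.

```python
from typing import List, Optional, Tuple

# Assumed address form: (start, length) within a frozen text file.
Span = Tuple[int, int]


def overlap(a: Span, b: Span) -> Optional[Span]:
    """Return the part common to two spans, or None if they do not overlap."""
    start = max(a[0], b[0])
    end = min(a[0] + a[1], b[0] + b[1])
    return (start, end - start) if end > start else None


def addresses_in_endset(version_spans: List[Span], endset: List[Span]) -> List[Span]:
    """Find the addresses of a version that are contained in one endset."""
    hits = []
    for v in version_spans:
        for e in endset:
            common = overlap(v, e)
            if common is not None:
                hits.append(common)
    return hits


def follow_link(version_a, version_b, left, right):
    """Steps 1 and 2 of following a link from Version A to Version B."""
    from_a = addresses_in_endset(version_a, left)   # where the link touches A
    to_b = addresses_in_endset(version_b, right)    # where the link touches B
    return from_a, to_b


version_a = [(0, 10), (20, 5)]
version_b = [(100, 30)]
left = [(5, 10)]
right = [(110, 5), (200, 4)]

from_a, to_b = follow_link(version_a, version_b, left, right)
```

Here only part of the left endset appears in Version A, and only one of the two right-hand pieces appears in Version B; what to show for the missing pieces is the "display as appropriate" decision. The nested loops are the simplest possible comparison, and a real system would presumably want an index.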

CONNECTIONS II:
TRANSCLUSIONS.

ABOUT TRANSCLUSION. Always the heart of Xanadu, the transclusion principle means that conceptually there is only one copy of anything. Any piece of text, audio, etc., is treated as a manifestation of a cosmic original. (When you see a God or Saint in the road, is he a copy?)

Think of transclusion as a particular form of identic data relationship. There are various forms of identic data relationship, different in key details: copy, instance, counted reference, write-through cached copy, write-back cached copy... Transclusion is the most general identic relation.

Transclusion in OSMIC is basically very simple:

Its benefits can include:

UNRESTRICTED QUOTATION with automatic royalty paid by each downloader to each original publisher (the famous Xanadu payment model)

transclusive COMPARISONS OF CONTEXTS, supporting DEEP INTERCOMPARISON

transclusive EXCURSIONS (you go examining some other context awhile)

transclusive ROTATIONS (you decide you like another context better)

transclusive OVERVIEWS, OVERLAYS (pieces may be recomposited in views which lead back to the original contexts)

etc.
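
Since a transclusion is described earlier as the same element-addresses shared by two documents, finding one reduces to comparing address lists. A minimal sketch, assuming (start, length) addresses into the same frozen text file and a hypothetical function name shared_spans:

```python
from typing import List, Tuple

# Assumed address form: (start, length) within one frozen text file.
Span = Tuple[int, int]


def shared_spans(version_a: List[Span], version_b: List[Span]) -> List[Span]:
    """Return the stretches of frozen material that both versions point to."""
    shared = []
    for a in version_a:
        for b in version_b:
            start = max(a[0], b[0])
            end = min(a[0] + a[1], b[0] + b[1])
            if end > start:  # the two spans cover some of the same bytes
                shared.append((start, end - start))
    return shared


doc_a = [(0, 7), (7, 5)]    # e.g. "Hello, " followed by "world"
doc_b = [(0, 7), (12, 5)]   # the same "Hello, " followed by other material

common = shared_spans(doc_a, doc_b)
```

Nothing is stored to record the transclusion; it falls out of the fact that both address lists name the same bytes, which is also what a comparison of contexts would start from.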

BRANCHING UNDO

One of the special features of the Xanadu family of software has always been the idea of branching undo-- or the maintenance of as many versions as you like, with deep intercomparison.

The basic concept is simple. Each edit operation creates a new state, and the user may go back to any previous state and do some other edit operation. This is very easy to understand and implement if you just sort it out properly. Sorting it out begins with naming the operations, states and branches.
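
One way to name the states and branches is to number each state and record which state it was made from. The sketch below is an assumption-laden illustration, not the Xanadu or OSMIC design: History, State, edit, go_to and path are hypothetical names, and edit operations are shown as plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class State:
    state_id: int
    parent: Optional[int]    # the state this one was made from (None at the root)
    edit_op: Optional[str]   # the edit that produced it (None at the root)


class History:
    """Append-only tree of states; going back and editing starts a new branch."""

    def __init__(self) -> None:
        self.states: List[State] = [State(0, None, None)]
        self.current = 0

    def edit(self, op: str) -> int:
        """Apply an edit to the current state, creating and entering a new state."""
        new_id = len(self.states)
        self.states.append(State(new_id, self.current, op))
        self.current = new_id
        return new_id

    def go_to(self, state_id: int) -> None:
        """Return to any previous state by name; nothing is discarded."""
        if not 0 <= state_id < len(self.states):
            raise ValueError(f"no such state: {state_id}")
        self.current = state_id

    def path(self, state_id: int) -> List[str]:
        """List the edit operations leading from the root to a state."""
        ops = []
        node: Optional[int] = state_id
        while node is not None:
            state = self.states[node]
            if state.edit_op is not None:
                ops.append(state.edit_op)
            node = state.parent
        return list(reversed(ops))


h = History()
a = h.edit("insert 'Hello'")
b = h.edit("insert ' world'")
h.go_to(a)                      # back up one state
c = h.edit("insert ' there'")   # a second branch from state a
```

After these calls, states b and c are siblings: both remain reachable by name, and each has its own path of edits back to the root.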

Many people suddenly understand from seeing the following diagram. As soon as we can name the states and define them, they are easy to reach.