This working document describes how the LII moves material from FFF to HTML. It
assumes basic familiarity with the functionality and terminology associated with both
platforms. It deals first with some problems that must be addressed and then outlines the LII process.
II. Some Problems
Measured along several different dimensions, FOLIO Views is rich and HTML is lean. This
difference creates numerous challenges to conversion. Since a FOLIO Views infobase
has features that HTML does not support, the process of porting is a bit like moving a
rich word-processor file to ASCII. One can simply drop all infobase features that do not
have direct HTML equivalents, but a more effective translation includes finding proxies
for all important ones -- something like finding a way to show emphasis in an ASCII
e-mail message where the word-processor document would use bold, italics, or underline.
These problems of translation, and of choosing between dropping and transposing a feature, can usefully be separated into three groups -- basic format, hypertext functionality, and data structure. In terms of the porting issues they raise, these categories constitute an ascending scale of difficulty.
At the micro-level, character for character and in-line graphic for in-line graphic, HTML is capable of matching FFF (with the important exception of white space achieved by multiple spaces or tabs). Non-ASCII characters, however, require translation to the appropriate HTML entity (an "&" sequence). The section and paragraph symbols important in law materials (§ and ¶) must be converted, for example, to &sect; and &para;.
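The character translation just described can be sketched in a few lines of Python. This is only an illustration, not the LII's conversion script: the function name to_html_entities is hypothetical, and the sketch assumes the text has already been read in as Unicode. It uses the named-entity table that ships with Python's standard library and falls back to a numeric reference for characters without a name.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity sequence.

    Hypothetical helper for illustration. ASCII characters, including
    "&", "<" and ">", are passed through unchanged.
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)                         # plain ASCII: keep as is
        elif code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")  # named entity, e.g. &sect;
        else:
            pieces.append(f"&#{code};")               # numeric fallback
    return "".join(pieces)


if __name__ == "__main__":
    print(to_html_entities("§ 1051 and ¶ 3"))
```

Because the sketch leaves ASCII untouched, any literal "&" or "<" in running text would still need separate handling before the result is valid HTML.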
Character styles in HTML are limited to a relatively short list of physical types (most importantly <B>, <I>, <U>, <TT>) and some logical types (notably <PRE>, <EM>, and <STRONG>). Conversion from FFF to HTML requires only that every font or other character designation, whether applied directly at the character level, through a character style, or through a paragraph, link, or level style, be translated into this more limited set.
When it comes to paragraph formatting, HTML knows only the different header levels, <P>, <BR>, the indented block quote, and a variety of list (and nested list) types.
Developing proxies for FFF styles, both for paragraph formatting and for character styles, is much easier if all such formatting in FOLIO Views is accomplished through an organized and comprehensive set of styles. (To illustrate, LII infobases achieve bold or italics either by associating those styles with a level or through a character style, and they implement a hierarchical indent structure through a sequence of paragraph styles denominated "Text - Level 1", "Text - Level 2", and so on.)
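As a rough illustration of why consistently named styles help, the sketch below maps paragraph styles named "Text - Level N" onto nested <UL> lists. Treating N directly as the nesting depth is an assumption made for the example, and the function names are hypothetical; the LII's own script may map levels differently.

```python
import re

# Matches the paragraph style names described above, e.g. "Text - Level 3".
LEVEL_STYLE = re.compile(r"^Text - Level (\d+)$")


def style_depth(style_name: str) -> int:
    """Return the indent level encoded in a style name (assumed = list depth)."""
    match = LEVEL_STYLE.match(style_name)
    if match is None:
        raise ValueError(f"not a level style: {style_name!r}")
    return int(match.group(1))


def paragraphs_to_lists(paragraphs):
    """Turn (style_name, text) pairs into nested <UL>/<LI> markup."""
    lines = []
    depth = 0
    for style_name, text in paragraphs:
        target = style_depth(style_name)
        while depth < target:            # going deeper: open lists
            lines.append("  " * depth + "<UL>")
            depth += 1
        while depth > target:            # coming back up: close lists
            depth -= 1
            lines.append("  " * depth + "</UL>")
        lines.append("  " * depth + "<LI>" + text)
    while depth > 0:                     # close whatever is still open
        depth -= 1
        lines.append("  " * depth + "</UL>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("Text - Level 1", "The owner of a trade-mark may apply:"),
        ("Text - Level 2", "(1) By filing in the Patent and Trademark Office"),
        ("Text - Level 1", "A following paragraph at the first level"),
    ]
    print(paragraphs_to_lists(sample))
```

A mapping this simple is only possible because the indent level is carried by the style name; formatting applied ad hoc, paragraph by paragraph, would offer nothing to key on.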
Going to HTML, the process must be reversed. What is in FOLIO Views a single infobase must be split back into separate files that are at least as small as those that would be used in a word-processing situation. In many cases, for reasons of client-server performance and appropriate indexing, they should be even smaller.
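The splitting step can be pictured with a short Python sketch. It assumes the converted material is one long text stream in which each future file is introduced by a marker of the form chop_here="name.html", as in the sample output shown later in this document. The function name split_at_markers is hypothetical, and the sketch returns the pieces in memory rather than writing them to disk.

```python
import re

# A marker such as chop_here="1051.html" announces the start of a new file.
CHOP_MARKER = re.compile(r'chop_here="([^"]+)"')


def split_at_markers(stream: str) -> dict[str, str]:
    """Split one converted stream into {file_name: file_body}.

    re.split with one capturing group yields
    [text_before_first_marker, name1, body1, name2, body2, ...].
    Text before the first marker is ignored in this sketch.
    """
    parts = CHOP_MARKER.split(stream)
    files = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        files[name] = body.strip()
    return files


if __name__ == "__main__":
    stream = (
        'chop_here="1051.html" <HTML>first section</HTML> '
        'chop_here="1052.html" <HTML>second section</HTML>'
    )
    for name, body in split_at_markers(stream).items():
        print(name, "->", body)
```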
In taking a complex and coherent information collection and breaking it into a large number of fragments, the FFF to HTML converter must deal with the challenge of representing relationships among those fragments, a task that FOLIO Views performs dynamically.
The basic LII approach to HTML representation of infobase structure has three components:
The process outlined here is the one employed by the LII in moving statutes and codes from FOLIO Views to HTML; other information collections will, no doubt, require significant adaptation.
The process that follows assumes that all records at the File level include at or near their beginning a jump destination that can be the root of their ultimate HTML file name.
All jump destinations below the File level that do not explicitly incorporate the root of the HTML file name are renamed to do so. With LII publications this is commonly the case for defined terms. In an LII infobase the definition of a word, say "patentee", will be a jump destination of that name. (HTML will need to know in which document that named spot lies.) To prepare for conversion, all such jump destinations are visited and renamed so that the name includes the section in which the definition falls, expressed in jump destination terms, say 35uscs156. An underbar separates the two elements, e.g., patentee_35uscs156. Jump destinations that begin with what is to be the HTML file name, followed by an extension representing a subpart (often the case with a subsection, e.g., 35uscs156(b)), don't require any changes as long as the fsr is set to distinguish them from jumps to what will be the file level in HTML.
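The renaming rule can be stated compactly in Python. This is a minimal sketch under the conventions described above; the function name qualify_destination is hypothetical, and in practice the renaming is carried out within the infobase rather than on bare strings.

```python
def qualify_destination(name: str, section_root: str) -> str:
    """Return a jump destination name that identifies its section.

    Names that already begin with the section root (for example a
    subsection such as "35uscs156(b)") are left alone. Any other name,
    such as a defined term, gets the root appended after an underbar.
    """
    if name.startswith(section_root):
        return name
    return f"{name}_{section_root}"


if __name__ == "__main__":
    print(qualify_destination("patentee", "35uscs156"))
    print(qualify_destination("35uscs156(b)", "35uscs156"))
```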
Here is an illustrative section of FFF prior to operation of the script, followed by the resulting output.
<RD:File><JL:section,15uscs1051><JD:15uscs1051>§ <FD:"section number">1051</FD:"section number">. Registration of trade-marks<EL>
<RD:Subfile><JD:"15uscs1051(a)">(a) Trade-marks used in commerce.
<RD><PS:"Text - Level 2">The owner of a trade-mark <JL:definition,"used in commerce">used in commerce<EL> may apply to register his or her trade-mark under this Act on the <JL:definition,"principal register">principal register<EL> hereby established:
<HR><PS:"Text - Level 3">(1) By filing in the Patent and Trademark Office
<HR><PS:"Text - Level 4">(A) a written application, in such form as may be prescribed by the <JL:definition,commissioner>Commissioner<EL>, verified by the <JL:definition,applicant>applicant<EL>, or by a member of the firm or an officer of the corporation or association applying, specifying applicant's domicile and citizenship, the date of applicant's first use of the mark, the date of applicant's first use of the mark in <JL:definition,commerce>commerce<EL>, the goods in connection with which the mark is used and the mode or manner in which the mark is used in connection with such goods, and including a statement to the effect that the <JL:definition,person>person<EL> making the verification believes himself, or the firm, corporation, or
*** <RD:File> ***
chop_here="1051.html"
<HTML>
<H4><A NAME="1051">§ 1051.</A> Registration of trade-marks</H4>
<UL><LI><B><A NAME="1051(a)">(a)</A> Trade-marks used in commerce.</B>
<LI>The owner of a trade-mark <A HREF="1127.html#used in commerce_1127">used in commerce</A> may apply to register his or her trade-mark under this Act on the <A HREF="1127.html#principal register_1127">principal register</A> hereby established:
<UL><LI>(1) By filing in the Patent and Trademark Office
<UL><LI>(A) a written application, in such form as may be prescribed by the <A HREF="1127.html#commissioner_1127">Commissioner</A>, verified by the <A HREF="1127.html#applicant_1127">applicant</A>, or by a member of the firm or an officer of the corporation or association applying, specifying applicant's domicile and citizenship, the date of applicant's first use of the mark, the date of applicant's first use of the mark in <A HREF="1127.html#commerce_1127">commerce</A>, the goods in connection with which the mark is used and the mode or manner in which the mark is used in connection with such goods, and including a statement to the effect that the <A HREF="1127.html#person_1127">person</A> making the verification believes himself, or the firm, corporation ***
</HTML>
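One part of the transformation visible in this sample, the rewriting of definition jump links into HTML anchors, can be approximated with a regular expression. The Python sketch below is not the LII script. It assumes a lookup table (here called definition_files, a hypothetical name) that records which HTML file holds each defined term, and it handles only links of the "definition" type.

```python
import re

# Matches <JL:definition,term>label<EL>, with the term optionally quoted.
JUMP_LINK = re.compile(
    r'<JL:definition,(?:"([^"]+)"|([^>]+))>(.*?)<EL>', re.DOTALL
)


def convert_definition_links(fff: str, definition_files: dict[str, str]) -> str:
    """Rewrite FFF definition jump links as <A HREF="file.html#term_file">."""

    def replace(match: re.Match) -> str:
        term = match.group(1) or match.group(2)
        label = match.group(3)
        root = definition_files.get(term)
        if root is None:
            return label  # assumption: unknown terms keep their text, lose the link
        return f'<A HREF="{root}.html#{term}_{root}">{label}</A>'

    return JUMP_LINK.sub(replace, fff)


if __name__ == "__main__":
    table = {"used in commerce": "1127", "commissioner": "1127"}
    fff = (
        'The owner of a trade-mark <JL:definition,"used in commerce">'
        "used in commerce<EL> may apply"
    )
    print(convert_definition_links(fff, table))
```

The HREF pattern mirrors the one in the sample output (file name, then "#", then the term joined to the file root by an underbar); how the real script learns that the definitions live in section 1127 is not covered here.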
Finally, a set of macros is used to load top and bottom matter templates (see attachments II and III) at the appropriate places in what will become HTML files and to fill their placeholders with the correct text and HREF references. This last step involves a series of forward and backward search-and-replace moves. For example, since each file will carry a link to the "previous" and "next" document, the template's references to "previous.html" and "next.html" are replaced by the actual file names of the adjacent material before and after.
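The "previous" and "next" substitution can be sketched as follows. The LII performs this step with macros; the Python below only illustrates the logic. The function name fill_navigation is hypothetical, and pointing the first and last files at "overview.html" is an assumption made for the example, since the text does not say how the ends of the sequence are handled.

```python
def fill_navigation(template: str, file_names: list[str], index: int,
                    fallback: str = "overview.html") -> str:
    """Replace the template's previous.html / next.html placeholders.

    file_names is the ordered list of HTML files; index selects the file
    being built. At either end of the list the fallback name is used
    (an assumption, not something the source specifies).
    """
    previous_name = file_names[index - 1] if index > 0 else fallback
    next_name = file_names[index + 1] if index + 1 < len(file_names) else fallback
    filled = template.replace("previous.html", previous_name)
    return filled.replace("next.html", next_name)


if __name__ == "__main__":
    template = '<A HREF="previous.html">Previous</A> <A HREF="next.html">Next</A>'
    files = ["1051.html", "1052.html", "1053.html"]
    print(fill_navigation(template, files, 1))
```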
Constructing the overview document comes last. Returning to the infobase, a series of level queries is used to pull off ASCII files of the levels down through the HTML file level. The overview document is then created from these files, a step that at the moment is performed manually using a template document.
Two examples of LII infobases ported to HTML using this process can be viewed at:
http://www.law.cornell.edu/usc/35/i_iv/overview.html and http://www.law.cornell.edu/usc/15/22/overview.html