Documentation for the AscToHTM conversion utility

3 How AscToHTM works

3.1 The big assumption

AscToHTM makes one big assumption :-

Each text file has been laid out in a consistent manner by its author in a way that makes it easy for a human reader to understand.

Given this, AscToHTM tries to read the text file and mark it up in HTML accordingly. This is achieved by making three passes through the document: an analysis pass (see 3.2), a collating pass (see 3.3) and an output pass (see 3.4).

Note: Sadly, this assumption is not always true.

3.2 The analysis pass

During the analysis pass AscToHTM gathers together all the statistics that it needs to analyse how the author has laid out the file.

For example, the distribution of line indentations and line lengths is observed, together with the number and types of bullets, section headings and other layout features.
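As an illustration only, the kind of statistics gathering described above might be sketched as follows. This is not AscToHTM's actual code: the function name gather_layout_stats, the set of bullet characters and the tab width of 8 are all assumptions made for the example.

```python
from collections import Counter


def gather_layout_stats(lines):
    """Collect simple layout statistics from a list of text lines.

    Hypothetical helper: the real program observes many more features.
    """
    indents = Counter()   # indentation width -> number of lines
    lengths = Counter()   # line length -> number of lines
    bullets = Counter()   # bullet character -> number of lines
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no indentation information
        text = line.rstrip("\n").expandtabs(8)  # assumed tab width
        indent = len(text) - len(text.lstrip(" "))
        indents[indent] += 1
        lengths[len(text.rstrip())] += 1
        # Assumed bullet styles: "-", "*" or "o" followed by a space.
        if stripped[0] in "-*o" and stripped[1:2] == " ":
            bullets[stripped[0]] += 1
    return {"indents": indents, "lengths": lengths, "bullets": bullets}


sample = [
    "1 Introduction",
    "",
    "  Some body text.",
    "  - first point",
    "  - second point",
]
stats = gather_layout_stats(sample)
```

In this sample most lines share an indentation of two characters, which is the sort of evidence a later step could use to decide what the normal body-text indent is.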

Once this has been done, the program uses this data to determine the rules used by the author in structuring their document. For example, are the section headings underlined, capitalised or numbered? If numbered, what style of numbering is used, and by how many characters is each type of heading indented?
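A minimal sketch of how one such rule might be inferred is shown below. It classifies a single candidate heading line as underlined, numbered or capitalised. The regular expressions and the name heading_style are assumptions for illustration; the tests AscToHTM really applies are not described here and are certainly more involved.

```python
import re

# Assumed patterns, not taken from AscToHTM itself.
NUMBERED = re.compile(r"^\s*\d+(\.\d+)*\.?\s+\S")    # e.g. "3.2 The analysis pass"
UNDERLINE = re.compile(r"^\s*([-=~*])\1{2,}\s*$")    # e.g. "=====" or "-----"


def heading_style(line, next_line=""):
    """Guess the heading style of `line`, or return None if it looks like body text."""
    if line.strip() and UNDERLINE.match(next_line):
        return "underlined"
    if NUMBERED.match(line):
        return "numbered"
    text = line.strip()
    if text and text == text.upper() and any(c.isalpha() for c in text):
        return "capitalised"
    return None
```

Counting the results of such a check over the whole file would show which style dominates. A single line proves little on its own: a sentence that happens to start with a number would be misclassified by this sketch.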

This information is then used to set the analysis policies (see the Policy manual). These may then be overridden by the user (to correct errors) or by loading a policy file with different values.

3.3 The collating pass

Having performed the analysis, the program makes a second "collating" pass. This is effectively a dry run for the output pass.

During this pass the program determines how the file will be output into one or more output files and where certain key in-line tags occur.

It also assembles any contents list.

This information is then used during the output pass to reduce the likelihood of errors, and to ensure all internal hyperlinks are valid and will point to the correct anchor point in the correct output file.
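The value of a dry run can be sketched as follows: before any HTML is written, each numbered section is assigned an output file and an anchor name, so that later links can be checked against the map. The file-naming scheme, the anchor format and the names collate and link_for are hypothetical, chosen only for this example.

```python
import re

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


def collate(lines, split_level=1, base="doc"):
    """Dry run: map each section number to (output file, anchor name)."""
    anchors = {}
    file_index = 0
    current_file = f"{base}.html"
    for line in lines:
        m = HEADING.match(line)
        if not m:
            continue
        number = m.group(1)
        level = number.count(".") + 1
        if level <= split_level:
            # Assumed rule: a heading at or above the split level starts a new file.
            file_index += 1
            current_file = f"{base}_{file_index}.html"
        anchors[number] = (current_file, "section_" + number.replace(".", "_"))
    return anchors


def link_for(anchors, number):
    """Return 'file#anchor' for a section, or None if the target does not exist."""
    target = anchors.get(number)
    if target is None:
        return None
    return f"{target[0]}#{target[1]}"


sample = ["1 Intro", "some text", "1.1 Scope", "2 Usage", "2.1 Options"]
anchor_map = collate(sample)
```

Because the map is complete before output starts, a reference to a section that appears later in the document can still be resolved, and a reference to a section that does not exist can be detected rather than emitted as a broken link.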

3.4 The output pass

During the output pass AscToHTM

generates the HTML

and (optionally)

creates a suite of inter-linked HTML pages

creates a set of FRAMES to place the HTML pages into

copies the HTML to the Windows clipboard

generates a contents list

generates a directory page

3.4.1 Generating HTML

The HTML generated depends on

the original document, including any preprocessor tags placed in the source document.

the calculated document policy, modified by any user policies supplied

any HTML fragments that are defined

The section "HTML markup produced" describes the generated markup in more detail.

3.4.2 Generating a contents list

AscToHTM can detect the presence of a (numbered) contents list in the original document. Alternatively you can choose (see Contents generation policies) to have AscToHTM generate a contents list for you, in which case any original list is omitted from the output HTML document.

Regardless of whether the original or generated contents list is used, AscToHTM will turn the contents list into hyperlinks that will take you to the correct HTML file and location.
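To make the idea concrete, here is a small sketch that builds a hyperlinked contents list from numbered headings. The markup, the anchor naming and the function name contents_list are assumptions for illustration and do not reproduce the HTML that AscToHTM itself emits.

```python
import html
import re

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


def contents_list(lines, target_file="doc.html"):
    """Build an HTML list with one hyperlink per numbered heading."""
    items = []
    for line in lines:
        m = HEADING.match(line.strip())
        if not m:
            continue
        number, title = m.groups()
        anchor = "section_" + number.replace(".", "_")  # assumed anchor scheme
        items.append(
            f'<li><a href="{target_file}#{anchor}">'
            f"{number} {html.escape(title)}</a></li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


toc = contents_list(["3.4 The output pass", "body text", "3.4.1 Generating HTML"])
```

When the document is split across several files, target_file would differ per entry, which is why the link targets have to be known before the list is written.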

A fuller discussion of contents lists is given elsewhere in this documentation. The policies that influence contents list production are listed in Contents generation policies, whilst the pre-processor commands are described in 7.1.3.

3.4.3 Splitting the document into many HTML pages

By default AscToHTM creates a single .HTML file. However, through file organisation document policies (see File generation policies) it is possible to

Split the document into a number of smaller .HTML files (see the policy "Split Level").

Insert standard JavaScript into the <HEAD> ... </HEAD> section of each page (see also the policy "HTML script file").

Add an HTML "header" to the top of each generated file (see also the policy "HTML header file").

Add a navigation bar at the foot of each page with links to the Next/Previous .HTML page and the contents list (see also the policy "Add navigation bar").

Add an HTML "footer" to the end of each generated page (see also the policy "HTML footer file").
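As a rough sketch of the navigation bar option listed above, the function below builds Previous/Next/Contents links for one page in a sequence of generated files. The function name, the link labels and the file names are hypothetical; the bar that AscToHTM generates may look quite different.

```python
def navigation_bar(pages, index, contents="contents.html"):
    """Return a one-line HTML navigation bar for pages[index]."""
    links = []
    if index > 0:
        links.append(f'<a href="{pages[index - 1]}">Previous</a>')
    if index < len(pages) - 1:
        links.append(f'<a href="{pages[index + 1]}">Next</a>')
    links.append(f'<a href="{contents}">Contents</a>')
    return "<p>" + " | ".join(links) + "</p>"


pages = ["doc_1.html", "doc_2.html", "doc_3.html"]
bar = navigation_bar(pages, 1)
```

The first page has no Previous link and the last page has no Next link, so the bar never points at a file that was not generated.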

3.4.4 Generating a set of FRAMES

New in version 4

AscToHTM can place the HTML into a set of FRAMES. This is described fully in the chapter on Frames.

3.4.5 Generating HTML for the Windows clipboard

New in version 4

The Windows version of the software can place the HTML generated into the clipboard, rather than outputting it into a file. This makes it easier to paste the HTML into another application (such as an HTML editor). When this mode of conversion is selected, the <HEAD> and <BODY> tags are omitted from the output.
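The difference between the two output modes can be sketched as below. The example deals only with the markup and does not touch the Windows clipboard. The function render and its parameters are hypothetical, and dropping the whole <HEAD> section in clipboard mode is an assumption, since the text above only says that the tags are omitted.

```python
def render(body_html, title="Untitled", for_clipboard=False):
    """Return a full HTML page, or only the body markup for pasting elsewhere."""
    if for_clipboard:
        # Fragment mode: no <HEAD> or <BODY> wrapper, so the markup can be
        # pasted into a page that already has its own.
        return body_html
    return (
        "<HTML>\n<HEAD>\n<TITLE>" + title + "</TITLE>\n</HEAD>\n"
        "<BODY>\n" + body_html + "\n</BODY>\n</HTML>"
    )


fragment = render("<P>Hello</P>", for_clipboard=True)
page = render("<P>Hello</P>", title="Demo")
```

A fragment without its own wrapper can be dropped into an existing page in an HTML editor without producing nested <BODY> elements.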

The use of the clipboard is made even more powerful if a clipboard extender such as ClipMate is used. See http://www.jafsoft.com/clipmate.html
