Documentation for the AscToRTF conversion utility : How it works

Documentation for the AscToRTF conversion utility

The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html

How it works

AscToRTF analyses your document, looking at how the text is laid out, and trying to identify and quantify the rules used by the author to format the document. These rules are then used to set "policies" that determine how each part of the document should be interpreted. These policies are then used during the output pass to decide how the output document should be formatted.

The user can choose to manually set "Policies", thereby overriding the software's analysis, and may additionally set some policies that only apply to the output pass (such as which fonts should be used). These manual options may be saved in a policy file and reloaded the next time. Different policy files may be created for different document sets, or for different types of output.

For example analysis might determine that a large number of lines appear to be "underlined", and that these may be headings. Having made this decision, lines that are underlined will become headings, while those that are numbered or capitalised may not. If this is the wrong decision, the user can disable the use of underlined headings via a policy file, and even choose to recognize capitalised headings instead should they wish.

Contents of this section

Assumptions made by the program
The analysis pass
The collating pass
The output pass

Assumptions made by the program

AscToRTF makes one big assumption :-

Each text file has been laid out in a consistent manner by its author in a way that makes it easy for a human reader to understand

Given this, AscToRTF tries to read the text file and mark it up in RTF accordingly. This is achieved by making three passes through the document, an analysis pass, a collating pass, and an output pass.

Note: Sadly this assumption is not always true :(

The analysis pass

During the analysis pass AscToRTF gathers together all the statistics that it needs to analyse how the author has laid out the file.

For example, the distribution of line indentations and line lengths is observed, together with the number and types of bullets, section headings and lots of other stuff.

Once this has been done, the program uses this data to determine how the author has structured the document. For example are the section headings underlined, capitalised or numbered? If numbered, what style of numbering is used, and at what level of indentation is the heading placed?

This information is then used to set the analysis polices (see the Policy manual) which may then be overridden by the user, or by loading a policy file with different values.

The collating pass

Having performed the analysis, the program makes a second "collating" pass. This is effectively a dry run for the output pass.

During this pass the program determines how the file will be output, what headings there are and where certain key in-line tags occur.

It also assembles any contents list.

This information is then used during the output pass to reduce the likelyhood of errors, and to ensure all internal hyperlinks are valid and will point to the correct file location.

The output pass

During the output pass AscToRTF generates the RTF file (there's nothing like stating the obvious :-)

The RTF generated depends only on the original document, the calculated document policy, and any user policies supplied.

Understanding the RTF generated describes the markup produced in more detail.

Back to Contents List