Documentation for the AscToHTM conversion utility
This documentation can be downloaded as part of the documentation set in .zip format (370k).
AscToHTM is an ASCII to HTML conversion tool. It has, of course, been used to generate the HTML version of this document from the text file a2hdoco.txt (see an example conversion for more details).
The HTML version of this document is presented "as is". That is, no post-production of the HTML has occurred. This should give you a flavour of what AscToHTM is capable of.
Any RTF version of this document will have been made by AscToRTF, the sister product that shares the same text analysis engine.
AscToHTM is made available for download via the Internet from the download page.
AscToHTM is designed to analyse a document to determine its structure and layout. This analysis allows AscToHTM to decide how best to mark up the HTML so as to accurately represent the author's original meaning as far as possible.
This analysis helps AscToHTM to reduce errors by allowing it to spot anomalies in the document source. This is important in minimising the amount of any post-production work required to fix errors.
AscToHTM tries to create HTML that can be easily read and modified in an editor. This is useful if corrections are necessary, or further development is required.
For example, AscToHTM
- produces short (usually <80 character) output lines.
- attempts to indent the HTML to match the output indentation.
- adds comments to the HTML to indicate include files etc.
- uses <BLOCKQUOTE> tags for indentation, rather than placing the whole file in <TABLE>...</TABLE> tags.
- produces "clean" HTML without large numbers of unnecessary tags.
Note that later moves towards more standards-compliant and browser-compatible HTML tend to work against producing user-readable code. For example, most browsers have rendering problems when newline characters are placed in certain key locations, whereas adding newline characters can make the HTML easier to read.
Inevitably users have to supply additional information to tell AscToHTM where its analysis has gone wrong, and to add details such as a document title. AscToHTM offers a large number of options (also known as "policies") that the user can modify.
Broadly speaking, these policies fall into two camps:
- Analysis policies. These policies affect the way AscToHTM analyses your file, and can be used to disable searches for things like bullets, or to specify whether or not underlined headings are to be expected.
- Output policies. These policies influence the types of HTML markup that are produced. They also allow you to add colour, headers, footers, background images and much more to your pages.
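As a rough illustration of the kind of checks an analysis policy might switch on or off, the sketch below looks for bullet lines and underlined headings in plain text. It is not AscToHTM's analysis engine. The function name classify_lines, the two flag names and the detection rules are all assumptions made for this example.

```python
import re

# Hypothetical rules, not AscToHTM's own: a bullet starts with -, * or o
# followed by a space; an underline is a run of 3 or more '-', '=' or '~'.
BULLET = re.compile(r"^\s*[-*o]\s+\S")
UNDERLINE = re.compile(r"^\s*([-=~])\1{2,}\s*$")


def classify_lines(lines, look_for_bullets=True, expect_underlined_headings=True):
    """Label each line as 'heading', 'underline', 'bullet' or 'text'."""
    labels = []
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        is_underline = bool(UNDERLINE.match(line))
        if is_underline and labels and labels[-1] == "heading":
            # The underline itself, directly below a detected heading.
            labels.append("underline")
        elif (expect_underlined_headings and line.strip()
              and not is_underline and UNDERLINE.match(next_line)):
            # Non-blank text with an underline on the following line.
            labels.append("heading")
        elif look_for_bullets and BULLET.match(line):
            labels.append("bullet")
        else:
            labels.append("text")
    return labels
```

Turning either flag off makes the matching lines fall back to ordinary text, which is the sort of effect you would want if the analysis kept mistaking, say, hyphenated lines for bullets.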
AscToHTM can save your policies to a file, so that next time you run it you can load this information back from the "policy" file. This also allows you to create different sets of policies (e.g. to use different colour schemes).
Policies are described fully in the Policy manual.
You can further refine the conversion by placing special lines and tags into your source file. These are known as pre-processor commands (see Using the preprocessor) and in-line tags (see In-line tags).
The preprocessor tags are described fully in the Tag manual.
To help users formulate and modify their document's policy, AscToHTM can be made to create an output policy file (see 18.104.22.168). Users can then simply edit this file and feed it back into the conversion process.
A summary of the recognised policy lines is given in the Policy manual.
Earlier versions of AscToHTM (before version 3.2) made no real attempt to be standards compliant. Now standards compliance is a stated goal of the program. Sadly I can't guarantee standards compliance because the HTML generation is so complex that errors can and do occur, but it is a goal, and usually documents will validate with few problems.
Compliance has proved to be vital to get cross-browser compatibility, and to stand a chance of successfully applying CSS to created pages.
Original versions of AscToHTM were (loosely) targeted at producing HTML 3.2 code.
Currently the software is targeted at "HTML 4.0 Transitional", which allows CSS, but also permits <FONT> tags (although these are deprecated). This is a compromise standard that is best placed to be well viewed by V3 and V4 browsers.
Future versions of the program may attempt to generate stricter HTML 4.0 code, while still offering production of the earlier HTML standards.
The policy "HTML version to be targeted" offers some ability to choose the style of HTML generated.
- Placing text files quickly and easily on the web
Plain text is still a very popular data format. It is easy to generate, and easy to read. However, text files placed on the web don't look as nice as normal web pages. AscToHTM will allow you to quickly add the HTML markup required to turn a plain text page into a nice looking HTML page. Because it is an automated conversion it will save you time, and help you avoid typos in HTML tags that could cause the page to display wrongly in some web browsers.
- Migration of "legacy" text to HTML.
Large amounts of unconverted text exist. As people plan to put this information on the Web, conversion to HTML will become necessary.
This can be a tedious and time-consuming task. AscToHTM will do much of the work for you.
AscToHTM is priced to be worth an hour or two of your time. This means that the "pay back" time is negligible (we only mention this in case you have bean-counters to convince :). If you don't think AscToHTM will save you hours, then by all means don't buy it.
- Facilitate mastering of HTML pages in ASCII
The HTML created by AscToHTM may not be as pretty or as clever as that generated by a full blown (read as "bloated") HTML editor.
However, it'll be easier to write, edit and spell-check, and it may have a hyperlinked contents list generated.
- Automated conversions
AscToHTM can be used to automatically convert text documents that you receive. For this we usually suggest you run in command line mode.
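A minimal sketch of how such an automated run might be scripted is given below. The helper names build_commands and convert_all are hypothetical, and the converter command is deliberately left as a parameter: passing the text file as the only argument is an assumption, so check the program's own command line documentation for the real syntax.

```python
import subprocess
from pathlib import Path


def build_commands(folder, converter):
    """Return one argument list per .txt file in folder, sorted by name.

    `converter` is the command that starts the conversion, given as a
    list of strings. Appending the file name as the only argument is an
    assumption made for this sketch.
    """
    files = sorted(Path(folder).glob("*.txt"))
    return [list(converter) + [str(path)] for path in files]


def convert_all(folder, converter):
    """Run the converter on each text file; return the files that failed."""
    failed = []
    for argv in build_commands(folder, converter):
        result = subprocess.run(argv)
        if result.returncode != 0:
            failed.append(argv[-1])
    return failed
```

Keeping the command building separate from the running makes it easy to print the commands first and check them before anything is converted.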
- Conversion of reports to HTML
Many people have legacy systems that generate printed reports that may be saved to file. AscToHTM can help extend the lifetime of such systems by turning their output to HTML. It may be you'll need some help in getting the best results from the program in such cases, since many reports consist of complex tables.
- Conversion of print spool files to HTML
Printer spool files are not strictly speaking plain text, but often - especially in older software systems - these files are plain text with a few printer controls added. Some users have had great success converting such files using AscToHTM, and to support this we have added a limited ability to recognise and strip out Unix control characters, VT escape sequences and PCL printer codes. If you have a requirement in this area, contact the author at firstname.lastname@example.org to discuss whether the software can be made to meet your needs.
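To give a flavour of what stripping such codes involves, here is a small sketch. It is not the code AscToHTM uses. The function name strip_printer_codes is hypothetical, and the patterns are simplified assumptions that cover only common VT/ANSI and PCL sequences.

```python
import re

# VT/ANSI control sequences such as ESC [ 1 m (bold) or ESC [ 2 J (clear).
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Simplified PCL parameterised sequence: ESC, a character in the range
# ! to /, then digits, signs and lowercase letters, ended by a character
# in the range @ to ^ (for example ESC & l 0 O).
PCL = re.compile(r"\x1b[!-/][0-9a-z+.\-]*[\x40-\x5e]")
# Any remaining two-character escape, for example ESC E (PCL reset).
TWO_CHAR = re.compile(r"\x1b[0-~]")
# Control characters other than tab, line feed and carriage return.
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_printer_codes(text):
    """Remove escape sequences and stray control characters from text."""
    # Order matters: the longer sequences must go before the catch-alls.
    for pattern in (CSI, PCL, TWO_CHAR, CONTROL):
        text = pattern.sub("", text)
    return text
```

Note that this sketch also discards form feeds, so page breaks in the spool file are lost. A real converter may prefer to treat them as section boundaries instead.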
- Convert Word documents
Please note, AscToHTM DOES NOT convert Word's .doc or .rtf file formats.
AscToHTM was never intended to handle Word documents. We fully expect HTML export and import filters to appear (they have in Word '97), and we would advise anyone whose master document is in Word to search out these filters and give them a try.
That said... a lot of people seem unhappy with what's already available, and AscToHTM does a reasonable job if you save the file as text with line breaks, though obviously tables and figures will get lost (in the case of tables, because Word throws them away).
The main problem is that Word produces lousy looking text. This is one area where AscToHTM does a little better than "garbage in, garbage out"
- Pre-process text for import to Word.
(This is a bit cheeky, but does actually work.)
Use AscToHTM to convert text to HTML, then import this into your word processing package. Since the text analysis engine in AscToHTM out-performs that in Word in many respects (URL, table and heading detection to name but three), you can often get better results than importing the text directly.
That's because AscToHTM's analysis engine is smarter. That's not just our view (see http://www.jafsoft.com/asctohtm/reviews.html).
- The same text analysis engine is used in the text-to-RTF program AscToRTF, which is more suited to this purpose.
- Pre-process text for printing
Use AscToHTM to convert text to HTML, then print the file from within Netscape or whatever. The result is a much nicer looking document with fonts'n'stuff.
- Add hyperlinks to fairly ordinary pages.
AscToHTM has a "link dictionary" feature that can be used to add hyperlinks to any word or phrase (see the Policy manual).
This can greatly enhance an otherwise dull set of text pages.
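The idea behind a link dictionary can be sketched as follows. This is an illustration only: the function name add_links and the use of a Python dictionary are assumptions, and the format AscToHTM actually uses is the one described in the Policy manual.

```python
import html
import re


def add_links(text, link_dictionary):
    """Wrap each dictionary phrase found in text in an HTML anchor."""
    if not link_dictionary:
        return html.escape(text)
    # Try longer phrases first so "text file" wins over "text".
    phrases = sorted(link_dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b")
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then emit the link.
        pieces.append(html.escape(text[last:match.start()]))
        url = html.escape(link_dictionary[match.group(1)], quote=True)
        pieces.append('<A HREF="%s">%s</A>' % (url, html.escape(match.group(1))))
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

For example, with the dictionary {"Policy manual": "policy.html"}, every whole-word occurrence of "Policy manual" in the text becomes a link to policy.html, and the rest of the text is escaped as ordinary HTML.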
© 1997-2001 John A Fotheringham