$_$_TITLE The JafSoft text conversion FAQ
$_$_CHANGE_POLICY document subject : The JafSoft text conversion FAQ
$_$_CHANGE_POLICY Expect contents list : No
$_$_CHANGE_POLICY background colour : ffefff
$_$_CHANGE_POLICY headings colour : dd00dd
$_$_CHANGE_POLICY LINK Definition : "[a2h man]" = "AscToHTM Manual" + "http://www.jafsoft.com/doco/a2hdoco.html"
$_$_CHANGE_POLICY LINK Definition : "[a2r man]" = "AscToRTF Manual" + "http://www.jafsoft.com/doco/a2rdoco.html"
$_$_CHANGE_POLICY LINK Definition : "[pol man]" = "Policy Manual" + "http://www.jafsoft.com/doco/policy_manual.html"
$_$_CHANGE_POLICY LINK Definition : "[tag man]" = "Tag Manual" + "http://www.jafsoft.com/doco/tag_manual.html"

$_$_CONTENTS_LIST

$_$_BEGIN_IGNORE
** Master copy on VMS **
$_$_END_IGNORE


1.0 Introduction
================

This FAQ is clearly a work in progress.  Many of the subjects have no answers
as yet.  Nevertheless I intend fleshing this out as and when I get time, and
I welcome new questions (or prompts to write the answers to questions listed
here) from all users.

Direct all correspondence to info@jafsoft.com


1.1 Document conventions

Often the answer to a question involves setting a policy value (see the
[Pol Man] for more about policy files).  The policy involved will be
displayed as :

      <Policy name> : <value>

The <Policy name> is the text that will appear in the policy file.  This must
be *exactly* as shown; no variability in the spelling will be tolerated by
the program.  If you misspell the policy text (or if it's been changed in a
new version), the program will complain that it doesn't recognize the policy.

In addition to adding lines to your policy file by hand, the Windows version
allows *most* (not all) policies to be set via property sheets.  You'll need
to locate the equivalent policy on the property sheets.


1.2 Finding JafSoft software on the web

1.2.1 The home page

Currently http://www.jafsoft.com/.  Each product has its own page, e.g.

      http://www.jafsoft.com/asctohtm/
      http://www.jafsoft.com/asctortf/
      http://www.jafsoft.com/asctotab/
      http://www.jafsoft.com/addlinx/

These are listed on the products page

      http://www.jafsoft.com/products/

There is also a .co.uk mirror site.

1.2.2 Online documentation

Currently http://www.jafsoft.com/doco/docindex.html.

Documentation is usually included with all downloads, either as HTML or as
ready-to-convert text.  In Windows this will usually be found in the folder
c:\Program Files\JafSoft\AscToHTM

Documentation available includes :

    - [a2h man].  Describes the text-to-HTML converter AscToHTM

    - [a2r man].  Describes the text-to-RTF converter AscToRTF

    - [pol man].  Describes the use of policy files by the software

    - [tag man].  Describes the use of a preprocessor and tagging system
      by the software

    - This FAQ.

If you plan to read one or more of these manuals you'd be best advised to
download one of the documentation .zip files.

1.2.3 Keeping track of updates

There are update pages at

      http://www.jafsoft.com/asctohtm/updates.html
and
      http://www.jafsoft.com/asctortf/updates.html

Registered users get update notifications by mail.  To date all updates have
been free to registered users, but we can't guarantee that will always be
the case.

1.2.4 Who is the author?

1.2.4.1 John A Fotheringham

That's me that is.  The program is wholly the responsibility of John A
Fotheringham, who maintains it in his spare time.  He doesn't make a living
from it (in case you were wondering).
1.2.4.2 JafSoft Limited

Although authoring shareware doesn't earn enough that I can give up my day
job, I have created a separate company to handle AscToHTM, AscToRTF and all
the shareware and other services I have to offer.

The company is called JafSoft Limited, and the web site is
http://www.jafsoft.com/

1.2.4.3 Contacting the author

Correspondence should be via email to support@jafsoft.com.  Priority is
given to registered users and people who want to pay for development [ :) ],
however all correspondence will be answered.

1.2.5 Reporting errors and bugs

Despite the best of intentions, bugs do happen, and we're always grateful to
anyone who takes the time to report them to us.

Please feel free to report all errors and bugs to support@jafsoft.com.
When you do so please include

    - a clear description of the problem

    - which version of the software you are using

    - a copy of the offending source file (if not too large, <50k)

    - a copy of any policy file being used

    - a copy of any .log file generated (save the status messages to file)

Please keep any source files small.  If the source file is large, try to
generate a smaller file that exhibits the same problem.

1.2.6 Requesting changes to the software

Feel free to send suggestions for enhancements/changes to
support@jafsoft.com.  A surprising number of features have been added this
way although, naturally, I'm happy for people to think these were all my own
ideas.

Minor changes may slip into the next release if I think they enhance the
product.  Major changes to the software can be undertaken on a commercial
basis by contracting my services from Yezerski Roper Ltd.  This option is
not for the faint hearted.  Don't let the software's $40 price tag persuade
you that that's anything but a bargain; my hourly rate is more than that
amount, although I can do quite a lot in one hour :-)


1.3 Registration and updates

1.3.1 Registration

Registration can be completed online by visiting

      http://www.jafsoft.com/asctohtm/register_online.html
or [[BR]]
      http://www.jafsoft.com/asctortf/register_asctortf.html

Registration is usually completed via a third party registration service
(I use a couple) and an on-line download.  The registration service will
take your payment and then send you download instructions for a fully
registered copy.  The registration companies can accept payments using a
number of methods, but the commonest is credit card.

We do not ship software on media at this time.  We'd have to double the
price and stop our free upgrade policy if we did.  That said, one of the
registration companies will put the software onto CD and ship it to you for
an extra charge.  As yet I haven't set this up, but if interested email
support@jafsoft.com with details.

1.3.2 Update policy

To date all updates have been free to registered users.  This has been true
for both minor and major updates.  Over time the price of the software has
risen, but no-one has ever had to pay extra.

I'd like to continue this policy, but I'm unable to actually guarantee this,
especially since I've discovered old registered versions circulating on the
Net.


1.4 Other related products by the same author

1.4.1 AscToTab

[AscToTab] is a subset of AscToHTM which is dedicated to creating tables
from plain text and tab-delimited source files.  The software is offered as
freeware under Windows and OpenVMS.

1.4.2 AscToRTF

AscToRTF is a text-to-RTF converter which uses the same analysis engine as
AscToHTM, but which creates Rich Text Format (RTF) files instead.
RTF is a format better suited for import into Word and other word
processors.  [AscToRTF] was released early in 2000 and has received a number
of 5-star reviews.

1.4.3 AddLinx

A registered user (see [[GOTO requesting changes to the software]])
contacted me and asked if I had a program that could add hyperlinks to an
*existing* HTML file.  At the time I didn't, but on examining the software
it seemed I had all the bits and pieces necessary to construct such a tool.
Within 24 hours I sent him a first attempt at such a utility, and within a
few weeks [AddLinx] was born.

It's a very rough utility that I haven't spent much time on.  It's available
as postcard ware.

1.4.4 API versions of the software

For those wanting to programmatically integrate the conversion software into
their own products, an API has been produced and is available under license.

AscToHTM and AscToRTF are written in C++, and an API is available which
provides a C++ header file defining the functions available.  The software
is then provided as a Windows library to be linked against.  In the past
clients have successfully integrated this with their Java software, on
Windows, Linux and Solaris platforms.

I'm not a Visual Basic programmer myself, so I'm less sure how the software
could be integrated with VB, although I presume this can be done.

Contact sales@jafsoft.com if interested.

1.4.5 Linux versions of the software

Linux versions of all programs are planned.  The core conversion software is
developed as a command line utility, and in this form it ports to Linux
reasonably easily.  I plan to offer AscToHTM and AscToRTF as Linux shareware
in the near future.


1.5 Document conversion consultancy

1.5.1 Do you offer consultancy?

We always like to offer a little help to users just starting out.  Once you
register you are free to send a typical sample file to the author, who will
offer some advice on problems you might encounter and policies you may use.

However, for people wanting to do larger conversions (see 3.1.9) or wanting
significant amounts of our time, you will need to buy assistance at
consultancy rates.  Regrettably this is not cheap, although we feel it's
good value for money :)

Contact sales@jafsoft.com with details.

See also [[GOTO requesting changes to the software]]


1.6 Y2K Compliance

From time to time I get asked if my products are Y2K compliant.  The short
answer is "yes it was" :-)


1.7 Status of this FAQ

Clearly it's not finished yet.  You might even say it's "under construction"
:)

I've decided to put this on the web in "unfinished" form so that it may be
of *some* benefit to people as soon as possible.  If you've a particularly
urgent need for a question to be answered contact support@jafsoft.com, and
don't be surprised if your answer ends up in this document.


2.0 Getting the best results
============================

2.1 General

2.1.1 Three words: consistency, consistency, consistency

The software works by analysing your document to determine what "rules"
you've used for laying out your file.  On the output pass these "rules" are
used to determine how to categorize each line, and inconsistencies can lead
to lines being wrongly treated because they "fail to obey the rules".

You can greatly help this analysis by being consistent in your formatting.
Many of the decisions the software makes can be overridden by changing the
"analysis policies" (see [[GOTO using policy files]]), but if this becomes
necessary it can quickly become hard work (if only because you need to
familiarize yourself with these policies), so it's better to avoid this if
possible.

If you're writing a document with text conversion in mind, bear in mind the
following

    - *use of white space* (see [[GOTO white space is your friend]]).
      In general white space can be used to separate paragraphs, tables and
      diagrams from normal text, and columns of data from each other inside
      tables.  The software *likes* white space :)

    - *use of tabs*.  The software will convert all tabs to spaces on input,
      assuming that one tab = 8 spaces.  This will work fine provided this
      tab size is correct, or your use of tabs and spaces is consistent.  It
      may not work otherwise, in which case you'll need to tell the software
      what your tab size is via an analysis policy.

    - *use of indentation*.  The software will calculate the pattern of
      indentation used in your file, and will output text accordingly.  If
      your use of indentation is inconsistent, then paragraphs will be
      wrongly broken and headings may not be correctly recognized.

    - *use of numbering*.  The software can spot numbered headings and
      numbered lists.  To avoid confusing the two, the indentation of a
      given type of heading is tested, together with the numbering sequence.
      The software can tolerate small gaps in numbering, but large gaps will
      confuse it.

    - *use of line lengths*.  The software will attempt to determine your
      "page width" and text justification.  These are then used to spot
      short lines (which get a <BR> added) and centred text.  The centred
      text algorithm has problems and so is disabled by default.  Try to
      avoid really long lines, or highly variable line lengths.  If you
      don't, the software is liable to insert <BR> tags where you don't want
      them, unless you set the "page width" and "short line length" analysis
      policies to correct this behaviour.

    - *avoid confusing the program*.  Numbered lists inside numbered
      sections all at the same level of indentation is a good example.  The
      numbers become ambiguous and errors start to occur.

2.1.2 Make sure your files are "line-orientated"

The software reads files line-by-line.  On the first pass it will analyse
the distribution of line lengths to determine the "page width" of your file.
This in turn is used to detect certain features such as centred text and
"short lines".

Some files, especially those created on a PC, do not include line breaks;
instead they only have a single break after each paragraph of text.  Whilst
not a problem in itself, it does somewhat handicap the software's ability to
analyse the file.

Where possible, you should attempt to save files "with line breaks" to give
the software the best chance of understanding how your file is laid out.

2.1.3 Make sure your use of tabs is consistent

The software converts all tabs in your source document on the assumption
that one tab equals 8 spaces.  In fact, the actual tab size is irrelevant
provided your use of tabs and spaces is consistent.  If it isn't, you may
find tables aren't being analysed correctly.

You can set the actual tab size used in your documents via the policy line

      Tab Size : n

where n is the number of spaces per tab.

2.1.4 White space is your friend

The software attempts to categorize each line into one of a number of types
(e.g. heading, bullet point, part of a table etc).  Often this analysis is
influenced by adjacent lines.  For example a line of minus signs can be
interpreted as "underlining" a heading, or perhaps as part of a table or
diagram.

Confusion can occur where different features are close to each other (e.g.
an underlined heading immediately followed by a table).  In most cases the
ambiguity can be reduced or eliminated by adding 1 or 2 blank lines between
the objects being confused.

The same argument applies to table columns.  If two columns get merged
together, try increasing the "white space" between them by moving them
apart.

In almost all situations, adding white space to your document will help
reduce the likelihood of analysis errors.

2.1.5 Use a simple numbering system

I've seen documents with section numbers like "Section II-3.b".  I'm sorry,
but at present the software can't recognise such an exotic numbering system.
Equally it can't cope with Appendices like A-1 etc.

If possible, change your section numbers to simple numbers (like this
document).  The software will understand that much better.

From version 4 onwards, there is the ability to recognise headings that
start with the same word or phrase (such as Chapter, Appendix, Section etc),
so this may offer a solution to you.

2.1.6 Save policies into a policy file

The program offers a large number of "policies" to customize the conversion.
These policies can be saved in a "policy file", which is simply an ordinary
text file (which you may edit by hand if you like).

By saving policies into files, you can reload these files the next time you
do a conversion, which means you won't need to adjust all the settings
again.  You can create multiple policy files for different conversions or
conversion types.

Policy files are described at length in the [Pol man].

2.1.7 Add preprocessor commands to your source file

The program has its own built-in preprocessor.
This allows you to add special "directives" and "tags" into your source file
which tell the program to perform special functions.  Examples include the
addition of include files into the source, the insertion of contents lists,
adding hyperlinks to sections and much, much more.

An example is the following hyperlink, whereby
[[OT]]GOTO Using preprocessor commands[[CT]] is used to provide the link to
the named section.  For more details see [[GOTO using preprocessor commands]]

The preprocessor is described at length in the [Tag man].


2.2 Using policy files

2.2.1 Saving "incremental" policies

When you choose to save your policies to file you will be asked whether you
want to save "incremental" policies, or "all" policies.

"Incremental" means only those policies loaded from file, or manually
adjusted, will be written to file.  This is recommended as it leaves the
program free to make all other adjustments itself.

"All" means that all policies will be written to file.  This is useful if
you want to document or review the policies used, but it is less useful if
you want to reload this policy file, as it will fully constrain the
program's behaviour.  While this may not be a problem when reconverting the
same file, it may well be unsuitable when converting new files.

2.2.2 Editing policy files by hand

Policy files are just text files with a ".pol" extension.  If you think of
them like the old Windows .ini files you'll get the idea.  This has been
done deliberately so that these files can be manually edited in a normal
text editor.  OpenVMS users actually have no other way of creating policy
files, but Windows users can change most (but not all) policies via the GUI.
However I recommend that anyone who comes to regard themselves as a "power"
user learns how to edit these files.

The policy file consists of one policy per line, usually in the form :

      <Policy name> : <value>

e.g.

      Document title : Here's my favourite URLs

When entering policy lines you must use the *exact* <Policy name> indicated
in the documentation for the policy to be recognized.  If I've misspelt
anything then tough, you'll have to follow it (but tell me anyway).  The one
exception to this rule is that I've allowed both British and American
spelling of colour/color.

The allowed <value> will vary from policy to policy.  Most policy lines
accept a value of "(none)", effectively negating that policy.

The order of lines in the file is largely unimportant.  If you're editing a
.pol file generated by the program (see [[GOTO generate a .pol file]]) then
you'll notice section headings of the form

      [Hyperlinks]

These are purely decorative.  That is, they have no significance, and you
can ignore them and move the policy lines around; there's no concept of
having to place policy lines in the "right" section.

As new versions of the software are released policies are moved from one
section to another as different groupings expand and appear.  As explained
above, this usually has no effect on the validity of the .pol file.

2.2.3 Using include files in policy files

Policy files may include other policy files as follows

      include file : ..\policies\Other_policy_file.pol

This can be useful if you have multiple policy files but want certain
features to be the same.  For example I use this to introduce the same link
dictionary commands into all my policy files.  You could equally put all
your colour policies into one file.

The "include file" line will have to be manually edited into the .pol file
using a text editor.... there is no support currently for setting this via
the program itself.
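
To illustrate, a small hand-edited "incremental" policy file might look like
the following sketch, which simply combines the example policy lines already
shown elsewhere in this FAQ into one file:

      include file      : ..\policies\Other_policy_file.pol
      Document title    : Here's my favourite URLs
      background colour : ffefff
      headings colour   : dd00dd

Loading a file like this should set just these few policies and leave the
program free to decide everything else for itself, as described in 2.2.1
above.
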
NOTE: If you "save" a policy file that has been loaded, then the include
file structure will be lost, and all the policies will be output into a
single file.

2.2.4 Using a default policy

You can make the program use the same policies by default each time it runs.

To do this select the policies you want, and then save these to a policy
file.  Next select the _Settings->Use of Policy Files_ menu option.  Check
the "Use a default" flag, and select the file you just created.

Next time you run the program these policies will be loaded and used for
your conversions.  Note, you can still reset the policies or load a
different file using the options on the Conversion options menu.

To stop using a default just clear the "Use a default" flag (you don't need
to clear the policy file name).


2.3 Using preprocessor commands

2.3.1 What is the preprocessor?

The program has a built-in preprocessor.  This will recognize special
commands inserted into the source file.  These commands can be used to
correct analysis errors (e.g. to correctly delimit a table), or to add to
the output.  For example the TIMESTAMP tag can cause the text
"Converted on [[OT]]TIMESTAMP[[CT]]" to be output as
"Converted on [[TIMESTAMP]]".

Preprocessor commands are of two types

    *Directives*.  These begin with "$_$_" and must be on a line by
    themselves, with the "$_$_" being at the start of the line (i.e. there
    can be no leading spaces).

    *Tags*.  These take the form [[OT]]TAG [[CT]] and may occur anywhere
    within your text, but cannot be split over two lines.

Some commands may be expressed as either directives or tags.

A [Tag man] is also available.

2.3.2 Delimiting tables, diagrams etc

The program will attempt to detect tables and diagrams, but sometimes it
gets the wrong range for the table, and also diagrams may be interpreted as
tables and vice versa.

To correct such mistakes, you can bracket the source lines as follows :-

$_$_BEGIN_PRE
$_$_BEGIN_TABLE
...
$_$_END_TABLE
$_$_END_PRE

or

$_$_BEGIN_PRE
$_$_BEGIN_DIAGRAM
...
$_$_END_DIAGRAM
$_$_END_PRE

2.3.3 How do I add my own HTML to the file?

You can embed raw HTML in your text file in one of three ways using the
preprocessor

a) Insert a one-line piece of HTML as follows

      $_$_HTML_LINE <your HTML>

   The HTML_LINE and its arguments must all be on one line.

b) Insert a HTML tag as follows

      [[OT]]HTML <your HTML>[[CT]]

   The HTML tag must all be on one line.

c) Insert a section of HTML between two directive lines

$_$_BEGIN_PRE
$_$_BEGIN_HTML
... lines of HTML, e.g. custom artwork or tables ...
$_$_END_HTML
$_$_END_PRE

For example to enter an anchor point in your text so that you can link to it
try

$_$_BEGIN_PRE
$_$_HTML_LINE <A NAME="my_anchor"></A>
$_$_END_PRE

To embed an image with a hyperlink you might try

$_$_BEGIN_PRE
$_$_BEGIN_HTML
<A HREF="http://www.jafsoft.com/asctohtm/"><IMG SRC="my_image.gif"
ALT="AscToHTM home page"></A>
$_$_END_HTML
$_$_END_PRE

The "$_$_" has to be at the beginning of the line, i.e. it must not be
indented.

If you look at the program's HTML documentation, and the text used to create
it, you'll see examples of this and other preprocessor commands.  Indeed if
you look at the [[SOURCE_FILE]] for this document you'll see that's exactly
how the image on the right was added to *this* document.

Future versions of the software will introduce in-line tagging so you can
place LINKPOINTs anywhere in your text.  Check your program's documentation
for details.

2.3.4 Using standard include files

The preprocessor command INCLUDE can be used to include standard pieces of
text into your source files.
For example

$_$_INCLUDE ..\data\footer.inc

will include the file "footer.inc" into your source file at this location.
Note that the path given must be correct relative to the source file being
converted.

The contents of the include file simply get "read into" the source.  As such
they get included in the analysis of the whole document.

Include files can be useful to add standard disclaimers or navigation bars
to all your pages.  For example you could embed HTML to link back to your
home page (see [[GOTO how do I add my own HTML to the file?]])

Of course the same effect could be achieved by using a HTML footer file (see
[[GOTO adding headers and footers]]) or by defining a HTML fragment called
HTML_FOOTER (see [[GOTO customizing the HTML created by the software]]).

2.3.5 Adding Title, keywords etc

If you want to add a title, keywords and a description to your HTML you can
do this by embedding special commands in the source file as follows

$_$_BEGIN_PRE
$_$_TITLE This is the title of my HTML page
$_$_DESCRIPTION This page is a wonderful page that everyone should visit
$_$_KEYWORDS wonderful, web, page, full, of keywords, that
$_$_KEYWORDS everyone, will, want, to search, for
$_$_END_PRE

The "$_$_" must be the first characters on the line.  You can spread the
keywords and description over several lines by adding extra $_$_KEYWORDS and
$_$_DESCRIPTION lines.

Note: Most of these commands have equivalent policies, allowing you to set
the title etc. through an external policy file should you prefer.

2.3.6 Adjusting policies for individual files or parts of files

You can, if you wish, create one policy file for each file being converted,
however this is liable to become a maintenance nightmare.

If you don't want to maintain multiple policy files, or if you simply want
to adjust a few policies for a given source file, you can use the
$_$_CHANGE_POLICY command.  The effect will vary according to the type and
position of the command.  Some policies will affect the whole document,
others will only affect the document from that point onwards... it depends
on the nature of the particular policy.  See the [pol man] for details.

For example placing

$_$_CHANGE_POLICY background colour : #FF0000
$_$_CHANGE_POLICY text colour : White

will change the document background colour to be red, and the text to be
white throughout the whole document.


2.4 Making the program run faster

You can make the program run faster in a number of ways by disabling
features that you know you don't want.

2.4.1 Review the "look for" options

As of V3.1, AscToHTM has a number of "look for" options, stating what the
program is looking for.  Disable the ones you don't want, although most of
them will not make a major difference to the program's speed.

2.4.2 Don't convert URLs

Probably the single most expensive function is the search for URLs to
convert into hyperlinks.  Every word (and every word fragment) has to be
checked individually.  The problem isn't helped by having to distinguish
URLs with commas in them from comma-separated lists of URLs.

If you know your document has *no* URLs to be converted, disable this
feature and watch the software run 10-20% faster.  However, this is one of
the features of the software that people like most.

2.4.3 Don't generate tables

The software will attempt to convert regions of pre-formatted text into
tables.  This can take a lot of analysis even if eventually it decides "it's
not a table after all!".
This only comes into effect if the program detects preformatted text, so you
should only disable this feature if your pre-formatted text is largely
non-tabular.  If that's the case you probably want to disable it anyway, as
the tables created may be inappropriate.


3.0 Conversion Questions
========================

3.1 General

3.1.1 How do I get rid of the "nag" lines?

Easy.  You register the software (see [[GOTO registration and updates]]), or
you remove them by hand.

"Nag" lines only appear in unregistered trial copies of the software.  If
you register, these are removed.

3.1.2 Why doesn't it convert Word/Wordperfect/RTF/my favourite wp documents?

Because it wasn't designed to.

The software is designed to convert *ASCII* text into HTML.  That is plain,
unformatted documents.  Word and other wp packages use binary formats that
contain formatting codes embedded in the text (or in some cases the text is
embedded in the codes :-).  Even RTF, which is a text file, is so heavily
full of formatting information that it could not be perceived as normal text
(look at it in Notepad and you'll soon see what I mean).

Why the omission?  Well, like I said, that was *never* the intention of this
program.  I always took the view that, in time, the authors of those wp
packages would introduce "export as HTML" options that would preserve all
the formatting, and in general this is what has happened.

To my mind writing such a program is "easy".  My software tackles the much
more difficult task of inferring structure where none is explicitly marked.
In other words trying to "read" a plain text file and to determine the
structure intended by the author.

3.1.3 My file has had its case changed and letters replaced at random by numbers.  How do I fix that?

Easy.  You register the software (see [[GOTO registration and updates]]).

The case is only adjusted in unregistered trial copies of the software,
either after the line limit is reached, or after the 30 day trial has
expired.  The case is adjusted so that you can still evaluate whether the
conversion has produced the right type of HTML, but since the text is now
all in the wrong case and has had letters substituted, the HTML is of little
use to you.  This is intended as an incentive to register.

That said, you *will* find pages on the web that have been converted in this
manner.

3.1.4 Why do I sometimes get <DL> markup?  How do I stop it?

The program is detecting a "definition".  Definitions are usually keywords
with a following colon ":" or hyphen "-", e.g. "text:"

You can see this more easily if you go to Output-Style and toggle the
"highlight definition term" option... the definition term (to the left of
the definition character) is then highlighted in bold.

If the definition spreads over 2 "lines", then a definition paragraph is
created, giving the effect you see.  If you have created your file using an
editor that doesn't output line breaks then only long paragraphs will appear
to the program as 2 or more "lines".  In such cases only the longer
paragraphs will be detected as "definition paragraphs"; the rest are
detected as "definition lines", even though they're displayed in a browser
as many lines.  If you view the file in NotePad you'll see how the program
sees it.

To stop this you have two options.

(1) _Analysis policies -> Analysis -> recognize colon (:) characters_

    Switch this off.  This will stop anything being recognized as a
    definition.

(2) _Output policies -> Style -> Use <DL> markup for paragraphs_

    Disable this.  The definitions will still be recognized, but the <DL>
    markup won't be used.

3.1.5 Why are some of my words being broken in two?

Sometimes AscToHTM will produce HTML with words broken - usually over two
lines.  This can happen if your text file has been edited using a program
(like NotePad) that doesn't place line breaks in the output.

AscToHTM is line-orientated (see 2.1.2).  Programs like NotePad place an
entire paragraph on a single "line", or on lines of a fixed length (e.g.
1000 characters).

AscToHTM places an implicit space at the end of each line it reads.  This is
to ensure you don't get the word at the end of one line merged with that at
the start of the next.  However, in files with fixed length "lines", large
paragraphs will be broken arbitrarily, with the result that a space (and
possibly a <BR>) will be inserted into the middle of a word.

You can avoid this by breaking your text into smaller paragraphs, passing
your file through an editor that wraps differently prior to conversion, or
selecting any "save with line breaks" option you have.

3.1.6 Why am I getting line breaks in the middle of my text?

The software will add a line break to "short" lines, or - sometimes - to
lines with hyperlinks in them.  You can edit your text to prevent the line
being short, or you can use policies to alter the calculation of short
lines.  Use the [Pol man] to read about the following policies

    - Add <BR> to lines with URLs
    - Look for short lines
    - Short line length
    - Page Width

3.1.7 Why isn't the software preserving my line structure?

Do you mean line structure, or do you really mean paragraph structure?

The program looks for "short lines".  Short lines can mark the last line in
a paragraph, but more usually indicate an intentionally short line.  The
calculation of what is a short line and what isn't can be complex, as it
depends on the length of the line compared to the estimated width of the
page.

You have a number of options :-

    - disable the search for short lines using the
      _Analysis policies -> What to look for_ tab

    - explicitly set the page width and/or short line length using the
      _Analysis policies -> analysis_ tab.

If you really want the line structure (as opposed to the paragraph
structure) preserved, look at the line and file structure policies under
_Output -> File generation_

See also [[GOTO how do I preserve one URL per line?]]

3.1.8 Why am I getting lots of white space?

Usually because you had lots of white space in your original document.  If
that is the case, then you can set the policy

      Ignore multiple blank lines : Yes

to reduce this effect.

Some people complain that there are blank lines between paragraphs, or
between changes in indentation.  Often this is the vertical spacing inserted
by default in HTML.  This can only be controlled in later versions of HTML
which support [[TEXT HTML 4.0]] and Cascading Style Sheets (CSS).

Occasionally certain combinations of features lead to an extra line of
space.

3.1.9 What's the largest file anyone's ever converted with AscToHTM?

Well, at time of writing, I know of a 56,000 line file (3Mb) which was
converted into a single (4Mb) HTML file.  Of course, it was also converted
into a suite of 300 smaller, linked files weighing in at 5Mb of HTML.

This file represented 1,100 pages when printed out.  I *do* sometimes wonder
if anyone ever reads files that big though.

3.1.10 Does the software support Hebrew letters / Japanese / Right to Left Alignment?

The short answer is "no".  Certainly the software has no ability to
*understand* documents written this way.

The software is designed to cope with the ASCII character set, and these all
use an alternative character set.  Since the program will convert character
codes outside the printable range into HTML entities, it's likely that the
software will mis-interpret the input characters and convert them into HTML
entities that will be wrong for the initial file type.

The best you could ever hope for is that the software would leave any
special characters alone.  To this end the software does have a policy
"input file contains Japanese characters" which can be used to warn that the
file contains oriental characters (not just Japanese), and that therefore
these characters should be left unmolested.  To date this function is
largely untested.

3.1.11 Why does the program hang after a conversion?

Under Windows the software usually tries to display the results files in
your browser or viewer of choice.  To prevent multiple instances of the
browser being launched, DDE is used.  DDE is a Windows mechanism that allows
requests to be passed from one program to another; in this case the software
is asking the browser to display the HTML just created.

Some users have reported problems with DDE.  When this occurs any program -
including AscToHTM - will hang whenever it attempts to use DDE.  When this
happens you will need to use the Task Manager to kill the program.
You can solve this problem by using _Settings -> Viewers for results_ to
disable the use of DDE.  From version 4 onwards the software will detect
when this has happened, and will disable DDE next time it is run.

Note, this is a workaround and not a solution.  When DDE stops working on
your system other programs will have problems, e.g. when you click on a
hyperlink inside your email client.

Sadly I don't know a solution for the DDE problem.  Sometimes rebooting
helps, sometimes stopping applications helps.  Sometimes it doesn't. :-(

3.1.12 How can I use DDE with Netscape 6.0?

You can't.  Unlike Netscape versions up to and including 4.7, Netscape 6.0
doesn't support DDE in its initial release under Windows.

3.1.13 Can I use AscToHTM to build a web site with a shopping cart?

By itself, no.  AscToHTM can only really produce relatively "static",
mostly-text web pages.  To add any dynamic contents and graphics you'd
effectively need to add the relevant HTML yourself, so the answer is
essentially "no".

Adding a shopping cart is actually fairly tricky.  You either have to
install the software yourself, or sign up with an ISP that will do this for
you.  Most such systems require a database (of items being sold).

Having not dealt much with such systems myself I can't really advise on a
*web authoring* tool (which is what AscToHTM is) that would integrate
seamlessly with a shopping cart system.  My advice would be to identify an
ISP that offers shopping cart functionality and see what methods they offer
for web authoring.

I wish you luck.


3.2 Tables

3.2.1 How does the program detect and analyse tables?

Here's an overview of how the software works.  This'll give you a flavour of
the issues that need to be addressed.

The software first looks for pre-formatted regions of text.  It does this by

 1) Spotting lines that are clearly formatted, looking for large white space
    and any table-like characters like '|' and '+'.  It may also look for
    code-like lines and diagram-like lines according to the policies set.

 2) Each time a heavily formatted line is encountered an attempt is made to
    extend the preformatted region by "rolling it out" to adjacent, not so
    clearly formatted lines.

 3) This "roll out" process is stopped whenever it encounters a line that is
    clearly not part of the formatted region.  This might be a section
    heading or a set of multiple blank lines (the default is 2).

Once a preformatted region is identified, analysis is performed to see
whether this is a table, diagram, code sample or something else.  This
decision depends on

 4) The mix of "graphics" characters as opposed to "text" characters.

 5) The presence of "code-like" indicators like curly brackets, semi-colons,
    "++" and other special character sequences.

 6) How well the data can be fitted into columns of a table (below).

If nothing fits then this text is output "as normal", except that the line
structure is preserved to hopefully retain the original meaning.

If the software decides a table is possible, it

 7) Characterizes the contents of each character position.  So for example a
    character position that contains mostly blank characters on each line is
    a good candidate for a column boundary.
 8) Infers from the character positions the likely column boundaries.

Once a tentative set of column boundaries has been identified, the following
steps are repeated

 9) Place all text into cells using the current column boundaries.

10) Measure how "good a fit" the text is to the columns, looking for values
    that span column boundaries, or columns that are mostly "empty".

11) Eliminate any apparently "spurious" columns.  For example "empty"
    columns may get merged with their neighbours.

Finally, having settled on a column structure, the software

12) Tries to identify the table header, preferably by detecting a horizontal
    line near the top of the table.

13) Tries to work out column alignments etc.  If the cell contents are
    numeric the cell will be right aligned, otherwise the placement of the
    text compared to the detected boundaries will be observed.

14) Identifies how many lines go into each row.  If blank lines or
    horizontal rules are present, these may be taken as row boundaries.

15) Places all text into cells, using the configuration found.

Naturally any one of these steps can go wrong, leading to less than perfect
results.  The program has mechanisms (via policies and preprocessor
commands) to

 a) Influence the attempt to look for tables

 b) Influence the attempt to extend tables (steps (1)-(3))

 c) Influence the decision as to what a preformatted region is
    (steps (4)-(6))

 d) Influence the column analysis (steps (7)-(11))

 e) Influence the header size and column alignment (steps (12)-(15))

3.2.2 Why am I getting tables?  How do I stop it?

The software will attempt to detect regions of "pre-formatted" text.  Once
detected it will attempt to place such regions in tables, or if that fails
sometimes in <PRE>...</PRE> markup.

Lines with lots of horizontal white space or "table characters" (such as
"|", "-", "+") are all candidates for being pre-formatted, especially where
several of these lines occur.  This often causes people's .sigs from email
to be placed in a table-like structure.

You can alter whether or not a series of lines is detected as preformatted
with the policies

      Look for preformatted text        : No
      Minimum automatic <PRE> size      : 4

The first disables the search for pre-formatted text completely.  The second 
policy states that only groups of 4 or more lines may be regarded as 
preformatted.  That would prevent most 3-line .sigs being treated that way.

If you have pre-formatted text, but don't want it placed in tables (either
because it's not tabular, or because the software doesn't get the table analysis
quite right), you can prevent pre-formatted regions being placed in tables via
the policy

      Attempt TABLE generation          : No
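
As a per-table alternative to switching these policies off globally, a block
that keeps being table-ised (an email .sig, say) can be explicitly bracketed
using the diagram directives from [[GOTO delimiting tables, diagrams etc]],
so that it should be kept as pre-formatted text rather than being turned
into a table.  A sketch, following the form shown in that section:

$_$_BEGIN_PRE
$_$_BEGIN_DIAGRAM
... your .sig or other non-tabular lines ...
$_$_END_DIAGRAM
$_$_END_PRE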


3.2.3 Why am I _not_ getting tables?

First read [[GOTO how does the program detect and analyse tables?]] for an 
overview of how tables are detected.

If you're not getting tables this is either because they are not being
detected, or because, having been detected, they are deemed not to be
"table-like".  Look
at the HTML code to see if there are any comments around your table indicating
how it's been processed.

If the table is not being detected this could be because

    - the lines don't look table-like.  Try increasing the white space, or 
      adding a vertical bar '|' as your column separator.
	  
    - some lines are table-like, but the "roll out" isn't including the adjacent
      less formatted lines.  Try changing the policy *Table extending factor*
	  	
    - The detected "table" is too small compared to the value in the policy
      *Minimum automatic <PRE> size*.
	  	
If all this fails, edit the source to add preprocessor commands around the table
as follows

$_$_BEGIN_PRE
	$_$_BEGIN_TABLE
	...
	...(your table lines)
	...
	$_$_END_TABLE
$_$_END_PRE


3.2.4 Why do my tables have the wrong column structure?

First read [[GOTO how does the program detect and analyse tables?]] for an introduction
to how table columns are analysed.

The short answer is "the analysis went wrong".  Answering *why* it went
wrong in a general way is almost impossible.  Some things to consider

    - Was the table extent correctly calculated?  If adjacent lines were
      wrongly sucked into the table this will affect the analysis.  Try
      adding blank lines around the table, adjusting the "Table extending factor"
      policy, or adding BEGIN_TABLE/END_TABLE preprocessor tags to correct any
      errors in calculating the extent.

Often the table extent is correct, but the analysis of the table has gone 
wrong.

    - Check the text doesn't mix tabs and spaces together in an inconsistent
      manner.  Either set the "Tab size" policy, or replace all tabs by spaces.

    - Look to see if the blanks in some data just "happen" to line up.  In some
      small tables this can happen.  Consider adjusting the 
      "Minimum column separation" policy to a value greater than 1.

    - Consider adjusting the "Column merging factor" policy to reduce/increase
      the number of columns produced for the table.

If all this fails you can explicitly *tell* the software what the table layout is,
by using either the TABLE_LAYOUT preprocessor command, or the "Default TABLE 
layout" policy.  Only use the policy if all tables in the same source file have
the same layout.
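
For example, a hand-edited policy file that tightens up the column analysis
for a tab-formatted table might contain something like the following sketch
(the values shown are purely illustrative; see the [pol man] for what each
policy accepts):

      Tab size                   : 4
      Minimum column separation  : 2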


3.2.5 Where did all my table lines go?

The software removed them because it thought they would look wrong as 
characters. The lines are usually replaced by a non-zero BORDER value 
and/or some <HR> tags placed in cells.

3.2.6 How can I get the program to recognize my table header?

One tip.  If you insert a line of dashes after the header like so...

$_$_BEGIN_PRE
    Basic Dimensions
 Hole No.       X         Y
 -------------------------
    1        3.2500    5.0150
    2        1.2500    3.1250
  etc.....
$_$_END_PRE

The program *should* recognize this as a heading, and modify the HTML
accordingly (placing it in bold).

Alternatively you can tell the program (via the policy options or
preprocessor commands) that the file has 2 lines of headers.

3.2.7 Why am I getting strange COLSPAN values in my headers?

(see the example table in 3.2.6)

The spanning of "Basic Dimensions" over the other lines can be hit and miss.
Basically if you have a space where the column gap is expected, the text
will be split into cells; if you don't, then the text will be placed in a
cell with a COLSPAN value that spans several cells.

For example

$_$_BEGIN_PRE
         | space aligns with column "gap"
         v
    Basic Dimensions
 Hole No.       X         Y
 -------------------------
    1        3.2500    5.0150
    2        1.2500    3.1250
  etc.....
$_$_END_PRE

In this case you'd get "Basic" in column 1 and "Dimensions" spanning columns
2 and 3.  If you edit this slightly as follows then the "Basic Dimensions"
will span all 3 columns

$_$_BEGIN_PRE
               | space no longer aligns with column "gap"
               v
          Basic Dimensions
 Hole No.       X         Y
 -------------------------
    1        3.2500    5.0150
    2        1.2500    3.1250
  etc.....
$_$_END_PRE

It's a bit of a black art.  Sometimes when the table is wrong, it's a good
idea to set the BORDER size to 0 (again via the policy options) to make
things look not so bad.  It's a fudge, but a useful one to know.


3.3 Headings

3.3.1 How does the program _recognize_ headings?

The program can attempt to recognize five types of headings:

    *Numbered headings*.  These are lines that begin with section numbers.
    To reduce errors, numbers must be broadly in sequence and headings at
    the same level should have the same indentation.  Words like "Chapter"
    may appear before the number, but may confuse the analysis when present.

    *Capitalised headings*.  These are lines that are ALL IN UPPERCASE.

    *Underlined headings*.  These are lines which are followed by a line
    consisting solely of "underline" characters such as underscore, minus,
    equals etc.  The length of the "underline" line must closely match the
    length of the line it is underlining.

    *Embedded headings*.  These are headings embedded as the first sentence
    of the first paragraph in the section.  The heading will be a single
    all-UPPERCASE sentence.  Unlike the other headings, the program will
    place these as bold text, rather than using heading markup.  You will
    need to manually enable the search for such headings; it is not enabled
    by default.

    *Key phrase headings*.  These are lines in the source file that begin
    with user-specified words (e.g. "Chapter", "Appendix" etc.)  The list of
    words and phrases to be spotted is case-sensitive and will need to be
    set via the "Heading key phrases" policy.

The program is biased towards finding numbered headings, but will allow for
a combination.  It's quite possible for the analysis to get confused,
especially when

    - headings are centred, rather than at fixed indents.  The policy
      "Check indentation for consistency" should be disabled if this is the
      case.

    - headings include the words Chapter, Part etc.  You should consider
      using the "Heading key phrase" policy and disabling the search for
      numbered headings in such cases.

    - The numbering system repeats (e.g. Part I, 1,2,3,... Part II, 1,2,3...).
      Again, consider using "key phrase" and/or underlined heading detection
      as an alternative.

    - The file has numbered lists at a similar indentation to the numbered
      sections.  If possible move your numbered lists a few characters to
      the right of the indentation that headings are expected at.

    - The file has a large number of capitalised non-heading lines.
      Manually disable the search for capitalised headings if this happens.

    - The numbering system is "exotic" (e.g. II.3.g)

To tell if the program is correctly detecting the headings

 a) Look at the HTML to see if <H1>, <H2> etc. tags are being added to the
    correct text.

 b) If the headings are wrong, check the analysis policies are being set
    correctly by looking at the values shown under
    _Conversion Options -> Analysis policies -> headings_ after the
    conversion.

Depending on what is going wrong do one or more of the following :-

  i) Adjust the headings policy (e.g. to disable capitalised headings)

 ii) Edit the source to replace centred headings by headings at a fixed
     indentation.

iii) Edit the source so that numbered lists are at a different indentation
     to numbered sections.

 iv) If your numbering system is too exotic, edit your source so that all
     the headings are "underlined" and get the program to recognize
     underlined, rather than numbered, headings.

  v) If possible consider the use of the "Heading key phrase" policy
     instead.

3.3.2 Why are my headings coming out as hyperlinks?

This is a failure of analysis.  The program looks for a possible contents
list at the top of the file before the main document (sometimes in the first
section).  If your file has no contents list, but the program wrongly
expects one, then as it encounters the headings it will mark these up as
contents lines.

To prevent this, set the analysis policy

      Expect contents list : No

Or add a preprocessor line to the top of your file as follows

$_$_BEGIN_PRE
$_$_CHANGE_POLICY Expect contents list : No
$_$_END_PRE

3.3.3 Why are the numbers of my headings coming out as hyperlinks?

Either a failure of analysis, or an error in your document.

The software checks headings "obey policy" and are in sequence.  If you get
your numbering sequence wrong, or if you place the heading line at a
radically different indentation to all the others, then the software will
reject this as a heading line, in which case the number may well be turned
into a hyperlink.

If it's an error in your document, fix the error.

For example, a common problem is numbered lists inside sections.  If the
list numbers occur at the same level of indentation as the level 1 section
headings, then eventually a number on the list will be accepted as the next
"in sequence" header.  For example in a section numbered 3.11, any list
containing the number 4 will have the "4" treated as the start of the next
chapter.  If section "3.12" is next, the change in section number from 4
will be rejected as "too small", and so all sections will be ignored until
section 4.1 is reached.

The solution here is to edit the source and indent the numbered list so that
it cannot be confused with the true headers.  Alternatively change it to an
alphabetic, roman numeral or bulleted list.

Another possible cause is if the software hasn't recognized this level of
heading as being statistically significant (e.g. if you only have 2 level 4
headings (n.n.n.n) in a large document).  In this case you'll need to
correct the headings policy, which is a sadly messy affair.

3.3.4 Why are various bullets being turned into headings, and the headings ignored?

The software can have problems distinguishing between

      1 This is chapter one

and

      1) This is list item number one.

To try and get it right it checks the sequence number, and the indentation
of the line.  However problems can still occur if a list item with the right
number appears at the correct indentation in a section.  If possible, try to
place chapter headings and list items at different indentations.

In extreme cases, the list items will confuse the software into thinking
they are the headings.
In such a case you'd need to change the policy file to say what the headings
are, with lines of the form

      We have 2 recognized headings
      Heading level 0 = "" N at indent 0
      Heading level 1 = "" N.N at indent 0

(this may change in later versions).

3.3.5 Why are lines beginning with numbers being treated as headings?

The software can detect numbered headings.  Any lines that begin with
numbers are checked to see if they are the next heading.  This check
includes checking the number is (nearly) in sequence, and that the line is
(nearly) at the right indentation.

If the line meets these criteria, it is likely to become the next heading,
often causing the *real* heading to be ignored, and sometimes completely
upsetting the numbering sequence.

You can fix this by editing the source so that the "number" either occurs at
the end of the previous line, or has a different indentation to that
expected for headings.

3.3.6 Why are underlined headings not recognized?

The software prefers numbered headings to underlined or capitalised
headings.  If you have both, you may need to switch the underlined headings
on via the policy

      Expect underlined headings : Yes

3.3.7 Why are only _some_ of my underlined headings not recognized?

If the program is looking for underlined headings (see 3.3.6) then the only
reason for this is that the "underlining" is of a radically different length
to the line being underlined.  Problems can also occur for long lines that
get broken.

Edit your source to

    - place the whole heading on one line

    - make the underlining the *same* length

3.3.8 How do I control the header level of underlined headings?

The level of heading associated with an underlined heading depends on the
underline character as follows:-

      '****'                   level 1
      '====','////'            level 2
      '----','____','~~~~'     level 3
      '....'                   level 4

The actual *markup* that each heading gets may depend on your policies.  In
particular level 3 and level 4 headings may be given the same size markup to
prevent the level 4 heading becoming smaller than the text it is heading.
However the _logical_ difference will be maintained, e.g. in a generated
contents list, or when choosing the level of heading at which to split large
files into many HTML pages.

3.3.9 Why are only the first few headings working?

A couple of possible reasons :-

    - a numbered list is confusing the software.  This is the same problem
      as [[GOTO why are the numbers of my headings coming out as hyperlinks?]]

    - Some of your headings are "failing" the checks applied.  See the
      discussion in [[GOTO how does the program recognize headings?]]

One of the reasons for "failure" is that - for consistency - headings must
be in sequence and at the same indentation.  This is an attempt to prevent
errors in documents that have numbers at the start of a line by chance being
treated as the wrong headings.

If some headings aren't close enough to the calculated indent then they
won't be recognised as headings.  If a few headings are discarded then later
headings that *are* at the correct indentation may be discarded as being
"out of sequence".

If you're authoring from scratch then the easiest solution is to edit all
the headings to have the same indent.  Alternatively disable the policy
"Check indentation for consistency".


3.4 Hyperlinks

3.4.1 Why doesn't it correctly parse my hyperlinks?

The software attempts to recognize all URLs, but the problem is that -
especially near the end of the URL - punctuation characters can occur.
The software then has difficulty distinguishing a comma-separated list of
URLs from a URL with a series of commas in it (as beloved at C|Net).

This algorithm is being improved over time, but there's not much more you
can do than manually fix it, and report the problem to the author, who will
pull out a bit more hair in exasperation :)

3.4.2 Why doesn't it recognize my favourite newsgroup?

To avoid errors the program will only recognize newsgroups in the "big 7"
hierarchies.  Otherwise filenames like "command.com" might become unwanted
references to fictional newsgroups.

This means that uk.telecom won't be recognized, although if you place
"news:" in front of it like this

      news:uk.telecom

then it is recognized.

If you want to make "uk." recognized as a valid news hierarchy, then set the
policy

      recognized USENET groups : uk

Then any word beginning "uk." may become a newsgroup link.

3.4.3 Why are only some of my section references becoming hyperlinks?

The program will only convert numbers that match known numbered sections
into hyperlinks.

If the number is a genuine section heading, then the chances are that this
level of heading has not been detected.  This has happened in large
documents which contained only 2 level 5 headings.  In such documents you
may need to manually add the extra level to your policy file.

Another limit is that the program won't convert level 1 heading references,
because the error rate is usually too high.  For example if I say "1, 2, 3"
it's unlikely I want these to become hyperlinks to chapters 1, 2 and 3.

3.4.4 Why are some numbers becoming hyperlinks?

In a numbered document, numbers of the form n.n may well become hyperlinks
to that section of the document.  This can cause "Windows 3.1" to become a
hyperlink to section 3.1 if such a section exists in your document.

You can either insert some character (such as "V" to make "V3.1"), or
disable this feature via the policy

      Cross-refs at level : 3

(which means only "level 3" headings such as n.n.n will be turned into
links), or

      Cross-refs at level : (none)

which should disable the behaviour entirely.

3.4.5 Why are some long hyperlinks not working?

The software will sometimes break long lines to make the HTML more readable.
If this happens in the middle of a hyperlink, the browser reads the end of
line as a space in the URL.

You can fix this by editing the output text so that the HREF="" part of the
file is all on the same line.  This "feature" may be fixed in later versions
of AscToHTM.

3.4.6 How do I preserve one URL per line?

Some files contain lists of URLs, with one URL per line.  By default the
software will not normally preserve this structure because long lines are
usually concatenated into a single paragraph.

You can change this behaviour using the option on the
_Output policies -> Hyperlinks_ policy sheet.

See also [[GOTO why isn't the software preserving my line structure?]]


3.5 Policy files

3.5.1 How many policies are there?  Where can I read more about individual policies?

First time I looked it was nearly 200; recently the number is approaching
250.  They kind of sneak up on you, I guess.  The [Pol man] gives a pretty
comprehensive description of what each one does and where it can be found.

People complain that there are too many policies, but then they say
"couldn't you add an option to ...", and so it goes.  Organizing these
policies in a logical manner is a fairly difficult problem, and if anyone
has any bright ideas I'm listening.


3.5.2 My policy file used to work, but now it doesn't. Why?

Make sure you're using an "incremental" policy file, rather than a full one.
You can do this by viewing the .pol file in a text editor.  An "incremental"
policy file will only contain lines for the policies you've changed.  A full
policy file will contain all possible policies.

If you load a "full" policy file you prevent the program intelligently
adjusting to the particular file being converted.  If this happens either
edit out the lines you don't want from your policy file, or reset the
policies to their defaults and create a new policy file from scratch.
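
For illustration, a small "incremental" policy file might contain just a
handful of lines such as these (all policies discussed elsewhere in this
FAQ):

$_$_BEGIN_PRE
	Expect underlined headings : Yes
	Cross-refs at level : (none)
	recognized USENET groups : uk
$_$_END_PRE

A "full" policy file, by contrast, would contain a line for every policy the
program has - currently a couple of hundred of them (see 3.5.1).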


3.5.3 xxxx Policy is not taking effect. What shall I do?

(see 1.7)


3.6 Bullets and lists

3.6.1 Why is the indentation wrong on follow-on paragraphs?

The program can't distinguish between indented paragraphs and paragraphs that
are intended as follow-on paragraphs from some bullet point or list item.

This means that whilst the first paragraph (the one with the bullet point) is
indented as a result of being placed inside appropriate list markup, the
second and subsequent paragraphs are just treated as indented text.

The bullet point will be indented one level deeper than the text position of
the bullet.  The follow-on paragraph will be indented according to its own
indentation.  Ideally this will be one level deeper than the text position of
the bullet.  Occasionally the two result in different indentations.

The solutions are either to

a) Review your *indent position(s)* policy with a view to adjusting the
   values to give the right amount of indentation to the follow-on
   paragraphs.  Sometimes adding an extra level to match the indentation of
   the follow-on paragraph is all that's necessary.

b) Edit your source text slightly, adjusting the indent of either the list
   items or follow-on paragraphs until the two match.
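
To illustrate the situation (the text here is invented), a bullet point with
a follow-on paragraph might look like this in the source, with the first
paragraph placed inside list markup and the second simply treated as indented
text:

$_$_BEGIN_PRE
	- this is a bullet point that introduces some topic

	  this is a follow-on paragraph belonging to the same bullet, but
	  which the program treats as ordinary indented text
$_$_END_PRE

Solution (b) then amounts to nudging the indent of the list item or of the
follow-on paragraph until the two come out at the same depth.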


3.6.2 Why is the numbering wrong on some of my list items?

HTML doesn't allow the numbering to be marked up explicitly.  Instead you can
only use a START attribute in the <OL> tag to get the right first number,
which is incremented each time a <LI> tag is seen.

Some browsers don't implement the START attribute, and so they always restart
numbering at 1.  I've also seen a bug in Opera V3.5 where any tag placed
between the <OL> and the <LI> causes the numbering to increment.

There's not much I can do about either problem.


3.6.3 Some of my text has gone missing. What happened?

There's a bug (in Opera) where a tag between the <OL> and the <LI> tag causes
all of that text to not be displayed.

If there's any other problem of this sort email support@jafsoft.com.


3.7 Contents List generation

3.7.1 How do I add a contents list to my file?

There are a number of ways:-

- If the file already has a contents list this may be detected if the
  sections are numbered, and the contents lines will be turned into links to
  the sections concerned.

- You can force the addition of a contents list using the policies under
  the menu at _Conversion Options -> Output Policies -> Contents List_

A hyperlinked contents list will be generated from the headings that the
program detects.  This list will be placed at the top of the first file.

If you don't want the generated list to be placed at the top of the file,
insert the preprocessor command $_$_CONTENTS_LIST at the location(s) you
want.  This command takes arguments that allow a limited number of formatting
options.
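
For example, to place the generated list after an introductory paragraph
rather than at the top of the file, the source might read (the surrounding
text is invented):

$_$_BEGIN_PRE
	This document describes the new procedures.  The sections below
	can be reached from the following contents list.

	$_$_CONTENTS_LIST
$_$_END_PRE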


3.7.2 Why doesn't my contents list show all my headings?

First read [[GOTO how does the program recognize headings?]].

If you're generating a contents list from the observed headings, then any
missing headings are either because

a) The program didn't recognize the headings

b) The policy *Maximum level to show in contents* has been set to a value
   that excludes the desired heading.

If you're converting an in-situ contents list, then only (a) is likely to
apply, in which case you need to ensure the program recognizes your headings.


3.7.3 Some of my contents hyperlinks don't work!

There used to be a problem whereby the software would add hyperlinks to
sections that didn't exist, or would point to the wrong file when a large
file was being split into many smaller files.

Both problems should now be fixed, so if you encounter this problem, contact
support@jafsoft.com.


3.8 Emphasis

3.8.1 Why didn't my emphasis markup work?

Emphasis markup can be achieved by placing asterisks (*) or underscores (_)
in pairs around words or phrases.  The matching pair can be over a few lines,
but cannot span a blank line.  Asterisks and underscores can be nested.

Asterisks generate *bold markup*, underscores generate _italic markup_, and
combining these generates _*bold, italic markup*_.

If you wrap a phrase in underscores, and replace all the spaces by
underscores [[TEXT _like_this_]] then the result will be underlined
_like_this_ and not in italics.

The algorithm copes reasonably well with normal punctuation, but if you use
some unanticipated punctuation, it may not be *recognized*!&%@!

You can have a _phrase that spans a couple of lines that contains *another
phrase of a different type* in the middle of it_, but you can't have two
phrases of the same type nested that way.  Be reasonable :-)


3.9 Link Dictionary

3.9.1 What is the Link Dictionary?

The link dictionary allows you to add hyperlinks to particular words or
phrases.  You can choose the phrase to be matched, the text to be displayed
and the URL to be linked to.

This can help when building a site by converting multiple text files.  For
example the whole www.jafsoft.com site is built from text files, and
extensive use of a link dictionary is made to add links from one page to
another.


3.9.2 My links aren't coming out right. Why?

Known problems include

- if the "match text" matches part of the URL the program may get confused.
  Try to keep them different.

- if the "match text" of one link is a substring of another the program will
  get confused

- if a link is repeated on the same line only the first occurrence is
  converted (fixed post V3.0)

- if the "match text" spans two lines it won't be detected.

One tip is to place brackets round the [match text] in your source file...
this not only makes the chances of a false match less likely, but also makes
it clearer in the source files where the hyperlinks will be.
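
As a sketch (the link name and URL here are invented), the bracketed source
text and its matching entry - using the link definition syntax shown in 3.9.3
below - might look like this:

$_$_BEGIN_PRE
	(in the source text)   For details see the [pricing page] on our web site.

	(in the policy file)   Link definition: "[pricing page]" = "pricing page" + "http://www.example.com/prices.html"
$_$_END_PRE

Here the brackets make it unlikely that the phrase "pricing page" will be
matched by accident elsewhere in the text.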


3.9.3 I can't enter links into the Link Dictionary. What gives?

The Link Dictionary support in the Windows version of the software is a
little quirky.  Apologies for that.  The way it should work is that you click
on the "add new link definition" button.  I realize now that this is
counterintuitive, and will probably address this in the next release.

If you save your policy, each link appears as a line of the form

	Link definition: "match text" = "display text" + "URL"

e.g.

	Link definition: "jaf" = "John Fotheringham" + "mailto:support@jafsoft.com"

The whole definition must fit on one line.  You may find it easier to open
your .pol file in a text editor and add these by hand.


3.10 Batch conversion

For more information see the section "Processing several files at once" in
the main documentation.

The software supports wildcards, and console versions are available to
registered users which are better suited for batch conversions.

In the shareware versions no more than 5 files may be converted at once.
This limit is absent in the registered version (see
[[GOTO what's the most files I can convert at one go?]]).


3.10.1 How do I convert a few files at once?

If you only want a few files converted, then the simplest way is to drag and
drop those files onto the program.  You can either drag files onto the
program's icon on the desktop, or onto the program itself.

If you drag files onto the program's icon there is a limit with this approach
of around 10 files.  This limit arises because the filenames are concatenated
to make a command string, and this seems to have a Windows-imposed limit of
255 characters.  This problem may be solved in later versions.  The same
limit doesn't seem to apply when you drag files onto the open program.

Alternatively you can browse to select the files you want converting.


3.10.2 How do I convert _lots_ of files at once?

If you want to convert many files in the same directory, then just type a
wildcard like "*.txt" into the field naming the files to be converted.

Registered users can get a console version of the software.  This can accept
wildcards on the command line, and is more suited for batch conversion, e.g.
from inside windows batch files (for example it won't grab focus when
executed).

If you want to convert many files in different directories, either invoke the
console version multiple times using a different wildcard for each directory,
converting one directory at a time, or investigate the use of a steering
command file when running from the command line.  See the main documentation
for details.
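
As an illustration only - the console program's actual name and invocation
are described in the documentation supplied with it - converting every .txt
file in a directory might look something like

$_$_BEGIN_PRE
	asctohtm *.txt
$_$_END_PRE

with the wildcard picking up every matching file in that directory.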


3.10.3 How do I interrupt a conversion?

At present you can't.  The windows version won't respond to user input while
a conversion is in progress, meaning that its windows will not refresh.
Normally this isn't a problem, but in large conversions this can be a little
disconcerting.  Fixing this is on the "to do" list.


3.10.4 What's the most files I can convert at one go?

The largest number of files converted at one time using the wildcard function
was reported to be around 2000.  A week later someone contacted me with
around 3000 files to be converted.  A few weeks after that someone was
claiming 7000.  If you'd like to claim a higher number, let me know.

Theoretically the only limit is your disk space.  The program operates on a
flat memory model so that the memory used is largely independent of the
number of files converted, or the size of the files being converted.  Such
conversions are a testament to the program's stability and efficient use of
system resources.

That said, if possible, we recommend you break the conversion into smaller
runs to reduce your risks :-)


3.11 File splitting

3.11.1 Why isn't file splitting working for me?

The program can only split into files at headings it recognises (see
[[GOTO how does the program recognize headings?]]).  You first need to check
that the program is correctly determining where the headings are, and what
type they are.  Headings can be numbered, capitalised or underlined.

To tell if the program is correctly detecting the headings :-

a) Look at the HTML to see if <H1>, <H2> etc. tags are being added to the
   correct text.

b) If the headings are wrong, check the analysis policies are being set
   correctly.  If necessary set them yourself under

	_Conversion Options -> Analysis policies -> headings_

Once the headings are being correctly diagnosed, you can switch on file
splitting using the policies under

	_Conversion Options -> output policies -> file generation_

Note that the "split level" is set to 1 to split at "chapter" headings, 2 to
split at "chapter and major section" headings etc.

Underlined headings tend to start at level 2, depending on the underline
character (see 3.3.8).

Hopefully this will give you some pointers, but if you still can't get it to
work, please mail a copy of the source file (and any policy file you're
using) to support@jafsoft.com and I'll see what I can advise.


3.12 Miscellaneous questions

3.12.1 How do I suppress the Next/Previous navigation bar when splitting a large document?

Prior to version 4 there was a bug which meant the policy "Add navigation
bar" was being ignored when splitting files (the only time it was used).
This is now fixed.

However also available in version 4 is a new "HTML fragments" feature that
allows you to customize some of the HTML generated by the software.  This
includes the navigation bars so that, for example, if you wanted to suppress
just the top navigation bar, you could define the fragment NAVBAR_TOP to be
empty.

See [[GOTO customizing the HTML created by the software]] and the [Tag man]
for more details.
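
As a sketch, and assuming the fragment name NAVBAR_TOP mentioned above, the
fragment could be defined with an empty body in your source file:

$_$_BEGIN_PRE
	$_$_DEFINE_HTML_FRAGMENT NAVBAR_TOP
	$_$_END_BLOCK
$_$_END_PRE

See the [Tag man] for the exact fragment names and definition syntax.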


        3.12.2 Why am I getting regions of <PRE> text?
        
        The software attempts to detect pre-formatted text in your files and, when
        it finds some, attempts to turn these into tables.  In many cases having 
        detected some pre-formatted text it recognises that it cannot make a table 
        and so resorts to using <PRE>...</PRE> markup instead (in RTF it uses a
        courier font), giving a "mal-formed table" error message.  These <PRE>
        sections actually work quite well for some documents, but in other
        cases they would be better not handled this way.
        
        Happily the solution is simple.  On the menu go to
        
              _Conversion Options -> Analysis policies -> What to look for_
        
        and disable "pre-formatted regions of text".
        
        
        3.12.3 Do you have an HTML-to-text converter, RTF-to-HTML converter etc?
        
        No.
        
        My converters convert from *plain ASCII text* into HTML or RTF.  Their
        "unique selling point" is that they intelligently work out the structure
        of the text file.
        
        However *other* people provide other converters.
        
        There are a number of html->text converters and, on top of that, Netscape
        has a good "save as text" feature.  Or you can import the HTML into
        Word and use Word's save as text features (although in my opinion these
        are inferior to Netscape's).
        
        If you visit my ZDNet listing at http://www.hotfiles.com/?000M96 and click
        on the "related links" you'll see a number of converters listed.
        
        There are at least two RTF-to-HTML converters called RTF2HTML and RTFtoHTML
        and of course Word for Windows offers this capability (it doesn't suit 
        everyone though).
        
        In fact, here are three products:-
        
        RTFtoHTML can be found at http://www.sunpack.com/RTF/ [[BR]]
        RTF2HTML can be found at http://www.xwebware.com/products/rtf2html/ [[BR]]
        RTF-2-HTML can be found at http://www.easybyte.com/rtf2html.com
        
        
        4.0 Adding value to the HTML generated
        ======================================
        
        4.1 Adding Title and META tags
        
        There are policies that allow Title, Description and keywords to be added to 
        your pages.  
        
        The title will default to "Converted from" followed by the name of the
        source file, but a number of policies allow the title to be made to adopt
        the first section title, or any text that you provide.
        
        Alternatively you can use preprocessor commands embedded in the source file
        as follows
        
        $_$_BEGIN_PRE
        	$_$_TITLE This is my lovely HTML page
        	$_$_DESCRIPTION This page was converted from text
        	$_$_DESCRIPTION and this description was added using preprocessor
        	$_$_DESCRIPTION commands
        	$_$_KEYWORDS Converted, from, text
        $_$_END_PRE
        
        This approach is in many ways simpler, as it avoids the need for policy 
        files, and keeps all your source in one file.
        
        
        4.2 Adding Headers and Footers
        
        The software will allow you to add headers and footers to each file generated.
        
        You can do this either through policies or by defining some HTML fragments.
        
        The policies concerned are
        
        	HTML header file : c:\include\header.inc
        	HTML footer file : c:\include\footer.inc
        	
        The value is the name of the file to be used (you must supply a full or
        relative path so that the file may be located).
        
        Alternatively you can define the HTML fragments HTML_HEADER and HTML_FOOTER
        (see [[GOTO customizing the HTML created by the software]]). If both are 
        defined then the HTML fragments will be used.
        
        Whether defined by file or as an HTML fragment, these fragments will be 
        copied into each HTML page generated, after the <BODY> tag and before 
        the </BODY> tag respectively.
        
        If a large file is being split into many smaller HTML files, these headers
        and footers will be copied into *every* HTML file generated.  This is
        different to using
        an $_$_INCLUDE statement, which only gets executed once.
        
        These files can be useful to add a standard title in the header, and links to
        other parts of the site (home, contacts etc) in the footer, or whatever.
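
        As a sketch (the fragment content and filenames here are invented - yours
        would contain whatever HTML you want repeated on each page), a simple
        footer fragment might be defined in the source file as:

        $_$_BEGIN_PRE
        	$_$_DEFINE_HTML_FRAGMENT HTML_FOOTER
        	<HR>
        	<A HREF="index.html">Home</A> | <A HREF="contacts.html">Contacts</A>
        	$_$_END_BLOCK
        $_$_END_PRE

        The same HTML could equally be placed in a footer.inc file and pointed to
        by the "HTML footer file" policy shown above.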
        
        
        4.3 Adding Javascript
        
        There's a limit to how you can add JavaScript to a page generated from text.  
        That said the program will allow you to embed JavaScript (or indeed anything 
        else) into the <HEAD>...</HEAD> section of the document.  This is the 
        recommended location for including JavaScript as this ensures it is all read 
        before anything is drawn.
        
        The policy concerned is
        
        	HTML script file : ..\scripts\myscript.js
        	
        This should point to a file that contains all the scripting required.  The 
        program will simply copy this text into the header of each HTML page generated.
        
        For the JavaScript to have an effect, you may need to embed further HTML 
        into the body of the source text.  
        
        See [[GOTO how do I add my own HTML to the file?]].
        
        
        4.4 Adding colour/color
        
        A number of policies allow you to choose your document colours.  These can be
        found under the Windows menu
        
        	_Conversion Options -> Output policies -> Document colours_
        	
        and
        
        	_Conversion Options -> Output policies -> Tables_
        	
        All colours should be specified in HTML format, i.e. as 6-character hex values
        in the form rrggbb.  A few colours like "Red", "White" and "Black" may be
        entered by name.  Wherever possible the program will use the name so as to
        make the HTML more understandable.
        
        If you don't want *any* colours added to your HTML (not even the default white
        background) you can use the policy *Suppress all colour markup*.
        
        For a full list of colour policies, see the [Pol Man].
        
        
        4.5 Adding images to the HTML
        
        See [[GOTO how do I add my own HTML to the file?]] which includes an
        example which is used to add an image to the HTML version of this document.
        
        
        4.6 Adding hyperlinks to keywords and phrases
        
        Use the [[GOTO Link Dictionary]].
        
        
        4.7 Splitting large documents into sections
        
        The program can only split into files at headings it recognizes.  So first 
        you need to check that the program is correctly determining where your 
        headings are, and what type they are.  See [[GOTO how does the program recognize headings?]]
        
        Once the headings are being correctly diagnosed, you can switch on file
        splitting using the policies under
        
              _Conversion Options -> output policies -> file generation_
        
        Note that the "split level" is set to 1 to split at "chapter" headings, 2 to 
        split at "chapter and major section" headings etc.
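
        For instance - the policy name shown here is illustrative, so check the
        file generation policies in the [Pol Man] for the exact spelling - an
        incremental policy file that splits a document at both chapter and major
        section headings might contain a line like

        	Split level : 2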
        
        Underlined headings tend to start at level 2, depending on the underline
        character (see 3.3.8).
        
        Hopefully this will give you some pointers, but if you still can't get it to
        work, please mail me a copy of the source file (and any policy file you're
        using) and I'll see what I can advise.
        
        
        4.8 Customizing the HTML created by the software
        
        From version 4 onwards AscToHTM will allow you to define "HTML fragments"
        that can be used in place of the standard HTML generated by the program in
        certain situations.
        
        See the relevant chapter in the [tag man].
        
        
        5.0 Diagnosing problems for yourself
        ====================================
        
        The program offers a number of diagnostic aids.  These can be awkward to use, 
        but if you want to get a better idea of what's going on these can sometimes 
        help.
        
        The various diagnostic options can be accessed via the menu option
        
        	_Conversion Options -> Output policies -> File generation_
        	
        
        5.1 Generate a .lis file
        
        The program can be made to generate listing files.  A fragment is shown below.
        
        $_$_BEGIN_PRE
                 56:  103  |1.2.4 Who is the author?
                 57:    1  |
                 58:  104  |1.2.4.1 John A Fotheringham
                 59:    1  |
                 60:       |That's me that is.  The program is wholly the responsibility
                 61:       |Fotheringham, who maintains it in his spare time.
                 62:    1  |
                 63:    1  |
        $_$_END_PRE
        
        These show the source lines in truncated form.  Each line is numbered, and 
        markers show how the line has been analysed.  In this case the line with "Who 
        is the author?" has been allocated a line type of 103 ("header level 3") and 
        is followed by a line of type 1 ("blank").  A complete list of line types and 
        codes is included at the end of the file.
        
        Three files are generated: a ".lis1" file which is a listing from the 
        Analysis pass, a ".lis" file which is a listing from the output pass and a 
        ".stats" file which lists statistics collected during the analysis.  Ignore 
        this last file.
        
        The ".lis1" and ".lis" files have similar format, but represent the file as 
        analysed before and after the application of program policies.  Thus more 
        lines will be marked as headings in the ".lis1" file, but only those that 
        "pass policy" - i.e. are in sequence and at the right indentation - will be 
        marked as headings in the ".lis" file.
        
        Understanding these files is a black art, but a quick look can sometimes help
        you understand how the program has interpreted particular lines that have gone
        wrong, and give you a clue as to which policies may be used to correct this 
        behaviour.
        
        
        5.2 Generate a .log file
        
        The program will display messages during conversion.  You can filter these 
        messages (e.g. to suppress certain types) by using the Menu option
        
        	_Settings -> Diagnostics_
        	
        These messages can also be output to a .log file by using the options under
        
        	_Conversion Options -> Output policies -> File generation_
        	
        This log file will contain *all* messages, including those suppressed by 
        filtering. In the Windows version you can also choose to save the messages 
        displayed to file.
        
        Looking through the .log file can sometimes reveal problems that the program
        has detected and reported.
        
        
        5.3 Generate a .pol file
        
        The program operates in three passes.  The first pass analyses the file, and 
        sets various policies automatically (assuming these haven't previously 
        been loaded from a policy file).  The second pass calculates the output 
        file structure, and the third pass actually generates the output files.
        
        You can use the screens under Conversion Options to review the policies 
        that have been set.
        
        Alternatively you can save these policies to file, using the menu option
        
        	_Conversion options -> Save policies to file_
        	
        and selecting the "save all policies" option.  Be careful not to overwrite any
        existing "incremental" file.
        
        This file will list all policies used, which you may review... particularly 
        looking for any analysis policies that seem to have been incorrectly set.
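
        For example, if your document uses underlined headings but they aren't
        being picked up, you might find a line like

        	Expect underlined headings : No

        in the saved file, which would suggest the analysis pass never spotted
        them (see 3.3.6).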
        
        
        5.4 Understanding error messages
        
        In the fullness of time an [Error Manual] will be produced.  (see 1.7)
        
        
        5.5 Diagnosing table problems
        
        See [[GOTO how does the program detect and analyse tables?]] and other topics 
        in the [[GOTO Tables]] section of this document.
        
        
        6.0 Future directions
        =====================
        
        6.1 RTF generation
        
        The text analysis engine that lies at the core of AscToHTM is now
        available in a text-to-RTF converter.  This is called [AscToRTF],
        but we prefer the name "rags to Rich Text" :-)
        
        The initial release of this software was in March 2000.  For more details
        visit the [AsctoRTF] home page.
        
        
        6.2 Multi-lingual user interface
        
        AscToHTM (and AscToRTF) support several languages in the user interface.
        These translations have been provided by volunteers, and so far only extend
        to parts of the user interface.  All the programs' documentation and support
        remain in English.
        
        The software also supports the use of "language skins", i.e. the loading of 
        text files containing all the user interface text.  This will hopefully allow 
        people to convert the program into more languages.  We'd welcome copies of 
        skins developed, and will consider them for future distribution.  Please
        send them to translations@jafsoft.com
        
        For more details visit http://www.jafsoft.com/products/translations.html
        
        
        6.3 Improved standards support
        
        *Standards support is now a stated aim of the program*.  
        
        However, due to the complexities of generating standards-compliant HTML from 
        arbitrarily structured text we don't feel we can *guarantee* 
        standards-compliance.  If you find the program generating faulty HTML, please 
        report it to support@jafsoft.com.
        
        If you want to validate your HTML, visit http://validator.w3.org/
        
        Please note, if you embed your own HTML into your source files, this may
        well upset the balance in terms of compliance.
        
        Note: When the program detects that it has violated standards, error messages
              will be displayed.  You should report such violations to 
              support@jafsoft.com.
               
        
        6.4 Targeting particular HTML versions
        
        Internally the program is aware of the features and limitations of various 
        versions of HTML as follows
        
        $_$_BEGIN_TABLE
        $_$_TABLE_MIN_COLUMN_SEPARATION 2
        	[[TEXT HTML 3.2]]
        	[[TEXT HTML 4.0]] Transitional
        	[[TEXT HTML 4.0]] Strict	 (not yet supported)
        $_$_END_TABLE
        	
        For example certain HTML entities are only supported under newer versions of
        HTML.
        
        Bearing in mind we're converting text files, there's a limit as to how advanced
        the HTML can be (for example I can't work out which text to animate :-)
        
        If you want to target a particular form of HTML, use the
        policy
        
        	HTML version to be targeted : [[TEXT HTML 4.0]] Transitional
        	
        and the program will adjust to do the best it can.
        
        Note:   "[[TEXT HTML 4.0]] Strict" is not yet supported and a number of 
              	"deprecated" tags are still used in "[[TEXT HTML 4.0]] Transitional".
         	
         	
        6.5 CSS and Font support
        
        Font support will be introduced shortly.  Due to the program's history, the
        HTML currently being generated is more akin to [[TEXT HTML 3.2]].
        
        Over time we plan to offer proper [[TEXT HTML 4.0]] and CSS support, although 
        obviously this will be limited to what can be sensibly applied to converted 
        text.
        
        $_$_DEFINE_HTML_FRAGMENT HTML_HEADER
        

        FAQ for JafSoft text conversion utilities

        You can download these files as a .zip file (~100k)


        $_$_END_BLOCK

        $_$_DEFINE_HTML_FRAGMENT HTML_FOOTER

        Valid HTML 4.0! Converted from a single text file by AscToHTM
        © 1997-2003 John A Fotheringham
        Converted by AscToHTM
        $_$_END_BLOCK