$_$_TITLE Documentation for the AscToPDF conversion utility
$_$_DESCRIPTION AscToPDF is a utility for converting plain text (ASCII) into PDF.
$_$_CHANGE_POLICY Indent Position(s) : 0 4 8 12 16
$_$_CHANGE_POLICY Create mailto links : no
$_$_CHANGE_POLICY Default font : Arial, regular, 10
$_$_CHANGE_POLICY Could be blank line separated : yes
$_$_TABLE_HEADER_COLS 1
$_$_TABLE_BORDER 0
$_$_HELP_SUBJECT "Introduction to AscToPDF"
$_$_SECTION MAKINGRTF

AscToPDF Help Index
*******************

$_$_SECTION MAKINGHTML [[HTML
section]] policy. When AscToPDF detects such regions it marks them up in a fixed-width font, which tells PDF that the region is pre-formatted.

When tables are detected, AscToPDF will attempt to generate the correct PDF table.

When AscToPDF gets the detection wrong you can use the AscToPDF [[goto Using the pre-processor,pre-processor]] to mark up regions of your document you wish to preserve.

Table detection
...............

Tables are marked out by their use of white space and by a regular pattern of gaps or vertical bars that recurs on each line. AscToPDF will attempt to spot the table, its columns, its headings, its cell alignment and any entries that span multiple columns or rows.

Should AscToPDF wrongly detect the extent of a table, you can mark up a section of text by using the [[goto Pre-processor command: TABLE,TABLE]] pre-processor markup (see the [Tag Manual]). Alternatively you can try adding blank lines before and after the table, as the analysis uses white space to delimit tables.

$_$_BEGIN_IGNORE
You can alter the characteristics of all tables via the table policies (see [[GOTO Formatting policies]]).
$_$_END_IGNORE
You can alter the characteristics of all or individual tables via the table pre-processor commands (see [[goto Pre-processor command: TABLE,TABLE]]).
$_$_BEGIN_IGNORE
Or you can suppress the whole thing altogether via the [[goto Attempt table generation]] policy.
$_$_END_IGNORE

Code sample detection
.....................

AscToPDF attempts to recognize code fragments in technical documents. The code is assumed to be "C++" or "Java"-like, and key indicators include, for example, the presence of ";" characters at the end of lines.

Should AscToPDF wrongly detect the extent of a code fragment, you can mark up a section of text by using the [[goto Pre-processor command: CODE,CODE]] pre-processor markup. Alternatively you can suppress code detection altogether via the [[goto Expect code samples]] policy.

ASCII art and diagram detection
...............................
AscToPDF attempts to recognize ASCII art and diagrams in documents. Key indicators include large numbers of non-alphanumeric characters and the use of white space. However, some diagrams use the same mix of line and alphabetic characters as tables, so the two sometimes get confused.

Should AscToPDF wrongly detect the extent or type of a diagram, you can mark up a section of text by using the [[goto Pre-processor command: DIAGRAM,DIAGRAM]] pre-processor markup.

Text block detection
....................

If AscToPDF detects a block of text at a large indent, it will place that text so as to preserve the original indent as faithfully as possible.

Other formatted text
....................

If AscToPDF detects formatted text, but decides that it is neither table, code nor art (and it knows what it likes), then the text may be output "as normal", but with the original line structure preserved. In such regions other markup (such as bullets) may not be processed as it would be elsewhere.

$_$_BEGIN_IGNORE
$_$_HELP_CHAPTER 2,"Adding features to the document"

Adding features to the document
===============================

As well as detecting features present in the source text, the software allows you to add features that you would expect in the output file but that can't be inferred from the input. These include the following.

- [[goto Adding a Document Title,Document title]]
- [[goto Adding a contents list,A working contents list]]

Adding a Document Title
-----------------------

AscToPDF can calculate - or be told - the title of a document. This will be placed in the document properties section in the header of each PDF file produced.

The title is calculated in the order shown below. If the first algorithm returns a value, the subsequent ones are ignored.

1) If a [[goto Pre-processor command: TITLE,TITLE command]] is placed in the source text, that value is used.
2) If the [[goto Document Details,Document title]] policy is set then this value is used.
3) Finally, if none of the above results in a title, the text "Converted from" is used.

Adding a Contents list
----------------------

AscToPDF can detect the presence of a contents list ??? in the original document, or it can insert a field code that will generate a contents list from the headings that it observes. The added contents field can be recalculated in Word by pressing F9.

There are a number of policies that give you control over how and where a contents list is generated (see [[goto contents policies]]).

_Contents lists placement_

By default the contents list will be placed at the top of the output file. You can cause contents lists to be placed wherever you want by using the CONTENTS_LIST preprocessor command (see [[goto pre-processor directives]]).

_Contents list detection_

AscToPDF can detect contents lists in a number of ways:

- By detecting "table of contents", "end contents" or something similar in the text.
- By spotting that the numbering sequence occurs twice. AscToPDF will assume the first set is the contents list.
- By spotting [[goto Using the pre-processor,pre-processor]] markup.

This is often a hit-and-miss procedure, and is liable to error. Should the analysis fail, you can attempt to correct it via the [[goto contents policies,Contents lists]] policies.
$_$_END_IGNORE
$_$_HELP_CHAPTER 1,"Using policy files"

Using policy files
******************

$_$_HELP_TOPIC_ID ID_POLICY_FILES
Document policies have two main uses: to correct any failure of analysis that AscToPDF makes, and to tell the program how to produce better PDF in ways that couldn't possibly be inferred from the original text.

Examples of the former include specifying a nominal page width, and stating whether or not underlined section headings are expected. Examples of the latter include adding colour and titles to the page, as well as requesting that a large document is split into several pages.
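Because policies are stored as plain text with one policy per line, they are easy to inspect with a small script. This sketch assumes the "name : value" layout used by the $_$_CHANGE_POLICY lines at the top of this document, for example "Default font : Arial, regular, 10". The function `parse_policy_lines` is hypothetical and is not part of AscToPDF. It splits each line on the first colon only, so values that contain commas are kept intact. It skips any line that has no colon.

```python
def parse_policy_lines(lines):
    """Parse "Policy name : value" lines into a dict.

    Hypothetical helper: assumes one policy per line, with the name and
    value separated by the first colon. Lines without a colon (such as
    blank lines) are skipped.
    """
    policies = {}
    for raw in lines:
        name, sep, value = raw.partition(":")
        if not sep:
            continue  # no colon, so not a policy line
        policies[name.strip()] = value.strip()
    return policies


# Sample lines taken from the $_$_CHANGE_POLICY commands in this document.
sample = [
    "Indent Position(s) : 0 4 8 12 16",
    "Create mailto links : no",
    "Default font : Arial, regular, 10",
    "",
]
policies = parse_policy_lines(sample)
print(policies["Default font"])  # Arial, regular, 10
```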
$_$_SECTION MAKINGHTML
*Contents of this section*

$_$_CONTENTS_LIST 2,,2
$_$_SECTION MAKINGRTF
[[goto analysis policies]]

- [[goto General analysis policies,General Analysis]]
- [[goto Headings Policies]]
- [[goto bullet policies,Bullets]]
- [[goto Pre-formatted text policies,Pre-formatted text]]
$_$_BEGIN_IGNORE
- [[goto Table analysis policies,Table analysis]]

[[goto output policies]]

- [[goto File Structure policies,File generation]]
- [[goto Document details]]
$_$_BEGIN_IGNORE
- [[goto Formatting Policies,Formatting]]
- [[goto Hyperlinks policies,Hyperlinks]]
$_$_END_IGNORE
- [[goto Preprocessor policies,Preprocessor]]
$_$_BEGIN_IGNORE
- [[goto Link Dictionary Edit Dialog,Link Dictionary]]
$_$_END_IGNORE
$_$_END_IGNORE
$_$_SECTION ALL

What are Policy files?
======================

$_$_HELP_TOPIC_ID ID_DEFINEPOLICYFILES
AscToPDF has a large number of options available to influence the analysis of your text files and the output to PDF. These options are called "policies" as they govern how the source file should be interpreted and converted.

Policies may be saved in text files, known as policy files. These files have a ".pol" extension by default.

The policy files are usually updated by changing the policies and saving the changes in a new file. Because they are text files you can also edit them directly in a text editor. The files have the format of one policy per line of text, in the form "PolicyText : value", for example "Default font : Arial, regular, 10".

The use of policy files allows a given set of options to be saved and reused for other conversions, or for later conversions of the same file. See [[goto Using policy files]] for more information.

$_$_HELP_CHAPTER 2,"Analysis policies"
$_$_HELP_SUBJECT "Introduction"

Analysis policies
=================

$_$_HELP_TOPIC_ID ID_CONV_POLICIES
Analysis policies are usually calculated by AscToPDF by making a first pass through your document.
The resulting policies are then used during the second, conversion pass to categorise all input lines so that they may be correctly converted to PDF.

You should only need to change these policies if the analysis fails.

- [[goto 'What to look for' policies]]
- [[goto General analysis policies,General Analysis]]
- [[goto bullet policies,Bullets]]
- [[goto File Structure policies,File generation]]
- [[goto Headings Policies]]
- [[goto Pre-formatted text policies,Pre-formatted text]]
$_$_BEGIN_IGNORE
- [[goto Table analysis policies,Table analysis]]
$_$_END_IGNORE
$_$_HELP_CHAPTER 3,"What to look for" policies
$_$_HELP_SUBJECT "List of 'look for' policies"

'What to look for' policies
---------------------------

$_$_HELP_TOPIC_ID HIDD_LOOKFOR
These policies act as "broad stroke" policies, enabling or disabling areas of functionality within the software by telling it what to look for and try to detect. For example you can tell the program whether or not to bother looking for patterns of indentation, bullets, or numbered lists.

In many cases, if you enable a policy you can further fine-tune the conversion details on other policy sheets.

- [[popup Look for indentation]]
- [[popup Look for white space,Look for paragraphs]]
- [[popup Look for short lines]]
- [[popup Look for bullets and numbered lists]]
- [[popup Look for mail and USENET headers]]
- [[popup Look for preformatted text,Look for regions of preformatted text]]
$_$_BEGIN_IGNORE
- [[popup Look for emphasis]]
- [[popup Look for horizontal rules]]
- [[popup Look for quoted lines]]
- [[popup Look for definitions]]
- [[popup Look for underlined text]]
- [[popup Look for character encoding]]
- [[popup Look for diagrams]]
$_$_END_IGNORE

NOTE: Some options on this screen are grayed out. These options are supported by other JafSoft conversion utilities, and it is hoped to extend this support to AscToPDF as development allows.

Look for indentation
....................
$_$_HELP_TOPIC_ID ID_LOOKFOR_INDENTS
AscToPDF can attempt to detect the indentation pattern of your document and replicate it in the output file. If you choose to disable this policy, all your text will be output with no indentation at all.

If the program is wrongly indenting your files, you can try adjusting the pattern of indentation on the [[goto General analysis policies,General Analysis]] tabbed policy sheet.

Look for white space
....................

$_$_HELP_TOPIC_ID ID_LOOKFOR_PARAS
By default AscToPDF will attempt to look for paragraphs in your source. Usually these are signalled by a blank line between paragraphs, a leading indent on the first line of each paragraph, or (in extreme cases) a short line at the end of a paragraph.

If you don't want AscToPDF to detect paragraphs, disable this policy. If AscToPDF is wrongly detecting paragraphs, try adjusting the paragraph analysis policies on the [[goto General analysis policies,General Analysis]] tabbed policy sheet.

Look for short lines
....................

$_$_HELP_TOPIC_ID ID_LOOKFOR_SHORTLINES
By default AscToPDF will attempt to detect short lines and preserve their structure by adding a line break. Disabling this will cause short lines to be merged into the surrounding paragraph's text.

If AscToPDF is wrongly handling your short lines, you can adjust the short line cutoff point or the page width (which is used in short line detection) in the Sizes section of the [[goto General analysis policies,General Analysis]] tabbed policy sheet.

$_$_BEGIN_IGNORE
Look for horizontal rules
.........................

$_$_HELP_TOPIC_ID ID_LOOKFOR_RULES
By default AscToPDF will treat a series of hyphens, minus signs or equal signs on the same line as a horizontal rule. (On occasion it might be regarded as underlining a heading on the previous line.)

You can disable this if you wish, or you can specify how many "line" characters it takes to make a horizontal rule.
$_$_END_IGNORE

Look for bullets and numbered lists
...................................

$_$_HELP_TOPIC_ID ID_LOOKFOR_BULLETS
By default AscToPDF will try to detect bullet points and numbered lists. This can sometimes go wrong if you have lines that look to the program like bullet points. You can disable this behaviour should you wish.

Alternatively you can fine-tune the detection of bullets on the [[goto bullet policies,"bullet analysis"]] tabbed policy sheet.

$_$_BEGIN_IGNORE
Look for definitions
....................

$_$_HELP_TOPIC_ID ID_LOOKFOR_DEFINITIONS
By default AscToPDF will try to detect definitions and notes, usually in the form of a single word and a hanging paragraph. This can often go wrong, so you can use this policy to disable this feature.

Look for quoted lines
.....................

$_$_HELP_TOPIC_ID ID_LOOKFOR_QUOTES
By default AscToPDF will try to identify "quoted" lines. Quoted lines are lines that have had a single character (often ">" or "!") inserted at the start. This is common practice when quoting email in a reply.

AscToPDF places such text in italics. You can disable this behaviour should you wish.

Look for emphasis
.................

AscToPDF will try to look for text that has been marked up with underscores and asterisks to signify bold and italic text. For example

$_$_CHANGE_POLICY look for emphasis : no
*This is bold* and _this is italic_
$_$_CHANGE_POLICY look for emphasis : yes

becomes

*This is bold* and _this is italic_

Look for underlined text
........................

AscToPDF will try to detect where a line of text has been "underlined" by following it with a row of dashes, hyphens, equal signs etc. of the same length. This text will then be regarded as a candidate for being an underlined heading or - if those are not allowed - underlined text.

If you have tables and reports, you may want to switch this policy off, since the line at the end of a table may appear to under- or over-line the last line of text in the table.
$_$_END_IGNORE

Look for mail and USENET headers
................................

$_$_HELP_TOPIC_ID ID_LOOKFOR_MAILHEAD
AscToPDF will try to look for email and USENET headers. Where these are recognised they can be simplified so that only the To, From and Subject lines are shown in the output. You can disable this behaviour should you wish.

$_$_BEGIN_IGNORE
Look for character encoding
...........................

Specifies whether or not the software should attempt to detect alternative character sets, such as those used for languages like Greek, Turkish and Chinese. The software does this by doing a statistical analysis of the characters used in the source file.

This process isn't perfect. If you find the program is wrongly detecting the character encoding, disable this policy and/or set the correct character set manually using the [[GOTO Character encoding]] policy.
$_$_END_IGNORE

Look for preformatted text
..........................

$_$_HELP_TOPIC_ID ID_LOOKFOR_PREFORM
By default AscToPDF will try to identify regions of preformatted text. Once a region is identified, AscToPDF will try to decide if it's a diagram, a table or some other form of preformatted text. If it thinks it's a table it will attempt to place the text in an appropriate table structure.

You can disable the search for preformatted text, or, if you allow preformatted text, disable table generation. (This may be appropriate if you have a large number of ASCII diagrams in your text.)

The search for preformatted text can be refined via the [[goto Pre-formatted text policies,Pre-formatted text]]
$_$_BEGIN_IGNORE
and [[goto Table analysis policies,Table analysis]]
$_$_END_IGNORE
tabbed policy sheets.

$_$_BEGIN_IGNORE
The output of tables can be fine-tuned via the output policy [[goto Formatting policies,Formatting]] tabbed policy sheet.
$_$_END_IGNORE

$_$_BEGIN_IGNORE
Look for diagrams
.................
Specifies whether or not detected regions of preformatted text should be considered as candidate diagrams. Text that contains large numbers of characters such as "|", "-", ">" and "<" may be considered to be an ASCII diagram.

If you find the program is wrongly treating tables as diagrams then disable this policy.
$_$_END_IGNORE
$_$_HELP_CHAPTER 3,"General analysis policies"

General analysis policies
-------------------------

$_$_HELP_TOPIC_ID HIDD_ANALYSIS
These policies aid AscToPDF's analysis by describing in detail the contents of the document being converted.

*Sizes*

- [[popup Page Width]]
- [[popup TAB Size]]
- [[popup Short line length]]
- [[popup Min Chapter Size]]

*Paragraphs*

- [[popup Blank lines between paragraphs]]
- [[popup New paragraph offset]]

$_$_BEGIN_IGNORE
*Definitions*

- [[popup Search for definitions,Search for definitions in source text]]
- [[popup hanging indent position(s),Definition paragraph indent levels]]
- [[popup recognize hyphen characters]]
- [[popup recognize colon characters]]
- [[popup Other definition characters]]
$_$_END_IGNORE

*Layout*

- [[popup indent position(s), Indentation levels]]

Page Width
..........

$_$_HELP_TOPIC_ID ID_PAGEWIDTH
This indicates the width (in characters) of your nominal output page. This width is calculated from the observed line lengths in the original document.

This width is used in short line calculation, and in determining whether or not a given line contains a definition term (a definition character near the start of the line).

In documents that contain line feeds this should be automatically detected. In other documents you may need to set this manually.

TAB size
........

$_$_HELP_TOPIC_ID ID_TABSIZE
This indicates the size (in characters) of your tabs. AscToPDF converts all tabs to spaces before analysis. By default a tab size of 8 characters is assumed.

The tab size can influence the analysis of paragraph indentations and other layout.
Provided tabs are used consistently there shouldn't be a problem. However, where tabs and spaces are used in combination, mistakes can arise. This is particularly true in tables of data. AscToPDF does not expect tab-separated table cells; instead it converts the tabs to spaces and analyses the results.

If your source document was created with an editor that uses a different tab size, you should change this value if you start to experience strange layout conversion problems.

Short Line Length
.................

$_$_HELP_TOPIC_ID ID_SHORTLINE
This policy is used to determine what is a "short line". Short lines are treated specially by AscToPDF by adding a paragraph marker on the end. They can also be used to detect ends of paragraphs in those documents that don't have blank lines between paragraphs.

Normally AscToPDF will determine whether or not a line is short by comparing it to the page width, given the current context.

The default value is 0 characters (indicating that a comparison to Page Width should be used). Set this to any value you like. A value of 80 is likely to make every line in your original document have a paragraph marker on the end.

Min Chapter Size
................

$_$_HELP_TOPIC_ID ID_MINCHAPTER
This policy tells AscToPDF what the smallest chapter size may be. This is used when trying to determine if a numbered line is a chapter heading. AscToPDF uses this policy to avoid treating numbered lists as a series of small chapters.

The default value is 8 lines. Change this only if you suspect small chapters are being ignored, or large list items are being treated as chapter headings.

Blank Lines between paragraphs
..............................

$_$_HELP_TOPIC_ID ID_BLANKLINES
AscToPDF can detect whether or not it should expect blank lines between paragraphs. Documents without blank lines between paragraphs will be harder to convert, and errors are more likely. Unfortunately text documents exported from Word for Windows often have this property.
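When paragraphs are separated by blank lines, grouping lines into paragraphs is simple. The sketch below is a simplified illustration, not AscToPDF's actual algorithm. It treats any line that contains only white space as a paragraph break, and joins the remaining lines of each paragraph with single spaces. The short-line and hanging-indent cues that matter when there are no blank lines are left out. The name `split_paragraphs` is hypothetical.

```python
def split_paragraphs(lines):
    """Group text lines into paragraphs separated by blank lines.

    Simplified illustration only: a line containing nothing but white
    space ends the current paragraph; other lines are stripped and
    joined with single spaces.
    """
    paragraphs = []
    current = []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:  # flush the final paragraph
        paragraphs.append(" ".join(current))
    return paragraphs


text = "First paragraph,\nsecond line.\n\n\nSecond paragraph.\n"
print(split_paragraphs(text.splitlines()))
# ['First paragraph, second line.', 'Second paragraph.']
```

In a double-spaced file every second line is blank. On such a file this naive rule turns each line into its own paragraph.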
Where there are no blank lines, AscToPDF relies on spotting the last line of a paragraph (usually shorter) and, in some documents, the presence of a [[popup New paragraph offset,"hanging indent"]] at the start of each new paragraph.

This should be automatically detected.

New Paragraph Offset
....................

$_$_HELP_TOPIC_ID ID_NEWPARA
Some documents start the first line of a new paragraph with an offset of a number of characters. This is especially true of text files saved from Word for Windows documents. AscToPDF can sometimes mistake such paragraphs for two different levels of indentation. Use this policy to eliminate such confusion.

This should be automatically detected.

$_$_BEGIN_IGNORE
Search for definitions
......................

$_$_HELP_TOPIC_ID ID_ALLOWDEFINITIONS
This policy can be used to disable the search for definitions. Sometimes this search leads to unexpected results, with text that is not part of a definition being treated as such. In such cases you can adjust the definition policies, but if this still fails, use this policy to disable the search completely.

See also [[popup one-line definitions]] and [[popup definition paragraphs]]
$_$_END_IGNORE
$_$_BEGIN_IGNORE
Hanging indent position(s)
..........................

$_$_HELP_TOPIC_ID ID_DEFNINDENTS
This policy identifies the indentations used for the follow-on text in [[popup definition paragraphs]]. These indentation levels need not be the same as the indentation levels used for normal text, though of course often they are.

This should be detected automatically, but if your document has only a few examples it's possible AscToPDF will ignore them. In such cases you may need to set this policy manually.

_Note, this policy appears on-screen as "Definition paragraph indent levels"_
$_$_END_IGNORE
$_$_BEGIN_IGNORE
Recognize hyphen characters
...........................

$_$_HELP_TOPIC_ID ID_HYPHENDEFNS
This policy specifies whether or not hyphen (-) characters are used in [[popup one-line definitions]].
If the hyphen character only occurs in definitions, then set the "always" flag next to this policy; otherwise AscToPDF will have to guess whether a particular character is part of a definition or not. This is sometimes a source of conversion errors.

If this policy is selected, it will result in a suitable "Definition Char" line being added to the policy file.

This should be detected automatically.

Recognize colon characters
..........................

$_$_HELP_TOPIC_ID ID_COLONDEFNS
This policy specifies whether or not colon (:) characters are used in [[popup one-line definitions]].

If the colon character only occurs in definitions, then set the "always" flag next to this policy; otherwise AscToPDF will have to guess whether a particular character is part of a definition or not. This is sometimes a source of conversion errors.

If this policy is selected, it will result in a suitable "Definition Char" line being added to the policy file.

This should be detected automatically.

Other definition Characters
...........................

$_$_HELP_TOPIC_ID ID_DEFNCHARS
This policy specifies which other characters are used in [[popup one-line definitions]]. This may be detected automatically, but more likely you'll need to specify it yourself.

Each character selected as a potential delimiter will result in a "Definition Char" line being added to the policy file.
$_$_END_IGNORE

Indent position(s)
..................

$_$_HELP_TOPIC_ID ID_INDENTS
AscToPDF recognises multiple levels of indentation. This policy shows the character positions at which indentation has been detected.

AscToPDF converts all tab characters into multiple spaces on input. These indentation positions are the positions that result after that conversion. Depending on your tab settings, these might not be exactly the positions you would expect.

Normally these levels are correctly detected automatically, but should you wish to set them manually you may need to experiment slightly to see how AscToPDF has handled your tabs.
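A short sketch shows how the tab size changes the indent positions. It assumes the default tab size of 8 described under TAB size. It uses Python's `str.expandtabs`, which advances each tab to the next multiple of the tab size, so a tab that follows two spaces still lands at column 8. The helper `indent_positions` is hypothetical. Collecting every distinct indent is a simplification of whatever analysis AscToPDF actually performs.

```python
def indent_positions(lines, tab_size=8):
    """Return the sorted, distinct indent columns of non-blank lines.

    Tabs are expanded to spaces first, because indent positions are
    measured after that conversion. Hypothetical helper for illustration.
    """
    positions = set()
    for line in lines:
        expanded = line.expandtabs(tab_size)
        if expanded.strip():  # blank lines carry no indent information
            positions.add(len(expanded) - len(expanded.lstrip(" ")))
    return sorted(positions)


source = [
    "Heading",
    "    indented by four spaces",
    "\tindented by one tab",
    "  \talso reaches the tab stop",
    "",
]
print(indent_positions(source))               # [0, 4, 8]
print(indent_positions(source, tab_size=4))   # [0, 4]
```

With a tab size of 4, the same text yields only two indent levels instead of three. A wrong TAB size can therefore change the detected indentation.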
Bullet policies
---------------

$_$_HELP_TOPIC_ID HIDD_BULLETS
AscToPDF should be able to detect the use of bullets in a reasonably sized document. These policies describe the type of bullets expected.

- [[popup Look for bullets, Automatically detect bullets and numbered lists]]

*Expected Bullet types*

- [[popup expect numbered bullets, numbered bullets]]
- [[popup expect alphabetic bullets, alphabetic bullets]]
- [[popup expect roman numeral bullets, roman numeral bullets]]

*Bullet characters*

- [[popup "recognize '-' as a bullet",recognize hyphen character as a bullet point]]
- [[popup "recognize 'o' as a bullet",'recognize an "o" character as a bullet point']]
- [[popup Other bullet point characters]]

Look for bullets
................

$_$_HELP_TOPIC_ID ID_AUTODETECT_BULLETS
This policy states whether or not the program should attempt to automatically detect bullets and numbered lists. This should normally be left on unless your document has no such features but the program (wrongly) thinks it has.

This policy appears on the Bullets dialog as "Automatically detect bullets and numbered lists", but is identical to the "Look for bullets" policy on the [[goto 'What to look for' policies]] tabbed property sheet.

Expect Numbered bullets
.......................

$_$_HELP_TOPIC_ID ID_NUMBERBULLETS
This policy states whether or not numbered bullet points are expected. The numbered bullets can use any punctuation, thus 1., 2) and (3) will all be recognised, but PDF will not necessarily support this in the markup produced.

This should be automatically detected.

Expect alphabetic bullets
.........................

$_$_HELP_TOPIC_ID ID_ALPHABULLETS
This policy states whether or not alphabetic bullet points are expected. The alphabetic bullets can use any punctuation, thus a., b) and (c) will all be recognised, but PDF will not necessarily support this in the markup produced. Both upper and lower case bullets are recognised (and supported in the markup).
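The punctuation styles listed above can be recognised with a regular expression. This sketch illustrates one possible rule and is not AscToPDF's own rule set. It accepts a number or a single letter of either case, written as "1.", "2)" or "(3)" and followed by white space. Roman numerals, the '-' and 'o' bullets, and context checks (such as telling a bullet from a numbered heading) are left out. The names `BULLET` and `bullet_label` are hypothetical.

```python
import re

# Hypothetical pattern: either "(x)" or "x." / "x)", where x is a number
# or a single letter of either case, followed by white space.
BULLET = re.compile(
    r"^\s*(?:\((?P<p>\d+|[A-Za-z])\)|(?P<d>\d+|[A-Za-z])[.)])\s+"
)


def bullet_label(line):
    """Return the bullet's number or letter, or None if no bullet is found."""
    match = BULLET.match(line)
    if match is None:
        return None
    return match.group("p") or match.group("d")


for text in ["1. first", "2) second", "(3) third", "a. alpha", "B) beta",
             "(c) gamma", "3.5 is a number, not a bullet"]:
    print(repr(text), "->", bullet_label(text))
```

This rule would also accept an initial such as "A. Smith" at the start of a line. That kind of false match is one reason why bullet detection can go wrong.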
This should be automatically detected.

Expect roman numeral bullets
............................

$_$_HELP_TOPIC_ID ID_ROMANBULLETS
This policy states whether or not roman numeral bullet points are expected. The roman numeral bullets can use any punctuation, thus i., ii) and (iii) will all be recognised, but PDF will not necessarily support this in the markup produced. Both upper and lower case bullets are recognised (and supported in the markup), although the range of roman numeral values supported is limited.

This should be automatically detected.

Recognize '-' as a bullet
.........................

$_$_HELP_TOPIC_ID ID_MINUSBULLETS
This policy states whether or not bullet points starting with the hyphen character '-' are expected.

This policy appears on-screen as "Recognize hyphen character as a bullet point".

This should be automatically detected.

Recognize 'o' as a bullet
.........................

$_$_HELP_TOPIC_ID ID_OBULLETS
This policy states whether or not bullet points starting with the lower case 'o' are expected.

This policy appears on-screen as "Recognize 'o' character as a bullet point".

This should be automatically detected.

Other bullet point characters
.............................

$_$_HELP_TOPIC_ID ID_OTHERBULLETS
This policy lists any other characters that are to be recognised as bullet characters. Each bullet character entered will appear in the policy file as its own "Bullet Char" line.

This should be automatically detected, but may sometimes need to be entered manually.

$_$_HELP_CHAPTER 3,"Contents policies"

Contents policies
-----------------

$_$_HELP_TOPIC_ID HIDD_CONTENTS
This dialog shows both analysis and output policies connected with contents list detection and generation.

*Analysis*

- [[popup Expect contents list]]

Expect contents list
....................

$_$_HELP_TOPIC_ID ID_EXPECTCONTENTS
This policy specifies whether or not the document already contains a contents list.
If it does, AscToPDF will attempt to convert the existing list into a series of hyperlinks.

This should be detected automatically, but occasionally you will need to set this policy manually. See the discussion on contents list generation in the [[goto Documentation available]].

$_$_HELP_CHAPTER 3,"File Structure policies"

File Structure policies
-----------------------

$_$_HELP_TOPIC_ID HIDD_FILESTRUCT
These policies aid AscToPDF's analysis by describing aspects of the file structure that would affect the analysis.

- [[popup Keep it simple, Expect only a simple layout]]

*Expected File contents*

- [[popup Expect Code samples,'Expect "C"-code samples']]
- [[popup Input file contains DOS characters, Contains DOS characters]]
- [[popup Input file contains PCL codes, Contains PCL printer codes]]
- [[popup Input file contains Japanese characters, Contains non-European (e.g. Japanese) characters]]
- [[popup Input file contains MIME encoding, Contains mime-encoded quotable characters]]
- [[popup Input file has change bars, File has change bars]]
- [[popup Input file has page markers, File has Page markers]]
- [[popup Page marker size (in lines)]]

*Text Attributes*

- [[popup Text justification]]
- [[popup Input file is double spaced, File is double spaced]]

*Text to ignore*

- [[popup Lines to ignore at start of file, Number of lines to ignore at start of document]]
- [[popup Lines to ignore at end of file, Number of lines to ignore at end of document]]

Keep it simple
..............

$_$_HELP_TOPIC_ID ID_SIMPLE
AscToPDF puts a lot of effort into detecting overall structure such as headings. In documents that don't have any such structure, AscToPDF is liable to convert any line with a number at the start into a heading.

To prevent this, you can mark the document as simple, that is, as having no global structure. In a simple document AscToPDF will attempt far less analysis.

This policy appears on-screen as "Expect only a simple layout".
AscToPDF attempts to automatically identify simple documents, but you may still need to set this policy manually.

Expect Code samples
...................

$_$_HELP_TOPIC_ID ID_EXPECTCODE
AscToPDF can mark up C-like code fragments in a fixed-width font to preserve the layout and readability of the quoted code.

This may be automatically detected, but occasionally needs to be manually corrected.

Input file contains DOS characters
..................................

$_$_HELP_TOPIC_ID ID_DOSCHARS
AscToPDF can convert files that use the DOS (OEM) character set. By default the file is assumed to be in the ANSI character set, but some files may have originated under DOS.

This may be automatically detected, but usually needs to be set manually.

Input file contains PCL codes
.............................

$_$_HELP_TOPIC_ID ID_PCL_CODES
Indicates that the input file contains PCL printer codes. When this is set, the program will make whatever sensible use it can of these codes; otherwise they will be removed.

Please note that the PCL printer codes offer a rich command language that may be used to drive graphical printers. As such the emulation possibilities in a *text* converter are limited, and it is quite likely that files that make heavy use of such codes will fail dramatically to convert. That said, those codes that are not recognised will be eliminated from the output.

Input file contains Japanese characters
.......................................

$_$_HELP_TOPIC_ID ID_JAPANCHARS
*** not implemented yet ***

Files using non-ASCII character sets (Japanese, Korean etc.) will be incorrectly converted. This may be fixed (as far as possible) in later versions.

Appears on-screen as "Contains non-European (e.g. Japanese) characters".

Input file contains MIME encoding
.................................

$_$_HELP_TOPIC_ID ID_MIMECHARS
AscToPDF can convert mime-encoded quotable characters. These will usually appear in files that were originally part of an email message.
Such files use the "=" character to escape special characters. For example, "=20" should be interpreted as a space.

This appears on-screen as "Contains mime-encoded quotable characters".

This may be automatically detected in files where "=" is used to break up long lines, but more usually you will need to set this manually.

Input file has change bars
..........................

$_$_HELP_TOPIC_ID ID_CHANGEBARS
AscToPDF can strip out change bars in documents that contain them. Change bars are usually a vertical bar '|' placed in the leftmost or rightmost column.

Currently this is not automatically detected, and so will need to be switched on manually.

Input file has page markers
...........................

$_$_HELP_TOPIC_ID ID_PAGEMARKER
AscToPDF has a limited ability to remove page markers. These are normally a few lines following a form feed (FF) character, containing page numbers etc. They commonly occur in files generated by older software packages.

Page marker size (in lines)
...........................

$_$_HELP_TOPIC_ID ID_PAGE_MARKERSIZE
The number of lines after each form feed (FF) that should be ignored. These lines will not be copied to the output.

Text Justification
..................

$_$_HELP_TOPIC_ID ID_TEXTJUSTIFICATION
AscToPDF recognises documents that are left justified (default), right justified, centred, or both left and right justified (confusingly known as "justified").

The program cannot currently mark up the text in a matching style, but this policy is important in the analysis. For example, "justified" documents are padded with extra white space, which could be interpreted as pre-formatted text were the document not recognised as being justified.

Normally this policy is correctly detected automatically.

Input file is double spaced
...........................

$_$_HELP_TOPIC_ID ID_DOUBLESPACED
AscToPDF will normally treat a blank line as a break between paragraphs.
Some files have extra CR/LF characters (usually because they've come from a different computer, or from a printer package). In such cases AscToPDF will see every second line as blank, which affects the analysis, usually by turning each line of text into a separate paragraph. If you have such a file, use this policy to mark it as double spaced to get better results.

Lines to ignore at start of file
................................
$_$_HELP_TOPIC_ID ID_IGNORE_AT_START

This specifies how many lines at the start of the input file should be ignored. These lines will be discarded from the output. This can be useful when converting files copied from a news feed or other source that adds a small data header to the file.

Lines to ignore at end of file
..............................
$_$_HELP_TOPIC_ID ID_IGNORE_AT_END

This specifies how many lines at the end of the input file should be ignored. Up to 40 lines may be ignored in this way. These lines will be discarded from the output. This can be useful when converting files copied from a news feed or other source that adds a small data footer to the file.

$_$_HELP_CHAPTER 3,"Headings policies"

Headings policies
-----------------
$_$_HELP_TOPIC_ID HIDD_HEADINGS

These policies determine the heading structure that the document is expected to have. Normally this is calculated correctly by AscToPDF, but due to the complexity of heading detection, you may sometimes need to correct the analysis.

At the top of the dialog you can specify what types of heading you expect to see. Any combination is allowed, although documents usually use just one type of heading.

- [[popup Expect Numbered headings]]
- [[popup Expect Underlined headings]]
- [[popup Expect Capitalised headings]]
- [[popup Expect Embedded headings]]
- [[popup Heading Key phrases]]
- [[popup Use first line as heading]]
- [[popup Center first heading]]
- [[popup Check indentation for consistency, Check indentations of headings are consistent]]

If numbered headings are expected, it is possible to expect headings at multiple levels, and also to expect a contents list. Each level of heading has its own set of policies, which are shown on this dialog. The policies are shown in text form, but are edited via [[goto the heading details dialog]].

Note: This area of functionality is continually under review. See also the discussion of detecting [[goto headings and section titles]].

Expect numbered headings
........................
$_$_HELP_TOPIC_ID ID_NUMBERED_HEADINGS

This policy specifies whether or not numbered headings are expected in the document. Numbered headings may be found at multiple levels, and their details may be edited via [[goto The heading details dialog]].

This should normally be calculated correctly by AscToPDF, but it is prone to error, getting confused by numbered bullets and the like. In such cases you may need to set this policy manually.

Expect underlined headings
..........................
$_$_HELP_TOPIC_ID ID_UNDERLINED_HEADINGS

This policy specifies whether or not underlined headings are expected. Note that where the headings themselves are numbered, the underlining will be taken into account, and you should set the [[popup expect numbered headings]] policy instead.

AscToPDF uses the character in the underlining to determine the heading level; thus text underlined with equals signs is given more prominence than text underlined with single-line characters such as minus signs, tildes or underscores.

Expect capitalised headings
...........................
$_$_HELP_TOPIC_ID ID_CAP_HEADINGS

This policy specifies whether or not CAPITALISED headings are expected.
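
As a rough illustration of what "capitalised" means here, the following Python sketch flags short lines whose letters are all upper case. The function name and the eight-word limit are hypothetical; AscToPDF's own detection rules are not described in this manual and may differ.

```python
def looks_like_capitalised_heading(line: str, max_words: int = 8) -> bool:
    """Hypothetical test: a short line containing letters, all of which are upper case."""
    text = line.strip()
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False  # blank lines and lines of digits/punctuation are not headings
    return all(c.isupper() for c in letters) and len(text.split()) <= max_words


for candidate in ["INTRODUCTION", "3. INSTALLING THE SOFTWARE", "Introduction", "----"]:
    print(f"{candidate!r}: {looks_like_capitalised_heading(candidate)}")
```

Digits and punctuation are ignored by the letter check, so a numbered line such as "3. INSTALLING THE SOFTWARE" also passes this test.
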

Note that where the headings themselves are numbered, this policy need not be set; instead you should set the [[popup expect numbered headings]] policy.

Expect Embedded headings
........................
$_$_HELP_TOPIC_ID ID_EMBEDDED_HEADINGS

This policy specifies whether or not "embedded" headings are expected, i.e. headings that are "embedded" in the first paragraph of a section. Such headings are expected to be a complete sentence or phrase in UPPER CASE at the start of a paragraph.

At present such headings are not detected automatically, so you need to switch this policy on manually.

Heading Key phrases
...................
$_$_HELP_TOPIC_ID ID_KEYPHRASE_HEADINGS

If specified, then any line that begins with one of the key phrases will be regarded as a heading. The syntax is

    <details>,<details>,...

where each set of details is

    <details> = <phrases>[,<level>]

and

    <phrases> = <phrase>[|<phrase>]

That is, each set of <details> can optionally specify a <level>. If omitted, this will default to 1, 2, 3 for the first, second, third set of details etc. Note that this is a *logical* heading level, and will be apparent in the contents list.

Each set of <details> must supply a set of <phrases>, and each set of phrases must have at least one phrase, with extra phrases added if wanted, separated by vertical bars.

So for example

    Part, Chapter, Section

would treat lines beginning with the words "Part", "Chapter" and "Section" as level 1, 2 and 3 headings respectively.

The key phrases are case-sensitive in order to reduce the likelihood of false matches with lines that just happen to have these phrases at the start of the line. So

    PART|Part, Chapter, Section

would allow either "PART" or "Part" to be matched.

    "PART|Part,1" , "Chapter,2" , "Section,2"

would make lines beginning with "PART" or "Part" level 1 headings, while both "Chapter" and "Section" would become level 2. This would be the same as

    "PART|Part,1" , "Chapter|Section,2"

Note that spaces may form part of a match phrase but, because of their use in the tag syntax, commas and vertical bars may not.

If false matches occur (e.g. the word "Part" appears at the start of a line in the body of the text), edit the source text so that the offending word is no longer at the start of the line.

Use first line as heading
.........................

When this option is selected, the first line in the document will be treated as a heading. This can be useful when the first line of your document is a title line that doesn't conform to the heading style used in the rest of the document.

$_$_BEGIN_IGNORE
See also [[goto use first line as title]]
$_$_END_IGNORE

Center first heading
....................

When this option is selected, the first heading in the document is centred. This may be an appropriate choice when the first heading is in fact to be treated as a document title. See also [[goto use first line as heading]].

Check indentation for consistency
.................................
$_$_HELP_TOPIC_ID ID_HEADINDENTS

The program performs a number of consistency checks when detecting headings. Amongst these is a check that all headings of the same type occur at the same indentation. This check can help distinguish between numbered headings and numbered lists.

However, if you have numbered headings at different indentations (e.g. because they are centred on the page), this check will cause them to be rejected as headings. In such cases you can disable this check manually.

This policy appears on-screen as "Check indentations of headings are consistent".

The heading details dialog
..........................
$_$_HELP_TOPIC_ID HIDD_HEADDTLS

This dialog is reached through one of the edit buttons on the main [[goto Headings Policies]] dialog. It allows you to edit details of a particular type or level of heading.

*Position of section number on the line*

- [[popup Indentation of heading lines]]
- [[popup Heading prefix words]]

*Section number formatting*

- [[popup Heading numbering scheme]]
- [[popup Heading separator characters]]
- [[popup Heading trailing letters]]

*Bracketing*

- [[popup Heading bracket characters]]

Indentation of heading lines
............................
$_$_HELP_TOPIC_ID ID_HEAD_INDENT

AscToPDF uses checks on indentation levels to reject lines containing numbers that could be confused with headings. This is the indentation level (in characters) at which headings of this type are expected to be found.

Heading prefix words
....................
$_$_HELP_TOPIC_ID ID_HEAD_PREFIX

Some documents put words like "chapter", "subject" and "section" in front of the section number. These are known as prefix words.

Heading numbering scheme
........................
$_$_HELP_TOPIC_ID ID_HEAD_NUMBERTYPE

This is the numbering scheme expected for headings at this level. At present AscToPDF can't cope with mixed types like "II-2.b". This may be addressed in later versions.

Heading separator characters
............................
$_$_HELP_TOPIC_ID ID_HEAD_SEPARATOR

This shows the separator expected between parts of the heading number, such as the "." in "2.1".

*** Not currently supported ***

Heading trailing letters
........................
$_$_HELP_TOPIC_ID ID_HEAD_TRAILALPHA

This shows whether we expect trailing letters after the section number, as in "1.1b".

*** Not currently supported ***

Heading bracket characters
..........................
$_$_HELP_TOPIC_ID ID_HEAD_BRACKETS

This shows what bracket characters (if any) we expect before and after the section number, as in "[2.2]" or "3.2.1)".

*** Not currently supported ***

$_$_HELP_CHAPTER 3,"Pre-formatted text policies"

Pre-formatted text policies
---------------------------
$_$_HELP_TOPIC_ID HIDD_PREFORMAT

These policies specify how AscToPDF detects pre-formatted text.

*Detecting pre-formatted regions*

- [[popup Minimum size of automatic <PRE> section]]

See the section on [[goto pre-formatted text]] for more details.

Minimum size of automatic <PRE> section
.......................................
$_$_HELP_TOPIC_ID ID_MINPRESIZE

This policy specifies the minimum number of consecutive pre-formatted lines that must be detected before the text is placed in fixed width font.

AscToPDF detects heavily formatted lines, and then looks at their neighbours to see if they too could be part of a pre-formatted region. Once a group of lines is identified, it will only be marked up as pre-formatted if the minimum is exceeded.

The default value is 0. Set this value larger if AscToPDF is marking text as pre-formatted when it shouldn't.

Note: The <PRE> in this policy's name is a reference to the shared ancestry of this software with the text-to-HTML converter from which it evolved.

$_$_BEGIN_IGNORE
$_$_HELP_CHAPTER 3,"Table analysis policies"

Table analysis policies
-----------------------
$_$_HELP_TOPIC_ID HIDD_TABANAL

These policies specify how AscToPDF detects possible tables and analyses the data in them into columns and rows.

- [[popup Attempt table generation]]

*Detection*

- [[popup Table extending factor, Extend preformatted regions]]

*Analysing rows*

- [[popup Could be blank line separated,Could table have blank lines between rows]]

*Analysing columns*

- [[popup Default table layout, Table Layout]]
- [[popup Expect sparse tables,Is the table expected to have sparse columns]]
- [[popup Ignore table header during analysis,Ignore table header when analysing columns]]
- [[popup Column merging factor, Merge together "poor" columns]]
- [[popup Minimum TABLE column separation, Minimum number of spaces between table columns]]

See the section on [[goto pre-formatted text]] for more details.

Attempt table generation
........................
$_$_HELP_TOPIC_ID ID_ATTEMPT_TABLE

This policy specifies whether or not you want PDF table generation attempted for regions of apparently pre-formatted text.

AscToPDF will attempt to analyse such regions, preferring to fit them into a PDF table. However, if this is not possible, or if AscToPDF decides the pre-formatted region is something else (like a diagram or a piece of code), then a PDF table will not be generated.

Disabling this policy tells AscToPDF not to attempt this analysis, usually leading to pre-formatted text being placed in simple fixed width font markup instead.

Table extending factor
......................
$_$_HELP_TOPIC_ID ID_EXTEND_TABLE

When the program encounters a strongly formatted line, it examines the adjacent lines to see if they too could form part of the same preformatted region. This policy specifies the extent to which strongly formatted lines should be used to "extend" the region to include adjacent lines.

If set to 10, then all adjacent lines up to the next page break or section heading will be treated as part of the same region. When set to 1, only those lines that are clearly heavily formatted themselves will be included.

This policy appears on-screen as "Extend preformatted regions".

Could be blank line separated
.............................

This option specifies whether or not tables are expected to have blank lines between rows. If they are, the software will be more likely to merge the text of adjacent source lines into a single row in the output table.

Expect sparse tables
....................
$_$_HELP_TOPIC_ID ID_TABLE_SPARSE

This policy is used to tell AscToPDF that you expect your tables to be quite sparse in places. This can affect AscToPDF's analysis, as the algorithms are liable to merge "empty" columns with their less empty neighbours. Enabling this policy will usually result in your tables having more, emptier, columns.

See also [[popup Pre-processor command: TABLE_MAY_BE_SPARSE]].
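
The effect described above can be modelled with a small Python sketch that counts the blank cells in each column and flags mostly empty columns as candidates for merging, unless sparse tables are expected. The function name and the "more than half empty" threshold are assumptions made for illustration; this manual does not say how AscToPDF scores columns.

```python
def merge_candidates(rows: list[list[str]], expect_sparse: bool,
                     max_empty: float = 0.5) -> list[int]:
    """Return indices of columns whose share of blank cells exceeds max_empty.

    Hypothetical heuristic: such columns would be merged with a neighbour.
    When sparse tables are expected, no column is flagged.
    """
    if expect_sparse or not rows:
        return []
    ncols = max(len(row) for row in rows)
    candidates = []
    for col in range(ncols):
        # Short rows are padded with blank cells.
        cells = [row[col] if col < len(row) else "" for row in rows]
        blank = sum(1 for cell in cells if not cell.strip())
        if blank / len(cells) > max_empty:
            candidates.append(col)
    return candidates


table = [
    ["Fruit", "Jan", "Feb"],
    ["Apples", "", "3"],
    ["Pears", "", "4"],
    ["Plums", "", "2"],
]
print(merge_candidates(table, expect_sparse=False))  # [1]
print(merge_candidates(table, expect_sparse=True))   # []
```
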

Ignore table header during analysis
...................................
$_$_HELP_TOPIC_ID ID_IGNORE_HEADER

This policy specifies that the table header should be ignored when analysing the column structure of the table.

In some tables (usually "reports") the header can be quite complex, with titles spanning multiple columns, whereas the body of the table is much more structured. In such cases including the table header in the analysis can lead to errors, so enabling this policy can simplify the analysis and improve the chances of success.

This policy appears on-screen as "Ignore table header when analysing columns".

Column merging factor
.....................
$_$_HELP_TOPIC_ID ID_MERGE_POOR

Once the program has detected the column layout of a table, it reviews how well the data fits into these columns. If too many cells in a column are empty, or if too many cells "span" multiple columns, then the columns are deemed to be "poor", and may be merged together to form fewer, wider columns.

This factor determines the extent to which columns should be merged. A value of 10 means columns should be merged together whenever there is any doubt; use this if you are getting too many columns. A value of 1 means columns should never be merged; use this if you are getting too few columns.

This policy appears on-screen as "Merge together "poor" columns".

Note that this policy can't guarantee you will get the correct column structure, but it does give you a chance to influence the logic.

Minimum TABLE column separation
...............................
$_$_HELP_TOPIC_ID ID_MIN_TABLE_SEP

This policy specifies the minimum number of spaces that should be interpreted as a gap between columns in a potential table. The default value is 1, but this value can sometimes lead to too many columns, especially in small tables. Larger values may lead to columns being merged together.
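
To see why the minimum separation matters, here is a Python sketch that splits a single table row wherever it finds a run of at least `min_sep` spaces. It assumes tabs have already been expanded to spaces, and `split_columns` is a hypothetical helper rather than part of AscToPDF; the real analysis looks for a regular pattern of gaps across the lines of a table, not just one line.

```python
import re


def split_columns(line: str, min_sep: int = 1) -> list[str]:
    """Split a row into cells at every run of at least min_sep spaces."""
    if min_sep < 1:
        raise ValueError("min_sep must be at least 1")
    gap = re.compile(" {%d,}" % min_sep)  # e.g. " {2,}" matches two or more spaces
    return [cell for cell in gap.split(line.strip()) if cell]


row = "Part number   Description     Qty"
print(split_columns(row, 1))  # ['Part', 'number', 'Description', 'Qty']
print(split_columns(row, 2))  # ['Part number', 'Description', 'Qty']
print(split_columns(row, 4))  # ['Part number   Description', 'Qty']
```

With a separation of 1 the single space inside "Part number" creates a spurious column, while a separation of 4 merges two genuine columns, which is the trade-off this policy controls.
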

This policy appears on-screen as "Minimum number of spaces between table columns".
$_$_END_IGNORE

$_$_BEGIN_IGNORE
$_$_HELP_CHAPTER 2,"Output policies"
$_$_HELP_SUBJECT "Introduction"

Output policies
===============
$_$_HELP_TOPIC_ID ID_OUTPUT_POLICIES

These policies are used to control the output to PDF. Generally they allow you to decide how the resulting PDF should look in ways that cannot be inferred from the original document.

- [[goto File Structure policies,File generation]]
- [[goto Document details]]
$_$_BEGIN_IGNORE
- [[goto Formatting policies,Formatting]]
- [[goto Hyperlinks policies,Hyperlinks]]
$_$_END_IGNORE
- [[goto Preprocessor policies,Preprocessor]]
$_$_BEGIN_IGNORE
- [[goto Link Dictionary Edit Dialog,Link Dictionary]]
$_$_END_IGNORE

$_$_HELP_CHAPTER 3,"File generating policies"

File generating policies
------------------------
$_$_HELP_TOPIC_ID HIDD_FILESPLIT
$_$_HELP_TOPIC_ID HIDD_FILESPLIT_RTF

_Line and file structures_

- [[popup Preserve file structure using <PRE>]]
- [[popup Preserve line structure]]
- [[popup Treat each line as a paragraph]]

_Diagnostics Files_

- [[popup Generate diagnostics files, Generate log files]]
- [[popup Generate sample policy file]]

Preserve file structure using <PRE>
...................................
$_$_HELP_TOPIC_ID ID_PRESERVEFILE

This policy can be used to place the whole file inside fixed width font markup, using a mono spaced font that preserves the line structure and the relative spacing of characters.

When this is enabled, almost all of the program's other conversions will be disabled. You should only really use this if your document has a lot of formatting that the program is failing to understand. This policy needs to be set manually where wanted.

Preserve Line structure
.......................
$_$_HELP_TOPIC_ID ID_PRESERVELINES

This policy specifies that the line structure of the original document should be preserved, rather than just the paragraph structure.
If enabled, the lines in the output document will match those of the original document, and the text will not automatically be adjusted if you widen your window. On large monitors this will give the text an "