Detagger: convert HTML to text and remove markup


Using Detagger to remove and tidy html markup

As an HTML markup remover, Detagger works as a parser that lets you "tidy up" your HTML code in a number of ways. It removes markup selectively: you choose the classes of HTML markup you want removed, the sections of code you want stripped out, and the tag manipulations you want performed.

You can use Detagger to:-

  • "tidy up" your HTML by removing unwanted tags.
  • strip the unnecessary, bloated tags added by Microsoft Office (Word, Excel)
  • help with all the donkey work when migrating pages to CSS or XHTML
  • eliminate all non-standard and deprecated markup
  • remove width, alignment and similar attributes from tables

As well as removing HTML tags, Detagger can also function as a fully-featured HTML-to-text converter.
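Detagger's own conversion method isn't described here. As a rough illustration of what HTML-to-text conversion involves, the following is a minimal Python sketch built on the standard library's html.parser module. The names TextExtractor and html_to_text, and the choice of which tags start a new line, are assumptions made for this example: it skips the contents of <head>, <script> and <style>, decodes character entities, and collapses runs of whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document (illustrative sketch only)."""

    SKIP = {"head", "script", "style"}  # elements whose content is never shown as text
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into real characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK and not self.skip_depth:
            self.parts.append("\n")  # block-level tags start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace (including non-breaking spaces) and drop blank lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, html_to_text("<p>Hello <b>world</b></p><p>Bye</p>") gives "Hello world" and "Bye" on separate lines. A real converter would also need to handle lists, tables and preformatted text more carefully.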

Detagger removal options

Options include:-

  • strip out all non-HTML tags (e.g. the extra MS Office tags added by Word)
  • remove all non-standard tags
  • remove the <HEAD>...</HEAD> section
  • remove all <STYLE> tags, style sheets and CSS attributes
  • remove all <SCRIPT> and JavaScript from the document
  • remove all <FORM>, <INPUT>, <SELECT> and other form tags
  • remove all <FONT> tags
  • remove all comment tags
  • remove all hyperlinks (replacing each one with its display text only)
  • remove size, alignment and color attributes from table cells.
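To show roughly how several of these removal options can work together, here is a minimal Python sketch using the standard library's html.parser. It is not Detagger's implementation. The option sets (DROP_WITH_CONTENT, UNWRAP, TABLE_ATTRS) and the names MarkupStripper, is_unwrapped and strip_markup are hypothetical. The sketch deletes the <HEAD> section, scripts and style blocks together with their content; removes <FONT> tags and hyperlinks while keeping the text they enclose; treats namespaced Office tags such as <o:p> as non-HTML; drops comments; and strips sizing, alignment and color attributes from table tags. Because html.parser lower-cases tag and attribute names, the output is lower-case.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical option set, loosely mirroring a few of the options listed above.
DROP_WITH_CONTENT = {"head", "script", "style"}  # remove the element and its content
UNWRAP = {"font", "a"}                           # remove the tags, keep the enclosed text
TABLE_TAGS = {"table", "tr", "td", "th"}
TABLE_ATTRS = {"width", "height", "align", "valign", "bgcolor"}


def is_unwrapped(tag: str) -> bool:
    # Namespaced Office tags such as <o:p> are treated as non-HTML and unwrapped too.
    return tag in UNWRAP or ":" in tag


class MarkupStripper(HTMLParser):
    def __init__(self):
        # Keep entity references such as &amp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.drop_depth = 0  # > 0 while inside an element being removed

    def _format(self, tag, attrs, self_closing=False):
        if tag in TABLE_TAGS:
            attrs = [(k, v) for k, v in attrs if k not in TABLE_ATTRS]
        parts = [tag] + [k if v is None else f'{k}="{escape(v)}"' for k, v in attrs]
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.drop_depth += 1
        elif not self.drop_depth and not is_unwrapped(tag):
            self.out.append(self._format(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no content to drop.
        if not self.drop_depth and tag not in DROP_WITH_CONTENT and not is_unwrapped(tag):
            self.out.append(self._format(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif not self.drop_depth and not is_unwrapped(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.drop_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.drop_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep the DOCTYPE

    def handle_comment(self, data):
        pass  # drop every comment


def strip_markup(html: str) -> str:
    parser = MarkupStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Keeping each option as a plain set makes it easy to switch individual removals on or off, which is roughly the kind of selective behavior described above.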

Tag manipulation options

"Tidy-up" options include:-

  • convert all tags to UPPER or lower case.
  • replace character entities such as "&nbsp;" with ASCII near-equivalents
  • replace <p> markup inside tables with <br>
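The three tidy-up options above can be approximated in a few lines of Python. The sketch below is only an illustration, not how Detagger does it: the function name tidy, its upper flag and the ASCII_EQUIVALENTS table are hypothetical, and the table covers just a handful of entities. It splits the document naively into tag and text tokens, changes the case of tag names (attribute names are left alone), swaps known entities in text for ASCII stand-ins while leaving others such as &amp; untouched, and inside tables replaces each <p> with <br> and drops </p>.

```python
import re

# Hypothetical table of ASCII near-equivalents; a real tool would cover many more.
ASCII_EQUIVALENTS = {
    "nbsp": " ", "ndash": "-", "mdash": "--", "lsquo": "'", "rsquo": "'",
    "ldquo": '"', "rdquo": '"', "hellip": "...", "copy": "(c)", "trade": "(TM)",
}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
TAG_NAME_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def tidy(html: str, upper: bool = False) -> str:
    """Change tag-name case, swap entities for ASCII, turn <p> into <br> in tables."""
    out = []
    table_depth = 0
    # Naive split into text and tag tokens: it does not cope with '>' inside
    # comments, scripts or quoted attribute values.
    for token in re.split(r"(<[^>]*>)", html):
        match = TAG_NAME_RE.match(token)
        if not match:
            # Text, comment or declaration: replace known entities only.
            out.append(ENTITY_RE.sub(
                lambda m: ASCII_EQUIVALENTS.get(m.group(1), m.group(0)), token))
            continue
        closing, name = match.group(1), match.group(2).lower()
        if name == "table":
            table_depth = max(0, table_depth + (-1 if closing else 1))
        if name == "p" and table_depth > 0:
            if not closing:
                out.append("<BR>" if upper else "<br>")  # any <p> attributes are discarded
            continue  # </p> inside a table is simply dropped
        new_name = name.upper() if upper else name
        out.append("<" + closing + new_name + token[match.end():])
    return "".join(out)
```

For instance, tidy('<P>A&nbsp;B</P>') returns '<p>A B</p>', and passing upper=True produces upper-case tag names instead. A parser-based approach, like html.parser, would be more robust than the regular-expression split used here.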

Documentation

The product comes with extensive documentation, which you can also read online.