Introduction
Tidy is a binding for the Tidy HTML clean and repair utility. It allows you not only to clean and otherwise manipulate HTML documents, but also to traverse the document tree.
Requirements
To use Tidy, you need libtidy installed. It is available from the tidy homepage: http://tidy.sourceforge.net/
Installation
Tidy is currently available for PHP 4.3.x and PHP 5 as a PECL extension from http://pecl.php.net/package/tidy .
Note: Tidy 1.0 is just for PHP 4.3.x, while Tidy 2.0 is just for PHP 5.
If PEAR is available on your *nix-like system, you can install the tidy extension with the pecl installer by running the following command: pecl install tidy
Example 1. tidy install by hand in PHP 4.3.x
gunzip tidy-xxx.tgz
tar -xvf tidy-xxx.tar
cd tidy-xxx
phpize
./configure && make && make install
Windows users can download the extension dll from http://pecl4win.php.net/ext.php/php_tidy.dll .
In PHP 5 you need only to compile using the --with-tidy option.
Runtime Configuration
The behaviour of these functions is affected by settings in php.ini .
Table 1. Tidy Configuration Options
Name | Default | Changeable | Changelog |
---|---|---|---|
tidy.default_config | "" | PHP_INI_SYSTEM | Available since PHP 5.0.0. |
tidy.clean_output | "0" | PHP_INI_PERDIR | Available since PHP 5.0.0. |
Here is a short explanation of the configuration directives.
tidy.default_config: Default path for the tidy config file.
tidy.clean_output: Turns on/off the output repairing by Tidy.
Warning: Do not turn on tidy.clean_output if you are generating non-HTML content such as dynamic images.
Resource Types
This extension has no resource types defined.
Predefined Classes
tidyNode
Methods
tidyNode->hasChildren — Returns TRUE if the current node has children
tidyNode->hasSiblings — Returns TRUE if the current node has siblings
tidyNode->isAsp — Returns TRUE if the current node is ASP code
tidyNode->isComment — Returns TRUE if the current node is a comment
tidyNode->isHtml — Returns TRUE if the current node is HTML
tidyNode->isJste — Returns TRUE if the current node is JSTE
tidyNode->isPhp — Returns TRUE if the current node is PHP
tidyNode->isText — Returns TRUE if the current node is Text (no markup)
Properties
value — the value of the node (e.g. the html text)
name — the name of the tag (e.g. html, a, etc.)
type — the type of the node (one of the constants below, e.g. TIDY_NODETYPE_PHP)
line* — the line where the node starts
column* — the column where the node starts
proprietary* — TRUE if the node refers to a proprietary tag
id — the ID of the tag (one of the constants below, e.g. TIDY_TAG_FRAME)
attribute — an array with the attributes of the current node, or NULL if there aren’t any
child — an array with the child tidyNodes, or NULL if there aren’t any
Note: The properties marked with * are available only as of PHP 5.1.0.
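The methods and properties above are enough to walk a parse tree. The following is a minimal sketch, assuming the tidy extension is loaded; describeNode() is a hypothetical helper written for this illustration and is not part of the extension.

```php
<?php
// Hypothetical helper (not part of the extension): collects one line per
// node, indented by depth, using the tidyNode members described above.
function describeNode($node, $depth = 0, &$lines = array())
{
    // Text nodes carry no tag name, so label them explicitly.
    $label = $node->isText() ? '#text' : $node->name;
    $lines[] = str_repeat('  ', $depth) . $label;

    // $node->child is NULL when there are no children, so check first.
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            describeNode($child, $depth + 1, $lines);
        }
    }
    return $lines;
}

$tidy = new tidy;
$tidy->parseString('<html><head><title>Demo</title></head><body><p>Hello</p></body></html>');

// Start at the <body> node and print the indented outline.
$lines = describeNode($tidy->body());
echo implode("\n", $lines), "\n";
```

Checking hasChildren() before iterating avoids passing NULL to foreach when a node is a leaf.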
Predefined Constants
The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.
Each TIDY_TAG_XXX represents an HTML tag. For example, TIDY_TAG_A represents a link tag. Each TIDY_ATTR_XXX represents an HTML attribute. For example, TIDY_ATTR_HREF would represent the href attribute in the previous example.
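As an illustration of how a tag constant can be used, the sketch below compares each node's id property against TIDY_TAG_A to find links. It assumes the tidy extension is loaded; collectLinks() is a hypothetical helper, and the attribute array is read by attribute name ('href') rather than through a TIDY_ATTR_XXX constant.

```php
<?php
// Hypothetical helper: walks the tree and gathers href values from <a> tags.
function collectLinks($node, &$links = array())
{
    // For element nodes, $node->id holds the matching TIDY_TAG_XXX constant.
    if ($node->id == TIDY_TAG_A && isset($node->attribute['href'])) {
        $links[] = $node->attribute['href'];
    }
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            collectLinks($child, $links);
        }
    }
    return $links;
}

$html = '<html><head><title>Links</title></head><body>'
      . '<p><a href="http://www.php.net/">PHP</a> and '
      . '<a href="http://pecl.php.net/">PECL</a></p>'
      . '</body></html>';

$tidy  = tidy_parse_string($html);
$links = collectLinks($tidy->html());
print_r($links);
```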
The following constants are defined:
Table 2. tidy tag constants
constant |
---|
TIDY_TAG_UNKNOWN |
TIDY_TAG_A |
TIDY_TAG_ABBR |
TIDY_TAG_ACRONYM |
TIDY_TAG_ALIGN |
TIDY_TAG_APPLET |
TIDY_TAG_AREA |
TIDY_TAG_B |
TIDY_TAG_BASE |
TIDY_TAG_BASEFONT |
TIDY_TAG_BDO |
TIDY_TAG_BGSOUND |
TIDY_TAG_BIG |
TIDY_TAG_BLINK |
TIDY_TAG_BLOCKQUOTE |
TIDY_TAG_BODY |
TIDY_TAG_BR |
TIDY_TAG_BUTTON |
TIDY_TAG_CAPTION |
TIDY_TAG_CENTER |
TIDY_TAG_CITE |
TIDY_TAG_CODE |
TIDY_TAG_COL |
TIDY_TAG_COLGROUP |
TIDY_TAG_COMMENT |
TIDY_TAG_DD |
TIDY_TAG_DEL |
TIDY_TAG_DFN |
TIDY_TAG_DIR |
TIDY_TAG_DIV |
TIDY_TAG_DL |
TIDY_TAG_DT |
TIDY_TAG_EM |
TIDY_TAG_EMBED |
TIDY_TAG_FIELDSET |
TIDY_TAG_FONT |
TIDY_TAG_FORM |
TIDY_TAG_FRAME |
TIDY_TAG_FRAMESET |
TIDY_TAG_H1 |
TIDY_TAG_H2 |
TIDY_TAG_H3 |
TIDY_TAG_H4 |
TIDY_TAG_H5 |
TIDY_TAG_H6 |
TIDY_TAG_HEAD |
TIDY_TAG_HR |
TIDY_TAG_HTML |
TIDY_TAG_I |
TIDY_TAG_IFRAME |
TIDY_TAG_ILAYER |
TIDY_TAG_IMG |
TIDY_TAG_INPUT |
TIDY_TAG_INS |
TIDY_TAG_ISINDEX |
TIDY_TAG_KBD |
TIDY_TAG_KEYGEN |
TIDY_TAG_LABEL |
TIDY_TAG_LAYER |
TIDY_TAG_LEGEND |
TIDY_TAG_LI |
TIDY_TAG_LINK |
TIDY_TAG_LISTING |
TIDY_TAG_MAP |
TIDY_TAG_MARQUEE |
TIDY_TAG_MENU |
TIDY_TAG_META |
TIDY_TAG_MULTICOL |
TIDY_TAG_NOBR |
TIDY_TAG_NOEMBED |
TIDY_TAG_NOFRAMES |
TIDY_TAG_NOLAYER |
TIDY_TAG_NOSAVE |
TIDY_TAG_NOSCRIPT |
TIDY_TAG_OBJECT |
TIDY_TAG_OL |
TIDY_TAG_OPTGROUP |
TIDY_TAG_OPTION |
TIDY_TAG_P |
TIDY_TAG_PARAM |
TIDY_TAG_PLAINTEXT |
TIDY_TAG_PRE |
TIDY_TAG_Q |
TIDY_TAG_RP |
TIDY_TAG_RT |
TIDY_TAG_RTC |
TIDY_TAG_RUBY |
TIDY_TAG_S |
TIDY_TAG_SAMP |
TIDY_TAG_SCRIPT |
TIDY_TAG_SELECT |
TIDY_TAG_SERVER |
TIDY_TAG_SERVLET |
TIDY_TAG_SMALL |
TIDY_TAG_SPACER |
TIDY_TAG_SPAN |
TIDY_TAG_STRIKE |
TIDY_TAG_STRONG |
TIDY_TAG_STYLE |
TIDY_TAG_SUB |
TIDY_TAG_TABLE |
TIDY_TAG_TBODY |
TIDY_TAG_TD |
TIDY_TAG_TEXTAREA |
TIDY_TAG_TFOOT |
TIDY_TAG_TH |
TIDY_TAG_THEAD |
TIDY_TAG_TITLE |
TIDY_TAG_TR |
TIDY_TAG_TT |
TIDY_TAG_U |
TIDY_TAG_UL |
TIDY_TAG_VAR |
TIDY_TAG_WBR |
TIDY_TAG_XMP |
Table 3. tidy attribute constants
constant |
---|
TIDY_ATTR_UNKNOWN |
TIDY_ATTR_ABBR |
TIDY_ATTR_ACCEPT |
TIDY_ATTR_ACCEPT_CHARSET |
TIDY_ATTR_ACCESSKEY |
TIDY_ATTR_ACTION |
TIDY_ATTR_ADD_DATE |
TIDY_ATTR_ALIGN |
TIDY_ATTR_ALINK |
TIDY_ATTR_ALT |
TIDY_ATTR_ARCHIVE |
TIDY_ATTR_AXIS |
TIDY_ATTR_BACKGROUND |
TIDY_ATTR_BGCOLOR |
TIDY_ATTR_BGPROPERTIES |
TIDY_ATTR_BORDER |
TIDY_ATTR_BORDERCOLOR |
TIDY_ATTR_BOTTOMMARGIN |
TIDY_ATTR_CELLPADDING |
TIDY_ATTR_CELLSPACING |
TIDY_ATTR_CHAR |
TIDY_ATTR_CHAROFF |
TIDY_ATTR_CHARSET |
TIDY_ATTR_CHECKED |
TIDY_ATTR_CITE |
TIDY_ATTR_CLASS |
TIDY_ATTR_CLASSID |
TIDY_ATTR_CLEAR |
TIDY_ATTR_CODE |
TIDY_ATTR_CODEBASE |
TIDY_ATTR_CODETYPE |
TIDY_ATTR_COLOR |
TIDY_ATTR_COLS |
TIDY_ATTR_COLSPAN |
TIDY_ATTR_COMPACT |
TIDY_ATTR_CONTENT |
TIDY_ATTR_COORDS |
TIDY_ATTR_DATA |
TIDY_ATTR_DATAFLD |
TIDY_ATTR_DATAPAGESIZE |
TIDY_ATTR_DATASRC |
TIDY_ATTR_DATETIME |
TIDY_ATTR_DECLARE |
TIDY_ATTR_DEFER |
TIDY_ATTR_DIR |
TIDY_ATTR_DISABLED |
TIDY_ATTR_ENCODING |
TIDY_ATTR_ENCTYPE |
TIDY_ATTR_FACE |
TIDY_ATTR_FOR |
TIDY_ATTR_FRAME |
TIDY_ATTR_FRAMEBORDER |
TIDY_ATTR_FRAMESPACING |
TIDY_ATTR_GRIDX |
TIDY_ATTR_GRIDY |
TIDY_ATTR_HEADERS |
TIDY_ATTR_HEIGHT |
TIDY_ATTR_HREF |
TIDY_ATTR_HREFLANG |
TIDY_ATTR_HSPACE |
TIDY_ATTR_HTTP_EQUIV |
TIDY_ATTR_ID |
TIDY_ATTR_ISMAP |
TIDY_ATTR_LABEL |
TIDY_ATTR_LANG |
TIDY_ATTR_LANGUAGE |
TIDY_ATTR_LAST_MODIFIED |
TIDY_ATTR_LAST_VISIT |
TIDY_ATTR_LEFTMARGIN |
TIDY_ATTR_LINK |
TIDY_ATTR_LONGDESC |
TIDY_ATTR_LOWSRC |
TIDY_ATTR_MARGINHEIGHT |
TIDY_ATTR_MARGINWIDTH |
TIDY_ATTR_MAXLENGTH |
TIDY_ATTR_MEDIA |
TIDY_ATTR_METHOD |
TIDY_ATTR_MULTIPLE |
TIDY_ATTR_NAME |
TIDY_ATTR_NOHREF |
TIDY_ATTR_NORESIZE |
TIDY_ATTR_NOSHADE |
TIDY_ATTR_NOWRAP |
TIDY_ATTR_OBJECT |
TIDY_ATTR_OnAFTERUPDATE |
TIDY_ATTR_OnBEFOREUNLOAD |
TIDY_ATTR_OnBEFOREUPDATE |
TIDY_ATTR_OnBLUR |
TIDY_ATTR_OnCHANGE |
TIDY_ATTR_OnCLICK |
TIDY_ATTR_OnDATAAVAILABLE |
TIDY_ATTR_OnDATASETCHANGED |
TIDY_ATTR_OnDATASETCOMPLETE |
TIDY_ATTR_OnDBLCLICK |
TIDY_ATTR_OnERRORUPDATE |
TIDY_ATTR_OnFOCUS |
TIDY_ATTR_OnKEYDOWN |
TIDY_ATTR_OnKEYPRESS |
TIDY_ATTR_OnKEYUP |
TIDY_ATTR_OnLOAD |
TIDY_ATTR_OnMOUSEDOWN |
TIDY_ATTR_OnMOUSEMOVE |
TIDY_ATTR_OnMOUSEOUT |
TIDY_ATTR_OnMOUSEOVER |
TIDY_ATTR_OnMOUSEUP |
TIDY_ATTR_OnRESET |
TIDY_ATTR_OnROWENTER |
TIDY_ATTR_OnROWEXIT |
TIDY_ATTR_OnSELECT |
TIDY_ATTR_OnSUBMIT |
TIDY_ATTR_OnUNLOAD |
TIDY_ATTR_PROFILE |
TIDY_ATTR_PROMPT |
TIDY_ATTR_RBSPAN |
TIDY_ATTR_READONLY |
TIDY_ATTR_REL |
TIDY_ATTR_REV |
TIDY_ATTR_RIGHTMARGIN |
TIDY_ATTR_ROWS |
TIDY_ATTR_ROWSPAN |
TIDY_ATTR_RULES |
TIDY_ATTR_SCHEME |
TIDY_ATTR_SCOPE |
TIDY_ATTR_SCROLLING |
TIDY_ATTR_SELECTED |
TIDY_ATTR_SHAPE |
TIDY_ATTR_SHOWGRID |
TIDY_ATTR_SHOWGRIDX |
TIDY_ATTR_SHOWGRIDY |
TIDY_ATTR_SIZE |
TIDY_ATTR_SPAN |
TIDY_ATTR_SRC |
TIDY_ATTR_STANDBY |
TIDY_ATTR_START |
TIDY_ATTR_STYLE |
TIDY_ATTR_SUMMARY |
TIDY_ATTR_TABINDEX |
TIDY_ATTR_TARGET |
TIDY_ATTR_TEXT |
TIDY_ATTR_TITLE |
TIDY_ATTR_TOPMARGIN |
TIDY_ATTR_TYPE |
TIDY_ATTR_USEMAP |
TIDY_ATTR_VALIGN |
TIDY_ATTR_VALUE |
TIDY_ATTR_VALUETYPE |
TIDY_ATTR_VERSION |
TIDY_ATTR_VLINK |
TIDY_ATTR_VSPACE |
TIDY_ATTR_WIDTH |
TIDY_ATTR_WRAP |
TIDY_ATTR_XML_LANG |
TIDY_ATTR_XML_SPACE |
TIDY_ATTR_XMLNS |
Table 4. tidy nodetype constants
constant | description |
---|---|
TIDY_NODETYPE_ROOT | root node |
TIDY_NODETYPE_DOCTYPE | doctype |
TIDY_NODETYPE_COMMENT | HTML comment |
TIDY_NODETYPE_PROCINS | Processing Instruction |
TIDY_NODETYPE_TEXT | Text |
TIDY_NODETYPE_START | start tag |
TIDY_NODETYPE_END | end tag |
TIDY_NODETYPE_STARTEND | empty tag |
TIDY_NODETYPE_CDATA | CDATA |
TIDY_NODETYPE_SECTION | XML section |
TIDY_NODETYPE_ASP | ASP code |
TIDY_NODETYPE_JSTE | JSTE code |
TIDY_NODETYPE_PHP | PHP code |
TIDY_NODETYPE_XMLDECL | XML declaration |
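The node type constants can be compared against the type property of a tidyNode. The sketch below tallies the nodes of a small document by type; it assumes the tidy extension is loaded, and countNodeTypes() is a hypothetical helper written for this illustration.

```php
<?php
// Hypothetical helper: tallies nodes by their TIDY_NODETYPE_XXX value.
function countNodeTypes($node, &$counts = array())
{
    if (!isset($counts[$node->type])) {
        $counts[$node->type] = 0;
    }
    $counts[$node->type]++;

    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            countNodeTypes($child, $counts);
        }
    }
    return $counts;
}

$html = '<html><head><title>t</title></head>'
      . '<body><!-- note --><p>Hi</p></body></html>';

$tidy   = tidy_parse_string($html);
$counts = countNodeTypes($tidy->root());

// Look up individual tallies by constant; types that never occur are unset.
$comments = isset($counts[TIDY_NODETYPE_COMMENT]) ? $counts[TIDY_NODETYPE_COMMENT] : 0;
echo "Comments found: $comments\n";
```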
Examples
Example 2. Basic Tidy usage
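<?php
ob_start();
?>
<html>a html document</html>
<?php
$html = ob_get_clean();

// Specify configuration
$config = array(
    'indent'       => true,
    'output-xhtml' => true,
    'wrap'         => 200);

// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Output
echo $tidy;
?>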
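The same parse, repair and output steps can also be written with the procedural functions of this extension. The following is a minimal sketch, assuming the tidy extension is loaded; the input string and the option values are chosen only for illustration.

```php
<?php
// Deliberately incomplete markup: no doctype, no title, unclosed <p>.
$html   = '<p>unclosed paragraph';
$config = array('output-xhtml' => true, 'wrap' => 200);

// Parse, then run the configured cleanup and repair operations.
$tidy = tidy_parse_string($html, $config, 'utf8');
tidy_clean_repair($tidy);

// Retrieve the repaired markup and the diagnostics counters.
$output   = tidy_get_output($tidy);
$warnings = tidy_warning_count($tidy);
$errors   = tidy_error_count($tidy);

echo $output, "\n";
echo "Warnings: $warnings, errors: $errors\n";
```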
Table of Contents
- ob_tidyhandler — ob_start callback function to repair the buffer
- tidy_access_count — Returns the number of Tidy accessibility warnings encountered for specified document
- tidy_clean_repair — Execute configured cleanup and repair operations on parsed markup
- tidy_config_count — Returns the number of Tidy configuration errors encountered for specified document
- tidy::__construct — Constructs a new tidy object
- tidy_diagnose — Run configured diagnostics on parsed and repaired markup
- tidy_error_count — Returns the number of Tidy errors encountered for specified document
- tidy_get_body — Returns a tidyNode object starting from the <body> tag of the tidy parse tree
- tidy_get_config — Get current Tidy configuration
- tidy_get_error_buffer — Return warnings and errors which occurred parsing the specified document
- tidy_get_head — Returns a tidyNode object starting from the <head> tag of the tidy parse tree
- tidy_get_html_ver — Get the detected HTML version for the specified document
- tidy_get_html — Returns a tidyNode object starting from the <html> tag of the tidy parse tree
- tidy_get_opt_doc — Returns the documentation for the given option name
- tidy_get_output — Return a string representing the parsed tidy markup
- tidy_get_release — Get release date (version) for Tidy library
- tidy_get_root — Returns a tidyNode object representing the root of the tidy parse tree
- tidy_get_status — Get status of specified document
- tidy_getopt — Returns the value of the specified configuration option for the tidy document
- tidy_is_xhtml — Indicates if the document is an XHTML document
- tidy_is_xml — Indicates if the document is a generic (non HTML/XHTML) XML document
- tidy_load_config — Load an ASCII Tidy configuration file with the specified encoding
- tidy_node->get_attr — Return the attribute with the provided attribute id
- tidy_node->get_nodes — Return an array of nodes under this node with the specified id
- tidy_node->next — Returns the next sibling to this node
- tidy_node->prev — Returns the previous sibling to this node
- tidy_parse_file — Parse markup in file or URI
- tidy_parse_string — Parse a document stored in a string
- tidy_repair_file — Repair a file and return it as a string
- tidy_repair_string — Repair a string using an optionally provided configuration file
- tidy_reset_config — Restore Tidy configuration to default values
- tidy_save_config — Save current settings to named file
- tidy_set_encoding — Set the input/output character encoding for parsing markup
- tidy_setopt — Updates the configuration settings for the specified tidy document
- tidy_warning_count — Returns the number of Tidy warnings encountered for specified document
- tidyNode->hasChildren — Returns true if this node has children
- tidyNode->hasSiblings — Returns true if this node has siblings
- tidyNode->isAsp — Returns true if this node is ASP
- tidyNode->isComment — Returns true if this node represents a comment
- tidyNode->isHtml — Returns true if this node is part of an HTML document
- tidyNode->isJste — Returns true if this node is JSTE
- tidyNode->isPhp — Returns true if this node is PHP
- tidyNode->isText — Returns true if this node represents text (no markup)
The tidy class
The HTML representation of the node, including the surrounding tags.
Table of Contents
- tidy::body — Returns a tidyNode object starting from the <body> tag of the tidy parse tree
- tidy::cleanRepair — Execute configured cleanup and repair operations on parsed markup
- tidy::__construct — Constructs a new tidy object
- tidy::diagnose — Run configured diagnostics on parsed and repaired markup
- tidy::$errorBuffer — Return warnings and errors which occurred parsing the specified document
- tidy::getConfig — Get current Tidy configuration
- tidy::getHtmlVer — Get the detected HTML version for the specified document
- tidy::getOpt — Returns the value of the specified configuration option for the tidy document
- tidy::getOptDoc — Returns the documentation for the given option name
- tidy::getRelease — Get release date (version) for Tidy library
- tidy::getStatus — Get status of specified document
- tidy::head — Returns a tidyNode object starting from the <head> tag of the tidy parse tree
- tidy::html — Returns a tidyNode object starting from the <html> tag of the tidy parse tree
- tidy::isXhtml — Indicates if the document is an XHTML document
- tidy::isXml — Indicates if the document is a generic (non HTML/XHTML) XML document
- tidy::parseFile — Parse markup in file or URI
- tidy::parseString — Parse a document stored in a string
- tidy::repairFile — Repair a file and return it as a string
- tidy::repairString — Repair a string using an optionally provided configuration file
- tidy::root — Returns a tidyNode object representing the root of the tidy parse tree
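Several of the methods listed above report on a document rather than change it. The following is a minimal sketch of reading those diagnostics through the object interface, assuming the tidy extension is loaded; the input markup and the 'show-body-only' option are chosen only for illustration.

```php
<?php
// Markup with unclosed tags, so Tidy has something to report.
$html = '<h1>Heading<p>Paragraph';

$tidy = new tidy;
$tidy->parseString($html, array('show-body-only' => true), 'utf8');
$tidy->cleanRepair();

// Status of the document: 0 = no problems, 1 = warnings, 2 = errors.
$status = $tidy->getStatus();

// Warnings and errors collected while parsing, as text.
$log = $tidy->errorBuffer;

// Repaired markup; with show-body-only, just the body content.
$output = tidy_get_output($tidy);

echo "Status: $status\n";
echo $log, "\n";
echo $output, "\n";
```

Reading getStatus() first gives a cheap way to decide whether the error buffer is worth logging.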