- PHP Simple HTML DOM Parser Manual
- How to create HTML DOM object?
- How to find HTML elements?
- How to access the HTML element’s attributes?
- How to traverse the DOM tree?
- Parsing documents
- DOM methods & properties
- Element methods & properties
- DOM traversing
- Camel naming conventions
PHP Simple HTML DOM Parser Manual
// Find all article blocks
foreach($html->find('div.article') as $article) {
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}
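A self-contained version of the loop above, which can be run as-is, might look like the following sketch. It assumes the library file `simple_html_dom.php` is on the include path, and the markup is invented to match the `div.article` structure the loop expects.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

// Hypothetical markup following the div.article layout used above
$html = str_get_html(
    '<div class="article"><div class="title">First</div>'
    . '<div class="intro">Intro 1</div><div class="details">Details 1</div></div>'
    . '<div class="article"><div class="title">Second</div>'
    . '<div class="intro">Intro 2</div><div class="details">Details 2</div></div>'
);

$articles = [];
foreach ($html->find('div.article') as $article) {
    // find($selector, 0) returns the first matching descendant (or null)
    $articles[] = [
        'title'   => $article->find('div.title', 0)->plaintext,
        'intro'   => $article->find('div.intro', 0)->plaintext,
        'details' => $article->find('div.details', 0)->plaintext,
    ];
}

print_r($articles);

// Free the parser's internal references once the data has been extracted
$html->clear();
```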
How to create HTML DOM object?
// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');
// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');
// Create a DOM object from an HTML file
$html = file_get_html('test.htm');
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load('<html><body>Hello!</body></html>');
// Load HTML from a URL
$html->load_file('http://www.google.com/');
// Load HTML from an HTML file
$html->load_file('test.htm');
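The sketch below puts the string-based calls side by side and shows a check that is easy to forget: `str_get_html()` returns `false` rather than a DOM object when the input is empty. It assumes `simple_html_dom.php` is on the include path; the URL and file variants are left out so the example runs offline.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

// Function style: parse a string in one call
$doc = str_get_html('<html><body>Hello!</body></html>');

// Object style: create an empty parser, then load a string into it
$dom = new simple_html_dom();
$dom->load('<html><body>Hello!</body></html>');

// str_get_html() returns false (not a DOM object) for empty input,
// so check the result before calling methods on it
$empty = str_get_html('');
if ($empty === false) {
    echo "Nothing to parse\n";
}

$text  = $doc->find('body', 0)->plaintext; // "Hello!"
$saved = $dom->save();                     // the document serialized back to HTML

echo $text, "\n", $saved, "\n";

// Release memory held by both trees
$doc->clear();
$dom->clear();
```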
How to find HTML elements?
// Find all anchors; returns an array of element objects
$ret = $html->find('a');
// Find the (N)th anchor; returns an element object, or null if not found (zero-based)
$ret = $html->find('a', 0);
// Find the last anchor; returns an element object, or null if not found
$ret = $html->find('a', -1);
// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');
// Find all <div> elements whose id attribute is foo
$ret = $html->find('div[id=foo]');
// Find all elements whose id is foo
$ret = $html->find('#foo');
// Find all elements with class foo
$ret = $html->find('.foo');
// Find all elements that have an id attribute
$ret = $html->find('*[id]');
// Find all anchors and images
$ret = $html->find('a, img');
// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');
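A small sketch of how the optional index argument changes what `find()` returns, assuming `simple_html_dom.php` is on the include path and using made-up markup:

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html(
    '<p><a href="/one">1</a><a href="/two" title="second">2</a>'
    . '<img src="pic.png" title="picture"></p>'
);

$all    = $html->find('a');                    // array of element objects
$first  = $html->find('a', 0);                 // first match
$last   = $html->find('a', -1);                // negative index counts from the end
$none   = $html->find('a', 5);                 // out of range: null
$titled = $html->find('a[title], img[title]'); // comma-separated selectors are combined

echo count($all), ' ', $first->href, ' ', $last->href, ' ', count($titled), "\n";
var_dump($none);
```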
The following operators are supported in attribute selectors:
Filter | Description |
---|---|
[attribute] | Matches elements that have the specified attribute. |
[!attribute] | Matches elements that don’t have the specified attribute. |
[attribute=value] | Matches elements whose specified attribute equals a certain value. |
[attribute!=value] | Matches elements that don’t have the specified attribute set to a certain value. |
[attribute^=value] | Matches elements whose specified attribute starts with a certain value. |
[attribute$=value] | Matches elements whose specified attribute ends with a certain value. |
[attribute*=value] | Matches elements whose specified attribute contains a certain value. |
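The following sketch exercises several of the operators above on invented links. It assumes `simple_html_dom.php` (a recent release) is on the include path; the values are chosen so that each selector matches a predictable set.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html(
    '<a href="mailto:me@example.com">mail</a>'
    . '<a href="http://example.com/report.pdf">report</a>'
    . '<a name="top">top</a>'
);

$mail     = $html->find('a[href^=mailto]');  // href starts with "mailto"
$pdfs     = $html->find('a[href$=.pdf]');    // href ends with ".pdf"
$examples = $html->find('a[href*=example]'); // href contains "example"
$noHref   = $html->find('a[!href]');         // no href attribute at all

// Print the link text of each result set, one set per line
foreach ([$mail, $pdfs, $examples, $noHref] as $matches) {
    echo implode(',', array_map(function ($a) { return $a->plaintext; }, $matches)), "\n";
}
```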
// Find all <li> in <ul>
$es = $html->find('ul li');
// Find nested <div> tags
$es = $html->find('div div div');
// Find all <td> tags with attribute align=center in <table> tags
$es = $html->find('table td[align=center]');
// Find all text blocks
$es = $html->find('text');
// Find all comment (<!--...-->) blocks
$es = $html->find('comment');
// Find all <li> in <ul>, one list at a time
foreach($html->find('ul') as $ul) {
    foreach($ul->find('li') as $li) {
        // do something...
    }
}
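The sketch below contrasts the descendant selector `ul li` with the nested loop, which yields the same items grouped per list. It assumes `simple_html_dom.php` is on the include path; the markup is hypothetical, and the `<ol>` is there to show that only `<li>` inside `<ul>` is selected.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html('<ul><li>a</li><li>b</li></ul><ul><li>c</li></ul><ol><li>d</li></ol>');

// Descendant selector: every <li> inside some <ul> (the <ol> item is skipped)
$items = [];
foreach ($html->find('ul li') as $li) {
    $items[] = $li->plaintext;
}

// Nested loops: the same items, grouped by their <ul>
$groups = [];
foreach ($html->find('ul') as $ul) {
    $group = [];
    foreach ($ul->find('li') as $li) {
        $group[] = $li->plaintext;
    }
    $groups[] = $group;
}

echo implode(',', $items), "\n"; // a,b,c
```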
How to access the HTML element’s attributes?
// Get an attribute (for a value-less attribute such as checked or selected, returns true or false)
$value = $e->href;
// Set an attribute (for a value-less attribute such as checked or selected, set its value to true or false)
$e->href = 'my link';
// Remove an attribute by setting its value to null
$e->href = null;
// Determine whether an attribute exists
if(isset($e->href))
    echo 'href exists!';
// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);
echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
Attribute Name | Usage |
---|---|
$e->tag | Read or write the tag name of the element. |
$e->outertext | Read or write the outer HTML text of the element. |
$e->innertext | Read or write the inner HTML text of the element. |
$e->plaintext | Read or write the plain text of the element. |
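The attribute accessors at the start of this section can be tried with a sketch like this one, assuming `simple_html_dom.php` is on the include path. It reads, overwrites, and nulls attributes on an invented anchor, then inspects the serialized result, where the nulled attribute no longer appears.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html('<a href="/old" class="nav">link</a>');
$e = $html->find('a', 0);

$before   = $e->href;         // read an attribute: "/old"
$hasTitle = isset($e->title); // false: the attribute does not exist

$e->href  = '/new';           // overwrite an attribute
$e->class = null;             // a null attribute is omitted when the element is serialized

$after = $e->outertext;       // serialized element with the changes applied
echo $after, "\n";
```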
// Extract contents from HTML
echo $html->plaintext;
// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';
// Remove an element by setting its outertext to an empty string
$e->outertext = '';
// Append an element
$e->outertext = $e->outertext . '<div>foo</div>';
// Insert an element
$e->outertext = '<div>foo</div>' . $e->outertext;
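Because these edits work by replacing `outertext`, they become visible when the document is serialized again. A minimal sketch, assuming `simple_html_dom.php` is on the include path and using invented markup:

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html('<div><p>keep</p><p class="ad">drop</p><p>wrap me</p></div>');

// Remove an element by emptying its outertext
$html->find('p.ad', 0)->outertext = '';

// Wrap the last paragraph: read its current outertext, then write new markup around it
$p = $html->find('p', -1);
$p->outertext = '<div class="wrap">' . $p->outertext . '</div>';

// The edits show up when the tree is dumped back to a string
$result = $html->save();
echo $result, "\n";
```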
How to traverse the DOM tree?
// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
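A chain of `children()` calls stops with an error as soon as one index is out of range, because `children($index)` then returns `null`. Looping with `first_child()` and `next_sibling()` ends cleanly instead. The sketch assumes `simple_html_dom.php` is on the include path; the `#div1` markup is made up.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="div1"><span id="a">A</span><span id="b">B</span><span id="c">C</span></div>');
$div = $html->find('#div1', 0);

// Walk the element children left to right; next_sibling() returns null after the last one
$ids = [];
for ($child = $div->first_child(); $child !== null; $child = $child->next_sibling()) {
    $ids[] = $child->id;
}

$second   = $div->children(1)->id;              // zero-based index: the second child
$parentId = $div->last_child()->parent()->id;   // back up to the <div>

echo implode(',', $ids), "\n"; // a,b,c
```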
Parsing documents
The parser accepts documents in the form of URLs, files and strings. The document must be readable and must not exceed MAX_FILE_SIZE bytes.
Name | Description |
---|---|
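str_get_html( string $content ) : object | Creates a DOM object from a string. Returns false if the string is empty or exceeds MAX_FILE_SIZE. |
file_get_html( string $filename ) : object | Creates a DOM object from a file or URL. Returns false if the contents are empty or exceed MAX_FILE_SIZE. |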
DOM methods & properties
Name | Description |
---|---|
__construct( [string $filename] ) : void | Constructor. If $filename is given, the contents are loaded automatically; it may be HTML text, a file, or a URL. |
plaintext : string | Returns the contents extracted from HTML. |
clear() : void | Clean up memory. |
load( string $content ) : void | Load contents from a string. |
save( [string $filename] ) : string | Dumps the internal DOM tree back into a string. If $filename is set, the resulting string is also saved to that file. |
load_file( string $filename ) : void | Load contents from a file or a URL. |
set_callback( string $function_name ) : void | Set a callback function, which is invoked on each element while the DOM is dumped. |
find( string $selector [, int $index] ) : mixed | Find elements by CSS selector. Returns the Nth element object if $index is set, otherwise an array of element objects. |
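The callback and `save()` entries can be combined as in this sketch, which hides `<b>` elements while the DOM is dumped and writes the result to a temporary file. It assumes `simple_html_dom.php` is on the include path; the function name `hide_bold_tags` is hypothetical.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

// Hypothetical callback: receives each element as the DOM is dumped
function hide_bold_tags($element)
{
    if ($element->tag === 'b') {
        $element->outertext = ''; // drop <b> elements from the output
    }
}

$html = new simple_html_dom();
$html->load('<p>visible <b>hidden</b></p>');
$html->set_callback('hide_bold_tags');

// save() returns the dumped HTML and, given a path, also writes it to that file
$path = tempnam(sys_get_temp_dir(), 'dom');
$out  = $html->save($path);
echo $out, "\n";

$html->clear();
```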
Element methods & properties
Name | Description |
---|---|
[attribute] : string | Read or write the element’s attribute value. |
tag : string | Read or write the tag name of the element. |
outertext : string | Read or write the outer HTML text of the element. |
innertext : string | Read or write the inner HTML text of the element. |
plaintext : string | Read or write the plain text of the element. |
find( string $selector [, int $index] ) : mixed | Find descendants by CSS selector. Returns the Nth element object if $index is set, otherwise an array of element objects. |
DOM traversing
Name | Description |
---|---|
$e->children( [int $index] ) : mixed | Returns the Nth child object if $index is set, otherwise an array of children. |
$e->parent() : element | Returns the parent of the element. |
$e->first_child() : element | Returns the first child of the element, or null if not found. |
$e->last_child() : element | Returns the last child of the element, or null if not found. |
$e->next_sibling() : element | Returns the next sibling of the element, or null if not found. |
$e->prev_sibling() : element | Returns the previous sibling of the element, or null if not found. |
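`parent()` can be followed upward to list an element's ancestors. This sketch assumes `simple_html_dom.php` is on the include path and stops at the document's internal top node, whose tag name is `root` in current releases (an implementation detail rather than documented API); the markup is invented.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="outer"><section><p id="target">x</p></section></div>');
$node = $html->find('#target', 0);

// Collect tag names from the element up to (but not including) the internal root node
$path = [];
for ($e = $node; $e !== null && $e->tag !== 'root'; $e = $e->parent()) {
    $path[] = $e->tag;
}

echo implode(' < ', $path), "\n"; // p < section < div
```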
Camel naming conventions
Method | Mapping |
---|---|
$e->getAllAttributes() | $e->attr |
$e->getAttribute($name) | $e->attribute |
$e->setAttribute($name, $value) | $e->attribute = $value |
$e->hasAttribute($name) | isset($e->attribute) |
$e->removeAttribute($name) | $e->attribute = null |
$e->getElementById($id) | $e->find("#$id", 0) |
$e->getElementsById($id [, $index]) | $e->find("#$id" [, int $index]) |
$e->getElementByTagName($name) | $e->find($name, 0) |
$e->getElementsByTagName($name [, $index]) | $e->find($name [, int $index]) |
$e->parentNode() | $e->parent() |
$e->childNodes([$index]) | $e->children([int $index]) |
$e->firstChild() | $e->first_child() |
$e->lastChild() | $e->last_child() |
$e->nextSibling() | $e->next_sibling() |
$e->previousSibling() | $e->prev_sibling() |
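To illustrate the mapping, this sketch calls a few camelCase names next to their native equivalents and checks that they return the same results. It assumes `simple_html_dom.php` is on the include path; the markup is invented.

```php
<?php
// Assumption: simple_html_dom.php is available on the include path.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="box"><a href="/x">x</a><a href="/y">y</a></div>');

// camelCase alias and native call for the same lookup
$byIdCamel  = $html->getElementById('box');
$byIdNative = $html->find('#box', 0);

$linksCamel  = $byIdCamel->getElementsByTagName('a');
$linksNative = $byIdNative->find('a');

$hrefCamel  = $byIdCamel->firstChild()->getAttribute('href');
$hrefNative = $byIdNative->first_child()->href;

var_dump($byIdCamel === $byIdNative, count($linksCamel) === count($linksNative), $hrefCamel === $hrefNative);
```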