Source for file stringparser.class.php
* Generic string parsing infrastructure
* These classes provide the means to parse any kind of string into a tree-like
* memory structure. For example, an HTML parser could be built on top of them.
* @author Christian Seiler <spam@christian-seiler.de>
* @copyright Christian Seiler 2006
* Copyright (c) 2004-2007 Christian Seiler
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
* String parser mode: Search for the next character
* @see StringParser::_parserMode
define ('STRINGPARSER_MODE_SEARCH', 1);
* String parser mode: Look at each character of the string
* @see StringParser::_parserMode
define ('STRINGPARSER_MODE_LOOP', 2);
* Filter type: Prefilter
* @see StringParser::addFilter, StringParser::_prefilters
define ('STRINGPARSER_FILTER_PRE', 1);
* Filter type: Postfilter
* @see StringParser::addFilter, StringParser::_postfilters
define ('STRINGPARSER_FILTER_POST', 2);
* Generic string parser class
* This is an abstract class for any type of string parser.
* There are two possible modes: search mode and loop mode. In loop mode
* every single character is looked at in a loop and it is then decided
* what action to take. This is the most straightforward approach to
* string parsing but due to the nature of PHP as a scripting language,
* it can also cost performance. In search mode the class possesses a
* list of relevant characters for parsing and uses the
* {@link PHP_MANUAL#strpos strpos} function to search for the next
* relevant character. The search mode will be faster than the loop mode
* in most circumstances but it is also more difficult to implement.
* The subclass that does the string parsing itself will define which
* mode it will implement.
* @see STRINGPARSER_MODE_SEARCH, STRINGPARSER_MODE_LOOP
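The difference between the two modes can be sketched as follows. This is only an illustration under stated assumptions: `findRelevantLoop` and `findRelevantSearch` are hypothetical helpers that do not exist in StringParser, and the sketch is written for current PHP versions rather than the PHP 4 style of this file.

```php
<?php
// Hypothetical helpers, not part of StringParser: both return the offsets
// of the "relevant" characters in $text, once per mode.

// Loop mode: inspect every single character.
function findRelevantLoop($text, array $relevant) {
    $hits = [];
    $length = strlen($text);
    for ($pos = 0; $pos < $length; $pos++) {
        if (in_array($text[$pos], $relevant, true)) {
            $hits[] = $pos;
        }
    }
    return $hits;
}

// Search mode: let strpos() jump straight to the next relevant character.
function findRelevantSearch($text, array $relevant) {
    $hits = [];
    $pos = 0;
    $length = strlen($text);
    while ($pos < $length) {
        $next = false;
        foreach ($relevant as $char) {
            $found = strpos($text, $char, $pos);
            if ($found !== false && ($next === false || $found < $next)) {
                $next = $found;
            }
        }
        if ($next === false) {
            break; // no relevant character left
        }
        $hits[] = $next;
        $pos = $next + 1;
    }
    return $hits;
}
```

Both helpers return the same offsets; the search variant simply skips the irrelevant characters inside `strpos` instead of visiting them in PHP code.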
* Current position in raw text
* Flag if this object is already parsing a text
* This flag is to prevent recursive calls to the parse() function that
* would cause very nasty things.
* Whether to stop parsing if a parse error occurs.
* Characters or strings to look for
* Characters currently allowed
* Note that this will only be evaluated in loop mode; in search mode
* this would negate any performance gain. Note that only single
* characters are permitted here, no strings. Please also note that in
* loop mode, {@link StringParser::_charactersSearch _charactersSearch}
* is evaluated before this variable.
* If in strict mode, parsing is stopped if a character that is not
* allowed is encountered. If not in strict mode, the character is
* @param int $type The type of the filter
* @param mixed $callback The callback to call
* @see STRINGPARSER_FILTER_PRE, STRINGPARSER_FILTER_POST
// make sure the function is callable
* @param int $type The type of the filter or 0 for all
* @see STRINGPARSER_FILTER_PRE, STRINGPARSER_FILTER_POST
* This function parses the text
* @param string $text The text to parse
* @return mixed Either the root object of the tree if no output method
* is defined, the tree rendered as output (e.g. a string), or false
* if an internal error occurred, such as a parse error if
* in strict mode or the object is already parsing a text.
* It is possible to specify prefilters for the parser to do some
* manipulating of the string beforehand.
* It is possible to specify postfilters for the parser to do some
* manipulating of the string afterwards.
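A minimal sketch of how such a pre/postfilter chain might work. The class `FilterChain` and the constants `FILTER_PRE` and `FILTER_POST` are hypothetical stand-ins chosen for illustration; they are not the names or the implementation used by StringParser.

```php
<?php
// Hypothetical stand-in for the filter handling; the names below are
// illustrative and are not taken from StringParser itself.
const FILTER_PRE = 1;
const FILTER_POST = 2;

class FilterChain {
    private $filters = [FILTER_PRE => [], FILTER_POST => []];

    // Register a callback; refuse unknown types and non-callables.
    public function add($type, $callback) {
        if (!isset($this->filters[$type]) || !is_callable($callback)) {
            return false;
        }
        $this->filters[$type][] = $callback;
        return true;
    }

    // Run every filter of one type, in registration order.
    public function apply($type, $text) {
        foreach ($this->filters[$type] as $callback) {
            $text = call_user_func($callback, $text);
        }
        return $text;
    }
}

$chain = new FilterChain();
$chain->add(FILTER_PRE, 'trim');
$chain->add(FILTER_POST, 'strtoupper');
$input = $chain->apply(FILTER_PRE, "  hello  ");
$output = $chain->apply(FILTER_POST, $input);
```

Checking `is_callable` at registration time means a bad callback is rejected early instead of failing in the middle of a parse.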
* Abstract method: Manipulate the tree
* Abstract method: Output tree
// this could e.g. call _applyPostfilters
* Restart parsing after current block
* To achieve this the current top stack object is removed from the
* tree. Then the current item
// this should definitely not happen!
$topelem = & $this->_stack[$stack_count - 1];
$node_parent = & $topelem->_parent;
// remove the child from the tree
$res = $node_parent->removeChild ($topelem, false);
// now try to get the position of the object
if ($topelem->occurredAt < 0) {
// HACK: could it be necessary to set a different status
// if yes, how should this be achieved? Another member of
$this->_cpos = $topelem->occurredAt + 1;
* Abstract method: Close remaining blocks
* Abstract method: Initialize the parser
* Abstract method: Set a specific status
* Abstract method: Handle status
* @param int $status The current status
* @param string $needle The needle that was found
// make sure this is false!
// original status 0 => no problem
// not in original status? strict mode?
// break up parsing operation of current node
// HACK: This method is not yet implemented correctly, the code below
// DOES NOT WORK! Do not use!
while ($this->_cpos < $this->_length) {
$needle = $this->_strDetect ($this->_charactersSearch, $this->_cpos);
// not found => see if character is allowed
if (!in_array ($this->_text[$this->_cpos], $this->_charactersAllowed)) {
$res = $this->_appendText ($this->_text[$this->_cpos]);
$subtext = substr ($this->_text, $offset, $offset - $this->_cpos);
$res = $this->_appendText ($subtext);
$res = $this->_handleStatus ($this->_status, $needle);
// original status 0 => no problem
// not in original status? strict mode?
// break up parsing operation of current node
$res = $this->_reparseAfterCurrentBlock ();
// this will not cause an infinite loop because
// _reparseAfterCurrentBlock will increase _cpos by one!
* Abstract method: Append text depending on current status
* @param string $text The text to append
* @return bool On success, the function returns true, else false
// default: call _appendToLastTextChild
* Append text to last text child of current top parser stack node
* @param string $text The text to append
* @return bool On success, the function returns true, else false
return $this->_stack[$scount - 1]->appendToLastTextChild ($text);
* Searches {@link StringParser::_text _text} for every needle that is
* specified by using the {@link PHP_MANUAL#strpos strpos} function. It
* returns an associative array with the key <code>'needle'</code>
* pointing at the string that was found first and the key
* <code>'offset'</code> pointing at the offset at which the string was
* found first. If no needle was found, the <code>'needle'</code>
* element is <code>false</code> and the <code>'offset'</code> element
* @see StringParser::_text
function _strpos ($needles, $offset) {
foreach ($needles as $needle) {
if ($n_offset !== false && ($n_offset < $cur_offset || $cur_offset < 0)) {
return array ($cur_needle, $cur_offset, 'needle' => $cur_needle, 'offset' => $cur_offset);
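The earliest-needle search described above can be sketched as a free-standing function. `firstNeedle` is a hypothetical name, it takes the haystack as a parameter instead of reading `_text`, and the use of -1 as the "not found" offset is an assumption made for this sketch.

```php
<?php
// Hypothetical free-standing variant: takes the haystack as a parameter
// instead of reading it from the parser object.
function firstNeedle($haystack, array $needles, $offset = 0) {
    $curNeedle = false;
    $curOffset = -1; // assumption: -1 marks "nothing found"
    foreach ($needles as $needle) {
        $found = strpos($haystack, $needle, $offset);
        // keep this needle if it is the first hit or an earlier hit
        if ($found !== false && ($curOffset < 0 || $found < $curOffset)) {
            $curNeedle = $needle;
            $curOffset = $found;
        }
    }
    return ['needle' => $curNeedle, 'offset' => $curOffset];
}

// '[' occurs at offset 4, '[/' only at offset 9, so '[' wins.
$hit = firstNeedle('say [b]hi[/b]', ['[/', '[']);
$miss = firstNeedle('abc', ['x']);
```

The strict comparison against `false` matters because `strpos` legitimately returns 0 for a match at the start of the string.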
* Detects a string at the current position
* @param array $needles The strings that are to be detected
* @param int $offset The current offset
* @return mixed The string that was detected or the needle
foreach ($needles as $needle) {
* Adds a node to the current parse stack
* @param object $node The node that is to be added
* @return bool True on success, else false.
* @see StringParser_Node, StringParser::_stack
$max_node = & $this->_stack[$stack_count - 1];
if (!$max_node->appendChild ($node)) {
$this->_stack[$stack_count] = & $node;
* Removes a node from the current parse stack
* @return bool True on success, else false.
* @see StringParser_Node, StringParser::_stack
unset ($this->_stack[$stack_count - 1]);
* Execute a method on the top element
$method = array (&$this->_stack[$stack_count - 1], $method);
* Get a variable of the top element
return $this->_stack[$stack_count - 1]->$var;
* Node type: Unknown node
* @see StringParser_Node::_type
define ('STRINGPARSER_NODE_UNKNOWN', 0);
* @see StringParser_Node::_type
define ('STRINGPARSER_NODE_ROOT', 1);
* @see StringParser_Node::_type
define ('STRINGPARSER_NODE_TEXT', 2);
* Global value that is a counter of string parser node ids. Compare it to a
$GLOBALS['__STRINGPARSER_NODE_ID'] = 0;
* Generic string parser node class
* This is an abstract class for any type of node that is used within the
* string parser. General warning: This class contains code regarding references
* that is very tricky. Please do not touch this code unless you know exactly
* what you are doing. Incorrect handling of references may cause PHP to crash
* with a segmentation fault! You have been warned.
* There are three standard node types: root node, text node and unknown
* node. All node types are integer constants. Any node type of a
* subclass must be at least 32 to allow future developments.
* @see STRINGPARSER_NODE_ROOT, STRINGPARSER_NODE_TEXT
* @see STRINGPARSER_NODE_UNKNOWN
var $_type = STRINGPARSER_NODE_UNKNOWN;
* This ID uniquely identifies this node. This is needed when searching
* for a specific node in the children array. Please note that this is
* only an internal variable and should never be used - not even in
* subclasses and especially not in external data structures. This ID
* has nothing to do with any type of ID in HTML or XML.
* @see StringParser_Node::_children
* The parent of this node.
* It is either null (root node) or a reference to the parent object.
* @see StringParser_Node::_children
* The children of this node.
* It contains an array of references to all the children nodes of this
* @see StringParser_Node::_parent
* The position in the parsed text at which this node occurred.
* If -1, the position could not be determined.
* Currently, the constructor only allocates a new ID for the node and
* @param int $occurredAt The position in the text at which this node
* occurred, or -1 if it cannot be determined.
* @global __STRINGPARSER_NODE_ID
$this->_id = $GLOBALS['__STRINGPARSER_NODE_ID']++;
* This function returns the type of the node
* @param object $node The node to be prepended.
* @return bool On success, the function returns true, else false.
// root nodes may not be children of other nodes!
// if node already has a parent
if ($node->_parent !== null) {
// remove node from there
$parent = & $node->_parent;
if (!$parent->removeChild ($node, false)) {
// move all nodes to a new index
// we have to unset it because otherwise it would be
// overwritten in the loop
// put object to new position
* Append text to last text child
* @param string $text The text to append
* @return bool On success, the function returns true, else false
$this->_children[$ccount - 1]->appendText ($text);
* Append a node to the children
* This function appends a node to the children array(). It
* automatically sets the {@link StringParser_Node::_parent _parent}
* property of the node that is to be appended.
* @param object $node The node that is to be appended.
* @return bool On success, the function returns true, else false.
// root nodes may not be children of other nodes!
// if node already has a parent
if ($node->_parent !== null) {
// remove node from there
$parent = & $node->_parent;
if (!$parent->removeChild ($node, false)) {
// append it to current node
* Insert a node before another node
* @param object $node The node to be inserted.
* @param object $reference The reference node where the new node is
* @return bool On success, the function returns true, else false.
// root nodes may not be children of other nodes!
// is the reference node a child?
// if node already has a parent
if ($node->_parent !== null) {
// remove node from there
$parent = & $node->_parent;
if (!$parent->removeChild ($node, false)) {
// move all nodes to a new index
while ($index >= $child) {
// we have to unset it because otherwise it would be
// overwritten in the loop
// put object to new position
* Insert a node after another node
* @param object $node The node to be inserted.
* @param object $reference The reference node where the new node is
* @return bool On success, the function returns true, else false.
// root nodes may not be children of other nodes!
// is the reference node a child?
// if node already has a parent
if ($node->_parent !== null) {
// remove node from there
$parent = & $node->_parent;
if (!$parent->removeChild ($node, false)) {
// move all nodes to a new index
while ($index >= $child + 1) {
// we have to unset it because otherwise it would be
// overwritten in the loop
// put object to new position
* This function removes a child from the children array. A parameter
* tells the function whether to destroy the child afterwards or not.
* If the specified node is not a child of this node, the function will
* @param mixed $child The child to destroy; either an integer
* specifying the index of the child or a reference
* @param bool $destroy Destroy the child afterwards.
* @return bool On success, the function returns true, else false.
// remove reference on $child
// store count for later use
if (!is_int ($child) || $child < 0 || $child >= $ccount) {
if ($this->_children[$child]->_parent === null ||
// $object->_parent = null would be equivalent to $this = null
// as $object->_parent is a reference to $this!
// because of this, we have to unset the variable to remove
// the reference and then redeclare the variable
unset ($object->_parent);
$object->_parent = null;
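The reference pitfall described in the comment above can be reproduced in isolation. This is a sketch with hypothetical names (`Holder`, `$owner`, `$child`), not code from this file: assigning through a reference overwrites the referenced variable, while `unset` only breaks the reference.

```php
<?php
// Hypothetical minimal class; only the reference handling matters here.
class Holder {
    public $parent = null;
}

$owner = new stdClass();
$child = new Holder();
$child->parent = &$owner;      // the property is now a reference to $owner

// Wrong: assigning through the reference overwrites $owner itself.
$child->parent = null;
$ownerAfterAssign = $owner;    // $owner has been set to null as well

// Right: unset() drops the reference first, then redeclare the property.
$owner = new stdClass();
$child->parent = &$owner;
unset($child->parent);
$child->parent = null;
$ownerAfterUnset = $owner;     // $owner is still the stdClass object
```

This is why the original code uses the `unset` and reassign pair rather than a plain assignment of `null`.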
// we have to unset it because else it will be overridden in
// move all remaining objects down by one index to close the gap
while ($child < $ccount - 1) {
// we have to unset it because otherwise it would be
// overwritten in the loop
// put object to new position
* Get the first child of this node
* Get the last child of this node
* @param object $node The node to destroy
* @return bool True on success, else false.
// if parent exists: remove node from tree!
if ($node->_parent !== null) {
$parent = & $node->_parent;
// directly return that result because the removeChild
// method will call destroyNode again
return $parent->removeChild ($node, true);
while (count ($node->_children)) {
// remove first child until no more children remain
if (!$node->removeChild ($child, true)) {
// now call the nodes destructor
if (!$node->_destroy ()) {
// now just unset it and pray that there are no more references
* @return bool True on success, else false.
* This function searches for a node in the own children and returns
* the index of the node or false if the node is not a child of this
* @param mixed $child The node to look for.
* @return mixed The index of the child node on success, else false.
for ($i = 0; $i < $ccount; $i++) {
if ($this->_children[$i]->_id == $child->_id) {
* Checks equality of this node and another node
* @param mixed $node The node to be compared with
* @return bool True if the other node equals this node, else false.
return ($this->_id == $node->_id);
* Determines whether a criterion matches this node
* @param string $criterium The criterion that is to be checked
* @param mixed $value The value that is to be compared
* @return bool True if this node matches that criterion
* Search for nodes with a certain criterion
* This may be used to implement getElementsByTagName etc.
* @param string $criterium The criterion that is to be checked
* @param mixed $value The value that is to be compared
* @return array All subnodes that match this criterion
if ($this->_children[$i]->matchesCriterium ($criterium, $value)) {
$subnodes = $this->_children[$i]->getNodesByCriterium ($criterium, $value);
$subnodes_count = count ($subnodes);
for ($j = 0; $j < $subnodes_count; $j++) {
$nodes[$node_ctr++] = & $subnodes[$j];
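The recursive collection visible in these fragments can be illustrated with a small stand-alone tree. `DemoNode`, `matches` and `findAll` are hypothetical names for this sketch, and treating the criterion as a plain property name is an assumption; the real `matchesCriterium` is meant to be defined by subclasses.

```php
<?php
// Hypothetical minimal node; the "criterion" here is simply a property name.
class DemoNode {
    public $name;
    public $children = [];

    public function __construct($name) {
        $this->name = $name;
    }

    public function matches($criterion, $value) {
        return isset($this->$criterion) && $this->$criterion === $value;
    }

    // Collect matching descendants depth-first: each child is tested,
    // then its own subtree is searched.
    public function findAll($criterion, $value) {
        $nodes = [];
        foreach ($this->children as $child) {
            if ($child->matches($criterion, $value)) {
                $nodes[] = $child;
            }
            foreach ($child->findAll($criterion, $value) as $sub) {
                $nodes[] = $sub;
            }
        }
        return $nodes;
    }
}

$root = new DemoNode('root');
$a = new DemoNode('b');
$c = new DemoNode('i');
$d = new DemoNode('b');
$root->children = [$a, $c];
$c->children = [$d];
$found = $root->findAll('name', 'b');
```

A counting variant, in the spirit of getNodeCountByCriterium, would add up integers in the same traversal instead of building an array.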
* Search for nodes with a certain criterion and return the count
* Similar to getNodesByCriterium
* @param string $criterium The criterion that is to be checked
* @param mixed $value The value that is to be compared
* @return int The number of subnodes that match this criterion
if ($this->_children[$i]->matchesCriterium ($criterium, $value)) {
$subnodes = $this->_children[$i]->getNodeCountByCriterium ($criterium, $value);
* This dumps a tree of nodes
* @param string $prefix The prefix that is to be used for indentation
* @param string $linesep The line separator
* @param int $level The initial level of indentation
function dump ($prefix = " ", $linesep = "\n", $level = 0) {
$str .= $this->_children[$i]->dump ($prefix, $linesep, $level + 1);
* Dump this node to a string
return (string) $this->_type;
* String parser root node class
* This node is a root node.
* @see STRINGPARSER_NODE_ROOT
var $_type = STRINGPARSER_NODE_ROOT;
* String parser text node class
* This node is a text node.
* @see STRINGPARSER_NODE_TEXT
var $_type = STRINGPARSER_NODE_TEXT;
* The content of this node
* @param string $content The initial content of this element
* @param int $occurredAt The position in the text at which this node
* occurred, or -1 if it cannot be determined.
* @see StringParser_Node_Text::content
* @param string $text The text to append
* @see StringParser_Node_Text::content
* @param string $name The name of the flag
* @param mixed $value The value of the flag
$this->_flags[$name] = $value;
* @param string $flag The requested flag
* @param string $type The requested type of the return value
* @param mixed $default The default return value
function getFlag ($flag, $type = 'mixed', $default = null) {
if (!isset ($this->_flags[$flag])) {
$return = $this->_flags[$flag];
* Dump this node to a string
Documentation generated on Mon, 10 Dec 2007 13:29:38 +0100 by phpDocumentor 1.4.0