PHP preg_match_all() delimiters

PHP Regular Expressions

Summary: in this tutorial, you’ll learn about PHP regular expressions and the functions that work with them, including preg_match(), preg_match_all(), and preg_replace().

Introduction to the PHP regular expressions

PHP string functions allow you to test if a string contains a substring (str_contains()) or to replace all occurrences of a substring with another string (str_replace()).

However, these functions deal with fixed patterns. They won’t work with flexible patterns. For example, if you want to find any number in a string, str_contains() won’t work.

To search or replace a string using a pattern, you use regular expressions.
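As a small illustration of the difference, the sketch below checks the same sentence with str_contains() and with preg_match(). The sample strings and variable names are only for illustration, and str_contains() assumes PHP 8.0 or later.

```php
<?php
$message = 'PHP 8 was released on November 26, 2020';

// str_contains() (PHP 8.0+) can only look for one fixed substring.
$hasYear = str_contains($message, '2020');

// A regular expression can ask a broader question:
// "does the string contain any run of digits?"
$hasNumber = preg_match('/\d+/', $message) === 1;
$noNumber  = preg_match('/\d+/', 'no digits here') === 1;

var_dump($hasYear, $hasNumber, $noNumber); // true, true, false
```

The pattern '/\d+/' does not need to know which number appears in the string, which is what a fixed substring search cannot express.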

A regular expression is a string that describes a pattern such as phone numbers, credit card numbers, and email addresses.

Create regular expressions

To create a regular expression, you place a pattern between forward slashes like this:

    '/pattern/';

For example:

    $pattern = '/\d+/';

$pattern is a string that holds a regular expression matching a number with one or more digits. For example, it matches the numbers 1, 20, and 300.

Note that you’ll learn how to form flexible regular expressions in the following tutorial.

The forward slashes are delimiters. A delimiter can be one of the following characters: ~, !, @, #, $, or a pair of brackets: {}, (), [], or <>. Bracket-style delimiters can improve a regular expression’s readability in some cases.

Note that you cannot use alphanumeric characters, multi-byte characters, whitespace, or backslashes (\) as delimiters.

The following regular expression uses the curly braces as delimiters:

    $pattern = '{\d+}';
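One practical reason to pick a different delimiter is to avoid escaping. The following sketch, which uses a placeholder URL on example.com, matches the same path once with slash delimiters and once with # delimiters. It also shows preg_quote(), a built-in function that escapes special characters and, when given a second argument, the delimiter as well.

```php
<?php
// example.com and the path below are placeholder values.
$url = 'https://example.com/docs/php/regex';

// With slash delimiters, every literal slash in the pattern must be escaped.
$withSlashes = '/\/docs\/php\//';

// With # as the delimiter, the slashes can stay as they are.
$withHash = '#/docs/php/#';

$a = preg_match($withSlashes, $url);
$b = preg_match($withHash, $url);

// preg_quote() escapes regex metacharacters; the second argument
// names the delimiter so that it gets escaped too.
$literal = '1/2';
$quoted  = '/' . preg_quote($literal, '/') . '/';
$c = preg_match($quoted, 'add 1/2 cup of flour');

var_dump($a, $b, $quoted, $c);
```

Both patterns match the same text; the second one is simply easier to read.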

Search strings using regular expressions

To search a string for a match to a pattern, you use the preg_match() and preg_match_all() functions.

PHP preg_match() function

To search based on a regular expression, you use the preg_match() function. For example:

    <?php

    $pattern = '{\d+}';
    $message = 'PHP 8 was released on November 26, 2020';

    if (preg_match($pattern, $message)) {
        echo "match";
    } else {
        echo "not match";
    }

Output:

    match

The preg_match() function searches $message for a match to $pattern.

It returns 1 if the pattern matches $message, 0 if it does not, or false on failure.
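Because 0 and false are both falsy, a strict comparison is needed to tell "no match" apart from "failure". The helper below is a minimal sketch; describeMatch() is a hypothetical name, and the @ operator is used only to silence the warning PHP emits for a malformed pattern.

```php
<?php
// describeMatch() is a hypothetical helper for illustration.
function describeMatch(string $pattern, string $subject): string
{
    // @ suppresses the warning raised when the pattern is malformed.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return 'error';
    }

    return $result === 1 ? 'match' : 'no match';
}

echo describeMatch('/\d+/', 'PHP 8'), "\n"; // match
echo describeMatch('/\d+/', 'PHP'), "\n";   // no match
echo describeMatch('/\d+', 'PHP 8'), "\n";  // error (missing closing delimiter)
```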

To get the text that matches the pattern, you add the third parameter to the preg_match() function like the following example:

    <?php

    $pattern = '{\d+}';
    $message = 'PHP 8 was released on November 26, 2020';

    if (preg_match($pattern, $message, $matches)) {
        print_r($matches);
    }

Output:

    Array
    (
        [0] => 8
    )

The $matches parameter receives the results of the search. $matches[0] stores the text that matches the whole pattern. In this example, it is the number 8.

$matches[1], $matches[2], and so on store the text that matches the first, second, and subsequent capturing groups. The capturing group tutorial covers this in more detail.
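To make the role of $matches[1], $matches[2], and so on more concrete, here is a short sketch that adds three capturing groups to pull the month, day, and year out of the same sentence. The pattern is an illustrative assumption, not part of the original tutorial.

```php
<?php
$message = 'PHP 8 was released on November 26, 2020';

// Three capturing groups: month name, day, four-digit year.
$pattern = '/(\w+) (\d+), (\d{4})/';

if (preg_match($pattern, $message, $matches)) {
    echo $matches[0], "\n"; // the whole match
    echo $matches[1], "\n"; // first group
    echo $matches[2], "\n"; // second group
    echo $matches[3], "\n"; // third group
}
```

Here $matches[0] holds "November 26, 2020", while indexes 1 to 3 hold "November", "26", and "2020".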

The preg_match() only returns the first match and stops searching as soon as it finds the first one. To find all matches, you use the preg_match_all() function.

PHP preg_match_all() function

The preg_match_all() function searches for all matches to a regular expression. For example:

    <?php

    $pattern = '{\d+}';
    $message = 'PHP 8 was released on November 26, 2020';

    if (preg_match_all($pattern, $message, $matches)) {
        print_r($matches);
    }

Output:

    Array
    (
        [0] => Array
            (
                [0] => 8
                [1] => 26
                [2] => 2020
            )

    )

In this example, preg_match_all() puts all matches in a multidimensional array whose first element contains the texts (8, 26, and 2020) that match the pattern.

The preg_match_all() function returns the number of matches, which can be zero or a positive number.
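The return value and the $matches array can be used together, as in this brief sketch. The variable names are illustrative.

```php
<?php
$message = 'PHP 8 was released on November 26, 2020';

// The return value is the number of full matches.
$count = preg_match_all('/\d+/', $message, $matches);

// $matches[0] lists the matched strings; convert them to integers.
$numbers = array_map('intval', $matches[0]);

// With no match, the function returns 0 and $empty[0] is an empty array.
$none = preg_match_all('/\d+/', 'no digits here', $empty);

var_dump($count, $numbers, $none, $empty[0]);
```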

Replace strings using regular expressions

To replace strings that match a regular expression, you use the preg_replace() function. For example:

    <?php

    $pattern = '/\d+/';
    $message = 'PHP 8 was released on 11/26/2020';

    echo preg_replace($pattern, '%d', $message);

Output:

    PHP %d was released on %d/%d/%d

In this example, the preg_replace() function replaces all numbers in $message with the string %d.
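preg_replace() can also reuse captured text in the replacement and limit how many replacements it makes. The sketch below goes slightly beyond the example above: it assumes the date is written as MM/DD/YYYY, uses $1-style references to the capturing groups, and passes the optional fourth argument (the limit).

```php
<?php
$message = 'PHP 8 was released on 11/26/2020';

// Reorder the captured month, day, and year into YYYY-MM-DD.
// The # delimiter avoids escaping the slashes in the date.
$iso = preg_replace('#(\d{2})/(\d{2})/(\d{4})#', '$3-$1-$2', $message);

// The fourth argument caps the number of replacements.
$firstOnly = preg_replace('/\d+/', '%d', $message, 1);

echo $iso, "\n";       // PHP 8 was released on 2020-11-26
echo $firstOnly, "\n"; // PHP %d was released on 11/26/2020
```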

Summary

  • PHP regular expressions are strings that contain a pattern enclosed in delimiters, for example '/pattern/'.
  • The preg_match() function searches for a match to a pattern in a string.
  • The preg_match_all() function searches for all matches to a pattern in a string.
  • The preg_replace() function searches a string for matches to a pattern and replaces them with a replacement string.


Pattern Modifiers

The currently available PCRE modifiers are listed below. The names in parentheses refer to internal PCRE names for these modifiers. Spaces and newlines are ignored in modifiers; any other character causes an error.

  • i (PCRE_CASELESS): If this modifier is set, letters in the pattern match both upper and lower case letters.
  • m (PCRE_MULTILINE): By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless the D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl’s /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.
  • s (PCRE_DOTALL): If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl’s /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
  • x (PCRE_EXTENDED): If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl’s /x modifier, and makes it possible to include commentary inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
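The effect of the i, m, s, and x modifiers can be seen by running the same pattern with and without each one. This is a small sketch with made-up sample text and variable names.

```php
<?php
$text = "First line\nsecond line";

// i: letters match regardless of case.
$plain    = preg_match('/first/', $text);   // 0
$caseless = preg_match('/first/i', $text);  // 1

// m: ^ also matches right after each newline.
$single = preg_match_all('/^\w+/', $text, $m1);   // only "First"
$multi  = preg_match_all('/^\w+/m', $text, $m2);  // "First" and "second"

// s: the dot also matches a newline.
$noDotAll = preg_match('/line.second/', $text);   // 0
$dotAll   = preg_match('/line.second/s', $text);  // 1

// x: whitespace and # comments inside the pattern are ignored.
$yearMonth = '/
    (\d{4})  # year
    -
    (\d{2})  # month
/x';
$extended = preg_match($yearMonth, '2020-11', $d);

var_dump($plain, $caseless, $single, $multi, $noDotAll, $dotAll, $extended);
```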
  • A (PCRE_ANCHORED): If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
  • D (PCRE_DOLLAR_ENDONLY): If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if the m modifier is set. There is no equivalent to this modifier in Perl.
  • S: When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
  • U (PCRE_UNGREEDY): This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. It can also be set by a (?U) modifier setting within the pattern or by a question mark behind a quantifier (e.g. .*?).

Note:

It is usually not possible to match more than pcre.backtrack_limit characters in ungreedy mode.
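The A, D, and U modifiers described above can be compared side by side in a short sketch. The subject strings are invented for illustration.

```php
<?php
// A: the match must begin at the start of the subject.
$floating  = preg_match('/\d+/', 'abc 123');   // 1
$anchored  = preg_match('/\d+/A', 'abc 123');  // 0
$atStart   = preg_match('/\d+/A', '123 abc');  // 1

// D: $ matches only at the very end, not before a trailing newline.
$lenient = preg_match('/^\d+$/', "123\n");   // 1
$strict  = preg_match('/^\d+$/D', "123\n");  // 0

// U: quantifiers become lazy by default.
preg_match('/<.+>/', '<b>bold</b>', $greedy);
preg_match('/<.+>/U', '<b>bold</b>', $lazy);

echo $greedy[0], "\n"; // <b>bold</b>
echo $lazy[0], "\n";   // <b>
```

The D modifier is useful for validation, where a trailing newline should not slip through an end-of-string check.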

  • X (PCRE_EXTRA): This modifier turns on additional functionality of PCRE that is incompatible with Perl. Any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. There are at present no other features controlled by this modifier.
  • J (PCRE_INFO_JCHANGED): The (?J) internal option setting changes the local PCRE_DUPNAMES option, which allows duplicate names for subpatterns. As of PHP 7.2.0, J is supported as a modifier as well.
  • u (PCRE_UTF8): This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid.
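To see what treating the subject as UTF-8 means in practice, the sketch below counts what a single dot matches with and without the u modifier. The sample word is an assumption chosen because it contains two two-byte characters; the "\u{...}" escapes require PHP 7.0 or later.

```php
<?php
// "\u{...}" escapes keep this file ASCII-only.
// The string is the five-character word "ñandú" encoded as UTF-8.
$word = "\u{00F1}and\u{00FA}";

// Without u, the dot matches one byte at a time.
$byteCount = preg_match_all('/./', $word);

// With u, the dot matches one UTF-8 character at a time.
$charCount = preg_match_all('/./u', $word);

var_dump($byteCount, $charCount); // int(7), int(5)
```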

User Contributed Notes

Regarding the validity of a UTF-8 string when using the /u pattern modifier, some things to be aware of:

1. If the pattern itself contains an invalid UTF-8 character, you get an error (as mentioned in the docs above: "UTF-8 validity of the pattern is checked since PHP 4.3.5").

2. When the subject string contains invalid UTF-8 sequences or codepoints, it basically results in a "quiet death" for the preg_* functions, where nothing is matched but there is no indication that the string is invalid UTF-8.

3. PCRE regards five and six octet UTF-8 character sequences as valid (both in patterns and the subject string), but these are not supported in Unicode (see section 5.9 "Character Encoding" of the "Secure Programming for Linux and Unix HOWTO", which can be found at http://www.tldp.org/ and other places).

4. For an example algorithm in PHP which tests the validity of a UTF-8 string (and discards five / six octet sequences) head to: http://hsivonen.iki.fi/php-utf8/

The following script should give you an idea of what works and what doesn’t:

    <?php
    // Byte sequences to test, keyed by a description of each case.
    $examples = array(
        'Valid ASCII' => "a",
        'Valid 2 Octet Sequence' => "\xc3\xb1",
        'Invalid 2 Octet Sequence' => "\xc3\x28",
        'Invalid Sequence Identifier' => "\xa0\xa1",
        'Valid 3 Octet Sequence' => "\xe2\x82\xa1",
        'Invalid 3 Octet Sequence (in 2nd Octet)' => "\xe2\x28\xa1",
        'Invalid 3 Octet Sequence (in 3rd Octet)' => "\xe2\x82\x28",

        'Valid 4 Octet Sequence' => "\xf0\x90\x8c\xbc",
        'Invalid 4 Octet Sequence (in 2nd Octet)' => "\xf0\x28\x8c\xbc",
        'Invalid 4 Octet Sequence (in 3rd Octet)' => "\xf0\x90\x28\xbc",
        'Invalid 4 Octet Sequence (in 4th Octet)' => "\xf0\x28\x8c\x28",
        'Valid 5 Octet Sequence (but not Unicode!)' => "\xf8\xa1\xa1\xa1\xa1",
        'Valid 6 Octet Sequence (but not Unicode!)' => "\xfc\xa1\xa1\xa1\xa1\xa1",
    );

    // Use each sequence as the pattern itself.
    echo "++Invalid UTF-8 in pattern\n";
    foreach ($examples as $name => $str) {
        echo "$name\n";
        preg_match("/" . $str . "/u", 'Testing');
    }

    // Use each sequence as the subject of preg_match().
    echo "++ preg_match() examples\n";
    foreach ($examples as $name => $str) {
        preg_match("/\xf8\xa1\xa1\xa1\xa1/u", $str, $ar);
        echo "$name: ";

        if (count($ar) == 0) {
            echo "Matched nothing!\n";
        } else {
            echo "Matched {$ar[0]}\n";
        }
    }

    // Count the characters preg_match_all() finds in each sequence.
    echo "++ preg_match_all() examples\n";
    foreach ($examples as $name => $str) {
        preg_match_all('/./u', $str, $ar);
        echo "$name: ";

        $num_utf8_chars = count($ar[0]);
        if ($num_utf8_chars == 0) {
            echo "Matched nothing!\n";
        } else {
            echo "Matched $num_utf8_chars character\n";
        }
    }

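The "quiet death" described in point 2 can be detected by checking the return value and calling preg_last_error() afterwards. The following is a minimal sketch; utf8MatchStatus() is a hypothetical helper name, while preg_last_error() and the PREG_BAD_UTF8_ERROR constant are part of PHP's PCRE extension.

```php
<?php
// utf8MatchStatus() is a hypothetical helper for illustration.
function utf8MatchStatus(string $subject): string
{
    $result = preg_match('/./u', $subject);

    // On invalid UTF-8 in the subject, preg_match() returns false
    // and preg_last_error() reports PREG_BAD_UTF8_ERROR.
    if ($result === false && preg_last_error() === PREG_BAD_UTF8_ERROR) {
        return 'invalid UTF-8';
    }

    return $result === 1 ? 'matched' : 'no match';
}

echo utf8MatchStatus("\xc3\xb1"), "\n"; // matched
echo utf8MatchStatus("\xc3\x28"), "\n"; // invalid UTF-8
echo utf8MatchStatus(""), "\n";         // no match
```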