The old link to the Oniguruma regex syntax no longer works; a working copy is available at:
https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt
(PHP 4 >= 4.2.0, PHP 5, PHP 7)
mb_ereg — Regular expression match with multibyte support
mb_ereg ( string $pattern , string $string [, array &$regs ] ) : int
Executes the regular expression match with multibyte support.
pattern
The search pattern.
string
The search string.
regs
If matches are found for parenthesized substrings of pattern and the function is called with the third argument regs, the matches will be stored in the elements of the array regs. If no matches are found, regs is set to an empty array.
$regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will contain the substring starting at the second, and so on. $regs[0] will contain a copy of the complete string matched.
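As a rough sketch of how the capture groups land in regs (the date-like pattern and the sample string are made up for illustration, and UTF-8 is assumed as the regex encoding):

```php
<?php
// Hypothetical input and pattern, chosen only to illustrate group numbering.
mb_regex_encoding('UTF-8');

$regs = [];
$matched = mb_ereg('(\d{4})-(\d{2})', 'released 2016-12', $regs);

if ($matched !== false) {
    $whole = $regs[0]; // complete matched text
    $year  = $regs[1]; // text captured by the first left parenthesis
    $month = $regs[2]; // text captured by the second left parenthesis
    echo "$whole => year $year, month $month\n";
}
```

Checking the result against false, rather than true, keeps the condition valid whatever truthy value a match returns.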
Returns the byte length of the matched string if a match for pattern was found in string, or FALSE if no matches were found or an error occurred.
If the optional parameter regs was not passed or the length of the matched string is 0, this function returns 1.
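Because the length is counted in bytes rather than characters, a multibyte match can be longer than it looks. The sketch below assumes UTF-8; the helper name matchLengths() is hypothetical. It measures the matched text taken from regs[0], so it does not depend on the exact value mb_ereg() returns.

```php
<?php
mb_regex_encoding('UTF-8');

/**
 * Hypothetical helper: returns [bytes, characters] for the text matched by
 * $pattern in $string, or null when there is no match.
 */
function matchLengths(string $pattern, string $string): ?array
{
    $regs = [];
    if (mb_ereg($pattern, $string, $regs) === false) {
        return null;
    }
    return [strlen($regs[0]), mb_strlen($regs[0], 'UTF-8')];
}

// U+00E9 takes two bytes in UTF-8, so a run of three is six bytes long.
$lengths = matchLengths("\u{E9}+", "caf\u{E9}\u{E9}\u{E9}");
```

Here $lengths holds the byte count first and the character count second.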
Version | Description |
---|---|
7.1.0 | mb_ereg() will now set regs to an empty array, if nothing matched. Formerly, regs was not modified in that case. |
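A minimal sketch of the 7.1.0 behaviour, assuming it runs on PHP 7.1.0 or later (the strings are arbitrary):

```php
<?php
// Assumes PHP 7.1.0 or later, per the changelog entry above.
$regs = ['stale value from an earlier call'];

$result = mb_ereg('xyz', 'abc', $regs); // the pattern does not match

// No match: the call reports false and $regs is reset to an empty array.
$noMatch     = ($result === false);
$regsCleared = ($regs === []);
```

On older versions $regs would keep its previous contents, so code that must run there should reset the array itself before each call.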
Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
Note that mb_ereg() does not support the \uFFFF Unicode syntax; use \x{FFFF} instead:
<?php
//$text = 'Peter is a boy.'; // English
$text = 'بيتر هو صبي.'; // Arabic
//$text = 'פיטר הוא ילד.'; // Hebrew
mb_regex_encoding('UTF-8');
if (mb_ereg('[\x{0600}-\x{06FF}]', $text)) // Arabic range
//if (mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew range
{
echo "Text has some Arabic/Hebrew characters.";
}
else
{
echo "Text doesn't have Arabic/Hebrew characters.";
}
?>
I hope this information is shown somewhere on php.net.
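The same check can be wrapped in a small function so the range is passed as code points. This is only a sketch: containsCodePointRange() is a hypothetical name, and UTF-8 is assumed as the regex encoding.

```php
<?php
mb_regex_encoding('UTF-8');

/**
 * Hypothetical helper: true if $text contains at least one character whose
 * code point lies between $first and $last, using the \x{HHHH} escape.
 */
function containsCodePointRange(string $text, int $first, int $last): bool
{
    // Builds a character class such as [\x{0600}-\x{06FF}].
    $pattern = sprintf('[\x{%04X}-\x{%04X}]', $first, $last);
    return mb_ereg($pattern, $text) !== false;
}

$hasArabic = containsCodePointRange('بيتر هو صبي.', 0x0600, 0x06FF);    // Arabic block
$hasHebrew = containsCodePointRange('Peter is a boy.', 0x0590, 0x05FF); // Hebrew block
```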
According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma",
the bundled Oniguruma regex library version seems to be:
4.7.1 in PHP 5.3 - 5.4.45,
5.9.2 in PHP 5.5 - 7.1.16,
6.3.0 from PHP 7.2 onward.
mb_ereg() seems unable to use named subpatterns.
preg_match() seems to be a substitute, but only with UTF-8 encoding.
<?php
$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*'; // "?P" causes "mbregex compile err" in PHP 5.3.5
if(mb_ereg($pattern, $text, $matches)){
echo '<pre>'.print_r($matches, true).'</pre>';
}else{
echo 'no match';
}
?>
This code ignores "?<name>" in $pattern and displays the following:
Array
(
[0] => multi_byte_string
[1] => string
)
Using
$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){
in place of the $pattern and if lines above displays the following (in UTF-8 encoding):
Array
(
[0] => multi_byte_string
[name] => string
[1] => string
)
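For reference, a self-contained version of the preg_match() variant might look like this (a sketch only, using the same sample text as above):

```php
<?php
// Runnable form of the preg_match() variant described above.
$text = 'multi_byte_string';
$pattern = '/.*(?<name>string).*/u'; // the u modifier treats pattern and subject as UTF-8

$matches = [];
if (preg_match($pattern, $text, $matches) === 1) {
    // The named group is available both by name and by position.
    $named = $matches['name'];
    echo "name = $named, index 1 = {$matches[1]}\n";
}
```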
While hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (Ruby) is described here:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
Do not compare the return value of mb_ereg() with TRUE using the identity operators (===/!==).
e.g.
<?php
// This doesn't work.
if (mb_ereg('bad', 'bad_input') === true) {
// (or) if (mb_ereg('good', 'good_input') !== true) {
die('Get out of here !');
}else{
echo 'safe'; // continue processing...
}
?>
// These work. (not using TRUE)
if (mb_ereg('bad', 'bad_input')) {.....
if (!mb_ereg('good', 'good_input')) {.....
mb_ereg() never returns TRUE; it returns FALSE (when there is no match) or an integer (when there is a match, >= 1, which is loosely equal to TRUE).
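If strict comparisons are wanted anyway, one option is to compare against FALSE instead and wrap that in a function. The sketch below uses a hypothetical helper name, mbEregMatches(), and relies only on the documented fact that a failed match returns FALSE.

```php
<?php
/**
 * Hypothetical wrapper: collapses mb_ereg()'s "false or something truthy"
 * result into a strict boolean, so === true and === false become safe.
 */
function mbEregMatches(string $pattern, string $string): bool
{
    return mb_ereg($pattern, $string) !== false;
}

$isBad  = mbEregMatches('bad', 'bad_input');  // pattern found in the input
$isGood = mbEregMatches('good', 'bad_input'); // pattern not found

if ($isBad === true) {
    echo "rejected\n";
}
```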
Hebrew regex tested on PHP 5, Ubuntu 8.04.
Seems to work fine without the mb_regex_encoding lines (commented out).
Didn't seem to work with \uxxxx (also commented out).
<?php
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $this->current_line))
if(mb_ereg(".*([א-ת]).*", $this->current_line))
{
echo "has";
}
else
{
echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>