
htmlspecialchars

(PHP 4, PHP 5)

htmlspecialchars - Convert special characters to HTML entities

Description

string htmlspecialchars ( string $string [, int $quote_style [, string $charset ]] )

Certain characters have special significance in HTML and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

This function is useful in preventing user-supplied text from containing HTML markup, such as in a message board or guest book application. The optional second argument, quote_style, tells the function what to do with single and double quote characters. The default mode, ENT_COMPAT, is the backwards compatible mode which only translates the double-quote character and leaves the single-quote untranslated. If ENT_QUOTES is set, both single and double quotes are translated, and if ENT_NOQUOTES is set, neither single nor double quotes are translated.

The translations performed are:

  • '&' (ampersand) becomes '&amp;'
  • '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
  • ''' (single quote) becomes '&#039;' only when ENT_QUOTES is set.
  • '<' (less than) becomes '&lt;'
  • '>' (greater than) becomes '&gt;'

Example #1 htmlspecialchars() example

<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>
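
The three quote modes described above can be compared side by side. This is only a minimal sketch: the sample string is made up for illustration, and 'UTF-8' is passed explicitly so that the result does not depend on the default charset of the PHP version in use.

```php
<?php
// Hypothetical sample input containing both kinds of quotes.
$input = "<a href='x'>\"Q\" & A</a>";

// ENT_COMPAT: double quotes only (the default mode).
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES: both single and double quotes.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// ENT_NOQUOTES: neither kind of quote.
$noquotes = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8');

echo $compat, "\n";   // &lt;a href='x'&gt;&quot;Q&quot; &amp; A&lt;/a&gt;
echo $quotes, "\n";   // &lt;a href=&#039;x&#039;&gt;&quot;Q&quot; &amp; A&lt;/a&gt;
echo $noquotes, "\n"; // &lt;a href='x'&gt;"Q" &amp; A&lt;/a&gt;
```

In all three modes '&', '<' and '>' are converted; only the handling of the quote characters differs.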

Note that this function does not translate anything beyond what is listed above. For full entity translation, see htmlentities(). Support for the optional second argument was added in PHP 3.0.17 and PHP 4.0.3.

The third argument, charset, defines the character set used in the conversion. The default character set is ISO-8859-1. This third argument was added in PHP 4.1.0.

Supported charsets

Charset       Aliases                         Description
ISO-8859-1    ISO8859-1                       Western European, Latin-1.
ISO-8859-5    ISO8859-5                       Little used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15   ISO8859-15                      Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8         (none)                          ASCII compatible multi-byte 8-bit Unicode.
cp866         ibm866, 866                     DOS-specific Cyrillic charset.
cp1251        Windows-1251, win-1251, 1251    Windows-specific Cyrillic charset.
cp1252        Windows-1252, 1252              Windows-specific charset for Western Europe.
KOI8-R        koi8-ru, koi8r                  Russian.
BIG5          950                             Traditional Chinese, mainly used in Taiwan.
GB2312        936                             Simplified Chinese, national standard character set.
BIG5-HKSCS    (none)                          Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS     SJIS, SJIS-win, cp932, 932      Japanese.
EUC-JP        EUCJP, eucJP-win                Japanese.
MacRoman      (none)                          Charset that was used by Mac OS.
''            (none)                          An empty string activates detection from the script encoding (Zend multibyte), default_charset and the current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

See also get_html_translation_table(), strip_tags(), htmlentities() and nl2br().


User Contributed Notes 42 notes

up
21
Dave
1 year ago
As of PHP 5.4 they changed default encoding from "ISO-8859-1" to "UTF-8". So if you get null from htmlspecialchars or htmlentities

where you have only set
<?php
echo htmlspecialchars($string);
echo htmlentities($string);
?>

you can fix it by
<?php
echo htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1', true);
echo htmlentities($string, ENT_COMPAT, 'ISO-8859-1', true);
?>

On linux you can find the scripts you need to fix by

grep -Rl "htmlspecialchars\\|htmlentities" /path/to/php/scripts/
up
20
Mike Robinson
1 year ago
Unfortunately, as far as I can tell, the PHP devs did not provide ANY way to set the default encoding used by htmlspecialchars() or htmlentities(), even though they changed the default encoding in PHP 5.4 (*golf clap for PHP devs*). To save someone the time of trying it, this does not work:

<?php
ini_set('default_charset', $charset); // doesn't work.
?>

Unfortunately, the only way to not have to explicitly provide the second and third parameter every single time this function is called (which gets extremely tedious) is to write your own function as a wrapper:

<?php
define('CHARSET', 'ISO-8859-1');
define('REPLACE_FLAGS', ENT_COMPAT | ENT_XHTML);

function html($string) {
    return htmlspecialchars($string, REPLACE_FLAGS, CHARSET);
}

echo html("ñ"); // works
?>

You can do the same for htmlentities()
up
7
Thomasvdbulk at gmail dot com
3 years ago
I searched for a while for a script that could tell the difference between an HTML tag and plain < and > characters in the text.
The reason is that I receive text from a database,
which is inserted through an HTML form and contains both text and HTML tags.
The text can contain < and >, and so do the tags.
With htmlspecialchars you can make your text valid XHTML,
but you'll also change the tags, like <b> to &lt;b&gt;,
so I needed a script that could tell those two apart...
I couldn't find one, so I wrote my own.
I haven't fully tested it, but the parts I tested worked perfectly!
This is for people who were searching for something like this.
It may look big and could probably be done more simply, but it works for me, so I'm happy.

<?php
function fixtags($text){
$text = htmlspecialchars($text);
$text = preg_replace("/=/", "=\"\"", $text);
$text = preg_replace("/&quot;/", "&quot;\"", $text);
$tags = "/&lt;(\/|)(\w*)(\ |)(\w*)([\\\=]*)(?|(\")\"&quot;\"|)(?|(.*)?&quot;(\")|)([\ ]?)(\/|)&gt;/i";
$replacement = "<$1$2$3$4$5$6$7$8$9$10>";
$text = preg_replace($tags, $replacement, $text);
$text = preg_replace("/=\"\"/", "=", $text);
return $text;
}
?>

an example:

<?php
$string = "
this is smaller < than this<br />
this is greater > than this<br />
this is the same = as this<br />
<a href=\"http://www.example.com/example.php?test=test\">This is a link</a><br />
<b>Bold</b> <i>italic</i> etc...";
echo fixtags($string);
?>

will echo:
this is smaller &lt; than this<br />
this is greater &gt; than this<br />
this is the same = as this<br />
<a href="http://www.example.com/example.php?test=test">This is a link</a><br />
<b>Bold</b> <i>italic</i> etc...

I hope it's helpful!!
up
10
ivan at lutrov dot com
3 years ago
Be careful, the "charset" argument IS case sensitive. This is counter-intuitive and serves no practical purpose because the HTML spec actually has the opposite.
up
5
minder at ufive dot unibe dot ch
1 year ago
Problem

In many legacy PHP products the function htmlspecialchars($string) is used to convert characters like < and > and quotes and so on to HTML entities. That avoids the interpretation of HTML tags and unbalanced quotes.

Since PHP 5.4, htmlspecialchars($string) expects $string to be UTF-8 unless a charset is passed explicitly as the third parameter. Legacy products are mostly in Latin-1 (alias ISO-8859-1), which makes the functions htmlspecialchars(), htmlentities() and html_entity_decode() return empty strings if a special character, e.g. a German umlaut, is present in $string:

PHP<5.4

echo htmlspecialchars('<b>Woermann</b>'); // Output: &lt;b&gt;Woermann&lt;/b&gt;
echo htmlspecialchars('<b>Wörmann</b>'); // Output: &lt;b&gt;Wörmann&lt;/b&gt;

PHP>=5.4

echo htmlspecialchars('<b>Woermann</b>'); // Output: &lt;b&gt;Woermann&lt;/b&gt;
echo htmlspecialchars('<b>Wörmann</b>'); // Output: empty

Three alternative solutions

a) Do not run legacy products on PHP 5.4
b) Change every occurrence in your code from
htmlspecialchars($string) and *** to
htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1')
c) Replace all htmlspecialchars() and *** with a new self-made function

*** The same is true for htmlentities() and html_entity_decode();

Solution c

1 Make Search and Replace in the concerned legacy project:
Search for:        htmlspecialchars
Replace with:   htmlXspecialchars
Search for:        htmlentities
Replace with:   htmlXentities
Search for:        html_entity_decode
Replace with:   htmlX_entity_decode
2a Copy and paste the following three functions into an existing PHP file that is already included everywhere in your legacy project. (Of course that PHP file must be included only once per request, otherwise you will get a fatal "cannot redeclare function" error.)

function htmlXspecialchars($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
return htmlspecialchars($string, $ent, $charset);
}

function htmlXentities($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
return htmlentities($string, $ent, $charset);
}

function htmlX_entity_decode($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
return html_entity_decode($string, $ent, $charset);
}

or 2b create a new PHP file containing the three functions mentioned above, e.g. htmlXfunctions.inc.php, and include it on the first line of every PHP file in your legacy product like this: require_once('htmlXfunctions.inc.php').
up
6
solar-energy
7 years ago
also see function "urlencode()", useful for passing text with ampersand and other special chars through url

(i.e. the text is encoded as if sent from form using GET method)

e.g.

<?php
echo "<a href='foo.php?text=".urlencode("foo?&bar!")."'>link</a>";
?>

produces

<a href='foo.php?text=foo%3F%26bar%21'>link</a>

and if the link is followed, the $_GET["text"] in foo.php will contain "foo?&bar!"
up
3
ish1301 at gmail doooot com
6 years ago
used this function for making a variable javascript compatible

<?php
function jsspecialchars( $string = '') {
    $string = preg_replace("/\r*\n/","\\n",$string);
    $string = preg_replace("/\//","\\\/",$string);
    $string = preg_replace("/\"/","\\\"",$string);
    $string = preg_replace("/'/"," ",$string);
    return $string;
}
?>
hope this may help those embedding php in javascripts
up
2
support at playnext dot ru
1 year ago
This is for those having problems since the default value of the $encoding argument changed to UTF-8 in PHP 5.4.

If your old non-UTF-8 projects broke, please consider:
1. http://php.net/manual/en/function.override-function.php
2. http://php.net/manual/ru/function.runkit-function-redefine.php

The idea: you override the built-in htmlspecialchars() function with your own variant, which respects a non-UTF-8 default encoding. This small piece of code can then easily be inserted somewhere at the start of your project. There is no need to rewrite every htmlspecialchars() call globally.

I've spent several hours with both approaches. Variant 1 looks good, especially in combination with http://www.php.net/manual/en/function.rename-function.php, as it allows you to call the original htmlspecialchars() with just the default arguments altered. The code could be as follows:

<?php
rename_function('htmlspecialchars', 'renamed_htmlspecialchars');
function overriden_htmlspecialchars($string, $flags=NULL, $encoding='cp1251', $double_encode=true) {
    $flags = $flags ? $flags : (ENT_COMPAT|ENT_HTML401);
    return renamed_htmlspecialchars($string, $flags, $encoding, $double_encode);
}
override_function('htmlspecialchars', '$string, $flags, $encoding, $double_encode', 'return overriden_htmlspecialchars($string, $flags, $encoding, $double_encode);');
?>

Unfortunately this didn't work properly for me: my site managed to call the overridden function, but not every time I reloaded the pages. Moreover, other PHP sites under my Apache server crashed, as they suddenly started complaining that htmlspecialchars() was not defined. I suppose I would have had to spend more time to make it thread/request/site/whatever-safe.

So I switched to runkit (variant 2). It worked for me, although even after trying runkit_function_rename()+runkit_function_add() I didn't manage to call the original htmlspecialchars() function. So as a quick solution I decided to call htmlentities() instead:

<?php
function overriden_htmlspecialchars($string, $flags=NULL, $encoding='UTF-8', $double_encode=true) {
    $flags = $flags ? $flags : (ENT_COMPAT|ENT_HTML401);
    $encoding = $encoding ? $encoding : 'cp1251';
    return htmlentities($string, $flags, $encoding, $double_encode);
}
runkit_function_redefine('htmlspecialchars', '$string, $flags, $encoding, $double_encode', 'return overriden_htmlspecialchars($string, $flags, $encoding, $double_encode);');
?>

You may be able to implement a more powerful overriding function of your own.
Good luck!
up
2
brendel at krumedia dot de
6 years ago
I know some people posted similar functions but may be you are looking for this version:

function jschars($str)
{
    $str = mb_ereg_replace("\\\\", "\\\\", $str);
    $str = mb_ereg_replace("\"", "\\\"", $str);
    $str = mb_ereg_replace("'", "\\'", $str);
    $str = mb_ereg_replace("\r\n", "\\n", $str);
    $str = mb_ereg_replace("\r", "\\n", $str);
    $str = mb_ereg_replace("\n", "\\n", $str);
    $str = mb_ereg_replace("\t", "\\t", $str);
    $str = mb_ereg_replace("<", "\\x3C", $str); // for inclusion in HTML
    $str = mb_ereg_replace(">", "\\x3E", $str);
    return $str;
}

if you use smarty your code may look like:

<a onclick="alert('{$text|jschars|htmlchars}');return false;">Test</a>

(Yes, we have the shortcut htmlchars instead of htmlspecialchars, so we are able to pass the encoding, e.g. UTF-8 or ISO-8859-1, to htmlspecialchars)
up
3
Kenneth Kin Lum
6 years ago
if your goal is just to protect your page from Cross Site Scripting (XSS) attack, or just to show HTML tags on a web page (showing <body> on the page, for example), then using htmlspecialchars() is good enough and better than using htmlentities().  A minor point is htmlspecialchars() is faster than htmlentities().  A more important point is, when we use  htmlspecialchars($s) in our code, it is automatically compatible with UTF-8 string.  Otherwise, if we use htmlentities($s), and there happens to be foreign characters in the string $s in UTF-8 encoding, then htmlentities() is going to mess it up, as it modifies the byte 0x80 to 0xFF in the string to entities like &eacute;.  (unless you specifically provide a second argument and a third argument to htmlentities(), with the third argument being "UTF-8").

The reason htmlspecialchars($s) already works with UTF-8 strings is that it changes bytes in the range 0x00 to 0x7F to &lt; etc., while leaving bytes in the range 0x80 to 0xFF unchanged.  We may wonder whether htmlspecialchars() may accidentally change any byte in a 2 to 4 byte UTF-8 character to &lt; etc.  The answer is, it won't.  When a UTF-8 character is 2 to 4 bytes long, all the bytes in this character are in the 0x80 to 0xFF range. None can be in the 0x00 to 0x7F range.  When a UTF-8 character is 1 byte long, it is just the same as ASCII, which is 7 bit, from 0x00 to 0x7F.  As a result, when a UTF-8 character is 1 byte long, htmlspecialchars($s) will do its job, and when the UTF-8 character is 2 to 4 bytes long, htmlspecialchars($s) will just pass those bytes through unchanged.  So htmlspecialchars($s) will do the same job no matter whether $s is in ASCII, ISO-8859-1 (Latin-1), or UTF-8.
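
The byte-range argument above can be checked directly. This is just an illustrative sketch: the sample string is an assumption, written with byte escapes so that it does not depend on the encoding of the source file.

```php
<?php
// "é<" encoded as UTF-8: the two bytes 0xC3 0xA9 followed by ASCII "<".
$s = "\xC3\xA9<";

// Count the bytes that are >= 0x80. Every byte of a multi-byte UTF-8
// character falls in that range, so none can be mistaken for & < > " or '.
$highBytes = 0;
for ($i = 0; $i < strlen($s); $i++) {
    if (ord($s[$i]) >= 0x80) {
        $highBytes++;
    }
}

$escaped = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

echo $highBytes, "\n";        // 2
echo bin2hex($escaped), "\n"; // c3a9266c743b
```

The hex dump shows the two bytes of "é" passed through untouched, followed by the four ASCII bytes of "&lt;".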
up
1
terminatorul at gmail dot com
7 years ago
To HTML-encode Unicode characters that may not be part of your document character set (given in the META tag of your page), and so cannot be output directly into your document source, you need to use mb_encode_numericentity(). Pay attention to its conversion map argument.
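
As a rough sketch of what such a call might look like (it assumes the mbstring extension is loaded; the sample text and the choice of conversion map are illustrative, not taken from the note):

```php
<?php
// "café <b>" with the "é" given as UTF-8 bytes, so the example does not
// depend on the encoding of the source file.
$text = "caf\xC3\xA9 <b>";

// Conversion map: [first code point, last code point, offset, mask].
// This one selects every code point from U+0080 upward.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// Escape the markup characters first, then the non-ASCII characters.
$ascii = mb_encode_numericentity(
    htmlspecialchars($text, ENT_QUOTES, 'UTF-8'),
    $convmap,
    'UTF-8'
);

echo $ascii; // caf&#233; &lt;b&gt;
```

The order matters: running htmlspecialchars() after mb_encode_numericentity() would, with its default double encoding, turn the '&' of each numeric entity into '&amp;'.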
up
1
urbanheroes {at} gmail {dot} com
9 years ago
In response to the note made by Alexander Nofftz on October 2004, &#39; is used instead of &apos; because IE unfortunately seems to have trouble with the latter.
up
4
hello at haroonahmad dot co dot uk
5 years ago
A common point of confusion among beginners is what the difference between htmlentities() and htmlspecialchars() really is, because the manual examples convert angle brackets for both.

Well, htmlentities() will ALSO convert characters from other languages in the string, e.g. German, French or Italian. So if you think an attacker could use such characters for an XSS attack in a URL etc., then use htmlentities() instead of htmlspecialchars().

I hope it helps,

Haroon Ahmad
up
3
thelatesundayshow.com @ nathan (flip it)
10 years ago
Here's a version of the recursive escape function that takes the array by reference rather than by value, which saves some resources with big arrays.

function recurse_array_HTML_safe(&$arr) {
    foreach ($arr as $key => $val)
        if (is_array($val))
            recurse_array_HTML_safe($arr[$key]);
        else
            $arr[$key] = htmlspecialchars($val, ENT_QUOTES);
}
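
An alternative sketch of the same idea uses the built-in array_walk_recursive(), which also works on the array in place. The sample data and the explicit 'UTF-8' charset are assumptions for illustration; closures need PHP 5.3 or later.

```php
<?php
// Hypothetical nested input, e.g. form data.
$data = array(
    'name' => "O'Reilly",
    'tags' => array('<b>', 'a & b'),
);

// array_walk_recursive() visits only the leaf values; taking the value
// by reference lets the callback overwrite it in place.
array_walk_recursive($data, function (&$value) {
    $value = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
});

print_r($data);
```

The keys and the nesting are left as they were; only the leaf strings are escaped.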
up
2
took
9 years ago
The Algo from donwilson at gmail dot com to reverse the action of htmlspecialchars(), edited for germany:

function unhtmlspecialchars( $string )
{
  $string = str_replace ( '&#039;', '\'', $string );
  $string = str_replace ( '&quot;', '"', $string );
  $string = str_replace ( '&lt;', '<', $string );
  $string = str_replace ( '&gt;', '>', $string );
  $string = str_replace ( '&uuml;', 'ü', $string );
  $string = str_replace ( '&Uuml;', 'Ü', $string );
  $string = str_replace ( '&auml;', 'ä', $string );
  $string = str_replace ( '&Auml;', 'Ä', $string );
  $string = str_replace ( '&ouml;', 'ö', $string );
  $string = str_replace ( '&Ouml;', 'Ö', $string );
  // Decode &amp; last, so that e.g. "&amp;lt;" ends up as "&lt;" and not as "<".
  $string = str_replace ( '&amp;', '&', $string );
  return $string;
}
up
3
odegroot+php at gmail dot com
2 years ago
This function can be used to escape single quotes only, and not double quotes, by calling it as follows.

<?php $escaped_string = htmlspecialchars($string, ENT_QUOTES & ~ENT_COMPAT, $encoding); ?>

This works because single/double quote escaping actually each have their own flag.

#define ENT_HTML_QUOTE_NONE         0
#define ENT_HTML_QUOTE_SINGLE       1
#define ENT_HTML_QUOTE_DOUBLE       2

#define ENT_COMPAT      ENT_HTML_QUOTE_DOUBLE
#define ENT_QUOTES      (ENT_HTML_QUOTE_DOUBLE | ENT_HTML_QUOTE_SINGLE)
#define ENT_NOQUOTES    ENT_HTML_QUOTE_NONE

Snippet from: php-src/ext/standard/html.h
https://github.com/php/php-src/blob/master/ext/standard/html.h
up
2
strange dot alex at gmail dot com
1 year ago
> For the purposes of this function, the encodings ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, provided the string itself is valid for the encoding, as the characters affected by htmlspecialchars() occupy the same positions in all of these encodings.

This is not true, actually!
$txt = "Russian text in KOI8-r<br>"; // put actual Russian text in KOI8-R here!

htmlspecialchars($txt,null,'KOI8-R') = Russian text in KOI8-r<br>
htmlspecialchars($txt,null,'UTF-8') =
Result is EMPTY !!!
up
2
pinkgothic at gmail dot com
3 years ago
Please note that this function results in an E_WARNING when display_errors is off and an invalid multibyte string is passed to it (e.g. with 'utf-8' as the encoding parameter and broken utf-8 characters somewhere in the string).

This is ESPECIALLY IMPORTANT if you have an EXCEPTION-THROWING ERROR HANDLER, since even though you can't reproduce it in a development mode where display_errors is on, you MUST wrap your function call in a try-catch, or your application will crash.

[ The reason PHP makes this distinction is because this is a core function and many production servers are misconfigured to have display_errors on (to prevent such things as path disclosure from error messages from accidentally cropping up). See: http://bugs.php.net/bug.php?id=47494 ]
up
1
Anonymous
4 years ago
This may seem obvious, but if you want to output arbitrary (i.e. user-input) data as an attribute inside an HTML tag (such as the INPUT tags on a FORM), be aware of whether you are using ENT_QUOTES or ENT_COMPAT.  If you're using ENT_COMPAT, the attribute must be wrapped in double-quotes, as single-quotes will not be encoded and the user will be able to inject arbitrary HTML attributes (including javascript behavior) inside the tag, even though they will not be able to inject arbitrary HTML tags.

Also, if you want to allow users to input HTML attributes without them being double-encoded on display, there are two ways to accomplish this:

1 - Run their input through html_entity_decode() followed by htmlspecialchars().

2 - Call htmlspecialchars() with $double_encode=false.

There is one functional difference between these two methods:  If you want to perform any search-replace on a user's input (such as word censoring in a message-board application), the second method will allow users to circumvent it by HTML-encoding their input, whereas the first will not.
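
A small sketch of both points from this note, the quote flags in attributes and $double_encode. The input strings are made up for illustration.

```php
<?php
// Hypothetical hostile input aimed at a single-quoted attribute.
$input = "x' onmouseover='alert(1)";

// ENT_COMPAT leaves single quotes alone, so inside value='...' the input
// can close the attribute and start a new one.
$unsafe = "<input value='" . htmlspecialchars($input, ENT_COMPAT, 'UTF-8') . "'>";

// ENT_QUOTES escapes both kinds of quotes, so either delimiter is safe.
$safe = "<input value='" . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . "'>";

// With $double_encode = false, existing entities are left as they are.
$once = htmlspecialchars('&amp; <', ENT_QUOTES, 'UTF-8', false);

echo $unsafe, "\n"; // <input value='x' onmouseover='alert(1)'>
echo $safe, "\n";   // <input value='x&#039; onmouseover=&#039;alert(1)'>
echo $once, "\n";   // &amp; &lt;
```

In the first output the browser would see a separate onmouseover attribute; in the second the whole input stays inside value.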
up
1
nessthehero at gmail dot com
4 years ago
Here's a simple function I wrote for parsing form data.

It checks if it's an array and it is recursive (it calls itself).

It also decodes things that have already been encoded so it doesn't change &amp; to &amp;amp;

[In this version,] I found it easier to use a regular expression to check and see if any previously encoded data exists, then decode it repeatedly until there is none left, then re-encode it.

<?php
function formspecialchars($var)
{
    $pattern = '/&(#)?[a-zA-Z0-9]{0,};/';

    if (is_array($var)) {    // If variable is an array
        $out = array();      // Set output as an array
        foreach ($var as $key => $v) {
            // Run formspecialchars on every element of the array and return the result. Also maintains the keys.
            $out[$key] = formspecialchars($v);
        }
    } else {
        $out = $var;
        while (preg_match($pattern, $out) > 0) {
            $decoded = htmlspecialchars_decode($out, ENT_QUOTES);
            if ($decoded === $out) {
                break;  // Nothing left that can be decoded; avoids an endless loop on e.g. &eacute;
            }
            $out = $decoded;
        }
        // Trim the variable, strip all slashes, and encode it
        $out = htmlspecialchars(stripslashes(trim($out)), ENT_QUOTES, 'UTF-8', true);
    }

    return $out;
}
?>
up
2
_____ at luukku dot com
12 years ago
People, don't use ereg_replace for the most simple string replacing operations (replacing constant string with another).
Use str_replace.
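
For instance (a trivial sketch with a made-up input string; note that the ereg functions were removed entirely in PHP 7.0):

```php
<?php
// A constant-string replacement needs no regular expression engine.
$text = 'a < b';
$out = str_replace('<', '&lt;', $text);

echo $out; // a &lt; b
```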
up
2
chuck at N0SPAM1command dot com
5 years ago
NOTE:
I made an error in my last post.

The last 3 lines should have read
<?php

...

$text = get_page($url);
--------^^^^^^^^
$new = htmlspecialchars($text, ENT_QUOTES); // here is the magic :)

   
echo '<pre>' .$new. '</pre>';

?>

OOPS!
up
1
Anonymous
13 years ago
If you're sending data from one form to another, the data in the textareas and text inputs may need to have htmlspecialchars("form data", ENT_QUOTES) applied, assuming it may ever contain quotes, less-than signs or any of the other special characters.  Using htmlspecialchars will make the text show up properly in the second form.  The changes are automatically undone whenever the form data is submitted. It does seem a little strange, but it works and my headache is now starting to go away.

AZ
up
1
Luiz Miguel Axcar (lmaxcar at yahoo dot com dot br)
9 years ago
Hello,

If you are having trouble writing HTML data to, or reading it from, a DBMS, try this:

<?php

//from html_entity_decode() manual page
function unhtmlentities ($string) {
   $trans_tbl = get_html_translation_table (HTML_ENTITIES);
   $trans_tbl = array_flip ($trans_tbl);
   return strtr ($string, $trans_tbl);
}

//read from db
$content = stripslashes (htmlspecialchars ($field['content']));

//write to db
$content = unhtmlentities (addslashes (trim ($_POST['content'])));

//make sure result of function get_magic_quotes_gpc () == 0, you can get strange slashes in your content adding slashes twice

//better to do this using addslashes
$content = (! get_magic_quotes_gpc ()) ? addslashes ($content) : $content;

?>
up
1
Anonymous
5 years ago
Just a few notes on how one can use htmlspecialchars() and htmlentities() to filter user input on forms for later display and/or database storage...

1. Use htmlspecialchars() to filter text input values for html input tags.  i.e.,

echo '<input name=userdata type=text value="'.htmlspecialchars($data).'" />';


2. Use htmlentities() to filter the same data values for most other kinds of html tags, i.e.,

echo '<p>'.htmlentities($data).'</p>';

3. Use your database escape string function to filter the data for database updates & insertions, for instance, using postgresql,

pg_query($connection,"UPDATE datatable SET datavalue='".pg_escape_string($data)."'");


This strategy seems to work well and consistently, without restricting anything the user might like to type and display, while still providing a good deal of protection against a wide variety of html and database escape sequence injections, which might otherwise be introduced through deliberate and/or accidental input of such character sequences by users submitting their input data via html forms.
up
1
Anonymous
9 years ago
function htmlspecialchars_array($arr = array()) {
   $rs =  array();
   foreach ($arr as $key => $val) {   // each() was removed in PHP 8
       if(is_array($val)) {
           $rs[$key] = htmlspecialchars_array($val);
       }
       else {
           $rs[$key] = htmlspecialchars($val, ENT_QUOTES);
       }   
   }
   return $rs;
}
up
1
nachitox2000 [at] hotmail [dot] com
4 years ago
I had problems with Spanish special characters, so I thought of using htmlspecialchars, but my strings also contain HTML.
So I used this :) Hope it helps

<?php
function htmlspanishchars($str)
{
    return str_replace(array("&lt;", "&gt;"), array("<", ">"), htmlspecialchars($str, ENT_NOQUOTES, "UTF-8"));
}
?>
up
0
sascham78 at php dot net
20 days ago
If you are using UTF-8 encoding with htmlspecialchars() you may experience blank values with certain language-specific characters i.e. spanish or portuguese "âãá".
So instead of using
<?php
$utf8encoded = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
?>
try
<?php
$utf8encoded = utf8_encode(htmlspecialchars($string, ENT_QUOTES));
?>
up
0
Felix D.
9 months ago
Another thing important to mention is that
htmlspecialchars(NULL)
returns an empty string and not NULL!
up
0
Anonymous
5 years ago
This may seem obvious, but it caused me some frustration. If you use htmlspecialchars with the $charset argument set and the string you run it on is not actually in the charset you specify, you get an empty string returned without any notice/warning/error.

<?php

$ok_utf8 = "A valid UTF-8 string";
$bad_utf8 = "An invalid UTF-8 string";

var_dump(htmlspecialchars($bad_utf8, ENT_NOQUOTES, 'UTF-8'));  // string(0) ""

var_dump(htmlspecialchars($ok_utf8, ENT_NOQUOTES, 'UTF-8'));  // string(20) "A valid UTF-8 string"

?>

So make sure your charsets are consistent

<?php

$bad_utf8 = "An invalid UTF-8 string";

// make sure it's really UTF-8
$bad_utf8 = mb_convert_encoding($bad_utf8, 'UTF-8', mb_detect_encoding($bad_utf8));

var_dump(htmlspecialchars($bad_utf8, ENT_NOQUOTES, 'UTF-8'));  // string(23) "An invalid UTF-8 string"

?>

I had this problem because a Mac user was submitting posts copy/pasted from a program and it contained weird chars in it.
up
0
frank at codedor dot be
7 years ago
If you seem to have a problem with rendering dynamic RSS files from a database - try using htmlspecialchars() or htmlentities() on the text you are rendering.

Since XML and RSS are very strict about what is allowed inside nodes, you need to make sure everything is "A-OK" according to XML standards ...

Especially if the database you're pulling data from uses e.g. the Latin-Swedish encoding, which seems to be the standard setting for MySQL databases.
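
A minimal sketch of escaping a value for an RSS node. The title is made up, it is assumed to be UTF-8 already (Latin-1 data would need converting first), and ENT_XML1 requires PHP 5.4 or later.

```php
<?php
// Hypothetical title as it might come out of the database, already UTF-8.
$title = 'Tom & Jerry <live>';

// ENT_XML1 selects XML 1.0 escaping rules.
$item = '<title>' . htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8') . '</title>';

echo $item; // <title>Tom &amp; Jerry &lt;live&gt;</title>
```

htmlspecialchars() is probably the safer of the two functions here, since htmlentities() can emit named entities such as &eacute; that XML itself does not define.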
up
0
mikiwoz at yahoo dot co dot uk
9 years ago
I am not sure, maybe I'm missing something, but I have found something interesting:
I've been working on a project where I had to use htmlspecialchars (for obvious reasons). I also needed to decode the encoded string. What I did was almost a copy and paste from php.net:
$trans=get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
$trans=array_flip($trans);
$string=strtr($encoded, $trans);
(it looked a bit different in my code, but the idea is clear)
I couldn't get the apostrophe decoded, and I needed it for the <A> tags. After an hour or so of debugging, I decided to do print_r($trans). What I got was:
...
[&#39;] => '
...
BUT the apostrophe was encoded to &#039; -> note the zero.
I don't suppose it's a bug, but it definitely IS a potential pitfall, so watch out for this one.
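
One way to sidestep the translation-table mismatch, sketched here on the assumption that PHP 5.1 or later is available, is the built-in htmlspecialchars_decode():

```php
<?php
$encoded = htmlspecialchars("<a href='x'>", ENT_QUOTES, 'UTF-8');
// $encoded is now: &lt;a href=&#039;x&#039;&gt;

// htmlspecialchars_decode() reverses what htmlspecialchars() produces,
// including &#039; when ENT_QUOTES is given.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $decoded; // <a href='x'>
```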
up
0
zolinak at zoli dot szathmari dot hu
9 years ago
A sample function, if anybody wants to turn HTML entities (and special characters) back into plain characters. (e.g. "&egrave;", "<" etc.)

function html2specialchars($str){
    $trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($str, $trans_table);
}
up
-1
joseph at nextique dot com
12 years ago
Here is a handy function to htmlalize an array (or scalar) before you hand it off to xml.

function htmlspecialchars_array($arr = array()) {
    $rs =  array();
    foreach ($arr as $key => $val) {   // each() was removed in PHP 8
        if(is_array($val)) {
            $rs[$key] = htmlspecialchars_array($val);
        }
        else {
            $rs[$key] = htmlspecialchars($val, ENT_QUOTES);
        }   
    }
    return $rs;
}
up
-1
ryan at ryano dot net
13 years ago
Actually, if you're using >= 4.0.5, this should theoretically be quicker (less overhead anyway):

$text = str_replace(array("&gt;", "&lt;", "&quot;", "&amp;"), array(">", "<", "\"", "&"), $text);
up
-1
php dot net at orakio dot net
6 years ago
I was recently exploring some code when I saw this being used to make data safe for "SQL".

This function should not be used to make data SQL safe (although to prevent phishing it is perfectly good).

Here is an example of how NOT to use this function:

<?php
$username = htmlspecialchars(trim("$_POST[username]"));

$uniqueuser = $realm_db->query("SELECT `login` FROM `accounts` WHERE `login` = '$username'");
?>

(Only other check on $_POST['username'] is to make sure it isn't empty which it is after trim on a white space only name)

The problem here is that the flags are left at the default, which lets single quote marks through, and those are used in the SQL query. Turning on magic quotes might fix it, but you should not rely on magic quotes; in fact you should never use them and should fix the code instead. There are also problems with \ not being escaped. Even if magic quotes were used, there would be the problem of allowing usernames longer than the limit and of having some really weird usernames, given that they are to be used outside of HTML; this just provides a front end for registering with another system using MySQL. Of course, using it on the output wouldn't cause that problem.

Another way to make something of a fix would be to use ENT_QUOTES or do:

<?php
$uniqueuser = $realm_db->query('SELECT `login` FROM `accounts` WHERE `login` = "'.$username.'";');
?>

Either way, none of these solutions are good practice, and none is free of flaws. This function should simply never be used in such a fashion.

I hope this will prevent newbies using this function incorrectly (as they apparently do).
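
For contrast, a sketch of the usual alternative: bind the value as a query parameter and keep htmlspecialchars() for output only. It assumes PDO with the SQLite driver and uses an in-memory database so that it is self-contained; the table and column names follow the note, everything else is illustrative.

```php
<?php
// Assumption: PDO with the SQLite driver; in-memory database for the demo.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE accounts (login TEXT)');
$db->exec("INSERT INTO accounts (login) VALUES ('alice')");

// Hypothetical hostile input.
$username = "x' OR '1'='1";

// The value is bound as data and never spliced into the SQL text.
$stmt = $db->prepare('SELECT login FROM accounts WHERE login = ?');
$stmt->execute(array($username));
$rows = $stmt->fetchAll(PDO::FETCH_COLUMN);

echo count($rows), "\n"; // 0

// Escape for HTML only at output time.
echo htmlspecialchars($username, ENT_QUOTES, 'UTF-8'), "\n";
```

The injection attempt matches no row, because the whole string is compared against the login column as a single value.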
up
-1
richard at mf2fm dot com
8 years ago
I had a script which detected swearing and wanted to make sure that words such as 'f &uuml; c k' didn't slip through the system.

After using htmlentities(), the following line converts most extended alphabet characters back to the standard alphabet so you can spot such problems..

$text=eregi_replace("&([a-z])[a-z0-9]{3,};", "\\1", $text);

This changes, for example, '&uuml;' into 'u' and '&szlig;' into 's'.  Sadly it also converts '&pound;' and '&para;' into 'p', so it's not perfect, but it does solve a lot of the problems.
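
Since eregi_replace() was deprecated in PHP 5.3 and removed in PHP 7.0, a preg_replace() equivalent may be more useful today. This is a sketch using the same pattern as in the note, with the /i modifier standing in for the case-insensitive ereg variant.

```php
<?php
$text = 'f &uuml; c k';

// Keep only the first letter of each named entity.
$text = preg_replace('/&([a-z])[a-z0-9]{3,};/i', '$1', $text);

echo $text; // f u c k
```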
up
-2
info at 8th dot at
3 years ago
I found a complete solution!
It finds and replaces all unknown letters
(like Ä, Ö, Ü, ß, and many more)
and turns them into an HTML and XML compatible format.

Parameter: $text: a string with unsupported letters in it
Return: a string in which all letters unsupported by XML and HTML are changed into their numeric Unicode value (for example &#196;)

FUNCTION:

<?php
function umlaute($text){
   
    $returnvalue = "";
    for ($i = 0; $i < strlen($text); $i++) {
        $teil = hexdec(rawurlencode(substr($text, $i, 1)));
        if ($teil < 32 || $teil > 1114111) {
            $returnvalue .= substr($text, $i, 1);
        } else {
            $returnvalue .= "&#" . $teil . ";";
        }
    }
    return $returnvalue;
}
?>
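
For comparison, a byte-wise sketch of the same idea that only touches bytes outside the ASCII range. It assumes the input is ISO-8859-1, where each byte value equals the Unicode code point; the function name latin1_to_numeric_entities() is hypothetical. With UTF-8 input, a byte-by-byte loop of this kind would emit one wrong reference per byte of a multibyte character.

```php
<?php
// Hypothetical helper; assumes $latin1 is ISO-8859-1 encoded.
function latin1_to_numeric_entities($latin1)
{
    $out = '';
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        $code = ord($latin1[$i]);
        // ASCII bytes stay as they are; 128..255 become numeric references.
        $out .= ($code < 128) ? $latin1[$i] : '&#' . $code . ';';
    }
    return $out;
}

// "\xC4" is the ISO-8859-1 byte for A with umlaut (code point 196).
$result = latin1_to_numeric_entities("\xC4pfel"); // "&#196;pfel"
```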
steve at mcdragonsoftware dot com
3 years ago
I am working with XML and zip functions to create an xlsx document from a template. Just as I thought I had it finished, it stopped working. After a bit of hunting I discovered that my zip file began with PHP notices about an undefined constant. I have no idea why my installation can't remember what ENT_XML1 is; it used to know it (or so I thought).

To save anyone else this headache, I recommend adding code at the top of your scripts to verify that these constants are defined. Something like:

defined("ENT_XML1") or define("ENT_XML1", 16);

for each constant you use. Again, I don't know why this problem suddenly came up, but better safe than "Excel cannot open the file....... format... extension... corrupted... blah blah blah".

cheers :)

Here is a list of the constant values (since they are not on this page), as taken from html.h in PHP 5:

#define ENT_HTML_QUOTE_NONE            0
#define ENT_HTML_QUOTE_SINGLE           1
#define ENT_HTML_QUOTE_DOUBLE                2
#define ENT_HTML_IGNORE_ERRORS        4
#define ENT_HTML_SUBSTITUTE_ERRORS     8
#define ENT_HTML_DOC_TYPE_MASK        (16|32)
#define ENT_HTML_DOC_HTML401           0
#define ENT_HTML_DOC_XML1            16
#define ENT_HTML_DOC_XHTML            32
#define ENT_HTML_DOC_HTML5            (16|32)
/* reserve bit 6 */
#define ENT_HTML_SUBSTITUTE_DISALLOWED_CHARS    128

#define ENT_COMPAT        ENT_HTML_QUOTE_DOUBLE
#define ENT_QUOTES        (ENT_HTML_QUOTE_DOUBLE | ENT_HTML_QUOTE_SINGLE)
#define ENT_NOQUOTES    ENT_HTML_QUOTE_NONE
#define ENT_IGNORE        ENT_HTML_IGNORE_ERRORS
#define ENT_SUBSTITUTE    ENT_HTML_SUBSTITUTE_ERRORS
#define ENT_HTML401        0
#define ENT_XML1        16
#define ENT_XHTML        32
#define ENT_HTML5        (16|32)
#define ENT_DISALLOWED    128
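
The fallback definitions recommended in this note could be written as a loop, sketched here with values copied from the html.h list. The $fallbacks array is an illustrative assumption. These constants were added in PHP 5.4.0, so on older versions defining them only silences the notice; it does not add the XML-specific behaviour to htmlspecialchars().

```php
<?php
// Values copied from the html.h list; define only what is missing.
$fallbacks = array(
    'ENT_HTML401'    => 0,
    'ENT_XML1'       => 16,
    'ENT_XHTML'      => 32,
    'ENT_HTML5'      => 48,   // 16 | 32
    'ENT_SUBSTITUTE' => 8,
    'ENT_DISALLOWED' => 128,
);
foreach ($fallbacks as $name => $value) {
    if (!defined($name)) {
        define($name, $value);
    }
}

// On PHP 5.4 and later, ENT_QUOTES | ENT_XML1 turns the single quote into &apos;.
$xml_safe = htmlspecialchars("Tom & Jerry's", ENT_QUOTES | ENT_XML1, 'UTF-8');
```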
glapa.wojciech.com
8 months ago
To actually display the result shown in the comment in a browser, the output has to be escaped a second time:

<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo htmlspecialchars($new); // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>

...
hm2k at php.net
5 years ago
<?php
/**
* A recursive version of htmlspecialchars() for arrays and strings.
*
*/

function htmlspecialchars_deep($mixed, $quote_style = ENT_QUOTES, $charset = 'UTF-8')
{
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = htmlspecialchars_deep($value, $quote_style, $charset);
        }
    } elseif (is_string($mixed)) {
        $mixed = htmlspecialchars(htmlspecialchars_decode($mixed, $quote_style), $quote_style, $charset);
    }
    return $mixed;
}
?>
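
The decode-then-encode step guards against double encoding. As a possible alternative, htmlspecialchars() has accepted a fourth $double_encode argument since PHP 5.2.3; the sketch below uses it. The function name escape_deep() and the sample data are hypothetical.

```php
<?php
// Hypothetical variant of a recursive escaping helper.
function escape_deep($value, $flags = ENT_QUOTES, $charset = 'UTF-8')
{
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            $value[$key] = escape_deep($item, $flags, $charset);
        }
        return $value;
    }
    if (is_string($value)) {
        // Fourth argument false: existing entities such as &amp; are left alone.
        return htmlspecialchars($value, $flags, $charset, false);
    }
    return $value; // integers, null, etc. pass through untouched
}

$clean = escape_deep(array(
    'title' => '<b>Tom &amp; Jerry</b>',
    'tags'  => array('a"b', 7),
));
```

Here the already-encoded &amp; in the title is kept as is, while the tags, quotes and nested values are escaped.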
moc.xnoitadnuof@310symerej
10 years ago
Here are some useful functions.
They apply htmlspecialchars() or htmlentities(), or the matching decode step, recursively to arrays or to regular variables. They also protect against "double encoding".

<?php
// Note: array_map() with two arrays returns a re-indexed list, so string keys are not preserved.
function htmlspecialchars_or($mixed, $quote_style = ENT_QUOTES) {
    return is_array($mixed)
        ? array_map('htmlspecialchars_or', $mixed, array_fill(0, count($mixed), $quote_style))
        : htmlspecialchars(htmlspecialchars_decode($mixed, $quote_style), $quote_style);
}

// htmlspecialchars_decode() is built in as of PHP 5.1.0 (strings only);
// declare this version only when it is missing, to avoid a redeclaration error.
if (!function_exists('htmlspecialchars_decode')) {
    function htmlspecialchars_decode($mixed, $quote_style = ENT_QUOTES) {
        if (is_array($mixed)) {
            return array_map('htmlspecialchars_decode', $mixed, array_fill(0, count($mixed), $quote_style));
        }
        $trans_table = get_html_translation_table(HTML_SPECIALCHARS, $quote_style);
        if ($trans_table["'"] != '&#039;') { // some versions of PHP map single quotes to &#39;
            $trans_table["'"] = '&#039;';
        }
        return strtr($mixed, array_flip($trans_table));
    }
}

function htmlentities_or($mixed, $quote_style = ENT_QUOTES) {
    return is_array($mixed)
        ? array_map('htmlentities_or', $mixed, array_fill(0, count($mixed), $quote_style))
        : htmlentities(htmlentities_decode($mixed, $quote_style), $quote_style);
}

function htmlentities_decode($mixed, $quote_style = ENT_QUOTES) {
    if (is_array($mixed)) {
        return array_map('htmlentities_decode', $mixed, array_fill(0, count($mixed), $quote_style));
    }
    $trans_table = get_html_translation_table(HTML_ENTITIES, $quote_style);
    if ($trans_table["'"] != '&#039;') { // some versions of PHP map single quotes to &#39;
        $trans_table["'"] = '&#039;';
    }
    return strtr($mixed, array_flip($trans_table));
}
?>

These functions are an addition to an earlier post. I would like to give the person some credit but I do not know who it was.
