PHP convert to character

utf8_encode

This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

This function converts the string $string from the ISO-8859-1 encoding to UTF-8.

Note:

This function does not attempt to guess the current encoding of the provided string. It assumes the string is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but it will not produce a useful string if a different encoding was intended.

Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), in place of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.

Parameters

string: An ISO-8859-1 string.

Return Values

Returns the UTF-8 translation of string.

Changelog

Version  Description
8.2.0    This function has been deprecated.
7.2.0    This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.

Examples

Example #1 Basic example

<?php
// Convert the string 'Zoë' from ISO 8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab

Notes

Note: Deprecation and alternatives

This function is deprecated as of PHP 8.2.0, and will be removed in a future version. Existing uses should be checked and replaced with appropriate alternatives.

Similar functionality can be achieved with mb_convert_encoding(), which supports ISO-8859-1 and many other character encodings.

<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same byte in ISO-8859-7 represents 'λ' (Greek lower-case lambda)
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (Euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

c3ab
cebb
e282ac

Other options which may be available depending on the extensions installed are UConverter::transcode() and iconv().

The following all give the same result:

<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab

See Also

  • utf8_decode() — Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
  • mb_convert_encoding() — Convert a string from one character encoding to another
  • UConverter::transcode() — Convert a string from one character encoding to another
  • iconv() — Convert a string from one character encoding to another

User Contributed Notes

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.

Here's some code that addresses the Windows-1252 issue that Steven describes in the previous comment:

<?php
/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
the UTF-8 encoding of the non-control characters that Windows-1252 places
at the equivalent code points. */

$cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
    "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
    "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
    "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
    "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
    "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
    "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
    "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
    "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
    "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
    "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
    "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
    "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
    "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
    "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
    "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
    "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
    "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
    "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
    "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

    "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
    "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
    "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
    "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION */
    "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
    "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
    "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
);

// Convert as ISO-8859-1 first, then remap the C1 control characters
// to the printable characters Windows-1252 defines at those positions.
function cp1252_to_utf8($str) {
    global $cp1252_map;
    return strtr(utf8_encode($str), $cp1252_map);
}

For reference, it may be useful to point out that:
utf8_encode($s)
is actually identical to:
recode_string('latin1..utf8', $s)
and:
iconv('iso-8859-1', 'utf-8', $s)
That is, utf8_encode is a specialized case of character set conversion.

If the string to be converted to UTF-8 is in something other than ISO-8859-1 (such as ISO-8859-2, used for Polish and Croatian), you should use recode_string() or iconv() rather than trying to devise complex str_replace statements.
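As a rough illustration of the point above, the sketch below converts an ISO-8859-2 string with iconv() and shows what the same bytes become when they are wrongly treated as ISO-8859-1. The sample bytes and variable names are chosen here for illustration, and the sketch assumes the iconv extension is available.

```php
<?php
// "Łódź" encoded as ISO-8859-2 (sample bytes chosen for this sketch).
$iso8859_2_string = "\xA3\xF3d\xBC";

// Naming the real source encoding lets iconv() map each byte correctly.
$utf8_string = iconv('ISO-8859-2', 'UTF-8', $iso8859_2_string);
echo bin2hex($utf8_string), "\n"; // c581c3b364c5ba

// Treating the same bytes as ISO-8859-1 yields different characters ("£ód¼").
$wrong = iconv('ISO-8859-1', 'UTF-8', $iso8859_2_string);
echo bin2hex($wrong), "\n"; // c2a3c3b364c2bc
```

Only the first conversion round-trips to the intended text; the second is what utf8_encode() would also produce for these bytes.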

If you haven't guessed already: when converting in the other direction with utf8_decode(), a UTF-8 character that has no representation in the ISO-8859-1 code page is returned as a ?. You might want to wrap a function around this to make sure you aren't saving a bunch of ? characters into your database.
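One possible shape for such a wrapper is a check that runs before the conversion. The helper name fits_in_latin1() below is hypothetical and not part of PHP; it is a minimal sketch that relies only on the PCRE functions and tests whether every character falls in the ISO-8859-1 range.

```php
<?php
// Hypothetical helper: true when every character of a UTF-8 string
// has a code point in the ISO-8859-1 range U+0000..U+00FF.
function fits_in_latin1(string $utf8): bool
{
    // The /u modifier makes PCRE read the subject as UTF-8.
    // preg_match() returns false for invalid UTF-8, so that case is rejected too.
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fits_in_latin1("Zo\u{EB}"));    // bool(true)
var_dump(fits_in_latin1("\u{20AC}100")); // bool(false): the Euro sign is U+20AC
```

A caller could refuse to store the value, or pick a different target encoding, whenever the check returns false.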

If you need a function which converts a string array into a utf8 encoded string array then this function might be useful for you:
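The code from that note is not available here, so the following is only a sketch of what such a function might look like. The name latin1_array_to_utf8() is hypothetical, the sketch uses mb_convert_encoding() instead of the deprecated utf8_encode(), and it assumes the mbstring extension is installed.

```php
<?php
// Hypothetical helper: convert every string in a (possibly nested) array
// from ISO-8859-1 to UTF-8, leaving keys and non-string values untouched.
function latin1_array_to_utf8(array $values): array
{
    $result = [];
    foreach ($values as $key => $value) {
        if (is_array($value)) {
            // Recurse into nested arrays.
            $result[$key] = latin1_array_to_utf8($value);
        } elseif (is_string($value)) {
            $result[$key] = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
        } else {
            $result[$key] = $value;
        }
    }
    return $result;
}

$converted = latin1_array_to_utf8(['name' => "Zo\xEB", 'tags' => ["\xEB"], 'age' => 30]);
echo bin2hex($converted['name']), "\n"; // 5a6fc3ab
```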


PHP chr() Function To Convert Byte To Character/String

PHP provides the chr() function to convert a given number into a single-character string. Although the argument is called a byte value, it is passed as an integer, which should be between 0 and 255.

Convert Byte To Char

In the following example, we will convert the given byte values (integers) into characters. The syntax is simple: the byte value is passed as the parameter and a one-character string is returned.

STRING = chr(BYTE_VALUE);

  • STRING is the string variable to which the character for BYTE_VALUE is assigned.
  • BYTE_VALUE is the value we want to convert to a character.

$str = chr(240) . chr(159) . chr(144) . chr(152);
echo $str; # The output will be 🐘 (the four bytes are the UTF-8 encoding of U+1F418)

$str = chr(144);
echo $str; # The output will be � (a lone 0x90 byte is not valid UTF-8)

$str = chr(89);
echo $str; # The output will be Y

$str = chr(88);
echo $str; # The output will be X

$str = chr(87);
echo $str; # The output will be W

If The Value Is Higher Than 255

In some cases, the value can be higher than 255, which is the upper limit of the chr() function, or it can be a negative number such as -56. In these cases a modulo operation is applied, so out-of-range values are mapped into the range 0 to 255.

$str = chr(-169);
echo $str; # The output will be W

$str = chr(87);
echo $str; # The output will be W

$str = chr(-170);
echo $str; # The output will be V

$str = chr(86);
echo $str; # The output will be V

$str = chr(342);
echo $str; # The output will be V
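The wrap-around can also be made explicit, so that chr() itself only ever receives a value between 0 and 255. The helper name byte_to_char() is hypothetical; this is a minimal sketch of the modulo arithmetic described above.

```php
<?php
// Hypothetical helper: fold any integer into 0..255, then convert it.
function byte_to_char(int $value): string
{
    // PHP's % keeps the sign of the left operand, so add 256 and
    // reduce again to bring negative remainders into range.
    $normalized = (($value % 256) + 256) % 256;
    return chr($normalized);
}

echo byte_to_char(-169), "\n"; // W  (-169 + 256 = 87)
echo byte_to_char(-170), "\n"; // V  (-170 + 256 = 86)
echo byte_to_char(342), "\n";  // V  (342 - 256 = 86)
```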

ASCII Table

The chr() function uses the ASCII table to convert a byte value into a single character. The ASCII table assigns a number to each character. For example, 62 is converted into the > sign and 60 into the < sign.


htmlspecialchars_decode

This function is the opposite of htmlspecialchars(). It converts special HTML entities back to characters.

The converted entities are: &amp;, &quot; (when ENT_NOQUOTES is not set), &#039; (when ENT_QUOTES is set), &lt; and &gt;.

Parameters

string: The string to decode.

flags: A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

Available flags constants

Constant Name   Description
ENT_COMPAT      Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES      Will convert both double and single quotes.
ENT_NOQUOTES    Will leave both double and single quotes unconverted.
ENT_SUBSTITUTE  Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_HTML401     Handle code as HTML 4.01.
ENT_XML1        Handle code as XML 1.
ENT_XHTML       Handle code as XHTML.
ENT_HTML5       Handle code as HTML 5.

Return Values

Returns the decoded string.
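The effect of the three quote flags can be seen by decoding one string three times. The sample string below is made up for this sketch, and the flags are passed explicitly so the result does not depend on the PHP version's default.

```php
<?php
$encoded = '&lt;b&gt; &amp; &quot;x&quot; &#039;y&#039;';

// ENT_QUOTES decodes both double and single quotes.
$both = htmlspecialchars_decode($encoded, ENT_QUOTES | ENT_HTML401);
echo $both, "\n"; // <b> & "x" 'y'

// ENT_COMPAT decodes double quotes only.
$double_only = htmlspecialchars_decode($encoded, ENT_COMPAT | ENT_HTML401);
echo $double_only, "\n"; // <b> & "x" &#039;y&#039;

// ENT_NOQUOTES leaves both kinds of quote entity untouched.
$no_quotes = htmlspecialchars_decode($encoded, ENT_NOQUOTES | ENT_HTML401);
echo $no_quotes, "\n"; // <b> & &quot;x&quot; &#039;y&#039;
```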

Changelog

Version  Description
8.1.0    flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.


Converting character to code and code to character

All characters, even those not available on your keyboard, have a character code. PHP offers four functions to deal with character codes: ord() and chr() for single-byte characters, and mb_ord() and mb_chr() for multi-byte characters. chr() and mb_chr() convert a numeric value into the corresponding character; ord() and mb_ord() return the numeric value of a character.

The ord() function

The ord() function converts the first byte of a string to a value between 0 and 255. For example, the value of a is 97 and the value of A is 65.
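A short sketch of these values, including what happens with a multi-byte character (the UTF-8 bytes for ë are written out explicitly so the result does not depend on the file's encoding):

```php
<?php
echo ord('a'), "\n"; // 97
echo ord('A'), "\n"; // 65

// Only the first byte is examined, even for longer strings.
echo ord('Apple'), "\n"; // 65

// "ë" in UTF-8 is the two bytes C3 AB; ord() sees only 0xC3.
echo ord("\xC3\xAB"), "\n"; // 195
```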

This function handles single-byte characters. Use the mb_ord() function if you are dealing with multi-byte characters.

Show ord() function result on Web Browsers

Use the HTML entity syntax and the web browser will automatically decode the number into the corresponding character. For example, the ASCII code of B is 66, so the HTML entity &#66; displays B on a browser page.
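A minimal sketch of building such an entity from ord(), with html_entity_decode() standing in for what the browser does when it renders the page:

```php
<?php
// Build a decimal character reference from the byte value of 'B'.
$entity = '&#' . ord('B') . ';';
echo $entity, "\n"; // &#66;

// Decoding the entity gives back the original character.
$decoded = html_entity_decode($entity);
echo $decoded, "\n"; // B
```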

Protecting Email Address

HTML entities can also be used to protect email addresses from spammers. Writing an address as HTML entities makes it harder for simple spam bots to harvest it from web pages, while browsers still display it normally.
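The idea could be sketched as follows. The function name obfuscate_email() and the placeholder address are assumptions made for this example, and the approach only handles single-byte (ASCII) addresses.

```php
<?php
// Hypothetical helper: encode every byte of an ASCII string as a
// decimal HTML character reference such as &#97;.
function obfuscate_email(string $email): string
{
    $encoded = '';
    foreach (str_split($email) as $char) {
        $encoded .= '&#' . ord($char) . ';';
    }
    return $encoded;
}

$entities = obfuscate_email('a@b.c'); // placeholder address
echo $entities, "\n"; // &#97;&#64;&#98;&#46;&#99;
```

A browser renders the result as the plain address, but the raw page source no longer contains it as literal text.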

The chr() function

The chr() function does the reverse of ord(): it generates a single-byte string from a number (0-255). Use the mb_chr() function if you are dealing with multi-byte characters. In the following example, chr() is used to generate the entire uppercase alphabet:

for ($a = 65; $a <= 90; $a++) { echo chr($a); }
// ABCDEFGHIJKLMNOPQRSTUVWXYZ

mb_ord()

  1. $string : the input string
  2. $encoding (optional): the character encoding. If null or not provided, the internal character encoding value will be used.

The mb_ord() function returns the Unicode code point value of the given character. The code point value is a numerical value that maps to a specific character.
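The difference from ord() shows up as soon as a character needs more than one byte. This sketch assumes the mbstring extension is installed and passes the encoding explicitly rather than relying on the internal default.

```php
<?php
$euro = "\u{20AC}"; // the Euro sign, three bytes in UTF-8

// ord() looks at the first byte only.
echo ord($euro), "\n"; // 226 (0xE2)

// mb_ord() decodes the whole character and returns its code point.
echo mb_ord($euro, 'UTF-8'), "\n"; // 8364 (0x20AC)
```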

mb_chr()

  1. $codepoint : A Unicode codepoint value
  2. $encoding (optional): the character encoding. If null or not provided, the internal character encoding value will be used.

The mb_chr() function does the reverse of mb_ord(): it generates a multi-byte string from a number (a Unicode code point value). Use the chr() function if you are dealing with single-byte characters.
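A small round-trip sketch, again assuming the mbstring extension and an explicit UTF-8 encoding argument:

```php
<?php
// Code point U+20AC (Euro sign) becomes a three-byte UTF-8 string.
$char = mb_chr(0x20AC, 'UTF-8');
echo bin2hex($char), "\n"; // e282ac
echo strlen($char), "\n";  // 3

// mb_ord() recovers the code point that mb_chr() was given.
var_dump(mb_ord($char, 'UTF-8') === 0x20AC); // bool(true)

// For code points below 128 the result matches chr().
var_dump(mb_chr(65, 'UTF-8') === chr(65)); // bool(true)
```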
