HTML UTF-8 decoding

HTML to UTF8 converter

World’s simplest browser-based HTML entities to UTF8 converter. Just import your HTML escape codes in the editor on the left and you will instantly get UTF8 values on the right. Free, quick, and very powerful. Import HTML – get UTF8. Created by geeks from team Browserling.

What is an HTML to UTF8 converter?

With this tool you can quickly decode HTML escape codes back to human-readable UTF8 strings. It converts hexadecimal and decimal HTML entities and also supports named HTML entities. Quick and powerful!
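The kind of decoding described above can be sketched in a few lines of JavaScript. This is only an illustration, not the tool's actual implementation: the function name decodeHtmlEntities and the five-entry NAMED_ENTITIES table are assumptions made for the example, and the real HTML named-entity list is far longer.

```javascript
// Tiny illustrative table (an assumption); the full HTML list is much longer.
const NAMED_ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Hypothetical helper: decodes decimal (&#65;), hexadecimal (&#x41;)
// and a handful of named (&lt;) entities in a single pass.
function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched rather than throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown names are left exactly as they were.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeHtmlEntities("&lt;p&gt;I did it! &#127881;&lt;/p&gt;"));
```

Because the replacement runs in one pass, a doubly escaped input such as `&amp;lt;` decodes to `&lt;` and is not decoded a second time.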

HTML to UTF8 converter examples

In this example we convert a complete and fully-functional HTML web page that was previously HTML-escaped into readable HTML code.

This example converts hexadecimal HTML entities into readable text combined with UTF8 symbols and icons of various transports.

You can pass input to this tool via the ?input query argument and it will automatically compute the output. Here's how to type it in your browser's address bar:


https://onlineutf8tools.com/convert-html-entities-to-utf8?input=%26lt%3Bhtml%26gt%3B%0A%20%20%26lt%3Bhead%26gt%3B%0A%20%20%20%20%26lt%3Bmeta%20charset%3D%26quot%3Butf-8%26quot%3B%26gt%3B%0A%20%20%20%20%26lt%3Btitle%26gt%3BMy%20First%20Web%20Page%26lt%3B%2Ftitle%26gt%3B%0A%20%20%26lt%3B%2Fhead%26gt%3B%0A%20%20%26lt%3Bbody%26gt%3B%0A%20%20%20%20%26lt%3Bp%26gt%3BI%20did%20it%21%20%26%23127881%3B%26lt%3B%2Fp%26gt%3B%0A%20%20%26lt%3B%2Fbody%26gt%3B%0A%26lt%3B%2Fhtml%26gt%3B
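A link like the one above can also be built programmatically. The sketch below assumes the base URL shown in the example; buildToolUrl is a hypothetical helper name. Note that encodeURIComponent leaves `!` unescaped, whereas the example link writes it as `%21`; both forms decode to the same input.

```javascript
// Base URL taken from the example link; buildToolUrl is a hypothetical helper.
const TOOL_URL = "https://onlineutf8tools.com/convert-html-entities-to-utf8";

function buildToolUrl(escapedHtml) {
  // encodeURIComponent percent-encodes &, #, ;, / and newlines so the
  // HTML entities survive inside the query string.
  return TOOL_URL + "?input=" + encodeURIComponent(escapedHtml);
}

console.log(buildToolUrl("&lt;p&gt;I did it! &#127881;&lt;/p&gt;"));
```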


All conversions and calculations are done in your browser using JavaScript. We don’t send a single bit about your input data to our servers. There is no server-side processing at all. We use Google Analytics and StatCounter for site usage analytics. Your IP address is saved on our web server, but it’s not associated with any personally identifiable information. We don’t use cookies and don’t store session information in cookies. We use your browser’s local storage to save tools’ input. It stays on your computer.


Decode UTF-8 with JavaScript

I have Javascript in an XHTML web page that is passing UTF-8 encoded strings. It needs to continue to pass the UTF-8 version, as well as decode it. How is it possible to decode a UTF-8 string for display?

  

Solution 1: [1]

To answer the original question: here is how you decode utf-8 in javascript:

function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

We have been using this in our production code for 6 years, and it has worked flawlessly.

Note, however, that escape() and unescape() are deprecated.
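For code that wants the same behaviour without the deprecated functions, one possible sketch uses TextEncoder and TextDecoder. The function names below are hypothetical. As with encode_utf8/decode_utf8, the encoded form is a "binary string" in which every character code is one UTF-8 byte.

```javascript
// Hypothetical replacement for encode_utf8: returns a string whose
// char codes are the UTF-8 bytes of the input.
function encodeUtf8BinaryString(s) {
  const bytes = new TextEncoder().encode(s);
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Hypothetical replacement for decode_utf8: treats each char code as a byte.
function decodeUtf8BinaryString(s) {
  const bytes = Uint8Array.from(s, (ch) => ch.charCodeAt(0));
  // fatal: true throws on malformed input; ignoreBOM: true keeps a leading BOM.
  return new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(bytes);
}

console.log(decodeUtf8BinaryString(encodeUtf8BinaryString("Größe")));
```

With fatal set to true, malformed input raises a TypeError, which is roughly analogous to the URIError that decodeURIComponent throws in the original version.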

Solution 2: [2]

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo [email protected]
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;
  while (i < len) {
    c = array[i++];
    switch (c >> 4) {
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx 10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx 10xx xxxx 10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) | ((char2 & 0x3F) << 6) | (char3 & 0x3F));
        break;
    }
  }
  return out;
}


Solution 3: [3]

Perhaps using TextDecoder will be sufficient.

Not supported in IE though.

var decoder = new TextDecoder('utf-8'), decodedMessage;
decodedMessage = decoder.decode(message.data);

Handling non-UTF8 text

In this example, we decode the Russian text "Привет, мир!", which means "Hello, world!". In our TextDecoder() constructor, we specify the Windows-1251 character encoding, which is appropriate for Cyrillic script.

let win1251decoder = new TextDecoder('windows-1251');
let bytes = new Uint8Array([207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33]);
console.log(win1251decoder.decode(bytes)); // Привет, мир!

The TextDecoder interface is documented on MDN.

Retrieving a byte array from a string is equally simple:

const decoder = new TextDecoder();
const encoder = new TextEncoder();
const byteArray = encoder.encode('Größe'); // converted it to a byte array
// now we can decode it back to a string if desired
console.log(decoder.decode(byteArray));

If the bytes are in a different encoding, pass that encoding's label to the TextDecoder constructor. TextEncoder itself always produces UTF-8, and its constructor takes no encoding parameter.

Solution 4: [4]

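An update to @Albert's answer that adds a case for emoji (4-byte sequences).

function Utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3, char4;

  out = "";
  len = array.length;
  i = 0;
  while (i < len) {
    c = array[i++];
    switch (c >> 4) {
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx 10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx 10xx xxxx 10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) | ((char2 & 0x3F) << 6) | (char3 & 0x3F));
        break;
      case 15:
        // 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        char4 = array[i++];
        out += String.fromCodePoint(((c & 0x07) << 18) | ((char2 & 0x3F) << 12) | ((char3 & 0x3F) << 6) | (char4 & 0x3F));
        break;
    }
  }
  return out;
}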
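To see why emoji need their own case, it may help to work through one 4-byte sequence by hand. The sketch below uses a hypothetical helper, codePointFromFourBytes, on the bytes F0 9F 8E 89 (the UTF-8 form of U+1F389). It also shows why String.fromCharCode is not enough for such values: it keeps only the low 16 bits.

```javascript
// Combine a 4-byte UTF-8 sequence (1111 0xxx, then three 10xx xxxx bytes)
// into a single code point. Hypothetical helper for illustration.
function codePointFromFourBytes(b0, b1, b2, b3) {
  return ((b0 & 0x07) << 18) | ((b1 & 0x3f) << 12) | ((b2 & 0x3f) << 6) | (b3 & 0x3f);
}

const cp = codePointFromFourBytes(0xf0, 0x9f, 0x8e, 0x89);
console.log(cp.toString(16)); // 1f389

// fromCodePoint emits a surrogate pair (two UTF-16 code units).
console.log(String.fromCodePoint(cp).length); // 2

// fromCharCode truncates to 16 bits, giving U+F389 instead.
console.log(String.fromCharCode(cp) === "\uf389"); // true
```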

Solution 5: [5]

Here is a solution that handles all Unicode code points, including the upper (4-byte) values, and is supported by all modern browsers (IE and others > 5.5). It uses decodeURIComponent(), but NOT the deprecated escape/unescape functions:

function utf8_to_str(a) {
  for (var i = 0, s = ''; i < a.length; i++) {
    var h = a[i].toString(16)
    if (h.length < 2) h = '0' + h
    s += '%' + h
  }
  return decodeURIComponent(s)
}

To create UTF-8 from a string:

function utf8_from_str(s) {
  for (var i = 0, enc = encodeURIComponent(s), a = []; i < enc.length;) {
    if (enc[i] === '%') {
      a.push(parseInt(enc.substr(i + 1, 2), 16))
      i += 3
    } else {
      a.push(enc.charCodeAt(i++))
    }
  }
  return a
}

Solution 6: [6]

@albert's solution was the closest, I think, but it can only parse UTF-8 characters of up to 3 bytes.

function utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;

  // XXX: Invalid bytes are ignored
  while (i < len) {
    c = array[i++];
    if (c >> 7 == 0) {
      // 0xxx xxxx
      out += String.fromCharCode(c);
      continue;
    }

    // Invalid starting byte
    if (c >> 6 == 0x02) {
      continue;
    }

    // #### MULTIBYTE ####
    // How many bytes left for this character?
    var extraLength = null;
    if (c >> 5 == 0x06) {
      extraLength = 1;
    } else if (c >> 4 == 0x0e) {
      extraLength = 2;
    } else if (c >> 3 == 0x1e) {
      extraLength = 3;
    } else if (c >> 2 == 0x3e) {
      extraLength = 4;
    } else if (c >> 1 == 0x7e) {
      extraLength = 5;
    } else {
      continue;
    }

    // Do we have enough bytes in our data?
    if (i + extraLength > len) {
      var leftovers = array.slice(i - 1);

      // If there is an invalid byte in the leftovers we might want to
      // continue from there.
      for (; i < len; i++) if (array[i] >> 6 != 0x02) break;
      if (i != len) continue;

      // All leftover bytes are valid.
      return {result: out, leftovers: leftovers};
    }

    // Remove the UTF-8 prefix from the char (res)
    var mask = (1 << (8 - extraLength - 1)) - 1,
        res = c & mask, nextChar, count;

    for (count = 0; count < extraLength; count++) {
      nextChar = array[i++];

      // Is the char a valid multibyte part?
      if (nextChar >> 6 != 0x02) { break; }
      res = (res << 6) | (nextChar & 0x3f);
    }

    if (count != extraLength) {
      i--;
      continue;
    }

    if (res <= 0xffff) {
      out += String.fromCharCode(res);
      continue;
    }

    res -= 0x10000;
    var high = ((res >> 10) & 0x3ff) + 0xd800,
        low = (res & 0x3ff) + 0xdc00;
    out += String.fromCharCode(high, low);
  }

  return {result: out, leftovers: []};
}

EDIT: fixed the issue that @unhammer found.

Solution 7: [7]

// String to Utf8 ByteBuffer

function strToUTF8(str) {
  return Uint8Array.from(
    encodeURIComponent(str).replace(/%(..)/g, (m, v) => String.fromCodePoint(parseInt(v, 16))),
    c => c.codePointAt(0)
  )
}

Solution 8: [8]

This is what I found after a more specific Google search than just "UTF-8 encode/decode". For those looking for a library that converts between encodings, here you go.

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Pasted from the repo README:

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)

Solution 9: [9]

Using my 1.6KB library, you can do

ToString(FromUTF8(Array.from(usernameReceived))) 

Solution 10: [10]

You can use decodeURI for this.

decodeURI('https://developer.mozilla.org/ru/docs/JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B');
// "https://developer.mozilla.org/ru/docs/JavaScript_шеллы"

Consider wrapping it in a try/catch block so that a URIError does not go unhandled.

It also has full browser support.
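A minimal sketch of that advice follows. safeDecodeURI is a hypothetical helper name, not a built-in, and returning a fallback value on failure is just one possible policy.

```javascript
// Hypothetical wrapper: returns `fallback` when the input contains
// malformed percent-encoding instead of letting URIError propagate.
function safeDecodeURI(uri, fallback = null) {
  try {
    return decodeURI(uri);
  } catch (err) {
    if (err instanceof URIError) return fallback;
    throw err; // anything else is unexpected, so rethrow
  }
}

console.log(safeDecodeURI("JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B")); // JavaScript_шеллы
console.log(safeDecodeURI("%E0%A4%A")); // null (truncated escape sequence)
```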

Solution 11: [11]

I reckon the easiest way would be to use the built-in JavaScript functions decodeURI() / encodeURI().

Solution 12: [12]

This is a solution with extensive error reporting.

It takes a UTF-8 encoded byte array (where the byte array is represented as an array of numbers, each an integer between 0 and 255 inclusive) and produces a JavaScript string of Unicode characters.

// toBinary() is used in the error messages but is missing from this excerpt;
// this stand-in (an assumption) zero-pads the value to at least 8 binary digits.
function toBinary(byteValue) {
  return byteValue.toString(2).padStart(8, "0");
}

function getNextByte(value, startByteIndex, startBitsStr, additional, index) {
  if (index >= value.length) {
    var startByte = value[startByteIndex];
    throw new Error("Invalid UTF-8 sequence. Byte " + startByteIndex + " with value " + startByte + " (" + String.fromCharCode(startByte) + "; binary: " + toBinary(startByte) + ") starts with " + startBitsStr + " in binary and thus requires " + additional + " bytes after it, but we only have " + (value.length - startByteIndex) + ".");
  }
  var byteValue = value[index];
  checkNextByteFormat(value, startByteIndex, startBitsStr, additional, index);
  return byteValue;
}

function checkNextByteFormat(value, startByteIndex, startBitsStr, additional, index) {
  if ((value[index] & 0xC0) != 0x80) {
    var startByte = value[startByteIndex];
    var wrongByte = value[index];
    throw new Error("Invalid UTF-8 byte sequence. Byte " + startByteIndex + " with value " + startByte + " (" + String.fromCharCode(startByte) + "; binary: " + toBinary(startByte) + ") starts with " + startBitsStr + " in binary and thus requires " + additional + " additional bytes, each of which should start with 10 in binary."
      + " However byte " + (index - startByteIndex) + " after it with value " + wrongByte + " (" + String.fromCharCode(wrongByte) + "; binary: " + toBinary(wrongByte) + ") does not start with 10 in binary.");
  }
}

// Despite its name, this function encodes a string into UTF-8 bytes.
function fromUtf8 (str) {
  var value = [];
  var destIndex = 0;
  for (var index = 0; index < str.length; index++) {
    // codePointAt (rather than charCodeAt) reads a surrogate pair as one code point
    var code = str.codePointAt(index);
    if (code > 0xFFFF) index++; // skip the low surrogate
    if (code <= 0x7F) {
      value[destIndex++] = code;
    } else if (code <= 0x7FF) {
      value[destIndex++] = ((code >> 6 ) & 0x1F) | 0xC0;
      value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
    } else if (code <= 0xFFFF) {
      value[destIndex++] = ((code >> 12) & 0x0F) | 0xE0;
      value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
    } else if (code <= 0x1FFFFF) {
      value[destIndex++] = ((code >> 18) & 0x07) | 0xF0;
      value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
    } else if (code <= 0x03FFFFFF) {
      // 5-byte lead is 1111 10xx (0xF8); unreachable for JavaScript strings
      value[destIndex++] = ((code >> 24) & 0x03) | 0xF8;
      value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
    } else if (code <= 0x7FFFFFFF) {
      value[destIndex++] = ((code >> 30) & 0x01) | 0xFC;
      value[destIndex++] = ((code >> 24) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
      value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
    } else {
      throw new Error("Unsupported Unicode character \"" + str.charAt(index) + "\" with code " + code + " (binary: " + toBinary(code) + ") at index " + index + ". Cannot represent it as UTF-8 byte sequence.");
    }
  }
  return value;
}

Solution 13: [13]

const decoder = new TextDecoder();
console.log(decoder.decode(new Uint8Array([97]))); // a


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.


TextDecoder

The TextDecoder interface represents a decoder for a specific text encoding, such as UTF-8 , ISO-8859-2 , KOI8-R , GBK , etc. A decoder takes a stream of bytes as input and emits a stream of code points.

Note: This feature is available in Web Workers.

Constructor

TextDecoder(): Returns a newly constructed TextDecoder that will generate a code point stream with the decoding method specified in parameters.

Instance properties

The TextDecoder interface doesn’t inherit any properties.

TextDecoder.encoding (read-only): A string containing the name of the decoder, which describes the method the TextDecoder will use.

TextDecoder.fatal (read-only): A Boolean indicating whether the error mode is fatal.

TextDecoder.ignoreBOM (read-only): A Boolean indicating whether the byte order mark is ignored.
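The three properties (encoding, fatal and ignoreBOM on a TextDecoder instance) reflect what was passed to the constructor. The sketch below shows the practical difference the error mode makes; the byte values are arbitrary and chosen only because 0xFF can never appear in valid UTF-8.

```javascript
const lenient = new TextDecoder(); // fatal defaults to false
const strict = new TextDecoder("utf-8", { fatal: true });

console.log(lenient.encoding, lenient.fatal, lenient.ignoreBOM); // utf-8 false false

const invalid = new Uint8Array([0x61, 0xff, 0x62]); // "a", a malformed byte, "b"

// Non-fatal mode substitutes U+FFFD (the replacement character).
console.log(lenient.decode(invalid) === "a\ufffdb"); // true

// Fatal mode throws a TypeError instead.
let threw = false;
try {
  strict.decode(invalid);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```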

Instance methods

The TextDecoder interface doesn’t inherit any methods.

TextDecoder.decode(): Returns a string containing the text decoded with the method of the specific TextDecoder object.
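When bytes arrive in chunks, a multi-byte character can be split across two chunks. The decode() method accepts an options object with a stream flag for this case. The sketch below splits the three bytes of "€" (E2 82 AC) across two arrays; the chunk contents are made up for illustration.

```javascript
const decoder = new TextDecoder();

// "€" is E2 82 AC in UTF-8; the split falls in the middle of it.
const chunk1 = new Uint8Array([0x61, 0xe2, 0x82]);
const chunk2 = new Uint8Array([0xac, 0x62]);

let text = "";
text += decoder.decode(chunk1, { stream: true }); // "a"; the partial bytes are buffered
text += decoder.decode(chunk2, { stream: true }); // "€b"
text += decoder.decode();                         // flush; nothing is pending here
console.log(text); // a€b
```

Without the stream flag, each call would treat its chunk as complete input and the split character would come out as replacement characters.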

Examples

Representing text with typed arrays

This example shows how to decode a Chinese/Japanese character, 𠮷, as represented by five different typed arrays: Uint8Array, Int8Array, Uint16Array, Int16Array, and Int32Array.

let utf8decoder = new TextDecoder(); // default 'utf-8' or 'utf8'

let u8arr = new Uint8Array([240, 160, 174, 183]);
let i8arr = new Int8Array([-16, -96, -82, -73]);
let u16arr = new Uint16Array([41200, 47022]);
let i16arr = new Int16Array([-24336, -18514]);
let i32arr = new Int32Array([-1213292304]);

console.log(utf8decoder.decode(u8arr));
console.log(utf8decoder.decode(i8arr));
console.log(utf8decoder.decode(u16arr));
console.log(utf8decoder.decode(i16arr));
console.log(utf8decoder.decode(i32arr));

Handling non-UTF8 text

In this example, we decode the Russian text «Привет, мир!», which means «Hello, world.» In our TextDecoder() constructor, we specify the Windows-1251 character encoding, which is appropriate for Cyrillic script.

const win1251decoder = new TextDecoder("windows-1251");
const bytes = new Uint8Array([
  207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33,
]);
console.log(win1251decoder.decode(bytes)); // Привет, мир!


See also

  • The TextEncoder interface describing the inverse operation.
  • A shim allowing to use this interface in browsers that do not support it.
  • Node.js supports global export from v11.0.0


This page was last modified on Feb 19, 2023 by MDN contributors.
