Format utf 8 javascript

utf8.js is a well-tested UTF-8 encoder/decoder written in JavaScript. Unlike many other JavaScript solutions, it is designed to be a proper UTF-8 encoder/decoder: it can encode/decode any scalar Unicode code point values, as per the Encoding Standard. Here’s an online demo.

Feel free to fork if you see possible improvements!

utf8.encode(string) encodes any given JavaScript string (string) as UTF-8 and returns the UTF-8-encoded version of the string. It throws an error if the input string contains a non-scalar value, i.e. a lone surrogate. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)

// U+00A9 COPYRIGHT SIGN; see http://codepoints.net/U+00A9
utf8.encode('\xA9');
// → '\xC2\xA9'
// U+10001 LINEAR B SYLLABLE B038 E; see http://codepoints.net/U+10001
utf8.encode('\uD800\uDC01');
// → '\xF0\x90\x80\x81'
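The return value shown above is a byte string: a JavaScript string in which each character stands for one UTF-8 byte. As a rough illustration of that representation (not utf8.js's own implementation), the sketch below builds the same kind of byte string with the built-in TextEncoder. The helper name toByteString is made up for this example, and unlike utf8.encode it does not throw on lone surrogates, because TextEncoder replaces them with U+FFFD.

```javascript
// Hypothetical helper: build a "byte string" (one character per UTF-8 byte)
// with the built-in TextEncoder. This is not how utf8.js is implemented.
function toByteString(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    out += String.fromCharCode(byte);
  }
  return out;
}

console.log(toByteString("\xA9") === "\xC2\xA9"); // true
console.log(toByteString("\uD800\uDC01") === "\xF0\x90\x80\x81"); // true
```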

utf8.decode(byteString) decodes any given UTF-8-encoded string (byteString) and returns the UTF-8-decoded version of the string. It throws an error when malformed UTF-8 is detected. (If you need to be able to decode encoded non-scalar values as well, use WTF-8 instead.)

utf8.decode('\xC2\xA9');
// → '\xA9'
utf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01'
// → U+10001 LINEAR B SYLLABLE B038 E

utf8.version is a string representing the semantic version number.


utf8.js has been tested in at least Chrome 27-39, Firefox 3-34, Safari 4-8, Opera 10-28, IE 6-11, Node.js v0.10.0, Narwhal 0.3.2, RingoJS 0.8-0.11, PhantomJS 1.9.0, and Rhino 1.7RC4.

Unit tests & code coverage

After cloning this repository, run npm install to install the dependencies needed for development and testing. You may want to install Istanbul globally using npm install istanbul -g.

Once that’s done, you can run the unit tests in Node using npm test or node tests/tests.js. To run the tests in Rhino, Ringo, Narwhal, PhantomJS, and web browsers as well, use grunt test.

To generate the code coverage report, use grunt cover.

Why is the first release named v2.0.0? Haven’t you heard of semantic versioning?

Long before utf8.js was created, the utf8 module on npm was registered and used by another (slightly buggy) library. @ryanmcgrath was kind enough to give me access to the utf8 package on npm when I told him about utf8.js. Since there had already been a v1.0.0 release of the old library, and to avoid breaking backwards compatibility with projects that rely on the utf8 npm package, I decided to tag the first release of utf8.js as v2.0.0 and take it from there.

utf8.js is available under the MIT license.


The Complete Guide to Encoding JavaScript Strings to UTF-8 Format

Learn how to encode JavaScript strings into UTF-8 format with this comprehensive guide. Percent-encode strings as UTF-8 with encodeURI() or encodeURIComponent(), get the raw UTF-8 bytes with Buffer.from() or TextEncoder, and decode percent-encoded UTF-8 strings with decodeURI().


If you’re working with JavaScript strings, you may have noticed that they are stored as sequences of UTF-16 code units (historically described as UCS-2). Each code unit is 2 bytes, and characters outside the Basic Multilingual Plane, such as most emoji, take two code units (a surrogate pair). However, there may be situations where you need to encode a JavaScript string into UTF-8 format. In this blog post, I will provide a comprehensive guide on how to encode JavaScript strings into UTF-8 format, including important and helpful points to keep in mind.

Understanding JavaScript String Encoding

JavaScript strings use UTF-16 internally, but files, network protocols, and many APIs expect UTF-8. UTF-8 is a variable-length encoding that uses 1 to 4 bytes to represent each code point. Encoding a JavaScript string as UTF-8 therefore means converting its UTF-16 code units into the corresponding sequence of UTF-8 bytes.
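To make the difference between the two encodings concrete, here is a small sketch that compares the number of UTF-16 code units in a string (its length) with the number of bytes in its UTF-8 form. The helper name utf8ByteLength is only illustrative.

```javascript
// Compare UTF-16 code units (string.length) with UTF-8 byte counts.
const encoder = new TextEncoder();

// Illustrative helper: number of bytes the string occupies in UTF-8.
function utf8ByteLength(str) {
  return encoder.encode(str).length;
}

// U+0041, U+00A9, U+20AC, U+1F600
for (const ch of ["A", "\u00A9", "\u20AC", "\u{1F600}"]) {
  console.log(ch, "code units:", ch.length, "UTF-8 bytes:", utf8ByteLength(ch));
}
// U+0041  -> 1 code unit,  1 byte
// U+00A9  -> 1 code unit,  2 bytes
// U+20AC  -> 1 code unit,  3 bytes
// U+1F600 -> 2 code units, 4 bytes
```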

You can achieve this using several built-in functions, including encodeURI(), encodeURIComponent(), or (in Node.js) Buffer.from(). The encodeURI() function encodes a complete URI by replacing each character outside its allowed set with the percent-escaped bytes of its UTF-8 encoding. The encodeURIComponent() function is similar to encodeURI(), but it also escapes characters that have a reserved meaning in URIs, such as & or =. The Buffer.from() function creates a new Buffer from an input string; it uses UTF-8 by default, so the resulting buffer already holds the UTF-8 bytes.
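The following sketch shows the three functions side by side on a made-up sample string. It assumes a Node.js environment, since Buffer is not available in browsers without a polyfill.

```javascript
// Sample text: "café & crème=1" (written with escapes to keep the source ASCII).
const text = "caf\u00E9 & cr\u00E8me=1";

// encodeURI() keeps URI syntax characters such as & and =.
console.log(encodeURI(text));          // "caf%C3%A9%20&%20cr%C3%A8me=1"

// encodeURIComponent() escapes them as well.
console.log(encodeURIComponent(text)); // "caf%C3%A9%20%26%20cr%C3%A8me%3D1"

// Node.js only: Buffer.from() uses UTF-8 when given a string.
const buf = Buffer.from("caf\u00E9", "utf8");
console.log(buf.toString("hex"));      // "636166c3a9"
console.log(buf.length);               // 5 (bytes), while the string has 4 code units
```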

Decoding UTF-8 Encoded JavaScript Strings

To decode a percent-encoded UTF-8 string, you can use the decodeURI() function, which decodes a complete URI. It assumes the escaped bytes are UTF-8: if they are not valid UTF-8 (for example, because the string was escaped from Latin-1 bytes), decodeURI() throws a URIError rather than returning text.
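As a short illustration, the sketch below decodes a made-up percent-encoded string. It also uses decodeURIComponent(), the counterpart of encodeURIComponent(), to show that decodeURI() leaves escapes of reserved characters in place.

```javascript
const encoded = "caf%C3%A9%20%26%20menu";

// decodeURI() keeps escapes of reserved characters such as %26 (&).
console.log(decodeURI(encoded));          // "café %26 menu"

// decodeURIComponent() decodes every escape.
console.log(decodeURIComponent(encoded)); // "café & menu"

// Escapes that do not form valid UTF-8 throw a URIError.
let threw = false;
try {
  decodeURI("%E9"); // 0xE9 alone is Latin-1 "é", not valid UTF-8
} catch (err) {
  threw = err instanceof URIError;
}
console.log(threw); // true
```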

Conversion between UTF-8 ArrayBuffer and String

Conversion between UTF-8 ArrayBuffer and String can be achieved using specific methods or libraries. The TextEncoder and TextDecoder APIs can be used to encode and decode strings into UTF-8 format, respectively.

Both APIs work with typed arrays rather than with strings of bytes. For example, you can use the TextEncoder.encode() method to encode a string into a Uint8Array, which can then be converted back to a string using the TextDecoder.decode() method. decode() also accepts an ArrayBuffer directly.
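A minimal round trip might look like the sketch below. The sample text is arbitrary, and the slice() call is just a cautious way to obtain an ArrayBuffer that covers exactly the encoded bytes.

```javascript
const encoder = new TextEncoder();        // TextEncoder always produces UTF-8
const decoder = new TextDecoder("utf-8");

const original = "h\u00E9llo \u20AC";     // "héllo €"
const bytes = encoder.encode(original);
console.log(bytes instanceof Uint8Array); // true
console.log(bytes.length);                // 10

// Uint8Array -> string
const roundTripped = decoder.decode(bytes);
console.log(roundTripped === original);   // true

// ArrayBuffer -> string
const arrayBuffer = bytes.buffer.slice(
  bytes.byteOffset,
  bytes.byteOffset + bytes.byteLength,
);
console.log(decoder.decode(arrayBuffer) === original); // true
```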

Marshaling JavaScript Strings into Uint8Array

JavaScript strings can be marshaled into a Uint8Array to achieve UTF-8 encoding. This can be done using the TextEncoder.encode() method, which converts a string into a new Uint8Array. The method does not throw on malformed input: a lone surrogate in the string is encoded as the replacement character U+FFFD.

If you want to write into an existing buffer instead of allocating a new one, you can use the TextEncoder.encodeInto() method. It encodes a string into a Uint8Array that you supply and returns an object with read (the number of UTF-16 code units consumed) and written (the number of bytes written), so you can tell whether the whole string fit. It does not return an error either; it simply stops when the destination is full.
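The sketch below shows both behaviors with arbitrary sample input: encodeInto() filling a deliberately small destination buffer, and encode() handling a lone surrogate.

```javascript
const encoder = new TextEncoder();
const target = new Uint8Array(8); // caller-provided destination buffer

// U+20AC takes 3 bytes in UTF-8, so only two of the three characters fit.
const { read, written } = encoder.encodeInto("\u20AC\u20AC\u20AC", target);
console.log(read);    // 2 (UTF-16 code units consumed)
console.log(written); // 6 (bytes written)

// Lone surrogates do not throw; they are encoded as U+FFFD (EF BF BD).
const lone = Array.from(encoder.encode("\uD800"));
console.log(lone);    // [ 239, 191, 189 ]
```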

Other Methods for Encoding JavaScript Strings in UTF-8 Format

In addition to the methods mentioned above, there are other ways to encode JavaScript strings in UTF-8 format. One older idiom is unescape(encodeURIComponent(str)), which percent-encodes the string as UTF-8 and then turns each escape into a single character, producing a "byte string" in which every character code is one UTF-8 byte. Note that unescape() is deprecated. Another method is to write the UTF-8 bytes out as hexadecimal text and parse the hex back into bytes when decoding.
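One possible reading of the hex approach is sketched below. The helper names utf8ToHex and hexToUtf8 are made up for this example, and hexToUtf8 assumes its input is valid, even-length hexadecimal text.

```javascript
// Hypothetical helper: string -> hex text of its UTF-8 bytes.
function utf8ToHex(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical helper: hex text of UTF-8 bytes -> string.
// Assumes valid, even-length hexadecimal input.
function hexToUtf8(hex) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return new TextDecoder().decode(bytes);
}

const hex = utf8ToHex("\u00A9 2024");
console.log(hex);                              // "c2a92032303234"
console.log(hexToUtf8(hex) === "\u00A9 2024"); // true
```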

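JavaScript strings can also be encoded in other formats, such as Base64. The btoa() function can be used to encode a string in Base64 format, while the atob() function can be used to decode a Base64-encoded string. Both operate on byte strings: btoa() throws if the input contains a character above U+00FF, so text must be converted to UTF-8 bytes first.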
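A common way to combine Base64 with UTF-8 is sketched below. The helper names are hypothetical, and the code assumes an environment where btoa(), atob(), TextEncoder, and TextDecoder are globals (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: Base64 of a string's UTF-8 bytes.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one character per byte
  }
  return btoa(binary);
}

// Hypothetical helper: reverse of utf8ToBase64().
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const b64 = utf8ToBase64("\u20AC"); // U+20AC is the bytes E2 82 AC in UTF-8
console.log(b64);                   // "4oKs"
console.log(base64ToUtf8(b64) === "\u20AC"); // true

// Passing the text to btoa() directly fails, because U+20AC is above U+00FF.
let btoaThrew = false;
try {
  btoa("\u20AC");
} catch (err) {
  btoaThrew = true;
}
console.log(btoaThrew); // true
```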

Other useful code examples for encoding JavaScript strings to UTF-8 format

For instance, in Node.js you can UTF-8 encode a string with the utf8 package from npm:

// Install with: npm install utf8
const utf8 = require('utf8');
utf8.encode(string);

Conclusion

In conclusion, encoding JavaScript strings into UTF-8 format can be achieved using various methods, including encodeURI(), encodeURIComponent(), Buffer.from(), and TextEncoder. Decoding a percent-encoded UTF-8 string can be done using the decodeURI() function, which throws a URIError if the escaped bytes are not valid UTF-8. Conversion between a UTF-8 ArrayBuffer and a string can be achieved with TextEncoder and TextDecoder. Marshaling JavaScript strings into a Uint8Array gives you the UTF-8 bytes directly; keep in mind that lone surrogates are replaced with U+FFFD rather than reported as errors. Other methods include the deprecated unescape(encodeURIComponent(str)) idiom, writing the bytes as hexadecimal text, and Base64 encoding of the UTF-8 bytes.


encodeURI()

The encodeURI() function encodes a URI by replacing each instance of certain characters by one, two, three, or four escape sequences representing the UTF-8 encoding of the character (will only be four escape sequences for characters composed of two surrogate characters). Compared to encodeURIComponent(), this function encodes fewer characters, preserving those that are part of the URI syntax.


Syntax

encodeURI(uri)

Parameters

uri: A string to be encoded as a URI.

Return value

A new string representing the provided string encoded as a URI.

Exceptions

URIError: Thrown if uri contains a lone surrogate.

Description

encodeURI() is a function property of the global object.

The encodeURI() function escapes characters by UTF-8 code units, with each octet encoded in the format %XX, left-padded with 0 if necessary. Because lone surrogates in UTF-16 do not encode any valid Unicode character, they cause encodeURI() to throw a URIError.
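To illustrate the %XX format, the sketch below percent-encodes every UTF-8 byte of a string by hand and compares the result with encodeURI() for characters that encodeURI() always escapes. The helper percentEncodeAllBytes is hypothetical; unlike encodeURI(), it escapes every byte, including ASCII letters.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string as %XX.
function percentEncodeAllBytes(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"),
  ).join("");
}

console.log(percentEncodeAllBytes("\u20AC")); // "%E2%82%AC"
console.log(encodeURI("\u20AC"));             // "%E2%82%AC"

// A byte below 0x10 is left-padded with 0.
console.log(percentEncodeAllBytes("\n"));     // "%0A"
console.log(encodeURI("\n"));                 // "%0A"
```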

encodeURI() escapes all characters except:

A-Z a-z 0-9 - _ . ! ~ * ' ( )
; / ? : @ & = + $ , #

The characters on the second line are characters that may be part of the URI syntax, and are only escaped by encodeURIComponent(). Neither encodeURI() nor encodeURIComponent() encodes the characters -.!~*'(), known as "unreserved marks", which do not have a reserved purpose but are allowed in a URI "as is". (See RFC2396)

The encodeURI() function does not encode characters that have special meaning (reserved characters) for a URI. The following example shows all the parts that a URI can possibly contain. Note how certain characters are used to signify special meaning:

http://username:password@www.example.com:80/path/to/file.php?foo=316&bar=this+has+spaces#anchor
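As a quick check of this behavior, the sketch below passes the example URI through both functions: encodeURI() returns it unchanged, while encodeURIComponent() escapes every delimiter.

```javascript
const uri =
  "http://username:password@www.example.com:80/path/to/file.php?foo=316&bar=this+has+spaces#anchor";

// Every character in this URI is either unreserved or reserved,
// so encodeURI() leaves it untouched.
console.log(encodeURI(uri) === uri); // true

// encodeURIComponent() escapes the reserved delimiters as well.
const component = encodeURIComponent(uri);
console.log(component);
// "http%3A%2F%2Fusername%3Apassword%40www.example.com%3A80%2Fpath%2Fto%2Ffile.php%3Ffoo%3D316%26bar%3Dthis%2Bhas%2Bspaces%23anchor"
```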

Examples

encodeURI() vs. encodeURIComponent()

encodeURI() differs from encodeURIComponent() as follows:

const set1 = ";/?:@&=+$,#"; // Reserved Characters
const set2 = "-.!~*'()"; // Unreserved Marks
const set3 = "ABC abc 123"; // Alphanumeric Characters + Space

console.log(encodeURI(set1)); // ;/?:@&=+$,#
console.log(encodeURI(set2)); // -.!~*'()
console.log(encodeURI(set3)); // ABC%20abc%20123 (the space gets encoded as %20)

console.log(encodeURIComponent(set1)); // %3B%2F%3F%3A%40%26%3D%2B%24%2C%23
console.log(encodeURIComponent(set2)); // -.!~*'()
console.log(encodeURIComponent(set3)); // ABC%20abc%20123 (the space gets encoded as %20)

Note that encodeURI() by itself cannot form proper HTTP GET and POST requests, such as for XMLHttpRequest, because it does not encode &, +, and =, which are treated as special characters in GET and POST requests. encodeURIComponent(), however, does encode these characters.
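For instance, a query string can be assembled by applying encodeURIComponent() to each key and value separately. The helper buildQuery and the sample parameters below are made up for illustration.

```javascript
// Hypothetical helper: build a query string from key/value pairs,
// escaping each key and value with encodeURIComponent().
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

const query = buildQuery({ q: "a&b=c", lang: "fr+en" });
console.log(query); // "q=a%26b%3Dc&lang=fr%2Ben"

// With encodeURI() the special characters would leak into the query.
console.log(encodeURI("a&b=c")); // "a&b=c"
```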

Encoding a lone high surrogate throws

A URIError will be thrown if one attempts to encode a surrogate which is not part of a high-low pair. For example:

// High-low pair OK
encodeURI("\uD800\uDFFF"); // "%F0%90%8F%BF"

// Lone high surrogate throws "URIError: malformed URI sequence"
encodeURI("\uD800");

// Lone low surrogate throws "URIError: malformed URI sequence"
encodeURI("\uDFFF");

You can use String.prototype.toWellFormed(), which replaces lone surrogates with the Unicode replacement character (U+FFFD), to avoid this error. You can also use String.prototype.isWellFormed() to check if a string contains lone surrogates before passing it to encodeURI().
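A defensive wrapper along these lines might look like the sketch below. The name safeEncodeURI is hypothetical, and the regular-expression fallback is an assumption added for runtimes that do not yet provide toWellFormed().

```javascript
// Hypothetical helper: encodeURI() that never throws on lone surrogates.
function safeEncodeURI(str) {
  const wellFormed =
    typeof str.toWellFormed === "function"
      ? str.toWellFormed()
      : // Fallback for older runtimes: replace unpaired surrogates with U+FFFD.
        str.replace(
          /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
          "\uFFFD",
        );
  return encodeURI(wellFormed);
}

console.log(safeEncodeURI("a\uD800b"));     // "a%EF%BF%BDb"
console.log(safeEncodeURI("\uD800\uDFFF")); // "%F0%90%8F%BF" (valid pair is kept)
```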

Encoding for RFC3986

The more recent RFC3986 makes square brackets reserved (for IPv6) and thus not encoded when forming something which could be part of a URL (such as a host). It also reserves !, ', (, ), and *, even though these characters have no formalized URI delimiting uses. The following function encodes a string for RFC3986-compliant URL format.

function encodeRFC3986URI(str) {
  return encodeURI(str)
    .replace(/%5B/g, "[")
    .replace(/%5D/g, "]")
    .replace(
      /[!'()*]/g,
      (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`,
    );
}

Portions of this content are ©1998–2023 by individual mozilla.org contributors. Content available under a Creative Commons license.
