Content encoding html meta

Contents

<meta>: The metadata element
Attributes
Examples
Content Encoding: why and how to use the meta charset tag and the Content-Type header
How to choose the right character set?
How to advertise your character encoding… and the best way to do it.

<meta>: The metadata element

The <meta> HTML element represents metadata that cannot be represented by other HTML meta-related elements, like <base>, <link>, <script>, <style> or <title>.

Permitted parents:
<meta charset>, <meta http-equiv>: a <head> element. If the http-equiv is not an encoding declaration, it can also be inside a <noscript> element, itself inside a <head> element.
<meta name>: any element that accepts metadata content.
<meta itemprop>: any element that accepts metadata content or flow content.

The type of metadata provided by the element can be one of the following:

If the name attribute is set, the element provides document-level metadata, applying to the whole page.
If the http-equiv attribute is set, the element is a pragma directive, providing information equivalent to what can be given by a similarly-named HTTP header.
If the charset attribute is set, the element is a charset declaration, giving the character encoding in which the document is encoded.
If the itemprop attribute is set, the element provides user-defined metadata.

Attributes

This element includes the global attributes.

Note: the attribute name has a specific meaning for the <meta> element, and the itemprop attribute must not be set on the same <meta> element that has any existing name, http-equiv or charset attributes.
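The four kinds of metadata listed above can be told apart from the attributes alone. The Python sketch below is illustrative only: meta_kind is a hypothetical helper (not a browser or library API) that receives the element's attributes as a dict. It also enforces the HTML specification's rule that exactly one of name, http-equiv, charset and itemprop is specified, which covers the itemprop restriction in the note above.

```python
# Maps each distinguishing attribute to the kind of metadata it provides.
META_KINDS = {
    "name": "document-level metadata",
    "http-equiv": "pragma directive",
    "charset": "charset declaration",
    "itemprop": "user-defined metadata",
}


def meta_kind(attrs: dict) -> str:
    """Classify a <meta> element (hypothetical helper) from its attributes."""
    keys = {a.lower() for a in attrs}  # attribute names are case-insensitive in HTML
    present = [key for key in META_KINDS if key in keys]
    # Exactly one of the four attributes must be present; this also rejects
    # itemprop combined with name, http-equiv or charset.
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {list(META_KINDS)}, got {present}")
    return META_KINDS[present[0]]


print(meta_kind({"charset": "utf-8"}))                           # charset declaration
print(meta_kind({"name": "description", "content": "A page"}))  # document-level metadata
```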

charset: This attribute declares the document’s character encoding. If the attribute is present, its value must be an ASCII case-insensitive match for the string «utf-8», because UTF-8 is the only valid encoding for HTML5 documents. <meta> elements that declare a character encoding must be located entirely within the first 1024 bytes of the document.


content: This attribute contains the value for the http-equiv or name attribute, depending on which is used.

http-equiv: Defines a pragma directive. The attribute is named http-equiv(alent) because all the allowed values are names of particular HTTP headers:

content-security-policy Allows page authors to define a content policy for the current page. Content policies mostly specify allowed server origins and script endpoints which help guard against cross-site scripting attacks.
content-type Declares the MIME type and the document’s character encoding. The content attribute must have the value «text/html; charset=utf-8» if specified. This is equivalent to a <meta> element with the charset attribute specified and carries the same restriction on placement within the document. Note: Can only be used in documents served with a text/html MIME type, not in documents served with an XML MIME type.
default-style Sets the name of the default CSS style sheet set.
x-ua-compatible If specified, the content attribute must have the value «IE=edge» . User agents are required to ignore this pragma.
refresh This instruction specifies:
- The number of seconds until the page should be reloaded, only if the content attribute contains a non-negative integer.
- The number of seconds until the page should redirect to another page, only if the content attribute contains a non-negative integer followed by the string «;url=» and a valid URL.
Pages set with a refresh value run the risk of having a time interval that is too short. People navigating with the aid of assistive technology such as a screen reader may be unable to read through and understand the page’s content before being automatically redirected. The abrupt, unannounced updating of the page content may also be disorienting for people experiencing low vision conditions.
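The two forms of the refresh value can be illustrated with a small parser. This is a simplified, hypothetical sketch: parse_refresh only accepts «N» and «N;url=URL», whereas the HTML specification's own parsing algorithm is more lenient (it tolerates extra whitespace and a comma separator, among other things), which is not modeled here.

```python
import re
from typing import Optional, Tuple

# Accepts "N" or "N;url=URL", where N is a non-negative integer number of seconds.
# Simplified: the HTML specification's own algorithm is more lenient than this.
REFRESH_PATTERN = re.compile(r"(\d+)(?:;url=(.+))?", re.IGNORECASE)


def parse_refresh(content: str) -> Tuple[int, Optional[str]]:
    """Return (seconds, url); url is None when the page simply reloads."""
    match = REFRESH_PATTERN.fullmatch(content.strip())
    if match is None:
        raise ValueError(f"unsupported refresh value: {content!r}")
    # \d+ cannot match a minus sign, so the delay is always non-negative.
    return int(match.group(1)), match.group(2)


print(parse_refresh("3;url=https://www.mozilla.org"))  # (3, 'https://www.mozilla.org')
print(parse_refresh("30"))                             # (30, None)
```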

name: The name and content attributes can be used together to provide document metadata in terms of name-value pairs, with the name attribute giving the metadata name, and the content attribute giving the value.
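To show how name and content form name-value pairs, here is a sketch that collects them with Python's standard html.parser module. MetaNameCollector is a hypothetical helper written for this example; it ignores <meta> elements that lack either attribute, such as a charset declaration.

```python
from html.parser import HTMLParser


class MetaNameCollector(HTMLParser):
    """Hypothetical helper: gather name/content pairs from <meta> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata = {}  # metadata name -> value

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) tuples. Self-closing <meta ... /> tags also end up here.
        if tag != "meta":
            return
        values = dict(attrs)
        if "name" in values and "content" in values:
            self.metadata[values["name"]] = values["content"]


collector = MetaNameCollector()
collector.feed(
    '<head><meta charset="utf-8" />'
    '<meta name="description" content="A short summary" />'
    '<meta name="author" content="Example Author" /></head>'
)
print(collector.metadata)
# {'description': 'A short summary', 'author': 'Example Author'}
```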

See standard metadata names for details about the set of standard metadata names defined in the HTML specification.

Examples
```
<meta charset="utf-8" />

<!-- Redirect page after 3 seconds -->
<meta http-equiv="refresh" content="3;url=https://www.mozilla.org" />
```

This page was last modified on Jul 18, 2023 by MDN contributors.

Portions of this content are ©1998–2023 by individual mozilla.org contributors. Content available under a Creative Commons license.

Source

Content Encoding: why and how to use the meta charset tag and the Content-Type header

Improving the speed at which a web page is displayed often means making the browser’s life as easy as possible. When the browser receives an HTTP response, it actually receives text encoded as bytes, where each byte or sequence of bytes represents a given character. If the browser has no clear information about the encoding used, it will waste time trying to guess it, and may guess wrong in some cases.

Although the Web is intended to be universal, the various human groups that use it have their own particularities. One of them is language, especially in its written form. All textual content is made of characters taken from a repertoire intended for a particular use. Hiragana, for example, is a phonetic script intended for the unambiguous transcription of the Japanese language. To designate each character unambiguously, we must assign a unique identifier to each of them. The whole set of these identifiers is called a character set. Once this correspondence table has been defined, each character needs to be converted into a sequence of bytes so that it can be stored or exchanged between computers. This is called character encoding.

Imagine that I use a character set to write a text and a corresponding encoding to convert it to bytes, which I then send to you. How would you decode it and read the content without knowing which encoding, or character set, I used? You would have to try some of the most common character sets and encodings you know, hoping the result makes sense… What could go wrong?

Replace a semicolon (;) with a greek question mark (;) in your friend’s JavaScript and watch them pull their hair out over the syntax error.

So yeah… not a great idea. For example, the bit sequence 1100 0011 1010 1001 represents the character «é» in the UTF-8 encoding. If you decode this sequence as Latin-1 instead of UTF-8, you will read «Ã©».

In Latin-1, the character «é» is represented by the sequence 1110 1001. When the browser receives bytes from your server, it needs to identify the collection of letters and symbols used to write the text that was converted into these bytes, as well as the encoding used for this conversion, in order to reverse it. If no information of this kind has been transmitted, the browser will look for recognizable patterns in the bytes to determine the encoding itself, and possibly fall back on some common charsets. This takes time and delays further processing of the page. To speed up the display of your pages, you should therefore declare the character encoding in your HTTP response.
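The byte sequences mentioned above are easy to reproduce. This short Python sketch encodes «é» in UTF-8, decodes those bytes as Latin-1 to reproduce the «Ã©» misreading, and then shows the single Latin-1 byte for the same character.

```python
text = "\u00e9"  # é

# UTF-8 uses two bytes for é.
utf8_bytes = text.encode("utf-8")
print(" ".join(f"{b:08b}" for b in utf8_bytes))  # 11000011 10101001

# Decoding those bytes as Latin-1 turns each byte into its own character.
misread = utf8_bytes.decode("latin-1")
print(misread)  # Ã©

# Latin-1 uses a single byte for the same character.
latin1_bytes = text.encode("latin-1")
print(" ".join(f"{b:08b}" for b in latin1_bytes))  # 11101001
```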

How to choose the right character set?

There was a time when hundreds of character encodings coexisted, each too limited to contain enough characters to cover all the languages of the world. Sometimes, no single encoding was adequate even for all the letters of one language. Nowadays, Unicode – a universal character set, defining all the characters necessary to write the majority of languages – has become the standard, no matter what platform, device, application or language you’re targeting. UTF-8 is one of the Unicode encodings, and the one that should be used for Web content, according to the W3C:

Everyone developing content, whether content authors or programmers, should use the UTF-8 character encoding, unless there are very special reasons for using something else. (If you decide to not use UTF-8, you must choose one of the few encodings that are interoperably implemented across all browsers.)
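UTF-8 can represent every Unicode character because it is a variable-length encoding: ASCII characters take one byte, and other code points take up to four. The Python sketch below prints the code point and UTF-8 bytes of one character from a few scripts mentioned in this article: a Latin letter, «é», the hiragana HI (ひ) and the rocket emoji (🚀).

```python
# One character from each of a few scripts: A, é, ひ (hiragana HI), 🚀 (ROCKET).
samples = ["A", "\u00e9", "\u3072", "\U0001F680"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # UTF-8 is variable-length: 1 byte for ASCII, up to 4 bytes per code point.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+3072 -> 3 byte(s): e3 81 b2
# U+1F680 -> 4 byte(s): f0 9f 9a 80
```

Python 3.8 or later is assumed, for the separator argument of bytes.hex().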

Note: if you’re using a database to store your content on the server side, you may be tempted to use the «utf8» charset there too. Beware: on MySQL and MariaDB, it’s an alias for «utf8mb3», a UTF-8 variant limited to the Basic Multilingual Plane (BMP) that stores a maximum of three bytes per code point. Use «utf8mb4» instead, which stores up to four bytes per code point. Otherwise, you won’t be able to store some popular characters, such as 🚀, also known as «U+1F680 ROCKET»!
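Because utf8mb3 only covers the Basic Multilingual Plane (U+0000 to U+FFFF), you can check in advance whether a string would fit. fits_utf8mb3 is a hypothetical helper written for this sketch, not a MySQL or MariaDB API; it simply tests every code point against the BMP limit, which is the same as checking that no character needs four UTF-8 bytes.

```python
def fits_utf8mb3(text: str) -> bool:
    """Hypothetical check: True if every character fits in 3-byte utf8mb3.

    utf8mb3 only covers the Basic Multilingual Plane (U+0000..U+FFFF), i.e.
    characters whose UTF-8 encoding is at most three bytes long.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_utf8mb3("caf\u00e9"))          # True
print(fits_utf8mb3("launch \U0001F680"))  # False: U+1F680 needs 4 bytes
```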

How to advertise your character encoding… and the best way to do it.

Historically, the terms «character encoding», «character map», «character set» and «code page» were synonymous in computer science[…]. But now the terms have related but distinct meanings,[…] Regardless, the terms are still used interchangeably, with character set being nearly ubiquitous.

The HTML specifications also use «character set» or «charset» to designate what is, in reality, an encoding, and we will do the same in the rest of this article. One of the easiest ways to specify a charset in an HTML page is to put a <meta charset="utf-8"> tag in the <head> element.

Declaring a character set this way comes with constraints. One of them is that the element containing the character encoding declaration must be serialized completely within the first 1024 bytes of the document, so that the browser receives the information with the first IP packets sent over the network and can use it to decode the rest of the document. As the charset tag is the only one with this kind of requirement, the most common tip is to place it immediately after the opening <head> tag.
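The 1024-byte rule can be checked on the raw bytes of a page. This is a simplified stand-in for the browser's real pre-scan, not the algorithm from the HTML specification: early_charset is a hypothetical helper that uses a regular expression to find a <meta> tag carrying a charset (either <meta charset> or the content of a Content-Type pragma) and only accepts it if the whole tag ends within the limit.

```python
import re
from typing import Optional

# Simplified stand-in for the HTML specification's pre-scan: finds a <meta> tag
# carrying a charset, either <meta charset="..."> or the content of
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"<meta\s[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_-]+)[^>]*>", re.IGNORECASE
)


def early_charset(document: bytes, limit: int = 1024) -> Optional[str]:
    """Return the declared charset only if its whole <meta> tag fits in `limit` bytes."""
    # Searching only the first `limit` bytes means a tag cut off at the limit won't match.
    match = META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii").lower() if match else None


good = b'<!DOCTYPE html><html><head><meta charset="UTF-8" /><title>Demo</title>'
late = b"<!DOCTYPE html><html><head>" + b"<!-- padding -->" * 80 + b'<meta charset="utf-8">'
print(early_charset(good))  # utf-8
print(early_charset(late))  # None
```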

If you’re afraid of forgetting this, don’t worry: this is one of the checks that Dareboost performs for you in our website quality analysis tool.

However, you may find yourself in a situation where this declaration is not sufficient and the browser ignores it. Why? Because the Content-Type metadata of the page may indicate another character set, and in the event of a conflict, this information, defined in the page’s HTTP headers, has priority. To check what is actually sent with the page, you can use our Timeline / Waterfall feature: unfold the detailed values of your main document to view the HTTP response headers, including the Content-Type header, which carries the encoding metadata.

To change this HTTP header, you may need the help of the person who manages the server, whether it’s your hosting provider or someone in charge within your organization, because the configuration of HTTP headers is specific to the web server in use, and you’ll need the appropriate administrative rights to modify those settings. On Apache 2.2+, setting UTF-8 as the default character set for your text/plain and text/html files involves the AddDefaultCharset directive, for example AddDefaultCharset UTF-8.
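The priority of the Content-Type header over the <meta> declaration can be sketched as follows. header_charset and effective_charset are hypothetical helpers; the charset parameter is read with the standard library's email.message.Message.get_content_charset(), which parses MIME-style headers and returns the value in lower case. The precedence is deliberately simplified: a byte order mark at the start of the document, which would win over both, is not modeled.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Read the charset parameter of a Content-Type header value (lowercased)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None when there is no charset parameter


def effective_charset(content_type: str, meta_charset: Optional[str]) -> Optional[str]:
    # Simplified precedence: the HTTP header wins over the <meta> declaration.
    # A byte order mark, not modeled here, would take precedence over both.
    return header_charset(content_type) or meta_charset


print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))  # iso-8859-1
print(effective_charset("text/html", "utf-8"))                      # utf-8
```

In the first call, the header and the page disagree and the header wins, which is exactly the conflict described above; fixing it means changing the server configuration rather than the HTML.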

Source
