 Web Opens New Window. Results 1 - 10 of about 1,390,000 for Character encodings HTML 

Character encodings in HTML - Wikipedia, the free encyclopedia

  
HTML (Hypertext Markup Language) has been in use since 1991, but HTML ... In addition to native character encodings, characters can also be encoded as character ...
http://en.wikipedia.org/wiki/Character_encodings_in_HTML

HTML Validation: Using Character Encodings

  
How to validate HTML documents in various character encodings.
http://www.htmlhelp.com/tools/validator/charset.html

HTML Tips - Character Encodings

  
Web On, Web Off - web development tips, articles, and code. AJAX, Javascript, CSS, HTML ...
http://www.webonweboff.com/tips/html/character_encodings.aspx

Featured content page about: character encodings in html

  
Featured content page about: character encodings in html ... About character encodings in html. HTML has been in use since 1991, but HTML 4.0 (December 1997) was the first ...
http://about.qkport.com/c/character_encodings_in_html

HTML / XHTML Character Encodings

  
HTML / XHTML Character Encodings - Free tutorials and references for SOAP XML-RPC Web ... The most common character set or character encoding in use on computers is ASCII The ...
http://www.tutorialspoint.com/html/html_character_encodings.htm

HTML Document Representation

  
Chapter covering document character sets and encodings in HTML from the World Wide Web Consortium's HTML 4.0 Specification.
http://www.w3.org/TR/REC-html40/charset.html

Authoring Techniques for XHTML & HTML Internationalization ...

  
In general, user agents are most likely to support the commonly-used native character ... Richard Ishida, Character Sets & Encodings in XHTML, HTML and CSS, Draft. ...
http://www.w3.org/TR/2004/WD-i18n-html-tech-char-20040509/

Character entity reference - Wikipedia, the free encyclopedia

  
In the markup languages SGML, HTML, XHTML and XML, a character entity reference is a ... Although in popular usage character references are often called "entity ...
http://en.wikipedia.org/wiki/Character_entity_reference

UTF-8: The Secret of Character Encoding - HTML Purifier

  
HTML Purifier End-User Documentation. Character encoding and character sets are not that ... not to be butting in on your character encodings, you can tell it not ...
http://htmlpurifier.org/docs/enduser-utf8.html

HttpClient - Character Encodings

  
This document provides an overview of how HttpClient handles character encodings and how to use HttpClient in an encoding safe way. ...
http://hc.apache.org/httpclient-3.x/charencodings.html

 Encyclopedia Opens New Window.


Support for character encodings other than the default was introduced in HTML 4 (1997), although HTML itself dates from 1991. If an HTML document contains characters outside the ASCII range and does not declare its character encoding, the text may be corrupted or displayed inconsistently across browsers.


Specifying the document's character encoding

There are several ways to specify which character encoding is used in the document. First, the web server can include the character encoding or "charset" in the Hypertext Transfer Protocol (HTTP) Content-Type header, which would typically look like this:[1]

Content-Type: text/html; charset=ISO-8859-1
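A receiver has to pull the charset parameter out of that header value. A minimal sketch in Python, using the standard library's email package to parse the header; the helper name charset_from_content_type is made up for this illustration and is not part of any HTTP library:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None
```

A result of None corresponds to a server that sends no charset at all, in which case the receiver has to look inside the document or guess.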

In HTML (but not in XHTML), it is also possible to include this information in the document itself. In this case, the following code could be added near the top of the document, inside the head element:[2]

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

HTML5 also allows the following shorter syntax, which means exactly the same thing:[2]

<meta charset="utf-8">
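Both meta forms can be found by scanning the first bytes of the document before any decoding takes place. The following Python sketch is a deliberate simplification: browsers use a more elaborate prescan, and the function name, the regular expression, and the 1024-byte default limit are assumptions chosen for this example.

```python
import re

# Matches charset=... anywhere inside a <meta ...> tag, quoted or unquoted.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(raw, limit=1024):
    """Return the charset named in a <meta> tag near the top of raw bytes."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


old_style = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
new_style = b'<meta charset="utf-8">'
print(sniff_meta_charset(old_style), sniff_meta_charset(new_style))  # utf-8 utf-8
```

The scan works on raw bytes because the declaration itself is written in ASCII-compatible characters, so it can be read before the real encoding is known.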

XML documents, including XHTML documents, on the other hand, can state the encoding in the XML declaration, which is written like a processing instruction, as follows:[3]

<?xml version="1.0" encoding="ISO-8859-1"?>
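The encoding pseudo-attribute can be read from the raw bytes in the same spirit. This is only a sketch for ASCII-compatible encodings; the helper name xml_declared_encoding is hypothetical, and a real XML parser also handles byte order marks and UTF-16 input.

```python
import re

# The XML declaration, if present, must be the very first thing in the document.
XML_DECL = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def xml_declared_encoding(raw):
    """Return the encoding named in the XML declaration, or None."""
    match = XML_DECL.match(raw)
    return match.group(1).decode("ascii") if match else None


print(xml_declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(xml_declared_encoding(b'<?xml version="1.0"?><a/>'))
```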

Each of these methods tells the receiver how the file being sent should be interpreted, so the declaration must match the character encoding actually used. Because a server usually cannot know how a document is encoded, especially if documents are created on different platforms or in different regions, many servers[citation needed] simply omit the "charset" parameter from the Content-Type header, thus avoiding false promises. However, if the document does not specify the encoding either, the result may be equally bad: the user agent displays mojibake because it cannot determine which character encoding was used.
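The effect of a mismatched declaration is easy to reproduce. In this Python sketch the sample word is an arbitrary choice; it is encoded as UTF-8 and then decoded as if it were ISO-8859-1:

```python
text = "café"
raw = text.encode("utf-8")            # bytes as they would travel over the wire

garbled = raw.decode("iso-8859-1")    # receiver trusts a wrong charset label
print(garbled)                        # cafÃ©

# As long as no byte was lost, the damage can be reversed.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == text)               # True
```

The two-byte UTF-8 sequence for "é" is shown as two separate ISO-8859-1 characters, which is the typical appearance of mojibake in Western European text.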

If a user agent reads a document with no character encoding information, it can fall back on other information. For example, it can rely on the user's settings, either browser-wide or specific to a given document, or it can pick a default encoding based on the user's language. For Western European languages, it is typical and fairly safe to assume Windows-1252, which is similar to ISO-8859-1 but has printable characters in place of some control codes. The consequence of choosing incorrectly is that characters outside the printable ASCII range (32 to 126) usually appear incorrectly. This presents few problems for English-speaking users, but other languages regularly (in some cases, always) require characters outside that range. In CJK environments, where several different multi-byte encodings are in use, auto-detection is also often employed. Finally, browsers usually allow the user to override an incorrect charset label manually.
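A fallback chain of this kind might be sketched as follows in Python. The order used here (declared label, then UTF-8, then Windows-1252) and the function name decode_with_fallback are assumptions for illustration, not a description of what any particular browser does.

```python
def decode_with_fallback(raw, declared=None):
    """Decode raw bytes, returning (text, name of the encoding that was used)."""
    candidates = [declared] if declared else []
    candidates.append("utf-8")
    for name in candidates:
        try:
            return raw.decode(name), name
        except (LookupError, UnicodeDecodeError):
            continue  # unknown label or invalid bytes: try the next candidate
    # Last resort for Western European text; a few bytes are unassigned in
    # Windows-1252, so undecodable bytes are replaced rather than raising.
    return raw.decode("cp1252", errors="replace"), "cp1252"


print(decode_with_fallback(b"\x93quoted\x94"))  # ('“quoted”', 'cp1252')
```

Bytes 0x93 and 0x94 are curly quotation marks in Windows-1252 but control codes in ISO-8859-1, which is why the former is the safer guess for text written on Windows.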

It is increasingly common for multilingual websites and websites in non-Western languages to use UTF-8, which allows the same encoding to be used for all languages. UTF-16 and UTF-32, which can also be used for all languages, are less widely used because they can be harder to handle in programming languages that assume a byte-oriented ASCII superset encoding, and because they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.
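The size difference for markup-heavy text can be checked directly. In this Python sketch the markup string is an arbitrary ASCII-only example, and the little-endian codec variants are used so that no byte order mark is added to the counts:

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    return {
        name: len(text.encode(codec))
        for name, codec in (
            ("UTF-8", "utf-8"),
            ("UTF-16", "utf-16-le"),
            ("UTF-32", "utf-32-le"),
        )
    }


print(encoded_sizes("<p>Hello</p>"))  # {'UTF-8': 12, 'UTF-16': 24, 'UTF-32': 48}
```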

Successful viewing of a page is not necessarily an indication that its encoding is specified correctly. If the page's creator and reader are both assuming some platform-specific character encoding, and the server does not send any identifying information, then the reader will nonetheless see the page as the creator intended, but other readers on different platforms or with different native languages will not see the page as intended.

Character references

In addition to native character encodings, characters can also be encoded as character references, which can be numeric character references (decimal or hexadecimal) or character entity references. Character entity references are also sometimes referred to as named entities, or HTML entities for HTML. HTML's usage of character references derives from SGML.

HTML character references

Numeric character references can be in decimal format, &#DD;, where DD is a variable number of decimal digits. Similarly there is a hexadecimal format, &#xHHHH;, where HHHH is a variable number of hexadecimal digits. Hexadecimal character references are case-insensitive in HTML. For example, the character 'λ' can be represented as &#955;, &#x03BB; or &#X03bb;.
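The three spellings can be produced and checked with Python's standard html module. The helper name numeric_refs is made up for this example:

```python
import html


def numeric_refs(ch):
    """Return the decimal and hexadecimal character references for ch."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:04X};"


print(numeric_refs("λ"))  # ('&#955;', '&#x03BB;')

# Decimal, lowercase-x and uppercase-X forms all decode to the same character.
for ref in ("&#955;", "&#x03BB;", "&#X03bb;"):
    assert html.unescape(ref) == "λ"
```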

Character entity references have the format &name; where "name" is a case-sensitive alphanumeric string. For example, 'λ' can also be encoded as &lambda; in an HTML document. (For a list of all named HTML character entity references, see List of XML and HTML character entity references.) The character entity references &lt;, &gt;, &quot; and &amp; are predefined in HTML and SGML, because <, >, " and & are already used to delimit markup. This notably does not include XML's &apos; (') entity.
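Named references and the escaping of the markup delimiters can be tried out with the Python standard library. This is a small sketch; the sample strings are arbitrary.

```python
import html
from html.entities import name2codepoint

# Entity names are case-sensitive: lambda and Lambda are different characters.
print(chr(name2codepoint["lambda"]), chr(name2codepoint["Lambda"]))  # λ Λ
print(html.unescape("&lambda;"))                                     # λ

# Escape the characters that delimit markup before placing text in a page.
escaped = html.escape('<a href="x">R&D</a>', quote=True)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;R&amp;D&lt;/a&gt;
```

Note that html.escape writes the apostrophe as the numeric reference &#x27; rather than &apos;, which fits the point above that &apos; is not among the predefined HTML references.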

Numeric references always refer to Unicode code points, regardless of the page's encoding. Using numeric references that refer to permanently undefined characters and control characters is forbidden, with the exception of the linefeed, tab, and carriage return characters. That is, characters in the hexadecimal ranges 00–08, 0B–0C, 0E–1F, 7F, and 80–9F cannot be used in an HTML document, not even by reference, so "&#153;", for example, is not allowed. However, for backward compatibility with early HTML authors and browsers that ignored this restriction, raw characters and numeric character references in the 80–9F range are interpreted as representing the characters mapped to bytes 80–9F in the Windows-1252 encoding.
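The forbidden ranges and the Windows-1252 compatibility mapping can be sketched in Python as follows. The range table is copied from the paragraph above and covers only those listed ranges, not every permanently undefined code point; the names FORBIDDEN_RANGES and is_forbidden_reference are illustrative.

```python
import html

# Hexadecimal ranges named above; tab (09), linefeed (0A) and
# carriage return (0D) are deliberately left out.
FORBIDDEN_RANGES = [(0x00, 0x08), (0x0B, 0x0C), (0x0E, 0x1F), (0x7F, 0x7F), (0x80, 0x9F)]


def is_forbidden_reference(code_point):
    """True if a numeric reference to code_point falls in a listed range."""
    return any(low <= code_point <= high for low, high in FORBIDDEN_RANGES)


print(is_forbidden_reference(153))   # True: 153 is hexadecimal 99
print(is_forbidden_reference(0x0A))  # False: linefeed is allowed

# Lenient parsers map 80-9F through Windows-1252 instead of rejecting it.
print(html.unescape("&#153;"))           # ™
print(bytes([0x99]).decode("cp1252"))    # ™
```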

Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately, for example a Unicode encoding such as UTF-8, then HTML character references are usually required only for the markup-delimiting characters mentioned above.
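The trade-off can be seen by serializing the same text for two different page encodings. In this Python sketch the function name to_html_bytes is hypothetical; it escapes the markup delimiters and lets the codec's xmlcharrefreplace error handler insert numeric references only for characters the target encoding cannot represent.

```python
import html


def to_html_bytes(text, encoding):
    """Escape markup delimiters, then encode with references only where needed."""
    escaped = html.escape(text, quote=False)
    return escaped.encode(encoding, errors="xmlcharrefreplace")


print(to_html_bytes("λ < 3", "utf-8").decode("utf-8"))  # λ &lt; 3
print(to_html_bytes("λ < 3", "ascii").decode("ascii"))  # &#955; &lt; 3
```

With UTF-8 only the less-than sign needs a reference, while the ASCII version has to spell out every non-ASCII character numerically.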

XML character references

Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:[4]

  • &amp; → & (ampersand, U+0026)
  • &lt; → < (less-than sign, U+003C)
  • &gt; → > (greater-than sign, U+003E)
  • &quot; → " (quotation mark, U+0022)
  • &apos; → ' (apostrophe, U+0027)

All other character entity references have to be defined before they can be used. For example, use of &eacute; (which gives é, Latin lower-case E with acute accent, U+00E9 in Unicode) in an XML document will generate an error unless the entity has already been defined. XML also requires that the x in hexadecimal numeric references be in lowercase: for example &#xA1b; rather than &#XA1b;. XHTML, which is an XML application, supports the HTML entity set, along with XML's predefined entities.
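These rules can be observed with a conforming XML parser. The sketch below uses Python's xml.etree.ElementTree; the helper name parses and the tiny test documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses("<a>&amp;&lt;&gt;&quot;&apos;</a>"))  # True: the five predefined entities
print(parses("<a>&eacute;</a>"))                   # False: entity was never defined
print(parses("<a>&#xA1b;</a>"))                    # True: lowercase x
print(parses("<a>&#XA1b;</a>"))                    # False: uppercase X is not allowed
```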

References

  1. ^ Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; Berners-Lee, T. (June 1999), "Content-Type", Hypertext Transfer Protocol – HTTP/1.1, IETF, http://tools.ietf.org/html/rfc2616#section-14.17, retrieved 8 March 2010 
  2. ^ a b Hickson, I. (5 March 2010), "Specifying the document's character encoding", HTML5, WHATWG, http://www.whatwg.org/html/#charset, retrieved 8 March 2010 
  3. ^ Bray, T.; Paoli, J.; Sperberg-McQueen, C.; Maler, E.; Yergeau, F. (26 November 2008), "Processing Instructions", XML, W3C, http://www.w3.org/TR/REC-xml/#sec-pi, retrieved 8 March 2010 
  4. ^ Bray, T.; Paoli, J.; Sperberg-McQueen, C.; Maler, E.; Yergeau, F. (26 November 2008), "Character and Entity References", XML, W3C, http://www.w3.org/TR/REC-xml/#sec-references, retrieved 8 March 2010 


All text is available under the terms of the GNU Free Documentation License. (See Copyrights for details.)
Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.