Character encoding in HTML (Charset)

To display an HTML page correctly, the web browser needs to know which character set (character encoding) the page uses.

What is character encoding?

ASCII was the first character encoding standard (also called a character set). It defines 128 characters: control characters plus the digits 0 to 9, the English letters A to Z in upper and lower case, and special characters such as ! $ + - ( ) @ < >.

ANSI (Windows-1252) was the original Windows character set, with support for 256 character codes. ISO-8859-1 was the default character set for HTML 4 and also supports 256 character codes.

Because ANSI and ISO-8859-1 are so limited, HTML 4 also supports UTF-8. UTF-8 (Unicode) covers almost all of the characters and symbols in the world. HTML5 uses UTF-8 as its default character encoding.
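The limits of the older character sets can be checked with a short Python sketch. It assumes Python's built-in codec names `ascii`, `cp1252` (ANSI / Windows-1252), `iso-8859-1` and `utf-8`; the helper name `encodable_in` is made up for this illustration.

```python
def encodable_in(text):
    """Map each character set name to True if it can encode every character in text."""
    support = {}
    for name in ("ascii", "cp1252", "iso-8859-1", "utf-8"):
        try:
            text.encode(name)
            support[name] = True
        except UnicodeEncodeError:
            # At least one character has no code in this character set.
            support[name] = False
    return support


# Plain English text fits everywhere; the euro sign and Japanese text do not.
for sample in ("HTML", "caf\u00e9", "\u20ac5", "\u65e5\u672c\u8a9e"):
    print(sample, encodable_in(sample))
```

In this sketch only UTF-8 accepts all four samples, which is the practical reason it became the default.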

The charset attribute in HTML

The browser displays the HTML page using the character set specified in the <meta> tag.
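How a program might read that declaration can be sketched in Python with the standard `html.parser` module. This is only an illustration, not how any particular browser works: the class `CharsetFinder` and the function `declared_charset` are hypothetical names, and the sketch handles just the two common forms of the `<meta>` declaration (the HTML 4 `http-equiv` form and the HTML5 `charset` form).

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remember the first character set declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attrs["charset"].strip()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: the charset is a parameter inside the content attribute.
            for part in (attrs.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset" and value:
                    self.charset = value.strip()


def declared_charset(html):
    """Return the declared character set name, or None if the page declares none."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


print(declared_charset('<meta charset="UTF-8">'))
print(declared_charset(
    '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
))
```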

With HTML 4

 <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

With HTML5

 <meta charset="UTF-8">

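If the browser detects ISO-8859-1 in a web page, it defaults to ANSI (Windows-1252), because ANSI is identical to ISO-8859-1 except that ANSI assigns printable characters to the 32 values from 128 to 159 (27 of them are in use), which ISO-8859-1 leaves without printable characters.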
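The effect of that fallback can be seen by decoding the same bytes both ways. A minimal Python sketch, assuming the built-in codecs `cp1252` (ANSI) and `iso-8859-1`; the sample bytes are chosen for illustration only.

```python
# Four bytes: 128 and 147 fall in the disputed 128-159 range, 65 is "A", 233 is "é".
data = bytes([128, 147, 65, 233])

as_ansi = data.decode("cp1252")      # euro sign, left double quote, A, é
as_8859 = data.decode("iso-8859-1")  # two invisible control characters, A, é

print(ascii(as_ansi))
print(ascii(as_8859))
```

Decoding as ANSI yields readable punctuation for values 128 and 147, while strict ISO-8859-1 yields control characters, so treating the page as ANSI loses nothing and often recovers text.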

Differences between character sets

The table below shows the differences between the character sets described above.

An empty cell means the character set has no printable character at that number (32, 160 and 173 are defined but invisible).

No.  ASCII  ANSI  8859  UTF-8  Description
32                             space
33   !      !     !     !      exclamation mark
34   "      "     "     "      quotation mark
35   #      #     #     #      number sign
36   $      $     $     $      dollar sign
37   %      %     %     %      percent sign
38   &      &     &     &      ampersand
39   '      '     '     '      apostrophe (single quote)
40   (      (     (     (      opening parenthesis
41   )      )     )     )      closing parenthesis
42   *      *     *     *      asterisk
43   +      +     +     +      plus sign
44   ,      ,     ,     ,      comma
45   -      -     -     -      hyphen (minus)
46   .      .     .     .      period
47   /      /     /     /      slash
48   0      0     0     0      digit 0
49   1      1     1     1      digit 1
50   2      2     2     2      digit 2
51   3      3     3     3      digit 3
52   4      4     4     4      digit 4
53   5      5     5     5      digit 5
54   6      6     6     6      digit 6
55   7      7     7     7      digit 7
56   8      8     8     8      digit 8
57   9      9     9     9      digit 9
58   :      :     :     :      colon
59   ;      ;     ;     ;      semicolon
60   <      <     <     <      less-than sign
61   =      =     =     =      equals sign
62   >      >     >     >      greater-than sign
63   ?      ?     ?     ?      question mark
64   @      @     @     @      at sign
65   A      A     A     A      uppercase A
66   B      B     B     B      uppercase B
67   C      C     C     C      uppercase C
68   D      D     D     D      uppercase D
69   E      E     E     E      uppercase E
70   F      F     F     F      uppercase F
71   G      G     G     G      uppercase G
72   H      H     H     H      uppercase H
73   I      I     I     I      uppercase I
74   J      J     J     J      uppercase J
75   K      K     K     K      uppercase K
76   L      L     L     L      uppercase L
77   M      M     M     M      uppercase M
78   N      N     N     N      uppercase N
79   O      O     O     O      uppercase O
80   P      P     P     P      uppercase P
81   Q      Q     Q     Q      uppercase Q
82   R      R     R     R      uppercase R
83   S      S     S     S      uppercase S
84   T      T     T     T      uppercase T
85   U      U     U     U      uppercase U
86   V      V     V     V      uppercase V
87   W      W     W     W      uppercase W
88   X      X     X     X      uppercase X
89   Y      Y     Y     Y      uppercase Y
90   Z      Z     Z     Z      uppercase Z
91   [      [     [     [      opening square bracket
92   \      \     \     \      backslash
93   ]      ]     ]     ]      closing square bracket
94   ^      ^     ^     ^      caret
95   _      _     _     _      underscore
96   `      `     `     `      grave accent
97   a      a     a     a      lowercase a
98   b      b     b     b      lowercase b
99   c      c     c     c      lowercase c
100  d      d     d     d      lowercase d
101  e      e     e     e      lowercase e
102  f      f     f     f      lowercase f
103  g      g     g     g      lowercase g
104  h      h     h     h      lowercase h
105  i      i     i     i      lowercase i
106  j      j     j     j      lowercase j
107  k      k     k     k      lowercase k
108  l      l     l     l      lowercase l
109  m      m     m     m      lowercase m
110  n      n     n     n      lowercase n
111  o      o     o     o      lowercase o
112  p      p     p     p      lowercase p
113  q      q     q     q      lowercase q
114  r      r     r     r      lowercase r
115  s      s     s     s      lowercase s
116  t      t     t     t      lowercase t
117  u      u     u     u      lowercase u
118  v      v     v     v      lowercase v
119  w      w     w     w      lowercase w
120  x      x     x     x      lowercase x
121  y      y     y     y      lowercase y
122  z      z     z     z      lowercase z
123  {      {     {     {      opening curly brace
124  |      |     |     |      vertical bar
125  }      }     }     }      closing curly brace
126  ~      ~     ~     ~      tilde
127                            DEL (control character)
128         €                  euro sign
129                            not used
130         ‚                  single low-9 quotation mark
131         ƒ                  Latin small letter f with hook
132         „                  double low-9 quotation mark
133         …                  horizontal ellipsis
134         †                  dagger
135         ‡                  double dagger
136         ˆ                  modifier letter circumflex accent
137         ‰                  per mille sign
138         Š                  Latin capital letter S with caron
139         ‹                  single left-pointing angle quotation mark
140         Œ                  Latin capital ligature OE
141                            not used
142         Ž                  Latin capital letter Z with caron
143                            not used
144                            not used
145         ‘                  left single quotation mark
146         ’                  right single quotation mark
147         “                  left double quotation mark
148         ”                  right double quotation mark
149         •                  bullet
150         –                  en dash
151         —                  em dash
152         ˜                  small tilde
153         ™                  trade mark sign
154         š                  Latin small letter s with caron
155         ›                  single right-pointing angle quotation mark
156         œ                  Latin small ligature oe
157                            not used
158         ž                  Latin small letter z with caron
159         Ÿ                  Latin capital letter Y with diaeresis
160                            no-break space
161         ¡     ¡     ¡      inverted exclamation mark
162         ¢     ¢     ¢      cent sign
163         £     £     £      pound sign
164         ¤     ¤     ¤      currency sign
165         ¥     ¥     ¥      yen sign
166         ¦     ¦     ¦      broken bar
167         §     §     §      section sign
168         ¨     ¨     ¨      diaeresis
169         ©     ©     ©      copyright sign
170         ª     ª     ª      feminine ordinal indicator
171         «     «     «      left-pointing double angle quotation mark
172         ¬     ¬     ¬      not sign
173                            soft hyphen
174         ®     ®     ®      registered sign
175         ¯     ¯     ¯      macron
176         °     °     °      degree sign
177         ±     ±     ±      plus-minus sign
178         ²     ²     ²      superscript two
179         ³     ³     ³      superscript three
180         ´     ´     ´      acute accent
181         µ     µ     µ      micro sign
182         ¶     ¶     ¶      pilcrow sign
183         ·     ·     ·      middle dot
184         ¸     ¸     ¸      cedilla
185         ¹     ¹     ¹      superscript one
186         º     º     º      masculine ordinal indicator
187         »     »     »      right-pointing double angle quotation mark
188         ¼     ¼     ¼      vulgar fraction one quarter
189         ½     ½     ½      vulgar fraction one half
190         ¾     ¾     ¾      vulgar fraction three quarters
191         ¿     ¿     ¿      inverted question mark
192         À     À     À      Latin capital letter A with grave
193         Á     Á     Á      Latin capital letter A with acute
194         Â     Â     Â      Latin capital letter A with circumflex
195         Ã     Ã     Ã      Latin capital letter A with tilde
196         Ä     Ä     Ä      Latin capital letter A with diaeresis
197         Å     Å     Å      Latin capital letter A with ring above
198         Æ     Æ     Æ      Latin capital letter AE
199         Ç     Ç     Ç      Latin capital letter C with cedilla
200         È     È     È      Latin capital letter E with grave
201         É     É     É      Latin capital letter E with acute
202         Ê     Ê     Ê      Latin capital letter E with circumflex
203         Ë     Ë     Ë      Latin capital letter E with diaeresis
204         Ì     Ì     Ì      Latin capital letter I with grave
205         Í     Í     Í      Latin capital letter I with acute
206         Î     Î     Î      Latin capital letter I with circumflex
207         Ï     Ï     Ï      Latin capital letter I with diaeresis
208         Ð     Ð     Ð      Latin capital letter Eth
209         Ñ     Ñ     Ñ      Latin capital letter N with tilde
210         Ò     Ò     Ò      Latin capital letter O with grave
211         Ó     Ó     Ó      Latin capital letter O with acute
212         Ô     Ô     Ô      Latin capital letter O with circumflex
213         Õ     Õ     Õ      Latin capital letter O with tilde
214         Ö     Ö     Ö      Latin capital letter O with diaeresis
215         ×     ×     ×      multiplication sign
216         Ø     Ø     Ø      Latin capital letter O with stroke
217         Ù     Ù     Ù      Latin capital letter U with grave
218         Ú     Ú     Ú      Latin capital letter U with acute
219         Û     Û     Û      Latin capital letter U with circumflex
220         Ü     Ü     Ü      Latin capital letter U with diaeresis
221         Ý     Ý     Ý      Latin capital letter Y with acute
222         Þ     Þ     Þ      Latin capital letter Thorn
223         ß     ß     ß      Latin small letter sharp s
224         à     à     à      Latin small letter a with grave
225         á     á     á      Latin small letter a with acute
226         â     â     â      Latin small letter a with circumflex
227         ã     ã     ã      Latin small letter a with tilde
228         ä     ä     ä      Latin small letter a with diaeresis
229         å     å     å      Latin small letter a with ring above
230         æ     æ     æ      Latin small letter ae
231         ç     ç     ç      Latin small letter c with cedilla
232         è     è     è      Latin small letter e with grave
233         é     é     é      Latin small letter e with acute
234         ê     ê     ê      Latin small letter e with circumflex
235         ë     ë     ë      Latin small letter e with diaeresis
236         ì     ì     ì      Latin small letter i with grave
237         í     í     í      Latin small letter i with acute
238         î     î     î      Latin small letter i with circumflex
239         ï     ï     ï      Latin small letter i with diaeresis
240         ð     ð     ð      Latin small letter eth
241         ñ     ñ     ñ      Latin small letter n with tilde
242         ò     ò     ò      Latin small letter o with grave
243         ó     ó     ó      Latin small letter o with acute
244         ô     ô     ô      Latin small letter o with circumflex
245         õ     õ     õ      Latin small letter o with tilde
246         ö     ö     ö      Latin small letter o with diaeresis
247         ÷     ÷     ÷      division sign
248         ø     ø     ø      Latin small letter o with stroke
249         ù     ù     ù      Latin small letter u with grave
250         ú     ú     ú      Latin small letter u with acute
251         û     û     û      Latin small letter u with circumflex
252         ü     ü     ü      Latin small letter u with diaeresis
253         ý     ý     ý      Latin small letter y with acute
254         þ     þ     þ      Latin small letter thorn
255         ÿ     ÿ     ÿ      Latin small letter y with diaeresis

ASCII character set

ASCII uses the values from 0 to 31 (and 127) for control characters, and the values from 32 to 126 for letters, digits and symbols. ASCII does not use the values from 128 to 255.
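Those ranges translate directly into a small classifier. The function name `ascii_kind` and its three labels are assumptions made for this sketch.

```python
def ascii_kind(value):
    """Classify a byte value (0-255) according to the ASCII ranges described above."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value from 0 to 255")
    if value <= 31 or value == 127:
        return "control"      # 0-31 and 127
    if value <= 126:
        return "printable"    # 32-126: letters, digits and symbols
    return "unused"           # 128-255 are not part of ASCII


printable = [v for v in range(256) if ascii_kind(v) == "printable"]
print(len(printable), chr(printable[0]) == " ", chr(printable[-1]))
```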

ANSI character set (Windows-1252)

ANSI is identical to ASCII for the values from 0 to 127, has its own set of characters for the values from 128 to 159, and is identical to UTF-8 (Unicode) for the values from 160 to 255.
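The characters that ANSI adds in the 128 to 159 range can be listed with Python's `cp1252` codec. This is a sketch that relies on that codec treating the unused values as undecodable; the function name `ansi_extras` is hypothetical.

```python
def ansi_extras():
    """Return {value: character} for every value from 128 to 159 that ANSI assigns."""
    extras = {}
    for value in range(128, 160):
        try:
            extras[value] = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves this value unassigned ("not used").
            pass
    return extras


extras = ansi_extras()
unused = [v for v in range(128, 160) if v not in extras]
print(len(extras), unused)
```

Under that assumption the sketch reports 27 assigned characters and the five unused values 129, 141, 143, 144 and 157.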

ISO-8859-1 character set

ISO-8859-1 is identical to ASCII for the values from 0 to 127, has no printable characters for the values from 128 to 159, and is identical to UTF-8 (Unicode) for the values from 160 to 255.
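The claim that values 160 to 255 line up with Unicode can be checked byte by byte. A minimal sketch using Python's `iso-8859-1` and `cp1252` codecs; `matches_code_point` is a name invented here.

```python
def matches_code_point(value, encoding):
    """True if the single byte `value` decodes to the Unicode character with the same number."""
    try:
        return bytes([value]).decode(encoding) == chr(value)
    except UnicodeDecodeError:
        return False


high_range = range(160, 256)
print(all(matches_code_point(v, "iso-8859-1") for v in high_range))
print(all(matches_code_point(v, "cp1252") for v in high_range))
# In 128-159 ANSI diverges: none of its characters share a number with Unicode.
print(any(matches_code_point(v, "cp1252") for v in range(128, 160)))
```

Note that Python's `iso-8859-1` codec maps 128 to 159 to invisible control characters rather than leaving them unassigned, which is consistent with the range holding no printable characters.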

UTF-8 character set

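UTF-8 is identical to ASCII for the values from 0 to 127, does not use the values from 128 to 159, is identical to both ANSI and ISO-8859-1 for the values from 160 to 255, and continues from 256 upward with many thousands of further characters. These values are Unicode character numbers: UTF-8 stores 0 to 127 in a single byte and every higher number in two to four bytes.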
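One point worth illustrating is that these numbers are character numbers, not byte values: above 127, UTF-8 spends more than one byte per character. A short sketch, with `utf8_bytes` as a hypothetical helper name.

```python
def utf8_bytes(ch):
    """Return the list of byte values that UTF-8 uses to store one character."""
    return list(ch.encode("utf-8"))


# Character number 65 (A), 233 (é) and 8364 (euro sign).
for ch in ("A", "\u00e9", "\u20ac"):
    print(ord(ch), utf8_bytes(ch))
```

So a page saved as ISO-8859-1 and a page saved as UTF-8 agree on the number of "é" (233) but not on the bytes in the file, which is why the declared character set has to match the way the file was actually saved.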

The @charset rule in CSS

You can use the CSS @charset rule to specify the character encoding used in a style sheet file. The following example sets the encoding of the style sheet to Unicode UTF-8.

 @charset "UTF-8"; 
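For the rule to take effect it has to be the very first thing in the style sheet, with nothing before it. A rough Python sketch of a check along those lines, assuming the exact `@charset "name";` byte pattern; `css_declared_charset` is a hypothetical helper, not part of any CSS library.

```python
import re

# The rule must start at byte 0: @charset, one space, a double-quoted name, a semicolon.
_CHARSET_RULE = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def css_declared_charset(css_bytes):
    """Return the encoding named by a leading @charset rule, or None if there is none."""
    match = _CHARSET_RULE.match(css_bytes)  # match() only matches at the start
    return match.group(1).decode("ascii") if match else None


print(css_declared_charset(b'@charset "UTF-8";\nbody { margin: 0; }'))
print(css_declared_charset(b' @charset "UTF-8";'))  # leading space: rule is ignored
```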

Updated 25 May 2019