Unicode Text Fixer: Cleaning Up Broken or Mixed-Encoding Text
Sometimes text comes through with weird artifacts: 'â€™' instead of an apostrophe, '?' or empty boxes where Korean characters should be, mojibake that looks like random symbols. This tool tries to detect and repair common encoding mistakes.
Mojibake (文字化け) is what happens when text is decoded with the wrong character set. UTF-8 bytes interpreted as Windows-1252 produce the familiar 'â€™' for curly apostrophes, 'â€”' for em dashes, and 'â€¦' for ellipses. The reverse mistake, Windows-1252 bytes read as UTF-8, typically shows up as replacement characters (�) instead. The tool reverses these common mistakes.
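To make the mechanism concrete, here is a minimal Python sketch of how this kind of mojibake arises and how it can be undone. It is an illustration only, not the tool's own code; the function names `make_mojibake` and `fix_mojibake` are made up for this example.

```python
def make_mojibake(text: str) -> str:
    """Simulate the mistake: UTF-8 bytes decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def fix_mojibake(text: str) -> str:
    """Undo the mistake; return the input unchanged if it does not apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Windows-1252, or the bytes are not valid UTF-8:
        # the text was probably never broken this way, so leave it alone.
        return text


# U+2019 curly apostrophe, U+2014 em dash, U+2026 ellipsis
original = "It\u2019s here \u2014 finally\u2026"
broken = make_mojibake(original)

print(broken)                # Itâ€™s here â€” finallyâ€¦
print(fix_mojibake(broken))  # prints the original sentence again
print(fix_mojibake("café"))  # café (already fine, left untouched)
```

One caveat with this sketch: Python's strict `cp1252` codec leaves five bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a few characters, such as the right double quote, will not round-trip this way without extra handling.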
Other common cleanup: stripping invisible Unicode characters (zero-width spaces sometimes inserted by phishing or templating systems), normalizing accented characters, removing private-use-area glyphs.
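As a rough sketch of the invisible-character cleanup, the snippet below drops a small set of zero-width code points plus anything in the private-use category. The `INVISIBLE` set and the `strip_hidden` name are assumptions chosen for illustration; the tool's actual list may differ.

```python
import unicodedata

# Assumed selection of invisible code points to drop (not the tool's real list).
INVISIBLE = {
    "\u200b",  # zero-width space
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / stray byte order mark
}


def strip_hidden(text: str) -> str:
    """Remove the invisible characters above and private-use code points."""
    return "".join(
        ch
        for ch in text
        # General category "Co" marks private-use code points.
        if ch not in INVISIBLE and unicodedata.category(ch) != "Co"
    )


print(strip_hidden("pay\u200bpal\ue000"))  # paypal
```

The zero-width joiner (U+200D) and non-joiner (U+200C) are deliberately left alone here, since emoji sequences and some scripts rely on them.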
Common fixes
- â€™ → ’ (UTF-8 curly apostrophe misread as Windows-1252)
- Â before a space → stray Â removed (UTF-8 non-breaking space misread as Windows-1252)
- Zero-width space (U+200B) and other invisible characters removed
- Smart quotes converted back to straight quotes (or vice versa)
- Composed vs. decomposed accents unified (NFC or NFD normalization)
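The last two fixes can be sketched with Python's standard library. This is an illustrative example rather than the tool's implementation; `straighten_quotes` and the `STRAIGHTEN` table are hypothetical names.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point (NFC form)
decomposed = "e\u0301"  # e followed by a combining acute accent (NFD form)

# The two look identical on screen but compare as different strings.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# Map curly quotes to their straight ASCII equivalents.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def straighten_quotes(text: str) -> str:
    """Replace smart quotes with straight quotes."""
    return text.translate(STRAIGHTEN)


print(straighten_quotes("\u201cIt\u2019s fine\u201d"))  # "It's fine"
```

Going the other way, from straight to smart quotes, is harder because it requires guessing which quotes open and which close, so it is not shown here.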
Extended FAQ
Can it fix all mojibake?
Common patterns, yes. UTF-8 text misread as Windows-1252 can usually be reversed. Text that has been wrongly decoded and re-encoded several times is harder, and once any step has replaced bytes it could not decode with '?' or �, the original characters cannot be recovered.
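The difference between recoverable and unrecoverable cases can be sketched as follows. This is a simplified illustration, assuming the only mistake involved is UTF-8 read as Windows-1252; `fix_once` and `fix_repeated` are hypothetical names, not part of the tool.

```python
def fix_once(text: str) -> str:
    """Undo one round of UTF-8 read as Windows-1252, if that round applies."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def fix_repeated(text: str, max_rounds: int = 5) -> str:
    """Keep undoing rounds until the text stops changing."""
    for _ in range(max_rounds):
        fixed = fix_once(text)
        if fixed == text:
            break
        text = fixed
    return text


original = "café"

# Two lossless rounds of the same mistake: still reversible.
twice = original.encode("utf-8").decode("cp1252")
twice = twice.encode("utf-8").decode("cp1252")
print(twice)                # cafÃƒÂ©
print(fix_repeated(twice))  # café

# A lossy step: undecodable bytes were replaced with U+FFFD.
lossy = original.encode("utf-8").decode("ascii", errors="replace")
print(fix_repeated(lossy) == lossy)  # True: nothing left to recover
```

Repeating the fix blindly can in rare cases damage text that legitimately contains sequences like 'Ã©', so a real repair routine would likely need extra checks before each round.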
Are my pasted strings stored?
No. The tool runs entirely in your browser.
