Print LIST of unicode chars without escape characters
When you print a list of Unicode strings, you still get escaped character sequences. How do you print the content of the list unescaped? Is it possible? This is what happens:
>>> s = u'åäö'
>>> s
u'\xe5\xe4\xf6'
>>> print s
åäö
>>> s = [u'åäö']
>>> s
[u'\xe5\xe4\xf6']
>>> print s
[u'\xe5\xe4\xf6']
5 Answers
When you print a string, you get the output of the object's __str__ method, which for a string is the text without quotes. A list's __str__ method is different: it creates a string containing the opening and closing brackets and, between them, the string produced by the __repr__ method of each contained object. What you're seeing is the difference between __str__ and __repr__.
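To see which method a list calls on its elements, here is a minimal sketch. The Word class is hypothetical and exists only for this illustration; it gives __str__ and __repr__ visibly different output.

```python
class Word(object):
    """Hypothetical class, used only to show which method a list calls."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Word(%r)" % self.text


w = Word("abc")
as_item = str(w)    # printing the object itself goes through __str__
as_list = str([w])  # formatting a list goes through __repr__ of each element

print(as_item)
print(as_list)
```

The second line of output shows the __repr__ form inside the brackets, which is the same effect as the escaped strings in the question.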
You can build your own string instead:
print '[' + ','.join("'" + str(x) + "'" for x in s) + ']'
This version should work on both Unicode and byte strings in Python 2:
print u'[' + u','.join(u"'" + unicode(x) + u"'" for x in s) + u']'
>>> s = ['äåö', 'äå']
>>> print "\n".join(s)
äåö
äå
>>> print ", ".join(s)
äåö, äå
>>> s = [u'åäö']
>>> print ",".join(s)
åäö
Is there such a workaround for unicode strings? I have updated my question for that case of unicode strings.
In Python 2.x the default is what you’re experiencing:
>>> s = ['äåö']
>>> s
['\xc3\xa4\xc3\xa5\xc3\xb6']
In Python 3, however, it displays properly:
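A small sketch of the Python 3 behaviour, assuming a Python 3 interpreter and a terminal that can display these characters. In Python 3 the repr of a str keeps printable non-ASCII characters, and the built-in ascii() gives the escaped form that Python 2 showed.

```python
s = ['äåö', 'äå']

shown = repr(s)     # the text that print(s) writes in Python 3
escaped = ascii(s)  # escaped form, similar to what Python 2 displayed

print(shown)
print(escaped)
```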
# Python 2: use unicode literals so that unicode() does not have to decode bytes
s = [u'äåö', u'äå']
encodedlist = u', '.join(map(unicode, s))
print(u'[{}]'.format(encodedlist).encode('UTF-8'))
One can use this wrapper class:
#!/usr/bin/python
# -*- coding: utf-8 -*-

class ReprToStrString(str):
    def __repr__(self):
        return "'" + self.__str__() + "'"

class ReprToStr(object):
    def __init__(self, printable):
        if isinstance(printable, str):
            self._printable = ReprToStrString(printable)
        elif isinstance(printable, list):
            self._printable = list([ReprToStr(item) for item in printable])
        elif isinstance(printable, dict):
            self._printable = dict(
                [(ReprToStr(key), ReprToStr(value)) for (key, value) in printable.items()])
        else:
            self._printable = printable

    def __repr__(self):
        return self._printable.__repr__()

russian1 = ['Валенки', 'Матрёшка']
print russian1
# Output:
# ['\xd0\x92\xd0\xb0\xd0\xbb\xd0\xb5\xd0\xbd\xd0\xba\xd0\xb8', '\xd0\x9c\xd0\xb0\xd1\x82\xd1\x80\xd1\x91\xd1\x88\xd0\xba\xd0\xb0']
print ReprToStr(russian1)
# Output:
# ['Валенки', 'Матрёшка']
russian2 =
print russian2
# Output:
#
print ReprToStr(russian2)
# Output:
#
Print unicode string in python regardless of environment
I'm trying to find a generic solution to print unicode strings from a Python script. The requirements are that it must run in both Python 2.7 and 3.x, on any platform, and with any terminal settings and environment variables (e.g. LANG=C or LANG=en_US.UTF-8). The Python print function automatically tries to encode to the terminal encoding when printing, but if the terminal encoding is ASCII it fails. For example, the following works when the environment has LANG=en_US.UTF-8:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 0: ordinal not in range(128)
The following works regardless of the LANG setting, but would not properly show unicode characters if the terminal was using a different unicode encoding:
The desired behavior would be to always show unicode in the terminal if it is possible and show some encoding if the terminal does not support unicode. For example, the output would be UTF-8 encoded if the terminal only supported ascii. Basically, the goal is to do the same thing as the python print function when it works, but in the cases where the print function fails, use some default encoding.
• Python 3.0 provides an alternative string type for binary data and supports Unicode text in its normal string type (ASCII is treated as a simple type of Unicode).
• Python 2.6 provides an alternative string type for non-ASCII Unicode text and supports both simple text and binary data in its normal string type.
So what is your question now?
Yes, because I get bug reports even when the user’s environment is the main problem, so I’d like to try to make the code as robust as possible.
I believe the OP’s requirements are that incorrect terminal settings (such as the C locale) should present the user with a reasonable default, such as UTF-8, not with a UnicodeEncodeError and a traceback. The former can produce garbage at worst (but will do what the user wants on all modern systems), whereas the latter is bound to frustrate the user.
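One way to read that requirement is sketched below: try the stream's own encoding first, and fall back to UTF-8 when it cannot represent the text. The helper names encode_for_stream and safe_print are hypothetical, and the sketch is written against Python 3 rather than being a tested cross-version solution.

```python
import io
import sys


def encode_for_stream(text, encoding, fallback="utf-8"):
    """Encode text, using fallback when encoding is missing or too narrow."""
    try:
        return text.encode(encoding or fallback)
    except (UnicodeEncodeError, LookupError):
        return text.encode(fallback)


def safe_print(text, stream=None):
    """Write text plus a newline without raising UnicodeEncodeError."""
    if stream is None:
        stream = sys.stdout
    data = encode_for_stream(text + u"\n", getattr(stream, "encoding", None))
    stream.flush()
    # Python 3 text streams expose their byte stream as .buffer.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()


# Demonstration on an in-memory stream that claims to be ASCII-only.
demo = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
safe_print(u"\xea", demo)
written = demo.buffer.getvalue()
```

In a script you would call safe_print(u'\xea') with no stream argument, so it writes to sys.stdout. On an ASCII-only terminal the result is UTF-8 bytes, which may display as garbage but does not produce a traceback.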
4 Answers
You can handle the LANG=C case by telling sys.stdout to default to UTF-8 in cases when it would otherwise default to ASCII.
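import sys, codecs

# Fall back to UTF-8 only when Python would otherwise default to ASCII.
if sys.stdout.encoding is None or sys.stdout.encoding == 'ANSI_X3.4-1968':
    utf8_writer = codecs.getwriter('UTF-8')
    if sys.version_info.major < 3:
        sys.stdout = utf8_writer(sys.stdout, errors='replace')
    else:
        sys.stdout = utf8_writer(sys.stdout.buffer, errors='replace')

# U+00EA, the character from the traceback in the question
print(u'\N{LATIN SMALL LETTER E WITH CIRCUMFLEX}')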
The above snippet fulfills your requirements: it works in Python 2.7 and 3.4, and it doesn't break when LANG is in a non-UTF-8 setting such as C .
It is not a new technique, but it's surprisingly hard to find in the documentation. As presented above, it actually respects non-UTF-8 settings such as ISO 8859-* . It only defaults to UTF-8 if Python would have bogusly defaulted to ASCII, breaking the application.
I don't think you should try and solve this at the Python level. Document your application requirements, log the locale of systems you run on so it can be included in bug reports and leave it at that.
If you do want to go this route, at least distinguish between terminals and pipes; you should never output data to a terminal that the terminal cannot explicitly handle; don't output UTF-8 for example, as the non-printable codepoints > U+007F could end up being interpreted as control codes when encoded.
For a pipe, output UTF-8 by default and make it configurable.
So you'd detect if a TTY is being used, then handle encoding based on that; for a terminal, set an error handler (pick one of replace or backslashreplace to provide replacement characters or escape sequences for whatever characters cannot be handled). For a pipe, use a configurable codec.
import codecs
import os
import sys

if os.isatty(sys.stdout.fileno()):
    output_encoding = sys.stdout.encoding
    errors = 'replace'
else:
    output_encoding = 'utf-8'  # allow override from settings
    errors = None  # perhaps parse from settings, not needed for UTF8

sys.stdout = codecs.getwriter(output_encoding)(sys.stdout, errors=errors)
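For Python 3, where sys.stdout is a text stream on top of sys.stdout.buffer, the same TTY-versus-pipe decision could be sketched as follows. The function names pick_output_codec and rewrap are hypothetical, and the "ascii" default for a terminal with no reported encoding is an assumption of this sketch.

```python
import io
import sys


def pick_output_codec(is_tty, tty_encoding, pipe_encoding="utf-8"):
    """Return (encoding, errors) following the TTY/pipe rule described above."""
    if is_tty:
        # Keep the terminal's own encoding and replace what it cannot show.
        return (tty_encoding or "ascii", "replace")
    # For a pipe, use a configurable codec (UTF-8 by default).
    return (pipe_encoding, "strict")


def rewrap(stream, pipe_encoding="utf-8"):
    """Build a new text writer over the byte stream behind a text stream."""
    encoding, errors = pick_output_codec(
        stream.isatty(), stream.encoding, pipe_encoding)
    return io.TextIOWrapper(
        stream.buffer, encoding=encoding, errors=errors, line_buffering=True)


# Demonstration on an in-memory stream, which behaves like a pipe.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")
out = rewrap(fake_stdout)
out.write("\xea")
out.flush()
piped_bytes = raw.getvalue()
```

In a real script the call would be sys.stdout = rewrap(sys.stdout). On Python 3.7 and later, sys.stdout.reconfigure(encoding=..., errors=...) may be a simpler way to get the same effect.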
How to Print character using its unicode value in Python?
I have a list that contains English letters, Hindi letters, Greek symbols, and digits. I want to remove all letters except the Hindi ones. The Unicode range for Hindi (Devanagari) characters is U+0900 to U+097F. For details about Hindi alphabets visit http://jrgraphix.net/r/Unicode/0900-097F. Input:
l=['ग','1ए','==क','@','ऊं','abc123','η','θ','abcशि']
for i in l:
    print i
1 Answer
To get a character's value you can use the built-in ord(char) function.
In your case, something like this should work:
strings = [u'ग',u'1ए',u'==क',u'@',u'ऊं',u'abc123',u'η',u'θ',u'abcशि']
for string in strings:
    for char in string:
        if ord(u'\u0900')
The ord(char) function is available in both Python 2 and Python 3.
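A complete version of the idea might look like the sketch below. It assumes Python 3 (where every str is Unicode), takes the range U+0900 to U+097F from the question, and uses a hypothetical helper name keep_devanagari. The last lines go the other way, from a code point to its character, using chr().

```python
DEVANAGARI_FIRST = 0x0900
DEVANAGARI_LAST = 0x097F


def keep_devanagari(text):
    """Return only the characters of text that fall in U+0900..U+097F."""
    return "".join(
        ch for ch in text
        if DEVANAGARI_FIRST <= ord(ch) <= DEVANAGARI_LAST)


strings = ['ग', '1ए', '==क', '@', 'ऊं', 'abc123', 'η', 'θ', 'abcशि']
filtered = [keep_devanagari(s) for s in strings]

for item in filtered:
    if item:  # skip entries that had no Devanagari characters at all
        print(item)

# From a code point back to a character (Python 2 would need unichr):
ga = chr(0x0917)
print(ga)
```

In Python 2 the literals would need the u prefix, as noted in the comments, because iterating over a byte string yields bytes rather than characters.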
@ishpreet I've tested with Python 2.7. What do you mean by 'It's not working'? Is the output wrong? Did you get an error?
According to Google, ord() -> 'Return a string of one character whose ASCII code is the integer.' What I want is the Unicode character, given that I have its Unicode value. Also, your code is not printing Unicode characters like 'ग' even though they are within the Unicode range.
My bad, I forgot to update the answer. You must pass an explicit unicode string, like u'ग', or it doesn't work.
Not all of them, but that's because of some encoding issues; even my editor encodes some of them badly (e.g. the last one).
Python print Unicode character
I'm making a card game, but I've run into what seems to be an encoding issue. I'm trying to print a card like this:
def print(self):
    print("|-------|")
    print("| %s |" % self.value)
    print("| |")
    print("| %s |" % self.suit.encode("utf-8"))
    print("| |")
    print("| %s |" % self.value)
    print("|-------|")

spade = "♠"
heart = "♥"
diamond = "♦"
club = "♣"
  File "main.py", line 79, in <module>
    start()
  File "main.py", line 52, in start
    play()
  File "main.py", line 64, in play
    card.print()
  File "main.py", line 36, in print
    print("| \u2660 |")
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 4: character maps to <undefined>
With # -*- coding: utf8 -*- as the first line of my file, with both self.value changed to "9", and with self.suit.encode("utf-8") changed to "♦", this works fine for me when building it in Sublime Text 3. What happens if you replace the self.suit.encode line with just the character you want?
It is a cmd issue, yes. Your terminal is in code page 850, so you can't print ♦ which doesn't exist in 850. Code page 65001 would work except for it suffers from implementation bugs in Windows. In general the Windows command line is a dead loss for Unicode.
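You can check this without a Windows console, since the failure comes from the codec and not from the terminal itself. A minimal sketch, with a hypothetical helper name can_encode:

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


suits = "\u2660\u2665\u2666\u2663"  # spade, heart, diamond, club

in_cp850 = can_encode(suits, "cp850")
in_utf8 = can_encode(suits, "utf-8")

print(in_cp850)
print(in_utf8)
```

The suits are not part of code page 850 as Python's codec defines it, so the first check fails while the UTF-8 check succeeds.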
1 Answer
This takes advantage of the fact that the OEM code pages in the Windows console print some visible characters for control characters. The card suits for cp437 and cp850 are chr(3)-chr(6) . Python 3 (prior to 3.6) won't print the Unicode character for a black diamond, but it's what you get for U+0004:
>>> print('\N{BLACK DIAMOND SUIT}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2666' in position 0: character maps to <undefined>
>>> print(chr(4))
♦
#!python3
#coding: utf8
class Card:
    def __init__(self,value,suit):
        self.value = value
        self.suit = suit # 1,2,3,4 = ♥♦♣♠

    def print(self):
        print("┌───────┐")
        print("| {:<2}    |".format(self.value))
        print("|       |")
        # chr(3)..chr(6) display as ♥♦♣♠ in the cp437/cp850 console
        print("|   {}   |".format(chr(self.suit+2)))
        print("|       |")
        print("|    {:>2} |".format(self.value))
        print("└───────┘")
>>> x=Card('K',4)
>>> x.print()
┌───────┐
| K     |
|       |
|   ♠   |
|       |
|     K |
└───────┘
>>> x=Card(10,3)
>>> x.print()
┌───────┐
| 10    |
|       |
|   ♣   |
|       |
|    10 |
└───────┘
Python 3.6 Update
Python 3.6 uses Windows Unicode APIs to print so no need for the control character trick now, and the new format strings can be used:
#!python3.6
#coding: utf8
class Card:
    def __init__(self,value,suit):
        self.value = value
        self.suit = '♥♦♣♠'[suit-1] # 1,2,3,4 = ♥♦♣♠

    def print(self):
        print('┌───────┐')
        print(f'| {self.value:<2}    |')
        print('|       |')
        print(f'|   {self.suit}   |')
        print('|       |')
        print(f'|    {self.value:>2} |')
        print('└───────┘')
>>> x=Card('10',3)
>>> x.print()
┌───────┐
| 10    |
|       |
|   ♣   |
|       |
|    10 |
└───────┘