Convert unicode string to string in java

Contents

Convert unicode value to string in java
Converting unicode to string Java [duplicate]
Convert String to Unicode Byte array
Converting an unsigned char array to jstring
Java Internationalization: Converting to and from Unicode
UTF-8
Converting to and from Unicode UTF-8 Using the String Class
Converting to and from Unicode UTF-8 Using the Reader and Writer Classes

Convert unicode value to string in java

I am trying to extract currencies from my texts. The currencies come from a database, which also contains special currency symbols. For example, for the pound the db holds the Unicode escape "\u00A3" along with other identifiers such as "gbp".

I am trying to get the corresponding symbol from the Unicode escape and compare it with my text in a loop, as suggested here.

But when I evaluate my code, the comparison does not match (the original post included a debugger screenshot here):

    private Optional<Currency> extractTokenWise(Iterable<String> tokens) {
        try {
            for (String aToken : tokens) {
                for (String currency : currencies.keySet()) {
                    for (String arep : currencies.get(currency)) {
                        if (arep.startsWith("\\")) {
                            // special character for currency written in unicode representation
                            byte[] charset = arep.getBytes("UTF-8");
                            arep = new String(charset, "UTF-8");
                        }
                        if (aToken.equals(arep)) {
                            return Optional.of(Currency.findProperEnum(currency));
                        }
                    }
                }
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return Optional.empty();
    }

It is interesting that when arep is equal to "\u00A3" it does not work, but when I assign the String value "\u00A3" directly in code, it produces the result I want. What am I missing here?

As mentioned in comments something like this should work:

I think you are mixing up Unicode escape sequences in Java code with strings that contain such escape sequences.

String poundSign = "\u00A3"; assigns poundSign a string containing the single character £. This string has a length of 1 character. In memory and in the class file it will occupy 2 bytes.

It looks like arep contains the string \u00A3, as assigned by String unicodeEscapeForPoundSign = "\\u00A3"; that is what your first if statement tests for. It contains the Unicode escape sequence as used in Java code, but not the character this escape sequence represents. It contains the 6 characters '\', 'u', '0', '0', 'A', and '3' (as your IDE shows). arep.getBytes("UTF-8") returns the bytes of just these six characters, and new String(charset, "UTF-8") converts the array back to the string \u00A3 and not the string £.

The solution depends on what you get from your database. Assuming you have a mapping from the db-value to a Currency object or the ISO currency code, you won't need your first if statement; just make sure arep contains the correct string:

    String arep = "\u00A3";   // single pound character
    String arep = "\\u00A3";  // six characters: the Java Unicode escape for the pound sign, as text
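If the database value really does hold the six-character text \u00A3 and cannot be changed, the escape has to be decoded at run time, because the compiler only translates escapes that appear in source code. The following is a minimal sketch of one way to do that, not the answerer's code: the class name UnicodeEscapes and the method decode are hypothetical, and it assumes every escape consists of exactly four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every textual escape with the UTF-16 code unit it names.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against backslash and dollar in the result
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fromDb = "\\u00A3";                 // six characters, as stored in the db
        System.out.println(fromDb.length());       // 6
        String symbol = decode(fromDb);
        System.out.println(symbol.length());       // 1
        System.out.println(symbol.equals("\u00A3")); // true
    }
}
```

With such a helper, the body of the first if statement could become arep = UnicodeEscapes.decode(arep), after which the equals comparison sees the real pound character.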


Converting unicode to string Java [duplicate]

I wrote an app that gets a string from a server. The string is not in English, so the server sends a string that represents the Unicode values of the characters.

Is there a method in some class that takes a string representing a Unicode value (a string of the format "\uXXXX") and returns the Unicode character corresponding to this value?

I tried the answers to the other questions. They worked in a regular project, but in my Android app they cause the app to stop working.

If you know which Unicode characters you want to eliminate, you can use the replace("\uXXXX", "") method.
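As a small illustration of that suggestion, the sketch below removes one known character with String.replace. The class name StripCharacter and the choice of U+200B (zero-width space) are assumptions made only for the example. Note that replace operates on real characters: it does not touch the six-character text form of an escape.

```java
class StripCharacter {
    // Removes every occurrence of one known character sequence from the text.
    static String strip(String text, String unwanted) {
        return text.replace(unwanted, "");
    }

    public static void main(String[] args) {
        // The compiler turns the escape into the real zero-width space character.
        String cleaned = strip("a\u200Bb", "\u200B");
        System.out.println(cleaned); // ab

        // Here the input holds a backslash, a u and four digits, so nothing is removed.
        String untouched = strip("\\u200B", "\u200B");
        System.out.println(untouched.length()); // 6
    }
}
```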


Convert String to Unicode Byte array

I'm trying to get the Unicode values of a String into a byte array.

I started with the following code, which specifies ASCII. This gave a list of numbers, as you would expect.

    byte[] bytes = null;
    try {
        bytes = listOfApps.getBytes("US-ASCII");
    } catch (Exception e) {}
    Log.e(TAG, "bytes = " + Arrays.toString(bytes));

    listOfAllApps = Gallery|Camera|Contacts|Phone|Email|Messages|Settings
    bytes = [71, 97, 108, 108, 101, 114, 121, 124, 67, 97, 109, 101, 114, 97, ...

So I changed my code to the following to specify Unicode.

    byte[] bytes = null;
    try {
        bytes = listOfApps.getBytes(Charset.forName("UTF-8"));
    } catch (Exception e) {}
    Log.e(TAG, "bytes = " + Arrays.toString(bytes));

I still get the same output. I thought I would get an array of values that start with \u. Does anybody know what I'm doing wrong?

    String aStr = "gallery|settings|À ";
    char[] charArray = aStr.toCharArray();
    for (int i = 0; i < charArray.length; i++) {
        int x = Character.codePointAt(charArray, i);
        Log.e(TAG, "x = " + x);
    }

Output:

    x = 103
    x = 97
    x = 108
    x = 108
    x = 101
    x = 114
    x = 121
    x = 124
    x = 115
    x = 101
    x = 116
    x = 116
    x = 105
    x = 110
    x = 103
    x = 115
    x = 124
    x = 192 //À
    x = 32
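Both attempts in the question print the same numbers because every character in the list is ASCII, and UTF-8 encodes ASCII characters as the same single bytes. getBytes never produces \u text; that notation only exists in source code. The sketch below (class and method names are hypothetical) shows where the two charsets start to differ and how the \uXXXX notation could be produced from the UTF-16 code units if that is what is wanted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingDemo {
    // Formats each UTF-16 code unit as a backslash, the letter u and four hex digits.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = "Gallery|Camera";
        // Pure ASCII text yields identical bytes in both charsets.
        System.out.println(Arrays.equals(
                ascii.getBytes(StandardCharsets.US_ASCII),
                ascii.getBytes(StandardCharsets.UTF_8))); // true

        String accented = "\u00C0"; // capital A with grave accent
        // UTF-8 needs two bytes; US-ASCII cannot represent it and substitutes '?'.
        System.out.println(Arrays.toString(accented.getBytes(StandardCharsets.UTF_8)));    // [-61, -128]
        System.out.println(Arrays.toString(accented.getBytes(StandardCharsets.US_ASCII))); // [63]

        System.out.println(toEscapes("A|" + accented));
    }
}
```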


Converting an unsigned char array to jstring

I'm having issues trying to convert an unsigned char array to jstring.

The context is that I'm using a shared C library from Java, so I'm implementing a JNI C++ file. The function I use from the library returns an unsigned char* num_carte_transcode

And the function I implemented in the JNI C++ file returns a jstring.

So I need to convert the unsigned char array to jstring.

I tried this simple cast: return (env)->NewStringUTF((char*) unsigned_char_array);

But although the array should contain only 20 bytes, I randomly get 22 or 23 bytes in Java (though the 20 bytes are correct).

EDIT1: Here are some more details with an example

    JNIEXPORT jstring JNICALL Java_com_example_demo_DemoClass_functionToCall
      (JNIEnv *env, jobject)
    {
        // Here I define an unsigned char array as requested by the library function
        unsigned char unsigned_char_array[20];

        // The function feeds the array at execution
        function_to_call(unsigned_char_array);

        // I print out the result for debug purpose
        printf("result : %.*s (%d chars)\n", (int) sizeof unsigned_char_array, unsigned_char_array, (int) sizeof unsigned_char_array);
        // I get the result I want, which is like: 92311221679609987114 (20 numbers)

        // Now, I need to convert the unsigned char array to jstring to return to Java
        return (env)->NewStringUTF((char*) unsigned_char_array);
        // Though. On debugging on Java, I get 21 bytes 92311221679609987114
    }

Sometimes I get 22 bytes, sometimes 23, though the expected result is always 20 bytes.

The string passed to NewStringUTF must be null-terminated. The string you're passing, however, is not null-terminated, and it looks like there's some garbage at the end of the string before the first null.

I suggest creating a larger array, and adding a null terminator at the end:

    JNIEXPORT jstring JNICALL Java_com_example_demo_DemoClass_functionToCall
      (JNIEnv *env, jobject recv)
    {
        unsigned char unsigned_char_array[20 + 1]; // + 1 for null terminator
        function_to_call(unsigned_char_array);
        unsigned_char_array[20] = '\0'; // add null terminator
        printf("result : %s (%d chars)\n", unsigned_char_array, (int) sizeof unsigned_char_array);
        return (env)->NewStringUTF((char*) unsigned_char_array); // should work now
    }
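An alternative that sidesteps the terminator requirement is to hand the raw bytes to Java and decode an explicit length there. This is a Java-side sketch under assumptions that are not in the original thread: the native method would have to return a byte[] (jbyteArray) instead of a jstring, the class and method names are hypothetical, and the digits are assumed to be plain ASCII.

```java
import java.nio.charset.StandardCharsets;

class CardNumberDecoder {
    // Decodes only the first 'length' bytes, ignoring whatever follows in the buffer.
    static String decode(byte[] raw, int length) {
        return new String(raw, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Simulated native result: 20 digits followed by stray bytes and no terminator.
        byte[] raw = "92311221679609987114".getBytes(StandardCharsets.US_ASCII);
        byte[] withGarbage = new byte[raw.length + 3];
        System.arraycopy(raw, 0, withGarbage, 0, raw.length);
        withGarbage[20] = 0x7F;
        withGarbage[21] = 0x01;
        withGarbage[22] = 0x02;

        System.out.println(decode(withGarbage, 20)); // 92311221679609987114
    }
}
```

Because the length is passed explicitly, the result does not depend on what happens to lie in memory after the 20th byte.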


Java Internationalization: Converting to and from Unicode

Internally, Java keeps all strings in Unicode (as UTF-16 code units). Since not all text received from users or the outside world is in Unicode, your application may have to convert from non-Unicode to Unicode. Additionally, when the application outputs text, it may have to convert the internal Unicode format to whatever format the outside world needs.

Java has two main mechanisms you can use to convert text to and from Unicode: the String class, and the Reader and Writer classes.

I will explain both in the sections below.

UTF-8

First of all, I would like to clarify that Unicode consists of a set of "code points", each of which is a numerical value that corresponds to a given character. There are several ways to "encode" these code points (numerical values) into bytes. The two most common ones are UTF-8 and UTF-16. In this tutorial I will only show examples of converting to UTF-8, since this seems to be the most commonly used Unicode encoding.
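To make the difference between a code point and its encodings concrete, here is a small sketch that counts how many bytes a single code point occupies in UTF-8 and in UTF-16. The class and method names are hypothetical, and UTF-16BE is used so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class CodePointDemo {
    // Number of bytes the code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the code point occupies when encoded as UTF-16 (big endian, no BOM).
    static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xA3, 0x20AC, 0x1F600}; // A, pound, euro, an emoji
        for (int cp : samples) {
            System.out.printf("U+%04X utf8=%d utf16=%d%n", cp, utf8Length(cp), utf16Length(cp));
        }
    }
}
```

The same numerical value therefore ends up as one to four bytes in UTF-8 and as two or four bytes in UTF-16, which is why the charset must always be named when converting between bytes and strings.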

Converting to and from Unicode UTF-8 Using the String Class

You can use the String class to convert a byte array to a String instance. You do so using the constructor of the String class. Here is an example:

    byte[] bytes = new byte[10];

    String str = new String(bytes, Charset.forName("UTF-8"));

    System.out.println(str);

This example first creates a byte array. The byte array does not actually contain any sensible data, but for the sake of the example, that does not matter. The example then creates a new String, passing the byte array and the character set of the characters in the byte array as parameters to the constructor. The String constructor will then convert the bytes from the character set of the byte array to Unicode.

You can convert the text of a String to another format using the getBytes() method. Here is an example:

    byte[] bytes = str.getBytes(Charset.forName("UTF-8"));

You can also write Unicode characters directly in strings in the code, by escaping them with \u. Here is an example:

    // The Danish letters Æ Ø Å
    String myString = "\u00C6\u00D8\u00C5";
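Putting the constructor and getBytes() together, the following sketch round-trips the Danish letters through UTF-8 bytes. The class and method names are hypothetical; StandardCharsets.UTF_8 (available since Java 7) is used here as an equivalent of Charset.forName("UTF-8").

```java
import java.nio.charset.StandardCharsets;

class StringRoundTrip {
    // String (Unicode) to UTF-8 bytes.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes back to a String.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String danish = "\u00C6\u00D8\u00C5";   // three characters
        byte[] bytes = encode(danish);
        System.out.println(danish.length());    // 3
        System.out.println(bytes.length);       // 6, each letter takes two bytes in UTF-8
        System.out.println(decode(bytes).equals(danish)); // true
    }
}
```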

Converting to and from Unicode UTF-8 Using the Reader and Writer Classes

The Reader and Writer classes are stream-oriented classes that enable a Java application to read and write streams of characters. Both classes are explained in my Java IO tutorial.

Here is an example that uses an InputStreamReader to convert from a certain character set (UTF-8) to unicode:

    InputStream inputStream = new FileInputStream("c:\\data\\utf-8-text.txt");
    Reader reader = new InputStreamReader(inputStream, Charset.forName("UTF-8"));

    int data = reader.read();
    while (data != -1) {
        char theChar = (char) data;
        data = reader.read();
    }

    reader.close();

This example creates a FileInputStream and wraps it in an InputStreamReader. The InputStreamReader is told to interpret the bytes in the file as UTF-8 characters. This is done using the second constructor parameter of the InputStreamReader class.

Here is an example writing a stream of characters back out to UTF-8:

    OutputStream outputStream = new FileOutputStream("c:\\data\\output.txt");
    Writer writer = new OutputStreamWriter(outputStream, Charset.forName("UTF-8"));

    writer.write("Hello World");
    writer.close();

This example creates an OutputStreamWriter which converts the string written through it to the UTF-8 character set.
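The two stream examples can be combined into a round trip. The sketch below is a variation under stated assumptions rather than the tutorial's own code: it uses in-memory streams instead of files so it runs anywhere, try-with-resources so the streams are closed even on failure, and hypothetical class and method names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class StreamRoundTrip {
    // Writes the text through an OutputStreamWriter and returns the UTF-8 bytes.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads UTF-8 bytes through an InputStreamReader, one char at a time.
    static String readUtf8(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int data;
            while ((data = reader.read()) != -1) {
                sb.append((char) data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hello World \u00C6\u00D8\u00C5";
        byte[] bytes = writeUtf8(text);
        System.out.println(bytes.length);                 // 18: 15 chars, 3 of them two bytes each
        System.out.println(readUtf8(bytes).equals(text)); // true
    }
}
```

Swapping the in-memory streams for FileInputStream and FileOutputStream gives the file-based behaviour described above.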
