Xlib tutorial part 8 -- a different way to reach wide characters

by Alan at Mon 2nd Mar 2009 1:00AM EST

Hello,

It's unfortunate, but even today, almost twenty years after XFontSet was created, it sometimes doesn't work properly: the right fonts might not be installed, or the locale might not be set up correctly. So we're going to try another tack, going back to XFontStruct and friends, but using XChar2b to reach beyond the ASCII characters.

First we'll need a way of translating UTF8 to XChar2b. This relies on knowing how UTF8 is structured. ASCII characters map to themselves. For everything else up to 21 bits, the first byte has its high x bits set to 1, followed by a 0, where x is the number of bytes used to encode the character in UTF8. Characters needing 8-11 bits take two bytes, 12-16 bits take three bytes and 17-21 bits take four bytes. The remaining bits in the first byte are the high bits of the character code. Each subsequent byte has 1 in the high bit and 0 in the next highest bit, followed by 6 bits of the character code. Unfortunately, XChar2b only supports up to 16 bits, so characters that need 17-21 bits will be ignored. That excludes everything outside the Basic Multilingual Plane, such as the rarer Chinese, Japanese and Korean ideographs; the common CJK characters fit in 16 bits.
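
As a small illustration of the lead-byte rule described above, here is a sketch that classifies a byte by its high bits. The name utf8_seq_len is made up for this example and is not part of Xlib or of the tutorial's code.

```c
/* Return the total length in bytes of the UTF8 sequence that starts with
 * lead byte c, or 0 if c cannot start a sequence.
 * utf8_seq_len is a hypothetical helper name used only in this sketch. */
int utf8_seq_len(unsigned char c)
{
	if (c < 0x80) return 1;			/* 0xxxxxxx: ASCII */
	if ((c & 0xE0) == 0xC0) return 2;	/* 110xxxxx: 8-11 bits */
	if ((c & 0xF0) == 0xE0) return 3;	/* 1110xxxx: 12-16 bits */
	if ((c & 0xF8) == 0xF0) return 4;	/* 11110xxx: 17-21 bits */
	return 0;				/* 10xxxxxx (continuation) or 11111xxx */
}
```

A return value of 4 marks exactly the characters that XChar2b cannot hold.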


	...
int utf8toXChar2b(XChar2b *output_r, int outsize, const char *input, int inlen){
	int j, k;
	for(j =0, k=0; j < inlen && k < outsize; j ++){
		unsigned char c = input[j];
		if (c < 128)  {
			output_r[k].byte1 = 0;
			output_r[k].byte2 = c; 
			k++;
		} else if (c < 0xC0) {
			/* we're inside a character we don't know  */
			continue;
		} else switch(c&0xF0){
		case 0xC0: case 0xD0: /* two bytes 5+6 = 11 bits */
			if (j+1 >= inlen){ return k; } /* truncated sequence */
			output_r[k].byte1 = (c&0x1C) >> 2;
			j++;
			output_r[k].byte2 = ((c&0x3) << 6) + (input[j]&0x3F);
			k++;
			break;
		case 0xE0: /* three bytes 4+6+6 = 16 bits */ 
			if (j+2 >= inlen){ return k; } /* truncated sequence */
			j++;
			output_r[k].byte1 = ((c&0xF) << 4) + ((input[j]&0x3C) >> 2);
			c = input[j];
			j++;
			output_r[k].byte2 = ((c&0x3) << 6) + (input[j]&0x3F);
			k++;
			break;
		case 0xF0:
			/* the character uses more than 16 bits */
			continue;
		}
	}
	return k;
}
	...
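
To check the bit shuffling by hand, here is a sketch that packs a single two-byte or three-byte sequence using the same masks and shifts as the function above. Char2b is a stand-in with the same two fields as XChar2b, so the example builds without the X11 headers; pack2 and pack3 are names invented for this illustration.

```c
/* Stand-in for XChar2b from <X11/Xlib.h> (same two fields), so that this
 * sketch compiles without X11. */
typedef struct {
	unsigned char byte1;	/* high 8 bits of the character code */
	unsigned char byte2;	/* low 8 bits of the character code */
} Char2b;

/* Pack a two-byte sequence (110xxxxx 10xxxxxx): 5+6 = 11 bits. */
Char2b pack2(unsigned char b0, unsigned char b1)
{
	Char2b out;
	out.byte1 = (unsigned char)((b0 & 0x1C) >> 2);
	out.byte2 = (unsigned char)(((b0 & 0x03) << 6) | (b1 & 0x3F));
	return out;
}

/* Pack a three-byte sequence (1110xxxx 10xxxxxx 10xxxxxx): 4+6+6 = 16 bits. */
Char2b pack3(unsigned char b0, unsigned char b1, unsigned char b2)
{
	Char2b out;
	out.byte1 = (unsigned char)(((b0 & 0x0F) << 4) | ((b1 & 0x3C) >> 2));
	out.byte2 = (unsigned char)(((b1 & 0x03) << 6) | (b2 & 0x3F));
	return out;
}
```

For example, the euro sign U+20AC is the UTF8 bytes E2 82 AC, and pack3(0xE2, 0x82, 0xAC) gives byte1 = 0x20 and byte2 = 0xAC. Likewise U+00E9 is C3 A9, and pack2(0xC3, 0xA9) gives byte1 = 0x00 and byte2 = 0xE9.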

Most of the other differences since the last section come from moving back from XFontSet to XFontStruct *. The part that is really new is in the main_loop() function.


	...
	int strlength = strlen(text);

	/* may be too big, but definitely big enough */
	string = malloc(sizeof(*string) * strlen(text));
	strlength = utf8toXChar2b(string, strlength, text, strlength);
	printf("%d\n", strlength);

	text_width = XTextWidth16(font, string, strlength);
	printf("%d\n", text_width);
	font_ascent = font->ascent;

	/* as each event that we asked about occurs, we respond. */
	...

What we're doing here is converting our string to an array of XChar2b, calculating its width in pixels and storing the font ascent. Notice the use of XTextWidth16() instead of XTextWidth() or Xutf8TextEscapement().
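
The buffer is sized by the byte length of the UTF8 text, which is an upper bound because every character takes at least one byte. If an exact count were wanted, one option is to count the bytes that are not continuation bytes. This is only a sketch: utf8_count is a hypothetical helper and it assumes the input is well formed.

```c
#include <stddef.h>

/* Count the characters in a UTF8 string by counting every byte that is
 * not a continuation byte (10xxxxxx). Assumes well-formed input.
 * utf8_count is a name invented for this sketch. */
size_t utf8_count(const char *s, size_t len)
{
	size_t i, n = 0;
	for (i = 0; i < len; i++) {
		if (((unsigned char)s[i] & 0xC0) != 0x80)
			n++;
	}
	return n;
}
```

This count includes four-byte characters, which the conversion function skips, so it is still a safe size for the XChar2b array.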

Then in the response to an Expose event,


		...
		case Expose:
			if (ev.xexpose.count > 0) break;
			XDrawLine(dpy, ev.xany.window, pen, 0, 0, width/2-text_width/2, height/2);
			XDrawLine(dpy, ev.xany.window, pen, width, 0, width/2+text_width/2, height/2);
			XDrawLine(dpy, ev.xany.window, pen, 0, height, width/2-text_width/2, height/2);
			XDrawLine(dpy, ev.xany.window, pen, width, height, width/2+text_width/2, height/2);
			textx = (width - text_width)/2;
			texty = (height + font_ascent)/2;
			XDrawString16(dpy, ev.xany.window, pen, textx, texty, string, strlength);
			break;
		...

We are again drawing the lines as we did before, but now use XDrawString16() instead of Xutf8DrawString() or XDrawString(). Download the full code.
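
The centering arithmetic can be pulled out and checked on its own. In the sketch below, center_x and baseline_y are names made up for this illustration. The y coordinate passed to XDrawString16() is the text baseline, so adding half the ascent to half the height places the ascent box evenly around the middle of the window.

```c
/* Left edge that centres a run of text text_width pixels wide in a
 * window width pixels wide. Hypothetical helper for this sketch. */
int center_x(int width, int text_width)
{
	return (width - text_width) / 2;
}

/* Baseline that centres the font's ascent box vertically: the box then
 * spans (height - font_ascent)/2 to (height + font_ascent)/2. */
int baseline_y(int height, int font_ascent)
{
	return (height + font_ascent) / 2;
}
```

With a 400 by 300 window, text 100 pixels wide and an ascent of 12, this gives textx = 150 and texty = 156, and the four lines end at x = 150 and x = 250, right at the edges of the text.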

And that's it. Next lesson will be a bit more interesting since we're going to start interacting with the user.

Things to try:

Consider ways, even from here, to be able to display text that does not fit within the first 65536 code points of Unicode.
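
One possible first step, offered as a sketch rather than an answer: decode the four-byte sequences into full code points instead of skipping them. The name utf8_decode4 is invented here, and what to do with a value that does not fit in XChar2b is still the open part of the exercise.

```c
/* Decode a four-byte UTF8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
 * into a code point of up to 21 bits. Returns -1 if the four bytes do
 * not have that shape. utf8_decode4 is a hypothetical helper. */
long utf8_decode4(const unsigned char *b)
{
	if ((b[0] & 0xF8) != 0xF0)
		return -1;
	if ((b[1] & 0xC0) != 0x80 || (b[2] & 0xC0) != 0x80 || (b[3] & 0xC0) != 0x80)
		return -1;
	return ((long)(b[0] & 0x07) << 18) | ((long)(b[1] & 0x3F) << 12)
	     | ((long)(b[2] & 0x3F) << 6) | (long)(b[3] & 0x3F);
}
```

For instance, the bytes F0 9F 98 80 decode to 0x1F600, which needs 17 bits.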

Comments on Xlib tutorial part 8 -- a different way to reach wide characters
by rui at Wed 24th Jun 2009 3:48PM

Hi, is it right to say that the utf8toXChar2b() method converts from UTF-8 to ucs2 encoding? I have truetype unicode fonts in ucs2 format on aix and solaris, but the methods don't work. However, if I get the iso10646 fonts then it works. How can I convert from utf-8 to ucs2 and then to xchar2b, or from any encoding to ucs2 and then xchar2b? I am also using iconv at the moment, but is it safe to take the ucs2 encoding as char *? Regards, rui
by Alan at Wed 24th Jun 2009 4:30PM

Hi Rui, utf8toXChar2b converts a utf8 encoded string directly into the XChar2b encoding that the X consortium created (and which vaguely looks like UTF16). Don't try using ucs2 as an intermediate step. It won't work. Using char * to point to a ucs2 (essentially UTF16 without surrogate pairs) encoded string will almost never do what you expect. ucs2 stores each character in two bytes, and one of them is often zero. All the standard character routines (such as strlen, strcpy etc) expect one byte per character and stop at the first zero byte (though multiple bytes per character, as in utf8, may work for some). You could use wchar_t * to point to a UTF16 encoded string, but that might not get you what you want either, and then you have to find all the routines that work with it. If you have iso10646 fonts, you probably want to use Xutf8DrawString() on your original utf8 text (as per the previous post). I hope that helps. Let me know if you're still confused and I'll see if I can explain a bit better.
by rui at Wed 24th Jun 2009 4:57PM

Thanks alan. I hope you can help me with a problem I am having at the moment with unicode fonts: https://www.ibm.com/developerworks/forums/thread.jspa?threadID=267456&tstart=0 I am having the same problem on sun as well, but no problem on the other systems. I have used your methods after converting from windows CP 125? encoding to utf-8 and they work super great, but I am stuck with sun solaris and aix. I think when you convert utf8 to xchar2b, it is a sort of direct conversion to ucs-2 format, but storing it in two bytes instead of a short. Can you send your email address to my email, so that I can discuss my problem, if that's ok with you? Regards, Raja
