Thursday, January 05, 2006

There Be Dragons

It's 4 AM, and the night is still and quiet. I'm sitting in front of my computer, lit by the pale light of the LCD screen, and I'm losing my mind.

My life for the past several days (excluding New Year's) has been no different from tonight. Thinking, coding, debugging, debugging, debugging, ...

I'm working on an embedded environment project for my diploma thesis. I have infinite power over the computer; my software need not obey anyone's laws except my own, which I have to keep in my head at all times. Can't write them down, they're changing too swiftly. It's hard, and it's fun. The most challenging part is that I have to be aware of the entire picture, down to every detail: from the overall design down to stack allocation and word width issues.

And it gets ugly. I'm using some 3rd-party components (like libc), which aren't exactly bug-free. Just a few minutes ago, after a few hours of poking around in my code (debugger? what debugger? I'm lucky I have printf() :-) I traced the error to the strdup() and malloc() implementations. Great... So, frustrated and tired, I just wiped out the entire malloc code and wrote my own, which leaks everything (no free()) but at least correctly produces usable blocks of memory. After I did this, I tried to remember what I was working on when the problem occurred... hm, who knows. I'm tired and I don't want to play any more...
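For the curious, here is a minimal sketch of what a leak-everything allocator of this kind can look like. It is not the code from the project: the names bump_malloc, bump_free and bump_strdup, the 64 KB arena and the 16-byte alignment are all assumptions made for illustration, and it relies on C11's _Alignas to align the arena.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define ARENA_SIZE  (64 * 1024)
#define ARENA_ALIGN 16            /* must be a power of two */

/* One static block of memory that is handed out piece by piece. */
static _Alignas(ARENA_ALIGN) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Return the next aligned chunk of the arena, or NULL when it runs out. */
void *bump_malloc(size_t size)
{
    if (size == 0)
        size = 1;                 /* keep every returned pointer distinct */
    if (size > ARENA_SIZE)
        return NULL;              /* also rules out overflow in the rounding */

    /* Round the request up to a multiple of ARENA_ALIGN. */
    size_t rounded = (size + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *block = &arena[arena_used];
    arena_used += rounded;
    return block;
}

/* Deliberately a no-op: nothing is ever reclaimed. */
void bump_free(void *ptr)
{
    (void)ptr;
}

/* strdup() built on top of the allocator above. */
char *bump_strdup(const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = bump_malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

The appeal of something like this when debugging is that there is almost nothing in it that can go wrong: no free lists, no block headers, no coalescing. The price is that memory use only ever grows, so it is a stopgap rather than a replacement.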

It's hard, but it's fun. There's no hacking like kernel hacking.

3 Comments:

Blogger zvrba said...

All too often somebody claims to have found a "bug" in libc (which libc, BTW?) malloc(). What they forget to take into account is that malloc() is one of the most exercised parts of the library and if it were really bugged, many programs relying on libc would not work.

So, although it is not impossible that malloc() really is bugged, it is highly improbable. In all cases when someone has claimed to find a "bug" in malloc(), it turned out that their own code was at fault.

10:20 AM  
Blogger Senko said...

True, most users rarely find real bugs in libc code. That's why it took me so long to hunt it down; I expected I was doing the wrong thing.

The libc in question is part of the Kenge library - I'm working on an L4-based system so it's a near perfect hit.

However, that means that this libc is not in very widespread use, and that not all the bugs in it have been fixed. I've already found and fixed a bug which caused the I/O subsystem to try to (mutex) lock itself twice before doing any output.

Of course, many of these problems could be caused by the fact that I'm using the library for my own purposes (as I said, it's a near perfect hit - not a perfect hit). So, maybe it clashes with some of my code in some weird way...

That's the hardest thing - the errors can happen anywhere, there's no "firm ground" that you know works reliably. You have to (double) check everything.

3:27 PM  
Anonymous Anonymous said...

>There's no hacking like kernel hacking.

Heh. There is always GStreamer hacking... sigh :)

12:30 AM  
