LCDproc development and user support list

Text archives Help


[Lcdproc] CwLnx display corruption - possible cause and fix


Chronological Thread 
  • From: dplatt AT radagast.org (Dave Platt)
  • Subject: [Lcdproc] CwLnx display corruption - possible cause and fix
  • Date: Fri Apr 27 23:08:01 2007

Hi, all. I've been using LCDProc for several years, first
on a home-kluged parallel-port HD44780, and more recently
on a CwLinux 12232 hooked to /dev/ttyS1.

I recently upgraded from the older release I'd been using
(0.4.something) to 0.5.1, as I wanted to start using the
more flexible button support for the CwLinux display. I
immediately began observing severe corruption of the
display, both when it switched screens/client, and during
certain operations such as the drawing of horizontal
bargraphs. At one point, the 12232 even switched into a
strange, mutant 2-line/4-line mode with the bottom two lines
semi-mirroring the first two, with a gray overlay... this
required a complete power-cycle to reset.

On looking at the CwLnx.c driver I noted that it had been
extensively modified to support the 1602 model, and that
there was some mention in the source code of the sort of display
corruption I was seeing.

I've dug through the code, and have found both what I believe to
be the cause of the display corruption, and a number of things
I don't understand. I have an effective fix (I believe) for
the display corruption, and I'm coming here to try to find
a fix for my lack of understanding :-)

I believe that the cause of the display corruption is due to
a coding/design problem which misunderstands how non-blocking
writes occur. In the Write_LCD routine, the code appears to
assume that a nonblocking write() call will either:

[1] write everything it was told to, and return a value
matching the byte count it was told to write, or
[2] return some other value, and not write anything at all.

This may be true for some device drivers, but it is not
true in general. If a device's output buffer is nearly but
not entirely full, the write() call may accept *some* but not
all of the data it was asked to write, and will return the number
of bytes written. If the buffer was entirely full, it'll return
-1 and set errno to EAGAIN. The Write_LCD() logic doesn't account
for the "partial write" case - it assumes that nothing was written,
sleeps for a while, and then retries the whole write.

As a result, if the CwLinux driver tries to write a large amount
of data (e.g. a bunch of custom-character patterns followed by
a bargraph) and the serial-driver buffer fills up, the Write_LCD()
routine may mistakenly write the first part of a buffer to the
device two or more times, sleeping for a short period between
writes. Since the DELAY sleep time is only 2 milliseconds (enough
time for only 2 characters at 9600 bits/second), it could end up
writing the first two or three characters in the buffer to the
device over and over and over again, until the 30-iteration
retry loop bails out. I believe that this is the fundamental
cause of the display corruption.

The fix for this is relatively simple... Write_LCD() just needs
to handle the "partial write" case, incrementing "c" and decrementing
"size", and looping through its write/test/sleep loop until "size"
becomes zero. I also felt it helpful to increase the sleep time to
20 milliseconds - there doesn't seem to be much sense in trying
to be extremely aggressive about pushing more data into the
buffer for such a slow device. I've tested this approach, and
the display corruption is entirely gone - the display is one again
as reliable as it was in 0.4.whatever. I haven't tested the change
on a 1602 display or with a USB/serial interface but I don't forsee
any problems in those environments.

Now, as to the things I don't understand: I don't understand what's
going on when the port is first opened and the device is first
accessed.

The code parses the speed value from the config file, and confirms
that it's either 19200 or 9600. It then opens the port (sending
a long-space BREAK while doing so... or at least trying to...
more on that in a moment), grabs the current serial-port settings
(but never doing anything with what it saved), throws away the
user-specified speed value (!), configures the port to talk at
19200 bits/second, sends a command to the display to tell it to
switch to 9600 bits/second, closes the port, re-opens the port
(sending another long-space BREAK, or at least trying to),
and configures the port for 9600 bit/second operation. The
user-specified baud rate, although validated, is never actually
used, and all communication to the display seems to take place
at 9600 bits/second.

To make matters even funkier... when the port is opened, there's
some logic in place to send a long-space BREAK (by setting the
baud rate to 0, sleeping, and then setting it back again).
However, the sleep-delay for the long-space was reduced in 0.5.x
from several milliseconds (which would actually be a legitimate
long-space at either 19200 or 9600), to only a microsecond or
so! If the kernel actually honors such a short sleep time,
this won't result in a legitimate long-space - it'll generate a
very short "runt pulse" on the serial line, which will look like
noise to the display and may very well cause it to miss the first
few legitimate characters sent to it.

What the hey!?!??

I can't find anything in the CwLinux data sheet which speaks to
what the display would do with a long-space/BREAK/framing-error
on the serial line. It might act as a soft-reset, or might be
ignored, or might be treated as noise and do something
indeterminate.

Can anybody clarify what's happening here, and why the CwLinux driver
doesn't seem to respect the user-specified baud rate?

I also noticed that the framebuffer implementation in the driver
(intended to optimize writes to the display) may actually be making
performance worse in some cases. It probably shouldn't attempt
to skip/move over short (one- to 4-character) sequences of unchanged
characters, as the "move insertion point" command takes 4 bytes
to send. I might rewrite it if I have the opportunity.

Can anybody clarify the above matters?

I'll look into getting ssh/developer access to the CVS archive so
that I can submit a patch to the Write_LCD() routine for people
to test.






Archive powered by MHonArc 2.6.18.

Top of page