LCDproc development and user support list

Text archives Help


[Lcdproc] CwLnx display corruption - possible cause and fix


Chronological Thread 
  • From: peter AT adpm.de (Peter Marschall)
  • Subject: [Lcdproc] CwLnx display corruption - possible cause and fix
  • Date: Sat Apr 28 14:01:03 2007

Hi Dave,

On Saturday, 28. April 2007 01:07, Dave Platt wrote:
> Hi, all. I've been using LCDProc for several years, first
> on a home-kluged parallel-port HD44780, and more recently
> on a CwLinux 12232 hooked to /dev/ttyS1.
>
> I recently upgraded from the older release I'd been using
> (0.4.something) to 0.5.1, as I wanted to start using the
> more flexible button support for the CwLinux display. I
> immediately began observing severe corruption of the
> display, both when it switched screens/client, and during
> certain operations such as the drawing of horizontal
> bargraphs. At one point, the 12232 even switched into a
> strange, mutant 2-line/4-line mode with the bottom two lines
> semi-mirroring the first two, with a gray overlay... this
> required a complete power-cycle to reset.
>
> On looking at the CwLnx.c driver I noted that it had been
> extensively modified to support the 1602 model, and that
> there was some mention in the source code of the sort of display
> corruption I was seeing.

This is very likely to be caused by some "optimizations" I made
after the inclusion of the 1602 support to clean up the driver
a bit.
IIRC Write_LCD() previously was only called with one character at
a time. I changed the calls to use strings, but did not have a too
close look at the implementation in Write_LCD(), which I considered
(at my short glances) apt for the use with multiple chars.

Obviously it isn't ;-(
My bad.

> I've dug through the code, and have found both what I believe to
> be the cause of the display corruption, and a number of things
> I don't understand. I have an effective fix (I believe) for
> the display corruption, and I'm coming here to try to find
> a fix for my lack of understanding :-)
>
> I believe that the cause of the display corruption is due to
> a coding/design problem which misunderstands how non-blocking
> writes occur. In the Write_LCD routine, the code appears to
> assume that a nonblocking write() call will either:
>
> [1] write everything it was told to, and return a value
> matching the byte count it was told to write, or
> [2] return some other value, and not write anything at all.

That explains why it works for the 1-char case.
Here we only have to distinguish between 1 (= success)
and all other cases that are considered as EAGAIN.

With multiple chars the 1 - (size-1) case is falsely
treated like EAGAIN.

> This may be true for some device drivers, but it is not
> true in general. If a device's output buffer is nearly but
> not entirely full, the write() call may accept *some* but not
> all of the data it was asked to write, and will return the number
> of bytes written. If the buffer was entirely full, it'll return
> -1 and set errno to EAGAIN. The Write_LCD() logic doesn't account
> for the "partial write" case - it assumes that nothing was written,
> sleeps for a while, and then retries the whole write.
>
> As a result, if the CwLinux driver tries to write a large amount
> of data (e.g. a bunch of custom-character patterns followed by
> a bargraph) and the serial-driver buffer fills up, the Write_LCD()
> routine may mistakenly write the first part of a buffer to the
> device two or more times, sleeping for a short period between
> writes. Since the DELAY sleep time is only 2 milliseconds (enough
> time for only 2 characters at 9600 bits/second), it could end up
> writing the first two or three characters in the buffer to the
> device over and over and over again, until the 30-iteration
> retry loop bails out. I believe that this is the fundamental
> cause of the display corruption.
>
> The fix for this is relatively simple... Write_LCD() just needs
> to handle the "partial write" case, incrementing "c" and decrementing
> "size", and looping through its write/test/sleep loop until "size"
> becomes zero. I also felt it helpful to increase the sleep time to
> 20 milliseconds - there doesn't seem to be much sense in trying
> to be extremely aggressive about pushing more data into the
> buffer for such a slow device. I've tested this approach, and
> the display corruption is entirely gone - the display is one again
> as reliable as it was in 0.4.whatever. I haven't tested the change
> on a 1602 display or with a USB/serial interface but I don't forsee
> any problems in those environments.

Please post the patch to the ML so that I can be fixed in CVS.

> Now, as to the things I don't understand: I don't understand what's
> going on when the port is first opened and the device is first
> accessed.
>
> The code parses the speed value from the config file, and confirms
> that it's either 19200 or 9600. It then opens the port (sending
> a long-space BREAK while doing so... or at least trying to...
> more on that in a moment), grabs the current serial-port settings
> (but never doing anything with what it saved), throws away the
> user-specified speed value (!), configures the port to talk at
> 19200 bits/second, sends a command to the display to tell it to
> switch to 9600 bits/second, closes the port, re-opens the port
> (sending another long-space BREAK, or at least trying to),
> and configures the port for 9600 bit/second operation. The
> user-specified baud rate, although validated, is never actually
> used, and all communication to the display seems to take place
> at 9600 bits/second.
>
> To make matters even funkier... when the port is opened, there's
> some logic in place to send a long-space BREAK (by setting the
> baud rate to 0, sleeping, and then setting it back again).
> However, the sleep-delay for the long-space was reduced in 0.5.x
> from several milliseconds (which would actually be a legitimate
> long-space at either 19200 or 9600), to only a microsecond or
> so! If the kernel actually honors such a short sleep time,
> this won't result in a legitimate long-space - it'll generate a
> very short "runt pulse" on the serial line, which will look like
> noise to the display and may very well cause it to miss the first
> few legitimate characters sent to it.

This really sounds weird.
I haven't touched that part of the code and have no idea too why
it is coded this way.

> What the hey!?!??
>
> I can't find anything in the CwLinux data sheet which speaks to
> what the display would do with a long-space/BREAK/framing-error
> on the serial line. It might act as a soft-reset, or might be
> ignored, or might be treated as noise and do something
> indeterminate.
>
> Can anybody clarify what's happening here, and why the CwLinux driver
> doesn't seem to respect the user-specified baud rate?

Sorry no idea. Same for the double open or the too short SETUP_DELAY.

If you have patches that make the initialization part of the code
easier to understand, don't hesitate to send them.

I doubt that a cleaner initialization that works on a 12232 might
negatively affect the initialization of a 1602. The 1602 even works
with the current code. So it should be able to profit from a saner
initialization too.

> I also noticed that the framebuffer implementation in the driver
> (intended to optimize writes to the display) may actually be making
> performance worse in some cases. It probably shouldn't attempt
> to skip/move over short (one- to 4-character) sequences of unchanged
> characters, as the "move insertion point" command takes 4 bytes
> to send. I might rewrite it if I have the opportunity.

Hey, you're right.
I haven't thought of that before.
This optimization might apply (with varying thresholds of the move command)
for other drivers as well.

So, same as above: patches are very welcome.

Thanks for your insights.
Peter
--
Peter Marschall
peter AT adpm.de




Archive powered by MHonArc 2.6.18.

Top of page