Tim,
For the 41CL I had to mimic the power-down modes of the original 41C series as best I could. I think I ended up with something quite similar to what was used. Basically there are three states: Deep Sleep, Light Sleep, and Running.
In Deep Sleep, power (Vdd) is shut off to the majority of the logic, which in the 41CL is a Flash-based FPGA. The program storage, also being Flash, is powered down, and the main voltage regulator is shut off (it still passes the unregulated battery voltage), along with the master oscillator. So the only things drawing power are: the static RAM (data storage), the CPLD (which has the wake-up logic), the RS232 chip (just in case I add serial wake-up functionality in the future), the micro-power regulators for the RAM and CPLD, and two resistive dividers in the main power supply (one for the low-level detect, and one for the voltage-level feedback).
Light Sleep is the state where the calculator is "on" but not actually doing anything. Everything is powered, but static. The oscillator is still running, but the clocks are stopped inside the FPGA. Because of the way the keyboard scanner works, all of the column drivers have to be Low in this state (to detect any keypress). That is a slight problem, because the column signals are really open-drain drivers with resistive pull-ups, and every pull-up on a Low column draws current continuously. So the pull-ups have to be as large as possible, but that conflicts with the timing for the keyboard scanner. What I ended up doing was precharging all of the column lines during the actual keyboard scan, before driving one of them. This allowed me to use megohm pullups instead of the 75K that would have been required to meet timing. That saved 370uA during Light Sleep.
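The pull-up trade-off above comes down to Ohm's law and an RC time constant, which can be sketched roughly as follows. The supply voltage, column count, exact megohm value, and line capacitance are all assumptions chosen for illustration (the post does not give them), and the helper names are hypothetical, so the printed figures will not reproduce the 370uA quoted above.

```python
# Illustrative arithmetic only. Every constant below is an ASSUMPTION,
# not a value taken from the 41CL design.
SUPPLY_V = 3.3          # assumed logic supply, in volts
NUM_COLUMNS = 8         # assumed number of keyboard columns held Low
LINE_CAP_PF = 100.0     # assumed capacitance of one column line, in pF


def pullup_current_ua(supply_v, pullup_ohms, columns):
    """Static current (uA) through the pull-ups when every column is Low."""
    return columns * supply_v / pullup_ohms * 1e6


def rc_time_constant_us(pullup_ohms, line_capacitance_pf):
    """RC time constant (us) of a released column rising through its pull-up."""
    return pullup_ohms * line_capacitance_pf * 1e-12 * 1e6


for label, ohms in (("75K", 75e3), ("1M", 1e6)):
    current = pullup_current_ua(SUPPLY_V, ohms, NUM_COLUMNS)
    tau = rc_time_constant_us(ohms, LINE_CAP_PF)
    print(f"{label}: {current:.1f} uA static, RC = {tau:.1f} us")
```

The larger resistor cuts the static current in proportion but slows the passive rise of a released column line by the same factor, which is the timing conflict that precharging the lines works around.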
I guess the bottom line is that when going for low power, try everything possible, because it all adds up in the end. The 41CL current drain isn't as low as it could be because the programmable logic vendors haven't really cared about power. I mean, the wake-up logic required to remain powered is perhaps a dozen gates, but the CPLD draws almost 100uA. If I were doing this in a custom chip that number would be less than 5uA. Even the resistive dividers in the power supply are a trade-off. I live with 3-4% accuracy on the voltages, because that lets me use higher-value resistors, saving another 10uA or more per divider.
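To make the divider saving concrete, here is a minimal sketch of the current a resistive divider draws from the rail it sits across. The 6 V input and the resistor values are assumptions picked for illustration, not the 41CL's actual component values, and `divider_current_ua` is a hypothetical helper.

```python
def divider_current_ua(input_v, r_top_ohms, r_bottom_ohms):
    """Continuous current (uA) drawn by a two-resistor divider across input_v."""
    return input_v / (r_top_ohms + r_bottom_ohms) * 1e6


# ASSUMED values: a 6 V input and two dividers with the same ratio,
# one built from resistors ten times larger than the other.
INPUT_V = 6.0
low_value = divider_current_ua(INPUT_V, 300e3, 200e3)
high_value = divider_current_ua(INPUT_V, 3e6, 2e6)

print(f"low-value divider:  {low_value:.1f} uA")
print(f"high-value divider: {high_value:.1f} uA")
print(f"saved:              {low_value - high_value:.1f} uA")
```

Scaling both resistors by the same factor keeps the divided voltage the same while reducing the current by that factor, which is where a saving of this order per divider can come from.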
Monte