CPU power management with linux and recent CPUs - Nick


CPU power management with linux and recent CPUs [Jul. 28th, 2009|10:05 pm]
Intel (and AMD) cpus have supported some power management for a while now, especially on the mobile-focused cpus. It used to be a slightly fiddly thing involving acpi states, and didn't always deliver quite the savings one might want. However, in the last few years a couple of interesting new CPU features (and matching linux kernel code) have arrived, which aim to deliver some fairly decent power savings when the cpu is idle. Handy for making your laptop battery life go longer, but also very handy for cutting your datacenter power bill!

First up, can your x86 / x64 cpu support the more interesting new cpu power features? The answer hides in the flags section of /proc/cpuinfo, if you know what to look for. The two main ones of interest are tm2 and est. My core 1 laptop cpu supports both, all the core 2 laptops I've come across do too, all my xeon powered servers from the last 6 months do, and a smattering of the ones from the 12 months before that do too. Alas none of my other servers do, as these features were slow to move from the laptop cpus to the desktop and server ones.
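
The check can be scripted by matching each flag as a whole word on the flags line. What follows is only a minimal sketch: the function names has_cpu_flag and report_power_flags, and the optional file argument (there so it can be pointed at a saved copy of cpuinfo), are made up for illustration.

```shell
# has_cpu_flag FLAG [FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line of FILE
# (default /proc/cpuinfo). Name and FILE argument are illustrative only.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Require whitespace before the flag and whitespace or end of line after,
    # so that "tm" does not match "tm2"
    grep -Eq "^flags[[:space:]]*:.*[[:space:]]${flag}([[:space:]]|\$)" "$file"
}

# report_power_flags [FILE]
# Print one line for each of the two flags discussed above.
report_power_flags() {
    for f in est tm2; do
        if has_cpu_flag "$f" "${1:-/proc/cpuinfo}"; then
            echo "$f: supported"
        else
            echo "$f: not found"
        fi
    done
}
```

Running report_power_flags with no argument reads the live /proc/cpuinfo.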

Of the two flags, est (Enhanced SpeedStep) is the more interesting, as tm2 (Thermal Monitor 2) simply delivers a better way for the CPU to slow down (and draw less power) when it gets too hot. Important to have, but ideally not something you'll normally be using! (As an aside, I'd recommend reading this article if you want to know more on EST, TM1, TM2, and especially see pretty graphs of what happens to the cpu throughput when you stop the fans etc).

With Enhanced SpeedStep (est), you can control the clockspeed of the cpu, picking the suitable one (from a list, the exact number of steps depending on the cpu model), so you can trade between power use and the amount of work that can be done every second. With the right modules loaded on linux, the cpu will throttle back its clockspeed when the load is low. As load increases, the clockspeed is increased to meet demand, and when load drops, the cpu is scaled back again. Magic!

In order for this to work, we need three sets of modules loaded. The first is the basic acpi interface to the processor, then the base speedstep modules (which deliver us an exciting set of entries under /sys/), then finally the ondemand governor (which does the magic of changing the cpu speed automatically with load). While you can go for a userspace governor, and change the speed yourself / via a daemon, almost everyone recommends that when using a newer kernel, you just let the ondemand governor do its thing, as so much work has gone into making it "just work".

The modules you'll probably want are: processor acpi-cpufreq cpufreq_ondemand cpufreq_userspace cpufreq_conservative cpufreq_powersave . The latest RHEL 4 kernels (2.6.9-89) seem to have all of these backported, RHEL 5 certainly has them all, and debian stable and ubuntu have them in too. If you're not sure, I'd say go with at least a 2.6.2x kernel so you can ensure you get all the best new bits.
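
To see which of those modules are not currently loaded, something like the sketch below can help. The function name missing_modules and its optional file argument are invented for this example. Note that /proc/modules lists names with underscores (so acpi-cpufreq shows up as acpi_cpufreq), and that anything compiled directly into the kernel will not be listed there at all, so a name being reported does not necessarily mean the feature is unavailable.

```shell
# missing_modules [FILE]
# Print each wanted module that is not listed in FILE (default /proc/modules).
# Name and FILE argument are illustrative only.
missing_modules() {
    file=${1:-/proc/modules}
    for m in processor acpi-cpufreq cpufreq_ondemand cpufreq_userspace \
             cpufreq_conservative cpufreq_powersave; do
        # /proc/modules uses underscores, so normalise any hyphens first
        name=$(echo "$m" | tr '-' '_')
        # Each line starts with the module name followed by a space
        grep -q "^$name " "$file" || echo "$m"
    done
    return 0
}
```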

Loading them on RHEL 4, I generally sling the magic lines in /etc/rc.local. For RHEL 5, I normally create a new file /etc/sysconfig/modules/powersaving.modules and put them there (plus make it executable). For Debian and Ubuntu, I create a new file /etc/rc.boot/power-management, make it executable and put the commands there. The snippet of shell code you'll want is:
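# Load the acpi processor interface and the cpufreq driver first
for i in processor acpi-cpufreq ; do
        modprobe $i >/dev/null 2>&1
done
# Then load the governors
for i in cpufreq_ondemand cpufreq_userspace cpufreq_conservative cpufreq_powersave; do
        modprobe $i >/dev/null 2>&1
done
# Select the ondemand governor (repeat for cpu1, cpu2 etc. if you have them)
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor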
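
The echo above only writes to cpu0's file. On a machine with several cpus, a loop over every cpuN directory saves some typing. This is just a sketch: the function name set_governor_all and the optional root argument (handy for trying it against a scratch directory) are invented here.

```shell
# set_governor_all GOVERNOR [CPU_ROOT]
# Write GOVERNOR into every cpuN/cpufreq/scaling_governor under CPU_ROOT
# (default /sys/devices/system/cpu). Name and CPU_ROOT are illustrative only.
set_governor_all() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # If nothing matched, the pattern is left unexpanded, so skip it
        [ -e "$f" ] || continue
        echo "$gov" > "$f" || return 1
    done
    return 0
}
```

Called as root with "set_governor_all ondemand", it should leave every cpu that has a cpufreq directory on the ondemand governor.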

With those commands in and run, you should discover the directory /sys/devices/system/cpu/cpu0/cpufreq , and if you have multiple cpus, /sys/devices/system/cpu/cpu1/cpufreq etc. Within those directories are several files that tell you about your cpu scaling and, with the userspace governor, let you change it. scaling_available_frequencies gives the list of the different frequencies your cpu can run at (I have one machine with only the one speed, which is really rather helpful, but most have 4 or 5 different speeds). scaling_min_freq and scaling_max_freq tell you the outer bounds of the scaling range. scaling_available_governors lists the governors available (largely dependent on the loaded modules), and scaling_governor lets you get/set the current one. As I've mentioned before, go for ondemand unless you have a really strong argument otherwise!
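
For a quick overview of those files across all cpus, a small loop does the job. A rough sketch, with the function name show_cpufreq and its optional root argument made up for the example:

```shell
# show_cpufreq [CPU_ROOT]
# Print the main scaling files for each cpu under CPU_ROOT
# (default /sys/devices/system/cpu). Name and argument are illustrative only.
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue
        cpu=${d%/cpufreq}      # strip the trailing /cpufreq
        echo "${cpu##*/}:"     # keep just the cpuN part
        for name in scaling_governor scaling_available_governors \
                    scaling_available_frequencies \
                    scaling_min_freq scaling_max_freq; do
            # Not every file exists on every setup, so skip missing ones
            if [ -r "$d/$name" ]; then
                echo "  $name: $(cat "$d/$name")"
            fi
        done
    done
    return 0
}
```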

Let's see it in action, with a slightly made up test:
[root@grenache cpufreq]# cat scaling_available_frequencies
2000000 1667000 1333000 1000000 
[root@grenache cpufreq]# cat cpuinfo_cur_freq
[root@grenache cpufreq]# cat /dev/sda | gzip -9 > /dev/null &
[1] 22983
[root@grenache cpufreq]# sleep 30
[root@grenache cpufreq]# cat cpuinfo_cur_freq
[root@grenache cpufreq]# kill 22983
[root@grenache cpufreq]# sleep 10
[1]+  Terminated              cat /dev/sda | gzip -9 > /dev/null
[root@grenache cpufreq]# cat cpuinfo_cur_freq

Tada! Our machine started out idling at low power + cpu frequency, detected the rise in demand and sped up, then throttled back when the demand was gone. If you have a plug in power meter, you can watch the power use change accordingly too :)
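
If you'd rather watch the speed change than poke at it by hand, a tiny sampling loop works. This is a sketch only: the function name sample_freq and its arguments are invented, and it relies on the cpufreq files reporting their values in kHz.

```shell
# sample_freq COUNT [INTERVAL] [FILE]
# Print the current cpu frequency in MHz COUNT times, INTERVAL seconds apart
# (default 1). FILE defaults to cpu0's cpuinfo_cur_freq, which usually needs
# root to read. Name and arguments are illustrative only.
sample_freq() {
    count=$1
    interval=${2:-1}
    file=${3:-/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq}
    i=0
    while [ "$i" -lt "$count" ]; do
        khz=$(cat "$file") || return 1
        echo "$((khz / 1000)) MHz"    # the file holds kHz
        i=$((i + 1))
        # No need to sleep after the final sample
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

Start the gzip test in the background, then run "sample_freq 30" as root and the numbers should climb and later fall back.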

Oh, and you might find some older howtos talking about things in /proc/acpi/processor/CPU0/. If your cpu supports EST, then you almost certainly don't want to be using the old acpi interface, as it doesn't let you control the new EST features, and the older states generally don't work as well. You'll only want the acpi settings if you have an old (probably mobile) cpu that only supports the original SpeedStep.

(note - I haven't gotten my hands on one of the new i7 CPUs to check what happens with them. Some googling indicates they support est and tm2, but I'm not sure if the new Turbo Boost and Zero Power features will be controlled transparently by the above, or if you'll need some new additional tweaks for them)