I’m fairly sure that readers of this blog are acquainted with the Folding@home project.
Recently, we installed the F@H app on our 4-core Ubuntu server. In fact, we installed it 4 times, since that seems to be the only way (that we could find) to use all the CPU cores on a 32-bit system. Despite the fact that 16 “FahCore_nn.exe” processes are always running, 12 of them sitting idle, the thing works. All the cores are used, work units get crunched, and we feel good about contributing every time we run the top or ps command :)
And then we started wearing t-shirts in winter.
All the CPUs are running at 100%, the fans are spinning at full speed, and the office is 10 degrees warmer. We tried to cap the worker processes at some CPU percentage, only to find that limits.conf does not support that. Thankfully, Google pointed us to the cpulimit utility.
It worked perfectly. At least until the day after, when some of the folding worker processes were again consuming 100% of the CPU. So, on day 2, we learned that there is a “main” F@H process which spawns a “worker” F@H process for each work unit. Once the unit is done, the worker exits, leaving behind an orphaned cpulimit process that’s trying to limit a process that no longer exists, while the freshly spawned worker runs with no limit at all.
Day 3: Perl :)
The only solution we could think of for limiting dynamically spawned processes to a given CPU percentage was a periCRONically executed Perl script that kills the hanging cpulimit processes and fires up new ones. The input parameters are a string for matching the process name and a number giving the CPU percentage. It works fine, but it leaves a hackish taste in my mouth.
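For the curious, here is roughly what such a script does, sketched in Python rather than Perl. This is not the actual script, just a minimal illustration under a few assumptions: pgrep, pkill and cpulimit are installed and on the PATH, the function names and the script name relimit.py are made up for this example, and it is acceptable to kill every running cpulimit process, including ones started for something other than F@H.

```python
#!/usr/bin/env python3
"""Re-attach cpulimit to all processes whose name matches a pattern.

Sketch only: assumes pgrep, pkill and cpulimit are available on PATH.
"""
import os
import subprocess
import sys


def parse_pids(pgrep_output):
    """Turn pgrep's one-PID-per-line output into a list of ints."""
    return [int(token) for token in pgrep_output.split()]


def cpulimit_cmd(pid, percent):
    """Build the cpulimit command line that caps one PID at percent CPU."""
    return ["cpulimit", "-p", str(pid), "-l", str(percent)]


def find_pids(pattern):
    """Return PIDs whose process name matches pattern, excluding ourselves."""
    result = subprocess.run(["pgrep", pattern], capture_output=True, text=True)
    return [pid for pid in parse_pids(result.stdout) if pid != os.getpid()]


def relimit(pattern, percent):
    """Kill old cpulimit processes and start one per matching process."""
    # Assumption: no other cpulimit instances need to survive this.
    # pkill exits non-zero when nothing matched, which is fine here.
    subprocess.run(["pkill", "-x", "cpulimit"])
    for pid in find_pids(pattern):
        # Detach each limiter so it keeps running after this script exits.
        subprocess.Popen(
            cpulimit_cmd(pid, percent),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,
        )


# Only act when called as: relimit.py NAME_PATTERN CPU_PERCENT
if __name__ == "__main__" and len(sys.argv) == 3:
    relimit(sys.argv[1], int(sys.argv[2]))
```

Run from cron every few minutes with something like `relimit.py FahCore 50` (a hypothetical invocation), this gives the same kill-and-restart cycle. Workers still get to run unlimited between the moment they are spawned and the next cron tick, which is part of why the whole approach feels hackish.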
Any ideas for something smoother? I cannot accept that Perl has to be used for every real-life problem :D
Btw, the perl script can be found here.