User:Darkfader/distcc

From Alpine Linux
 
Latest revision as of 00:36, 1 April 2026

Hi,

I'm preparing this page. It may take a long time until I finish. If you also wish to write on this topic, feel free to integrate the content.


Document overview

I noticed that almost every distro has one partially complete, partially helpful document on how to use distcc on that distro. Usually they also have one for ccache. In either case, they're enough to get started, but not really reliable or watertight. We definitely needed one of our own.


goal

to describe a working setup for building aports in the easiest/fastest fashion, without adding versatility or features where they would make the setup more error-prone. The page should describe enough of the steps to successfully compile an LTS kernel via aports and have that job distributed over multiple nodes. Logs should be set up and able to display errors, but should not show any errors during the test compile.

To include a path for analysis via testing components.

distcc can greatly improve compile speeds for large software. It comes with a different feature set than ccache: it focuses not on avoiding unnecessary compile work, but on speeding up the work that is necessary.

There's valuable info in the docs of other distros; it should be referenced here when it makes sense (e.g. the Arch wiki troubleshooting section), with a TOC for their content.

audience

people who run software builds on Alpine and have multiple computers


This page will show a specific installation, specific configuration, and specific tests, resulting in a specific set of functionality that can be tested to be working.

installation

Packages

you need, on each host

  • distcc
  • distccd-openrc
  • distcc-pump
  • distcc-pump-pyc

the .pyc package will speed up the invocations. Without it, I don't know whether one or both of the pump packages should be installed. The code also appears to be precompiled automatically in /usr/lib/python3.12/site-packages/include_server/__pycache__/, so what exactly does the package do?

There are some references to cpython-312, so maybe it actually still uses classic Python-to-C conversion, or maybe that's just a component for reading C source code. I have zero idea.

you also need the tools to do the compiles

  • alpine-sdk
  • clang
  • binutils

...

  • elfutils(-dev)
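As a sketch, installing the above on each host might look like this (package names as listed above; elfutils-dev is assumed to be the -dev variant you want):

```shell
# on every host participating in the cluster
apk add distcc distccd-openrc distcc-pump distcc-pump-pyc

# on hosts that actually run compiles
apk add alpine-sdk clang binutils elfutils-dev
```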


Settings

settings for distcc

  • there's /etc/default/distcc
  • there's /etc/conf.d/distcc

make all your settings here. Take care with the listen address: if you specify an IP, the daemon will not listen on 127.0.0.1, which matters if you have localhost in your hosts list...
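A hypothetical /etc/conf.d/distcc fragment. The variable name DISTCCD_OPTS follows the usual OpenRC convention but is an assumption here; check the file Alpine actually ships:

```shell
# /etc/conf.d/distcc -- sketch only; DISTCCD_OPTS and the addresses are assumptions
DISTCCD_OPTS="--port 3632 --log-level=warning --log-file=/var/log/distccd.log"
# allow only the build subnet (placeholder range)
DISTCCD_OPTS="${DISTCCD_OPTS} --allow 192.168.1.0/24"
# careful: binding a specific IP means 127.0.0.1 will no longer work
DISTCCD_OPTS="${DISTCCD_OPTS} --listen 192.168.1.10"
```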


  • command_whitelist.sh

this is half functional: you need to set things here, but you also need to maintain the symlinks that are collected under /usr/lib/distcc (for your compilers) and /usr/lib/distcc/bin (for distcc itself)

you MUST run the script to update the compilers! Here is the script and the files it will create:

/usr/sbin/update-distcc-symlinks

tschike:/usr/bin# ls -l /usr/lib/distcc/
total 4
drwxr-xr-x 2 root root 4096 Mar  2 18:14 bin
lrwxrwxrwx 1 root root   16 Mar  2 07:02 c++ -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 c89 -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 c99 -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 cc -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 g++ -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 gcc -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 x86_64-alpine-linux-musl-g++ -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 x86_64-alpine-linux-musl-gcc -> ../../bin/distcc
lrwxrwxrwx 1 root root   16 Mar  2 07:02 x86_64-alpine-linux-musl-gcc-15.2.0 -> ../../bin/distcc

distcc itself is in bin:

lrwxrwxrwx 1 root root 15 Mar  2 18:14 cc -> /usr/bin/distcc
lrwxrwxrwx 1 root root 15 Mar  2 18:14 cpp -> /usr/bin/distcc
lrwxrwxrwx 1 root root 15 Mar  2 18:14 g++ -> /usr/bin/distcc
lrwxrwxrwx 1 root root 15 Mar  2 18:14 gcc -> /usr/bin/distcc
lrwxrwxrwx 1 root root 15 Mar  2 06:49 x86_64-alpine-linux-musl-gcc -> /usr/bin/distcc

the last symlink here is wrong; I made it by hand and it would not work... BAD symlink.

distcc hosts file

I don't know much about this file yet; it's odd.


abuild.conf

settings for aports

  • cc=
  • cxx=
  • cpp=
  • cflags=
  • njobs
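A hypothetical abuild.conf fragment wiring the variables above into distcc. The exact values below are assumptions, not tested settings:

```shell
# /etc/abuild.conf fragment -- sketch only; values are assumptions
export CC="distcc gcc"
export CXX="distcc g++"
export CFLAGS="-O2"
# jobs: roughly the total remote threads plus some local headroom
export JOBS=16
export MAKEFLAGS="-j$JOBS"
```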


detail infos

???

hosts syntax

  • myhost otherhost
  • myhost,cpp,lzo myotherhost,cpp,lzo

the host

  • hostname/ip
  • localhost
  • 127.0.0.1

1 - does not work

protocol

  • no protocol given
  • ,cpp,lzo protocol

cpp implies lzo; pump mode requires compression even if you have 10 Gbit/s or more. It's just hardcoded.

threads

a /number suffix sets the number of worker jobs for that host
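Putting the pieces together, a hosts line might look like this (hostnames and job counts are placeholders; the order of entries matters):

```shell
# ~/.distcc/hosts, or the DISTCC_HOSTS variable
localhost/2 buildbox1/8,cpp,lzo buildbox2/16,cpp,lzo
```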


architecture

it can handle C, C++, ObjC, maybe some other stuff

  • what happens with normal xmit
  • what happens with pump mode
  • at which step the include server is used and how it collects the includes


distribution algorithm

honestly I simply don't get it

  • The order matters
  • The number of threads matters


localhost

  • localhost precedence
  • localhost fallback

variable: DISTCC_FALLBACK

0 = fail to compile if it would need to fall back to a normal local gcc call
1 = if the remote compile fails, just do it yourself
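For example, to fail hard instead of silently compiling locally:

```shell
export DISTCC_FALLBACK=0
```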


Operation

startup and shutdown

service distcc stop is not entirely reliable: it can take a minute after the stop until the processes are gone, and sometimes it will never stop. This is very bad with OpenRC: the OpenRC script returns after a second and relies only on its service flags, not on the process status. Manually check after stopping, wait a minute, and if needed, kill it all. At some point the rc file needs to be rewritten; it can't stay like it is.

if you used a pump mode session, that also needs a logout (pump --shutdown). Avoid running multiple startups without a shutdown in one session; it's safe as far as I can tell, but nothing cleans up these processes.
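A defensive stop ritual, as a sketch (the service name and wait time are assumptions):

```shell
# stop the service, then verify the processes are really gone
service distcc stop
sleep 10
if pgrep -f distccd >/dev/null; then
    echo "distccd still running, killing leftovers"
    pkill -f distccd
fi
# also end any pump session belonging to this shell
pump --shutdown 2>/dev/null || true
```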


ccache and distributed caches

ccache is said to conflict with pump mode unless you call it in the backend. So on the host where you start the compile, you don't use it; on the host where the compile happens, you do. The backends can share the cache via memcached, which is a nice trick for consistency. Upon looking at the ccache website, it seems the correct mechanism is not memcached but Redis. That would work just as well. Generally a cache shared in this way would be optimal for performance.


This patch is described as making the interactions between ccache and distcc work much better: https://patch-diff.githubusercontent.com/raw/ccache/ccache/pull/301.diff

The patch status is unclear.


zephyrOS has been running a KeyDB backend for distcc at scale for a while. They picked this software to avoid needing a large-RAM Redis server. One could alternatively use a single system if it has enough RAM (rumor has it DDR3 ECC is cheap).
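As a sketch, pointing each backend's ccache at a shared Redis could look like this. Recent ccache versions support remote storage, but the exact config key name has varied across versions, so treat the key below as an assumption and check your ccache documentation:

```shell
# on each compile backend -- key name 'remote_storage' is an assumption,
# hostname is a placeholder
echo 'remote_storage = redis://cache-host:6379' >> /etc/ccache.conf
```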

dockerized / native

it remains mostly the same: a container needs to make sure it monitors the right services (distccd, nginx, include_server). If you're using zeroconf, you need to somehow expose the mDNS service broadcasts & reception.


untested build container system

See https://github.com/bensuperpc/docker-distcc. Not yet tested, but it could be a good basis for a 'best practice' container. There are others, especially for cross-builds, but those are also very complex to modify.


Process list:

  • TBA

Workdir list:

  • TBA

alpine-chroot

when using the 'official' script there are still some odd pieces; it seemed the processes died on logout, but not all of them.

manual launch

Starting the daemon manually would be:

/usr/bin/distccd --pid-file /var/run/distccd/distccd.pid -N 15 --user distcc --port 3632 --log-level=debug --log-file=/var/log/distccd.log --allow my-sub-net/24

localhost?

unclear if you need to use --allow for 127.0.0.1/32 or similar to allow the remote preprocessor.


Gitlab integration

TBA

Maybe work from this k8s howto, it's a bit more prod-grade than most:

https://cinaq.com/blog/2020/05/10/speed-up-docker-builds-with-distcc-ccache-and-kubernetes/

Kernel specific settings

currently (distcc 3.4-r9 on Alpine) you need a patch to build the kernel. See

Include server Settings

cache reset triggers: this ought to be set before enabling pump mode.

export INCLUDE_SERVER_ARGS="--stat_reset_triggers=include/linux/compile.h:include/asm/asm-offsets.h"

link to explanation TBA


Disabling GCC Plugins

In Kconfig, unselect HAVE_GCC_PLUGINS

Some info is here in the troubleshooting part of the Arch Wiki https://wiki.archlinux.org/title/Distcc#Troubleshooting

Patches

Other things (for 6.6 LTS): PCIe stub patch

Autoconf

No known setup examples. Add whatever it publishes via mDNS.


troubleshooting / analysis

testing

  1. turn off fallback: DISTCC_FALLBACK=0
  2. set distcc up to point at specific system under test DISTCC_HOSTS="mytestbox,cpp,lzo"
  3. GCC example compiles
    1. code example C
    2. same with included header
    3. code example C++
    4. same with included header
    5. code example ObjC
    6. same with included header
  4. Cmake example compiles
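The first test cases above could be scripted roughly like this (the hostname is a placeholder, and fallback is disabled so a silent local compile counts as a failure):

```shell
export DISTCC_FALLBACK=0
export DISTCC_HOSTS="mytestbox,cpp,lzo"

# plain C example, no headers
echo 'int main(void){return 0;}' > t1.c
distcc gcc -c t1.c -o t1.o && echo "C compile distributed OK"

# same with an included header (exercises the preprocessor path)
printf '#include <stdio.h>\nint main(void){puts("hi");return 0;}\n' > t2.c
distcc gcc -c t2.c -o t2.o && echo "C compile with header OK"
```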


  • document the compile-launcher setup (ccache;distcc); there's some blog post, point to that
  • no ccache here yet
  • show the $CC differences between distcc and gcc, i.e. what configure scripts see


Latency

  • Latency of pump mode startups and fallbacks needs to be investigated.
  • LZO is enforced even if you have a faster network.
  • DNS requests: a very old bug report from Google Code mentions one request per call. Is it true? How to get rid of it?
  • TMPDIR is respected; make sure it's on a ramdisk even on the remote nodes. A compile should ideally never go to disk when it doesn't have to.
  • How efficient are the include server's collection and unpacking?
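As a sketch, putting the temp directory on a ramdisk could look like this (mount point and size are placeholders):

```shell
# on each node: a tmpfs for compile scratch space
mkdir -p /var/tmp/distcc
mount -t tmpfs -o size=2G tmpfs /var/tmp/distcc
export TMPDIR=/var/tmp/distcc
```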


failed to distribute, running locally instead

the curse of the ancient, wise bulgarian witch compildora nottherea has befallen people all over the world. only strict and mindless adherence to rituals passed down from generation to generation has given hope to those who are under her ages-old spell. again and again it reemerges to prey on idealistic young men and women who spend so much time at their unholy computers that they try to spend less time there by spending a lot of time trying to optimize what the computer does, slowly, instead of reflecting on why they are there while the computer is busy working, and why they try to solve this by adding a component that makes the computer more error-prone at that same work, increasing the need for their presence at this idolized thinking machine to oversee and often repair, or worse, mindlessly restart its doings. if the curse is not lifted, despair may befall them, and all they see is the need to investigate and ruminate further on the workings of this tool, ignoring thereby the obvious flaws that stem from its alchemic origins.

just wait till you find out it has a backoff algorithm that decides whether to call out to a remote server, independent of whether that server is functioning. graceful performance degradation is the goal, and degrading it is, while we try to figure this out.


Node Selection algorithm

as per my understanding, a flowchart goes here:

  1. compile task
  2. evaluate whether to run locally by job nature
  3. determine if local host's load is notable
  4. look at distcc hosts list
  5. do something based off localhost entry if first
  6. filter for nodes with cpp flag
  7. do something based off localhost entry if not first
  8. further prioritize by server order, first is handled in some way
  9. skip nodes in backoff prisons
  10. balance and prioritize by server thread number, if given
  11. send to suitable host
  12. if compile not successful, proceed on other node
  13. if nodes depleted, proceed on localhost

security

tcpwrapper style ip range filter

the original security model consists of IP restrictions. there seems to also be some GSSAPI user auth. further, the commands that can be called are restricted by name and location. this appears to be a runtime whitelist lookup, meaning it's done and authorized by the same parts of the daemon that process the compile request along with the intended compiler.

so the main weaknesses against malicious clients seem to be in sending things to compile, and in overriding the remote compiler to use. it can be assumed that a malicious client able to exploit the compiler handshake can then run arbitrary stuff.

there's at least a github issue regarding this (link lost, but see here: https://gitlab.com/postmarketOS/pmbootstrap/-/work_items/1619) suggesting running over ssh. that only partially alleviates the risk, by way of key-based verification of a client versus the standard IP restrictions, which always involve some parsing. so this protects against someone directly exploiting the TCP code of distcc. it does not protect against malicious clients. (an ssh forced command can't be used, or you'll not compile anything)

The basic step for protecting access should be filtering who can access the distcc server, so use nftables etc. to restrict access to port 3632, and set up the internal filter the same way.
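A minimal nftables sketch for this (table/chain names and the subnet are placeholders):

```shell
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0 ; }'
# drop distcc traffic that does not come from the build subnet
nft add rule inet filter input tcp dport 3632 ip saddr != 192.168.1.0/24 drop
```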

seccomp

The next thing is to confine the compiler calls so they can only write in their temp directory and can only run compilers (using nsjail, apparmor, selinux etc.)

the above issue also references the second bit of security, namely a seccomp filter which would already cover a good bit of the above: https://github.com/distcc/distcc/pull/235. The pull request was closed without being merged, from what I see.

I will update the entry once it's clear if the patch was later added or instead nothing was done.

privs

The other internal security bit is that they do some privilege dropping. It runs as a dedicated user (distcc), so you can also have an audit policy, and you can/could use something like iptables to ensure it can only connect to the other distcc/memcached hosts, but nothing else.

compiler list

One needs to investigate when exactly compiler_whitelist.sh is called. As far as I recall it doesn't close stdin/stdout.

distcc-hardened

Alpine adds some hardening patch; I don't know what nature/origin it has.

selinux

there's also an SELinux policy for distcc from gentoo or liguros, if one is so inclined. links for further review:

- https://repology.org/project/selinux-distcc/versions
- https://gitlab.com/liguros/liguros-repo/-/tree/stable/sec-policy/selinux-distcc

general posture

Some security measures like the above should definitely be used, since the project at its core relies on accepting foreign input over the network and has only a few part-time maintainers who cannot easily drive the project forward or do large refactors. But the seccomp changes were proposed almost 10 years ago, so if they were actually upstreamed, I'd say they came through.

A kind of overengineered solution would be to use netlabel to ensure application integrity across hosts, meaning only a valid process would be able to send packets to the distcc nodes :-)


Alternatives

The Arch wiki refers to a fork by SUSE: https://github.com/icecc/icecream

It appears 'maybe better', but it's hard to tell. There's no mention of whether SUSE's OBS uses it. Without more knowledge of their use, I don't know how big a benefit switching would bring. At least you'd want to know they have a higher commitment to maintenance than the currently understaffed distcc 'team'.


A commercial alternative (Incredibuild, https://www.incredibuild.com/) exists, but I have not tried it. There are also references to some 'forever free' tier, but I could not yet find the terms and conditions for it. It also seems you need a Windows-based 'coordinator' host. Opinion: generally, if there's some professional context with high (business) pressure for faster builds, I would check that out, and then, without time pressure, build a great distcc setup for the long term.


Architecturally it seems to take the ccache approach, with a shared cache and smarter redistribution. So anything that can call ccache is something they can scale out. I'll add a link later, for those whom it helps. Naturally it is out of scope for this article.

Nonetheless it's interesting, since the ordering of how to run ccache and how to run distcc is an issue that also exists in the OSS world. Not only do we lack an automatic setup and integration of the tools (ccache, redis, distccd, distcc-pump), we also haven't collected sufficient data to help decide which approach is best. Integrating into CIs (gitlab-runner specifically) is another item where cache persistence becomes very fragile. Having an alpine build container that knows how to use ccache with redis and distcc-pump would be a possible step forward.

Other sources

https://retroflux.net/blog/distcc-adventures/