User:Darkfader/distcc
Revision as of 10:46, 3 March 2026
Hi,
I'm preparing this page. It can take a long time until I finish. If you also want to write about this topic, feel free to integrate your content.
Document overview
I noticed that almost every distro has one partially complete, partially helpful document on how to use distcc on that distro, and usually another one for ccache. Either way, they're enough to get started, but not a reliable, watertight setup. We definitely needed one of our own.
goal
To describe a working setup for building aports in the easiest and fastest way, without adding versatility or features where they would make the setup more error-prone. The page should describe enough of the steps to successfully compile an LTS kernel via aports, with the job distributed over multiple nodes. Logs should be set up and able to display errors, but should not show any errors during the test compile.
It should also include a path for analysis by testing individual components.
distcc can greatly improve compile speed for large software. It comes with a different set of features than ccache: rather than avoiding unnecessary compile work, it focuses on speeding up the compile work that is necessary.
audience
People who run software builds on Alpine and have multiple computers.
This page will show a specific installation, specific configuration, and specific tests, resulting in a specific set of functionality that can be tested to be working.
installation
Packages
You need, on each host:
- distcc
- distccd-openrc
- distcc-pump
- distcc-pump-pyc
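distcc-pump-pyc should speed up the invocations. Without knowing more, I can't say whether one or both packages should be installed. But the bytecode also appears in /usr/lib/python3.12/site-packages/include_server/__pycache__/, so what does the package do exactly? (On Alpine, -pyc subpackages generally contain exactly these __pycache__ files, so that directory probably comes from distcc-pump-pyc itself.)
There are some references to cpython-312, so maybe it actually still uses a classic Python-to-C conversion, or maybe that's just a component for reading C source code. I have no idea. (The cpython-312 tag in __pycache__ file names is simply Python's standard naming for bytecode cache files, so by itself it does not indicate a C conversion.)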
You also need the tools to do compiles:
- alpine-sdk
- clang
- binutils
...
- elfutils(-dev)
Settings
settings for distcc
- there's /etc/default/distcc
- there's /etc/conf.d/distcc
make all your settings here
- command_whitelist.sh
This is only half functional: you need to set things here, but you also need to maintain the symlinks collected under /usr/lib/distcc (for your compilers) and /usr/lib/distcc/bin (for distcc itself).
You MUST run the script to update the compiler symlinks! Info on the script and the files it creates:
/usr/sbin/update-distcc-symlinks
tschike:/usr/bin# ls -l /usr/lib/distcc/ total 4 drwxr-xr-x 2 root root 4096 Mar 2 18:14 bin lrwxrwxrwx 1 root root 16 Mar 2 07:02 c++ -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 c89 -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 c99 -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 cc -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 g++ -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 gcc -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 x86_64-alpine-linux-musl-g++ -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 x86_64-alpine-linux-musl-gcc -> ../../bin/distcc lrwxrwxrwx 1 root root 16 Mar 2 07:02 x86_64-alpine-linux-musl-gcc-15.2.0 -> ../../bin/distcc
distcc itself is in bin lrwxrwxrwx 1 root root 15 Mar 2 18:14 cc -> /usr/bin/distcc lrwxrwxrwx 1 root root 15 Mar 2 18:14 cpp -> /usr/bin/distcc lrwxrwxrwx 1 root root 15 Mar 2 18:14 g++ -> /usr/bin/distcc lrwxrwxrwx 1 root root 15 Mar 2 18:14 gcc -> /usr/bin/distcc lrwxrwxrwx 1 root root 15 Mar 2 06:49 x86_64-alpine-linux-musl-gcc -> /usr/bin/distcc
The last symlink here (x86_64-alpine-linux-musl-gcc) is wrong: I made it myself, and it would not work. BAD symlink.
distcc hosts file
I don't know much about that file yet; it's odd.
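As far as the distcc man page goes, the client looks for its host list in a fixed order: the DISTCC_HOSTS environment variable, then $DISTCC_DIR/hosts (DISTCC_DIR defaults to ~/.distcc), then a system-wide hosts file. The sketch below only reproduces that lookup order so you can see which source is in effect on a node. The function name is hypothetical, and the system-wide path /etc/distcc/hosts is an assumption (it depends on the sysconfdir distcc was built with). `distcc --show-hosts` should print the list distcc actually resolved.

```sh
# distcc_hosts_source [SYSTEM_HOSTS_FILE]
# Hypothetical helper: prints which host list distcc would read, following the
# lookup order from the distcc man page. Returns 1 if no source is found.
# SYSTEM_HOSTS_FILE defaults to /etc/distcc/hosts (assumes sysconfdir=/etc).
distcc_hosts_source() {
    sys_hosts=${1:-/etc/distcc/hosts}
    user_hosts="${DISTCC_DIR:-$HOME/.distcc}/hosts"

    if [ -n "${DISTCC_HOSTS:-}" ]; then
        # 1. the environment variable wins
        echo "env:DISTCC_HOSTS"
    elif [ -f "$user_hosts" ]; then
        # 2. the per-user file
        echo "$user_hosts"
    elif [ -f "$sys_hosts" ]; then
        # 3. the system-wide file
        echo "$sys_hosts"
    else
        echo "none"
        return 1
    fi
}
```

Run it as the user that starts the build (for aports, the abuild user), since DISTCC_DIR and ~ differ per user.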
abuild.conf
settings for aports
- cc=
- cxx=
- cpp=
- cflags=
- njobs
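A minimal abuild.conf excerpt along those lines could look like this. It is a sketch under assumptions: that the APKBUILD being built honours $CC and $CXX (many do, not all), that gcc is the compiler on every node, and that the host names and job counts fit your cluster. JOBS and MAKEFLAGS are variables Alpine's default abuild.conf already exports; the distcc-related values are illustrative.

```sh
# /etc/abuild.conf (excerpt), a sketch; values are examples, not recommendations
# Send compiler calls through distcc (assumes the APKBUILD uses $CC/$CXX)
export CC="distcc gcc"
export CXX="distcc g++"
# Host list for the client; buildnode1/buildnode2 are placeholder host names
export DISTCC_HOSTS="localhost/2 buildnode1/8 buildnode2/8"
# Total parallel jobs: 2 + 8 + 8 slots from the list above
export JOBS=18
export MAKEFLAGS=-j$JOBS
```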
detail infos
???
hosts syntax
- myhost otherhost
- myhost,cpp,lzo myotherhost,cpp,lzo
the host
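A host is given as a hostname or IP address. localhost and 127.0.0.1 are not the same: according to the distcc man page, the literal word localhost makes distcc run the compile directly on the local machine, while 127.0.0.1 (or the machine's real name) connects to a distccd daemon on it, which is slower.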
- 1 - does not work
protocol
- no protocol given
- ,cpp,lzo protocol
cpp implies lzo: pump mode requires compression, even if you have 10 Gbit/s or more. This is hardcoded.
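Since pump mode needs compression, a quick lint over a host list can catch ,cpp entries written without ,lzo before a long build starts. This is a hypothetical helper, not part of distcc. Whether distcc itself rejects such an entry or silently enables LZO is not verified here, so treat it as a consistency check for the ,cpp,lzo convention used in the host syntax examples above.

```sh
# pump_hosts_missing_lzo "HOSTSPEC..."
# Hypothetical helper: prints every host entry that has the ,cpp option
# but not ,lzo. Options may appear in any order after the first comma.
pump_hosts_missing_lzo() {
    for entry in $1; do
        case $entry in
            *,*) opts=",${entry#*,}," ;;   # e.g. "host,cpp,lzo" -> ",cpp,lzo,"
            *) continue ;;                  # no options at all
        esac
        case $opts in
            *,cpp,*)
                case $opts in
                    *,lzo,*) ;;             # fine: pump mode with compression
                    *) echo "$entry" ;;     # pump mode without ,lzo
                esac
                ;;
        esac
    done
}
```

Usage: `pump_hosts_missing_lzo "$DISTCC_HOSTS"` prints nothing when every ,cpp host also has ,lzo.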
threads
/number: the number of workers, i.e. how many jobs this client sends to that host at once
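The distcc man page gives the per-host default when no /number is written: four jobs per host, two for localhost (a server can still restrict this further). The sketch below adds up those limits for a host list, which gives a rough upper bound for the -j value of the build. The function name is hypothetical; entries starting with -- (such as --randomize) are skipped, and `distcc -j` should print distcc's own figure for comparison.

```sh
# total_slots "HOSTSPEC..."
# Hypothetical helper: sums the per-host job limits of a distcc host list.
# Uses the man page defaults when no /LIMIT is given: 4 per host, 2 for localhost.
total_slots() {
    total=0
    for entry in $1; do
        case $entry in
            --*) continue ;;            # global options like --randomize
        esac
        spec=${entry%%,*}               # drop ,cpp / ,lzo options
        case $spec in
            */*) limit=${spec##*/} ;;   # explicit HOST/LIMIT
            localhost) limit=2 ;;
            *) limit=4 ;;
        esac
        total=$((total + limit))
    done
    echo "$total"
}
```

For example, `total_slots "localhost host1 host2/8,cpp,lzo"` gives 2 + 4 + 8.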
architecture
It can handle C, C++, Objective-C, and maybe some other languages.
- what happens with normal xmit
- what happens with pump mode
- at which step the include server is used and how it collects the includes
distribution algorithm
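Honestly, I simply don't get it.
- The order matters: the distcc man page says hosts towards the start of the list are preferred, so faster machines should be listed first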
- The number of threads matters
localhost
- localhost precedence
- localhost fallback
variable: DISTCC_FALLBACK
- 0 = fail the compile if it would need to fall back to a normal local gcc call
- 1 = if the remote compile fails, just do it locally
Operation
startup and shutdown
service distcc stop is not entirely reliable: it can take a minute after the stop until the processes are gone, and sometimes they never stop. This is very bad with OpenRC, because the OpenRC script returns after a second and relies only on its service flags, not on the process status. Check manually after stopping, wait a minute, and if needed, kill them all. At some point the rc file needs to be rewritten; it can't stay like it is.
If you used a pump mode session, it also needs a logout (pump --shutdown). Avoid running multiple startups without a shutdown in one session: it's safe as far as I can tell, but nothing cleans up these processes.
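For a single command, `pump make -j18` starts the include server, runs the build, and shuts it down again. For an interactive session, the sketch below starts at most one include server per shell and shuts it down when the shell exits. It is a hypothetical helper, and it assumes that `pump --startup` exports INCLUDE_SERVER_PORT, which is used as the "already running" marker. INCLUDE_SERVER_ARGS (see the kernel settings below) has to be exported before calling it.

```sh
# pump_session_start
# Hypothetical helper: starts at most one include server per shell session
# and registers "pump --shutdown" to run when the shell exits (e.g. logout).
pump_session_start() {
    if [ -n "${INCLUDE_SERVER_PORT:-}" ]; then
        echo "include server already running for this session" >&2
        return 0
    fi
    # pump --startup prints export statements; capture them first so a failed
    # startup is noticed instead of eval'ing empty output
    startup=$(pump --startup) || return 1
    eval "$startup"
    # note: replaces any EXIT trap that was already set in this shell
    trap 'pump --shutdown' EXIT
}
```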
ccache and memcached
ccache is said to conflict with pump mode unless you call it on the backend. So where you start the compile, you don't use it; where the compile happens, you do. The build nodes can then share the cache via memcached, which is a nice trick for consistency.
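For plain (non-pump) mode, the usual client-side combination is to let ccache run distcc on cache misses, using ccache's prefix setting. A minimal sketch, assuming gcc on all nodes. Given the note above, this is not the pump mode setup, and the memcached sharing is not covered here.

```sh
# Client side, plain mode only (not pump mode), a sketch
# ccache checks its cache first; on a miss it runs "distcc gcc ..." instead of "gcc ..."
export CCACHE_PREFIX=distcc
export CC="ccache gcc"
export CXX="ccache g++"
# Equivalent persistent setting in ccache.conf:  prefix_command = distcc
```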
dockerized / native
It remains mostly the same. A container needs to make sure it monitors the right services (distccd, nginx, include_server). If you're using zeroconf, you need to somehow expose the mDNS service broadcasts and their reception.
Process list:
- TBA
Workdir list:
- TBA
alpine-chroot
When using the 'official' script, there are still some odd pieces: it seemed that the processes died on logout, but not all of them.
manual launch
Starting the daemon manually would look like this (my-sub-net/24 is a placeholder for your network):
/usr/bin/distccd --pid-file /var/run/distccd/distccd.pid -N 15 --user distcc --port 3632 --log-level=debug --log-file=/var/log/distccd.log --allow my-sub-net/24
localhost?
It is unclear whether you need to use --allow for 127.0.0.1/32 or something similar to allow the remote preprocessor.
Kernel specific settings
currently (distcc 3.4-r9 on Alpine) you need a patch to build the kernel. See
Include server Settings
Cache reset triggers: these ought to be set before enabling pump mode.
export INCLUDE_SERVER_ARGS="--stat_reset_triggers=include/linux/compile.h:include/asm/asm-offsets.h"
link to explanation TBA
Disabling GCC Plugins
KConfig unselect HAVE_GCC_PLUGINS
Some info is here in the troubleshooting part of the Arch Wiki https://wiki.archlinux.org/title/Distcc#Troubleshooting
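In current kernels HAVE_GCC_PLUGINS has no prompt and is selected by the architecture; the option you can actually switch off is GCC_PLUGINS, which depends on it. In a kernel source tree, `scripts/config --disable GCC_PLUGINS` followed by `make olddefconfig` does this. For an aport, where the kernel config is a file in the aport directory (file name varies), a small edit is enough. The sketch below is a hypothetical helper doing the same as `scripts/config --disable` for one symbol. After changing a file in an aport, update the checksums with `abuild checksum`.

```sh
# kconfig_disable CONFIG_FILE SYMBOL
# Hypothetical helper: removes any existing CONFIG_<SYMBOL> line and appends
# "# CONFIG_<SYMBOL> is not set", the form Kconfig uses for disabled options.
kconfig_disable() {
    cfg=$1
    sym=$2
    tmp=$(mktemp) || return 1
    # keep every line except the old setting for this exact symbol
    grep -v -e "^CONFIG_${sym}=" -e "^# CONFIG_${sym} is not set\$" "$cfg" > "$tmp"
    echo "# CONFIG_${sym} is not set" >> "$tmp"
    # write back in place so the file keeps its permissions
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

Usage: `kconfig_disable path/to/kernel-config GCC_PLUGINS`.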
Patches
Other things (for 6.6 LTS): PCIe stub patch
Autoconf
No known setup examples yet. Add whatever it publishes via mDNS.
troubleshooting / analysis
testing
- turn off fallback
- GCC example compiles
- code example C
- same with included header
- code example C++
- same with included header
- code example ObjC
- same with included header
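The test matrix above can be prepared with a small script. This is a sketch with hypothetical function names. The Objective-C cases are plain C in .m files: they only check that an Objective-C capable gcc is present on every node, which is an assumption about your compiler installs. -c is used so that only compile jobs are tested, because linking always happens locally. With DISTCC_FALLBACK=0, a job that cannot be distributed fails instead of silently compiling locally; DISTCC_VERBOSE=1 makes distcc explain its decisions on stderr, which the script appends to distcc.log.

```sh
# make_distcc_testcases DIR
# Hypothetical helper: writes a plain and a header-including variant of a
# minimal C, C++ and Objective-C source into DIR.
make_distcc_testcases() {
    dir=$1
    mkdir -p "$dir" || return 1
    printf '%s\n' '#define GREETING "hello from a header"' > "$dir/greeting.h"
    # C
    printf '%s\n' '#include <stdio.h>' \
        'int main(void) { puts("hello"); return 0; }' > "$dir/plain.c"
    printf '%s\n' '#include <stdio.h>' '#include "greeting.h"' \
        'int main(void) { puts(GREETING); return 0; }' > "$dir/header.c"
    # C++
    printf '%s\n' '#include <iostream>' \
        'int main() { std::cout << "hello" << std::endl; }' > "$dir/plain.cpp"
    printf '%s\n' '#include <iostream>' '#include "greeting.h"' \
        'int main() { std::cout << GREETING << std::endl; }' > "$dir/header.cpp"
    # Objective-C front end, but plain C code (no Objective-C runtime needed)
    cp "$dir/plain.c" "$dir/plain.m"
    cp "$dir/header.c" "$dir/header.m"
}

# run_distcc_testcases DIR
# Compiles (-c, no linking) every test case through distcc with fallback off.
run_distcc_testcases() {
    dir=$1
    rc=0
    for src in "$dir"/*.c "$dir"/*.cpp "$dir"/*.m; do
        case $src in
            *.cpp) compiler=g++ ;;
            *) compiler=gcc ;;
        esac
        # DISTCC_FALLBACK=0: fail instead of silently compiling locally
        # DISTCC_VERBOSE=1: distcc explains its decisions on stderr
        if DISTCC_FALLBACK=0 DISTCC_VERBOSE=1 \
            distcc "$compiler" -c "$src" -o "${src%.*}.o" 2>>"$dir/distcc.log"; then
            echo "ok   $src"
        else
            echo "FAIL $src"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `make_distcc_testcases /tmp/distcc-test && run_distcc_testcases /tmp/distcc-test`, then read /tmp/distcc-test/distcc.log to see which host each job went to.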
no ccache here yet
- CMake example compiles
the thing with the compiler launcher ccache;distcc (CMAKE_<LANG>_COMPILER_LAUNCHER)
Latency
Latency of pump mode startups and fallbacks needs to be investigated.
LZO is enforced even if you have a faster network.
DNS requests: according to a very old bug report from Google Code, there is one request per call. Is that true? How to get rid of it?
TMPDIR is respected, so make sure it's on a ramdisk, even on the remote nodes. Ideally a compile never goes to disk when it doesn't have to.
How efficient is the include server's collection and unpacking?
security
tcpwrapper style ip range filter
The original security model consists of IP restrictions. There also seems to be some GSSAPI user authentication. Further, the commands that can be called are restricted by name and location. This appears to be a runtime whitelist lookup, meaning it is done and authorized by the same parts of the daemon that process the compile request along with the intended compiler. So the main weaknesses against malicious clients seem to be in what is sent to compile, and in overriding the remote compiler to use. It can be assumed that a malicious client able to exploit the compiler handshake can then run arbitrary code. There is at least one GitHub issue about this (google.com/search?q=distcc+seccomp) suggesting running distcc over SSH. That only partially alleviates the risk: it replaces the standard IP restrictions, which always involve some parsing, with key-based verification of the client. So it protects against someone directly exploiting distcc's TCP code. It does not protect against malicious clients. (An SSH forced command can't be used, or you won't compile anything.)
The basic step for protecting access is filtering who can reach the distcc server: use nftables or similar to restrict access to TCP port 3632 (distccd's default port), and set up distccd's internal --allow filter the same way.
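A minimal sketch of such a filter, assuming an IPv4 build subnet and distccd on its default port 3632. The helper and the table name distcc_filter are hypothetical. It only touches the distcc port and leaves the chain policy at accept, so other traffic is unaffected; IPv6 connections to the port are dropped except on loopback. If you already maintain an nftables ruleset, it is cleaner to add the port rules to your existing input chain.

```sh
# distcc_nft_rules SUBNET [PORT]
# Hypothetical helper: prints an nftables ruleset that lets only SUBNET (IPv4)
# and loopback reach distccd's TCP port; all other traffic is left untouched.
distcc_nft_rules() {
    subnet=$1
    port=${2:-3632}
    cat <<EOF
table inet distcc_filter {
    chain input {
        type filter hook input priority 0; policy accept;
        iifname "lo" tcp dport $port accept
        ip saddr $subnet tcp dport $port accept
        tcp dport $port drop
    }
}
EOF
}
```

Usage: `distcc_nft_rules 192.168.1.0/24 > /tmp/distcc.nft && nft -c -f /tmp/distcc.nft && nft -f /tmp/distcc.nft` (-c only checks the syntax). Use the same subnet as distccd's --allow so both filters agree.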
seccomp
The next step is to confine the compiler calls so that they can only write to their temp directory and can only run compilers (using nsjail, AppArmor, SELinux, etc.).
The issue above also references the second bit of security, namely a seccomp filter, which already covers a good part of this: https://github.com/distcc/distcc/pull/235
It still needs to be investigated when exactly compiler_whitelist.sh is called. As far as I recall, it doesn't close stdin/stdout.
The other internal security bit is privilege dropping: distccd runs as a dedicated user (distcc), so you can also have an audit policy, and could use something like iptables' owner matching on that user to ensure it can only connect to the other distcc/memcached hosts and nothing else.
selinux
I think that's safe enough: the main vector is breakouts, and confinement is possible. There's also an SELinux policy for distcc from Gentoo or Liguros, if one is so inclined: https://repology.org/project/selinux-distcc/versions
general posture
Some security measures like the above should definitely be used, since the project has only a few part-time maintainers who cannot easily drive the project forward or do large refactors. But the seccomp changes were done almost 10 years ago, so, to be fair, they did come through.