Apk spec: Difference between revisions

This page describes and formalizes the specification of the 'apk' package manager.
{{Draft}}


For end-user facing documentation about apk, check out the [[Alpine_Package_Keeper|Alpine Package Keeper]] page.


This page is an attempt to document the internal data structures of the apk package manager. The canonical implementation of the apk format is [https://gitlab.alpinelinux.org/alpine/apk-tools apk-tools] and much of this information is gleaned from reading the source code.


There are three generations of the APK data formats. Version 1 is deprecated and no longer used, version 2 is currently the main version in use by apk-tools, and version 3 is under development. This page mostly describes the data formats used in version 2.


= Background =
== Tar Segments ==
Tar segments are a set of tar records. Normal tar files contain two null records at the end of the tar file to signal the end of the tarball. Tar segments are lacking these two records and can thus be concatenated before other tar files and will behave as one continuous tar file. The APK v2 package format makes use of both tar segments and tarballs.
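
The concatenation behaviour described above can be sketched in Python with the standard <tt>tarfile</tt> module. This is only an illustration, not how apk-tools or abuild-tar implement it: the helper names <tt>make_tar</tt> and <tt>to_segment</tt> and the file contents are hypothetical, and the sketch assumes archives that hold regular files only.

```python
import io
import tarfile

BLOCK = 512  # size of one tar record in bytes


def make_tar(files):
    """Build an ordinary tarball (including end-of-tar null records) in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def to_segment(tar_bytes):
    """Drop everything after the last member, i.e. the trailing null records."""
    end = 0
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        for member in tf:
            data_blocks = (member.size + BLOCK - 1) // BLOCK
            end = member.offset_data + data_blocks * BLOCK
    return tar_bytes[:end]


# Hypothetical contents, used only to show the mechanics.
segment = to_segment(make_tar({".PKGINFO": b"pkgname = demo\n"}))
tarball = make_tar({"usr/bin/hello": b"#!/bin/sh\necho hello\n"})

# A segment placed in front of a tarball reads as one continuous tar file.
with tarfile.open(fileobj=io.BytesIO(segment + tarball), mode="r:") as tf:
    names = tf.getnames()
print(names)
```

The segment here is one header record plus one data record; the null records only appear once, at the end of the trailing tarball.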


Tar segments can be compressed using gzip compression. Gzip is a stream-based file format and multiple streams can be concatenated together. Most tooling will treat multiple gzip streams within a file as if it were a single stream. APK v2 files are aware of gzip streams and use them for file segmentation.
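
A reader that wants to honour the stream boundaries has to decode one gzip stream at a time. The following is a minimal sketch using Python's <tt>zlib</tt> module; the function name <tt>split_gzip_streams</tt> and the sample payloads are made up for this example, and apk-tools itself does this in C.

```python
import gzip
import zlib


def split_gzip_streams(blob):
    """Return a list of (compressed_bytes, decompressed_bytes), one per stream."""
    streams = []
    while blob:
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        payload = decoder.decompress(blob) + decoder.flush()
        if not decoder.eof:
            raise ValueError("truncated gzip stream")
        # Whatever follows the end of this stream is left in unused_data.
        consumed = len(blob) - len(decoder.unused_data)
        streams.append((blob[:consumed], payload))
        blob = decoder.unused_data
    return streams


# Three independent streams concatenated into one file (placeholder payloads).
blob = gzip.compress(b"signature") + gzip.compress(b"control") + gzip.compress(b"data")
parts = split_gzip_streams(blob)
print(len(parts))
```

Keeping the compressed bytes of each stream matters because hashes and signatures in APK v2 are taken over the compressed stream, not over its decompressed contents.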


= Package Format V2 =
== Binary Format ==
APK v2 packages contain two tar segments followed by a tarball, each in its own gzip stream (3 streams total). In order, these streams contain the package signature, the control data, and the package data. The signature tar segment contains a single file that is a binary signature over the control segment. The control tar segment contains the package metadata along with any install scripts. The package data is a tarball of the files contained in the package, laid out so that unpacking it at the filesystem root places every file in its correct location on the system.


The signature file is a DER encoded PKCS1v15 RSA signature of the SHA1 hash of the control tar segment gzip stream. The filename has the format <tt>.SIGN.RSA.<key_name>.rsa.pub</tt> (for example <tt>.SIGN.RSA.alpine-devel@lists.alpinelinux.org-5261cecb.rsa.pub</tt>). This file is placed inside of a tar record with permissions 0644, uid 0, and gid 0. This tar record (lacking end-of-tar records) is gzip compressed, forming a signature tar segment, and is concatenated onto the front of the combined control and data segments. [https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/abuild-sign.in abuild-sign] is responsible for generating these signature segments.


The control segment contains the package metadata in a <tt>.PKGINFO</tt> file as well as all of the scripts (if any) that are used by apk during installation and removal of the package. For historical reasons all files in the control tar segment are prefixed with a dot (<tt>.</tt>). The control segment is constructed by placing each file for the package into a tar record, concatenating those tar records, gzipping the tar records, and concatenating them onto the front of the data tarball. The SHA1 hash of this gzip stream is used as the checksum <tt>C:</tt> field in the APKINDEX file.
 
The data tarball is a standard gzipped tarball in which the tar entry of each file carries an extra PAX header, <tt>APK-TOOLS.checksum.SHA1</tt>, containing the SHA1 hash of that file. Unlike the other tar streams, this tarball does contain the two end-of-tar null records. It is always the final segment of an APK package. Hashes are added with the [https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/abuild-tar.c abuild-tar] tool.
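
The per-file PAX header can be illustrated with Python's <tt>tarfile</tt> module. This sketch is not abuild-tar: the helper names and file contents are hypothetical, and it assumes the header value is the hex-encoded SHA1 digest, which should be checked against the abuild-tar source.

```python
import hashlib
import io
import tarfile

CHECKSUM_KEY = "APK-TOOLS.checksum.SHA1"


def build_data_tar(files):
    """Write a PAX-format tar whose entries carry a per-file SHA1 header."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            # Assumption: the digest is stored hex-encoded.
            info.pax_headers = {CHECKSUM_KEY: hashlib.sha1(data).hexdigest()}
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_checksums(tar_bytes):
    """Map each member name to the checksum recorded in its PAX header."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        return {m.name: m.pax_headers.get(CHECKSUM_KEY) for m in tf}


# Hypothetical package contents.
files = {"usr/bin/hello": b"#!/bin/sh\necho hello\n"}
checksums = read_checksums(build_data_tar(files))
print(checksums)
```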
 
== PKGINFO Format ==
The PKGINFO file contains the package metadata. This is a plain-text file similar to INI files. Lines that begin with <tt>#</tt> are comments and are ignored. Unlike INI files, the parsing format of this file is very strict: each key and its value must be separated by exactly one space, one equals sign, and one more space (<tt> = </tt>). Keys may be repeated in this file, and a repeated key should be treated as a list of values.
 
The specification for what fields are valid in PKGINFO is largely defined by [https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/abuild.in abuild]. As of July 2022 the following fields are supported:
 
* <tt>pkgname</tt> - package name
* <tt>pkgver</tt> - package version
* <tt>pkgdesc</tt> - package description
* <tt>url</tt> - package url
* <tt>builddate</tt> - unix timestamp of the package build date/time
* <tt>packager</tt> - name (and typically email) of person who built the package
* <tt>size</tt> - the installed-size of the package
* <tt>arch</tt> - the architecture of the package (ex: x86_64)
* <tt>origin</tt> - the origin name of the package
* <tt>commit</tt> - the commit hash from which the package was built
* <tt>maintainer</tt> - name (and typically email) of the package maintainer
* <tt>replaces_priority</tt> - replaces priority field for package (integer)
* <tt>provider_priority</tt> - provider priority for the package (integer)
* <tt>license</tt> - license string for the package
* <tt>depend</tt> - dependencies for the package (repeated)
* <tt>replaces</tt> - packages this package replaces (repeated)
* <tt>provides</tt> - what this package provides (repeated)
* <tt>triggers</tt> - what packages this package triggers on (repeated)
* <tt>install_if</tt> - install this package if these packages are present (repeated)
* <tt>datahash</tt> - hex-encoded sha256 checksum of the data tarball
 
== Example of PKGINFO ==
<pre>
# Generated by abuild 3.9.0-r2
# using fakeroot version 1.25.3
# Wed Jul  6 19:09:49 UTC 2022
pkgname = busybox
pkgver = 1.35.0-r18
pkgdesc = Size optimized toolbox of many common UNIX utilities
url = https://busybox.net/
builddate = 1657134589
packager = Buildozer <alpine-devel@lists.alpinelinux.org>
size = 958464
arch = x86_64
origin = busybox
commit = 332d2fff53cd4537d415e15e55e8ceb6fe6eaedb
maintainer = Sören Tempel <soeren+alpine@soeren-tempel.net>
provider_priority = 100
license = GPL-2.0-only
replaces = busybox-initscripts
provides = /bin/sh
triggers = /bin /usr/bin /sbin /usr/sbin /lib/modules/*
# automatically detected:
provides = cmd:busybox=1.35.0-r18
provides = cmd:sh=1.35.0-r18
depend = so:libc.musl-x86_64.so.1
datahash = 7d3351ac6c3ebaf18182efb5390061f50d077ce5ade60a15909d91278f70ada7
</pre>
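
The parsing rules for PKGINFO can be sketched in a few lines of Python. This is an illustrative reader, not the parser used by apk-tools; the function name <tt>parse_pkginfo</tt> is made up, and every key is stored as a list so that repeated keys need no special handling.

```python
def parse_pkginfo(text):
    """Parse PKGINFO text into a dict mapping each key to a list of values."""
    fields = {}
    for line in text.splitlines():
        # Blank lines and comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        # The separator is exactly " = "; split on its first occurrence only.
        key, sep, value = line.partition(" = ")
        if not sep:
            raise ValueError("malformed PKGINFO line: %r" % line)
        fields.setdefault(key, []).append(value)
    return fields


# A shortened copy of the busybox example.
sample = """\
# Generated by abuild 3.9.0-r2
pkgname = busybox
pkgver = 1.35.0-r18
size = 958464
provides = /bin/sh
# automatically detected:
provides = cmd:busybox=1.35.0-r18
provides = cmd:sh=1.35.0-r18
depend = so:libc.musl-x86_64.so.1
"""

info = parse_pkginfo(sample)
print(info["pkgname"], info["provides"])
```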
 
== Package Building Example ==
This is a set of commands to partially build a package. '''DO NOT DO THIS''', it's mainly an example to see how this all fits together. Use the official build tools to build packages.
 
<pre>
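# control segment: tar records without end-of-tar records, gzipped
tar -c .PKGINFO .pre-install | abuild-tar --cut | gzip -9 > $controldir/control.tar.gz
# data tarball: per-file hashes added, end-of-tar records kept
cd $pkgdir; tar -c * | abuild-tar --hash | gzip -9 > $controldir/data.tar.gz
# unsigned package: control segment followed by data tarball
cat $controldir/control.tar.gz $controldir/data.tar.gz > mypackage-1.0-r0.apk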
</pre>
 
= Index Format V2 =
== Binary Format ==
The index is served as [https://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz APKINDEX.tar.gz] and is downloaded by apk to power the package database. The index is signed similarly to packages. The main difference between the index and packages is that the index file contains only two segments.
 
The signature segment has the same format as the signature segment of a package and is concatenated, in its own gzip stream, to the beginning of the APKINDEX tarball.
 
The APKINDEX tarball contains two files: a DESCRIPTION file and an APKINDEX file. Each of these files is in its own tar record and the final record is followed by the standard end-of-tar null records. The DESCRIPTION file is a simple text file containing a description of the index (ex: <tt>community v20210212-7170-g5c9853dc69</tt>). The APKINDEX file is a text file containing one record for each package in the repository. Records are separated by a blank line.
 
== APKINDEX Format ==
The APKINDEX file contains a set of records extracted from the PKGINFO file of each package in the repository. Each line consists of a single letter, a colon, and the value of the field. Lines are newline (<tt>\n</tt>) terminated and there is one blank line between the records of consecutive packages.
 
The <tt>apk_pkg_write_index_entry</tt> function of [https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/ff7c8f6ee9dfa2add57b88dc271f6711030e72a0/src/package.c#L905 package.c] defines the currently accepted fields. As of July 2022, these are:
 
* <tt>C:</tt> - file checksum, see below
* <tt>P:</tt> - package name (corresponds to <tt>pkgname</tt> in PKGINFO)
* <tt>V:</tt> - package version (corresponds to <tt>pkgver</tt> in PKGINFO)
* <tt>A:</tt> - architecture (corresponds to <tt>arch</tt> in PKGINFO), optional
* <tt>S:</tt> - size of entire package, integer
* <tt>I:</tt> - installed size, integer (corresponds to <tt>size</tt> in PKGINFO)
* <tt>T:</tt> - description (corresponds to <tt>pkgdesc</tt> in PKGINFO)
* <tt>U:</tt> - url (corresponds to <tt>url</tt> in PKGINFO)
* <tt>L:</tt> - license (corresponds to <tt>license</tt> in PKGINFO)
* <tt>o:</tt> - origin (corresponds to <tt>origin</tt> in PKGINFO), optional
* <tt>m:</tt> - maintainer (corresponds to <tt>maintainer</tt> in PKGINFO), optional
* <tt>t:</tt> - build time (corresponds to <tt>builddate</tt> in PKGINFO), optional
* <tt>c:</tt> - commit (corresponds to <tt>commit</tt> in PKGINFO), optional
* <tt>k:</tt> - provider priority, integer (corresponds to <tt>provider_priority</tt> in PKGINFO), optional
* <tt>D:</tt> - dependencies (corresponds to <tt>depend</tt> in PKGINFO, concatenated by spaces into a single line)
* <tt>p:</tt> - provides (corresponds to <tt>provides</tt> in PKGINFO, concatenated by spaces into a single line)
* <tt>i:</tt> - install if (corresponds to <tt>install_if</tt> in PKGINFO, concatenated by spaces into a single line)
 
== Package Checksum Field ==
The package checksum field is the SHA1 hash of the second gzip stream (control stream) in the package. The binary hash digest is base64 encoded. This is prefixed with <tt>Q1</tt> to differentiate it from the MD5 hashes used in older index formats. It is not possible to compute this checksum with standard command line tools but the apk-tools can compute it in their <tt>index</tt> operation.
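
Given the raw bytes of the control gzip stream, the encoding described above is straightforward to reproduce. The sketch below uses Python's standard library and assumes a signed package with exactly three gzip streams, so that the control stream is the second one; the function names and the placeholder package bytes are hypothetical.

```python
import base64
import gzip
import hashlib
import zlib


def gzip_stream_bytes(blob):
    """Return the compressed bytes of each concatenated gzip stream."""
    streams = []
    while blob:
        decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        decoder.decompress(blob)
        if not decoder.eof:
            raise ValueError("truncated gzip stream")
        consumed = len(blob) - len(decoder.unused_data)
        streams.append(blob[:consumed])
        blob = decoder.unused_data
    return streams


def control_checksum(apk_bytes):
    """Compute the C: value, assuming signature, control, data streams."""
    streams = gzip_stream_bytes(apk_bytes)
    if len(streams) != 3:
        raise ValueError("expected 3 gzip streams, found %d" % len(streams))
    digest = hashlib.sha1(streams[1]).digest()
    return "Q1" + base64.b64encode(digest).decode("ascii")


def checksum_to_hex(field):
    """Turn a Q1-prefixed checksum back into a hex SHA1 digest."""
    if not field.startswith("Q1"):
        raise ValueError("not a SHA1 checksum field")
    return base64.b64decode(field[2:]).hex()


# Placeholder package: three gzip streams with dummy payloads.
control_gz = gzip.compress(b"control")
apk = gzip.compress(b"signature") + control_gz + gzip.compress(b"data")
checksum = control_checksum(apk)
print(checksum, checksum_to_hex(checksum))
```

The hash is taken over the compressed stream exactly as it is stored in the package, so recompressing the control data would generally change the checksum.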
 
== Example APKINDEX Record ==
<pre>
C:Q1P4IRU/u5yB4CSnUEBRD1WWwajrY=
P:jool-tools
V:4.1.5-r0
A:x86_64
S:140605
I:434176
T:Userspace control tools for SIIT / NAT64 Jool
U:https://www.jool.mx
L:GPL-2.0-only
o:jool-tools
m:Jakub Jirutka <jakub@jirutka.cz>
t:1620480809
c:771b3b0910ea9c7736db6ca4ff5c37ca9cf9af0d
D:so:libc.musl-x86_64.so.1 so:libnl-3.so.200 so:libnl-genl-3.so.200
p:cmd:jool=4.1.5-r0 cmd:jool_siit=4.1.5-r0 cmd:joold=4.1.5-r0
</pre>
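
A record-oriented reader for this format can be sketched as follows. It is a minimal Python illustration rather than the apk-tools parser; the function name <tt>parse_apkindex</tt> is made up, and the second record in the sample is invented to show record separation.

```python
def parse_apkindex(text):
    """Split APKINDEX text into a list of {letter: value} dicts."""
    records, current = [], {}
    for line in text.split("\n"):
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Only the first colon separates key and value; values may contain colons.
        key, sep, value = line.partition(":")
        if not sep or len(key) != 1:
            raise ValueError("malformed APKINDEX line: %r" % line)
        current[key] = value
    if current:
        records.append(current)
    return records


sample = (
    "C:Q1P4IRU/u5yB4CSnUEBRD1WWwajrY=\n"
    "P:jool-tools\n"
    "V:4.1.5-r0\n"
    "D:so:libc.musl-x86_64.so.1 so:libnl-3.so.200 so:libnl-genl-3.so.200\n"
    "\n"
    "P:example\n"  # invented second record
    "V:1.0-r0\n"
    "\n"
)

records = parse_apkindex(sample)
# Space-separated fields such as D: and p: can be split after parsing.
depends = records[0]["D"].split(" ")
print(len(records), depends)
```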
 
= Installed Database V2 =
The installed database is used by apk to track which packages are installed and what modifications those packages have made to the system. This file is located at <tt>/lib/apk/db/installed</tt>. The installed file is a plaintext file of the same format as APKINDEX (contained in APKINDEX.tar.gz). It is neither compressed nor signed. Each record in the installed file starts with a package index record with the same fields as the APKINDEX file. The installed file adds some additional fields that are defined in [https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/ff7c8f6ee9dfa2add57b88dc271f6711030e72a0/src/database.c#L937 database.c]. As of July 2022 these additional fields are:
 
* <tt>r:</tt> - packages which this package replaces, space separated list
* <tt>q:</tt> - replaces priority, integer, optional
* <tt>s:</tt> - repository tag, optional, this will be set if the package is tagged to a repository in the world file (ex: linux@testing)
* <tt>f:</tt> - indicates broken items, space separated (f=files, s=scripts, x=xattrs, S=file hashes)
 
The following fields are repeated. Taken together as groups, they describe the set of changes made to the system when the package was installed.
 
ACL lines are specified as uid, colon, gid, colon, and mode.
 
* <tt>F:</tt> - directory name that was created by package, repeated
* <tt>M:</tt> - directory ACL, only if different than the default of 0:0:0755
* <tt>R:</tt> - file name, relative to preceding directory name
* <tt>a:</tt> - file ACL
* <tt>Z:</tt> - file checksum, if the checksum in the package is not none, a <tt>Q1</tt> prefix indicates this will be a SHA1 hash in base64 format
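
The grouping of these fields can be sketched as a small Python reader. It assumes that <tt>M:</tt> follows the <tt>F:</tt> line it applies to and that <tt>a:</tt> and <tt>Z:</tt> follow their <tt>R:</tt> line; that ordering, the function names, and the sample lines (including the all-zero placeholder checksum) are assumptions made for illustration and should be checked against database.c.

```python
DEFAULT_DIR_ACL = (0, 0, 0o755)


def parse_acl(value):
    """Parse 'uid:gid:mode' into integers; the mode is read as octal."""
    uid, gid, mode = value.split(":")
    return int(uid), int(gid), int(mode, 8)


def parse_file_entries(lines):
    """Group F/M/R/a/Z lines into directories and the files they contain."""
    dirs = []
    current_dir = None
    current_file = None
    for line in lines:
        key, _, value = line.partition(":")
        if key == "F":
            current_dir = {"path": value, "acl": DEFAULT_DIR_ACL, "files": []}
            dirs.append(current_dir)
            current_file = None
        elif key == "M":
            current_dir["acl"] = parse_acl(value)
        elif key == "R":
            current_file = {"name": value, "acl": None, "checksum": None}
            current_dir["files"].append(current_file)
        elif key == "a":
            current_file["acl"] = parse_acl(value)
        elif key == "Z":
            current_file["checksum"] = value
    return dirs


# Invented sample lines; the checksum is a placeholder, not a real hash.
sample = [
    "F:bin",
    "R:busybox",
    "a:0:0:0755",
    "Z:Q1AAAAAAAAAAAAAAAAAAAAAAAAAAA=",
    "F:var/empty",
    "M:0:0:0555",
]

dirs = parse_file_entries(sample)
print(dirs)
```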
 
[[Category:Package Manager]] [[Category:Development]]

Latest revision as of 02:58, 25 August 2023

This material is work-in-progress ...

Do not follow instructions here until this notice is removed.
(Last edited by Sertonix on 25 Aug 2023.)
