Apk spec: Difference between revisions

From Alpine Linux
{{Draft}}


For end-user facing documentation about apk, check out the [[Package_management]] page.


This page is an attempt to document the internal data structures of the apk package manager. The canonical implementation of the apk format is [https://gitlab.alpinelinux.org/alpine/apk-tools apk-tools] and much of this information is gleaned from reading the source code.

There are three generations of the APK data formats. Version 1 is deprecated and no longer used, version 2 is currently the main version in use by apk-tools, and version 3 is under development. This page mostly describes the data formats used in version 2.


= Package Format V2 =
== Tar Segments ==
A tar segment is a sequence of tar records. A normal tar file ends with two null records that mark the end of the archive; a tar segment omits them, so it can be placed in front of another tar file and the two behave as one continuous tar file. The APK v2 package format uses both tar segments and complete tarballs.
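Here is a minimal Python sketch of the segment idea. It uses only the standard <tt>tarfile</tt> module. The sketch builds one tar record without the end-of-archive blocks, prepends it to a complete tarball, and reads the result back as a single archive. The helper names <tt>tar_segment</tt> and <tt>full_tarball</tt> are hypothetical. GNU-format headers are an arbitrary choice for illustration, so abuild-tar's exact output may differ.

```python
import io
import tarfile

BLOCK = tarfile.BLOCKSIZE  # tar works in 512-byte blocks


def tar_segment(name: str, data: bytes) -> bytes:
    """Return one tar record (header + padded data) WITHOUT end-of-archive blocks."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mode = 0o644
    header = info.tobuf(format=tarfile.GNU_FORMAT)
    padding = b"\0" * (-len(data) % BLOCK)  # pad file data to a whole block
    return header + data + padding


def full_tarball(name: str, data: bytes) -> bytes:
    """Return a normal tarball, terminated by the null end-of-archive blocks."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Prepending a segment to a tarball yields one archive containing both members.
combined = tar_segment(".PKGINFO", b"pkgname = demo\n") + full_tarball("usr/bin/demo", b"#!/bin/sh\n")
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:") as tar:
    print(tar.getnames())  # ['.PKGINFO', 'usr/bin/demo']
```

If the segment kept its end-of-archive blocks, a reader would stop at them and never see the second member.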


Tar segments can be gzip-compressed. A gzip file may consist of several concatenated streams, and most tools decompress such a file as if it were a single stream. APK v2 files rely on these stream boundaries to divide a file into segments.
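The segment boundaries are gzip stream boundaries. A reader therefore has to decode one stream at a time instead of letting the gzip library merge them. The following Python sketch assumes the whole file fits in memory. <tt>zlib.decompressobj(wbits=31)</tt> decodes a single gzip stream and leaves the bytes that follow it in <tt>unused_data</tt>. The function name <tt>split_gzip_streams</tt> is hypothetical.

```python
import gzip
import zlib


def split_gzip_streams(blob: bytes) -> list[bytes]:
    """Split a byte string made of concatenated gzip streams into the individual streams."""
    streams = []
    offset = 0
    while offset < len(blob):
        d = zlib.decompressobj(wbits=31)  # 31 = expect gzip header and trailer
        d.decompress(blob[offset:])
        if not d.eof:
            raise ValueError(f"truncated gzip stream at offset {offset}")
        end = len(blob) - len(d.unused_data)  # unused_data = bytes after this stream
        streams.append(blob[offset:end])
        offset = end
    return streams


blob = gzip.compress(b"first") + gzip.compress(b"second")
print(gzip.decompress(blob))  # b'firstsecond' (streams merged)
print([gzip.decompress(s) for s in split_gzip_streams(blob)])  # [b'first', b'second']
```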


== Binary Format ==
APK v2 packages consist of two tar segments followed by a tarball, each in its own gzip stream (three streams in total). These streams hold, in order, the package signature, the control data, and the package data. The package data is a tarball of the package's files. It is laid out so that unpacking it at the filesystem root places every file in its correct location on the system. The control tar segment contains the package metadata along with any install scripts. The signature tar segment contains a single file holding a signature over the control segment. The data tarball is covered indirectly, through the <tt>datahash</tt> field stored in the control segment's <tt>.PKGINFO</tt>.
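To illustrate the three-stream layout, the following Python sketch lists the tar members found in each gzip stream of a package held in memory. It assumes a well-formed package and uses hypothetical helper names. It does not verify anything.

```python
import gzip
import io
import tarfile
import zlib


def split_gzip_streams(blob: bytes) -> list[bytes]:
    """Return each concatenated gzip stream in blob as a separate byte string."""
    streams, offset = [], 0
    while offset < len(blob):
        d = zlib.decompressobj(wbits=31)  # decode exactly one gzip stream
        d.decompress(blob[offset:])
        if not d.eof:
            raise ValueError(f"truncated gzip stream at offset {offset}")
        end = len(blob) - len(d.unused_data)
        streams.append(blob[offset:end])
        offset = end
    return streams


def list_apk_streams(apk: bytes) -> list[list[str]]:
    """Return the member names found in each gzip stream of an APK v2 package."""
    listing = []
    for stream in split_gzip_streams(apk):
        raw = gzip.decompress(stream)
        # tarfile stops cleanly at end of input, so a segment without
        # end-of-archive records can be listed on its own.
        with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
            listing.append(tar.getnames())
    return listing
```

For a signed package the expected result is three lists. The first holds the <tt>.SIGN.RSA.*</tt> file, the second the control files such as <tt>.PKGINFO</tt>, and the third the paths of the packaged files.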


The signature file is a DER-encoded PKCS#1 v1.5 RSA signature of the SHA1 hash of the control gzip stream. The filename has the format <tt>.SIGN.RSA.<key_name>.rsa.pub</tt> (for example <tt>.SIGN.RSA.alpine-devel@lists.alpinelinux.org-5261cecb.rsa.pub</tt>). This file is placed inside a tar record with permissions 0644, uid 0, and gid 0. The tar record (without end-of-tar records) is gzip-compressed, forming a signature tar segment, and placed at the front of the package, ahead of the control and data streams. [https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/abuild-sign.in abuild-sign] is responsible for generating these signature segments.
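The following sketch shows how a reader might pull the signature out of the first gzip stream. It is written in Python with the standard library only. The names <tt>read_signature</tt>, <tt>first_gzip_stream</tt> and <tt>SIG_PREFIX</tt> are hypothetical. The RSA verification itself is deliberately left out. It would check the returned signature against the signed stream, using the matching public key (normally found under <tt>/etc/apk/keys</tt>), for example with OpenSSL.

```python
import gzip
import io
import tarfile
import zlib

SIG_PREFIX = ".SIGN.RSA."


def first_gzip_stream(blob: bytes) -> bytes:
    """Return only the bytes of the first gzip stream in blob."""
    d = zlib.decompressobj(wbits=31)
    d.decompress(blob)
    if not d.eof:
        raise ValueError("first gzip stream is truncated")
    return blob[: len(blob) - len(d.unused_data)]


def read_signature(apk: bytes) -> tuple[str, bytes]:
    """Return (public key file name, raw DER signature) from the signature segment."""
    raw = gzip.decompress(first_gzip_stream(apk))
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        member = tar.next()
        if member is None or not member.name.startswith(SIG_PREFIX):
            raise ValueError("package does not start with a .SIGN.RSA. segment")
        # e.g. ".SIGN.RSA.alpine-devel@...-5261cecb.rsa.pub" -> "alpine-devel@...-5261cecb.rsa.pub"
        key_file = member.name[len(SIG_PREFIX):]
        return key_file, tar.extractfile(member).read()
```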


The control segment contains the package metadata in a <tt>.PKGINFO</tt> file as well as all of the scripts (if any) that are used by apk during installation and removal of the package. For historical reasons all files in the control tar segment are prefixed with a dot (<tt>.</tt>). The control segment is constructed by placing each file for the package into a tar record, concatenating those tar records, gzipping the tar records, and concatenating them onto the front of the data tarball. The SHA1 hash of this gzip stream is used as the checksum <tt>C:</tt> field in the APKINDEX file.


The data tarball is a standard gzipped tarball in which each file entry carries an extra PAX header, <tt>APK-TOOLS.checksum.SHA1</tt>, containing the SHA1 hash of that file. Unlike the other tar streams, this tarball does include the two end-of-tar null records. It is always the final segment of an APK package. Hashes are added with the [https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/abuild-tar.c abuild-tar] tool.
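The following minimal Python sketch checks these per-file hashes with <tt>tarfile</tt>, which exposes PAX records through <tt>TarInfo.pax_headers</tt>. It assumes that the header value is the hex-encoded SHA1 digest of the file contents; that encoding is an assumption, not something stated above. The input is the already-decompressed data tarball, and <tt>verify_data_tarball</tt> is a hypothetical name.

```python
import hashlib
import io
import tarfile

CHECKSUM_KEY = "APK-TOOLS.checksum.SHA1"


def verify_data_tarball(raw_tar: bytes) -> dict[str, bool]:
    """Map each regular file that carries a checksum header to whether it matches.

    ASSUMPTION: the PAX value is a hex-encoded SHA1 of the file contents.
    """
    results = {}
    with tarfile.open(fileobj=io.BytesIO(raw_tar), mode="r:") as tar:
        for member in tar:
            expected = member.pax_headers.get(CHECKSUM_KEY)
            if expected is None or not member.isfile():
                continue  # skip directories, links, and entries without the header
            actual = hashlib.sha1(tar.extractfile(member).read()).hexdigest()
            results[member.name] = actual == expected.lower()
    return results
```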


== PKGINFO Format ==
The PKGINFO file contains the package metadata. It is a plain-text file similar to an INI file. Lines that begin with <tt>#</tt> are comments and are ignored. Unlike INI files, this file is parsed very strictly: each key must be separated from its value by exactly one space, one equals sign, and one more space (<tt> = </tt>). Keys may be repeated, and repeated keys should be treated as a list of values.
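The following minimal Python parser follows these rules. It skips comments, requires the separator to be exactly <tt> = </tt>, and accumulates repeated keys into a list. Skipping blank lines is an assumption of this sketch, and <tt>parse_pkginfo</tt> is a hypothetical name.

```python
from collections import defaultdict


def parse_pkginfo(text: str) -> dict[str, list[str]]:
    """Parse .PKGINFO text into key -> list of values (keys may repeat)."""
    fields: dict[str, list[str]] = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # comments (and, by assumption, blank lines) carry no data
        key, sep, value = line.partition(" = ")
        if not sep or not key or " " in key:
            raise ValueError(f"line {lineno}: expected 'key = value', got {line!r}")
        fields[key].append(value)
    return dict(fields)
```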


The specification for what fields are valid in PKGINFO is largely defined by [https://gitlab.alpinelinux.org/alpine/abuild/-/blob/master/abuild.in abuild]. As of July 2022 the following fields are supported:
 
* <tt>pkgname</tt> - package name
* <tt>pkgver</tt> - package version
* <tt>pkgdesc</tt> - package description
* <tt>url</tt> - package url
* <tt>builddate</tt> - unix timestamp of the package build date/time
* <tt>packager</tt> - name (and typically email) of person who built the package
* <tt>size</tt> - the installed-size of the package
* <tt>arch</tt> - the architecture of the package (ex: x86_64)
* <tt>origin</tt> - the origin name of the package
* <tt>commit</tt> - the commit hash from which the package was built
* <tt>maintainer</tt> - name (and typically email) of the package maintainer
* <tt>replaces_priority</tt> - replaces priority field for package (integer)
* <tt>provider_priority</tt> - provider priority for the package (integer)
* <tt>license</tt> - license string for the package
* <tt>depend</tt> - dependencies for the package (repeated)
* <tt>replaces</tt> - packages this package replaces (repeated)
* <tt>provides</tt> - what this package provides (repeated)
* <tt>triggers</tt> - space-separated list of directory paths (globs allowed) whose changes run this package's trigger script
* <tt>install_if</tt> - install this package if these packages are present (repeated)
* <tt>datahash</tt> - hex-encoded SHA256 checksum of the gzip-compressed data tarball (the final gzip stream)
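Once the data stream is isolated, verifying <tt>datahash</tt> is a single hash comparison. This Python sketch assumes that the hash covers the compressed gzip stream (the <tt>data.tar.gz</tt> file in the build example on this page) rather than the decompressed tarball. <tt>check_datahash</tt> is a hypothetical name.

```python
import hashlib


def check_datahash(pkginfo_text: str, data_stream: bytes) -> bool:
    """Compare .PKGINFO's datahash with the SHA-256 of the compressed data stream."""
    for line in pkginfo_text.splitlines():
        if line.startswith("datahash = "):
            expected = line[len("datahash = "):].strip()
            # ASSUMPTION: hashed over the gzip-compressed bytes, not the raw tar.
            return hashlib.sha256(data_stream).hexdigest() == expected.lower()
    raise ValueError("no datahash field in .PKGINFO")
```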
 
== Example of PKGINFO ==
<pre>
# Generated by abuild 3.9.0-r2
# using fakeroot version 1.25.3
# Wed Jul  6 19:09:49 UTC 2022
pkgname = busybox
pkgver = 1.35.0-r18
pkgdesc = Size optimized toolbox of many common UNIX utilities
url = https://busybox.net/
builddate = 1657134589
packager = Buildozer <alpine-devel@lists.alpinelinux.org>
size = 958464
arch = x86_64
origin = busybox
commit = 332d2fff53cd4537d415e15e55e8ceb6fe6eaedb
maintainer = Sören Tempel <soeren+alpine@soeren-tempel.net>
provider_priority = 100
license = GPL-2.0-only
replaces = busybox-initscripts
provides = /bin/sh
triggers = /bin /usr/bin /sbin /usr/sbin /lib/modules/*
# automatically detected:
provides = cmd:busybox=1.35.0-r18
provides = cmd:sh=1.35.0-r18
depend = so:libc.musl-x86_64.so.1
datahash = 7d3351ac6c3ebaf18182efb5390061f50d077ce5ade60a15909d91278f70ada7
</pre>
 
== Package Building Example ==
This is a set of commands that partially builds a package. '''DO NOT DO THIS'''; it is only an example of how the pieces fit together. Use the official build tools (abuild) to build packages.
 
<pre>
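# data first: its SHA256 is recorded in .PKGINFO as datahash
(cd "$pkgdir" && tar -f - -c *) | abuild-tar --hash | gzip -9 > "$controldir"/data.tar.gz
echo "datahash = $(sha256sum "$controldir"/data.tar.gz | cut -d' ' -f1)" >> .PKGINFO
tar -f - -c .PKGINFO .pre-install | abuild-tar --cut | gzip -9 > "$controldir"/control.tar.gz
abuild-sign -q "$controldir"/control.tar.gz  # prepends the signature gzip stream
cat "$controldir"/control.tar.gz "$controldir"/data.tar.gz > mypackage-1.0-r0.apk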
</pre>
 
= Index Format V2 =
== Binary Format ==
The index is served as [http://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz APKINDEX.tar.gz] and is downloaded by apk to power the package database. The index is signed similarly to packages. The main difference between the index and packages is that the index file contains only two segments.
 
The signature segment has the same format as a package's signature segment. It is gzip-compressed in its own stream and placed at the beginning of the file, ahead of the APKINDEX tarball that it signs.
 
The APKINDEX tarball contains two files: a DESCRIPTION file and an APKINDEX file. Each file is stored in its own tar record, and the final record is followed by the standard end-of-tar null records. The DESCRIPTION file is a simple text file containing a description of the index (ex: <tt>community v20210212-7170-g5c9853dc69</tt>). The APKINDEX file is a text file containing one record per package in the repository, with records separated by a blank line.
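Python's <tt>tarfile</tt> decompresses concatenated gzip streams as one. The signature segment and the index tarball can therefore be read as a single archive, which the missing end-of-tar records in the segment make possible. This is a minimal sketch: <tt>read_index</tt> is a hypothetical name, and the input is assumed to be the raw bytes of <tt>APKINDEX.tar.gz</tt>.

```python
import io
import tarfile


def read_index(blob: bytes) -> tuple[str, str]:
    """Return (DESCRIPTION, APKINDEX) text from the bytes of an APKINDEX.tar.gz."""
    texts = {}
    # "r:gz" reads all concatenated gzip streams as one, so the signature
    # segment's members come first, followed by DESCRIPTION and APKINDEX.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar:
            if member.name in ("DESCRIPTION", "APKINDEX"):
                texts[member.name] = tar.extractfile(member).read().decode("utf-8")
    return texts["DESCRIPTION"], texts["APKINDEX"]
```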
 
== APKINDEX Format ==
The APKINDEX file contains a set of records extracted from the PKGINFO file of each package in the repository. Each line consists of a single-letter field key, a colon, and the field value. Lines are newline (<tt>\n</tt>) terminated, and a single blank line separates the records of consecutive packages.
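The following minimal Python sketch splits APKINDEX text into per-package records. The function name is hypothetical. Multi-value fields such as <tt>D:</tt> and <tt>p:</tt> are left as single strings that callers can split on spaces.

```python
def parse_apkindex(text: str) -> list[dict[str, str]]:
    """Split APKINDEX text into one dict per package, keyed by the field letter."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            # A blank line ends the current package record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only: values such as "so:libc..." contain colons.
        key, sep, value = line.partition(":")
        if not sep or len(key) != 1:
            raise ValueError(f"malformed APKINDEX line: {line!r}")
        current[key] = value
    if current:
        records.append(current)
    return records
```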
 
The <tt>apk_pkg_write_index_entry</tt> function of [https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/ff7c8f6ee9dfa2add57b88dc271f6711030e72a0/src/package.c#L905 package.c] defines the currently accepted fields. As of July 2022, these are:
 
* <tt>C:</tt> - package checksum: the SHA1 hash of the second gzip stream (the control stream) of the package, base64-encoded and prefixed with <tt>Q1</tt> to indicate SHA1
* <tt>P:</tt> - package name (corresponds to <tt>pkgname</tt> in PKGINFO)
* <tt>V:</tt> - package version (corresponds to <tt>pkgver</tt> in PKGINFO)
* <tt>A:</tt> - architecture (corresponds to <tt>arch</tt> in PKGINFO), optional
* <tt>S:</tt> - size of the entire package (.apk) file in bytes, integer
* <tt>I:</tt> - installed size, integer (corresponds to <tt>size</tt> in PKGINFO)
* <tt>T:</tt> - description (corresponds to <tt>pkgdesc</tt> in PKGINFO)
* <tt>U:</tt> - url (corresponds to <tt>url</tt> in PKGINFO)
* <tt>L:</tt> - license (corresponds to <tt>license</tt> in PKGINFO)
* <tt>o:</tt> - origin (corresponds to <tt>origin</tt> in PKGINFO), optional
* <tt>m:</tt> - maintainer (corresponds to <tt>maintainer</tt> in PKGINFO), optional
* <tt>t:</tt> - build time (corresponds to <tt>builddate</tt> in PKGINFO), optional
* <tt>c:</tt> - commit (corresponds to <tt>commit</tt> in PKGINFO), optional
* <tt>k:</tt> - provider priority, integer (corresponds to <tt>provider_priority</tt> in PKGINFO), optional
* <tt>D:</tt> - dependencies (corresponds to <tt>depend</tt> in PKGINFO, concatenated by spaces into a single line)
* <tt>p:</tt> - provides (corresponds to <tt>provides</tt> in PKGINFO, concatenated by spaces into a single line)
* <tt>i:</tt> - install if (corresponds to <tt>install_if</tt> in PKGINFO, concatenated by spaces into a single line)
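The mapping can be sketched in Python as follows, starting from .PKGINFO fields already parsed into lists. This is a simplification, not a port of <tt>apk_pkg_write_index_entry</tt>. It omits any field whose source key is absent, whereas apk-tools may emit some fields (such as <tt>T:</tt>) unconditionally. The checksum and file size are passed in because they are not stored in .PKGINFO. <tt>index_record</tt> and <tt>FIELD_ORDER</tt> are hypothetical names.

```python
# (APKINDEX key, .PKGINFO key) in the order listed above; None marks values
# that come from the package file itself rather than from .PKGINFO.
FIELD_ORDER = [
    ("C", None), ("P", "pkgname"), ("V", "pkgver"), ("A", "arch"),
    ("S", None), ("I", "size"), ("T", "pkgdesc"), ("U", "url"),
    ("L", "license"), ("o", "origin"), ("m", "maintainer"),
    ("t", "builddate"), ("c", "commit"), ("k", "provider_priority"),
    ("D", "depend"), ("p", "provides"), ("i", "install_if"),
]


def index_record(pkginfo: dict[str, list[str]], checksum: str, apk_size: int) -> str:
    """Build one APKINDEX record from parsed .PKGINFO fields (key -> list of values)."""
    derived = {"C": checksum, "S": str(apk_size)}
    lines = []
    for key, source in FIELD_ORDER:
        if source is None:
            lines.append(f"{key}:{derived[key]}")
        elif pkginfo.get(source):
            # Repeated .PKGINFO keys are joined by spaces into a single line.
            lines.append(f"{key}:{' '.join(pkginfo[source])}")
    return "\n".join(lines) + "\n"
```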
 
== Package Checksum Field ==
The package checksum field is the SHA1 hash of the second gzip stream (the control stream) in the package. The binary digest is base64-encoded and prefixed with <tt>Q1</tt> to distinguish it from the MD5 hashes used in older index formats. Standard command line tools cannot compute this checksum directly from an .apk file, because the control stream must first be located by finding the gzip stream boundaries. apk-tools computes it as part of its <tt>index</tt> operation.
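The following Python sketch computes the value directly from a package's bytes. It finds the gzip stream boundaries with <tt>zlib</tt>, hashes the second stream with SHA1, base64-encodes the digest and adds the <tt>Q1</tt> prefix. It assumes a signed package, so that the control stream is the second one. <tt>apkindex_checksum</tt> and <tt>gzip_stream_bounds</tt> are hypothetical names.

```python
import base64
import hashlib
import zlib


def gzip_stream_bounds(blob: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of each concatenated gzip stream in blob."""
    bounds, offset = [], 0
    while offset < len(blob):
        d = zlib.decompressobj(wbits=31)  # decode exactly one gzip stream
        d.decompress(blob[offset:])
        if not d.eof:
            raise ValueError(f"truncated gzip stream at offset {offset}")
        end = len(blob) - len(d.unused_data)
        bounds.append((offset, end))
        offset = end
    return bounds


def apkindex_checksum(apk: bytes) -> str:
    """Compute the APKINDEX C: value: 'Q1' + base64(SHA1(control gzip stream))."""
    # ASSUMPTION: signed package, so stream 0 = signature, 1 = control, 2 = data.
    start, end = gzip_stream_bounds(apk)[1]
    digest = hashlib.sha1(apk[start:end]).digest()
    return "Q1" + base64.b64encode(digest).decode("ascii")
```

A SHA1 digest is 20 bytes, so the result is always <tt>Q1</tt> followed by 28 base64 characters, as in the example record on this page.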
 
== Example APKINDEX Record ==
<pre>
C:Q1P4IRU/u5yB4CSnUEBRD1WWwajrY=
P:jool-tools
V:4.1.5-r0
A:x86_64
S:140605
I:434176
T:Userspace control tools for SIIT / NAT64 Jool
U:https://www.jool.mx
L:GPL-2.0-only
o:jool-tools
m:Jakub Jirutka <jakub@jirutka.cz>
t:1620480809
c:771b3b0910ea9c7736db6ca4ff5c37ca9cf9af0d
D:so:libc.musl-x86_64.so.1 so:libnl-3.so.200 so:libnl-genl-3.so.200
p:cmd:jool=4.1.5-r0 cmd:jool_siit=4.1.5-r0 cmd:joold=4.1.5-r0
</pre>
 
[[Category:Package Manager]] [[Category:Development]]

Revision as of 01:37, 18 July 2022

This material is work-in-progress ...

Do not follow instructions here until this notice is removed.
(Last edited by Mcrute on 18 Jul 2022.)
