This is ctf-spec.info, produced by makeinfo version 7.0.2 from
ctf-spec.texi.

Copyright © 2021-2023 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU General Public License, Version 3 or any
later version published by the Free Software Foundation. A copy of the
license is included in the section entitled “GNU General Public
License”.

INFO-DIR-SECTION Software development
START-INFO-DIR-ENTRY
* CTF: (ctf-spec). The CTF file format.
END-INFO-DIR-ENTRY


File: ctf-spec.info, Node: Top, Next: Overview, Up: (dir)

The CTF file format
*******************

This manual describes version 3 of the CTF file format, which is
intended to model the C type system in a fashion that C programs can
consume at runtime.

* Menu:

* Overview::
* CTF archive::
* CTF dictionaries::
* Index::


File: ctf-spec.info, Node: Overview, Next: CTF archive, Prev: Top, Up: Top

Overview
********

The CTF file format compactly describes C types and the association
between function and data symbols and types: if embedded in ELF objects,
it can exploit the ELF string table to reduce duplication further.
There is no real concept of namespacing: only top-level types are
described, not types scoped to within single functions.

CTF dictionaries can be “children” of other dictionaries, in a
one-level hierarchy: child dictionaries can refer to types in the
parent, but the opposite is not sensible (since if you refer to a child
type in the parent, the actual type you cited would vary depending on
what child was attached). This parent/child relationship is recorded in
the child, but only as a recommendation: users of the API have to attach
parents to children explicitly, and can choose to attach a child to any
parent they like, or to none, though doing so might lead to unpleasant
consequences like dangling references to types. *Note Type indexes and
type IDs::. Type lookups in child dicts that are not associated with a
parent at all will fail with ‘ECTF_NOPARENT’ if a parent type was
needed.

The associated API to generate, merge together, and query this file
format will be described in the accompanying ‘libctf’ manual once it is
written. There is no API to modify dictionaries once they’ve been
written out: CTF is a write-once file format. (However, it is always
possible to dynamically create a new child dictionary on the fly and
attach it to a pre-existing, read-only parent.)

There are two major pieces to CTF: the “archive” and the
“dictionary”. Some relatives and ancestors of CTF call dictionaries
“containers”: the archive format is unique to this variant of CTF. (Much
of the source code still uses the old term.)

The archive file format is a very simple mmappable archive used to
group multiple dictionaries together: it is expected to slowly go away
and be replaced by other mechanisms, but right now it is an important
part of the file format, used to group dictionaries containing types
with conflicting definitions in different TUs with the overarching
dictionary used to store all other types. (Even when archives go away,
the ‘libctf’ API used to access them will remain, and will access the
mechanisms that replace them instead.)

The CTF dictionary consists of a “preamble”, which does not vary
between versions of the CTF file format, and a “header” and some number
of “sections”, which can vary between versions.

The rest of this specification describes the format of these
sections, first for the latest version of CTF, then for all earlier
versions supported by ‘libctf’: the earlier versions are defined in
terms of their differences from the next later one. We describe each
part of the format first by reproducing the C structure which defines
that part, then describing it at greater length in terms of file
offsets.

The description of the file format ends with a description of
relevant limits that apply to it. These limits can vary between file
format versions.

This document is quite young, so for now the C code in ‘ctf.h’ should
be presumed correct when this document conflicts with it.


File: ctf-spec.info, Node: CTF archive, Next: CTF dictionaries, Prev: Overview, Up: Top

1 CTF archives
**************

The CTF archive format maps names to CTF dictionaries. The names may
contain any character other than \0, but for now archives containing
slashes in the names may not extract correctly. It is possible to
insert multiple members with the same name, but these are quite hard to
access reliably (you have to iterate through all the members rather than
opening by name) so this is not recommended.

CTF archives are not themselves compressed: the constituent
components, CTF dictionaries, can be compressed. (*Note CTF header::).

CTF archives usually contain a collection of related dictionaries,
one parent and many children of that parent. CTF archives can have a
member with a “default name”, ‘.ctf’ (which can be represented as ‘NULL’
in the API). If present, this member is usually the parent of all the
children, but it is possible for CTF producers to emit parents with
different names if they wish (usually for backward-compatibility
purposes).

‘.ctf’ sections in ELF objects consist of a single CTF dictionary
rather than an archive of dictionaries if and only if the section
contains no types with identical names but conflicting definitions: if
two conflicting definitions exist, the deduplicator will place the type
most commonly referred to by other types in the parent and will place
the other type in a child named after the translation unit it is found
in, and will emit a CTF archive containing both dictionaries instead of
a raw dictionary. All types that refer to such conflicting types are
also placed in the per-translation-unit child.

The definition of an archive in ‘ctf.h’ is as follows:

     struct ctf_archive
     {
       uint64_t ctfa_magic;
       uint64_t ctfa_model;
       uint64_t ctfa_nfiles;
       uint64_t ctfa_names;
       uint64_t ctfa_ctfs;
     };

     typedef struct ctf_archive_modent
     {
       uint64_t name_offset;
       uint64_t ctf_offset;
     } ctf_archive_modent_t;

(Note one irregularity here: the ‘ctf_archive_t’ is not a typedef to
‘struct ctf_archive’, but a different typedef, private to ‘libctf’, so
that things that are not really archives can be made to appear as if
they were.)

All the above items are always in little-endian byte order,
regardless of the machine endianness.

The archive header has the following fields:

Offset   Name                     Description
------------------------------------------------------------------------------------------
0x00     ‘uint64_t ctfa_magic’    The magic number for archives, ‘CTFA_MAGIC’:
                                  0x8b47f2a4d7623eeb.

0x08     ‘uint64_t ctfa_model’    The data model for this archive: an arbitrary integer
                                  that serves no purpose but to be handed back by the
                                  libctf API. *Note Data models::.

0x10     ‘uint64_t ctfa_nfiles’   The number of CTF dictionaries in this archive.

0x18     ‘uint64_t ctfa_names’    Offset of the name table, in bytes from the start of
                                  the archive. The name table is an array of ‘struct
                                  ctf_archive_modent_t[ctfa_nfiles]’.

0x20     ‘uint64_t ctfa_ctfs’     Offset of the CTF table. Each element starts with a
                                  ‘uint64_t’ size, followed by a CTF dictionary.

The array pointed to by ‘ctfa_names’ is an array of entries of
‘ctf_archive_modent’:

Offset   Name                     Description
---------------------------------------------------------------------------------
0x00     ‘uint64_t name_offset’   Offset of this name, in bytes from the start
                                  of the archive.

0x08     ‘uint64_t ctf_offset’    Offset of this CTF dictionary, in bytes from
                                  the start of the archive.

The ‘ctfa_names’ array is sorted into ASCIIbetical order by name
(i.e. by the result of dereferencing the ‘name_offset’).

The archive file also contains a name table and a table of CTF
dictionaries: these are pointed to by the structures above. The name
table is a simple strtab which is not required to be sorted; the
dictionary array is described above in the entry for ‘ctfa_ctfs’.

The relative order of these various parts is not defined, except that
the header naturally always comes first.
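
As an illustration of the header layout above, the following sketch
decodes the five header fields from a byte buffer and checks the magic
number. It is not taken from ‘libctf’: ‘struct archive_header’,
‘read_le64’ and ‘parse_archive_header’ are hypothetical names, and the
sketch assumes the whole header is already in memory. It deliberately
stops at the header and does not interpret the name or CTF tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTFA_MAGIC 0x8b47f2a4d7623eebULL

/* Hypothetical host-order copy of the five archive header fields.  */
struct archive_header
{
  uint64_t magic;
  uint64_t model;
  uint64_t nfiles;
  uint64_t names;
  uint64_t ctfs;
};

/* Assemble a uint64_t from eight little-endian bytes, so the result is
   correct on hosts of either endianness.  */
static uint64_t
read_le64 (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Decode the header at the start of BUF.  Returns 0 on success, or -1
   if the buffer is shorter than the 0x28-byte header or the magic
   number does not match.  */
static int
parse_archive_header (const unsigned char *buf, size_t len,
                      struct archive_header *out)
{
  assert (buf != NULL && out != NULL);
  if (len < 0x28)
    return -1;
  out->magic = read_le64 (buf + 0x00);
  out->model = read_le64 (buf + 0x08);
  out->nfiles = read_le64 (buf + 0x10);
  out->names = read_le64 (buf + 0x18);
  out->ctfs = read_le64 (buf + 0x20);
  return out->magic == CTFA_MAGIC ? 0 : -1;
}
```

Reading byte by byte rather than casting the buffer to ‘struct
ctf_archive’ avoids both alignment and host-endianness assumptions.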


File: ctf-spec.info, Node: CTF dictionaries, Next: Index, Prev: CTF archive, Up: Top

2 CTF dictionaries
******************

CTF dictionaries consist of a header, starting with a preamble, and a
number of sections.

* Menu:

* CTF Preamble::
* CTF header::
* The type section::
* The symtypetab sections::
* The variable section::
* The label section::
* The string section::
* Data models::
* Limits of CTF::


File: ctf-spec.info, Node: CTF Preamble, Next: CTF header, Up: CTF dictionaries

2.1 CTF Preamble
================

The preamble is the only part of the CTF dictionary whose format cannot
vary between versions. It is never compressed. It is correspondingly
simple:

     typedef struct ctf_preamble
     {
       unsigned short ctp_magic;
       unsigned char ctp_version;
       unsigned char ctp_flags;
     } ctf_preamble_t;

‘#define’s are provided under the names ‘cth_magic’, ‘cth_version’
and ‘cth_flags’ to make the fields of the ‘ctf_preamble_t’ appear to be
part of the ‘ctf_header_t’, so consuming programs rarely need to
consider the existence of the preamble as a separate structure.

Offset   Name                          Description
-------------------------------------------------------------------------------
0x00     ‘unsigned short ctp_magic’    The magic number for CTF
                                       dictionaries, ‘CTF_MAGIC’: 0xdff2.

0x02     ‘unsigned char ctp_version’   The version number of this CTF
                                       dictionary.

0x03     ‘unsigned char ctp_flags’     Flags for this CTF file.
                                       *Note CTF file-wide flags::.

Every element of a dictionary must be naturally aligned unless
otherwise specified. (This restriction will be lifted in later
versions.)

CTF dictionaries are stored in the native endianness of the system
that generates them: the consumer (e.g., ‘libctf’) can detect whether to
endian-flip a CTF dictionary by inspecting the ‘ctp_magic’. (If it
appears as 0xf2df, endian-flipping is needed.)
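
The magic-number check described above might be sketched as follows.
The enumeration, ‘CTF_MAGIC_SWAPPED’ and ‘detect_byte_order’ are
hypothetical names invented for this example and are not part of
‘ctf.h’ or ‘libctf’; only ‘CTF_MAGIC’ and the two values come from this
document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTF_MAGIC 0xdff2
#define CTF_MAGIC_SWAPPED 0xf2df /* Hypothetical name for the flipped form.  */

enum byte_order
{
  ORDER_INVALID = -1, /* Not a CTF dictionary at all.  */
  ORDER_NATIVE = 0,   /* Usable as-is on this host.  */
  ORDER_SWAPPED = 1   /* Every multi-byte field must be endian-flipped.  */
};

/* Inspect the first two bytes of BUF as a host-order unsigned short,
   the way a consumer sees ctp_magic after mapping the file.  */
static enum byte_order
detect_byte_order (const unsigned char *buf, size_t len)
{
  uint16_t magic;

  assert (buf != NULL);
  if (len < sizeof magic)
    return ORDER_INVALID;
  memcpy (&magic, buf, sizeof magic);
  if (magic == CTF_MAGIC)
    return ORDER_NATIVE;
  if (magic == CTF_MAGIC_SWAPPED)
    return ORDER_SWAPPED;
  return ORDER_INVALID;
}
```

Because ‘ctp_version’ and ‘ctp_flags’ are single bytes, they can be read
directly whatever the result: only ‘ctp_magic’ and the fields after the
preamble are affected by byte order.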

The version of the CTF dictionary can be determined by inspecting
‘ctp_version’. The following versions are currently valid, and ‘libctf’
can read all of them:

Version                      Number   Description
-------------------------------------------------------------------------------------------
‘CTF_VERSION_1’              1        First version, rare. Very similar to Solaris CTF.

‘CTF_VERSION_1_UPGRADED_3’   2        First version, upgraded to v3 or higher and
                                      written out again. Name may change. Very rare.

‘CTF_VERSION_2’              3        Second version, with many range limits lifted.

‘CTF_VERSION_3’              4        Third and current version, documented here.

This section documents ‘CTF_VERSION_3’.

* Menu:

* CTF file-wide flags::


File: ctf-spec.info, Node: CTF file-wide flags, Up: CTF Preamble

2.1.1 CTF file-wide flags
-------------------------

The preamble contains bitflags in its ‘ctp_flags’ field that describe
various file-wide properties. Some of the flags are valid only for
particular file-format versions, which means the flags can be used to
fix file-format bugs. Consumers that see unknown flags should
accordingly assume that the dictionary is not comprehensible, and refuse
to open it.

The following flags are currently defined. Many are bug workarounds,
valid only in CTFv3, and will not be valid in any future versions: the
same values may be reused for other flags in v4+.

Flag                  Versions   Value   Meaning
---------------------------------------------------------------------------------------
‘CTF_F_COMPRESS’      All        0x1     Compressed with zlib
‘CTF_F_NEWFUNCINFO’   3 only     0x2     “New-format” func info section.
‘CTF_F_IDXSORTED’     3+         0x4     The index section is in sorted order
‘CTF_F_DYNSTR’        3 only     0x8     The external strtab is in ‘.dynstr’ and the
                                         symtab used is ‘.dynsym’.
                                         *Note The string section::

‘CTF_F_NEWFUNCINFO’ and ‘CTF_F_IDXSORTED’ relate to the function info
and data object sections. *Note The symtypetab sections::.

Further flags (and further compression methods) will be added in
future.
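
The rule that consumers should refuse dictionaries carrying unknown
flags could be applied as in this sketch. The flag values are those in
the table above; ‘KNOWN_V3_FLAGS’ and ‘flags_are_understood’ are
hypothetical names for this example, and the sketch only covers CTFv3.

```c
#include <assert.h>

#define CTF_F_COMPRESS    0x1
#define CTF_F_NEWFUNCINFO 0x2
#define CTF_F_IDXSORTED   0x4
#define CTF_F_DYNSTR      0x8

/* Hypothetical mask of every flag this sketch understands for CTFv3.  */
#define KNOWN_V3_FLAGS \
  (CTF_F_COMPRESS | CTF_F_NEWFUNCINFO | CTF_F_IDXSORTED | CTF_F_DYNSTR)

/* Compile-time check that the mask matches the four values tabulated.  */
static_assert (KNOWN_V3_FLAGS == 0xf, "mask must cover exactly four flags");

/* Return nonzero if every bit set in FLAGS is a known v3 flag.  A
   consumer would refuse to open the dictionary when this returns 0.  */
static int
flags_are_understood (unsigned char flags)
{
  return (flags & ~KNOWN_V3_FLAGS) == 0;
}
```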


File: ctf-spec.info, Node: CTF header, Next: The type section, Prev: CTF Preamble, Up: CTF dictionaries

2.2 CTF header
==============

The CTF header is the first part of a CTF dictionary, including the
preamble. All parts of it other than the preamble (*note CTF
Preamble::) can vary between CTF file versions and are never compressed.
It contains things that apply to the dictionary as a whole, and a table
of the sections into which the rest of the dictionary is divided. The
sections tile the file: each section runs from the offset given until
the start of the next section. Only the last section cannot follow this
rule, so the header has a length for it instead.

All section offsets, here and in the rest of the CTF file, are
relative to the _end_ of the header. (This is annoyingly different to
how offsets in CTF archives are handled.)

This is the first structure to include offsets into the string table,
which are not straight references because CTF dictionaries can include
references into the ELF string table to save space, as well as into the
string table internal to the CTF dictionary. *Note The string section::
for more on these. Offset 0 is always the null string.

     typedef struct ctf_header
     {
       ctf_preamble_t cth_preamble;
       uint32_t cth_parlabel;
       uint32_t cth_parname;
       uint32_t cth_cuname;
       uint32_t cth_lbloff;
       uint32_t cth_objtoff;
       uint32_t cth_funcoff;
       uint32_t cth_objtidxoff;
       uint32_t cth_funcidxoff;
       uint32_t cth_varoff;
       uint32_t cth_typeoff;
       uint32_t cth_stroff;
       uint32_t cth_strlen;
     } ctf_header_t;

In detail:

Offset   Name                            Description
-----------------------------------------------------------------------------------------------
0x00     ‘ctf_preamble_t cth_preamble’   The preamble (conceptually embedded in the header).
                                         *Note CTF Preamble::

0x04     ‘uint32_t cth_parlabel’         The parent label, if deduplication happened against
                                         a specific label: a strtab offset.
                                         *Note The label section::. Currently unused and
                                         always 0, but may be used in future when semantics
                                         are attached to the label section.

0x08     ‘uint32_t cth_parname’          The name of the parent dictionary deduplicated
                                         against: a strtab offset. Interpretation is up to
                                         the consumer (usually a CTF archive member name).
                                         0 (the null string) if this is not a child
                                         dictionary.

0x0c     ‘uint32_t cth_cuname’           The name of the compilation unit, for consumers
                                         like GDB that want to know the name of the CU
                                         that a single-CU dictionary describes: a strtab
                                         offset. 0 if this dictionary describes types from
                                         many CUs.

0x10     ‘uint32_t cth_lbloff’           The offset of the label section, which tiles the
                                         type space into named regions.
                                         *Note The label section::.

0x14     ‘uint32_t cth_objtoff’          The offset of the data object symtypetab section,
                                         which maps ELF data symbols to types.
                                         *Note The symtypetab sections::.

0x18     ‘uint32_t cth_funcoff’          The offset of the function info symtypetab section,
                                         which maps ELF function symbols to a return type
                                         and arg types. *Note The symtypetab sections::.

0x1c     ‘uint32_t cth_objtidxoff’       The offset of the object index section, which maps
                                         ELF object symbols to entries in the data object
                                         section. *Note The symtypetab sections::.

0x20     ‘uint32_t cth_funcidxoff’       The offset of the function info index section,
                                         which maps ELF function symbols to entries in the
                                         function info section.
                                         *Note The symtypetab sections::.

0x24     ‘uint32_t cth_varoff’           The offset of the variable section, which maps
                                         string names to types.
                                         *Note The variable section::.

0x28     ‘uint32_t cth_typeoff’          The offset of the type section, the core of CTF,
                                         which describes types using variable-length array
                                         elements. *Note The type section::.

0x2c     ‘uint32_t cth_stroff’           The offset of the string section.
                                         *Note The string section::.

0x30     ‘uint32_t cth_strlen’           The length of the string section (not an offset!).
                                         The CTF file ends at this point.

Everything from this point on (until the end of the file at
‘cth_stroff’ + ‘cth_strlen’) is compressed with zlib if ‘CTF_F_COMPRESS’
is set in the preamble’s ‘ctp_flags’.
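
Since the sections tile the file, the extent of each one follows from
the header alone. The sketch below derives a start and length for every
section, assuming the sections appear in the same order as their offset
fields in ‘ctf_header_t’. The ‘SEC_*’ enumerators, ‘struct sec_extent’
and ‘section_extents’ are hypothetical names; offsets stay relative to
the end of the header, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ctf_preamble
{
  unsigned short ctp_magic;
  unsigned char ctp_version;
  unsigned char ctp_flags;
} ctf_preamble_t;

typedef struct ctf_header
{
  ctf_preamble_t cth_preamble;
  uint32_t cth_parlabel, cth_parname, cth_cuname;
  uint32_t cth_lbloff, cth_objtoff, cth_funcoff;
  uint32_t cth_objtidxoff, cth_funcidxoff;
  uint32_t cth_varoff, cth_typeoff, cth_stroff, cth_strlen;
} ctf_header_t;

/* Hypothetical section indexes, in header order.  */
enum
{
  SEC_LABEL, SEC_OBJT, SEC_FUNC, SEC_OBJTIDX, SEC_FUNCIDX,
  SEC_VAR, SEC_TYPE, SEC_STR, SEC_COUNT
};

struct sec_extent
{
  uint64_t off; /* Start, relative to the end of the header.  */
  uint64_t len; /* Length in bytes.  */
};

/* Fill OUT with the extent of each section.  Returns 0 on success, or
   -1 if the offsets are not in nondecreasing order.  */
static int
section_extents (const ctf_header_t *h, struct sec_extent out[SEC_COUNT])
{
  assert (h != NULL && out != NULL);

  /* Each section ends where the next begins; only the string section
     needs its explicit length.  64-bit sums cannot overflow here.  */
  const uint64_t starts[SEC_COUNT + 1] = {
    h->cth_lbloff, h->cth_objtoff, h->cth_funcoff, h->cth_objtidxoff,
    h->cth_funcidxoff, h->cth_varoff, h->cth_typeoff, h->cth_stroff,
    (uint64_t) h->cth_stroff + h->cth_strlen
  };

  for (int i = 0; i < SEC_COUNT; i++)
    {
      if (starts[i + 1] < starts[i])
        return -1;
      out[i].off = starts[i];
      out[i].len = starts[i + 1] - starts[i];
    }
  return 0;
}
```

A section whose computed length is zero is simply absent from the
dictionary.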


File: ctf-spec.info, Node: The type section, Next: The symtypetab sections, Prev: CTF header, Up: CTF dictionaries

2.3 The type section
====================

This section is the most important section in CTF, describing all the
top-level types in the program. It consists of an array of type
structures, each of which describes a type of some “kind”: each kind of
type has some amount of variable-length data associated with it (some
kinds have none). The amount of variable-length data associated with a
given type can be determined by inspecting the type, so the reading code
can walk through the types in sequence at opening time.

Each type structure is one of a set of overlapping structures in a
discriminated union of sorts: the variable-length data for each type
immediately follows the type’s type structure. Here’s the largest of
the overlapping structures, which is only needed for huge types and so
is very rarely seen:

     typedef struct ctf_type
     {
       uint32_t ctt_name;
       uint32_t ctt_info;
       __extension__
       union
       {
         uint32_t ctt_size;
         uint32_t ctt_type;
       };
       uint32_t ctt_lsizehi;
       uint32_t ctt_lsizelo;
     } ctf_type_t;

Here’s the much more common smaller form:

     typedef struct ctf_stype
     {
       uint32_t ctt_name;
       uint32_t ctt_info;
       __extension__
       union
       {
         uint32_t ctt_size;
         uint32_t ctt_type;
       };
     } ctf_stype_t;

If ‘ctt_size’ is the #define ‘CTF_LSIZE_SENT’, 0xffffffff, this type
is described by a ‘ctf_type_t’: otherwise, a ‘ctf_stype_t’.
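
The choice between the two structures can be made as in the sketch
below, which also recovers the 64-bit size of a huge type. It applies
only to kinds that record a size in ‘ctt_size’. ‘type_header_size’ is a
hypothetical helper, not a ‘libctf’ function, and the structures are
restated with C11 anonymous unions so that the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTF_LSIZE_SENT 0xffffffffu

typedef struct ctf_stype
{
  uint32_t ctt_name;
  uint32_t ctt_info;
  union
  {
    uint32_t ctt_size;
    uint32_t ctt_type;
  };
} ctf_stype_t;

typedef struct ctf_type
{
  uint32_t ctt_name;
  uint32_t ctt_info;
  union
  {
    uint32_t ctt_size;
    uint32_t ctt_type;
  };
  uint32_t ctt_lsizehi;
  uint32_t ctt_lsizelo;
} ctf_type_t;

/* Return the size recorded for the type at T and store in *HDR_LEN the
   number of bytes its fixed-size structure occupies, which is where
   the variable-length data begins.  The large-size fields are read
   only when the sentinel says they are present.  */
static uint64_t
type_header_size (const ctf_type_t *t, size_t *hdr_len)
{
  assert (t != NULL && hdr_len != NULL);
  if (t->ctt_size == CTF_LSIZE_SENT)
    {
      *hdr_len = sizeof (ctf_type_t);
      return ((uint64_t) t->ctt_lsizehi << 32) | t->ctt_lsizelo;
    }
  *hdr_len = sizeof (ctf_stype_t);
  return t->ctt_size;
}
```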

Here’s what the fields mean:

Offset               Name                     Description
-----------------------------------------------------------------------------------------------------
0x00                 ‘uint32_t ctt_name’      Strtab offset of the type name, if any (0 if none).

0x04                 ‘uint32_t ctt_info’      The “info word”, containing information on the kind
                                              of this type, its variable-length data and whether
                                              it is visible to name lookup. See
                                              *Note The info word::.

0x08                 ‘uint32_t ctt_size’      The size of this type, if this type is of a kind for
                                              which a size needs to be recorded (constant-size
                                              types don’t need one). If this is ‘CTF_LSIZE_SENT’,
                                              this type is a huge type described by ‘ctf_type_t’.

0x08                 ‘uint32_t ctt_type’      The type this type refers to, if this type is of a
                                              kind which refers to other types (like a pointer).
                                              All such types are fixed-size, and no types that are
                                              variable-size refer to other types, so ‘ctt_size’
                                              and ‘ctt_type’ overlap. All type kinds that use
                                              ‘ctt_type’ are described by ‘ctf_stype_t’, not
                                              ‘ctf_type_t’. *Note Type indexes and type IDs::.

0x0c (‘ctf_type_t’   ‘uint32_t ctt_lsizehi’   The high 32 bits of the size of a very large type.
only)                                         The ‘CTF_TYPE_LSIZE’ macro can be used to get a
                                              64-bit size out of this field and the next one.
                                              ‘CTF_SIZE_TO_LSIZE_HI’ splits the ‘ctt_lsizehi’ out
                                              of it again.

0x10 (‘ctf_type_t’   ‘uint32_t ctt_lsizelo’   The low 32 bits of the size of a very large type.
only)                                         ‘CTF_SIZE_TO_LSIZE_LO’ splits the ‘ctt_lsizelo’ out
                                              of a 64-bit size.

Two aspects of this need further explanation: the info word, and what
exactly a type ID is and how you determine it. (Information on the
various type-kind-dependent things, like whether ‘ctt_size’ or
‘ctt_type’ is used, is described in the section devoted to each kind.)

* Menu:

* The info word::
* Type indexes and type IDs::
* Type kinds::
* Integer types::
* Floating-point types::
* Slices::
* Pointers typedefs and cvr-quals::
* Arrays::
* Function pointers::
* Enums::
* Structs and unions::
* Forward declarations::


File: ctf-spec.info, Node: The info word, Next: Type indexes and type IDs, Up: The type section

2.3.1 The info word, ctt_info
-----------------------------

The info word is a bitfield split into three parts. From MSB to LSB:

Bit offset   Name       Description
------------------------------------------------------------------------------------------
26–31        ‘kind’     Type kind: *note Type kinds::.

25           ‘isroot’   1 if this type is visible to name lookup

0–24         ‘vlen’     Length of variable-length data for this type (some kinds only).
                        The variable-length data directly follows the ‘ctf_type_t’ or
                        ‘ctf_stype_t’. This is a kind-dependent array length value,
                        not a length in bytes. Some kinds have no variable-length
                        data, or fixed-size variable-length data, and do not use this
                        value.

The most mysterious of these is undoubtedly ‘isroot’. This indicates
whether types with names (nonzero ‘ctt_name’) are visible to name
lookup: if zero, this type is considered a “non-root type” and you can’t
look it up by name at all. Multiple types with the same name in the
same C namespace (struct, union, enum, other) can exist in a single
dictionary, but only one of them may have a nonzero value for ‘isroot’.
‘libctf’ validates this at open time and refuses to open dictionaries
that violate this constraint.

Historically, this feature was introduced for the encoding of
bitfields (*note Integer types::): for instance, int bitfields will all
be named ‘int’ with different widths or offsets, but only the full-width
one at offset zero is wanted when you look up the type named ‘int’.
With the introduction of slices (*note Slices::) as a more general
bitfield encoding mechanism, this is less important, but we still use
non-root types to handle conflicts if the linker API is used to fuse
multiple translation units into one dictionary and those translation
units contain types with the same name and conflicting definitions. (We
do not discuss this further here, because the linker never does this:
only specialized type mergers do, like that used for the Linux kernel.
The libctf documentation will describe this in more detail.)

The ‘CTF_TYPE_INFO’ macro can be used to compose an info word from a
‘kind’, ‘isroot’, and ‘vlen’; ‘CTF_V2_INFO_KIND’, ‘CTF_V2_INFO_ISROOT’
and ‘CTF_V2_INFO_VLEN’ pick it apart again.
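
To make the bit layout concrete, here is a sketch of composing and
decomposing an info word. The helper names are hypothetical stand-ins
for the ‘ctf.h’ macros named above, and the shifts and masks are derived
solely from the bit offsets in the table in this section: ‘ctf.h’
remains authoritative if the two disagree.

```c
#include <assert.h>
#include <stdint.h>

#define INFO_KIND_SHIFT   26          /* kind occupies bits 26-31.  */
#define INFO_KIND_MASK    0x3fu       /* Six bits.  */
#define INFO_ISROOT_SHIFT 25          /* isroot is bit 25.  */
#define INFO_VLEN_MASK    0x1ffffffu  /* vlen occupies bits 0-24.  */

/* Compose an info word from its three parts.  */
static uint32_t
make_info (uint32_t kind, int isroot, uint32_t vlen)
{
  assert (kind <= INFO_KIND_MASK && vlen <= INFO_VLEN_MASK);
  return (kind << INFO_KIND_SHIFT)
         | ((isroot ? 1u : 0u) << INFO_ISROOT_SHIFT)
         | vlen;
}

/* Pick the parts out again.  */
static uint32_t
info_kind (uint32_t info)
{
  return (info >> INFO_KIND_SHIFT) & INFO_KIND_MASK;
}

static int
info_isroot (uint32_t info)
{
  return (info >> INFO_ISROOT_SHIFT) & 1u;
}

static uint32_t
info_vlen (uint32_t info)
{
  return info & INFO_VLEN_MASK;
}
```

For example, a root-visible type of kind 6 with three variable-length
elements has the info word 0x1a000003.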


File: ctf-spec.info, Node: Type indexes and type IDs, Next: Type kinds, Prev: The info word, Up: The type section

2.3.2 Type indexes and type IDs
-------------------------------

Types are referred to within the CTF file via “type IDs”. A type ID is
a number from 0 to 2^32-1, from a space divided in half. Types 2^31-1
and below are in the “parent range”: these IDs are used for dictionaries
that have not had any other dictionary ‘ctf_import’ed into them as a
parent. Both completely standalone dictionaries and parent dictionaries
with children hanging off them have types in this range. Types 2^31 and
above are in the “child range”: only types in child dictionaries are in
this range.

These IDs appear in ‘ctf_type_t.ctt_type’ (*note The type section::),
but the types themselves have no visible ID: quite intentionally,
because adding an ID uses space, and every ID is different so they don’t
compress well. The IDs are implicit: at open time, the consumer walks
through the entire type section and counts the types in the type
section. The type section is an array of variable-length elements, so
each entry could be considered as having an index, starting from 1. We
count these indexes and associate each with its corresponding
‘ctf_type_t’ or ‘ctf_stype_t’.

Lookups of types with IDs in the parent space look in the parent
dictionary if this dictionary has one associated with it; lookups of
types with IDs in the child space error out if the dictionary does not
have a parent, and otherwise convert the ID into an index by shaving off
the top bit and look up the index in the child.

These properties mean that the same dictionary can be used as a
parent of child dictionaries and can also be used directly with no
children at all, but a dictionary created as a child dictionary must
always be associated with a parent (usually, the same parent) because
its references to its own types have the high bit turned on and this is
only flipped off again if this is a child dictionary. (This is not a
problem, because if you _don’t_ associate the child with a parent, any
references within it to its parent types will fail, and there are almost
certain to be many such references, or why is it a child at all?)

This does mean that consumers should keep a close eye on the
distinction between type IDs and type indexes: if you mix them up,
everything will appear to work as long as you’re only using parent
dictionaries or standalone dictionaries, but as soon as you start using
children, everything will fail horribly.

Type index zero, and type ID zero, are used to indicate that this
type cannot be represented in CTF as currently constituted: they are
emitted by the compiler, but all type chains that terminate in the
unknown type are erased at link time (structure fields that use them
just vanish, etc). So you will probably never see a use of type zero
outside the symtypetab sections, where they serve as sentinels of sorts,
to indicate symbols with no associated type.

The macros ‘CTF_V2_TYPE_TO_INDEX’ and ‘CTF_V2_INDEX_TO_TYPE’ may help
in translation between types and indexes: ‘CTF_V2_TYPE_ISPARENT’ and
‘CTF_V2_TYPE_ISCHILD’ can be used to tell whether a given ID is in the
parent or child range.

It is quite possible and indeed common for type IDs to point forward
in the dictionary, as well as backward.
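
The relationship between IDs and indexes can be sketched as below. The
function names and ‘MAX_PARENT_TYPE’ are hypothetical stand-ins for the
‘ctf.h’ macros mentioned above, written only from the rules in this
section: the top bit selects the child range and the remaining 31 bits
are the index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the largest ID in the parent range, 2^31-1.  */
#define MAX_PARENT_TYPE 0x7fffffffu

/* Nonzero if ID lies in the child range (2^31 and above).  */
static int
type_is_child (uint32_t id)
{
  return id > MAX_PARENT_TYPE;
}

/* Nonzero if ID lies in the parent range (2^31-1 and below).  */
static int
type_is_parent (uint32_t id)
{
  return id <= MAX_PARENT_TYPE;
}

/* Shave off the top bit to get the index into the type section of
   whichever dictionary the ID belongs to.  */
static uint32_t
type_to_index (uint32_t id)
{
  return id & MAX_PARENT_TYPE;
}

/* Turn an index back into an ID, setting the top bit for types that
   live in a child dictionary.  */
static uint32_t
index_to_type (uint32_t index, int child)
{
  assert (index <= MAX_PARENT_TYPE);
  return child ? (index | (MAX_PARENT_TYPE + 1u)) : index;
}
```

In a standalone or parent dictionary the ID and the index are the same
number, which is why confusing the two goes unnoticed until a child
dictionary is involved.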


File: ctf-spec.info, Node: Type kinds, Next: Integer types, Prev: Type indexes and type IDs, Up: The type section

2.3.3 Type kinds
----------------

Every type in CTF is of some “kind”. Each kind is some variety of C
type: all structures are a single kind, as are all unions, all pointers,
all arrays, all integers regardless of their bitfield width, etc. The
kind of a type is given in the ‘kind’ field of the ‘ctt_info’ word
(*note The info word::).

The space of type kinds is only a quarter full so far, so there is
plenty of room for expansion. It is likely that in future versions of
the file format, types with smaller kinds will be more efficiently
encoded than types with larger kinds, so their numerical value will
actually start to matter in future. (So these IDs will probably change
their numerical values in a later release of this format, to move more
frequently-used kinds like structures and cv-quals towards the top of
the space, and move rarely-used kinds like integers downwards. Yes,
integers are rare: how many kinds of ‘int’ are there in a program?
They’re just very frequently _referenced_.)

Here’s the set of kinds so far. Each kind has a ‘#define’ associated
with it, also given here.

Kind   Macro              Purpose
----------------------------------------------------------------------------------------
0      ‘CTF_K_UNKNOWN’    Indicates a type that cannot be represented in CTF, or that
                          is being skipped. It is very similar to type ID 0, except
                          that you can have _multiple_, distinct types of kind
                          ‘CTF_K_UNKNOWN’.

1      ‘CTF_K_INTEGER’    An integer type. *Note Integer types::.

2      ‘CTF_K_FLOAT’      A floating-point type. *Note Floating-point types::.

3      ‘CTF_K_POINTER’    A pointer. *Note Pointers typedefs and cvr-quals::.

4      ‘CTF_K_ARRAY’      An array. *Note Arrays::.

5      ‘CTF_K_FUNCTION’   A function pointer. *Note Function pointers::.

6      ‘CTF_K_STRUCT’     A structure. *Note Structs and unions::.

7      ‘CTF_K_UNION’      A union. *Note Structs and unions::.

8      ‘CTF_K_ENUM’       An enumerated type. *Note Enums::.

9      ‘CTF_K_FORWARD’    A forward. *Note Forward declarations::.

10     ‘CTF_K_TYPEDEF’    A typedef. *Note Pointers typedefs and cvr-quals::.

11     ‘CTF_K_VOLATILE’   A volatile-qualified type.
                          *Note Pointers typedefs and cvr-quals::.

12     ‘CTF_K_CONST’      A const-qualified type.
                          *Note Pointers typedefs and cvr-quals::.

13     ‘CTF_K_RESTRICT’   A restrict-qualified type.
                          *Note Pointers typedefs and cvr-quals::.

14     ‘CTF_K_SLICE’      A slice, a change of the bit-width or offset of some other
                          type. *Note Slices::.

Now we cover all type kinds in turn. Some are more complicated than
others.

File: ctf-spec.info, Node: Integer types, Next: Floating-point types, Prev: Type kinds, Up: The type section

2.3.4 Integer types
-------------------

Integral types are all represented as types of kind ‘CTF_K_INTEGER’.
These types fill out ‘ctt_size’ in the ‘ctf_stype_t’ with the size in
bytes of the integral type in question. They are always represented by
‘ctf_stype_t’, never ‘ctf_type_t’. Their variable-length data is one
‘uint32_t’ in length: ‘vlen’ in the info word should be disregarded and
is always zero.

The variable-length data for integers has multiple items packed into
it much like the info word does.

Bit offset   Name        Description
-----------------------------------------------------------------------------------
24–31        Encoding    The desired display representation of this integer. You
                         can extract this field with the ‘CTF_INT_ENCODING’
                         macro. See below.

16–23        Offset      The offset of this integral type in bits from the start
                         of its enclosing structure field, adjusted for
                         endianness: *note Structs and unions::. You can extract
                         this field with the ‘CTF_INT_OFFSET’ macro.

0–15         Bit-width   The width of this integral type in bits. You can
                         extract this field with the ‘CTF_INT_BITS’ macro.

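As a rough illustration of the layout in the table above, the following
sketch packs and unpacks the three fields by hand. The helper names
(‘ex_int_vlen_pack’ and friends) are invented for this example: real
consumers would normally use the ‘CTF_INT_*’ macros instead.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the table above: encoding in bits
   24-31, offset in bits 16-23, bit-width in bits 0-15.  */
uint32_t
ex_int_vlen_pack (uint32_t encoding, uint32_t offset, uint32_t bits)
{
  return ((encoding & 0xff) << 24) | ((offset & 0xff) << 16) | (bits & 0xffff);
}

uint32_t
ex_int_vlen_encoding (uint32_t data)
{
  return (data >> 24) & 0xff;
}

uint32_t
ex_int_vlen_offset (uint32_t data)
{
  return (data >> 16) & 0xff;
}

uint32_t
ex_int_vlen_bits (uint32_t data)
{
  return data & 0xffff;
}
```

With these helpers, an encoding of 0x01, an offset of 0 and a width of
32 bits pack into the word 0x01000020.
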
If you choose, bitfields can be represented using the things above as
a sort of integral type with the ‘isroot’ bit flipped off and the offset
and bits values set in the vlen word: you can populate it with the
‘CTF_INT_DATA’ macro. (But it may be more convenient to represent them
using slices of a full-width integer: *note Slices::.)

Integers that are bitfields usually have a ‘ctt_size’ rounded up to
the nearest power of two in bytes, for natural alignment (e.g. a 17-bit
integer would have a ‘ctt_size’ of 4). However, not all types are
naturally aligned on all architectures: packed structures may in theory
use integral bitfields with different ‘ctt_size’, though this is rarely
observed.

The “encoding” for integers is a bit-field comprised of the values
below, which consumers can use to decide how to display values of this
type:

Offset   Name                Description
--------------------------------------------------------------------------------------------------------
0x01     ‘CTF_INT_SIGNED’    If set, this is a signed int: if false, unsigned.

0x02     ‘CTF_INT_CHAR’      If set, this is a char type. It is platform-dependent whether unadorned
                             ‘char’ is signed or not: the ‘CTF_CHAR’ macro produces an integral type
                             suitable for the definition of ‘char’ on this platform.

0x04     ‘CTF_INT_BOOL’      If set, this is a boolean type. (It is theoretically possible to turn
                             this and ‘CTF_INT_CHAR’ on at the same time, but it is not clear what
                             this would mean.)

0x08     ‘CTF_INT_VARARGS’   If set, this is a varargs-promoted value in a K&R function definition.
                             This is not currently produced or consumed by anything that we know of:
                             it is set aside for future use.

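Since the encoding is a set of independent flags, a consumer can test
each bit in turn. The following is only a sketch: the ‘EX_’ constants
copy the values from the table above, and ‘ex_describe_int_encoding’ is
a made-up helper, not part of any CTF API.

```c
#include <stdint.h>
#include <stdio.h>

/* Flag values copied from the table above.  */
#define EX_INT_SIGNED  0x01
#define EX_INT_CHAR    0x02
#define EX_INT_BOOL    0x04
#define EX_INT_VARARGS 0x08

/* Write a human-readable summary of ENCODING into BUF, which is LEN
   bytes long.  */
void
ex_describe_int_encoding (uint32_t encoding, char *buf, size_t len)
{
  snprintf (buf, len, "%s%s%s%s",
            (encoding & EX_INT_SIGNED) ? "signed" : "unsigned",
            (encoding & EX_INT_CHAR) ? " char" : "",
            (encoding & EX_INT_BOOL) ? " bool" : "",
            (encoding & EX_INT_VARARGS) ? " varargs" : "");
}
```
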
The GCC “‘Complex int’” and fixed-point extensions are not yet
supported: references to such types will be emitted as type 0.


File: ctf-spec.info, Node: Floating-point types, Next: Slices, Prev: Integer types, Up: The type section

2.3.5 Floating-point types
--------------------------

Floating-point types are all represented as types of kind ‘CTF_K_FLOAT’.
Like integers, these types fill out ‘ctt_size’ in the ‘ctf_stype_t’ with
the size in bytes of the floating-point type in question. They are
always represented by ‘ctf_stype_t’, never ‘ctf_type_t’.

This part of CTF shows many rough edges in the more obscure corners
of floating-point handling, and is likely to change in format v4.

The variable-length data for floats has multiple items packed into it
just like integers do:

Bit offset   Name        Description
-------------------------------------------------------------------------------------------
24–31        Encoding    The desired display representation of this float. You can
                         extract this field with the ‘CTF_FP_ENCODING’ macro. See below.

16–23        Offset      The offset of this floating-point type in bits from the start of
                         its enclosing structure field, adjusted for endianness:
                         *note Structs and unions::. You can extract this field with the
                         ‘CTF_FP_OFFSET’ macro.

0–15         Bit-width   The width of this floating-point type in bits. You can extract
                         this field with the ‘CTF_FP_BITS’ macro.

The purpose of the floating-point offset and bit-width is somewhat
opaque, since there are no such things as floating-point bitfields in C:
the bit-width should be filled out with the full width of the type in
bits, and the offset should always be zero. It is likely that these
fields will go away in the future. As with integers, you can use
‘CTF_FP_DATA’ to assemble one of these vlen items from its component
parts.

The “encoding” for floats is not a bitfield but a simple value
indicating the display representation. Many of these relate to
Solaris-specific compiler extensions, are unused, and will be recycled
in future: others are currently unused but will come into use in future.

Offset   Name                Description
----------------------------------------------------------------------------------------------
1        ‘CTF_FP_SINGLE’     This is a single-precision IEEE 754 ‘float’.
2        ‘CTF_FP_DOUBLE’     This is a double-precision IEEE 754 ‘double’.
3        ‘CTF_FP_CPLX’       This is a ‘Complex float’.
4        ‘CTF_FP_DCPLX’      This is a ‘Complex double’.
5        ‘CTF_FP_LDCPLX’     This is a ‘Complex long double’.
6        ‘CTF_FP_LDOUBLE’    This is a ‘long double’.
7        ‘CTF_FP_INTRVL’     This is a ‘float’ interval type, a Solaris-specific extension.
                             Unused: will be recycled.
8        ‘CTF_FP_DINTRVL’    This is a ‘double’ interval type, a Solaris-specific
                             extension. Unused: will be recycled.
9        ‘CTF_FP_LDINTRVL’   This is a ‘long double’ interval type, a Solaris-specific
                             extension. Unused: will be recycled.
10       ‘CTF_FP_IMAGRY’     This is the imaginary part of a ‘Complex float’. Not
                             currently generated. May change.
11       ‘CTF_FP_DIMAGRY’    This is the imaginary part of a ‘Complex double’. Not
                             currently generated. May change.
12       ‘CTF_FP_LDIMAGRY’   This is the imaginary part of a ‘Complex long double’. Not
                             currently generated. May change.

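Because the float encoding is a plain value rather than a set of flags,
it can be decoded with a table lookup. The sketch below copies the names
from the table above into an array indexed by encoding value:
‘ex_fp_vlen_encoding_name’ is a hypothetical helper, and the value 0,
which the table does not list, is simply treated as unknown.

```c
#include <stddef.h>
#include <stdint.h>

/* Names indexed by encoding value, copied from the table above.
   Index 0 is not listed in the table, so it is left NULL here.  */
static const char *const ex_fp_encoding_names[] = {
  NULL, "CTF_FP_SINGLE", "CTF_FP_DOUBLE", "CTF_FP_CPLX", "CTF_FP_DCPLX",
  "CTF_FP_LDCPLX", "CTF_FP_LDOUBLE", "CTF_FP_INTRVL", "CTF_FP_DINTRVL",
  "CTF_FP_LDINTRVL", "CTF_FP_IMAGRY", "CTF_FP_DIMAGRY", "CTF_FP_LDIMAGRY"
};

/* Return the name of the encoding stored in bits 24-31 of DATA, or
   NULL if the value is outside the table.  */
const char *
ex_fp_vlen_encoding_name (uint32_t data)
{
  uint32_t encoding = (data >> 24) & 0xff;
  size_t n = sizeof (ex_fp_encoding_names) / sizeof (ex_fp_encoding_names[0]);

  return encoding < n ? ex_fp_encoding_names[encoding] : NULL;
}
```
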
The use of the complex floating-point encodings is obscure: it is
possible that ‘CTF_FP_CPLX’ is meant to be used for only the real part
of complex types, and ‘CTF_FP_IMAGRY’ et al for the imaginary part – but
for now, we are emitting ‘CTF_FP_CPLX’ to cover the entire type, with no
way to get at its constituent parts. There appear to be no uses of
these encodings anywhere, so they are quite likely to change
incompatibly in future.


File: ctf-spec.info, Node: Slices, Next: Pointers typedefs and cvr-quals, Prev: Floating-point types, Up: The type section

2.3.6 Slices
------------

Slices, with kind ‘CTF_K_SLICE’, are an unusual CTF construct: they do
not directly correspond to any C type, but are a way to model other
types in a more convenient fashion for CTF generators.

Slices are like pointers and other reference types in that they are
always represented by ‘ctf_stype_t’: but unlike pointers and other
reference types, they populate the ‘ctt_size’ field just like integral
types do, and come with an attached encoding and transform the encoding
of the underlying type. The underlying type is described in the
variable-length data, similarly to structure and union fields: see
below. Requests for the type size should also chase down to the
referenced type.

Slices are always nameless: ‘ctt_name’ is always zero for them.

(The ‘libctf’ API behaviour is unusual as well, and justifies the
existence of slices: ‘ctf_type_kind’ never returns ‘CTF_K_SLICE’ but
always the underlying type kind, so that consumers never need to know
about slices: they can tell if an apparent integer is actually a slice
if they need to by calling ‘ctf_type_reference’, which will uniquely
return the underlying integral type rather than erroring out with
‘ECTF_NOTREF’ if this is actually a slice. So slices act just like an
integer with an encoding, but more closely mirror DWARF and other
debugging information formats by allowing CTF file creators to represent
a bitfield as a slice of an underlying integral type.)

The vlen in the info word for a slice should be ignored and is always
zero. The variable-length data for a slice is a single ‘ctf_slice_t’:

     typedef struct ctf_slice
     {
       uint32_t cts_type;
       unsigned short cts_offset;
       unsigned short cts_bits;
     } ctf_slice_t;

Offset   Name                          Description
----------------------------------------------------------------------------------------
0x0      ‘uint32_t cts_type’           The type this slice is a slice of. Must be an
                                       integral type (or a floating-point type, but
                                       this nonsensical option will go away in v4.)

0x4      ‘unsigned short cts_offset’   The offset of this integral type in bits from
                                       the start of its enclosing structure field,
                                       adjusted for endianness:
                                       *note Structs and unions::. Identical
                                       semantics to the ‘CTF_INT_OFFSET’ field:
                                       *note Integer types::. This field is much too
                                       long, because the maximum possible offset of
                                       an integral type would easily fit in a char:
                                       this field is bigger just for the sake of
                                       alignment. This will change in v4.

0x6      ‘unsigned short cts_bits’     The bit-width of this integral type.
                                       Identical semantics to the ‘CTF_INT_BITS’
                                       field: *note Integer types::. As above, this
                                       field is really too large and will shrink in
                                       v4.


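One way to read “transform the encoding of the underlying type” is that
the display format is inherited from the underlying integer while the
offset and width come from the slice. The sketch below makes that
assumption explicit: ‘struct ex_encoding’ and ‘ex_slice_encoding’ are
invented for this example and are not claimed to match any particular
implementation.

```c
#include <stdint.h>

typedef struct ctf_slice
{
  uint32_t cts_type;
  unsigned short cts_offset;
  unsigned short cts_bits;
} ctf_slice_t;

/* Hypothetical decoded form of an integer encoding.  */
struct ex_encoding
{
  uint32_t format;   /* Encoding flags, e.g. signedness.  */
  uint32_t offset;   /* Offset in bits.  */
  uint32_t bits;     /* Width in bits.  */
};

/* Combine SLICE with the vlen word of the integer it refers to.
   Assumption: the format comes from the underlying type (bits 24-31
   of UNDERLYING_VLEN), the offset and width from the slice.  */
struct ex_encoding
ex_slice_encoding (const ctf_slice_t *slice, uint32_t underlying_vlen)
{
  struct ex_encoding en;

  en.format = (underlying_vlen >> 24) & 0xff;
  en.offset = slice->cts_offset;
  en.bits = slice->cts_bits;
  return en;
}
```

Under this reading, a 7-bit bitfield at bit offset 3 within a signed
32-bit integer keeps the signedness of the integer but reports a width
of 7 and an offset of 3.
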
File: ctf-spec.info, Node: Pointers typedefs and cvr-quals, Next: Arrays, Prev: Slices, Up: The type section

2.3.7 Pointers, typedefs, and cvr-quals
---------------------------------------

Pointers, ‘typedef’s, and ‘const’, ‘volatile’ and ‘restrict’ qualifiers
are represented identically except for their type kind (though they may
be treated differently by consuming libraries like ‘libctf’, since
pointers affect assignment-compatibility in ways cvr-quals do not, and
they may have different alignment requirements, etc).

All of these are represented by ‘ctf_stype_t’, have no variable data
at all, and populate ‘ctt_type’ with the type ID of the type they point
to. These types can stack: a ‘CTF_K_RESTRICT’ can point to a
‘CTF_K_CONST’ which can point to a ‘CTF_K_POINTER’ etc.

They are all unnamed: ‘ctt_name’ is 0.

The size of ‘CTF_K_POINTER’ is derived from the data model (*note
Data models::), i.e. in practice, from the target machine ABI, and is
not explicitly represented. The size of other kinds in this set should
be determined by chasing ‘ctt_type’ references as necessary until a
non-typedef/const/volatile/restrict is found, and using that.


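The size rule above can be sketched over a toy type table. Everything
here is illustrative: ‘struct ex_type’ is a simplified stand-in for a
decoded type record, the kind values 1 and 3 for integers and pointers
are assumed from the earlier part of the type kinds table, and the table
is assumed to contain no reference cycles.

```c
#include <stdint.h>

/* Kind values: 10-13 as in the type kinds table, 1 and 3 assumed.  */
enum { EX_K_INTEGER = 1, EX_K_POINTER = 3, EX_K_TYPEDEF = 10,
       EX_K_VOLATILE = 11, EX_K_CONST = 12, EX_K_RESTRICT = 13 };

/* A deliberately simplified stand-in for a decoded type record.  */
struct ex_type
{
  int kind;
  uint32_t ref;    /* ctt_type, for reference kinds.  */
  uint32_t size;   /* ctt_size, for sized kinds.  */
};

/* Return the size in bytes of type ID in TYPES.  POINTER_SIZE comes
   from the data model.  */
uint32_t
ex_type_size (const struct ex_type *types, uint32_t id, uint32_t pointer_size)
{
  for (;;)
    {
      const struct ex_type *t = &types[id];

      switch (t->kind)
        {
        case EX_K_TYPEDEF:
        case EX_K_VOLATILE:
        case EX_K_CONST:
        case EX_K_RESTRICT:
          id = t->ref;          /* Keep chasing.  */
          break;
        case EX_K_POINTER:
          return pointer_size;  /* Not stored: taken from the model.  */
        default:
          return t->size;
        }
    }
}
```
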
File: ctf-spec.info, Node: Arrays, Next: Function pointers, Prev: Pointers typedefs and cvr-quals, Up: The type section

2.3.8 Arrays
------------

Arrays are encoded as types of kind ‘CTF_K_ARRAY’ in a ‘ctf_stype_t’.
The ‘ctt_size’ for arrays is zero. The variable-length data is a
‘ctf_array_t’: ‘vlen’ in the info word should be disregarded and is
always zero.

     typedef struct ctf_array
     {
       uint32_t cta_contents;
       uint32_t cta_index;
       uint32_t cta_nelems;
     } ctf_array_t;

Offset   Name                      Description
----------------------------------------------------------------------------------------
0x0      ‘uint32_t cta_contents’   The type of the array elements: a type ID.

0x4      ‘uint32_t cta_index’      The type of the array index: a type ID of an
                                   integral type. If this is a variable-length
                                   array, the index type ID will be 0 (but the
                                   actual index type of this array is probably
                                   ‘int’). Probably redundant and may be
                                   dropped in v4.

0x8      ‘uint32_t cta_nelems’     The number of array elements. 0 for VLAs,
                                   and also for the historical variety of VLA
                                   which has explicit zero dimensions (which
                                   will have a nonzero ‘cta_index’.)

The size of an array can be computed by simple multiplication of the
size of the ‘cta_contents’ type by the ‘cta_nelems’.


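The multiplication above is simple, but is best done in 64 bits so that
large arrays do not overflow. In this sketch the element size is assumed
to have been looked up by the caller already, and ‘ex_array_size’ is a
hypothetical name.

```c
#include <stdint.h>

typedef struct ctf_array
{
  uint32_t cta_contents;
  uint32_t cta_index;
  uint32_t cta_nelems;
} ctf_array_t;

/* ELEM_SIZE is the size in bytes of the cta_contents type, which the
   caller is assumed to have looked up already.  VLAs, with a
   cta_nelems of 0, come out with a size of 0.  */
uint64_t
ex_array_size (const ctf_array_t *arr, uint64_t elem_size)
{
  return elem_size * arr->cta_nelems;
}
```
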
File: ctf-spec.info, Node: Function pointers, Next: Enums, Prev: Arrays, Up: The type section

2.3.9 Function pointers
-----------------------

Function pointers are explicitly represented in the CTF type section by
a type of kind ‘CTF_K_FUNCTION’, always encoded with a ‘ctf_stype_t’.
The ‘ctt_type’ is the function return type ID. The ‘vlen’ in the info
word is the number of arguments, each of which is a type ID, a
‘uint32_t’: if the last argument is 0, this is a varargs function and
the number of arguments is one less than indicated by the vlen.

If the number of arguments is odd, a single ‘uint32_t’ of padding is
inserted to maintain alignment.


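The two rules above (the trailing 0 marking varargs, and the padding
for odd argument counts) might be applied as in the following sketch.
Both function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes of variable-length data following a CTF_K_FUNCTION
   record with VLEN argument slots, including the alignment padding.  */
size_t
ex_func_vlen_bytes (uint32_t vlen)
{
  return ((size_t) vlen + (vlen & 1)) * sizeof (uint32_t);
}

/* Return the number of real arguments in ARGS, and set *VARARGS to 1
   if the trailing slot is the 0 marker described above.  */
uint32_t
ex_func_argc (const uint32_t *args, uint32_t vlen, int *varargs)
{
  *varargs = (vlen > 0 && args[vlen - 1] == 0);
  return *varargs ? vlen - 1 : vlen;
}
```
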
File: ctf-spec.info, Node: Enums, Next: Structs and unions, Prev: Function pointers, Up: The type section

2.3.10 Enums
------------

Enumerated types are represented as types of kind ‘CTF_K_ENUM’ in a
‘ctf_stype_t’. The ‘ctt_size’ is always the size of an int from the
data model (enum bitfields are implemented via slices). The ‘vlen’ is a
count of enumerations, each of which is represented by a ‘ctf_enum_t’ in
the vlen:

     typedef struct ctf_enum
     {
       uint32_t cte_name;
       int32_t cte_value;
     } ctf_enum_t;

Offset   Name                   Description
------------------------------------------------------------------------
0x0      ‘uint32_t cte_name’    Strtab offset of the enumeration name.
                                Must not be 0.

0x4      ‘int32_t cte_value’    The enumeration value.

Enumeration values larger than 2^32 are not yet supported and are
omitted from the enumeration. (v4 will lift this restriction by
encoding the value differently.)

Forward declarations of enums are not implemented with this kind:
*note Forward declarations::.

Enumerated type names, as usual in C, go into their own namespace,
and do not conflict with non-enums, structs, or unions with the same
name.


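A consumer that wants the name of an enumerator might walk the
‘ctf_enum_t’ array as in the sketch below. The helper ‘ex_enum_name’ is
hypothetical, and for simplicity every ‘cte_name’ is assumed to be an
offset into a single in-memory string table.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ctf_enum
{
  uint32_t cte_name;
  int32_t cte_value;
} ctf_enum_t;

/* Return the name of the first enumerator with value VALUE, or NULL.
   STRTAB is assumed to be the one string table that every cte_name
   refers to.  */
const char *
ex_enum_name (const ctf_enum_t *ents, uint32_t vlen, const char *strtab,
              int32_t value)
{
  for (uint32_t i = 0; i < vlen; i++)
    if (ents[i].cte_value == value)
      return strtab + ents[i].cte_name;
  return NULL;
}
```
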
File: ctf-spec.info, Node: Structs and unions, Next: Forward declarations, Prev: Enums, Up: The type section

2.3.11 Structs and unions
-------------------------

Structures and unions are represented as types of kind ‘CTF_K_STRUCT’ and
‘CTF_K_UNION’: their representation is otherwise identical, and it is
perfectly allowed for “structs” to contain overlapping fields etc, so we
will treat them together for the rest of this section.

They fill out ‘ctt_size’, and use ‘ctf_type_t’ in preference to
‘ctf_stype_t’ if the structure size is greater than ‘CTF_MAX_SIZE’
(0xfffffffe).

The vlen for structures and unions is a count of structure fields,
but the type used to represent a structure field (and thus the size of
the variable-length array element representing the type) depends on the
size of the structure: truly huge structures, greater than
‘CTF_LSTRUCT_THRESH’ bytes in length, use a different type.
(‘CTF_LSTRUCT_THRESH’ is 536870912, so such structures are vanishingly
rare: in v4, this representation will change somewhat for greater
compactness. It’s inherited from v1, where the limits were much lower.)

Most structures can get away with using ‘ctf_member_t’:

     typedef struct ctf_member_v2
     {
       uint32_t ctm_name;
       uint32_t ctm_offset;
       uint32_t ctm_type;
     } ctf_member_t;

Huge structures that are represented by ‘ctf_type_t’ rather than
‘ctf_stype_t’ have to use ‘ctf_lmember_t’, which splits the offset as
‘ctf_type_t’ splits the size:

     typedef struct ctf_lmember_v2
     {
       uint32_t ctlm_name;
       uint32_t ctlm_offsethi;
       uint32_t ctlm_type;
       uint32_t ctlm_offsetlo;
     } ctf_lmember_t;

Here’s what the fields of ‘ctf_member’ mean:

Offset   Name                    Description
---------------------------------------------------------------------------------------------------------
0x00     ‘uint32_t ctm_name’     Strtab offset of the field name.

0x04     ‘uint32_t ctm_offset’   The offset of this field _in bits_. (Usually, for bitfields, this is
                                 machine-word-aligned and the individual field has an offset in bits,
                                 but the format allows for the offset to be encoded in bits here.)

0x08     ‘uint32_t ctm_type’     The type ID of the type of the field.

Here’s what the fields of the very similar ‘ctf_lmember’ mean:

Offset   Name                       Description
------------------------------------------------------------------------------------------------------------
0x00     ‘uint32_t ctlm_name’       Strtab offset of the field name.

0x04     ‘uint32_t ctlm_offsethi’   The high 32 bits of the offset of this field in bits.

0x08     ‘uint32_t ctlm_type’       The type ID of the type of the field.

0x0c     ‘uint32_t ctlm_offsetlo’   The low 32 bits of the offset of this field in bits.

Macros ‘CTF_LMEM_OFFSET’, ‘CTF_OFFSET_TO_LMEMHI’ and
‘CTF_OFFSET_TO_LMEMLO’ serve to extract and install the values of the
‘ctlm_offset’ fields, much as with the split size fields in
‘ctf_type_t’.

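The split offset can be reassembled and installed with plain shifts.
The sketch below uses invented function names rather than the macros
named above, and is only meant to show the arithmetic involved.

```c
#include <stdint.h>

typedef struct ctf_lmember_v2
{
  uint32_t ctlm_name;
  uint32_t ctlm_offsethi;
  uint32_t ctlm_type;
  uint32_t ctlm_offsetlo;
} ctf_lmember_t;

/* Reassemble the 64-bit offset in bits from its two halves.  */
uint64_t
ex_lmem_offset (const ctf_lmember_t *m)
{
  return ((uint64_t) m->ctlm_offsethi << 32) | m->ctlm_offsetlo;
}

/* Split OFFSET (in bits) across the two offset fields of M.  */
void
ex_lmem_set_offset (ctf_lmember_t *m, uint64_t offset)
{
  m->ctlm_offsethi = (uint32_t) (offset >> 32);
  m->ctlm_offsetlo = (uint32_t) offset;
}
```
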
Unnamed structure and union fields are simply implemented by
collapsing the unnamed field’s members into the containing structure or
union: this does mean that a structure containing an unnamed union can
end up being a “structure” with multiple members at the same offset. (A
future format revision may collapse ‘CTF_K_STRUCT’ and ‘CTF_K_UNION’
into the same kind and decide among them based on whether their members
do in fact overlap.)

Structure and union type names, as usual in C, go into their own
namespace, just as enum type names do.

Forward declarations of structures and unions are not implemented
with this kind: *note Forward declarations::.


File: ctf-spec.info, Node: Forward declarations, Prev: Structs and unions, Up: The type section

2.3.12 Forward declarations
---------------------------

When the compiler encounters a forward declaration of a struct, union,
or enum, it emits a type of kind ‘CTF_K_FORWARD’. If it later
encounters a non-forward declaration of the same thing, it marks the
forward as non-root-visible: before link time, therefore,
non-root-visible forwards indicate that a non-forward is coming.

After link time, forwards are fused with their corresponding
non-forwards by the deduplicator where possible. They are kept if there
is no non-forward definition (maybe it’s not visible from any TU at all)
or if ‘multiple’ conflicting structures with the same name might match
it. Otherwise, all other forwards are converted to structures, unions,
or enums as appropriate, even across TUs if only one structure could
correspond to the forward (after all, all types across all TUs land in
the same dictionary unless they conflict, so promoting forwards to their
concrete type seems most helpful).

A forward has a rather strange representation: it is encoded with a
‘ctf_stype_t’ but the ‘ctt_type’ is populated not with a type (if it’s a
forward, we don’t have an underlying type yet: if we did, we’d have
promoted it and this wouldn’t be a forward any more) but with the ‘kind’
of the forward. This means that we can distinguish forwards to structs,
enums and unions reliably and ensure they land in the appropriate
namespace even before the actual struct, union or enum is found.


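Since ‘ctt_type’ holds a kind rather than a type ID here, a consumer
can map it straight to a C keyword. This is only a sketch: the kind
values are those from the type kinds table, and ‘ex_forward_keyword’ is
a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Kind values as listed in the type kinds table.  */
enum { EX_K_STRUCT = 6, EX_K_UNION = 7, EX_K_ENUM = 8 };

/* CTT_TYPE is the ctt_type field of a CTF_K_FORWARD record.  Return
   the keyword of the thing being forward-declared, or NULL if the
   value is not a kind a forward can name.  */
const char *
ex_forward_keyword (uint32_t ctt_type)
{
  switch (ctt_type)
    {
    case EX_K_STRUCT: return "struct";
    case EX_K_UNION:  return "union";
    case EX_K_ENUM:   return "enum";
    default:          return NULL;
    }
}
```
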
File: ctf-spec.info, Node: The symtypetab sections, Next: The variable section, Prev: The type section, Up: CTF dictionaries

2.4 The symtypetab sections
===========================

These are two very simple sections with identical formats, used by
consumers to map from ELF function and data symbols directly to their
types. So they are usually populated only in CTF sections that are
embedded in ELF objects.

Their format is very simple: an array of type IDs. Which symbol each
type ID corresponds to depends on whether the optional _index section_
associated with this symtypetab section has any content.

If the index section is nonempty, it is an array of ‘uint32_t’ string
table offsets, each giving the name of the symbol whose type is at the
same offset in the corresponding non-index section: users can look up
symbols in such a table by name. The index section and corresponding
symtypetab section are usually ASCIIbetically sorted (indicated by the
‘CTF_F_IDXSORTED’ flag in the header): if they are sorted, the index
can be bsearched for a symbol name rather than having to use a slower
linear search.

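A lookup in a sorted index section might look like the sketch below. It
is an illustration only: all index entries are assumed to be offsets
into one in-memory string table, “ASCIIbetical” order is taken to mean
‘strcmp’ order, and ‘ex_symtypetab_lookup’ is an invented name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The string table is passed through a file-scope pointer because
   bsearch comparators take no context argument.  */
static const char *ex_strtab;

static int
ex_cmp_name (const void *key, const void *elem)
{
  return strcmp ((const char *) key, ex_strtab + *(const uint32_t *) elem);
}

/* Look up NAME in a sorted index section IDX of N entries and return
   the type ID at the same position in SYMTYPETAB, or 0 (no type) if
   NAME is absent.  */
uint32_t
ex_symtypetab_lookup (const uint32_t *idx, const uint32_t *symtypetab,
                      size_t n, const char *strtab, const char *name)
{
  const uint32_t *hit;

  ex_strtab = strtab;
  hit = bsearch (name, idx, n, sizeof (uint32_t), ex_cmp_name);
  return hit ? symtypetab[hit - idx] : 0;
}
```

If the index is not flagged as sorted, the same comparison can be used
in a linear scan instead.
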
If the data object index section is empty, the entries in the data
object and function info sections are associated 1:1 with ELF symbols of
type ‘STT_OBJECT’ (for data object) or ‘STT_FUNC’ (for function info)
with a nonzero value: the linker shuffles the symtypetab sections to
correspond with the order of the symbols in the ELF file. Symbols with
no name, undefined symbols and symbols named “‘_START_’” and “‘_END_’”
are skipped and never appear in either section. Symbols that have no
corresponding type are represented by type ID 0. The section may have
fewer entries than the symbol table, in which case no later entries have
associated types. This format is more compact than an indexed form if
most entries have types (since there is no need to record any symbol
names), but if the producer and consumer disagree even slightly about
which symbols are omitted, the types of all further symbols will be
wrong!

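The 1:1 association can be sketched as a walk over the symbol table
that counts only the symbols which get a slot. ‘struct ex_sym’ below is
a simplified stand-in for an ELF symbol, and both functions are
hypothetical: a real consumer would read these properties from the ELF
symbol table itself, and must apply exactly the same skipping rules as
the producer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A simplified stand-in for an ELF symbol.  */
struct ex_sym
{
  const char *name;
  int is_object;      /* Nonzero for STT_OBJECT.  */
  int is_defined;     /* Zero for undefined symbols.  */
  uint64_t value;
};

/* Nonzero if SYM gets a slot in the unindexed data object section.  */
int
ex_sym_has_slot (const struct ex_sym *sym)
{
  return sym->is_object && sym->is_defined && sym->value != 0
         && sym->name[0] != '\0'
         && strcmp (sym->name, "_START_") != 0
         && strcmp (sym->name, "_END_") != 0;
}

/* Return the type ID of symbol number SYMIDX, or 0 if it has none.
   SECTION is the data object section, NENTRIES entries long.  */
uint32_t
ex_data_object_type (const struct ex_sym *syms, size_t symidx,
                     const uint32_t *section, size_t nentries)
{
  size_t slot = 0;

  if (!ex_sym_has_slot (&syms[symidx]))
    return 0;
  for (size_t i = 0; i < symidx; i++)
    slot += ex_sym_has_slot (&syms[i]);
  return slot < nentries ? section[slot] : 0;
}
```
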
The compiler always emits indexed symtypetab tables, because there is
no symbol table yet. The linker will always have to read them all in
and always works through them from start to end, so there is no benefit
in having the compiler sort them either. The linker (actually,
‘libctf’’s linking machinery) will automatically sort unsorted indexed
sections, and convert indexed sections that contain a lot of pads into
the more compact, unindexed form.

If child dicts are in use, only symbols that use types actually
mentioned in the child appear in the child’s symtypetab: symbols that
use only types in the parent appear in the parent’s symtypetab instead.
So the child’s symtypetab will almost always be very sparse, and thus
will usually use the indexed form even in fully linked objects. (It is,
of course, impossible for symbols to exist that use types from multiple
child dicts at once, since it’s impossible to declare a function in C
that uses types that are only visible in two different, disjoint
translation units.)


File: ctf-spec.info, Node: The variable section, Next: The label section, Prev: The symtypetab sections, Up: CTF dictionaries

2.5 The variable section
========================

The variable section is a simple array mapping names (strtab entries) to
type IDs, intended to provide a replacement for the data object section
in dynamic situations in which there is no static ELF strtab but the
consumer instead hands back names. The section is sorted into
ASCIIbetical order by name for rapid lookup, like the CTF archive name
table.

The section is an array of these structures:

     typedef struct ctf_varent
     {
       uint32_t ctv_name;
       uint32_t ctv_type;
     } ctf_varent_t;

Offset   Name                  Description
-----------------------------------------------------------
0x00     ‘uint32_t ctv_name’   Strtab offset of the name

0x04     ‘uint32_t ctv_type’   Type ID of this type

There is no analogue of the function info section yet: v4 will
probably drop this section in favour of a way to put both indexed (thus,
named) and nonindexed symbols into the symtypetab sections at the same
time.


File: ctf-spec.info, Node: The label section, Next: The string section, Prev: The variable section, Up: CTF dictionaries

2.6 The label section
=====================

The label section is a currently-unused facility allowing the tiling of
the type space with names taken from the strtab. The section is an
array of these structures:

     typedef struct ctf_lblent
     {
       uint32_t ctl_label;
       uint32_t ctl_type;
     } ctf_lblent_t;

Offset   Name                   Description
-------------------------------------------------------------
0x00     ‘uint32_t ctl_label’   Strtab offset of the label

0x04     ‘uint32_t ctl_type’    Type ID of the last type
                                covered by this label

Semantics will be attached to labels soon, probably in v4 (the plan
is to use them to allow multiple disjoint namespaces in a single CTF
file, removing many uses of CTF archives, in particular in the ‘.ctf’
section in ELF objects).


File: ctf-spec.info, Node: The string section, Next: Data models, Prev: The label section, Up: CTF dictionaries

2.7 The string section
======================

This section is a simple ELF-format strtab, starting with a zero byte
(thus ensuring that the string with offset 0 is the null string, as
assumed elsewhere in this spec). The strtab is usually ASCIIbetically
sorted to somewhat improve compression efficiency.

Where the strtab is unusual is the _references_ to it. CTF has two
string tables, the internal strtab and an external strtab associated
with the CTF dictionary at open time: usually, this is the ELF dynamic
strtab (‘.dynstr’) of a CTF dictionary embedded in an ELF file. We
distinguish between these strtabs by the most significant bit, bit 31,
of the 32-bit strtab references: if it is 0, the offset is in the
internal strtab: if 1, the offset is in the external strtab.

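Decoding a strtab reference therefore comes down to testing bit 31 and
masking it off. The helpers below are invented for this sketch, which
assumes both tables are already in memory and that the caller passes
NULL when no external strtab was supplied.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if REF points into the external strtab (bit 31 set).  */
int
ex_name_is_external (uint32_t ref)
{
  return (ref >> 31) != 0;
}

/* The offset within whichever strtab REF selects.  */
uint32_t
ex_name_offset (uint32_t ref)
{
  return ref & 0x7fffffffu;
}

/* Resolve REF against the two tables.  EXTERNAL may be NULL, in which
   case references to the external strtab yield NULL.  */
const char *
ex_name_lookup (uint32_t ref, const char *internal, const char *external)
{
  const char *tab = ex_name_is_external (ref) ? external : internal;

  return tab ? tab + ex_name_offset (ref) : NULL;
}
```
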
There is a bug workaround in this area: in format v3 (the first
version to have working support for external strtabs), the external
strtab is ‘.strtab’ unless the ‘CTF_F_DYNSTR’ flag is set on the
dictionary (*note CTF file-wide flags::). Format v4 will introduce a
header field that explicitly names the external strtab, making this flag
unnecessary.


File: ctf-spec.info, Node: Data models, Next: Limits of CTF, Prev: The string section, Up: CTF dictionaries

2.8 Data models
===============

The data model is a simple integer which indicates the ABI in use on
this platform. Right now, it is very simple, distinguishing only
between 32- and 64-bit types: a model of 1 indicates ILP32, 2 indicates
LP64. The mapping from ABI integer to type sizes is hardwired into
‘libctf’: currently, we use this to hardwire the size of pointers,
function pointers, and enumerated types.

This is a very kludgy corner of CTF and will probably be replaced
with explicit header fields to record this sort of thing in future.


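A hardwired mapping of this sort might look like the following sketch.
The constant and function names are invented for this example: the
pointer sizes are simply those implied by the names ILP32 and LP64.

```c
#include <stddef.h>

/* Model numbers as given in the text: 1 is ILP32, 2 is LP64.  */
enum { EX_MODEL_ILP32 = 1, EX_MODEL_LP64 = 2 };

/* Pointer size in bytes for MODEL, or 0 if the model is unknown.  */
size_t
ex_model_pointer_size (int model)
{
  switch (model)
    {
    case EX_MODEL_ILP32: return 4;
    case EX_MODEL_LP64:  return 8;
    default:             return 0;
    }
}
```
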
File: ctf-spec.info, Node: Limits of CTF, Prev: Data models, Up: CTF dictionaries

2.9 Limits of CTF
=================

The following limits are imposed by various aspects of CTF version 3:

‘CTF_MAX_TYPE’
     Maximum type identifier (maximum number of types accessible with
     parent and child containers in use): 0xfffffffe
‘CTF_MAX_PTYPE’
     Maximum type identifier in a parent dictionary: maximum number of
     types in any one dictionary: 0x7fffffff
‘CTF_MAX_NAME’
     Maximum offset into a string table: 0x7fffffff
‘CTF_MAX_VLEN’
     Maximum number of members in a struct, union, or enum: maximum
     number of function args: 0xffffff
‘CTF_MAX_SIZE’
     Maximum size of a ‘ctf_stype_t’ in bytes before we fall back to
     ‘ctf_type_t’: 0xfffffffe bytes

Other maxima without associated macros:
   • Maximum value of an enumerated type: 2^32
   • Maximum size of an array element: 2^32

These maxima are generally considered to be too low, because C
programs can and do exceed them: they will be lifted in format v4.


File: ctf-spec.info, Node: Index, Prev: CTF dictionaries, Up: Top

Index
*****
