Skip to content

Commit

Permalink
[clang][CodeGen] Zero init unspecified fields in initializers in C (l…
Browse files Browse the repository at this point in the history
…lvm#97121)

When an initializer is provided to a variable, the Linux kernel relied
on the compiler to zero-initialize unspecified fields, as clarified in
https://www.spinics.net/lists/netdev/msg1007244.html.

But clang doesn't guarantee this:
1. For a union type, if an empty initializer is given, clang only
   initializes bytes for the first field, left bytes for other (larger)
   fields are marked as undef. Accessing those undef bytes can lead
   to undefined behaviors.
2. For a union type, if an initializer explicitly sets a field, left
   bytes for other (larger) fields are marked as undef.
3. When an initializer is given, clang doesn't zero initialize padding.

So this patch makes the following change:
1. In C, when an initializer is provided for a variable, zero-initialize
   undef and padding fields in the initializer.
2. Document the change in LanguageExtensions.rst.

As suggested in
llvm#78034 (comment),
the change isn't required by C23, but it's standards conforming to do
so.

Fixes: llvm#97459
  • Loading branch information
yabinc committed Sep 25, 2024
1 parent 0a42c7c commit 7a086e1
Show file tree
Hide file tree
Showing 23 changed files with 658 additions and 96 deletions.
23 changes: 23 additions & 0 deletions clang/docs/LanguageExtensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5860,3 +5860,26 @@ specify the starting offset to begin embedding from. The resources is treated
as being empty if the specified offset is larger than the number of bytes in
the resource. The offset will be applied *before* any ``limit`` parameters are
applied.
Union and aggregate initialization in C
=======================================
In C23 (N2900), when an object is initialized from initializer ``= {}``, all
elements of arrays, all members of structs, and the first members of unions are
empty-initialized recursively. In addition, all padding bits are initialized to
zero.
Clang guarantees the following behaviors:
* ``1:`` Clang supports initializer ``= {}`` mentioned above in all C
standards.
* ``2:`` When unions are initialized from initializer ``= {}``, bytes outside
of the first members of unions are also initialized to zero.
* ``3:`` When unions, structures and arrays are initialized from initializer
``= { initializer-list }``, all members not explicitly initialized in
the initializer list are empty-initialized recursively. In addition, all
padding bits are initialized to zero.
Currently, the above extension only applies to C source code, not C++.
40 changes: 38 additions & 2 deletions clang/lib/CodeGen/CGExprAgg.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1698,6 +1698,17 @@ void AggExprEmitter::VisitCXXParenListOrInitListExpr(
// Prepare a 'this' for CXXDefaultInitExprs.
CodeGenFunction::FieldConstructionScope FCS(CGF, Dest.getAddress());

const bool ZeroInitPadding =
CGF.CGM.shouldZeroInitPadding() && !Dest.isZeroed();
const Address BaseLoc = Dest.getAddress().withElementType(CGF.Int8Ty);
auto DoZeroInitPadding = [&](CharUnits Offset, CharUnits Size) {
if (Size.isPositive()) {
Address Loc = CGF.Builder.CreateConstGEP(BaseLoc, Offset.getQuantity());
llvm::Constant *SizeVal = CGF.Builder.getInt64(Size.getQuantity());
CGF.Builder.CreateMemSet(Loc, CGF.Builder.getInt8(0), SizeVal, false);
}
};

if (record->isUnion()) {
// Only initialize one field of a union. The field itself is
// specified by the initializer list.
Expand All @@ -1722,17 +1733,37 @@ void AggExprEmitter::VisitCXXParenListOrInitListExpr(
if (NumInitElements) {
// Store the initializer into the field
EmitInitializationToLValue(InitExprs[0], FieldLoc);
if (ZeroInitPadding) {
CharUnits TotalSize =
Dest.getPreferredSize(CGF.getContext(), DestLV.getType());
CharUnits FieldSize =
CGF.getContext().getTypeSizeInChars(FieldLoc.getType());
DoZeroInitPadding(FieldSize, TotalSize - FieldSize);
}
} else {
// Default-initialize to null.
EmitNullInitializationToLValue(FieldLoc);
if (ZeroInitPadding)
EmitNullInitializationToLValue(DestLV);
else
EmitNullInitializationToLValue(FieldLoc);
}

return;
}

// Here we iterate over the fields; this makes it simpler to both
// default-initialize fields and skip over unnamed fields.
const ASTRecordLayout &Layout = CGF.getContext().getASTRecordLayout(record);
CharUnits SizeSoFar = CharUnits::Zero();
for (const auto *field : record->fields()) {
if (ZeroInitPadding) {
unsigned FieldNo = field->getFieldIndex();
CharUnits Offset =
CGF.getContext().toCharUnitsFromBits(Layout.getFieldOffset(FieldNo));
DoZeroInitPadding(SizeSoFar, Offset - SizeSoFar);
CharUnits FieldSize =
CGF.getContext().getTypeSizeInChars(field->getType());
SizeSoFar = Offset + FieldSize;
}
// We're done once we hit the flexible array member.
if (field->getType()->isIncompleteArrayType())
break;
Expand Down Expand Up @@ -1774,6 +1805,11 @@ void AggExprEmitter::VisitCXXParenListOrInitListExpr(
}
}
}
if (ZeroInitPadding) {
CharUnits TotalSize =
Dest.getPreferredSize(CGF.getContext(), DestLV.getType());
DoZeroInitPadding(SizeSoFar, TotalSize - SizeSoFar);
}
}

void AggExprEmitter::VisitArrayInitLoopExpr(const ArrayInitLoopExpr *E,
Expand Down
93 changes: 82 additions & 11 deletions clang/lib/CodeGen/CGExprConstant.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,16 @@ using namespace CodeGen;
namespace {
class ConstExprEmitter;

llvm::Constant *getPadding(const CodeGenModule &CGM, CharUnits PadSize) {
llvm::Type *Ty = CGM.CharTy;
if (PadSize > CharUnits::One())
Ty = llvm::ArrayType::get(Ty, PadSize.getQuantity());
if (CGM.shouldZeroInitPadding()) {
return llvm::Constant::getNullValue(Ty);
}
return llvm::UndefValue::get(Ty);
}

struct ConstantAggregateBuilderUtils {
CodeGenModule &CGM;

Expand All @@ -61,10 +71,7 @@ struct ConstantAggregateBuilderUtils {
}

llvm::Constant *getPadding(CharUnits PadSize) const {
llvm::Type *Ty = CGM.CharTy;
if (PadSize > CharUnits::One())
Ty = llvm::ArrayType::get(Ty, PadSize.getQuantity());
return llvm::UndefValue::get(Ty);
return ::getPadding(CGM, PadSize);
}

llvm::Constant *getZeroes(CharUnits ZeroSize) const {
Expand Down Expand Up @@ -591,6 +598,11 @@ class ConstStructBuilder {
bool Build(const InitListExpr *ILE, bool AllowOverwrite);
bool Build(const APValue &Val, const RecordDecl *RD, bool IsPrimaryBase,
const CXXRecordDecl *VTableClass, CharUnits BaseOffset);
bool DoZeroInitPadding(const ASTRecordLayout &Layout, unsigned FieldNo,
const FieldDecl &Field, bool AllowOverwrite,
CharUnits &FieldSize, CharUnits &SizeSoFar);
bool DoZeroInitPadding(const ASTRecordLayout &Layout, bool AllowOverwrite,
CharUnits &SizeSoFar);
llvm::Constant *Finalize(QualType Ty);
};

Expand Down Expand Up @@ -715,6 +727,10 @@ bool ConstStructBuilder::Build(const InitListExpr *ILE, bool AllowOverwrite) {
if (CXXRD->getNumBases())
return false;

const bool ZeroInitPadding = CGM.shouldZeroInitPadding();
CharUnits FieldSize = CharUnits::Zero();
CharUnits SizeSoFar = CharUnits::Zero();

for (FieldDecl *Field : RD->fields()) {
++FieldNo;

Expand All @@ -732,8 +748,13 @@ bool ConstStructBuilder::Build(const InitListExpr *ILE, bool AllowOverwrite) {
const Expr *Init = nullptr;
if (ElementNo < ILE->getNumInits())
Init = ILE->getInit(ElementNo++);
if (isa_and_nonnull<NoInitExpr>(Init))
if (isa_and_nonnull<NoInitExpr>(Init)) {
if (ZeroInitPadding &&
!DoZeroInitPadding(Layout, FieldNo, *Field, AllowOverwrite, FieldSize,
SizeSoFar))
return false;
continue;
}

// Zero-sized fields are not emitted, but their initializers may still
// prevent emission of this struct as a constant.
Expand All @@ -743,6 +764,11 @@ bool ConstStructBuilder::Build(const InitListExpr *ILE, bool AllowOverwrite) {
continue;
}

if (ZeroInitPadding &&
!DoZeroInitPadding(Layout, FieldNo, *Field, AllowOverwrite, FieldSize,
SizeSoFar))
return false;

// When emitting a DesignatedInitUpdateExpr, a nested InitListExpr
// represents additional overwriting of our current constant value, and not
// a new constant to emit independently.
Expand All @@ -768,6 +794,10 @@ bool ConstStructBuilder::Build(const InitListExpr *ILE, bool AllowOverwrite) {
if (!EltInit)
return false;

if (ZeroInitPadding && FieldSize.isZero())
SizeSoFar += CharUnits::fromQuantity(
CGM.getDataLayout().getTypeAllocSize(EltInit->getType()));

if (!Field->isBitField()) {
// Handle non-bitfield members.
if (!AppendField(Field, Layout.getFieldOffset(FieldNo), EltInit,
Expand All @@ -785,6 +815,9 @@ bool ConstStructBuilder::Build(const InitListExpr *ILE, bool AllowOverwrite) {
}
}

if (ZeroInitPadding && !DoZeroInitPadding(Layout, AllowOverwrite, SizeSoFar))
return false;

return true;
}

Expand Down Expand Up @@ -849,6 +882,9 @@ bool ConstStructBuilder::Build(const APValue &Val, const RecordDecl *RD,

unsigned FieldNo = 0;
uint64_t OffsetBits = CGM.getContext().toBits(Offset);
const bool ZeroInitPadding = CGM.shouldZeroInitPadding();
CharUnits FieldSize = CharUnits::Zero();
CharUnits SizeSoFar = CharUnits::Zero();

bool AllowOverwrite = false;
for (RecordDecl::field_iterator Field = RD->field_begin(),
Expand All @@ -870,6 +906,15 @@ bool ConstStructBuilder::Build(const APValue &Val, const RecordDecl *RD,
if (!EltInit)
return false;

if (ZeroInitPadding) {
if (!DoZeroInitPadding(Layout, FieldNo, **Field, AllowOverwrite,
FieldSize, SizeSoFar))
return false;
if (FieldSize.isZero())
SizeSoFar += CharUnits::fromQuantity(
CGM.getDataLayout().getTypeAllocSize(EltInit->getType()));
}

if (!Field->isBitField()) {
// Handle non-bitfield members.
if (!AppendField(*Field, Layout.getFieldOffset(FieldNo) + OffsetBits,
Expand All @@ -886,7 +931,35 @@ bool ConstStructBuilder::Build(const APValue &Val, const RecordDecl *RD,
return false;
}
}
if (ZeroInitPadding && !DoZeroInitPadding(Layout, AllowOverwrite, SizeSoFar))
return false;

return true;
}

bool ConstStructBuilder::DoZeroInitPadding(
const ASTRecordLayout &Layout, unsigned FieldNo, const FieldDecl &Field,
bool AllowOverwrite, CharUnits &FieldSize, CharUnits &SizeSoFar) {
CharUnits Offset =
CGM.getContext().toCharUnitsFromBits(Layout.getFieldOffset(FieldNo));
if (SizeSoFar < Offset)
if (!AppendBytes(SizeSoFar, getPadding(CGM, Offset - SizeSoFar),
AllowOverwrite))
return false;
FieldSize = CGM.getContext().getTypeSizeInChars(Field.getType());
SizeSoFar = Offset + FieldSize;
return true;
}

bool ConstStructBuilder::DoZeroInitPadding(const ASTRecordLayout &Layout,
bool AllowOverwrite,
CharUnits &SizeSoFar) {
CharUnits TotalSize = Layout.getSize();
if (SizeSoFar < TotalSize)
if (!AppendBytes(SizeSoFar, getPadding(CGM, TotalSize - SizeSoFar),
AllowOverwrite))
return false;
SizeSoFar = TotalSize;
return true;
}

Expand Down Expand Up @@ -1127,12 +1200,10 @@ class ConstExprEmitter

assert(CurSize <= TotalSize && "Union size mismatch!");
if (unsigned NumPadBytes = TotalSize - CurSize) {
llvm::Type *Ty = CGM.CharTy;
if (NumPadBytes > 1)
Ty = llvm::ArrayType::get(Ty, NumPadBytes);

Elts.push_back(llvm::UndefValue::get(Ty));
Types.push_back(Ty);
llvm::Constant *Padding =
getPadding(CGM, CharUnits::fromQuantity(NumPadBytes));
Elts.push_back(Padding);
Types.push_back(Padding->getType());
}

llvm::StructType *STy = llvm::StructType::get(VMContext, Types, false);
Expand Down
51 changes: 51 additions & 0 deletions clang/lib/CodeGen/CodeGenModule.h
Original file line number Diff line number Diff line change
Expand Up @@ -1676,6 +1676,57 @@ class CodeGenModule : public CodeGenTypeCache {
MustTailCallUndefinedGlobals.insert(Global);
}

bool shouldZeroInitPadding() const {
// In C23 (N3096) $6.7.10:
// """
// If any object is initialized with an empty iniitializer, then it is
// subject to default initialization:
// - if it is an aggregate, every member is initialized (recursively)
// according to these rules, and any padding is initialized to zero bits;
// - if it is a union, the first named member is initialized (recursively)
// according to these rules, and any padding is initialized to zero bits.
//
// If the aggregate or union contains elements or members that are
// aggregates or unions, these rules apply recursively to the subaggregates
// or contained unions.
//
// If there are fewer initializers in a brace-enclosed list than there are
// elements or members of an aggregate, or fewer characters in a string
// literal used to initialize an array of known size than there are elements
// in the array, the remainder of the aggregate is subject to default
// initialization.
// """
//
// From my understanding, the standard is ambiguous in the following two
// areas:
// 1. For a union type with empty initializer, if the first named member is
// not the largest member, then the bytes comes after the first named member
// but before padding are left unspecified. An example is:
// union U { int a; long long b;};
// union U u = {}; // The first 4 bytes are 0, but 4-8 bytes are left
// unspecified.
//
// 2. It only mentions padding for empty initializer, but doesn't mention
// padding for a non empty initialization list. And if the aggregation or
// union contains elements or members that are aggregates or unions, and
// some are non empty initializers, while others are empty initiailizers,
// the padding initialization is unclear. An example is:
// struct S1 { int a; long long b; };
// struct S2 { char c; struct S1 s1; };
// // The values for paddings between s2.c and s2.s1.a, between s2.s1.a
// and s2.s1.b are unclear.
// struct S2 s2 = { 'c' };
//
// Here we choose to zero initiailize left bytes of a union type. Because
// projects like the Linux kernel are relying on this behavior. If we don't
// explicitly zero initialize them, the undef values can be optimized to
// return gabage data. We also choose to zero initialize paddings for
// aggregates and unions, no matter they are initialized by empty
// initializers or non empty initializers. This can provide a consistent
// behavior. So projects like the Linux kernel can rely on it.
return !getLangOpts().CPlusPlus;
}

private:
bool shouldDropDLLAttribute(const Decl *D, const llvm::GlobalValue *GV) const;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,4 @@ struct et7 {
52,
};

// CHECK: @yv7 ={{.*}} global %struct.et7 { [0 x float] zeroinitializer, i8 52 }
// CHECK: @yv7 ={{.*}} global { [0 x float], i8, [3 x i8] } { [0 x float] zeroinitializer, i8 52, [3 x i8] zeroinitializer }
4 changes: 2 additions & 2 deletions clang/test/CodeGen/2008-08-07-AlignPadding1.c
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,9 @@ struct gc_generation {

#define GEN_HEAD(n) (&generations[n].head)

// The idea is that there are 6 undefs in this structure initializer to cover
// The idea is that there are 6 zeroinitializers in this structure initializer to cover
// the padding between elements.
// CHECK: @generations ={{.*}} global [3 x %struct.gc_generation] [%struct.gc_generation { %union._gc_head { %struct.anon { ptr @generations, ptr @generations, i64 0 }, [8 x i8] undef }, i32 700, i32 0, [8 x i8] undef }, %struct.gc_generation { %union._gc_head { %struct.anon { ptr getelementptr (i8, ptr @generations, i64 48), ptr getelementptr (i8, ptr @generations, i64 48), i64 0 }, [8 x i8] undef }, i32 10, i32 0, [8 x i8] undef }, %struct.gc_generation { %union._gc_head { %struct.anon { ptr getelementptr (i8, ptr @generations, i64 96), ptr getelementptr (i8, ptr @generations, i64 96), i64 0 }, [8 x i8] undef }, i32 10, i32 0, [8 x i8] undef }]
// CHECK: @generations ={{.*}} global [3 x %struct.gc_generation] [%struct.gc_generation { %union._gc_head { %struct.anon { ptr @generations, ptr @generations, i64 0 }, [8 x i8] zeroinitializer }, i32 700, i32 0, [8 x i8] zeroinitializer }, %struct.gc_generation { %union._gc_head { %struct.anon { ptr getelementptr (i8, ptr @generations, i64 48), ptr getelementptr (i8, ptr @generations, i64 48), i64 0 }, [8 x i8] zeroinitializer }, i32 10, i32 0, [8 x i8] zeroinitializer }, %struct.gc_generation { %union._gc_head { %struct.anon { ptr getelementptr (i8, ptr @generations, i64 96), ptr getelementptr (i8, ptr @generations, i64 96), i64 0 }, [8 x i8] zeroinitializer }, i32 10, i32 0, [8 x i8] zeroinitializer }]
/* linked lists of container objects */
struct gc_generation generations[3] = {
/* PyGC_Head, threshold, count */
Expand Down
4 changes: 2 additions & 2 deletions clang/test/CodeGen/2009-06-14-anonymous-union-init.c
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ struct sysfs_dirent {
};
struct sysfs_dirent sysfs_root = { {}, 16877 };

// CHECK: @sysfs_root = {{.*}}global %struct.sysfs_dirent { %union.anon zeroinitializer, i16 16877 }
// CHECK: @sysfs_root = {{.*}}global { %union.anon, i16, [2 x i8] } { %union.anon zeroinitializer, i16 16877, [2 x i8] zeroinitializer }

struct Foo {
union { struct empty {} x; };
Expand All @@ -16,4 +16,4 @@ struct Foo {
struct Foo foo = { {}, 16877 };

// EMPTY: @foo = {{.*}}global %struct.Foo { i16 16877 }
// EMPTY-MSVC: @foo = {{.*}}global %struct.Foo { [4 x i8] undef, i16 16877 }
// EMPTY-MSVC: @foo = {{.*}}global %struct.Foo { [4 x i8] zeroinitializer, i16 16877 }
Loading

0 comments on commit 7a086e1

Please sign in to comment.