Skip to content

Commit

Permalink
[clang][modules] Allow including module maps to be non-affecting (llv…
Browse files Browse the repository at this point in the history
…m#89992)

The dependency scanner only puts top-level affecting module map files on
the command line for explicitly building a module. This is done because
any affecting child module map files should be referenced by the
top-level one, meaning listing them explicitly does not have any meaning
and only makes the command lines longer.

However, a problem arises whenever the definition of an affecting module
lives in a module map that is not top-level. Considering the rules
explained above, such module map file would not make it to the command
line. That's why 83973cf started
marking the parents of an affecting module map file as affecting too.
This way, the top-level file does make it into the command line.

This can be problematic, though. On macOS, for example, the Darwin
module lives in "/usr/include/Darwin.modulemap" one of many module map
files included by "/usr/include/module.modulemap". Reporting the parent
on the command line forces explicit builds to parse all the other module
map files included by it, which is not necessary and can get expensive
in terms of file system traffic.

This patch solves that performance issue by stopping marking parent
module map files as affecting, and marking module map files as top-level
whenever they are top-level among the set of affecting files, not among
the set of all known files. This means that the top-level
"/usr/include/module.modulemap" is now not marked as affecting and
"/usr/include/Darwin.modulemap" is.
  • Loading branch information
jansvoboda11 authored May 1, 2024
1 parent 754072e commit 477c705
Show file tree
Hide file tree
Showing 3 changed files with 75 additions and 44 deletions.
9 changes: 9 additions & 0 deletions clang/include/clang/Serialization/ASTWriter.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,10 @@ class StoredDeclsList;
class SwitchCase;
class Token;

namespace SrcMgr {
class FileInfo;
} // namespace SrcMgr

/// Writes an AST file containing the contents of a translation unit.
///
/// The ASTWriter class produces a bitstream containing the serialized
Expand Down Expand Up @@ -490,6 +494,11 @@ class ASTWriter : public ASTDeserializationListener,
/// during \c SourceManager serialization.
void computeNonAffectingInputFiles();

/// Some affecting files can be included from files that are not affecting.
/// This function erases source locations pointing into such files.
SourceLocation getAffectingIncludeLoc(const SourceManager &SourceMgr,
const SrcMgr::FileInfo &File);

/// Returns an adjusted \c FileID, accounting for any non-affecting input
/// files.
FileID getAdjustedFileID(FileID FID) const;
Expand Down
90 changes: 58 additions & 32 deletions clang/lib/Serialization/ASTWriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -173,54 +173,50 @@ GetAffectingModuleMaps(const Preprocessor &PP, Module *RootModule) {

const HeaderSearch &HS = PP.getHeaderSearchInfo();
const ModuleMap &MM = HS.getModuleMap();
const SourceManager &SourceMgr = PP.getSourceManager();

std::set<const FileEntry *> ModuleMaps;
auto CollectIncludingModuleMaps = [&](FileID FID, FileEntryRef F) {
if (!ModuleMaps.insert(F).second)
std::set<const Module *> ProcessedModules;
auto CollectModuleMapsForHierarchy = [&](const Module *M) {
M = M->getTopLevelModule();

if (!ProcessedModules.insert(M).second)
return;
SourceLocation Loc = SourceMgr.getIncludeLoc(FID);
// The include location of inferred module maps can point into the header
// file that triggered the inferring. Cut off the walk if that's the case.
while (Loc.isValid() && isModuleMap(SourceMgr.getFileCharacteristic(Loc))) {
FID = SourceMgr.getFileID(Loc);
F = *SourceMgr.getFileEntryRefForID(FID);
if (!ModuleMaps.insert(F).second)
break;
Loc = SourceMgr.getIncludeLoc(FID);
}
};

std::set<const Module *> ProcessedModules;
auto CollectIncludingMapsFromAncestors = [&](const Module *M) {
for (const Module *Mod = M; Mod; Mod = Mod->Parent) {
if (!ProcessedModules.insert(Mod).second)
break;
std::queue<const Module *> Q;
Q.push(M);
while (!Q.empty()) {
const Module *Mod = Q.front();
Q.pop();

// The containing module map is affecting, because it's being pointed
// into by Module::DefinitionLoc.
if (FileID FID = MM.getContainingModuleMapFileID(Mod); FID.isValid())
CollectIncludingModuleMaps(FID, *SourceMgr.getFileEntryRefForID(FID));
// For inferred modules, the module map that allowed inferring is not in
// the include chain of the virtual containing module map file. It did
// affect the compilation, though.
if (FileID FID = MM.getModuleMapFileIDForUniquing(Mod); FID.isValid())
CollectIncludingModuleMaps(FID, *SourceMgr.getFileEntryRefForID(FID));
if (auto FE = MM.getContainingModuleMapFile(Mod))
ModuleMaps.insert(*FE);
// For inferred modules, the module map that allowed inferring is not
// related to the virtual containing module map file. It did affect the
// compilation, though.
if (auto FE = MM.getModuleMapFileForUniquing(Mod))
ModuleMaps.insert(*FE);

for (auto *SubM : Mod->submodules())
Q.push(SubM);
}
};

// Handle all the affecting modules referenced from the root module.

CollectModuleMapsForHierarchy(RootModule);

std::queue<const Module *> Q;
Q.push(RootModule);
while (!Q.empty()) {
const Module *CurrentModule = Q.front();
Q.pop();

CollectIncludingMapsFromAncestors(CurrentModule);
for (const Module *ImportedModule : CurrentModule->Imports)
CollectIncludingMapsFromAncestors(ImportedModule);
CollectModuleMapsForHierarchy(ImportedModule);
for (const Module *UndeclaredModule : CurrentModule->UndeclaredUses)
CollectIncludingMapsFromAncestors(UndeclaredModule);
CollectModuleMapsForHierarchy(UndeclaredModule);

for (auto *M : CurrentModule->submodules())
Q.push(M);
Expand Down Expand Up @@ -249,9 +245,27 @@ GetAffectingModuleMaps(const Preprocessor &PP, Module *RootModule) {

for (const auto &KH : HS.findResolvedModulesForHeader(*File))
if (const Module *M = KH.getModule())
CollectIncludingMapsFromAncestors(M);
CollectModuleMapsForHierarchy(M);
}

// FIXME: This algorithm is not correct for module map hierarchies where
// module map file defining a (sub)module of a top-level module X includes
// a module map file that defines a (sub)module of another top-level module Y.
// Whenever X is affecting and Y is not, "replaying" this PCM file will fail
// when parsing module map files for X due to not knowing about the `extern`
// module map for Y.
//
// We don't have a good way to fix it here. We could mark all children of
// affecting module map files as being affecting as well, but that's
// expensive. SourceManager does not model the edge from parent to child
// SLocEntries, so instead, we would need to iterate over leaf module map
// files, walk up their include hierarchy and check whether we arrive at an
// affecting module map.
//
// Instead of complicating and slowing down this function, we should probably
// just ban module map hierarchies where module map defining a (sub)module X
// includes a module map defining a module that's not a submodule of X.

return ModuleMaps;
}

Expand Down Expand Up @@ -1665,6 +1679,18 @@ struct InputFileEntry {

} // namespace

SourceLocation ASTWriter::getAffectingIncludeLoc(const SourceManager &SourceMgr,
const SrcMgr::FileInfo &File) {
SourceLocation IncludeLoc = File.getIncludeLoc();
if (IncludeLoc.isValid()) {
FileID IncludeFID = SourceMgr.getFileID(IncludeLoc);
assert(IncludeFID.isValid() && "IncludeLoc in invalid file");
if (!IsSLocAffecting[IncludeFID.ID])
IncludeLoc = SourceLocation();
}
return IncludeLoc;
}

void ASTWriter::WriteInputFiles(SourceManager &SourceMgr,
HeaderSearchOptions &HSOpts) {
using namespace llvm;
Expand Down Expand Up @@ -1718,7 +1744,7 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr,
Entry.IsSystemFile = isSystem(File.getFileCharacteristic());
Entry.IsTransient = Cache->IsTransient;
Entry.BufferOverridden = Cache->BufferOverridden;
Entry.IsTopLevel = File.getIncludeLoc().isInvalid();
Entry.IsTopLevel = getAffectingIncludeLoc(SourceMgr, File).isInvalid();
Entry.IsModuleMap = isModuleMap(File.getFileCharacteristic());

auto ContentHash = hash_code(-1);
Expand Down Expand Up @@ -2245,7 +2271,7 @@ void ASTWriter::WriteSourceManagerBlock(SourceManager &SourceMgr,
SLocEntryOffsets.push_back(Offset);
// Starting offset of this entry within this module, so skip the dummy.
Record.push_back(getAdjustedOffset(SLoc->getOffset()) - 2);
AddSourceLocation(File.getIncludeLoc(), Record);
AddSourceLocation(getAffectingIncludeLoc(SourceMgr, File), Record);
Record.push_back(File.getFileCharacteristic()); // FIXME: stable encoding
Record.push_back(File.hasLineDirectives());

Expand Down
20 changes: 8 additions & 12 deletions clang/test/ClangScanDeps/modules-extern-unrelated.m
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
// This test checks that only module map files defining affecting modules are
// affecting.

// RUN: rm -rf %t
// RUN: split-file %s %t

Expand All @@ -22,15 +25,8 @@
//--- second/second.h
#include "first_other.h"

//--- cdb.json.template
[{
"directory": "DIR",
"file": "DIR/tu.m",
"command": "clang -fmodules -fmodules-cache-path=DIR/cache -I DIR/zeroth -I DIR/first -I DIR/second -c DIR/tu.m -o DIR/tu.o"
}]

// RUN: sed -e "s|DIR|%/t|g" -e "s|INPUTS|%/S/Inputs|g" %t/cdb.json.template > %t/cdb.json
// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/result.json
// RUN: clang-scan-deps -format experimental-full -o %t/result.json \
// RUN: -- %clang -fmodules -fmodules-cache-path=%t/cache -I %t/zeroth -I %t/first -I %t/second -c %t/tu.m -o %t/tu.o
// RUN: cat %t/result.json | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t

// CHECK: {
Expand Down Expand Up @@ -67,11 +63,11 @@
// CHECK-NEXT: ],
// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/second/second.modulemap",
// CHECK-NEXT: "command-line": [
// CHECK-NOT: "-fmodule-map-file=[[PREFIX]]/second/module.modulemap"
// CHECK: ],
// CHECK-NEXT: "context-hash": "{{.*}}",
// CHECK-NEXT: "file-deps": [
// CHECK-NEXT: "[[PREFIX]]/first/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/second.h",
// CHECK-NEXT: "[[PREFIX]]/second/second.modulemap"
// CHECK-NEXT: ],
Expand All @@ -90,11 +86,11 @@
// CHECK-NEXT: ],
// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/zeroth/module.modulemap",
// CHECK-NEXT: "command-line": [
// CHECK-NOT: "-fmodule-map-file=[[PREFIX]]/second/module.modulemap"
// CHECK: ],
// CHECK-NEXT: "context-hash": "{{.*}}",
// CHECK-NEXT: "file-deps": [
// CHECK-NEXT: "[[PREFIX]]/first/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/second.modulemap",
// CHECK-NEXT: "[[PREFIX]]/zeroth/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/zeroth/zeroth.h"
Expand All @@ -115,7 +111,7 @@
// CHECK-NEXT: ],
// CHECK-NEXT: "command-line": [
// CHECK: ],
// CHECK-NEXT: "executable": "clang",
// CHECK-NEXT: "executable": "{{.*}}",
// CHECK-NEXT: "file-deps": [
// CHECK-NEXT: "[[PREFIX]]/tu.m"
// CHECK-NEXT: ],
Expand Down

0 comments on commit 477c705

Please sign in to comment.