
Major rewrite of the rewriter and the static introspection tool #12277

Open: wants to merge 6 commits into base: master

Volker-Weissmann (Contributor) commented on Sep 20, 2023:

The rewriter and the static introspection tool used to be very broken; now they are less broken.

The most important changes are:

  1. We now have a class UnknownValue for more explicit handling of situations that are too complex, or impossible, to evaluate.
  2. If you write
       var = 'foo'
       name = var
       var = 'bar'
       executable(name, 'foo.c')

     the tool now knows that the name of the executable is foo and not bar. See dataflow_dag and node_to_runtime_value for details on how we do this.
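The idea behind the second change can be illustrated with a toy model. This is a minimal sketch, not the PR's dataflow_dag or node_to_runtime_value: Literal, VarRef, build_dag and resolve are hypothetical names, and only plain assignments are handled. Each assignment becomes its own node that links to the node the right-hand side referred to at that moment, so a later reassignment of var cannot change what name resolves to.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


class UnknownValue:
    """Hypothetical stand-in: a value that cannot be determined statically."""


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class VarRef:
    name: str


# One assignment per statement: (target variable, right-hand side).
Stmt = Tuple[str, Union[Literal, VarRef]]
Source = Union[Literal, int, UnknownValue]


def build_dag(stmts: List[Stmt]) -> Tuple[Dict[int, Source], Dict[str, int]]:
    """Give every assignment its own node; edges point at where its value came from."""
    edges: Dict[int, Source] = {}
    current: Dict[str, int] = {}  # variable name -> node of its latest assignment
    for node_id, (target, rhs) in enumerate(stmts):
        if isinstance(rhs, Literal):
            edges[node_id] = rhs
        else:
            # Link to the node the variable refers to *at this point*,
            # not to the name, so later reassignments cannot leak backwards.
            edges[node_id] = current.get(rhs.name, UnknownValue())
        current[target] = node_id
    return edges, current


def resolve(edges: Dict[int, Source], node: Source) -> Union[str, UnknownValue]:
    """Follow edges back to a literal (or to an UnknownValue)."""
    while isinstance(node, int):
        node = edges[node]
    return node.value if isinstance(node, Literal) else node


stmts = [('var', Literal('foo')), ('name', VarRef('var')), ('var', Literal('bar'))]
edges, current = build_dag(stmts)
print(resolve(edges, current['name']))  # foo
```

Reading an undefined variable yields an UnknownValue rather than a crash, which is the kind of explicit "cannot know" result the first change is about.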

To test my work, I wrote a script that:

  1. Runs git clone on a couple of big projects that use meson (e.g. systemd).
  2. Checks that the outputs of meson introspect meson.build --targets and meson introspect build_folder --targets do not contradict each other.
  3. Checks that the output of meson introspect meson.build --targets is the same for two different versions of meson (the one we want to test and a known good version).
  4. Iterates over all targets found by meson introspect meson.build --targets and checks that
       meson rewrite target tgt add rewrite_test_source.c
       meson rewrite target tgt add rewrite_test_source.c
       meson introspect meson.build --targets
       meson rewrite target tgt rm rewrite_test_source.c
       meson introspect meson.build --targets

     does not crash and produces the expected output.
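Step 4 could look roughly like the sketch below. It is not the author's script, which is not published; roundtrip_target, static_targets and subprocess_runner are hypothetical names, and the "expected output" checks are assumptions: the test source must appear after the adds, and the static introspection output after the rm must equal the snapshot taken before the first add. The command runner is injected so the logic can be exercised without meson installed; a crashing command surfaces as subprocess.CalledProcessError because of check=True.

```python
import json
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[Sequence[str]], str]


def subprocess_runner(project_dir: str) -> Runner:
    """Run commands for real inside a cloned project (needs meson on PATH)."""
    def run(argv: Sequence[str]) -> str:
        return subprocess.run(list(argv), cwd=project_dir, check=True,
                              capture_output=True, text=True).stdout
    return run


def static_targets(run: Runner) -> list:
    # Build-directory-less ("static") introspection of the source tree.
    return json.loads(run(['meson', 'introspect', 'meson.build', '--targets']))


def roundtrip_target(run: Runner, tgt: str,
                     src: str = 'rewrite_test_source.c') -> List[str]:
    """Add src twice, remove it once; return a list of problems (empty = OK).

    Assumed expectations (hypothetical, not taken from the original script):
    src shows up after the adds, and after the rm the static introspection
    output is identical to the snapshot taken before the first add.
    """
    problems: List[str] = []
    before = static_targets(run)
    run(['meson', 'rewrite', 'target', tgt, 'add', src])
    run(['meson', 'rewrite', 'target', tgt, 'add', src])
    after_add = static_targets(run)
    if src not in json.dumps(after_add):
        problems.append(f'{tgt}: {src} missing after add')
    run(['meson', 'rewrite', 'target', tgt, 'rm', src])
    after_rm = static_targets(run)
    if after_rm != before:
        problems.append(f'{tgt}: introspection output changed after add + rm')
    return problems
```

In a real run, one would loop over the names from static_targets(subprocess_runner(path)) and call roundtrip_target on a fresh checkout for each target.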

I think this script is very useful (it found a ton of bugs), but I do not know where to put it, so it currently exists only on my machine. It is too slow to run in the CI pipeline (about an hour, if I remember correctly; I haven't measured it). In the docs you write:

All the software on this list is tested for regressions before release

Is this testing (partially) automated? If so, could we merge my script with it?

@bruchar1 @kcgen As far as I know, you are among the few people using the rewriter/static introspection tool in production. Could you test your use cases and share your opinion?

@eli-schwartz You promised me this in an email:

I would be willing to review the resulting PRs with the intent of
checking that the results "seem to work" and not getting bogged down in
nitty-gritty details. 🙂 I agree that it's relatively obscure
functionality that doesn't always work and the priority should be,
essentially, treating it as brand new code.

Volker-Weissmann force-pushed the introspection_rewrite branch 2 times, most recently from ed089ca to 25c3cf2 (September 20, 2023 17:29).
bruchar1 (Member):

Thank you for working on this. I will try to have a deeper look and to test it soon.

For the tests, I think the best thing to do is to ensure that each bug you found is covered by a test.

I wonder if this could be split into multiple commits, to make it easier to understand what each part fixes, especially for trivial fixes that are not directly part of the refactor.

Volker-Weissmann (Contributor Author):

> Thank you for working on this. I will try to have a deeper look and to test it soon.
>
> For the tests, I think the best thing to do is to ensure that each bug you found is covered by a test.

I should have done that every time the script found a bug. I cannot do it now, since I have forgotten most of the bugs.

> I wonder if this could be split into multiple commits, to ease understanding what each part fixes, especially for trivial fixes that are not directly part of the refactor.

I will do that for the "trivial fixes that are not directly part of the refactor", but the rest will remain in one big commit, since it is hard or impossible to split.

Volker-Weissmann (Contributor Author):

I split the trivial fixes into separate commits.

bruchar1 (Member) left a review:

I only did a quick pass over the Python syntax; I haven't tried the functionality yet.

mesonbuild/ast/printer.py (outdated, resolved)
return SymbolNode(Token('', '', 0, 0, 0, (0, 0), val))

class DataflowDAG:
    src_to_tgts: T.DefaultDict[T.Union[BaseNode, UnknownValue], T.Set[T.Union[BaseNode, UnknownValue]]]
Member:

There are a lot of T.Union[BaseNode, UnknownValue]. Maybe you should create a type alias for that, e.g. NodeOrUnknown = T.Union[BaseNode, UnknownValue].

Contributor Author:

I'm against that. It is less readable and only marginally shorter.
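For comparison, this is roughly what the alias would look like in use. It is a sketch with stand-in classes (BaseNode and UnknownValue here are empty placeholders, and add_edge is a hypothetical helper); a plain module-level assignment is enough to define the alias.

```python
import typing as T
from collections import defaultdict


class BaseNode:
    """Stand-in for mesonbuild.mparser.BaseNode."""


class UnknownValue:
    """Stand-in for the UnknownValue class introduced by this PR."""


# The alias suggested in review: a module-level assignment is enough.
NodeOrUnknown = T.Union[BaseNode, UnknownValue]


class DataflowDAG:
    def __init__(self) -> None:
        # Without the alias this annotation reads:
        # T.DefaultDict[T.Union[BaseNode, UnknownValue], T.Set[T.Union[BaseNode, UnknownValue]]]
        self.src_to_tgts: T.DefaultDict[NodeOrUnknown, T.Set[NodeOrUnknown]] = defaultdict(set)

    def add_edge(self, src: NodeOrUnknown, tgt: NodeOrUnknown) -> None:
        self.src_to_tgts[src].add(tgt)
```

The trade-off is the one discussed: shorter annotations versus having to look up what NodeOrUnknown stands for.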

mesonbuild/ast/interpreter.py (outdated, resolved)
mesonbuild/ast/interpreter.py (outdated, resolved)
active = set(srcs)
while True:
    if reverse:
        new: T.Set[T.Union[BaseNode, UnknownValue]] = set()
Member:

The new set is created on both branches of the if. I think you could initialize it before the if.
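A sketch of the suggested shape, assuming the loop computes every node reachable from srcs, forwards through src_to_tgts or backwards through tgt_to_srcs. The function name reachable and the Node stand-in type are hypothetical; the point is that new is created once per iteration, before the branch.

```python
import typing as T

Node = T.Hashable  # stand-in for T.Union[BaseNode, UnknownValue]


def reachable(src_to_tgts: T.Mapping[Node, T.Set[Node]],
              tgt_to_srcs: T.Mapping[Node, T.Set[Node]],
              srcs: T.Iterable[Node],
              reverse: bool = False) -> T.Set[Node]:
    """Return all nodes reachable from srcs, following edges backwards if reverse."""
    found = set(srcs)
    active = set(found)
    while active:
        # Initialised once, before the branch, as suggested in review.
        new: T.Set[Node] = set()
        if reverse:
            for node in active:
                new |= tgt_to_srcs.get(node, set())
        else:
            for node in active:
                new |= src_to_tgts.get(node, set())
        active = new - found  # only continue from nodes not seen before
        found |= active
    return found
```

Using .get() instead of indexing avoids inserting empty entries when the mappings are defaultdicts.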

mesonbuild/mintro.py (outdated, resolved)
@@ -388,7 +370,7 @@ def list_deps_from_source(intr: IntrospectionInterpreter) -> T.List[T.Dict[str,
'has_fallback',
'conditional',
]
- result += [{k: v for k, v in i.items() if k in keys}]
+ result += [{k: v for k, v in i.__dict__.items() if k in keys}]
Member:

This __dict__ looks like a code smell. Is it just because mypy does not detect the type of i correctly?

Member:

I think you could rewrite this as:

result += [{k: getattr(i, k) for k in keys}]
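A small comparison of the two comprehensions. DepInfo is a hypothetical stand-in for the per-dependency objects, and the full keys list is assumed (the quoted diff shows only its last two entries).

```python
from dataclasses import dataclass


@dataclass
class DepInfo:
    """Hypothetical stand-in for the per-dependency objects in list_deps_from_source."""
    name: str
    required: bool
    version: list
    has_fallback: bool
    conditional: bool
    node: object = None  # internal detail that must not appear in the output


# Assumed key list; the quoted diff only shows its last two entries.
keys = ['name', 'required', 'version', 'has_fallback', 'conditional']


def via_dict(i: DepInfo) -> dict:
    # Current PR version: filter the instance __dict__.
    return {k: v for k, v in i.__dict__.items() if k in keys}


def via_getattr(i: DepInfo) -> dict:
    # Suggested version: ask for exactly the keys we want.
    return {k: getattr(i, k) for k in keys}
```

Both produce the same dict here. The getattr form differs in useful ways: a missing attribute raises AttributeError instead of being silently dropped, it also works for properties and for classes that use __slots__ (which have no instance __dict__), and the key order follows keys.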


if 'meson.build' in [os.path.basename(options.builddir), options.builddir]:
# TODO: This if clause is undocumented.
Member:

Will you do it?

Contributor Author:

That is out of scope for this PR. Also, I don't like the command-line API here:
I think that meson introspect meson.build --targets should be changed to one of these:

  • meson static-introspect . --targets
  • meson static-introspect --targets
  • meson rewriter --targets

I don't like the fact that meson introspect meson.build --targets and meson introspect builddir --targets do radically different things, while the CLI makes it look as if they are just two ways for meson to find the correct directory.


if os.path.basename(options.builddir) == 'meson.build':
Member:

There is an environment.build_filename constant. Maybe you should use it?

sourcedir = '.' if options.builddir == 'meson.build' else options.builddir[:-11]
Member:

and use -len(environment.build_filename) here...
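A sketch of what the two suggestions amount to. build_filename is defined locally as a stand-in for environment.build_filename, and sourcedir_from_arg is a hypothetical helper name.

```python
import os

# Stand-in for mesonbuild.environment.build_filename.
build_filename = 'meson.build'


def sourcedir_from_arg(arg: str) -> str:
    """Return the source directory for an argument that names a meson.build file."""
    if os.path.basename(arg) != build_filename:
        raise ValueError(f'{arg!r} does not name a {build_filename} file')
    if arg == build_filename:
        return '.'
    # Equivalent to the hard-coded [:-11] (len('meson.build') == 11), but it
    # follows the constant. The trailing separator is kept: 'sub/meson.build' -> 'sub/'.
    return arg[:-len(build_filename)]
```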

res = [root_dir / i['subdir'] / x for x in res]
res = [x.resolve() for x in res]
return res
def list_targets_from_source(intr: IntrospectionInterpreter) -> T.Any:
Member:

Instead of making this Any, it could at least be T.List[T.Dict[str, object]]


# ignores ParenthesizedNode, the binding power of the inner node is
# relevant.
return precedence_level(node.inner)
raise TypeError
Member:

Usually you would use RuntimeError, but we have MesonBugError (or maybe it's MesonBugException?). I think we should use that here.
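A minimal sketch of the suggestion, with heavy assumptions: the node classes and binding-power numbers below are hypothetical, and MesonBugException is defined locally (the review itself is unsure whether Meson's class is MesonBugError or MesonBugException). It shows the fall-through case raising a dedicated internal-bug exception with a message, instead of a bare TypeError.

```python
from dataclasses import dataclass


class MesonBugException(Exception):
    """Local stand-in for Meson's internal-bug exception."""


@dataclass
class NumberNode:
    value: int


@dataclass
class ArithNode:
    op: str
    left: object
    right: object


@dataclass
class ParenthesizedNode:
    inner: object


# Hypothetical binding powers; higher binds tighter.
_ARITH_LEVEL = {'add': 5, 'sub': 5, 'mul': 6, 'div': 6, 'mod': 6}


def precedence_level(node: object) -> int:
    if isinstance(node, NumberNode):
        return 10
    if isinstance(node, ArithNode):
        return _ARITH_LEVEL[node.op]
    if isinstance(node, ParenthesizedNode):
        # Parentheses are transparent: the inner node's binding power counts.
        return precedence_level(node.inner)
    # Reaching this line means a node type was forgotten: an internal bug,
    # reported with a message instead of a bare TypeError.
    raise MesonBugException(f'precedence_level: unhandled node type {type(node).__name__}')
```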

if 'native' in kwargs:
    native = kwargs.get('native', False)
    self._add_languages(args, required, MachineChoice.BUILD if native else MachineChoice.HOST)
else:
    for for_machine in [MachineChoice.BUILD, MachineChoice.HOST]:
        self._add_languages(args, required, for_machine)
return UnknownValue()
Member:

The explanation for UnknownValue was for cases where we couldn't know what would be returned. I'm not sure I follow why add_languages returns UnknownValue and not bool. If it returns, it will always return a bool (or it will abort, but for the purpose of the rewriter that doesn't really matter, does it?)

Contributor Author:

var = add_languages('rust', required: false)
message(var)

prints true on some machines and false on others.
So func_add_languages has to return UnknownValue.

Member:

It returns a boolean? I'm just not understanding here: if add_languages(), which returns a bool, cannot be statically determined, then what can? There are no functions I know of that always return the same value in every case.

Contributor Author:

You seem confused; let me clear that up:

You know that if you run meson setup builddir, the contents of the resulting builddir-directory depend on

  1. The contents of meson.build
  2. The machine you are using.

If for example both you and I clone the same project that is using meson, and we both run meson setup builddir, those two directories are not (necessarily) identical if we have different machines. This is not a bug, this is intentional. Therefore, if we both run meson introspect builddir, we (might) get different results.

But if we both clone the same project and run meson introspect meson.build, we get the same result. If we don't, that is a bug. In other words, the job of ast/introspection.py is to know what happens if meson setup is run on a different, unknown machine: it has full knowledge of the contents of meson.build, but no knowledge of the build/host/target machine. Let's say meson.build contains:

srcs = ['1.c']
srcs += files('2.c')
if 3+4 == 7
    srcs += '3.c'
endif
if build_machine.system() == 'linux'
    srcs += 'linux-specific.c'
endif
if add_languages('rust', required: false)
    srcs += 'rust.rs'
endif
executable('foo', srcs)

If I run meson introspect meson.build on my machine, the job of meson is to figure out which sources belong to the foo executable when you run meson setup on your machine.
It knows that the foo executable contains the sources 1.c and 2.c. It does not know that it contains 3.c, since I was too lazy to implement that. There is no way for it to know whether the foo executable contains linux-specific.c or rust.rs, since a program running on my machine cannot know whether your machine runs Linux or has a Rust compiler installed.
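A toy model of this reasoning, not the PR's code: UnknownValue is a local stand-in, static_eq and classify_sources are hypothetical helpers, and every condition from the example is pre-evaluated by hand. Anything that depends on the machine becomes an UnknownValue, comparisons involving an unknown stay unknown, and sources behind an unknown condition can only be reported as "maybe".

```python
class UnknownValue:
    """Stand-in: a value that depends on the machine that will run `meson setup`."""


def static_eq(a, b):
    """'==' as a static interpreter might see it: unknown in, unknown out."""
    if isinstance(a, UnknownValue) or isinstance(b, UnknownValue):
        return UnknownValue()
    return a == b


def classify_sources(entries):
    """Split (condition, source) pairs into certain and possible sources."""
    certain, maybe = [], []
    for cond, src in entries:
        if isinstance(cond, UnknownValue):
            maybe.append(src)      # depends on the unknown machine
        elif cond:
            certain.append(src)    # statically known to be true
    return certain, maybe


build_machine_system = UnknownValue()   # build_machine.system()
add_languages_rust = UnknownValue()     # add_languages('rust', required: false)

entries = [
    (True, '1.c'),
    (True, '2.c'),
    (static_eq(3 + 4, 7), '3.c'),  # decidable in principle; the tool does not do this yet
    (static_eq(build_machine_system, 'linux'), 'linux-specific.c'),
    (add_languages_rust, 'rust.rs'),
]
certain, maybe = classify_sources(entries)
print(certain)  # ['1.c', '2.c', '3.c']
print(maybe)    # ['linux-specific.c', 'rust.rs']
```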

@@ -505,7 +507,7 @@ def evaluate_indexing(self, node: mparser.IndexNode) -> InterpreterObject:

def function_call(self, node: mparser.FunctionNode) -> T.Optional[InterpreterObject]:
func_name = node.func_name.value
- (h_posargs, h_kwargs) = self.reduce_arguments(node.args)
+ (h_posargs, h_kwargs) = self.reduce_arguments(node.args, include_unknown_args = True)
Member:

No spaces around the = operator in keyword arguments.

GranMinigun:

I cloned this branch and ran introspect --all on DOSBox Staging's meson.build (which is what e.g. Qt Creator does). It happily crashed:

Unable to evaluate subdir([<mesonbuild.interpreterbase.baseobjects.UnknownValue object at 0x7f7d6651e490>]) in AstInterpreter --> Skipping
Traceback (most recent call last):
  File "/dev/shm/meson/mesonbuild/mesonmain.py", line 194, in run
    return options.run_func(options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dev/shm/meson/mesonbuild/mintro.py", line 541, in run
    return print_results(options, results, indent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dev/shm/meson/mesonbuild/mintro.py", line 501, in print_results
    print(json.dumps(out, indent=indent))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UnknownValue is not JSON serializable

ERROR: Unhandled python exception

    This is a Meson bug and should be reported!

Python 3.11.5.

Volker-Weissmann (Contributor Author):

> Cloned this branch, ran introspect --all on DOSBox Staging's meson.build (that's what e.g. Qt Creator does), it happily crashed:

Confirmed, and I'm on my way to fix it.

Volker-Weissmann (Contributor Author):

Fixed in b98292e
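For context, the crash comes from json.dumps meeting an object it cannot serialize. One generic way to avoid it (not necessarily what b98292e does) is a json.JSONEncoder subclass whose default() hook maps UnknownValue to null. IntrospectionEncoder is a hypothetical name and UnknownValue is a local stand-in.

```python
import json


class UnknownValue:
    """Stand-in for the UnknownValue class introduced by this PR."""


class IntrospectionEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialise UnknownValue as JSON null."""

    def default(self, o):
        if isinstance(o, UnknownValue):
            return None
        # Anything else unsupported still raises TypeError, as before.
        return super().default(o)


out = {'name': 'foo', 'subdir': UnknownValue()}
print(json.dumps(out, cls=IntrospectionEncoder))  # {"name": "foo", "subdir": null}
```

Using null keeps the output type-compatible for consumers such as IDEs, at the cost of not distinguishing "unknown" from a field that is genuinely empty; a marker string would be unambiguous but would change the field's type.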

If a string contains '\n' we need to use triple quotes,
since the parser will reject it if we use single quotes.
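A sketch of the quoting rule this commit describes, for a printer that emits string literals. format_string_literal is a hypothetical name. It assumes that Meson takes triple-quoted strings verbatim (no escape processing) and that the value never contains three consecutive single quotes; single-quoted output escapes backslashes first, then quotes.

```python
def format_string_literal(value: str) -> str:
    """Render value as a Meson string literal (sketch)."""
    if '\n' in value:
        # The parser rejects a raw newline inside '...', so use '''...'''.
        # Assumption: value never contains three consecutive single quotes.
        return "'''" + value + "'''"
    # Single-quoted: escape backslashes first, then single quotes.
    return "'" + value.replace('\\', '\\\\').replace("'", "\\'") + "'"
```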
The rewriter and the static introspection tool
used to be very broken; now they are *less* broken.

The most important changes are:

1. We now have class UnknownValue for more explicit handling
	of situations that are too complex/impossible.

2. If you write
	```
	var = 'foo'
	name = var
	var = 'bar'
	executable(name, 'foo.c')
	```
	the tool now knows that the name of the executable is foo and not bar.
	See dataflow_dag and node_to_runtime_value for details on how we do this.

Fixes mesonbuild#11763
Co-authored-by: Jouke Witteveen <j.witteveen@gmail.com>
Volker-Weissmann (Contributor Author):

Any update?


6 participants