cost: add keep_hierarchy pass with min_cost argument #4344
I'm really interested in your methodology here. Can you explain it in detail?
@whitequark Sure: I used
kernel/cost.cc
static unsigned int y_coef(RTLIL::IdString type)
{
// clang-format off
Ew. This is convincing me more and more that we should not use clang-format...
It's possible that it doesn't fit our repository. This really should have been a switch/case, but ID($...) isn't a constant value; if it were, I wouldn't have to do this. Personally, I'm very used to hitting Ctrl+Shift+I to format an entire file I'm working on in VS Code. For shared files, I intend to get used to formatting only modified lines, which VS Code also lets me bind to a shortcut.
You can use type.in(...), which may work better.
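To illustrate why a membership test reads better than a chain of == comparisons, here is a minimal stand-alone sketch. It assumes a plain std::string stands in for RTLIL::IdString and defines a hypothetical free function in_set(); Yosys's real IdString::in() is a member function and its exact overloads are not reproduced here. Only the coefficient 8 for $shift/$shiftx comes from the snippet under review; the other return values are illustrative placeholders.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for RTLIL::IdString (real IdStrings are interned handles).
using CellType = std::string;

// Base case: no candidates left, so no match.
inline bool in_set(const CellType &) { return false; }

// Variadic membership test: true if `type` equals any of the candidates.
template <typename First, typename... Rest>
bool in_set(const CellType &type, const First &first, const Rest &...rest)
{
    return type == first || in_set(type, rest...);
}

// Cost lookup written with the membership helper instead of chained ==.
unsigned int y_coef_sketch(const CellType &type)
{
    assert(!type.empty());
    if (in_set(type, "$shift", "$shiftx"))
        return 8;  // value from the reviewed snippet
    if (in_set(type, "$shl", "$shr"))
        return 4;  // illustrative: half of $shift, per the review discussion
    return 1;      // placeholder default, not a value from the PR
}
```

Each group of cell types becomes a single readable condition, which sidesteps the need for a switch/case over non-constant ID($...) values.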
kernel/cost.h
{ ID($_DFF_P_), 1 },
{ ID($_DFF_N_), 1 },
We should probably add all of the DFF and latch types here, but I don't know what a reasonable cost estimate for them would be (1 seems off).
We can look at the sky130hd and asap7 areas for those and the other cells.
I'd say that's out of scope. Want me to make an issue? I added these only because I noticed stat.cc was patching them in on its side, so I consolidated them here.
What is using these costs? Is there any information about where they come from and what they are supposed to model?
stat.cc uses CMOS transistor estimates for these (16). Nothing uses default gate count estimates for them. My reasoning was that, since they are primitives that get techmapped to, they have a default gate count of 1.
by being primitives to be techmapped to, they have a default gate count of 1
I am not sure I follow. Wouldn't that make the cost 1 for all the primitives in the list?
I think it may be appropriate to use a model suitable for pre-synthesis estimates.
So something like:
- buffers and inverters have no cost as they aren't full 'functions'
- simple 2-input gates (AND, NAND, OR, NOR, MUX...) have a cost of 1
- complex 2-input gates (XOR, XNOR) have a cost of 2
- larger gates use the equivalent of what they would use, were you to implement them as 2-input gates
This is the same approach used in Zimmermann's thesis; you need to scale it to whatever base you use (times four if you count transistors).
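A minimal sketch of the pre-synthesis model suggested above. It assumes a hypothetical GateKind enum and the rule that an n-input gate is costed as the (n - 1) two-input gates of the same kind that would implement it; the scale parameter covers the rescaling mentioned (e.g. 4 for a transistor base). None of these names exist in Yosys.

```cpp
#include <cassert>

// Hypothetical gate classification for pre-synthesis estimates.
enum class GateKind { Buffer, Inverter, Simple, Complex };

// Cost of one 2-input gate of the given kind, in 2-input-gate units.
unsigned int base_cost(GateKind kind)
{
    switch (kind) {
    case GateKind::Buffer:
    case GateKind::Inverter:
        return 0;  // not a full "function"
    case GateKind::Simple:
        return 1;  // AND, NAND, OR, NOR, MUX, ...
    case GateKind::Complex:
        return 2;  // XOR, XNOR
    }
    return 0;
}

// `inputs` counts data operands (a 2:1 MUX is one Simple gate with inputs = 2).
// An n-input gate is costed as a tree of (n - 1) 2-input gates, then multiplied
// by `scale` to convert to another base.
unsigned int gate_cost(GateKind kind, unsigned int inputs, unsigned int scale = 1)
{
    assert(inputs >= 1);
    unsigned int two_input_gates = inputs > 1 ? inputs - 1 : 1;
    return base_cost(kind) * two_input_gates * scale;
}
```

For example, a 4-input AND costs 3 units and a 4:1 MUX (three MUX2) also costs 3, while a 3-input XOR counted in transistors costs 2 * 2 * 4 = 16.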
Sequential elements (all of them) should likely be substantially more expensive because it is (usually) harder to optimize them away and depending on your technology, you have additional costs related to this cell (special placement considerations, it may use an additional metal layer internally, it may need decap cells nearby, more flops -> more clock tree etc).
In my opinion, it may even make sense to split combinational and sequential cost, especially considering the application in keep_hierarchy, since you may want to keep modules that aren't big but have a lot of state (e.g. register files).
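One possible reading of the suggestion to split combinational and sequential cost, sketched with a hypothetical ModuleCost struct. The struct, the two threshold parameters, and the rule of keeping a module when either threshold is reached are assumptions for illustration, not part of this PR.

```cpp
#include <cassert>

// Hypothetical split cost: combinational gates and sequential (state) bits tracked separately.
struct ModuleCost {
    unsigned int comb = 0;  // combinational gate-equivalents
    unsigned int seq = 0;   // flip-flops / latches

    ModuleCost &operator+=(const ModuleCost &other)
    {
        comb += other.comb;
        seq += other.seq;
        return *this;
    }
};

// Keep the hierarchy if the module is large in either dimension, so small but
// state-heavy modules (e.g. register files) are not flattened.
bool should_keep_hierarchy(const ModuleCost &cost, unsigned int min_comb, unsigned int min_seq)
{
    assert(min_comb > 0 && min_seq > 0);
    return cost.comb >= min_comb || cost.seq >= min_seq;
}
```

With a single combined number, a register file with little logic could fall under the threshold; tracking state separately lets it be kept on its own merits.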
kernel/cost.cc
} else if (// shift
type == ID($shift) ||
type == ID($shiftx)) {
return 8;
Why is this twice the cost of $shr? I'm very unconvinced that the cost model is sound.
I'll inspect some techmapped examples of these gates on Monday.
Running "read_rtlil shift.il; techmap; opt; stat" on this shift.il:
module \shr
wire width 10 input 1 \A
wire width 10 input 2 \B
wire width 10 output 3 \Y
cell $shr \UUT
parameter \A_SIGNED 1
parameter \A_WIDTH 10
parameter \B_SIGNED 1
parameter \B_WIDTH 10
parameter \Y_WIDTH 10
connect \A \A
connect \B \B
connect \Y \Y
end
end
module \shift
wire width 10 input 1 \A
wire width 10 input 2 \B
wire width 10 output 3 \Y
cell $shift \UUT
parameter \A_SIGNED 1
parameter \A_WIDTH 10
parameter \B_SIGNED 1
parameter \B_WIDTH 10
parameter \Y_WIDTH 10
connect \A \A
connect \B \B
connect \Y \Y
end
end
yields
=== shift ===
Number of cells: 122
=== shr ===
Number of cells: 64
This shows the 2x factor. To the extent that this Yosys behavior is correct, so is the model.
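A back-of-the-envelope sketch (not Yosys's techmap) of why a bidirectional shift roughly doubles the cost of a one-directional one. It assumes a logarithmic shifter that uses one 2:1 mux per output bit per stage, and models $shift with a signed shift amount as a right shifter plus a left shifter plus a per-bit direction select. The measured 64 and 122 cell counts above come from the real techmap and include details this model ignores, such as out-of-range shift amounts.

```cpp
#include <cassert>

// Number of mux stages a log shifter uses to cover shifts up to width - 1.
unsigned int shift_stages(unsigned int width)
{
    assert(width >= 1);
    unsigned int stages = 0;
    while ((1u << stages) < width)
        ++stages;
    return stages;
}

// One-directional log shifter (like $shr): one 2:1 mux per output bit per stage.
unsigned int one_way_shifter_muxes(unsigned int width)
{
    return width * shift_stages(width);
}

// Bidirectional shifter (like $shift with signed B): one shifter per direction,
// plus one 2:1 mux per output bit to pick the direction.
unsigned int two_way_shifter_muxes(unsigned int width)
{
    return 2 * one_way_shifter_muxes(width) + width;
}
```

For a width of 10 this gives 40 versus 90 muxes, the same rough 2x ratio seen in the stat output.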
I'm quite interested in the functionality of this PR but in order to be convinced that it's a good addition it will require the following:
- A definition of what the costs mean, as well as a clear statement of who they are suitable for and who they are not.
- A clear methodology for calculating the cost, which is not a part of some random script in someone else's repository, but a part of Yosys itself. This includes both:
  - A description of the methodology in prose.
  - An executable that calculates the costs according to it.
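As a sketch of what such an executable could compute, assume measured (width, post-techmap gate count) samples are already available for one cell type. The code fits a slope by least squares, then raises the intercept until the line lies on or above every sample. The result is a linear model that upper-bounds the measured data, though not necessarily unmeasured widths. All names here are hypothetical, and this is one possible fitting technique, not the PR's method.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Sample {
    double width;  // e.g. Y_WIDTH of the cell
    double gates;  // measured gate count after techmapping
};

struct LinearCost {
    double slope;
    double intercept;
    double at(double width) const { return slope * width + intercept; }
};

// Least-squares slope, then the smallest intercept that keeps every sample on or below the line.
LinearCost fit_upper_bound(const std::vector<Sample> &samples)
{
    assert(samples.size() >= 2);
    double n = samples.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Sample &s : samples) {
        sx += s.width;
        sy += s.gates;
        sxx += s.width * s.width;
        sxy += s.width * s.gates;
    }
    double denom = n * sxx - sx * sx;
    assert(denom != 0);  // needs at least two distinct widths
    double slope = (n * sxy - sx * sy) / denom;
    double intercept = samples[0].gates - slope * samples[0].width;
    for (const Sample &s : samples)
        intercept = std::max(intercept, s.gates - slope * s.width);
    return {slope, intercept};
}
```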
This is pretty intense for the current sole use case, which is a heuristic for flattening only modules that aren't huge. For what it's worth, if these were all set to
Sorry for the spam @zachjs, I accidentally committed changes in ast that I was only using to play around.
@whitequark test_cell can now check whether the cost is a correct upper bound on the post-techmap gate count. This means the coefficients still aren't generated programmatically, but they are at least verified, for the cells that test_cell covers. Use case example:
Open questions:
That is much better! I'll take a closer look a bit later.
I propose we supply a conservative upper bound based on the width of the
68b1008 to 9e67d5c
Current status and intended usage: I think I'll have to leave it as is for now. It's good enough as a heuristic, and I should move on to more pressing topics.
CellCosts costs(design);
for (auto module : design->selected_modules()) {
There are two options to consider for operations on modules: selected_modules() vs. selected_whole_modules_warn(). I think in this case it makes sense to operate on partial modules (this can be useful, e.g., if you want to set keep_hierarchy on any module that contains a particular type of cell), but I wanted to get everyone else's opinion too.
The best thing to do would be to warn on partially selected modules but still add them. selected_whole_modules_warn doesn't do that, so I'll leave it as-is.
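A minimal sketch of the behavior described above (warn about partially selected modules but still process them). It uses hypothetical stand-in types rather than the real RTLIL::Design selection API and collects warnings into a list instead of going through Yosys's logging.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a module and how much of it is selected.
struct ModuleStub {
    std::string name;
    bool any_selected;    // at least one object in the module is selected
    bool whole_selected;  // the entire module is selected
};

// Collect every module with a non-empty selection, recording a warning for
// each one that is only partially selected instead of dropping it.
std::vector<std::string> collect_modules(const std::vector<ModuleStub> &modules,
                                         std::vector<std::string> &warnings)
{
    std::vector<std::string> result;
    for (const ModuleStub &m : modules) {
        assert(!m.whole_selected || m.any_selected);  // whole selection implies some selection
        if (!m.any_selected)
            continue;
        if (!m.whole_selected)
            warnings.push_back("module " + m.name + " is only partially selected");
        result.push_back(m.name);
    }
    return result;
}
```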
// Get the cell cost for a cell based on its parameters.
// This cost is an *approximate* upper bound for the number of gates that
// the cell will get mapped to with "opt -fast; techmap"
// The intended usage is for flattening heuristics and similar situations
unsigned int get(RTLIL::Cell *cell);
// Sum up the cell costs of all cells in the module
// and all its submodules recursively
unsigned int get(RTLIL::Module *mod);
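A minimal sketch of the recursive module cost described by these comments. It uses hypothetical stand-in types: a module is a list of already-costed leaf cells plus a list of submodule instances by name. It memoizes per module name, so a submodule instantiated many times is costed once while each instance still adds its cost. Whether the PR caches results this way is not shown here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in: leaf cells already costed, plus submodule instances by type name.
struct ModuleDesc {
    std::vector<unsigned int> cell_costs;
    std::vector<std::string> instances;
};

using DesignDesc = std::map<std::string, ModuleDesc>;

// Total cost of `name`: its own cells plus, recursively, every instantiated submodule.
// Assumes the hierarchy has no cycles.
unsigned int module_cost(const DesignDesc &design, const std::string &name,
                         std::map<std::string, unsigned int> &memo)
{
    auto cached = memo.find(name);
    if (cached != memo.end())
        return cached->second;
    auto it = design.find(name);
    assert(it != design.end());  // every instantiated module must be defined
    unsigned int total = 0;
    for (unsigned int c : it->second.cell_costs)
        total += c;
    for (const std::string &sub : it->second.instances)
        total += module_cost(design, sub, memo);
    memo[name] = total;
    return total;
}
```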
I would have expected CellCosts to become a base class with virtual get functions (pure, or maybe defaulting to the default_gate_cost() costs?), and the heuristic to be a derived class with a name like NumInternalGatesEstimate, which would make it immediately obvious at the point of use what's happening. The CMOS and default costs would then just be other variants of cost models, rather than static functions that are, for some reason, defined in a class that now does something unrelated.
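A minimal sketch of the class layout suggested above, using a hypothetical CellStub (a type name plus an output width) instead of RTLIL::Cell. CostModel and TableCostModel are illustrative names; NumInternalGatesEstimate is the reviewer's proposed name. The DFF values 1 and 16 echo the default gate and CMOS estimates discussed earlier in this thread, while the heuristic's coefficient is a placeholder.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an RTLIL cell.
struct CellStub {
    std::string type;
    unsigned int width = 1;
};

// Common interface: every cost model answers "how much does this cell cost?"
class CostModel {
public:
    virtual ~CostModel() = default;
    virtual unsigned int get(const CellStub &cell) const = 0;
};

// Per-type table model, e.g. default gate costs or CMOS transistor estimates.
class TableCostModel : public CostModel {
public:
    explicit TableCostModel(std::map<std::string, unsigned int> table) : table_(std::move(table)) {}
    unsigned int get(const CellStub &cell) const override
    {
        auto it = table_.find(cell.type);
        return it == table_.end() ? 0 : it->second;  // unknown types cost 0 in this sketch
    }

private:
    std::map<std::string, unsigned int> table_;
};

// Width-dependent heuristic whose name says what it estimates.
class NumInternalGatesEstimate : public CostModel {
public:
    unsigned int get(const CellStub &cell) const override
    {
        assert(cell.width > 0);
        return 4 * cell.width;  // placeholder linear coefficient, not a value from the PR
    }
};
```

Callers that only need "a cost" can take a const CostModel &, and the concrete type at the call site documents which estimate is in use.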
My get methods operate on cells, while the previously implemented estimation dicts operate on types, so they don't share an interface. The dicts are provided through static functions that only use CellCosts as a namespace. I can move them out to make this distinction more explicit. I have previously done something like what you describe but moved away from it.
I don't know how far you want to go with this, but you could also compare your approach against theoretical approaches to calculating the cost of arithmetic operations. It is also important to note that if you set demanding timing goals and give ABC an aggressive script, it will start to deviate from this architecture and move toward the performance characteristics of others. So, depending on how you use ABC, the final result for large operations and timing-critical (deep) paths can deviate drastically from what you may observe with a more relaxed ABC script. I don't think this is a huge problem, since the intended use case is likely to get rid of really small modules, which are less likely to contain these big arithmetic operations anyway. I would recommend adding a note to the help message of
7e47a07 to f04137d
…min_cost parameter
7bb763b to 4b29f64
Flattening modules improves QoR in practice, but it also makes Yosys take much longer to run.
This PR creates cost.cc with linear cost models for almost all internal cell types, to estimate the size of a module after techmapping. To get most of these numbers, I used a modified version of the test_cell command; see emil/gather-cell-size.
This PR also adds the keep_hierarchy pass, which marks all selected modules with that attribute and has an optional -max_cost integer argument that sets a maximum estimated cost threshold.
Effects on runtime and QoR with OpenROAD: TBD
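A minimal sketch of the threshold logic. It assumes module costs have already been computed (for instance by a recursive sum like CellCosts::get(RTLIL::Module *)). Following the min_cost naming in the PR title, it also assumes a module keeps its hierarchy when its estimated cost is at or above the threshold, so smaller modules stay eligible for flattening. The function name and the at-or-above comparison are assumptions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Return the modules that should get the keep_hierarchy attribute:
// those whose estimated cost reaches `min_cost`. Smaller modules are left
// unmarked so a later flatten can inline them.
std::set<std::string> modules_to_keep(const std::map<std::string, unsigned int> &module_costs,
                                      unsigned int min_cost)
{
    std::set<std::string> keep;
    for (const auto &entry : module_costs)
        if (entry.second >= min_cost)
            keep.insert(entry.first);
    return keep;
}
```

A threshold of 0 marks every selected module, which matches the pass's behavior when no cost argument is given.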