Should we adopt a more consistent naming convention for traits and marker types? #5276
Comments
Rust already has trait naming conventions: https://rust-lang.github.io/rfcs/0344-conventions-galore.html#trait-naming, which suggests transitive verbs, if not verbs then nouns, if not nouns, only then adjectives. Put differently:
-able trait names are actually not that good in this model: and
I don't see any reason for us to diverge from this by default.
I think we're ending up in this situation when we have a trait and a single struct that is the primary implementor. That is an unusual situation to be in! We should avoid that, and when we can't I think it's fine to be weird about naming on a case by case basis. I do not think changing the naming conventions for the rest of the project to be counter to Rust naming conventions is an appropriately sized reaction to having this problem for a couple types. Can you provide specific examples of where this has happened? I recall it happening but I can't recall examples and without examples we might be talking in the air. I think the rule of "use an adjective phrase when this happens" or "use
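To make the verb-first convention concrete, here is a minimal sketch. The names `Provide` and `Provider` are hypothetical and chosen only for illustration; they are not ICU4X APIs. It shows that when the trait is named with a verb and its primary implementor with a noun, the two names do not collide, even though Rust puts traits and structs in the same namespace.

```rust
// Hypothetical names for illustration only; not ICU4X APIs.

/// Verb-named trait, following the RFC's first preference
/// (compare `Clone`, `Hash`, `Write` in the standard library).
pub trait Provide {
    fn provide(&self, key: &str) -> Option<String>;
}

/// Noun-named struct that is the primary implementor of the trait.
pub struct Provider {
    entries: Vec<(String, String)>,
}

impl Provider {
    pub fn new(entries: Vec<(String, String)>) -> Self {
        Self { entries }
    }
}

impl Provide for Provider {
    fn provide(&self, key: &str) -> Option<String> {
        // Linear scan is enough for a sketch.
        self.entries
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v.clone())
    }
}
```

The clash discussed above only appears when both the trait and the struct want the same noun; that is the case-by-case situation, not the default one.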
N.B.: it appears we are talking about multiple kinds of markers here, which I think is worth highlighting specifically. In general I would like us to not overload "marker" when naming such types and traits (I want "marker" in ICU4X naming to only mean data markers), though I do recognize that it is a standard way to reference this Rust pattern.
My position is that it is sufficiently novel as to be surprising and distracting, in the sense that it draws a lot of attention in the code. Experienced programmers in any language get used to mentally sight-reading code, and syntax highlighting/casing conventions form a part of this process. Every time I have seen code that introduces brand new casing conventions my mental parser short circuits and tunnels in on the identifier. This still happens even when I am fully familiar with the novel casing convention and what it means. I've heard similar things from other people. (Particularly, I'm prone to defaulting to snake case identifiers if I'm not paying attention to the language I'm writing, and people have often told me how jarring it is to see that in languages that are strictly camelcase) I'd want to see a really good reason to break this expectation and surprise our users. I don't really think "we have many options but none of them is great" is sufficient, this is just not-great in a different way, and a far more novel and surprising one. I think Option 3 Put differently: I don't think the predictability of these identifiers is super important. I think their readability is, and as such the lack of consistency with
(I don't really think "form follows function" is particularly useful in a naming context) |
I think it also raises the question of whether we're doing Rust correctly. The only crates I've seen that don't follow naming conventions are C wrappers, and those often have horrible Rust APIs, are unsafe, etc. |
Yeah. ICU4X does not do that much weird stuff that we should need to be a special snowflake in these ways. I'm always a bit wary when I find crates that seem to be treating themselves as a special snowflake when it comes to Rust norms: typically their reasons for doing so seem to entirely be inside baseball and irrelevant to me as a user, and it makes me worried as to what other less-obvious norms they are deviating from. The "surprise" aspect I mentioned is a regular experience of mine reading code interacting with C wrappers, and that's not even a case where there are new naming styles introduced: they just use extant Rust naming styles in the "wrong place" (typically: snake case names for types). |
Ok We could explore
There are cases where we are getting close in
Thanks for verbalizing this. I see this argument as an "in my experience" type argument. I can't say that I share this experience in mentally parsing code, which could amount to differences in background. In my experience, the most important naming convention for visually parsing code is that types start with capital letters and variables/functions start with lowercase letters, with various exceptions like scream-case constants (and the C# programming language). I was first introduced to title snake case in my work with Unicode properties, and while it was a bit strange at first, I have come to appreciate it.
Setting aside "in my experience" arguments, I do still hold the position that the reasons I listed in the OP amount to a "good" reason to adopt the proposed naming convention (I would also subjectively say "really good"). I never stated "we have many options but none of them is great" as justification. My reasons are the 1/2/3/4 list in the OP, reproduced here:
|
This is a sub-point on Reason 2, but it might deserve being its own reason: I honestly find camel case hard to read. I squint at my screen to figure out where the capital letters are that separate the token into words. For single nouns and noun phrases, this is not super important, because the whole token expresses a single concept. However, marker types do not represent a single concept! Field set marker types are a bag of things. The collection is crucial to understanding the type. As a little exercise: how quickly can you scan down this list and tell me which ones include the hour?
In my experience, when I read that list, I can clearly see that these are symbols that are composed of individual tokens. I can quickly see how many tokens are in the list, and I can quickly identify what those tokens are. Now try the same exercise if we use a camel case convention:
At least for me, in my experience, I see the second list as 6 symbols. Secondarily, each symbol appears to be in camel case, so it probably has multiple words, but, based on my experience, I can assume that the individual words are unimportant for understanding the purpose of the type. |
Yeah I think
I think it's worth highlighting that it is not just my experience but others' experience too, there.
To be clear, I'm quite used to this convention at this point because of working with Unicode properties, I'm just not used to it in Rust, which is what I'm worried about.
Yes, that is me paraphrasing what I consider the strongest justification here. I find the other points to be weaker. I'll address them here:
I really don't see this at all. At least in my experience in Rust marker types are not that different, and I don't see why they should be visually distinct. From a user perspective I rarely care that much whether something is a marker type: if it appears in param/return lists I need to worry about it, if it doesn't, then I don't. During the course of the datetime design we have discussed turning various marker types into structs and maybe vice versa. They're not that different. Furthermore, the strong benefits of this design are only really for the subset of marker types that mark a "bag of stuff", which is in our case largely just the fieldset types. I find the justification here even less applicable for data markers: it's nice for dealing with them from codegen, but those aren't even ones people need to bother parsing too much!
I am entirely unconvinced of this.
I do not think ICU4X is anywhere near influential enough to do this successfully. This is not a statement on the current popularity of ICU4X either, rather that as an internationalization library there is a limit to how influential we can be here. Libraries that have succeeded in introducing a new convention this way are extremely few.
This I agree with, but I'm not yet convinced that this being crystal clear is that important to optimize for.
This ties in to my earlier point about this being just about "bag of stuff" marker types, not marker types in general. If your primary argument is that marker types are a different type of thing that we should introduce a new convention for, the benefits of that convention should work for all markers, not just ones like FieldSet.
To be clear: I totally agree that I suspect besides the difference in experience, the two core value differences we have here are:
|
On trait names: I generally strongly prefer general solutions that avoid clashes ("just work"), but we can fall back to a case-by-case bikeshed given that the referenced Rust RFC is not clash-free. On marker types:
I'll address the first point below. On the second point: the two biggest categories of marker types are field set markers and data markers. The elements in the field set markers are the fields being displayed, and in data markers they are the fully qualified path to the data marker, which shows up in directory paths. I acknowledge that the justification of "bag of things" as the mental model for field set markers is the stronger of the two, so maybe let's take this one step at a time: first for the field marker types in isolation, and then, only after we can agree on what to do there, revisit the more general problem. For field set markers: I'll start with contrasting pros and cons of title snake case and "smushed camel case" (which I currently find objectionable, but the least so of the alternatives I listed earlier):
I personally find this to be a slam dunk: I highly value the clarity and readability, and I believe it is worth taking a risk to advance that objective. We can break down why:
|
I'm not convinced "always use adjectives" will "just work" in edge cases, much like the Rust RFC rules do mostly "just work" except when we really need a struct and trait to have similar naming, which to me is very much an edge case in Rust. So I don't think that's a general solution either.
(I think this is where I derive the "we have many options but none of them is great" justification, fwiw. I don't see our justifications expanding beyond the fieldset markers. I can see a different set of justifications for data markers, but again, they don't generalize)
I don't find that super novel, we've already had Metazone vs MetaZone, this is a common enough thing I see in the Rust community where people decide in different ways. It's not one where there's a single consensus answer, but that's good for us, it means that either casing choice is more or less acceptable.
I actually don't want us to call attention to this. This isn't that worth emphasis. It's worth learning, but there's a lot of things about this API that are equally important. The fact that it isn't constructible isn't that important to me: whether or not it's constructible is only important when you get to the point where an API seems to want to ask for an instance. There are times where we have "it's not important how to construct this, it's just a type system hint" types that aren't markers as well! If anything, this makes me more convinced that treating these as a special snowflake is a bad idea: I think it is worth having this type of fluidity between types that are "important for type system hints" between them being dataful or just zero sized markers. I think we already had some moments during the design of these APIs where what initially was a marker+struct pair got collapsed into a single struct, and I think we've gone in the other direction too. This reveals to me that conceptually whether or not it is a marker isn't really that important! It's only contextually important whether or not something needs to be constructed, and that is clear enough from the APIs you use to get the parametrized type.
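The fluidity between zero-sized markers and dataful types can be sketched as follows. All names here (`FieldSet`, `YearMonthDay`, `HourMinute`, `Formatter`) are hypothetical stand-ins, not the `icu_datetime` definitions. The point is that the parametrized type only sees the trait, so it does not matter whether the implementor can be constructed.

```rust
use std::marker::PhantomData;

// Hypothetical sketch; not the icu_datetime API.

/// Trait that carries compile-time information about a field set.
pub trait FieldSet {
    const HAS_HOUR: bool;
}

/// Uninhabited enum: a pure marker that can never be constructed.
pub enum YearMonthDay {}

impl FieldSet for YearMonthDay {
    const HAS_HOUR: bool = false;
}

/// A "dataful" struct implementing the same trait.
pub struct HourMinute {
    pub twelve_hour: bool,
}

impl FieldSet for HourMinute {
    const HAS_HOUR: bool = true;
}

/// The formatter uses the type parameter only as a type system hint.
pub struct Formatter<F: FieldSet> {
    _fields: PhantomData<F>,
}

impl<F: FieldSet> Formatter<F> {
    pub fn new() -> Self {
        Self { _fields: PhantomData }
    }

    pub fn formats_hour(&self) -> bool {
        F::HAS_HOUR
    }
}
```

Under these assumptions, turning `YearMonthDay` into a real struct (or `HourMinute` into a marker) would not change `Formatter` at all, which is the sense in which being a marker is only contextually important.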
I'll go a step further and say that it can be bad for our reputation. As you mention we do go against the Rust grain a couple times and that's often in well-justified scenarios, definitely not ones where I see people getting annoyed at us for it. I still do not see the justification here as particularly strong, and I think us choosing to do this will annoy users and come off as us being a bit arrogant with considering ourselves to be super special (I've seen other crates get talked about negatively for treating themselves as special in weird ways). I really don't think we're super special here, marker types are a well-established pattern across Rust. Because of this I find the bar for doing this to be quite high, I do think that ultimately, |
Metazone wasn't a casing decision; we actually decided that "metazone" was one word. You might be thinking of "time zone", where we decided to keep the module name
For clients, the field set markers are the most important and unique part of the new API. The rest is standard ICU4X: constructors, format functions, compiled data, data providers, etc.
I've done my homework in trying to make these marker types also be concrete types. We did that for calendar systems. But I'm convinced in my multiple failed attempts that these field set marker types really are just marker types. They are in fact special marker types in the sense that they are compile-time collections of things. Note that I'm not proposing changing this convention for the
This type of argument seems to be based on personal experience. I just find it very hard to believe that using title snake case for date field marker types would rise to the level of reputational loss. Would it change your opinion if we shipped type aliases for the common cases, which is something I kind-of wanted to do anyway? For example:

    pub type DateFormatter = Formatter<Year_Month_Day>;
    pub type DateTimeFormatter = Formatter<Year_Month_Day_Hour_Minute>;
    // ...

Then the field set marker types wouldn't be the first thing people see. They would be something they see when they want to do something that the default type parameters don't do. |
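A compilable version of the alias idea might look like the sketch below. The definitions of `Formatter` and the marker enums are assumptions made so the aliases have something to point at; they are not the `icu_datetime` implementation. One practical detail the sketch surfaces: title snake case type names are expected to trip rustc's `non_camel_case_types` lint, so each marker needs an explicit `allow`.

```rust
use std::marker::PhantomData;

// Sketch only: the alias names come from the comment above, but the
// definitions below are assumptions, not the icu_datetime implementation.

// Title snake case is expected to trigger the `non_camel_case_types`
// lint, so it is allowed explicitly on each marker.
#[allow(non_camel_case_types)]
pub enum Year_Month_Day {}

#[allow(non_camel_case_types)]
pub enum Year_Month_Day_Hour_Minute {}

pub struct Formatter<F> {
    _marker: PhantomData<F>,
}

impl<F> Formatter<F> {
    pub fn new() -> Self {
        Self { _marker: PhantomData }
    }

    /// Returns the compiler's name for the field set type (illustration only).
    pub fn field_set_name(&self) -> &'static str {
        std::any::type_name::<F>()
    }
}

// Aliases for the common cases, so most callers never spell out a marker.
pub type DateFormatter = Formatter<Year_Month_Day>;
pub type DateTimeFormatter = Formatter<Year_Month_Day_Hour_Minute>;
```

A caller would write `DateFormatter::new()`; the marker name still appears in error messages and generated docs, which is the part of the concern the aliases do not remove.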
I talked this over with @echeran and @kartva today. I presented the following question: We have a type
Doing my best to summarize the initial reactions:
Then the conversation continued:
|
On the trait naming: the other example I almost forgot to mention but which we've definitely hit problems is in the data provider constellation. |
I think the Rust trait scheme would have unambiguously given BufferProvide and BlobDataProvide for traits. We didn't follow it, and what we chose instead is still acceptable, but "verbs first, then nouns, then adjectives" would still have been unambiguous here: I don't think that's evidence against it. It is seldom the case that a struct can be named like a verb. |
Not really, no, since the long names would still show up relatively commonly. It's slightly better, though.
I was thinking of both, but yes.
The traits that work with these are equally important. The GetField stuff as well. There's a lot of weird stuff about this API. But regardless of that, I don't think picking an entirely novel casing convention is an appropriate way to call out attention in the first place.
Oh, I have no doubt that these are and should remain markers. My point is that the fact that we have in the past had reasonable arguments for having them not be markers, and other markers got un-markered similarly, tells me that markers are really not that special. I keep hearing the argument that they're special here, and I simply don't see any evidence for it.
Yes, it's personal experience, but I think it's important! I'm happy to ask other community members who have been around a while to weigh in and say what they would feel about a library that made a choice like this. For what it's worth, I'm not against |
Oh, if we did end up going the |
My soft (0.5 in Apache scale) preference is: YMD > YearMonthDayPeriod, Year_Month_DayPeriod.

Here is my mental model: I consider those markers as an outlier not only in the Rust ecosystem but also in ICU4X. I hope we will not need them often, and only the most advanced and complex API will need them - DateTimeFormat. Maybe one or two more, but I really hope not. DTF happens to be the really sensitive one, with tons of people having strong preferences, both users and designers. It's tricky and that's why Shane's work is monumental. He doesn't just rewrite a component, or a component that is hot. He is rewriting the touchpoint of Software and Internationalization. This means that the community will judge us by how ICU4X DateTimeFormat "feels". And we should aim to make the ergonomic API... well, ergonomic. Ok, so let's look at the example of the API call:
I understand we can provide convenience types, but the trick with them is that the pit of success with them is narrow. One step away and you're in a new world with marker types and weirdness - weirdness engineers didn't sign up for. It's hermetic knowledge most devs don't want to have. They came here to "format a date in an i18n way" because in this week's sprint they have 10 tasks and this is one of them. They are displeased to realize they have choices which ask them to make decisions. They want "just the simple thing" more often than not. Very few come excited to see the breadth of options they have in front of them. And those that do are already aware how challenging the problem is, they have battle scars to show, they came ready for a week of exploration, not 30 minutes. We don't have to help them as much.

Getting back to the Joe Average Developer. We may provide them the alias, and let's say (judgement call) that 50% of the time it'll work. And 50% of the time their UX comes back asking them to match 1:1 some Figma design that shows a different format in en-US. I actually don't think they'll care whether it is Option 1 or Option 2. They won't spend time parsing DayPeriod vs Dayperiod or analyzing if DayDayPeriod is weirder than DayDayperiod or not. They'll just feel it's challenging. And in Option 1, it's also weird.

Shane points out that it's a good thing - it's a new concept, new syntax communicates it. I disagree. For me it's as if I wanted to drive a car and the manufacturer (ekhm, Tesla) moved the gear shifter to a different location because "in Tesla it serves a different role, it's worth pointing it out". Cool, but I'm not trying to learn the Advanced Tesla, I'm trying to drive my rental to my destination. Enough of the analogy. Thanks for still reading :)

There are two solutions I can see:
1. Trick generics.
Now we're talking! But the reality is that I'm hopefully not going to be hardcoding the locale, so no value in passing it manually. It comes from somewhere:
I understand the need for locale, I get that I set the length to Medium, and the third one is weird, right? Yes, it's weird because I had to look up the list of available ones, and pick one based on some helper table that matches this keyword to some example that closely resembles my Figma. I argue, and here's probably the crux of my argument hidden in a long mental model description, that a reader other than an ICU4X dev won't usually care what components are in the marker. They won't read it much. They will just look for XYZ and match it visually to "January 5th 2024" being more what they mean than EWP being "Jan 5th 24 12:30pm". And we can give it to them with something like `fn try_new(loc: &Locale, length: Length, skeleta: M)`. This may be tricky, but it's doable.
2. Macro.
If they care or value having it visually represented, they'd like something like that:
which is close to JS elaborate form:
so... why not a macro? And if we do a macro, then we are not constrained to anything but dev ergonomics. We can encode it internally according to Rust rules, and no one will see it except in debugging, and in debugging, good luck if you're not familiar with ICU4X internals. |
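As a rough sketch of the macro idea: a `macro_rules!` macro can map a readable list of field names to an internal marker type. The macro name `fieldset!`, the field tokens, and the marker types are all hypothetical; this shows only the type-position case and makes no claim about how ICU4X would implement it.

```rust
// Hypothetical sketch: macro name, field tokens, and marker types are
// assumptions for illustration.

pub struct YearMonthDay;
pub struct MonthDayHourDayPeriod;

/// Maps a readable list of field names to an internal marker type.
/// Only the listed combinations are accepted; anything else fails to compile.
macro_rules! fieldset {
    (year, month, day) => { YearMonthDay };
    (month, day, hour, day_period) => { MonthDayHourDayPeriod };
}

pub trait Describe {
    fn describe() -> &'static str;
}

impl Describe for YearMonthDay {
    fn describe() -> &'static str {
        "year month day"
    }
}

impl Describe for MonthDayHourDayPeriod {
    fn describe() -> &'static str {
        "month day hour day-period"
    }
}

/// Generic entry point that only sees the marker through the trait.
pub fn describe_fields<F: Describe>() -> &'static str {
    F::describe()
}
```

With this in place, `describe_fields::<fieldset!(year, month, day)>()` resolves to the `YearMonthDay` impl. The internal type name is then free to follow ordinary Rust casing, since callers spell the field list rather than the type.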
Worth noting, macro-ing the constructor doesn't necessarily make the type easier to read. Hmm, if we're discussing things like macro-ified They do have some downsides:
I'm not sure I really want to advocate this, but I'm not super against exploring such routes.
This I agree with; overall I think from the perspective of a user building a UI understanding exactly what is in these types is not that important. Important enough when they're making the choice (and looking at docs), but not important enough that it should stand out or really even be that easy to mentally parse when later reading the code. I really think that once you've picked which skeleton you want that choice should melt in to the background, it is not worth that choice being super loudly visible in the code once written. |
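The first suggestion above, a constructor shaped like `try_new(loc, length, skeleta: M)`, could be sketched like this. `Locale`, `Length`, `YMD`, and `Formatter` are simplified, hypothetical stand-ins rather than ICU4X types; the only thing the sketch is meant to show is that passing the field set as a value lets the compiler infer the type parameter, so the caller never writes it in angle brackets.

```rust
// Hypothetical stand-ins; simplified assumptions, not ICU4X types.

pub struct Locale(pub String);

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Length {
    Short,
    Medium,
    Long,
}

pub trait FieldSet {
    const DESCRIPTION: &'static str;
}

/// Unit struct used as a value-level handle for a field set.
pub struct YMD;

impl FieldSet for YMD {
    const DESCRIPTION: &'static str = "year, month, day";
}

pub struct Formatter<M: FieldSet> {
    locale: String,
    length: Length,
    _skeleta: M,
}

impl<M: FieldSet> Formatter<M> {
    /// `M` is inferred from the `skeleta` argument.
    pub fn try_new(loc: &Locale, length: Length, skeleta: M) -> Result<Self, String> {
        if loc.0.is_empty() {
            return Err("empty locale".to_string());
        }
        Ok(Self {
            locale: loc.0.clone(),
            length,
            _skeleta: skeleta,
        })
    }

    pub fn describe(&self) -> String {
        format!("{} / {:?} / {}", self.locale, self.length, M::DESCRIPTION)
    }
}
```

A call then reads `Formatter::try_new(&locale, Length::Medium, YMD)`, where the third argument is something the developer picks from a table and does not need to parse again later.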
I think Zibi's arguments make me more amenable to YMD as well, especially with his comments about what the reader of the code will care about. @zbraniecki do you have ideas as to how the YMD scheme will deal with things like FullYear and DayPeriod? |
Here's the full list of components I can find:
We don't necessarily need to have full type alias coverage for the zone types, but it seems like the main issue is going to be Day vs DayPeriod, we could just use Pd/Periodday. That scans better: the more important thing is up front, but allows us to retain Periodyear for later. @sffc is |
So much activity! Thank you! Responding in chronological order
I don't agree that the other traits are equally important. The field set marker types implement about 7 traits each. GetField is not really that complicated of a trait, especially if we consolidate it into the marker type via the parameterized Input struct (#5269). I'm trying to keep most of the complexity in one place, which is the marker types and the traits they implement. I don't think I've used "call out attention" as an argument one way or another in this thread. I do hold the position that the casing convention emphasizes that these are sets of things, which is different.
OK, so you're saying markers as a concept are not really special. One could say that trait implementations are not really special. When we changed markers to non-markers, it was moving the trait impl from a "fake" struct/enum to a "real" struct/enum. The fact that you can freely move trait impls around between things is a really cool and powerful feature of the Rust programming language. And, the fact that Rust lets you make arbitrary types that don't actually do anything other than provide hints to the compiler is quite special. But, apart from whether or not marker types as a general concept are special or not, the field set marker types are special in their own right:
I value and trust your judgement/experience in these matters, and I'm usually inclined to take it at face value, but I feel this situation is sufficiently visible and impactful to the usability of the API that I would rather like to see some data before being forced to adopt what I consider a painfully inadequate approach such as My anecdotal interview with @echeran and @kartva did not raise any concerns about the unconventional case convention. In TC39 we have a subcommittee that specializes in researching and gathering data on exactly these types of questions.
Of all the things in this thread, this is exactly the type of situation I totally agree with, but I reach the exact opposite conclusion! My position is that "compile-time sets of things" is a useful, intuitive, and accurate mental model. I claim devs should be able to understand from reading and writing code exactly what the formatter they found is supposed to do.
I'm open to exploring the tuple path. I think it's even more trait madness to make it work, but it has a nice outcome.
I'm open to exploring the ultra-abbreviated
The fields are changing. I have a proposal. The currently proposed list is:
And I intend this to be able to be extended to include:
I don't find there to be a substantial difference between |
Exploration on ultra-abbreviated, based on the set of fields in the current spec draft: We are limited to the 26 letters. We could potentially use lowercase letters as modifiers on the letter that precedes them. I want to avoid the same letter repeated multiple times in a row because that has implications about field lengths that I want to avoid. Unlike classical skeleta and patterns, we don't have to be unique in each symbol, so long as the full identifier works out. With that in mind, a potential mapping could be:
These result in the following. Day Field Sets:
Calendar Period Field Sets:
Time Field Sets:
Time Zone Field Sets:
General Composite Field Sets:
We could also add some pre-computed composite field sets, but maybe we don't need them with this model:
|
I'll respond to the rest later, but, to be clear, I think it has no additional trait madness. Specifically, we don't ever write any code generic over It does bring up the question of what we do with composites, though, because you can't have a tuple on the LHS of a type alias. |
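One possible reading of the tuple path, sketched under the assumption that each supported combination gets its own concrete impl and nothing is written generically over tuple arity. The field structs and the `FieldSet` trait are hypothetical names, not confirmed `icu_datetime` APIs.

```rust
// Hypothetical sketch of the tuple idea; names are assumptions.

pub struct Year;
pub struct Month;
pub struct Day;
pub struct Hour;
pub struct Minute;

pub trait FieldSet {
    fn field_names() -> Vec<&'static str>;
}

// One concrete impl per supported combination.
impl FieldSet for (Year, Month, Day) {
    fn field_names() -> Vec<&'static str> {
        vec!["year", "month", "day"]
    }
}

impl FieldSet for (Hour, Minute) {
    fn field_names() -> Vec<&'static str> {
        vec!["hour", "minute"]
    }
}

/// Generic entry point that only sees the tuple through the trait.
pub fn field_names_of<F: FieldSet>() -> Vec<&'static str> {
    F::field_names()
}
```

In this sketch an unsupported ordering such as `(Day, Month, Year)` has no impl and is rejected at compile time, so only the listed orderings are usable.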
@sffc your potential mapping should probably include the aspirational field types to be complete, yes?
|
One potential concern with Maybe not a huge deal since "MDY" and "DMY" wouldn't be valid identifiers. |
I like the macro idea. Something like |
Yeah, in general I think what this API needs is to be readable, not necessarily writeable.
Also in favor if we can make it work in all positions, macros are finicky. |
I'm also in favor of a macro. |
Proposal:
|
We have alignment on the datetime field set type names. I'm leaving the issue open because there were other things in the OP which we don't have alignment on, yet. |
A lot of my time in rewriting the API in `icu_datetime` has been spent in figuring out the best name for traits and marker types. This is especially challenging when the best name for a trait happens to also be the best name for the struct implementing the trait!

But, I was thinking, why do we let ourselves have this problem? It would be very nice if we could adopt a naming convention that simply avoids clashes.

Some languages with interfaces use the `I` prefix to indicate that the name is an interface. We're kind-of doing this currently with the `Marker` suffix on marker types.

Idea for traits: should be an adjective or adjective phrase:
- `Writeable` is a good example
- `DataProvider` is a bad example: should be `ProvidesData` or `Queryable` or ...
- `AsCalendar` seems okay
- `TrieValue` should be `TrieValueLike` or `Repr32` ("representable as 32 bits") or `ValidAsTrieValue` or ...

For markers, I'm still very much in favor of introducing a convention of using title snake case. Why?
`icu::datetime::Formatter<Gregorian, Month_Day_Hour_DayPeriod>` is clear. To me, it is also visually appealing.

* What I mean by "built-in taxonomy system" is that the identifiers benefit from being able to group together multiple tokens into a single identifier. This applies to both of the primary use cases we have for markers:
- `DateTime_Names_Gregory_Year`
- `DateTime_Names_Gregory_Month`
- `DateTime_Patterns_Gregory_Time`
- `DateTime_Patterns_IslamicCivil_SemanticSkeleta`
- `Era_Year_Month` ("July 2024 AD")
- `Year_Month_Day` ("July 22, 2024")
- `Month_Day_DayPeriod` ("July 22 in the afternoon")
- `Weekday_DayPeriod_Hour_Minute` ("Monday at 6:32 in the evening")

Title snake case has the distinction of being the only naming convention that allows for groups of multi-word tokens. It is camel case for the individual words and snake case for the grouping. This is an objective benefit of this casing convention over others (form follows function).
Putting various casing and naming conventions head to head:
1. `Formatter<Gregorian, Month_Day_Hour_DayPeriod>` (clear, no ambiguity)
2. `Formatter<Gregorian, MonthDayHourDayPeriod>` (contains ambiguity)
3. `Formatter<Gregorian, MonthDayHourDayperiod>` (but "day period" is two words everywhere else)
4. `Formatter<Gregorian, MonthDayHourPeriod>` (rename "day period" to "period" because of this limitation, but could conflict with other potential concepts such as a proposed "year period" that includes quarters and trimesters)
5. `Formatter<Gregorian, MDHP>` (if you can't win, just abbreviate everything)
6. `Formatter<Gregorian, Mdhp>` (I really don't like this one but I'll list it anyway)

I honestly see only advantages of option 1.
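The ambiguity claimed for plain camel case can be checked mechanically. The two helper functions below are written for this illustration only and are not part of any library. Splitting the title snake case name on underscores recovers the four fields, while splitting the camel case name on capital letters cannot tell that "DayPeriod" is one field.

```rust
/// Splits a title-snake-case identifier into its tokens.
pub fn split_title_snake(ident: &str) -> Vec<&str> {
    ident.split('_').collect()
}

/// Splits a camel-case identifier at each uppercase letter.
pub fn split_camel(ident: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut start = 0;
    for (i, c) in ident.char_indices() {
        // A new word starts at every uppercase letter except the first one.
        if i > 0 && c.is_uppercase() {
            parts.push(&ident[start..i]);
            start = i;
        }
    }
    if start < ident.len() {
        parts.push(&ident[start..]);
    }
    parts
}
```

Here `split_title_snake("Month_Day_Hour_DayPeriod")` yields four tokens ending in `DayPeriod`, whereas `split_camel("MonthDayHourDayPeriod")` yields five, with `Day` appearing twice and `Period` on its own.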
I know @robertbastian and @Manishearth have expressed skepticism of this idea, but I do not have a clear record of their arguments. The only position I can recall, and I apologize in advance for probably misrepresenting it, is that it is novel and raises questions when people see it in the docs. (My responses: novel yes, but I argue that novel is okay as stated above; raises questions is also okay because this is a novel concept where prior assumptions about its behavior should be thrown out.) I hope we can at least document those arguments in this thread.
Also seeking input from @echeran, @hsivonen, @zbraniecki, @markusicu, @eggrobin, and anyone else.