-
Notifications
You must be signed in to change notification settings - Fork 34
Conversation
Thank you for your interest in contributing to Numsense. FTR, I'm currently travelling, and have only limited time to evaluate pull requests. In due time, I'll work through my backlog, but it may take weeks 😳 😓 |
[<InlineData( | ||
"doismilcentoequarentaesetemilhõesquatrocentoseoitentaetrêsmilseiscentosequarentaesete", | ||
System.Int32.MaxValue)>] | ||
let ``tryParsePortuguese returns correct result`` (portuguese, expected) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bonus points for not repeating my mistake of naming the test function tryOfPortuguese... etc.
👍
This is looking promising, but may need a bit more work. For instance, using the Devil's Advocate review technique, I can make a few changes to the code without failing any tests. In match x with
// | x when x < 0 -> sprintf "menos %s" (toPortugueseImp -x)
| 0 -> "zero"
| 1 -> "um" And, at the same time, I can change the return expression for match canonicalized with
| _ -> conv 0 canonicalized If I do that, all tests still pass. Also, did you mean to add support for object-oriented languages later? If not, I can easily do that. |
@ploeh I will have a closer look at the code coverage. By the looks of it there's no test currently covering negative number parsing. |
In the current pull request: correct. FYI, existing languages (English, Polish, Dutch, Danish) have coverage of negative numbers in NumeralProperties.fs. |
@ploeh done! |
It builds and runs all tests without warnings on my machine, so from a technical perspective I think we're good to go 👍 As I don't read Portuguese, I'd like someone who does to review that part of it, if at all possible. I don't expect any errors, but it's always good with a second pair of eyes 😄 |
I solicited a review on Twitter: https://twitter.com/ploeh/status/693833260472860672 Please retweet and spread the word 😄 |
Regarding Portuguese semantics, LGTM |
How strict/loose should the parser be? The parser doesn't work with spaces, i.e. Another issue I noticed, with English included, is when I try to parse something like Brazilian Portuguese (and I believe European Portuguese) has its weird rules like: Should we try to parse as loose as possible? How about parsing |
My default API design philosophy is to follow Postel's law, which would make parsing functions Tolerant Readers. Thus, if the parsing function can parse something unambiguously, even if it's not strictly correct, it should do so. It shouldn't preclude it from being able to parse correct values as well, though. The most important property of the system is that it can round-trip: given any number, This property isn't an isomorphism, though, because the opposite doesn't hold. Because of Postel's law, there are (acceptable) inputs into
In context of the above, are you thereby saying that the values produced by
I thought about that when I wrote the English and Danish parsers, but I decided to leave that behaviour undefined at the time. I must admit that I don't know what the English or Danish parsers do in that situation, but I think that The question is whether something like |
How about valid numbers not returned from toPortuguese, but written by a person?
That depends on what you expect the output to be. I mean, that are missing prepositions, thus it looks weird to the eyes. I, for one, would never use this in an enterprise system. However, any person that reads Portuguese understands the number. Despite of that, I think it'd be better to replace the
I can confirm the English parser returns |
@mrinaldi toPortuguese is able to parse number with prepositions like The reason why I did not add support for prepositions in ToPortuguese is because I wanted to keep the output similar to what existed for other languages (all languages use the dash to separate numbers). This is indeed something that looks a bit odd but I presumed it was intended. |
I was actually referring to the output of
I see your point. However, in English, it's normal to use hyphens in numbers - although I'm not sure if you can use the hyphen for anything but tens and units. I don't speak the other languages though, thus I can't say the same for them.
Maybe @ploeh can tell us that. |
The idea was to produce values that are correct within the rules of a particular language. These are expected to vary widely from language to language. This discussion (about Spanish) may shed more light on the subject. In short, if hyphens aren't legal in Portuguese numerals, they shouldn't be there. |
That's how most (if not all) parsers in the library currently behave. Returning At the same time, I wonder how much of a problem @ploeh thinks this is. Of course you don't want to get junk if the input is meaningful, but detecting invalid input of almost any kind is perhaps too much to ask. |
Thanks guys. I'm going to fix the output of toPortuguese so we don't use hyphens. The parser will remain as is. |
Me too 😉 You may not have seen this, but I've hinted at this before: it's not that I currently have a particular purpose for Numsense. Originally, I wrote the English and Danish implementations during my Christmas holiday because I thought it was fun. Then I thought it'd be a great exercise for other people who wished to get their feet wet with an easy F# exercise. It's also a good exercise for people not familiar with contributing to open source. I try to be as friendly here as possible, but I also try to keep it realistic. This means that I run it as I run all my other open source projects. You'll get the same amount of feedback here, with the same quality bar. Hopefully, it's worth everyone's time. Looping back to the question about the importance of addressing corner cases when parsing: I do think it's important that Numsense behaves correctly within a language's rules. This is also the reason I ask other people to review the linguistic aspects of it. I consider it a defect in the English parser that |
… different classes of numbers.
@ploeh just merged master to resolve merge conflicts. |
Thank you for your contribution! It's now live as Numsense 0.12.0. |
Adds support for Portuguese.