imp:print:beancount output: more robust account/commodity encoding
Unsupported chars are now hex-encoded, not just converted to dashes.
This helps keep account and commodity names unique, especially with
the equity conversion account names generated by --infer-equity when
using currency symbols.
(Those could also be converted to ISO 4217 codes, in theory, but
for now we just hex encode them, which is easier to make robust.)

Also, Beancount commodity symbols are no longer enclosed in
hledger-style double quotes.
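
To illustrate why hex encoding helps with uniqueness, here is a minimal sketch comparing the old dash replacement with the new encoding. It is not the code from this commit: `isAllowed`, `dashEncode` and `hexEncode` are hypothetical names, and `isAllowed` is only a rough stand-in for hledger's real `isBeancountAccountChar`.

```haskell
import Data.Char (isAlphaNum, isSpace, ord)
import Numeric (showHex)

-- Hypothetical validity check (an assumption): alphanumerics and dash only.
-- hledger's real isBeancountAccountChar may accept a different set.
isAllowed :: Char -> Bool
isAllowed c = isAlphaNum c || c == '-'

-- Old behaviour (sketch): every unsupported character becomes a dash.
dashEncode :: String -> String
dashEncode = map (\c -> if isAllowed c then c else '-')

-- New behaviour (sketch): spaces become a dash, other unsupported
-- characters become 'C' followed by the code point in lowercase hex.
hexEncode :: String -> String
hexEncode = concatMap enc
  where
    enc c
      | isAllowed c = [c]
      | isSpace c   = "-"
      | otherwise   = 'C' : showHex (ord c) ""
```

Under these assumptions, `dashEncode` maps both `"$-€"` and `"€-$"` to `"---"`, while `hexEncode` gives `"C24-C20ac"` and `"C20ac-C24"` respectively, so the two conversion accounts stay distinct.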
simonmichael committed Nov 7, 2024
1 parent cbdbe0a commit f57cd63
Showing 3 changed files with 81 additions and 41 deletions.
43 changes: 23 additions & 20 deletions hledger-lib/Hledger/Write/Beancount.hs
@@ -26,6 +26,7 @@ import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TB
import Safe (maximumBound)
import Text.DocLayout (realLength)
import Text.Printf
import Text.Tabular.AsciiWide hiding (render)

import Hledger.Utils
@@ -109,7 +110,7 @@ postingAsLinesBeancount elideamount acctwidth amtwidth p =
| elideamount = [mempty]
| otherwise = showMixedAmountLinesB displayopts a'
where
displayopts = defaultFmt{ displayZeroCommodity=True, displayForceDecimalMark=True }
displayopts = defaultFmt{ displayZeroCommodity=True, displayForceDecimalMark=True, displayQuotes=False }
a' = mapMixedAmount amountToBeancount $ pamount p
thisamtwidth = maximumBound 0 $ map wbWidth shownAmounts

@@ -137,12 +138,12 @@ type BeancountAccountName = AccountName
type BeancountAccountNameComponent = AccountName

-- | Convert a hledger account name to a valid Beancount account name.
-- It replaces non-supported characters with a dash, it prepends the letter B
-- to any part which doesn't begin with a letter or number, and it capitalises each part.
-- It's possible this could generate the same beancount name for distinct hledger account names.
-- It replaces spaces with dashes and other non-supported characters with C<HEXBYTES>;
-- prepends the letter A- to any part which doesn't begin with a letter or number;
-- and capitalises each part.
-- It also checks that the first part is one of the required english
-- account names Assets, Liabilities, Equity, Income, or Expenses, and if not
-- it raises an informative error suggesting --alias.
-- raises an informative error.
-- Ref: https://beancount.github.io/docs/beancount_language_syntax.html#accounts
accountNameToBeancount :: AccountName -> BeancountAccountName
accountNameToBeancount a =
@@ -174,16 +175,19 @@ accountNameComponentToBeancount acctpart =
Nothing -> ""
Just (c,cs) ->
textCapitalise $
T.map (\d -> if isBeancountAccountChar d then d else '-') $ T.cons c cs
T.concatMap (\d -> if isBeancountAccountChar d then (T.singleton d) else T.pack $ charToBeancount d) $ T.cons c cs
where
prependStartCharIfNeeded t =
case T.uncons t of
Just (c,_) | not $ isBeancountAccountStartChar c -> T.cons beancountAccountDummyStartChar t
_ -> t

-- | Dummy valid starting character to prepend to Beancount account name parts if needed (B).
-- | Dummy valid starting character to prepend to Beancount account name parts if needed (A).
beancountAccountDummyStartChar :: Char
beancountAccountDummyStartChar = 'B'
beancountAccountDummyStartChar = 'A'

charToBeancount :: Char -> String
charToBeancount c = if isSpace c then "-" else printf "C%x" c

-- XXX these probably allow too much unicode:

@@ -222,25 +226,24 @@ type BeancountCommoditySymbol = CommoditySymbol
-- That is: 2-24 uppercase letters / digits / apostrophe / period / underscore / dash,
-- starting with a letter, and ending with a letter or digit.
-- Ref: https://beancount.github.io/docs/beancount_language_syntax.html#commodities-currencies
-- So this: removes any enclosing double quotes,
-- replaces some common currency symbols with currency codes,
-- So this:
-- replaces common currency symbols with their ISO 4217 currency codes,
-- capitalises all letters,
-- replaces any invalid characters with a dash (-),
-- prepends a B if the first character is not a letter,
-- and appends a B if the last character is not a letter or digit.
-- It's possible this could generate unreadable commodity names,
-- or the same beancount name for distinct hledger commodity names.
-- replaces spaces with dashes and other invalid characters with C<HEXBYTES>,
-- prepends a C if the first character is not a letter,
-- appends a C if the last character is not a letter or digit,
-- and disables hledger's enclosing double quotes.
--
-- >>> commodityToBeancount ""
-- "B"
-- "C"
-- >>> commodityToBeancount "$"
-- "USD"
-- >>> commodityToBeancount "Usd"
-- "USD"
-- >>> commodityToBeancount "\"a1\""
-- "A1"
-- >>> commodityToBeancount "\"A 1!\""
-- "A-1-B"
-- "A-1C21"
--
commodityToBeancount :: CommoditySymbol -> BeancountCommoditySymbol
commodityToBeancount com =
@@ -251,16 +254,16 @@ commodityToBeancount com =
Nothing ->
com'
& T.toUpper
& T.map (\d -> if isBeancountCommodityChar d then d else '-')
& T.concatMap (\d -> if isBeancountCommodityChar d then T.singleton d else T.pack $ charToBeancount d)
& fixstart
& fixend
where
fixstart bcom = case T.uncons bcom of
Just (c,_) | isBeancountCommodityStartChar c -> bcom
_ -> "B" <> bcom
_ -> "C" <> bcom
fixend bcom = case T.unsnoc bcom of
Just (_,c) | isBeancountCommodityEndChar c -> bcom
_ -> bcom <> "B"
_ -> bcom <> "C"

-- | Is this a valid character in the middle of a Beancount commodity name (a capital letter, digit, or '._-) ?
isBeancountCommodityChar :: Char -> Bool
33 changes: 12 additions & 21 deletions hledger/hledger.m4.md
@@ -834,38 +834,29 @@ hledger will try to adjust your data to suit Beancount.
If you plan to export often, you may want to follow Beancount's conventions in your hledger data,
to ease conversion. Eg use Beancount-friendly account names, currency codes instead of currency symbols,
and avoid virtual postings, redundant cost notation, etc.

Here are more details, included here for now
Here are more details
(see also "hledger and Beancount" <https://hledger.org/beancount.html>).

#### Beancount account names

hledger will try adjust your account names, if needed, to
[Beancount account names](https://beancount.github.io/docs/beancount_language_syntax.html#accounts),
by capitalising, replacing unsupported characters with `-`, and
prepending `B` to parts which don't begin with a letter or digit.
(It's possible for this to convert distinct hledger account names to the same beancount name.
Eg, hledger's automatic equity conversion accounts can have currency symbols in their name,
so `equity:conversion:$-€` becomes `equity:conversion:B---`.)

In addition, you must ensure that the top level account names are `Assets`, `Liabilities`, `Equity`, `Income`, and `Expenses`,
which Beancount requires.
hledger will adjust your account names when needed, to make valid
[Beancount account names](https://beancount.github.io/docs/beancount_language_syntax.html#accounts)
(capitalising, replacing spaces with `-`, replacing other unsupported characters with `C<HEXBYTES>`,
and prepending `A` to account name parts which don't begin with a letter or digit).
However, you must ensure that all top level account names are one of the five required by Beancount:
`Assets`, `Liabilities`, `Equity`, `Income`, or `Expenses`.
If yours are named differently, you can use [account aliases](#alias-directive),
usually in the form of `--alias` options, possibly stored in a [config file](#config-file).
(An example: [hledger2beancount.conf](https://github.com/simonmichael/hledger/blob/master/examples/hledger2beancount.conf))
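
The account name rules can be sketched roughly as follows. This is an illustration only, not hledger's implementation: all names here are hypothetical, the character check is a simplified assumption, the order of steps (encode, then prefix, then capitalise) is assumed, and the error text is made up.

```haskell
import Data.Char (isAlphaNum, isSpace, ord, toUpper)
import Data.List (intercalate)
import Numeric (showHex)

-- The five top level names required by Beancount.
topLevelNames :: [String]
topLevelNames = ["Assets", "Liabilities", "Equity", "Income", "Expenses"]

-- Split "a:b:c" into ["a","b","c"].
splitOnColon :: String -> [String]
splitOnColon s = case break (== ':') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitOnColon rest

-- Encode one account name part (assumed order: encode, prefix, capitalise).
encodePart :: String -> String
encodePart = capitalise . addPrefix . concatMap enc
  where
    enc c
      | isAlphaNum c || c == '-' = [c]   -- simplified validity check (assumption)
      | isSpace c                = "-"
      | otherwise                = 'C' : showHex (ord c) ""
    addPrefix t@(c : _) | not (isAlphaNum c) = 'A' : t
    addPrefix t = t
    capitalise (c : cs) = toUpper c : cs
    capitalise []       = []

-- Convert a whole account name, rejecting unsupported top level names.
toBeancountAccount :: String -> Either String String
toBeancountAccount acct = case map encodePart (splitOnColon acct) of
  parts@(top : _)
    | top `elem` topLevelNames -> Right (intercalate ":" parts)
  _ -> Left ("bad top-level account: " ++ acct)  -- hypothetical message
```

With these assumptions, `toBeancountAccount "equity:$-€:$"` yields `Right "Equity:C24-C20ac:C24"`, while `toBeancountAccount "other"` yields a `Left`, which is the case where an `--alias` option would be needed.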

#### Beancount commodity names

hledger will adjust your commodity names, if needed, to
hledger will adjust commodity names when needed, to make valid
[Beancount commodity/currency names](https://beancount.github.io/docs/beancount_language_syntax.html#commodities-currencies),
which must be 2-24 uppercase letters, digits, or `'`, `.`, `_`, `-`,
beginning with a letter and ending with a letter or digit.
hledger will convert known currency symbols to [ISO 4217 currency codes](https://en.wikipedia.org/wiki/ISO_4217#Active_codes).
Otherwise, it will capitalise letters,
replace unsupported characters with a dash (-),
and prepend/append a "B" when needed.
(It's possible for this to generate unreadable commodity names,
or to convert distinct hledger commodity names to the same beancount name.)
(which must be 2-24 uppercase letters, digits, or `'`, `.`, `_`, `-`, beginning with a letter and ending with a letter or digit).
hledger will convert known currency symbols to [ISO 4217 currency codes](https://en.wikipedia.org/wiki/ISO_4217#Active_codes),
capitalise letters, replace spaces with `-`, replace other unsupported characters with `C<HEXBYTES>`,
and prepend/append a "C" when needed.
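
As a rough sketch of these commodity rules (not hledger's actual code): `knownSymbols`, `isCommodityChar` and `toBeancountCommodity` are hypothetical names, the symbol table contains only the one mapping confirmed by hledger's doctests, and the quote stripping is simplified to drop every double quote.

```haskell
import Data.Char (isAsciiUpper, isDigit, isSpace, ord, toUpper)
import Numeric (showHex)

-- Illustrative symbol table (assumption); hledger's real table has more entries.
knownSymbols :: [(String, String)]
knownSymbols = [("$", "USD")]

-- Characters allowed inside a Beancount commodity name, per the rule above.
isCommodityChar :: Char -> Bool
isCommodityChar c = isAsciiUpper c || isDigit c || c `elem` "'._-"

toBeancountCommodity :: String -> String
toBeancountCommodity sym = case lookup bare knownSymbols of
  Just code -> code
  Nothing   -> fixEnd (fixStart (concatMap enc (map toUpper bare)))
  where
    -- Simplification: drops all double quotes, not just enclosing ones.
    bare = filter (/= '"') sym
    enc c
      | isCommodityChar c = [c]
      | isSpace c         = "-"
      | otherwise         = 'C' : showHex (ord c) ""
    -- Prepend C unless the name already starts with a letter.
    fixStart t = case t of
      (c : _) | isAsciiUpper c -> t
      _                        -> 'C' : t
    -- Append C unless the name already ends with a letter or digit.
    fixEnd t
      | not (null t) && (isAsciiUpper (last t) || isDigit (last t)) = t
      | otherwise = t ++ "C"
```

Under these assumptions, `toBeancountCommodity "$"` gives `"USD"`, `toBeancountCommodity "!"` gives `"C21"`, and `toBeancountCommodity "\"size 2 pencils\""` gives `"SIZE-2-PENCILS"`.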

#### Beancount virtual postings

46 changes: 46 additions & 0 deletions hledger/test/print/beancount.test
@@ -0,0 +1,46 @@
# * print command's beancount output format

# ** 1. Unrecognised top level account names are rejected.
<
2000-01-01
other 0 ABC

$ hledger -f- print -O beancount
>2 /bad top-level account/
>=1

# ** 2. Otherwise, accounts are encoded to suit beancount, and open directives are added.
<
2000-01-01
assets:a 0 ABC
equity:$-€:$ 0 USD

$ hledger -f- print -O beancount
2000-01-01 open Assets:A
2000-01-01 open Equity:C24-C20ac:C24

2000-01-01 *
Assets:A 0 ABC
Equity:C24-C20ac:C24 0 USD

>=

# ** 3. Commodity symbols are converted to ISO 4217 codes, or encoded, to suit beancount.
<
2000-01-01
assets $0
assets 0
assets 0!
assets 0 "size 2 pencils"

$ hledger -f- print -O beancount
2000-01-01 open Assets

2000-01-01 *
Assets 0 USD
Assets 0 C
Assets 0 C21
Assets 0 SIZE-2-PENCILS

>=
