delightful-parsing is a library for parsing fixed-width columns from a string. It is heavily inspired by the Apache Daffodil project. The differences are:
- For now, a much smaller scope (i.e. fixed-width strings)
- Defining the parsing specification with Scala case classes and type annotations, instead of XSD
This library is built for Scala 2.12.15, 2.13.8 and 3.1.2
libraryDependencies += "org.sweet-delights" %% "delightful-parsing" % "0.9.0" // check latest version above
<dependency>
<groupId>org.sweet-delights</groupId>
<artifactId>delightful-parsing_2.12</artifactId>
<version>0.9.0</version>
</dependency>
All files in delightful-parsing are under the GNU Lesser General Public License version 3.
Please read the files COPYING and COPYING.LESSER for details.
Step 1: decorate a case class with delightful-parsing annotations.
Example:
import sweet.delights.parsing.annotations.{Length, LengthParam, Options, Regex, Repetition}
@Options(trim = true)
case class Foo(
opt: Option[String] @Length(3),
str: String @Regex("""\w{3}"""),
integer: String @LengthParam("intSize"),
more: List[Bar] @Repetition(2)
)
@Options(trim = true)
case class Bar(
list: List[String] @Repetition(2) @Length(5)
)
Step 2: parse!
import sweet.delights.parsing.Parser._
val line = "optstrintegerAAAAABBBBBCCCCCDDDDD"
val parsed = parse[Foo](Map("intSize" -> 7))(line)
println(parsed)
// Foo(
// opt = Some("opt"),
// str = "str",
// integer = "integer",
// more = List(
// Bar(List("AAAAA", "BBBBB")),
// Bar(List("CCCCC", "DDDDD"))
// )
// )
By default, Parser is able to parse strings and basic types such as Int, Double, String, Option[T], List[T], etc.
Support for additional types is provided via implementations of Parser[T].
Considering a case class, any field that refers to another case class is a node field.
A node type is the type of a node field.
Any field that is NOT a node is a leaf field.
A leaf type is the type of a leaf field.
Types Boolean, Byte, Short, Int, Long, Float, Double and String are leaves.
A node or leaf type T can be optional (i.e. Option[T]) or repeatable (i.e. List[T]).
The choice of type annotations (i.e. annotations "on the right") rather than variable annotations (i.e. annotations "on the left") is purely for readability purposes. As such, it is subjective and opinionated.
Specifies parsing options, such as trimming what is consumed. For now, this annotation is mandatory for nodes (case classes). Example:
import sweet.delights.parsing.annotations.Options
@Options(trim = true)
case class Foo()
Experimental. TODO.
Specifies a format to parse a certain leaf type. For now, the supported leaf types are java.time.{LocalDate, LocalTime, LocalDateTime, ZonedDateTime}. Example:
import java.time.LocalDate
import sweet.delights.parsing.annotations.{Length, Options, Format}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
date: LocalDate @Length(6) @Format("yyMMdd")
)
Parser.parse[Foo]("200101")
// res0: Foo(
// date = LocalDate.of(2020, 1, 1)
// )
The format can be provided through a parameter by using the @FormatParam(String) annotation.
import java.time.LocalDate
import sweet.delights.parsing.annotations.{Length, Options, FormatParam}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
date: LocalDate @Length(6) @FormatParam("dateFormat")
)
Parser.parse[Foo](Map("dateFormat" -> "yyMMdd"))("200101")
// res0: Foo(
// date = LocalDate.of(2020, 1, 1)
// )
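As a rough illustration of what a date format such as "yyMMdd" means for a fixed-width slice, the sketch below uses only java.time. The object FormatSketch and its parseDate helper are hypothetical names introduced here and are not part of delightful-parsing; how the library applies the format internally is not shown in this document.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object FormatSketch {
  // Hypothetical helper (not a delightful-parsing API): parses a date from
  // an already-extracted fixed-width slice using a java.time pattern.
  def parseDate(slice: String, pattern: String): LocalDate =
    LocalDate.parse(slice, DateTimeFormatter.ofPattern(pattern))
}

// FormatSketch.parseDate("200101", "yyMMdd") == LocalDate.of(2020, 1, 1)
```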
Specifies whether the parsing of a field should be bypassed (ignored) or not. Applicable only to leaf types. Example:
import sweet.delights.parsing.annotations.{Ignore, Length, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
str: String @Length(5) @Ignore,
opt: Option[String] @Length(2)
)
Parser.parse[Foo]("XX")
// res0: Foo(
// str = "",
// opt = Some("XX")
// )
The parsing of str is skipped completely. The field is assigned an empty string, its default value.
Ignoring a field can be set through a parameter by using the @IgnoreParam(String) annotation.
import sweet.delights.parsing.annotations.{IgnoreParam, Length, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
str: String @Length(5) @IgnoreParam("ignoreMe"),
opt: Option[String] @Length(2)
)
Parser.parse[Foo](Map("ignoreMe" -> true))("XX")
// res0: Foo(
// str = "",
// opt = Some("XX")
// )
Specifies the number of characters to be consumed explicitly. Example:
import sweet.delights.parsing.annotations.{Length, Options}
@Options(trim = true)
case class Foo(
str: String @Length(5),
opt: Option[String] @Length(2)
)
The field str consumes 5 characters from the input string. As the trimming option is activated, the final length of str may be less than 5.
The field opt consumes 2 characters. In addition to the behavior above, as this is an optional field, if the trimmed string is empty, then opt becomes None.
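The consume, trim and empty-to-None behavior described above can be sketched in plain Scala. LengthSketch, consume and toOption are hypothetical names used only for illustration; this is not the library's implementation.

```scala
object LengthSketch {
  // Hypothetical helper: takes `n` characters off the front of `input` and
  // returns the (optionally trimmed) slice together with the remaining input.
  def consume(input: String, n: Int, trim: Boolean): (String, String) = {
    val (slice, rest) = input.splitAt(n)
    (if (trim) slice.trim else slice, rest)
  }

  // An optional field becomes None when its trimmed slice is empty.
  def toOption(slice: String): Option[String] =
    if (slice.isEmpty) None else Some(slice)
}

// With the input "AB" followed by five spaces:
//   consume(input, 5, trim = true) yields ("AB", two remaining spaces)
//   consume(rest, 2, trim = true) yields ("", ""), and toOption("") is None
```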
The length can be provided through a parameter by using the @LengthParam annotation.
import sweet.delights.parsing.annotations.{LengthParam, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
str: String @LengthParam("myStrSize")
)
Parser.parse[Foo](Map("myStrSize" -> 5))("ABCDE")
// res0: Foo(
// str = "ABCDE"
// )
Specifies to ignore any exceptions raised during the parsing of a leaf field. Example:
import sweet.delights.parsing.annotations.{Length, Lenient, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
integer: Int @Length(5) @Lenient,
option: Option[Int] @Length(5) @Lenient
)
Parser.parse[Foo]("xxxxxXXXXX")
// res0: Foo(
// integer = 0,
// option = None
// )
NB:
- the default value of an integer is 0
- the default value of an Option is None
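The fallback-to-default idea can be sketched with scala.util.Try. LenientSketch and lenient are hypothetical names, and the sketch assumes the caller supplies the default value explicitly; how the library derives defaults is not covered here.

```scala
import scala.util.Try

object LenientSketch {
  // Hypothetical helper: runs a parsing function on a slice and falls back
  // to `default` if the function throws a non-fatal exception.
  def lenient[T](default: T)(parse: String => T)(slice: String): T =
    Try(parse(slice)).getOrElse(default)
}

// LenientSketch.lenient(0)(_.toInt)("xxxxx") == 0
// LenientSketch.lenient(Option.empty[Int])(s => Some(s.toInt))("XXXXX") == None
```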
Provides a user-defined parsing function (UDPF) for a leaf type T. When present, it overrides default parsing functions or any parsing function derived from @Format or @FormatParam annotations. The UDPF must be statically defined. Example:
import java.time.LocalTime
import sweet.delights.parsing.annotations.{Length, ParseFunc, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
time: LocalTime @Length(6) @ParseFunc[LocalTime](Foo.removePrefix)
)
object Foo {
def removePrefix(s: String): Option[LocalTime] = Some(LocalTime.parse(s.substring(1)))
}
Parser.parse[Foo]("X03:45")
// res0: Foo(
// time = LocalTime.of(3, 45)
// )
Specifies the characters to be consumed by means of a regular expression. Applicable to leaf types only. Example:
import sweet.delights.parsing.annotations.{Regex, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
str: String @Regex("""\w{5}""")
)
Parser.parse[Foo]("ABCDEF")
// res0: Foo(
// str = "ABCDE"
// )
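Consuming a prefix of the input with a regular expression can be sketched with the standard library's Regex.findPrefixOf. RegexSketch and consume are hypothetical names; whether the library anchors the match in exactly this way is an assumption.

```scala
import scala.util.matching.Regex

object RegexSketch {
  // Hypothetical helper: matches `regex` at the start of `input` and returns
  // the matched prefix with the remaining input, or None if there is no match.
  def consume(input: String, regex: Regex): Option[(String, String)] =
    regex.findPrefixOf(input).map(m => (m, input.drop(m.length)))
}

// RegexSketch.consume("ABCDEF", """\w{5}""".r) == Some(("ABCDE", "F"))
```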
Specifies the number of repetitions for a list. Example:
import sweet.delights.parsing.annotations.{Length, Repetition, Options}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
strs: List[String] @Repetition(2) @Length(5),
bars: List[Bar] @Repetition(3)
)
@Options(trim = true)
case class Bar(
str: String @Length(1)
)
Parser.parse[Foo]("ABCDEFGHIJKLM")
// res0: Foo(
// strs = List("ABCDE", "FGHIJ"),
// bars = List(
// Bar(str = "K"),
// Bar(str = "L"),
// Bar(str = "M")
// )
// )
strs is a repeatable leaf field. As such, it requires @Length in addition to @Repetition.
bars is a repeatable node field. Only @Repetition is required.
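For a repeatable leaf, the combination of a repetition count and a length amounts to slicing a block of the input into equal chunks. The sketch below shows this in plain Scala; RepetitionSketch and repeat are hypothetical names, and the sketch does not validate that the input is long enough.

```scala
object RepetitionSketch {
  // Hypothetical helper: consumes `times` chunks of `length` characters each
  // and returns them with the remaining input.
  def repeat(input: String, times: Int, length: Int): (List[String], String) = {
    val (block, rest) = input.splitAt(times * length)
    (block.grouped(length).toList, rest)
  }
}

// RepetitionSketch.repeat("ABCDEFGHIJKLM", 2, 5) == (List("ABCDE", "FGHIJ"), "KLM")
// RepetitionSketch.repeat("KLM", 3, 1) == (List("K", "L", "M"), "")
```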
Specifies a number of characters to be skipped after a field is parsed successfully. Example:
import sweet.delights.parsing.annotations.{Length, Options, TrailingSkip}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
str1: String @Length(1),
str2: String @Length(1) @TrailingSkip(1),
str3: String @Length(1)
)
Parser.parse[Foo]("AB_D")
// res0: Foo(
// str1 = "A",
// str2 = "B",
// str3 = "D"
// )
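The effect of a trailing skip on the remaining input can be sketched as follows. TrailingSkipSketch and consume are hypothetical names used for illustration only.

```scala
object TrailingSkipSketch {
  // Hypothetical helper: consumes `length` characters, then drops `skip`
  // further characters from the remaining input.
  def consume(input: String, length: Int, skip: Int = 0): (String, String) = {
    val (slice, rest) = input.splitAt(length)
    (slice, rest.drop(skip))
  }
}

// Walking through "AB_D":
//   consume("AB_D", 1)          == ("A", "B_D")
//   consume("B_D", 1, skip = 1) == ("B", "D")
//   consume("D", 1)             == ("D", "")
```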
For a Boolean field, specifies a string that should be matched to evaluate the field to true. Example:
import sweet.delights.parsing.annotations.{Length, Options, TrueIf}
import sweet.delights.parsing.Parser
@Options(trim = true)
case class Foo(
bool: Boolean @Length(3) @TrueIf("Yes")
)
Parser.parse[Foo]("Yes")
// res0: Foo(
// bool = true
// )
Parser.parse[Foo]("xxx")
// res1: Foo(
// bool = false
// )
- case classes MUST be decorated with the Options annotation
- all fields of a case class MUST be annotated with applicable annotations
- Apache Daffodil for inspiration
- the shapeless library
- the book The Type Astronaut's Guide to Shapeless
- StackOverflow