Skip to content

transform plain text to structural text (in JSON format) according to a dynamic grammar template with regex. **A parser generator**

Notifications You must be signed in to change notification settings

Super3001/Text-Finer

Repository files navigation

note: lastest update is in topology branch -> goto

Introduction

Text-Finer works to transform plain text to structural text (in JSON format) according to a dynamic grammar template.

currently BNF grammar not supported, and name within itself or circular reference not supported, let alone left recursion detection

only a top-down structure is allowed

regular expression, kleen closure operation is supported

Inputs and Outputs

input two files: <filename>.syntax, <filename>.txt

where the .syntax file contains the user-defined BNF grammar (or pre-defined ones), and the .txt file contains the plain text.

and output one file (default at project root dir): <filename>.tree.json

Usage

python run.py *filename*.syntax *filename*.txt --interpret

Currently, the interpreter version (in Python) is available; the compile version (in Go, or Cangjie in the future) is still under development

Advanced features

  1. Regex support

(available in interpreter version)

  1. Allow import BNF definitions from other .syntax file

(available in interpreter version)

  1. Transform Regex to DFA

(under development)

Examples

input file: base.syntax

LINE ::= /\S.*/

WORD ::= /\w+/

NL ::= '\n' | '\r\n'

LineBlank ::= /\s*/

LineSpace ::= LineBlank LINE

LineFull ::= LineSpace NL

File[file] ::= LineFull* LineBlank

python run.py demo/base.syntax demo/base.syntax --interpret

output:

output file: base.tree.json

{
    "File": {
        "LineFull": [
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "LINE ::= /\\S.*/"
                },
                "NL": "\n"
            },
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "WORD ::= /\\w+/"
                },
                "NL": "\n"
            },
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "NL ::= '\\n' | '\\r\\n'"
                },
                "NL": "\n"
            },
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "LineBlank ::= /\\s*/"
                },
                "NL": "\n"
            },
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "LineSpace ::= LineBlank LINE"
                },
                "NL": "\n"
            },
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "LineFull ::= LineSpace NL"
                },
                "NL": "\n"
            },
            {
                "LineSpace": {
                    "LineBlank": "\n",
                    "LINE": "File[file] ::= LineFull* LineBlank"
                },
                "NL": "\n"
            }
        ],
        "LineBlank": ""
    }
}

About

transform plain text to structural text (in JSON format) according to a dynamic grammar template with regex. **A parser generator**

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published