A list of Python parsing tools initially imported from @nedbat's blog post.

Name | Description | License | Updated | Parses | Used By | Notes |
---|---|---|---|---|---|---|
Ply | Docstrings are used to associate lexer or parser rules with actions. The lexer uses Python regular expressions. | LGPL | v3.6 4/2015 | LALR(1) | lesscpy | ply-hack group |
pyparsing | Direct parser objects in Python, built to parallel the grammar. | MIT | v2.0.3 8/2014 | | twill | |
ANTLR | Parser and lexical analyzer generator in Java. Generates parsing code in Python (as well as Java, C++, C#, Ruby, etc). | BSD | v4.4 7/2014 | LL(*) | | |
pyPEG | A parsing expression grammar toolkit for Python. | GPL | v 2.15 1/2015 | PEG | | |
pydsl | A language workbench written in Python. | GPLv3 | v 0.5.2 11/2014 | | | |
LEPL | A recursive descent parser. | dual licensed MPL/LGPL | v 5.1.3 9/2012 | | | Discontinued |
Codetalker | Python-based grammar definitions. | MIT | v 1.1 3/2014 | | | |
funcparserlib | Recursive descent parsing library for Python based on functional combinators. | MIT | v0.3.6 5/2013 | | | |
picoparse | | | v0.9 3/2009 | | | |
Aperiot | | Apache 2.0 | v0.1.12 1/2012 | | | |
PyGgy | Lexes with DFA from a specification in a .pyl file. Parses GLR grammars from a specification in a .pyg file. Rules in both files have Python action code. Unlike most parser generators, the rule actions are all executed in a post-processing step. The parser isn't represented as a discrete object, but as globals in a module. | Public Domain | v0.4 8/2004 | | | Python 3 compatible fork 0.4.1, discussion group |
Parsing | LR(1) parser generator as well as CFSM and GLR parser drivers. | MIT | v1.4 12/2012 | LR(1), CFSM, and GLR | | |
Rparse | | GPL | v 1.1.0 4/2010 | | | LL(1) parser generator with AST generation. |
SableCC | Java-based parser and lexical analyzer generator. Generates parsing code in Java, with alternative generators for other languages including Python. | LGPL | v 3.7 11/2012 | | | |
GOLD Parser | | zlib/libpng | v 5.2.0 8/2012 | LALR | | |
Plex | Python module for constructing lexical analysers. | LGPL | v 2.0 12/2009 | | | Compiles all of the regular expressions into a single DFA. |
Plex3 | Python 3 port of Plex | | 8/2012 | | | No official release |
yeanpypa | Yeanpypa is (yet another) framework to create recursive-descent parsers in Python. | Public Domain | 4/2010 | | | Parsers are created by writing an EBNF-like grammar as Python expressions. |
ZestyParser | | MIT | v 0.8.1 4/2007 | | | |
BisonGen | | | v 0.8.0b1 4/2005 | | | |
DParser for Python | A scannerless GLR parser | BSD | v 1.3.0 3/2013 | | | Charming Python: DParser for Python: Exploring Another Python Parser |
Yapps | Produces recursive-descent parsers, as a human would write. Designed to be easy to use rather than powerful or fast. Better suited for small parsing tasks like email addresses, simple configuration scripts, etc. | MIT | v 2.1.1 8/2003 | | | |
PyBison | Python binding to the Bison (yacc) and Flex (lex) parser-generator utilities | GPL | v 0.1.8 6/2004 | LALR(1) | | Doesn't yet support Windows. |
Yappy | | | v 1.9.4 8/2014 | SLR, LR(1) and LALR(1) | | Uses Python strings to declare the grammar. |
Toy Parser Generator | | LGPL | v 3.2.2 12/2013 | | | |
kwParsing | An experimental parser generator implemented in Python which generates parsers implemented in Python. | Python License | v 1.3 | SLR | Gadfly | |
Martel | Martel uses regular expression format definition to generate a parser for flat- and semi-structured files. The parse tree information is passed back to the caller using XML SAX events. In other words, Martel lets you parse flat files as if they are in XML. | BSD | v 0.8 12/2001 | | BioPython versions 1.4-1.48 | Last version included in BioPython |
SimpleParse | Lexing and parsing in one step, but only deterministic grammars. | BSD | 2.11a2 8/2010 | | | |
mxTextTools | An unusual table-based parser. There is no generation step; the parser is hand-coded using primitives provided by the package. The parser is implemented in C for speed. | eGenix Public License, similar to Python, compatible with GPL. | v 3.2.8 7/2014 | | SimpleParse, Martel | |
SPARK | Uses docstrings to associate productions with actions. Unlike other tools, also includes semantic analysis and code generation phases. | MIT | v 0.7 pre-alpha 7 5/2002 | | | |
FlexModule and BisonModule | Macros to allow Flex and Bison to produce idiomatic lexers and parsers for Python. The generated lexers and parsers are in C, compiled into loadable modules. | Pythonesque | v 2.1 3/2002 | | | |
Bison in a box | Uses standard Bison to generate pure Python parsers. It actually reads the bison-generated .c file and generates Python code. | GPL | v 0.1.0 6/2001 | LALR(1) | | |
Berkeley YACC | Classic YACC, extended to generate Python code. Python support seems to be undocumented. | Public Domain | v 20141128 11/2014 | LALR(1) | | |
PyLR | Lexer is based on regular expressions. | | 12/1997 | LR | | |
PyLR | PyLR is a partial Python implementation of the OpenLR specification | Apache 2.0 | 12/2014 | | | announcement |
Construct | A declarative parser (and builder) for binary data. | BSD | v 2.5.2 4/2014 | | | |
ModGrammar | A general-purpose library for constructing language parsers and interpreters for context-free grammar definitions. | BSD | v 0.10 2/2013 | | | |
lrparsing | Differs from other Python LR(1) parsers in using Python expressions as grammars, and offers disambiguation tools. | AGPLv3 | v 1.0.11 3/2015 | LR(1) parser and a tokeniser | | |
docopt | Generates a parser based on formalized conventions that are used for help messages and man pages for program interface description. | MIT | v 0.6.2 6/2014 | | | |

The Python standard library includes a few modules for special-purpose parsing problems. These are not general-purpose parsers, but don't overlook them. If your need overlaps with their capabilities, they're perfect:
- shlex lexes command lines using the rules common to many operating system shells.
- configparser (ConfigParser in Python 2) implements a basic configuration language with a structure similar to Microsoft Windows INI files.
- argparse makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.
- email provides many services, including parsing email and other RFC-822 structures.
- parser parses Python source text (deprecated in Python 3.9 and removed in 3.10; ast is the usual replacement).
- cmd implements a simple command interface, prompting for and parsing out command names, then dispatching to your handler methods.
- json is a JSON (JavaScript Object Notation) encoder and decoder.
- tokenize is a lexical scanner for Python source code, implemented in Python.
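
To give a feel for how little code these modules need, here is a minimal sketch that exercises shlex, configparser, argparse, json, and tokenize in turn. All of the inputs (the command line, the `[server]` section, the `demo` program name, the JSON record) are made up for illustration, and the module names are the Python 3 spellings.

```python
import argparse
import configparser
import io
import json
import shlex
import tokenize

# shlex: split a command line the way a POSIX shell would.
words = shlex.split('grep -n "parsing tools" README.md')

# configparser: read INI-style text (section and key names are invented).
config = configparser.ConfigParser()
config.read_string("[server]\nhost = example.org\nport = 8080\n")
port = config.getint("server", "port")

# argparse: declare the expected arguments, then parse an explicit list
# instead of sys.argv so the example is repeatable.
arg_parser = argparse.ArgumentParser(prog="demo")
arg_parser.add_argument("path")
arg_parser.add_argument("--verbose", action="store_true")
args = arg_parser.parse_args(["--verbose", "input.txt"])

# json: decode a JSON document into Python objects.
record = json.loads('{"name": "Ply", "parses": ["LALR(1)"]}')

# tokenize: scan Python source text into (token type name, text) pairs.
source = "total = price * 2\n"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

print(words)
print(port, args.path, args.verbose)
print(record["parses"])
print(tokens[:3])
```

Each call returns ordinary Python values (lists, ints, dicts, named tuples), so the results can be passed straight to the rest of a program without a separate grammar or generation step.
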

Articles on parsing techniques in Python:

- Simple Top-Down Parsing in Python - A methodology for writing top-down parsers in Python. (7/2008)
- Pysec: Monadic Combinatoric Parsing in Python - An exposition of using monads to build a Python parser. (2/2008)
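
For readers who have not written a top-down parser by hand, the sketch below shows one common form, a recursive-descent evaluator for integer arithmetic. It is not taken from either article and does not use any library from the table; the `Parser` class, the `evaluate` helper, and the three-rule grammar are all assumptions made for illustration.

```python
import re


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, text):
        # Tokens are runs of digits or any single non-space character.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdecimal():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate text, rejecting any trailing tokens."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Operator precedence falls out of the rule nesting: `expr` calls `term`, which calls `factor`, so multiplication binds tighter than addition without any precedence table. The loops in `expr` and `term` make the operators left-associative.
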
Python Parsing Tools by Michael R. Bernstein is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://github.com/webmaven/python-parsing-tools/.