garth74/nexis-uni-parser

Nexis Uni Parser

Read the documentation at https://nexis-uni-parser.readthedocs.io/

This package converts Nexis Uni rich text (RTF) files to JSON Lines format.

Features

  • TODO

Requirements

  • TODO

Installation

You can install Nexis Uni Parser via pip from PyPI:

pip install nexis-uni-parser

Usage

This package provides two main functions.

Convert an RTF file to plain text

You can convert an RTF file to plain text directly with pandoc. For convenience, this package also includes a function that does the same thing; under the hood, it just calls pandoc.

from pathlib import Path
from nexis_uni_parser import convert_rtf_to_plain_text

inputfile = Path.home().joinpath("nexisuni-file.rtf")
output_filepath = convert_rtf_to_plain_text(inputfile)

print(output_filepath)
# Example output: /Users/name/nexisuni-file.txt
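
Since the conversion is described as a thin wrapper around pandoc, the following is a minimal sketch of what such a wrapper could look like. It is not the package's actual implementation: the function names `build_pandoc_command` and `rtf_to_plain_text` are hypothetical, and it assumes a pandoc executable with RTF input support is available on your PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pandoc_command(rtf_path: Path, txt_path: Path) -> List[str]:
    """Build the pandoc command line for an RTF to plain text conversion."""
    return [
        "pandoc",
        "--from", "rtf",
        "--to", "plain",
        "--output", str(txt_path),
        str(rtf_path),
    ]


def rtf_to_plain_text(rtf_path: Path) -> Path:
    """Hypothetical helper: write a .txt file next to the input and return its path."""
    txt_path = rtf_path.with_suffix(".txt")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(rtf_path, txt_path), check=True)
    return txt_path
```

Keeping the command construction in its own function makes it possible to inspect the arguments without running pandoc.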

Parse Nexis Uni Files

The parse function can parse either a single file or a directory. In both cases it produces a gzipped JSON Lines file. I chose a compressed JSON Lines file because the text data can get large if all files are read into memory at once.

from pathlib import Path
from nexis_uni_parser import parse

inputfile = Path.home().joinpath("nexisuni-file.rtf")

output_filepath = parse(inputfile)

# From here, reading the data into a pandas DataFrame is easy.

import pandas as pd

nexisuni_df = pd.read_json(str(output_filepath), compression="gzip", lines=True)
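
If the output is too large to load into a DataFrame in one go, the gzipped JSON Lines file can also be read one record at a time with the standard library. The sketch below assumes only that each non-empty line holds one JSON object; the helper name `iter_jsonl_gz` is hypothetical and not part of this package.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl_gz(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per line from a gzipped JSON Lines file."""
    # Text mode ("rt") decompresses and decodes lazily, line by line.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because this is a generator, only one record is held in memory at a time, for example `for record in iter_jsonl_gz(output_filepath): ...`.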

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, Nexis Uni Parser is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.