Suggestion for output file creation using python 3 dataclasses #34

dax-westerman · 2024-07-26T21:38:31Z

While reviewing case #26, I came up with a suggestion that may help enforce a strong relationship between the source data and the output file.

The following is meant to illustrate the idea of using a dataclass to collect the data, along with Pandas to write out the file, rather than to capture all the logic in the cleverRules.py file. This might help keep a strong tie between each column and its data, so that errors could be caught at run time. There are additional ways to perform field-level validation, if that's of interest. I just wanted to present this option in case it is useful at some point.

import os  # needed for os.path.join below
from dataclasses import asdict, dataclass
from typing import List

import pandas as pd

# Dataclass to enforce relationship between the name of the column and the data
@dataclass
class CleverOutput:
    label: str
    snippetID: str
    term: str
    sta3n: str
    TIUdocumentSID: str
    TIUstandardTitle: str
    visitSID: str
    referenceDateTime: str
    PatientSID: str
    targetClass: str
    targetSubClass: str
    termID: str
    NoteAndSnipOffset: str
    snippet: str
    OpCode: str

...

# open the input file, then use a list comprehension to emit a list of CleverOutput dataclass objects
with open(fins) as f:
    clever_output_entries: List[CleverOutput] = [
        CleverOutput(
            label=tmpe[0],
            snippetID=tmpe[1],
            term=tmpe[2],
            sta3n=tmpe[3],
            TIUdocumentSID=tmpe[4],
            TIUstandardTitle=tmpe[5],
            visitSID=tmpe[6],
            referenceDateTime=tmpe[7],
            PatientSID=tmpe[8],
            targetClass=tmpe[9],
            targetSubClass=tmpe[10],
            termID=tmpe[11],
            NoteAndSnipOffset=tmpe[12],
            snippet=tmpe[13],
            OpCode=tmpe[14],
        )
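        # split each line into fields first; iterating f directly would index single characters
        # (assumes "|"-delimited input with 15 fields per line)
        for tmpe in (line.rstrip("\n").split("|") for line in f)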
    ]

...

# get the output filename
allPos_unfiltered_out = os.path.join(ppath, "allPos_unfiltered.txt")

...

# load a pandas dataframe with the data and write to a CSV with the | delimiter
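# passing the dicts themselves (not .values()) keeps the dataclass field names as column headers
df: pd.DataFrame = pd.DataFrame.from_records([asdict(o) for o in clever_output_entries])
# index=False keeps the pandas row index out of the output file
df.to_csv(allPos_unfiltered_out, sep="|", index=False)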
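
On the field-level validation mentioned above, one possible approach is a __post_init__ check on the dataclass. The sketch below is only an illustration under stated assumptions: SnippetRow is a hypothetical three-column subset of CleverOutput, parse_row is a made-up helper, the "|" input delimiter is assumed, and the sample values are invented. It is not taken from cleverRules.py.

```python
from dataclasses import dataclass, fields


@dataclass
class SnippetRow:
    # Hypothetical three-column subset of CleverOutput, for illustration only.
    label: str
    snippetID: str
    sta3n: str

    def __post_init__(self) -> None:
        # Reject empty or non-string values, naming the offending column.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(
                    f"column {f.name!r} is empty or not a string: {value!r}"
                )


def parse_row(line: str, sep: str = "|") -> SnippetRow:
    # Hypothetical helper. Unpacking the split fields positionally means a row
    # with too few or too many fields raises TypeError at construction time
    # instead of silently shifting columns.
    return SnippetRow(*line.rstrip("\n").split(sep))


# Invented sample row; real input values will differ.
good = parse_row("POS|123|640\n")
```

With this in place, a short row such as "POS|123" fails with a TypeError when the dataclass is constructed, and a row with an empty field such as "POS||640" fails with a ValueError that names the column, so both problems surface at run time rather than in the output file.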

dax-westerman added the "suggestion" label (label description: "An issue that suggests change to workflow to method, not necessarily to implement.") Jul 26, 2024

dax-westerman assigned suzytamang and vilijajoyce and unassigned suzytamang and vilijajoyce Jul 26, 2024

dax-westerman added this to the backlog milestone Jul 30, 2024
