Suggestion for output file creation using python 3 dataclasses #34

Open
dax-westerman opened this issue Jul 26, 2024 · 0 comments
Labels
suggestion An issue that suggests change to workflow to method, not necessarily to implement.
Milestone

Comments

@dax-westerman
Collaborator

While reviewing case #26, I came up with a suggestion that might help enforce a strong relationship between the source data and the output file.

The following is meant to illustrate the idea of collecting the data in a dataclass and using Pandas to write out the file; it is not meant to capture all the logic in the cleverRules.py file. This might help keep a strong tie between each column and its data, so that any errors could be caught at run-time. There are additional ways to perform field-level validation, if that's of interest. I just wanted to present this option in case it might be of use at some point.

import os
from dataclasses import asdict, dataclass
from typing import List

import pandas as pd

# Dataclass to enforce relationship between the name of the column and the data
@dataclass
class CleverOutput:
    label: str
    snippetID: str
    term: str
    sta3n: str
    TIUdocumentSID: str
    TIUstandardTitle: str
    visitSID: str
    referenceDateTime: str
    PatientSID: str
    targetClass: str
    targetSubClass: str
    termID: str
    NoteAndSnipOffset: str
    snippet: str
    OpCode: str

...

# open the input file, then use a list comprehension to emit a list of CleverOutput dataclass objects
with open(fins) as f:
    clever_output_entries : List[CleverOutput] = [
        CleverOutput(
            label=tmpe[0],
            snippetID=tmpe[1],
            term=tmpe[2],
            sta3n=tmpe[3],
            TIUdocumentSID=tmpe[4],
            TIUstandardTitle=tmpe[5],
            visitSID=tmpe[6],
            referenceDateTime=tmpe[7],
            PatientSID=tmpe[8],
            targetClass=tmpe[9],
            targetSubClass=tmpe[10],
            termID=tmpe[11],
            NoteAndSnipOffset=tmpe[12],
            snippet=tmpe[13],
            OpCode=tmpe[14],
        )
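        # split each line into fields; the "|" input delimiter is an assumption
        for tmpe in (line.rstrip("\n").split("|") for line in f)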
    ]
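
As a rough sketch of how a malformed row could surface at run-time: unpacking the split fields into the dataclass raises a TypeError when the field count does not match the declared columns. The three-field `Row` class, the `parse_row` helper, and the `|` delimiter are hypothetical stand-ins for illustration, not part of cleverRules.py.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical three-column stand-in for the full CleverOutput class
    label: str
    snippetID: str
    term: str


def parse_row(line: str, sep: str = "|") -> Row:
    """Split one line and map the fields onto Row by position."""
    fields = line.rstrip("\n").split(sep)
    try:
        return Row(*fields)
    except TypeError as exc:
        # Too few or too many fields for the declared columns
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}") from exc


good = parse_row("POS|42|fever\n")

try:
    parse_row("POS|42\n")
    bad_row_error = None
except ValueError as err:
    bad_row_error = str(err)
```

The same pattern would apply to the 15-field class above; the explicit keyword mapping in the comprehension is more readable, while positional unpacking gets the count check for free.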

...

# get the output filename
allPos_unfiltered_out = os.path.join(ppath, "allPos_unfiltered.txt")

...

# load a pandas dataframe with the data and write to a CSV with the | delimiter
df: pd.DataFrame = pd.DataFrame.from_records([asdict(o) for o in clever_output_entries])
df.to_csv(allPos_unfiltered_out, sep="|", index=False)  # index=False keeps the pandas row index out of the file
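
For the field-level validation mentioned above, one possible approach is a `__post_init__` method, which a dataclass calls automatically right after its generated `__init__`. This is only a minimal sketch: the `ValidatedEntry` class is a hypothetical two-field stand-in, and the rules it applies (a numeric `sta3n`, an ISO 8601 `referenceDateTime`) are assumptions for illustration, not requirements taken from cleverRules.py.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ValidatedEntry:
    # Hypothetical two-field stand-in; the rules below are assumptions
    sta3n: str
    referenceDateTime: str

    def __post_init__(self) -> None:
        # Runs automatically after the generated __init__
        if not self.sta3n.isdigit():
            raise ValueError(f"sta3n must be numeric, got {self.sta3n!r}")
        # Raises ValueError if the timestamp is not in ISO 8601 format
        datetime.fromisoformat(self.referenceDateTime)


ok = ValidatedEntry(sta3n="512", referenceDateTime="2024-07-26 10:30:00")

try:
    ValidatedEntry(sta3n="5x2", referenceDateTime="2024-07-26 10:30:00")
    rejected = False
except ValueError:
    rejected = True
```

With checks like these, a bad value fails when the entry is constructed, before anything is written to the output file.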
@dax-westerman dax-westerman added the suggestion An issue that suggests change to workflow to method, not necessarily to implement. label Jul 26, 2024
@dax-westerman dax-westerman added this to the backlog milestone Jul 30, 2024