From Words to Worth: Newborn Article Impact Prediction with LLM

LLM as Article Impact Predictor

[Early Access Version]

This paper is currently under peer review, and the code may change frequently. We are severely short-staffed at the moment. If you run into any issues during replication, please contact us by opening an issue or by email: oceanytech@gmail.com.

Introduction

This repository contains the official implementation for the paper "From Words to Worth: Newborn Article Impact Prediction with LLM". The tool fine-tunes LLMs with parameter-efficient fine-tuning (PEFT) to predict the future impact of newly published articles.

Pre-finetuning Guidance (Skip if you only want to perform the inference.)

Training the LLMs requires a few manual setup steps.

First, clone the repo and run the following commands in the console:

cd ScImpactPredict
pip install -r requirements.txt

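Second, manually modify the 'xxxForSequenceClassification' class in the transformers package that matches your base model (LlamaForSequenceClassification is shown here). The lines to add are marked with "Add code here":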

class LlamaForSequenceClassification(LlamaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = LlamaModel(config)
        self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
        self.post_init()
        # Add code here: choose the loss ('bce', 'mse', 'l1', or 'smoothl1').
        self.loss_func = 'mse'
        self.sigmoid = nn.Sigmoid()
        ...
    def forward(...):
        ...
        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)
        # Add code here: squash logits to (0, 1) unless the loss expects raw logits.
        if self.loss_func != 'bce':
            logits = self.sigmoid(logits)
        if input_ids is not None:
            batch_size = input_ids.shape[0]
        ...
        # Add code here: select the regression loss according to self.loss_func.
        if self.config.problem_type == "regression":
            if self.loss_func == 'bce':
                loss_fct = BCEWithLogitsLoss()
            elif self.loss_func == 'mse':
                loss_fct = MSELoss()
            elif self.loss_func == 'l1':
                # nn.L1Loss avoids needing an extra "from torch.nn import L1Loss".
                loss_fct = nn.L1Loss()
            elif self.loss_func == 'smoothl1':
                loss_fct = nn.SmoothL1Loss()
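
To make the effect of the loss_func switch concrete, here is a minimal pure-Python sketch of what each option computes for a batch of scalar scores. It is not the code inside transformers: the names sigmoid and regression_loss are hypothetical, and the sketch assumes mean reduction and a SmoothL1 threshold of 1.0, which are the PyTorch defaults.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping a real score into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def regression_loss(logits, targets, loss_func="mse"):
    """Mean loss over a batch of raw scores, mirroring the branches above.

    For 'bce' the raw logits go straight into the loss (as BCEWithLogitsLoss
    expects); for every other option the logits pass through a sigmoid first.
    """
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and equal length")

    if loss_func == "bce":
        # Stable binary cross-entropy on logits: max(x, 0) - x*y + log(1 + e^-|x|)
        terms = [max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
                 for x, y in zip(logits, targets)]
    else:
        preds = [sigmoid(x) for x in logits]
        diffs = [p - y for p, y in zip(preds, targets)]
        if loss_func == "mse":
            terms = [d * d for d in diffs]
        elif loss_func == "l1":
            terms = [abs(d) for d in diffs]
        elif loss_func == "smoothl1":
            # Quadratic below the threshold of 1.0, linear above it.
            terms = [0.5 * d * d if abs(d) < 1.0 else abs(d) - 0.5 for d in diffs]
        else:
            raise ValueError(f"unknown loss_func: {loss_func!r}")
    return sum(terms) / len(terms)


if __name__ == "__main__":
    scores, labels = [0.0, 2.0], [0.5, 1.0]  # illustrative values only
    for name in ("mse", "l1", "smoothl1", "bce"):
        print(name, round(regression_loss(scores, labels, name), 4))
```

The 'bce' branch skips the sigmoid because the loss applies it internally, which is why the forward pass above only squashes the logits for the other three options.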

Fine-tuning (Skip if you only want to perform the inference.)

Prepare a train.sh bash script like the one below:

DATA_PATH="ScImpactPredict/NAID/NAID_train_extrainfo.csv"
TEST_DATA_PATH="ScImpactPredict/NAID/NAID_test_extrainfo.csv"

OMP_NUM_THREADS=1 accelerate launch offcial_train.py \
    --total_epochs 5 \
    --learning_rate 1e-4 \
    --data_path $DATA_PATH \
    --test_data_path $TEST_DATA_PATH \
    --runs_dir ScImpactPredict/official_runs/LLAMA3 \
    --checkpoint path_to_huggingface_LLaMA3

Then run sh train.sh in the console and wait for training to finish.
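
If you would rather launch training from Python than from a shell script, the same command can be assembled as an argument list. The sketch below uses only the flags shown in train.sh. The helper name build_train_command is hypothetical, and the script prints the command rather than running it.

```python
import shlex


def build_train_command(data_path, test_data_path, runs_dir, checkpoint,
                        total_epochs=5, learning_rate="1e-4"):
    """Return the argv list equivalent to the train.sh example above."""
    return [
        "accelerate", "launch", "offcial_train.py",
        "--total_epochs", str(total_epochs),
        "--learning_rate", str(learning_rate),
        "--data_path", data_path,
        "--test_data_path", test_data_path,
        "--runs_dir", runs_dir,
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        data_path="ScImpactPredict/NAID/NAID_train_extrainfo.csv",
        test_data_path="ScImpactPredict/NAID/NAID_test_extrainfo.csv",
        runs_dir="ScImpactPredict/official_runs/LLAMA3",
        checkpoint="path_to_huggingface_LLaMA3",  # placeholder, as in train.sh
    )
    # Print a copy-pasteable shell line; OMP_NUM_THREADS=1 matches train.sh.
    print("OMP_NUM_THREADS=1 " + shlex.join(cmd))
```

To actually start training, the list could be passed to subprocess.run with OMP_NUM_THREADS=1 added to the environment.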

Testing (batch)

As with fine-tuning, prepare a test.sh script like the one below:

python inference.py \
 --data_path ScImpactPredict/NAID/NAID_test_extrainfo.csv \
 --weight_dir path_to_runs_dir

Then run sh test.sh in the console.

Testing (single article)

Edit the single_pred.py file to point at your article and weights, then run python single_pred.py. (If you have not modified the transformers code as described in the pre-finetuning guidance, the model returns raw scores, so apply a sigmoid function to the inference results.)
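
For the fallback mentioned in parentheses, a minimal sketch of applying a sigmoid to a raw model output follows. The name to_impact_score is hypothetical, and the raw value is a made-up number for illustration, not an actual model output.

```python
import math


def to_impact_score(raw_output):
    """Map a raw (unbounded) model output into the [0, 1] range with a sigmoid."""
    if raw_output >= 0:
        return 1.0 / (1.0 + math.exp(-raw_output))
    z = math.exp(raw_output)  # avoids overflow for large negative inputs
    return z / (1.0 + z)


if __name__ == "__main__":
    raw = 1.2  # illustrative value only
    print(f"raw={raw} -> score={to_impact_score(raw):.4f}")
```

When the output is a PyTorch tensor, torch.sigmoid performs the same mapping element-wise.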

Model Weights

First, request access to and download the pretrained LLaMA-3 weights from the official Hugging Face site. Then, download the provided LoRA weights (runs_dir) here.
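
Once both sets of weights are on disk, they might be combined roughly as sketched below. This is an assumption about a typical transformers plus peft workflow, not the repository's own loading code (see inference.py and single_pred.py for that). The helper name load_predictor is hypothetical, and the sketch assumes the LoRA weights were saved in the PEFT adapter format with a single-output regression head.

```python
def load_predictor(base_model_path, lora_dir):
    """Load a base LLaMA-3 checkpoint and attach LoRA adapter weights.

    Assumptions (not confirmed by this README): the adapter in `lora_dir` is in
    the PEFT format, and the classification head has a single output.
    """
    # Imported lazily so that importing this module does not need the GPU stack.
    from peft import PeftModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    if tokenizer.pad_token is None:
        # LLaMA tokenizers ship without a pad token; reuse EOS for padding.
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForSequenceClassification.from_pretrained(
        base_model_path, num_labels=1
    )
    model.config.pad_token_id = tokenizer.pad_token_id
    model = PeftModel.from_pretrained(model, lora_dir)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

A call would look like load_predictor("path_to_huggingface_LLaMA3", "path_to_runs_dir"), using the same placeholder paths as the scripts above.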

Compare with Previous Methods

With a few adjustments for your specific needs, the code for the previous methods should work fine. Since these models train very quickly (within a few minutes on a single RTX 3080), we do not provide their trained weights.

Repo Structure Description

Folders like furnace, database, and tools are used for building the NAID and TKPD datasets. They have no direct connection to training or inference.

We are confident in our methodology and experiments, and you should be able to reproduce any of the results reported in our paper within an acceptable margin.

ssocean/ScImpactPredict
