Fix unreachable links in markdown files (Lightning-AI#1219)
Andrei-Aksionov authored Mar 30, 2024
1 parent 81d7cf3 commit 6d04a87
Showing 5 changed files with 4 additions and 9 deletions.
2 changes: 1 addition & 1 deletion extensions/xla/README.md
@@ -78,7 +78,7 @@ export PJRT_DEVICE=TPU
> An extensive guide on setup and available options can be found [here](https://cloud.google.com/tpu/docs/v4-users-guide).
Since a new machine was created, you may need to download pretrained weights.
-They can be copied to the machine using `gcloud compute tpus tpu-vm scp`, or you can follow the steps described in our [downloading guide](download_model_weights.md).
+They can be copied to the machine using `gcloud compute tpus tpu-vm scp`, or you can follow the steps described in our [downloading guide](../../tutorials/download_model_weights.md).
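
As a rough illustration (not part of this commit), copying a locally downloaded checkpoint directory to the TPU VM with `gcloud compute tpus tpu-vm scp` might look like the sketch below; the TPU VM name, zone, and destination path are placeholders:

```bash
# Copy a downloaded checkpoint directory to the TPU VM.
# "my-tpu-vm", the zone, and the destination path are placeholders.
gcloud compute tpus tpu-vm scp --recurse \
  checkpoints/stabilityai/stablelm-base-alpha-3b \
  my-tpu-vm:~/checkpoints/stabilityai/stablelm-base-alpha-3b \
  --zone=us-central2-b
```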

It is also recommended to set up a persistent disk from which to load checkpoints.
Follow [this guide](https://cloud.google.com/tpu/docs/setup-persistent-disk#setting_up_a_tpu_vm_and_a_persistent_disk) to do so.
2 changes: 1 addition & 1 deletion tutorials/0_to_litgpt.md
@@ -527,7 +527,7 @@ lm_eval --model hf \
 
**More information and additional resources**

-- [tutorials/convert_lit_models](tutorials/convert_lit_models.md): Tutorial on converting LitGPT weights
+- [tutorials/convert_lit_models](./convert_lit_models.md): Tutorial on converting LitGPT weights



2 changes: 1 addition & 1 deletion tutorials/inference.md
@@ -1,6 +1,6 @@
# Inference

-We demonstrate how to run inference (next token prediction) with the GPT base model in the [`generate.py`](generate.py) script:
+We demonstrate how to run inference (next token prediction) with the GPT base model in the [`generate.py`](../litgpt/generate/base.py) script:

```bash
litgpt generate base --prompt "Hello, my name is" --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
2 changes: 1 addition & 1 deletion tutorials/oom.md
@@ -34,7 +34,7 @@ However, your hardware may not support such large context lengths. Here's what y
* For the finetuning scripts, you can trim the length of the samples in your dataset.
All the finetuning scripts expose a `--data.max_seq_length=...` argument. This might also be useful in cases where
sample lengths are highly unbalanced, as the presence of a single very long sample would incur a larger memory usage for all other
-shorter samples. For example, the median length of the samples in Alpaca is 110 tokens. Truncating the Alpaca dataset to 256 max tokens reduces the memory requirements of a Falcon 7B model from 23.52 GB to 15.73 GB. For more information about the dataset truncation, please see the *Truncating datasets* section in the [prepare_datasets.md](prepare_datasets.md) tutorial.
+shorter samples. For example, the median length of the samples in Alpaca is 110 tokens. Truncating the Alpaca dataset to 256 max tokens reduces the memory requirements of a Falcon 7B model from 23.52 GB to 15.73 GB. For more information about the dataset truncation, please see the *Truncating datasets* section in the [prepare_dataset.md](prepare_dataset.md) tutorial.

Keep in mind that reducing the context length will affect the modelling performance on text sequences longer than the limit.
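
As a minimal sketch following the `litgpt finetune lora` invocations shown later in this diff (the checkpoint directory and data module below are illustrative placeholders), truncating samples to 256 tokens looks roughly like:

```bash
# Finetune with LoRA while truncating every sample to at most 256 tokens.
# The checkpoint directory and data module are placeholders, not values from this commit.
litgpt finetune lora \
  --checkpoint_dir checkpoints/tiiuae/falcon-7b \
  --data Alpaca \
  --train.max_seq_length 256
```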

5 changes: 0 additions & 5 deletions tutorials/prepare_dataset.md
@@ -79,7 +79,6 @@ For comparison, the Falcon 7B model requires 23.52 GB of memory for the original

### Alpaca-GPT4

-
The Alpaca-GPT4 dataset was built by using the prompts of the original Alpaca dataset and generating the responses with GPT-4. The
dataset consists of 52,000 instructions and responses.

@@ -126,7 +125,6 @@ litgpt finetune lora \
--train.max_seq_length 256
```

-
 

### Deita
@@ -162,7 +160,6 @@ litgpt finetune lora \
--train.max_seq_length 512
```

-
 

### Dolly
@@ -281,7 +278,6 @@ litgpt finetune lora \

However, you can also select individual subsets via comma-separated strings as follows:

-
```bash
litgpt finetune lora \
--data FLAN \
@@ -385,5 +381,4 @@ Note that you only need to modify a small fraction of the code file, namely the

In addition to the finetuning datasets described above, LitGPT also supports several datasets for pretraining. The pretraining datasets are described in more detail in the following separate tutorial documents:

- [Pretrain Llama 2 on OpenWebText](./pretrain_openwebtext.md)
- [Pretrain TinyLlama on Slimpajama and Starcoder](./pretrain_tinyllama.md)
