This repository provides a collection of examples for Xorbits.
Example 1: Explore the NYC taxi dataset using Xorbits
This example uses Xorbits for an initial exploration of the NYC taxi dataset and gives a sense of how easy Xorbits is to use.
To run this example on your favorite platform:
Platform | Link |
---|---|
Colab | https://colab.research.google.com/github/xprobe-inc/examples/blob/main/nyc-taxi/nyc-taxi.ipynb |
Kaggle | https://www.kaggle.com/code/cornmonster/notebooka9814fb1ba |
Example 2: Deduplicate text from the OSCAR Corpus using Xorbits
This example demonstrates how to use Xorbits to perform text deduplication on the OSCAR Corpus. The OSCAR Corpus is a massive multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
Platform | Link |
---|---|
Colab | https://colab.research.google.com/github/xprobe-inc/examples/blob/main/text-dedup/text-dedup.ipynb |
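
To make the idea of text deduplication concrete, here is a minimal standard-library sketch. It is not the code from the notebook and does not use Xorbits: it only illustrates one common approach, assuming that duplicates are detected by hashing normalized text and that near duplicates are detected by Jaccard similarity over word trigrams. The function names and the `threshold` parameter are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def shingles(text: str, n: int = 3) -> set:
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(docs, threshold: float = 0.8):
    """Keep the first copy of each document; drop exact and near duplicates.

    Exact duplicates are caught with a hash of the normalized text. Near
    duplicates are caught by comparing shingle sets against every kept
    document, which is quadratic and only suitable for small inputs.
    """
    seen_hashes = set()
    kept, kept_shingles = [], []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        sh = shingles(doc)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(sh)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",       # exact after normalization
    "The quick brown fox jumps over the lazy dog today",  # near duplicate
    "A completely different sentence about taxis",
]
unique_docs = deduplicate(docs)
print(len(unique_docs))
```

The pairwise comparison here does not scale to a corpus the size of OSCAR; large-scale pipelines typically replace it with an approximate scheme such as MinHash with locality-sensitive hashing and distribute the work across workers.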
Example 3: Get the license with the most stars using Xorbits dataset over the bigcode/the-stack Hugging Face dataset
This example demonstrates how to use Xorbits to find the license with the most stars in the bigcode/the-stack dataset. The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The dataset was created as part of the BigCode Project, an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs).
Platform | Link |
---|---|
Colab | https://colab.research.google.com/github/xprobe-inc/examples/blob/main/most-stars-license/most-stars-license.ipynb |
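
The aggregation behind "license with the most stars" can be sketched in plain Python. This is an illustration on toy rows, not the notebook's Xorbits code. It assumes that each record carries a list of licenses and a star count under the field names `max_stars_repo_licenses` and `max_stars_count`, and that "most stars" means the largest sum of stars per license. Both the field names and that interpretation are assumptions, and `license_with_most_stars` is a hypothetical helper.

```python
from collections import defaultdict


def license_with_most_stars(rows):
    """Return (license, total_stars) for the license with the highest star sum.

    Each row is a dict holding a list of licenses and a star count that may
    be None. Returns None when no license is present in any row.
    """
    totals = defaultdict(int)
    for row in rows:
        stars = row.get("max_stars_count") or 0  # treat missing counts as 0
        for lic in row.get("max_stars_repo_licenses", []):
            totals[lic] += stars
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])


# Toy rows (made up for illustration, not taken from the dataset).
rows = [
    {"max_stars_repo_licenses": ["MIT"], "max_stars_count": 120},
    {"max_stars_repo_licenses": ["Apache-2.0"], "max_stars_count": 300},
    {"max_stars_repo_licenses": ["MIT", "BSD-3-Clause"], "max_stars_count": 250},
    {"max_stars_repo_licenses": ["Apache-2.0"], "max_stars_count": None},
]
best = license_with_most_stars(rows)
print(best)  # ('MIT', 370)
```

Summing per row counts a repository once for every file it contributes, so a real analysis may want to deduplicate by repository first.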
Example 4: Data visualization using Xorbits with Plotly and Dash
This example demonstrates how to perform data visualization using Xorbits together with Plotly and Dash.
You can run this example locally:
Platform | Link |
---|---|
Local | https://github.com/xorbitsai/examples/tree/main/nba-data |