Example scripts for the pushshift dump files

Roee16/PushshiftDumps

This repo contains example Python scripts for processing the Reddit dump files created by Pushshift. The files can be downloaded from here or torrented from here.

  • single_file.py decompresses and iterates over a single zst-compressed file
  • iterate_folder.py does the same for every file in a folder
  • combine_folder_multiprocess.py uses separate processes to iterate over multiple files in parallel, writes the lines that match the passed-in criteria to text files, then combines them into a final zst-compressed file
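
As a rough illustration of the single-file case, here is a minimal sketch of streaming a zst-compressed dump line by line. It is not the code from single_file.py. The function names `iter_lines` and `iter_zst_objects` are hypothetical, and the sketch assumes the third-party `zstandard` package is installed and that each line of the dump is one JSON object. The large `max_window_size` is an assumption about how the dumps were compressed; if decompression fails with a window-size error, that is the value to adjust.

```python
import io
import json


def iter_lines(stream, chunk_size=2**20):
    """Yield decoded text lines from a binary stream, read in fixed-size chunks."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Split on the newline byte before decoding, so a multi-byte UTF-8
        # character that straddles two chunks is never decoded in halves.
        *complete, buffer = buffer.split(b"\n")
        for raw in complete:
            yield raw.decode("utf-8")
    if buffer:
        # Last line of a file that does not end with a newline.
        yield buffer.decode("utf-8")


def iter_zst_objects(path):
    """Yield one parsed JSON object per non-empty line of a zst-compressed file."""
    # Imported here so iter_lines stays usable without the third-party package.
    import zstandard

    with open(path, "rb") as fh:
        # Assumption: the dumps need a window larger than the library default.
        decompressor = zstandard.ZstdDecompressor(max_window_size=2**31)
        with decompressor.stream_reader(fh) as reader:
            for line in iter_lines(reader):
                if line:
                    yield json.loads(line)
```

Calling `iter_zst_objects` on a local dump file yields dictionaries one at a time, so the whole file never has to fit in memory. Handling a folder amounts to calling it once per file, which is also the unit of work that a multiprocess version would hand to each worker.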
