Home
P. Oscar Boykin edited this page Jul 19, 2016 · 94 revisions
Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop. It's similar to other MapReduce platforms like Pig and Hive, but offers a higher level of abstraction by leveraging the full power of Scala and the JVM.
Scalding is built on top of Cascading, a Java library that abstracts away much of the complexity of Hadoop (such as the need to write raw `map` and `reduce` functions).
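To give a rough sense of what that abstraction looks like, here is a minimal word-count sketch in the style of the example from the Scalding README. It assumes `scalding-core` is on the classpath and that the job is launched with `--input` and `--output` arguments. The `Tokenizer` helper object is a hypothetical name introduced only for this illustration, and `TypedTsv` may be superseded by other sinks in later releases.

```scala
import com.twitter.scalding._

// Hypothetical helper (not part of Scalding): splits text into lowercase words.
object Tokenizer {
  def tokenize(text: String): Array[String] =
    // Lowercase, strip punctuation, then split on runs of whitespace.
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")
}

// A word-count job written against the type-safe API.
// No hand-written map or reduce functions are needed.
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))   // read the input line by line
    .flatMap { line => Tokenizer.tokenize(line) }
    .groupBy { word => word }               // use each word as the key
    .size                                   // count the members of each group
    .write(TypedTsv[(String, Long)](args("output")))
}
```

The pipeline reads much like code written against ordinary Scala collections, and Scalding plans it into the underlying MapReduce steps.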
Need a suggestion for where to start? Try the Alice in Wonderland walkthrough which shows how to use Scalding step by step to learn about the book's text.
- Scaladocs: Generated documentation for the current version of Scalding.
- Note: `sbt doc` will build scaladocs under the `target/2.9.2/api/` directory, which you can then open in your browser.
- Tutorials
- Beginner
- Getting Started
- Scalding REPL: Learning is better when it's interactive. This tutorial shows off how to interact with your data using the Scalding REPL.
- Alice in Wonderland walkthrough: Step-by-step example of using Scalding in Local mode in the REPL.
- Intro to Scalding Jobs
- Intermediate
- Aggregation using Algebird Aggregators. Continuing the SQL analogy, we see how to use composable Aggregators.
- SQL to Scalding. Canonical ways of translating common SQL idioms to Scalding.
- Advanced
- Building Bigger Platforms With Scalding: some approaches for modular design and composition with Scalding.
- Getting Started with the Matrix library
- Reference/Other
- Type-safe API Reference. This API is very close to the Scala collections API.
- REPL Reference
- Automatic Orderings, Monoids and Arbitraries: using macros to automatically generate the needed Ordering, Monoid, Semigroup or Arbitrary instances for case classes and Scala collections.
- Matrix-API-Reference
- Scalding Sources
- Scalding-Commons. The README of the former scalding-commons library.
- Rosetta Code. A collection of MapReduce tasks translated (from Pig, Hive, Cascalog, MapReduce Streaming, etc.) into Scalding.
- Oscar's Scalding Talk at the Hadoop Summit. Slides from Oscar's talk at the Hadoop Summit.
- Upgrading to 0.9.0 means fixing some compile issues. These sed rules may help.
- DEPRECATED: Fields-based API Reference. This is the original Cascading-DSL-based API for Scalding, which uses a named-tuple model. We highly recommend the Type-safe API, using TypedPipe, for any new code. This page also contains many example code snippets illustrating each Scalding function. See Field Rules for more on Fields.
- Scalding-cassandra: support for reading from and writing to Cassandra.
- [Spy Glass](https://github.com/ParallelAI/SpyGlass) - Advanced-featured HBase wrapper for Cascading and Scalding
- Scalding: Powerful & Concise MapReduce Programming
- Scalding lecture for UC Berkeley's Analyzing Big Data with Twitter class
- Scalding with CDH3U2 in a Maven project
- Running your Scalding jobs in Eclipse
- Run/Test jobs locally from IntelliJ IDEA
- Running your Scalding jobs in IntelliJ IDEA
- Running Scalding jobs on EMR
- Running Scalding with HBase support: Scalding HBase wiki
- Using the distributed cache
- Calling Scalding from inside your application
- Unit Testing Scalding Jobs
- Using counters
NOTE: all of the following tutorials use the Fields API, which is deprecated.
- Scalding for the impatient: a great set of tutorials on using Scalding, walking through simple to more complex examples (including TF-IDF).
- Movie Recommendations and more in MapReduce and Scalding
- Generating Recommendations with MapReduce and Scalding, a shorter version of the above post.
- Poker collusion detection with Mahout and Scalding
- Portfolio Management in Scalding
- Find the Fastest Growing County in US, 1969-2011, using Scalding
- Dean Wampler's Scalding Workshop. Presented by Dean at StrangeLoop 2012.
- Typesafe's Activator for Scalding. Also created by Dean Wampler.
- Hive, Pig, Scalding, Scoobi, Scrunch and Spark: A Comparison of Hadoop Frameworks
- Why Hadoop MapReduce needs Scala
- How Twitter is doing its part to democratize big data
- Meet the combo powering Hadoop at Etsy, Airbnb and Climate Corp.
- Scalding wins a Bossie award from InfoWorld
- Scalding: Hadoop Word Count in LESS than 70 lines of code
- Using Scalding with other versions of Scala
- Scala and sbt for Homebrew users
- Scala and sbt for MacPorts users
- Comparison to Scrunch and Scoobi
- Powered-By: see who is using Scalding in production.
- Scalding REPL with Eclipse Scala Worksheets
- TDD for Scalding
- Mod-4 matrix arithmetic with Scalding and Algebird