Enhance documentation for JoinType and Boundedness enums
- Improved descriptions for the Inner and Full join types in join_type.rs to clarify their behavior and examples.
- Added explanations regarding the boundedness of output streams and memory requirements in execution_plan.rs, including specific examples for operators like Median and Min/Max.
jayzhan-synnada committed Dec 20, 2024
1 parent b0d7139 commit 6074af4
Showing 2 changed files with 15 additions and 4 deletions.
10 changes: 7 additions & 3 deletions datafusion/common/src/join_type.rs
@@ -28,16 +28,20 @@ use crate::{DataFusionError, Result};
/// Join type
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Hash)]
pub enum JoinType {
- /// Inner Join - Returns rows where there is a match in both tables.
+ /// Inner Join - Returns only rows where there is a matching value in both tables based on the join condition.
+ /// For example, if joining table A and B on A.id = B.id, only rows where A.id equals B.id will be included.
+ /// All columns from both tables are returned for the matching rows. Non-matching rows are excluded entirely.
Inner,
/// Left Join - Returns all rows from the left table and matching rows from the right table.
/// If no match, NULL values are returned for columns from the right table.
Left,
/// Right Join - Returns all rows from the right table and matching rows from the left table.
/// If no match, NULL values are returned for columns from the left table.
Right,
- /// Full Join - Returns all rows when there is a match in either table.
- /// Rows without a match in one table will have NULL values for columns from that table.
+ /// Full Join (also called Full Outer Join) - Returns all rows from both tables, matching rows where possible.
+ /// When a row from either table has no match in the other table, the missing columns are filled with NULL values.
+ /// For example, if table A has row X with no match in table B, the result will contain row X with NULL values for all of table B's columns.
+ /// This join type preserves all records from both tables, making it useful when you need to see all data regardless of matches.
Full,
/// Left Semi Join - Returns rows from the left table that have matching rows in the right table.
/// Only columns from the left table are returned.
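
To make the difference between the Inner and Full descriptions above concrete, here is a minimal sketch of a nested-loop join over in-memory rows. It is an illustration only: `JoinKind`, `Row`, `Joined`, and `nested_loop_join` are hypothetical names invented for this example, not DataFusion APIs, and `None` stands in for SQL NULL.

```rust
// Illustrative only: these types are hypothetical and are not DataFusion's `JoinType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinKind {
    Inner,
    Full,
}

/// A row is an (id, payload) pair; the join condition is `left.id == right.id`.
pub type Row = (i32, &'static str);

/// One output row: the payload from each side, with `None` standing in for NULL.
pub type Joined = (Option<&'static str>, Option<&'static str>);

pub fn nested_loop_join(left: &[Row], right: &[Row], kind: JoinKind) -> Vec<Joined> {
    let mut out = Vec::new();
    // Tracks which right rows matched at least one left row.
    let mut right_matched = vec![false; right.len()];

    for &(l_id, l_val) in left {
        let mut left_matched = false;
        for (i, &(r_id, r_val)) in right.iter().enumerate() {
            if l_id == r_id {
                // Both join types emit matching pairs.
                out.push((Some(l_val), Some(r_val)));
                left_matched = true;
                right_matched[i] = true;
            }
        }
        // Only a full join keeps an unmatched left row, padded with NULL.
        if !left_matched && kind == JoinKind::Full {
            out.push((Some(l_val), None));
        }
    }

    // Only a full join keeps unmatched right rows, padded with NULL.
    if kind == JoinKind::Full {
        for (i, &(_, r_val)) in right.iter().enumerate() {
            if !right_matched[i] {
                out.push((None, Some(r_val)));
            }
        }
    }
    out
}
```

With a left table holding ids 1 and 2 and a right table holding ids 2 and 3, the inner variant yields only the id 2 pair, while the full variant also yields the id 1 row and the id 3 row, each padded with `None` on the missing side.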
9 changes: 8 additions & 1 deletion datafusion/physical-plan/src/execution_plan.rs
@@ -520,6 +520,9 @@ impl ExecutionPlanProperties for &dyn ExecutionPlan {
///
/// For unbounded streams, it also tracks whether the operator requires finite memory
/// to process the stream or if memory usage could grow unbounded.
+ ///
+ /// Boundedness of the output stream is based on the boundedness of the input stream and the nature of
+ /// the operator. For example, a limit or top-k operator with a fetch can convert an unbounded stream to a bounded stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundedness {
/// The data stream is bounded (finite) and will eventually complete
@@ -529,6 +532,9 @@ pub enum Boundedness {
/// Whether this operator requires infinite memory to process the unbounded stream.
/// If false, the operator can process an infinite stream with bounded memory.
/// If true, memory usage may grow unbounded while processing the stream.
+ ///
+ /// For example, `Median` requires infinite memory to compute the median of an unbounded stream.
+ /// `Min/Max` requires infinite memory if the stream is unordered, but can be computed with bounded memory if the stream is ordered.
requires_infinite_memory: bool,
},
}
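
The doc comments above describe how an operator's output boundedness follows from its input and its own nature. The sketch below encodes just the examples given in those comments (limit, `Median`, `Min/Max`). It is a simplified model under stated assumptions: the `Boundedness` enum here is a local stand-in, and `Op` and `output_boundedness` are hypothetical names, not DataFusion's planner logic.

```rust
// Local stand-in types for illustration; not DataFusion's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundedness {
    Bounded,
    Unbounded { requires_infinite_memory: bool },
}

/// Hypothetical operator kinds covering the examples in the doc comments.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    /// Passes rows through; does not change boundedness.
    Filter,
    /// Stops after a fixed number of rows (a fetch).
    Limit,
    /// Needs every input value before it can answer.
    Median,
    /// Minimum; memory needs depend on whether the input is ordered.
    Min { input_ordered: bool },
}

pub fn output_boundedness(input: Boundedness, op: Op) -> Boundedness {
    match (input, op) {
        // A fetch turns any stream into a finite one.
        (_, Op::Limit) => Boundedness::Bounded,
        // A bounded input stays bounded for the other operators modeled here.
        (Boundedness::Bounded, _) => Boundedness::Bounded,
        // A filter leaves an unbounded stream as it found it.
        (Boundedness::Unbounded { requires_infinite_memory }, Op::Filter) => {
            Boundedness::Unbounded { requires_infinite_memory }
        }
        // Median over an unbounded stream has to retain all values.
        (Boundedness::Unbounded { .. }, Op::Median) => Boundedness::Unbounded {
            requires_infinite_memory: true,
        },
        // Per the doc comment: ordered input allows bounded memory, unordered does not.
        (Boundedness::Unbounded { requires_infinite_memory }, Op::Min { input_ordered }) => {
            Boundedness::Unbounded {
                requires_infinite_memory: requires_infinite_memory || !input_ordered,
            }
        }
    }
}
```

In this model, a limit placed over an unbounded source reports `Bounded`, whereas a median over the same source reports `Unbounded { requires_infinite_memory: true }`.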
@@ -542,7 +548,8 @@ impl Boundedness {
/// Represents how an operator emits its output records.
///
/// This is used to determine whether an operator emits records incrementally as they arrive,
- /// only emits a final result at the end, or can do both.
+ /// only emits a final result at the end, or can do both. Note that an operator emits its output as record batches of
+ /// `batch_size` rows, so it may still buffer data internally until it has enough rows to emit a record batch or the source is exhausted.
///
/// For example, in the following plan:
/// ```text
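
The note about `batch_size` in the comment above can be pictured with a small buffering sketch: rows arrive in arbitrary chunks, full batches are emitted as soon as enough rows accumulate, and the remainder is flushed when the source is exhausted. `BatchBuffer` and its methods are hypothetical names for illustration, using plain `Vec<i64>` rows rather than Arrow record batches.

```rust
/// Hypothetical helper, not a DataFusion type: buffers rows until a full batch is ready.
pub struct BatchBuffer {
    batch_size: usize,
    pending: Vec<i64>,
}

impl BatchBuffer {
    pub fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { batch_size, pending: Vec::new() }
    }

    /// Accepts incoming rows and returns every full batch that is now available.
    /// Rows that do not fill a batch stay buffered.
    pub fn push(&mut self, rows: &[i64]) -> Vec<Vec<i64>> {
        self.pending.extend_from_slice(rows);
        let mut ready = Vec::new();
        while self.pending.len() >= self.batch_size {
            // `split_off` keeps the first `batch_size` rows in `pending` and returns the tail.
            let rest = self.pending.split_off(self.batch_size);
            // Swap the tail back in and take the full batch out.
            ready.push(std::mem::replace(&mut self.pending, rest));
        }
        ready
    }

    /// Called once the source is exhausted: emits the final, possibly short, batch.
    pub fn finish(self) -> Option<Vec<i64>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending)
        }
    }
}
```

With a `batch_size` of 3, pushing two rows emits nothing, pushing five more emits two full batches, and `finish` returns the single leftover row.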