Skip to content

Releases: chdb-io/chdb

v3.0.0

09 Jan 09:46
d26d2ce
Compare
Choose a tag to compare

Try chDB v3.0.0

pip install -U chdb

Major Changes

Performance

  • Some performance improvement (12% faster than DuckDB 1.1.3 and chDB 2.1)
    image

Session Mode and Memory Engine Support

  • Session mode in chDB 3.0 is now equivalent to the new connection-based implementation
  • Improved support for ClickHouse memory engine and stateful operations
  • Enhanced performance for INSERT and UPDATE operations
  • Better handling of background mutation threads and part compaction
  • New support for SQLite-style ":memory:" connection string mode
    • Implemented using ClickHouse memory engine under the hood

New Default Query Mode: Connection-based Queries

  • Connection-based queries are now the recommended and default way to interact with chDB
  • Users should first create a connection using chDB connection or DBAPI
  • Queries can be executed through:
    • Connection with cursor (fetch results via cursor)
    • Direct connection query (simplified method, results returned directly)

New arch chart

What's Changed

Full Changelog: v2.1.1...v3.0.0

Connection and Session Management

Connection String Format

# Supported formats with examples:
":memory:"                                    # In-memory database
"test.db"                                    # Relative path
"file:test.db"                               # Explicit file protocol
"/path/to/test.db"                           # Absolute path
"file:/path/to/test.db"                      # Absolute path with protocol
"file:test.db?param1=value1&param2=value2"   # With query parameters
"file::memory:?verbose&log-level=test"       # In-memory with parameters
"///path/to/test.db?param1=value1"           # Triple slash absolute path

Connection Parameters

  • Parameters in connection strings are passed to ClickHouse engine as startup args
  • Special parameter handling:
    • mode=ro is converted to --readonly=1 for read-only mode
  • For full parameter list, refer to clickhouse local --help --verbose

Session Behavior

  • Sessions maintain query state throughout their lifecycle
  • Only one active session allowed at a time
  • Creating a new session automatically closes any existing session
  • Sessions can be temporary or persistent:
    • Without specified path: Creates auto-cleaned temporary directory
    • With specified path: Maintains persistent database state

Connection States

  • Only one global connection can exist at a time
  • Creating a new connection with the same connection string returns the existing connection
  • Creating a new connection with a different path before closing the existing one will trigger a user warning

API Reference

Detailed API Reference

Connection Class

class connect:
    def __init__(self, path: str = ":memory:"):
        """Initialize connection with optional path parameter"""
        
    def cursor(self) -> cursor:
        """Create and return a new cursor object"""
        
    def execute(self, query_str: str) -> Any:
        """Execute a query directly on the connection"""
        
    def query(self, query_str: str, format: str = "CSV") -> query_result:
        """Execute a query and return a query_result object
        
        Args:
            query_str: SQL query string
            format: Output format (default: "CSV")
        """
        
    def commit(self):
        """Commit current transaction"""
        
    def close(self):
        """Close the connection"""

Cursor Class

class cursor:
    def __init__(self, connection):
        """Initialize cursor with a connection object"""
        
    def execute(self, query: str) -> None:
        """Execute a SQL query"""
        
    def commit(self) -> None:
        """Commit current transaction"""
        
    def close(self) -> None:
        """Close the cursor"""
        
    def get_memview(self) -> memoryview:
        """Get memory view of result data"""
        
    def data_size(self) -> int:
        """Get size of result data in bytes"""
        
    def rows_read(self) -> int:
        """Get number of rows read"""
        
    def bytes_read(self) -> int:
        """Get number of bytes read"""
        
    def elapsed(self) -> float:
        """Get query execution time in seconds"""
        
    def has_error(self) -> bool:
        """Check if there was an error in execution"""
        
    def error_message(self) -> str:
        """Get error message if there was an error"""

Usage Examples

Basic Memory Engine Example

from chdb import connect

conn = connect(":memory:")
cursor = conn.cursor()

# Create a table
cursor.execute("""
    CREATE TABLE users (
        id Int32,
        name String,
        scores Array(UInt8)
    ) ENGINE = Memory
""")

# Insert data
cursor.execute("""
    INSERT INTO users VALUES
    (1, 'Alice', [95, 87, 92]),
    (2, 'Bob', [88, 85, 90]),
    (3, 'Charlie', [91, 89, 94])
""")

# Query data
cursor.execute("SELECT * FROM users WHERE id = 1")
row = cursor.fetchone()

Advanced DBAPI Usage

from chdb import dbapi

# Create connection
conn = dbapi.connect("path/to/db")
cur = conn.cursor()

# Create table
cur.execute("""
    CREATE TABLE rate (
        day Date,
        value Int64
    ) ENGINE = ReplacingMergeTree ORDER BY day
""")

# Insert data (single and multiple rows)
cur.execute("INSERT INTO rate VALUES (%s, %s)", ("2021-01-01", 24))
cur.executemany("INSERT INTO rate VALUES (%s, %s)", [
    ("2021-01-02", 128),
    ("2021-01-03", 256)
])

# Update data
cur.executemany(
    "ALTER TABLE rate UPDATE value = %s WHERE day = %s",
    [(72, "2021-01-02"), (96, "2021-01-03")]
)

# Fetch methods
cur.execute("SELECT value FROM rate ORDER BY day DESC")
row = cur.fetchone()           # Single row
rows = cur.fetchmany(2)        # Multiple rows
all_rows = cur.fetchall()      # All remaining rows

# Clean up
cur.close()
conn.close()

Best Practices

  1. Create a single connection and reuse it throughout your project
  2. Use connection-based queries instead of direct chDB queries
  3. Close existing connections before creating new ones with different paths
  4. Be mindful of thread usage when executing queries

Migration Guide

Users upgrading from pre-2.2 versions should:

  1. Update code to use connection-based queries
  2. Review and update connection string parameters
  3. Update any code relying on the debug output format
  4. Audit thread usage in applications using chDB

For more detailed examples and API usage, refer to:

For binding developers

New API avail:

/**
 * Creates a new chDB connection.
 * Only one active connection is allowed per process.
 * Creating a new connection with different path requires closing existing connection.
 * 
 * @param argc Number of command-line arguments
 * @param argv Command-line arguments array (--path=<db_path> to specify database location)
 * @return Pointer to connection pointer, or NULL on failure
 * @note Default path is ":memory:" if not specified
 */
CHDB_EXPORT struct chdb_conn ** connect_chdb(int argc, char ** argv);

/**
 * Closes an existing chDB connection and cleans up resources.
 * Thread-safe function that handles connection shutdown and cleanup.
 * 
 * @param conn Pointer to connection pointer to close
 */
CHDB_EXPORT void close_conn(struct chdb_conn ** conn);

/**
 * Executes a query on the given connection.
 * Thread-safe function that handles query execution in a separate thread.
 * 
 * @param conn Connection to execute query on
 * @param query SQL query string to execute
 * @param format Output format string (e.g., "CSV", default format)
 * @return Query result structure containing output or error message
 * @note Returns error result if connection is invalid or closed
 */
CHDB_EXPORT struct local_result_v2 * query_conn(struct chdb_conn * conn, const char * query, const char * format);

v2.2.0b1

12 Dec 05:18
01afb91
Compare
Choose a tag to compare

What's Changed

Full Changelog: v2.2.0b0...v2.2.0b1

v2.2.0b0

23 Nov 10:35
18d4aa4
Compare
Choose a tag to compare

What's Changed

  • Fix upload pypi and github release by @auxten in #278
  • Update PyPi info by @auxten in #279
  • Add citation just like ClickHouse, as chDB is mentioned in ClickHouse Paper by @auxten in #280
  • New stateful mode in SQLite like API by @auxten in #283
    • SQLite like API: common functions, common connection strings supported

    • Stateful Query with long live clickhouse engine instance bind with connection

    • Python DB-API reimplemented with new API. Both persist and memory engine supported

    • ClickHouse memory engine support

    • Some performance improvement (~43%)
      image

    • More details see #283 (comment)

Try with pip install chdb==2.2.0b0

Full Changelog: v2.1.1...v2.2.0b0

v2.1.1

16 Oct 12:38
400d830
Compare
Choose a tag to compare

What's Changed

Full Changelog: v2.1.0...v2.1.1

v2.1.0

15 Oct 04:02
1ff2593
Compare
Choose a tag to compare

What's Changed

  • Update ClickHouse engine to v24.8.4.13-lts by @auxten in #271
  • Fixed JSON_* function not available on macOS @auxten of #270

Full Changelog: v2.0.4...v2.1.0

v2.0.4

16 Sep 08:59
Compare
Choose a tag to compare

What's Changed

Full Changelog: v2.0.3...v2.0.4

v2.0.3

05 Sep 11:32
e65cc31
Compare
Choose a tag to compare

What's Changed

New Contributors

  • @DemoYeti made their first contribution in #257
  • @xinhuitian made their first contribution in #263

Full Changelog: v2.0.2...v2.0.3

v2.0.2

22 Aug 16:44
e5953ba
Compare
Choose a tag to compare

What's Changed

Full Changelog: v2.0.1...v2.0.2

v2.0.1

22 Aug 09:46
602dad6
Compare
Choose a tag to compare

What's Changed

  • Patchset-5 before v2.0.0 by @auxten in #250
  • Performace tuning of pipeline running by @auxten in #256
  • Better handling of numpy.NaN and complex Python types in Pandas Dataframe @auxten in #256

Full Changelog: v2.0.0b1...v2.0.1

v2.0.0b1

03 Jul 13:03
Compare
Choose a tag to compare

What's Changed

Special thanks to @mneedham!

Full Changelog: v2.0.0b0...v2.0.0b1