PXB-3269 Reduce the time the Server is locked by xtrabackup #1603

Open
wants to merge 66 commits into
base: trunk

Commits on Oct 29, 2024

  1. PXB-3034 - Make --lock-ddl option an ENUM

    https://jira.percona.com/browse/PXB-3034
    
    Changed lock-ddl option to be an enum. Possible Values are:
    
    ON - Same as True
    OFF - Same as False
    REDUCED - Enable REDUCED lock mode. The first attempt to copy IBD
    files is done without locking, and changed tables (affected by DDL)
    are recopied (if needed) under the DDL lock.
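
    A minimal sketch of how the three values could be mapped onto an enum. The names `lock_ddl_sketch_t` and `parse_lock_ddl` are hypothetical, and accepting the strings TRUE/FALSE as aliases is an assumption based on "Same as True/False" above; the real option handling in PXB may differ.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cctype>
    #include <string>

    // Hypothetical enum; the real PXB type and enumerator names may differ.
    enum class lock_ddl_sketch_t { OFF, ON, REDUCED };

    // Parse an option value case-insensitively. Returns false for unknown
    // values and leaves *out untouched in that case.
    inline bool parse_lock_ddl(std::string value, lock_ddl_sketch_t *out) {
      assert(out != nullptr);
      std::transform(value.begin(), value.end(), value.begin(),
                     [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                     });
      if (value == "ON" || value == "TRUE") {  // TRUE alias is an assumption
        *out = lock_ddl_sketch_t::ON;
        return true;
      }
      if (value == "OFF" || value == "FALSE") {  // FALSE alias is an assumption
        *out = lock_ddl_sketch_t::OFF;
        return true;
      }
      if (value == "REDUCED") {
        *out = lock_ddl_sketch_t::REDUCED;
        return true;
      }
      return false;
    }
    ```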
    altmannmarcelo authored and satya-bodapati committed Oct 29, 2024
    66a8908
  2. PXB-3034 - Add DDL tracking to xtrabackup

    https://jira.percona.com/browse/PXB-3034
    
    Add DDL tracking to xtrabackup. This new object is responsible for
    tracking DDLs while the backup is running.
    Those changes are handled later, at the end of the backup and during
    prepare.
    altmannmarcelo authored and satya-bodapati committed Oct 29, 2024
    c26c2c0
  3. PXB-3034 - Handle prepare

    https://jira.percona.com/browse/PXB-3034
    
    Adjusted DDL tracking to produce correct files at the end of backup and
    handle those files during prepare.
    altmannmarcelo authored and satya-bodapati committed Oct 29, 2024
    1b87a2f
  4. PXB-3034 - Second phase copy Multi thread

    https://jira.percona.com/browse/PXB-3034
    
    Added parallel copy capability to the second phase copy of .ibd files.
    altmannmarcelo authored and satya-bodapati committed Oct 29, 2024
    88f9489
  5. PXB-3034 - Adding test cases

    https://jira.percona.com/browse/PXB-3034
    
    Added test cases under suite/lockless
    Fixed test cases using --lock-ddl=false/true
    altmannmarcelo authored and satya-bodapati committed Oct 29, 2024
    36ef9cd
  6. PXB-3034 - adjust fil_open_for_xtrabackup

    Adjusted fil_open_for_xtrabackup to tolerate the file being gone and
    to retry opening the file 10 times.
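
    The retry idea can be sketched as follows. This is not the code in fil_open_for_xtrabackup; `open_with_retries`, `open_result_sketch_t` and the callback shape are hypothetical, and only the limit of 10 attempts comes from the message above.

    ```cpp
    #include <cassert>
    #include <functional>

    // Hypothetical outcome of opening a file the server may drop concurrently.
    enum class open_result_sketch_t { OPENED, MISSING };

    // Call try_open until it succeeds or max_attempts is exhausted.
    // A file that is still absent after the last attempt is reported as
    // MISSING rather than treated as a fatal error.
    inline open_result_sketch_t open_with_retries(
        const std::function<bool()> &try_open, int max_attempts = 10) {
      for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_open()) return open_result_sketch_t::OPENED;
      }
      return open_result_sketch_t::MISSING;
    }
    ```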
    altmannmarcelo authored and satya-bodapati committed Oct 29, 2024
    02aaf87
  7. 1. Use space_id instead of table name for naming .del .ren files.

    2. PXB-3220 Allow deleted tables between discovery and file open, save them in missing tables list
    3. PXB-3227: Rename table and then drop table, make sure that original table was deleted
    aybek authored and satya-bodapati committed Oct 29, 2024
    bbe293a
  8. 60d609e
  9. We want to note that table will be copied only after it has been open…

    …ed and loaded to cache;
    
    moving ddl_tracker->add_table to the correct spot
    aybek authored and satya-bodapati committed Oct 29, 2024
    c997a63
  10. PXB-3113 : Improve debug sync framework to allow PXB to pause and res…

    …ume threads
    
    https://perconadev.atlassian.net/browse/PXB-3113
    
    The current debug-sync option in PXB completely suspends the PXB process, and the user can resume it by sending the SIGCONT signal.
    This is useful for scenarios where PXB is paused, certain operations are done on the server, and then PXB is resumed to complete.
    
    But many bugs we found during testing involve multiple threads in PXB. The goal of this work is to be able to
    pause and resume an individual thread.
    
    Since many tests use the existing debug-sync option, I don't want to disturb these tests. We can convert them to
    the new mechanism later.
    
    How to use?
    -----------
    The new mechanism is used with the option --debug-sync-thread="sync_point_name"
    
    In the code, place a debug_sync_thread("debug_point_1") call to stop the thread at that place.
    
    You can pass the debug_sync point via the command line: --debug-sync-thread="debug_sync_point1"
    
    PXB will create a file named after the debug_sync point in the backup directory. The file name is suffixed with a thread number.
    Please ensure that no two debug_sync points use the same name (it doesn't make sense to have two sync points with the same name)
    
    ```
    2024-03-28T15:58:23.310386-00:00 0 [Note] [MY-011825] [Xtrabackup] DEBUG_SYNC_THREAD: sleeping 1sec.  Resume this thread by deleting file /home/satya/WORK/pxb/bld/backup//xb_before_file_copy_4860396430306702017
    ```
    In the test, after activating the syncpoint, you can use wait_for_debug_sync_thread_point <syncpoint_name>
    
    Do some stuff now. This thread is sleeping.
    
    Once you are done, and if you want the thread to resume, you can do so by deleting the file: `rm backup_dir/sync_point_name_*`
    Please use resume_debug_sync_thread_point <syncpoint_name> <backup_dir>. It deletes the syncpoint file and additionally checks that the syncpoint is
    indeed resumed.
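
    The pause/resume handshake described above boils down to "create a marker file, sleep until it is deleted". A rough sketch under stated assumptions: `pause_at_sync_point` and `marker_exists` are hypothetical names, and the poll limit exists only so the sketch terminates; per the log above, the real thread sleeps in 1-second steps until the file is removed.

    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cstdio>
    #include <fstream>
    #include <string>
    #include <thread>

    // True if the marker file can still be opened for reading.
    inline bool marker_exists(const std::string &path) {
      std::ifstream f(path);
      return f.good();
    }

    // Create the marker file, then poll until it has been deleted.
    // Returns true once the file is gone, false if max_polls elapsed first.
    inline bool pause_at_sync_point(const std::string &marker_path,
                                    std::chrono::milliseconds poll_interval,
                                    int max_polls) {
      { std::ofstream marker(marker_path); }  // creates an empty marker file
      for (int i = 0; i < max_polls; ++i) {
        if (!marker_exists(marker_path)) return true;  // resumed by `rm`
        std::this_thread::sleep_for(poll_interval);
      }
      return !marker_exists(marker_path);
    }
    ```

    A test would resume the thread by removing the marker, which is what `rm backup_dir/sync_point_name_*` does in the description above.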
    
    More common/complicated scenario:
    ----------------------------------
    The scenario is to signal another thread to stop after reaching the first sync point. To achieve this, do steps 1 to 3 (above).
    
    Echo the debug_sync point name into a file named “xb_debug_sync_thread”. Example:
    
    4. echo "xtrabackup_copy_logfile_pause" > backup/xb_debug_sync_thread
    
    5. send SIGUSR1 signal to PXB process. kill -SIGUSR1 496102
    
    6. Wait for syncpoint to be reached. wait_for_debug_sync_thread <syncpoint_name>
    
    PXB acknowledges it
    2024-03-28T16:05:07.849926-00:00 0 [Note] [MY-011825] [Xtrabackup] SIGUSR1 received. Reading debug_sync point from xb_debug_sync_thread file in backup directory
    2024-03-28T16:05:07.850004-00:00 0 [Note] [MY-011825] [Xtrabackup] DEBUG_SYNC_THREAD: Deleting  file/home/satya/WORK/pxb/bld/backup//xb_debug_sync_thread
    
    and then prints this once the sync point is reached.
    2024-03-28T16:05:08.508830-00:00 1 [Note] [MY-011825] [Xtrabackup] DEBUG_SYNC_THREAD: sleeping 1sec.  Resume this thread by deleting file /home/satya/WORK/pxb/bld/backup//xb_xtrabackup_copy_logfile_pause_10389933572825668634
    
    At this point, we have two threads sleeping at two sync points. Either of them can be resumed by deleting the filenames mentioned in the error log.
    (Or use resume_debug_sync_thread())
    satya-bodapati committed Oct 29, 2024
    f1b54f7
  11. PXB-3252 : Xtrabackup failed to read page after 10 retries. File ./my…

    …sql.ibd seems to be corrupted.
    
    https://perconadev.atlassian.net/browse/PXB-3252
    
    Problem:
    --------
    With lock-ddl=REDUCED, ALTER ENCRYPTION='Y'/'N' can happen during the backup. On general tablespaces, this is done in place,
    i.e. the space_id of the tablespace does not change and the pages are encrypted or decrypted.
    
    For file-per-table tablespaces, a new tablespace is created with the encryption key and data is copied from
    the old tablespace to the new tablespace.
    
    In xtrabackup, the files are discovered and then they are copied. Between these two operations, the encrypted
    tablespace can change. For example, PXB saw that ts1.ibd is encrypted with key1 and loaded it into the cache.
    
    Then the server did ENCRYPTION='N' and then back to ENCRYPTION='Y'; now the tablespace is encrypted with a different key.
    
    Now the PXB copy thread tries to copy this tablespace and cannot decrypt a page. Page 0 is always unencrypted, so the
    problem is typically detected at page 1, but it can happen on any page.
    
    Since PXB cannot decrypt the page, it reports corruption and aborts the backup.
    
    Fix:
    ----
    On decryption errors, we track such tablespaces in a separate corrupted list. We also add them to the recopy tables list.
    Under lock, these tablespaces are copied again. A .new extension is used.
    Then we process the corrupted list under lock and create .corrupt files for the tablespaces in the corrupted list.
    For example, if the encrypted tablespace is ts1.ibd, the file will be ts1.ibd.corrupted.
    
    On prepare, we delete the corresponding ts1.ibd if ts1.ibd.corrupted is present. This has to be done before the
    *.ibd scan because tablespace loading aborts on processing such half-written tablespaces.
    If the .corrupted file is present in the incremental directory, delete the ts1.ibd.meta and ts1.ibd.delta files from the incremental
    backup directory.
    satya-bodapati committed Oct 29, 2024
    042e105
  12. PXB-3246 : Assertion failure: log0recv.cc:2141:!page || fil_page_type…

    …_is_index(page_type)
    
    Problem:
    --------
    Unable to apply a redo log record because the page is in the wrong state. It was observed that
    the tablespace is created by the incremental backup.
    
    How did this happen?
    --------------------
    
    Let's say the tablespace is t1.ibd and it is present in the full backup.
    Before the incremental backup, it gets renamed to t2.ibd.
    The incremental backup creates t2.ibd.delta and t2.ibd.meta files in the incremental backup directory.
    Later t2.ibd is dropped, so we have a space_id.del file in the incremental backup directory.
    Some redo is also generated on this table before it is dropped.
    
    During prepare of the incremental backup, when we process a space_id.del file, we check whether the tablespace is found.
    Let's say it is 2.del. To process 2.del, we first look up the tablespace with space_id 2.
    Since the tablespace name is t1.ibd in the full backup directory, we delete it. Additionally,
    we delete the .delta and .meta files, so we try to delete the t1.ibd.meta and t1.ibd.delta files.
    They never existed, so we ignore the errors from deleting them.
    
    But in the incremental backup directory, we still have the t2.ibd.delta and t2.ibd.meta files. So the incremental backup prepare
    creates a tablespace with space_id 2 and applies the delta file changes. This tablespace is wrong
    because we are recreating a dropped tablespace and we don't have all the changes. The incremental backup
    creates this tablespace with 7 all-zero pages. Later, when we apply MLOG_INSERT to the index page,
    we find out that the page is NOT in the correct state.
    
    Fix:
    ----
    We have to delete the right incremental files based on space_id. So we build a meta map by scanning
    the *.meta files, with space_id (found in the meta file) as the key.
    
    Later, when we process the space_id.del file, after removing the tablespace with that space_id,
    we now ask the meta map cache for the .delta and .meta files belonging to the deleted space_id.
    By deleting the unnecessary .meta and .delta files, the tablespace is considered dropped by redo
    and the corresponding redo entries are not applied.
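
    A small sketch of the lookup this fix relies on, assuming a map from space_id to the tablespace path without the .meta/.delta suffix. `meta_map_sketch_t`, `build_meta_map` and `files_for_deleted_space` are hypothetical names, and the scan result is passed in as plain pairs instead of being read from real .meta files.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // space_id -> tablespace path without suffix, e.g. 2 -> "test/t2.ibd".
    // Hypothetical layout; the real meta map may store more per entry.
    using meta_map_sketch_t = std::unordered_map<uint32_t, std::string>;

    // Build the map from (space_id, path) pairs, as a *.meta scan would
    // produce after reading the space_id stored in each .meta file.
    inline meta_map_sketch_t build_meta_map(
        const std::vector<std::pair<uint32_t, std::string>> &scanned) {
      meta_map_sketch_t map;
      for (const auto &entry : scanned) map[entry.first] = entry.second;
      return map;
    }

    // Files to remove when processing <space_id>.del. The lookup is by
    // space_id, so a rename between backups (t1 -> t2) does not matter.
    inline std::vector<std::string> files_for_deleted_space(
        const meta_map_sketch_t &map, uint32_t space_id) {
      const auto it = map.find(space_id);
      if (it == map.end()) return {};
      return {it->second + ".meta", it->second + ".delta"};
    }
    ```

    With the scenario above, processing 2.del yields t2.ibd.meta and t2.ibd.delta even though the full backup still knows the tablespace as t1.ibd.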
    satya-bodapati committed Oct 29, 2024
    abbf67b
  13. PXB-3253 : [ERROR] [MY-012592] [InnoDB] Operating system error number…

    … 2 in a file operation
    
    https://perconadev.atlassian.net/browse/PXB-3253
    
    Problem:
    --------
    Files disappear during backup with --lock-ddl=reduced
    
    Analysis:
    ---------
    PXB opens server files using os_file_create_simple_no_error_handling() via Fil_shard::open_file(),
    Fil_shard::get_file_size(), and Datafile::open_read_only. This API doesn't tolerate file open errors.
    
    This particular bug occurs when the file disappears after get_file_size() in Fil_shard::open_file().
    (See the testcase for more details).
    
    Fix:
    ----
    If lock-ddl is REDUCED and we have not yet entered the copy-under-lock phase,
    i.e. is_server_locked() is false, we can tolerate file open errors. So we use the function/API
    os_file_create() instead of other variants. Within this, based on the lock-ddl reduced mode, we
    tolerate file opening errors.
    satya-bodapati committed Oct 29, 2024
    0caf7a7
  14. PXB-3223 : PXB must not allow --lock-ddl=REDUCED when pagetracking is…

    … enabled
    
    Problem:
    -------
    We cannot allow page tracking with lock-ddl=REDUCED. This is because page tracking gives
    us a set of page_ids (space_id, page_no). PXB should copy these pages, and while we copy
    them, tablespaces can disappear, get renamed, get encrypted, etc.
    
    We will enable it if there is a need or use case for this. For now, we will disable it.
    
    Fix:
    ----
    Disable the combination of --page-tracking and --lock-ddl=REDUCED
    satya-bodapati committed Oct 29, 2024
    77dcf01
  15. PXB-3120 : Assertion failure: Dir_Walker::is_directory

    Problem:
    --------
    InnoDB assumes directories or files do not disappear. This is true
    for the engine because it is in startup mode and no operations are allowed
    at this point in time.
    
    Analysis:
    ---------
    With lock-ddl=REDUCED, tables can be dropped concurrently while PXB does the *.ibd scan,
    and subdirectories can disappear too.
    
    Fix:
    ----
    Handle walk_posix() for missing files/directories. The scan should continue and skip
    these deleted files or directories.
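
    A sketch of a directory walk that skips entries which vanish mid-scan, written against plain POSIX calls. It is not the actual walk_posix() change; `walk_tolerant` is a hypothetical name, and treating only ENOENT as "dropped concurrently" is an assumption.

    ```cpp
    #include <cassert>
    #include <cerrno>
    #include <dirent.h>
    #include <string>
    #include <sys/stat.h>
    #include <vector>

    // Recursively collect regular files under dir. A directory that is gone
    // before opendir(), or an entry that is gone before lstat(), is skipped.
    // Returns false only for errors other than ENOENT.
    inline bool walk_tolerant(const std::string &dir,
                              std::vector<std::string> *files) {
      DIR *d = opendir(dir.c_str());
      if (d == nullptr) return errno == ENOENT;  // directory was dropped
      bool ok = true;
      while (struct dirent *ent = readdir(d)) {
        const std::string name = ent->d_name;
        if (name == "." || name == "..") continue;
        const std::string path = dir + "/" + name;
        struct stat st;
        if (lstat(path.c_str(), &st) != 0) {
          if (errno == ENOENT) continue;  // file was dropped concurrently
          ok = false;
          break;
        }
        if (S_ISDIR(st.st_mode)) {
          if (!walk_tolerant(path, files)) {
            ok = false;
            break;
          }
        } else if (S_ISREG(st.st_mode)) {
          files->push_back(path);
        }
      }
      closedir(d);
      return ok;
    }
    ```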
    satya-bodapati committed Oct 29, 2024
    fda5b94
  16. PXB-3278 : Wrong parsing of MLOG_FILE_ redo log records with lock-ddl…

    …=REDUCED
    
    Problem:
    --------
    ddl_tracker_t::backup_file_op assumes the required redo bytes are always present (see the assertion on len < 6).
    But we may sometimes receive fewer redo bytes than that. In such cases, we return nullptr and let the caller read more redo and retry.
    
    Fix:
    ----
    The fil_tablespace_redo_create()/rename()/delete() variants already handle this problem by returning nullptr so that more redo is read and the parse is retried.
    Moved the ddl tracker calls so that tracking happens after the validation is done.
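
    The "return nullptr and retry with more redo" pattern can be sketched on a made-up record format. The layout below (2-byte big-endian length followed by that many name bytes) is purely illustrative and is not the real MLOG_FILE_* encoding; `parse_file_record_sketch` is a hypothetical name.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Parse one illustrative record from [ptr, end).
    // Returns a pointer just past the record, or nullptr when the buffer
    // does not yet hold the complete record; the caller is then expected
    // to read more redo and call again. Any tracking of the record should
    // happen only after a non-null return, i.e. after validation.
    inline const unsigned char *parse_file_record_sketch(
        const unsigned char *ptr, const unsigned char *end, size_t *name_len) {
      if (end - ptr < 2) return nullptr;  // length prefix incomplete
      const size_t len = (static_cast<size_t>(ptr[0]) << 8) | ptr[1];
      ptr += 2;
      if (static_cast<size_t>(end - ptr) < len) return nullptr;  // name incomplete
      *name_len = len;
      return ptr + len;
    }
    ```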
    aybek authored and satya-bodapati committed Oct 29, 2024
    fbf43c8
  17. PXB-3281 : With lock-ddl=REDUCED, STL containers used by reduced code…

    … are not thread safe
    
    Problem:
    -------
    xtrabackup uses multiple threads to scan the *.ibd files. With lock-ddl=reduced, we use several STL maps to keep track of missing, dropped or renamed tables.
    
    Multiple threads are used only when the number of IBDs is more than 8K.
    
    Unsafe calls:
      ddl_tracker->add_missing_table(phy_filename);
      ddl_tracker->add_renamed_table(space_id, path);
    
    These calls from multiple threads operate on std::map/unordered_map and can cause race conditions.
    
    Fix:
    ----
    1. Streamline mutex usage for the entire ddl_tracker class. Currently it is used only for the corrupted STL map.
    2. Use space id instead of table id in messages
    3. Rename add_table() since the name is confusing. The actual map elements should be renamed too; that will be done later
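
    A minimal sketch of the locking pattern: every access to the shared containers goes through one mutex. `tracker_sketch_t` is a hypothetical class that only mirrors the two calls named above; the real ddl_tracker has more state and a different interface.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <mutex>
    #include <string>
    #include <unordered_map>
    #include <unordered_set>

    class tracker_sketch_t {
     public:
      void add_missing_table(const std::string &path) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_missing.insert(path);
      }

      void add_renamed_table(uint32_t space_id, const std::string &new_path) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_renamed[space_id] = new_path;
      }

      size_t missing_count() const {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_missing.size();
      }

      // Returns an empty string when the space_id has no recorded rename.
      std::string renamed_path(uint32_t space_id) const {
        std::lock_guard<std::mutex> guard(m_mutex);
        const auto it = m_renamed.find(space_id);
        return it == m_renamed.end() ? std::string() : it->second;
      }

     private:
      mutable std::mutex m_mutex;  // guards both containers below
      std::unordered_set<std::string> m_missing;
      std::unordered_map<uint32_t, std::string> m_renamed;
    };
    ```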
    satya-bodapati committed Oct 29, 2024
    ddb8674
  18. PXB-3241 : Assertion failure: os0file.cc:3416:!exists while taking ba…

    …ckups with lock-ddl=REDUCED
    
    Problem:
    --------
    Backups taken with lock-ddl=reduced, prepare failed to complete.
    
    Analysis:
    ---------
    When handling .ren files, the destination file name already exists and this causes an assertion failure. See the backup log below.
    
    ```
    102: 2024-02-28T12:15:49.061631-00:00 1 [Note] [MY-011825] [Xtrabackup] DDL tracking : LSN: 73749548 create table ID: 788 Name: test/#sql-1fc79d_13#p#p3.ibd
    
    423: 2024-02-28T12:15:50.121767-00:00 1 [Note] [MY-011825] [Xtrabackup] DDL tracking : LSN: 74312497 rename table ID: 425 From: test/tt_28_p#p#p3.ibd To: test/#sql2-1fc79d-13#p#p3.ibd
    
    870: 2024-02-28T12:15:50.609015-00:00 2 [Note] [MY-011825] [Xtrabackup] Copying ./test/#sql-1fc79d_13#p#p3.ibd to /home/mohit.joshi/dbbackup_28_02_2024/full/test/#sql-1fc79d_13#p#p3.ibd
    
    1337 2024-02-28T12:15:51.183699-00:00 1 [Note] [MY-011825] [Xtrabackup] DDL tracking : LSN: 74967007 rename table ID: 788 From: test/#sql-1fc79d_13#p#p3.ibd To: test/tt_28_p#p#p3.ibd
    
    1491: 2024-02-28T12:15:51.398615-00:00 2 [Note] [MY-011825] [Xtrabackup] Copying ./test/tt_28_p#p#p3.ibd to /home/mohit.joshi/dbbackup_28_02_2024/full/test/tt_28_p#p#p3.ibd
    
    2115:  2024-02-28T12:15:52.209645-00:00 1 [Note] [MY-011825] [Xtrabackup] DDL tracking : LSN: 75267178 delete table ID: 425 Name: test/#sql2-1fc79d-13#p#p3.ibd
    ```
    
    What's going on here?
    
    Let's say we have partition p3 with space id 425. This is being altered. So the partition algorithm does this:
    
    1. create a new copy with space_id 788 (#sql1).
    2. rename the existing table 425 to some temp table name (#sql2)
    3. we copied the new copy, space_id 788 (#sql1), to the backup.
    4. we also copied space_id 425 with its original name (p3).
    5. Later we saw a rename for the copied tablespace 788 (788.ren is created with destination name tt_28_p#p#p3.ibd).
    
    The rename file for space_id 425 is skipped, because we know that it is dropped. So there is only a .del file. The final state of the backup is:
    
    ===
    
    788 in backup with name #sql1
    425 in backup with name p3
    788.ren file-> 788 From: test/#sql-1fc79d_13#p#p3.ibd To: test/tt_28_p#p#p3.ibd
    425.drop file
    
    ===
    
    Now prepare starts to process the .ren files.
    It tries to rename 788 from #sql1 to p3, but p3 already exists.
    
    Fix:
    ----
    We skip rename and other operations if we know that the tablespace is going to be dropped.
    In the above example, we skipped the 425.ren file.
    
    So while preparing, we should handle the .del files first. In effect, we then apply all the consolidated operations.
    After that, the .ren files can be processed.
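
    The ordering problem can be reproduced on a toy model of the backup directory (space_id to file name). The helper names and `rename_op_sketch_t` are hypothetical, and the file names are shortened as in the explanation above.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // One <space_id>.ren file: rename that tablespace to `to`.
    struct rename_op_sketch_t {
      uint32_t space_id;
      std::string to;
    };

    // Apply all <space_id>.del files.
    inline void apply_deletes(std::map<uint32_t, std::string> *backup,
                              const std::vector<uint32_t> &dels) {
      for (uint32_t id : dels) backup->erase(id);
    }

    // Apply all .ren files. Returns false, without applying the offending
    // rename, if the destination name is still held by another tablespace.
    inline bool apply_renames(std::map<uint32_t, std::string> *backup,
                              const std::vector<rename_op_sketch_t> &rens) {
      for (const rename_op_sketch_t &op : rens) {
        for (const auto &kv : *backup) {
          if (kv.first != op.space_id && kv.second == op.to) return false;
        }
        const auto it = backup->find(op.space_id);
        if (it != backup->end()) it->second = op.to;
      }
      return true;
    }
    ```

    Running the renames first collides on p3, which corresponds to the `!exists` assertion; running the deletes first removes 425 and lets 788 take the p3 name.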
    satya-bodapati committed Oct 29, 2024
    3b2891b
  19. PXB-3245 : Assertion failure: fil0fil.cc:2545:err == DB_SUCCESS found…

    … during incremental backup with lock-ddl=REDUCED
    
    Problem:
    --------
    The file is deleted between PXB discovery and opening the file, this time at Fil_shard::create_node.
    It insists on the file being found.
    
    Fix:
    ----
    1. Tolerate the file missing error
    2. Use a different error code to track missing files
    3. Free the tablespace object on error (otherwise, if fil_space_t remains in the cache, PXB will try to copy the file)
    satya-bodapati committed Oct 29, 2024
    6496fc3
  20. PXB-3280 : undo log truncation causes assertion failure with reduced …

    …lock
    
    Problem:
    -------
    With lock-ddl=reduced and concurrent undo truncations, xtrabackup fails with an
    assertion.
    
    Analysis:
    ---------
    After truncation, the tablespace id of an undo tablespace might change, and xtrabackup returns an error instead of crashing. But higher layers of undo discovery do not
    tolerate missing files or errors.
    
    Fix:
    ---
    Tolerate file deletions during undo discovery
    satya-bodapati committed Oct 29, 2024
    fa718d9
  21. PXB-3248 Multiple files found for the same tablespace ID

        1. In the prepare phase, when handling ddl files, some of the data files were not loaded to cache because of the first page validation and were therefore left without DDLs applied to them.
           To tolerate this, we open and load data files to cache without validation, using the fil_tablespace_open_for_recovery() function instead of fil_open_for_xtrabackup().
        2. Remove macro checks for debug_sync_thread() temporarily, for QA testing
    aybek authored and satya-bodapati committed Oct 29, 2024
    5e35687
  22. PXB-3034: Bring back UNIV_DEBUG on debug-sync-thread.

    This will be debug only option. Fix release build issues
    satya-bodapati committed Oct 29, 2024
    2bbb9d0
  23. PXB-3320 : prepare_handle_del_files() fails to delete the .meta and .…

    …delta files for deleted tablespaces in incremental backup directory
    
    Problem:
    --------
        1. take full backup with --lock-ddl=reduced
    
        2. create table t1(a INT), lets say space_id 10
    
        3. start incremental backup and pause before backup_start() function (we take the backup lock here)
    
        4. incremental backup copied t1.ibd.meta and t1.ibd.delta by this time
    
        5. DROP TABLE t1
    
        6. resume the incremental backup. 10.del file is created
    
        7. prepare the full backup with --apply-log-only
    
        8. prepare incremental backup
    
        The incremental backup prepare first processes the .del files. Before this, all tablespaces are loaded via the .ibd scan.
    
        Since there is no t1.ibd in the backup directory (it is only present as .meta and .delta files in the incremental backup directory), space_id 10 is not in the cache.
    
        Hence prepare_handle_del_files() will not delete the files related to space_id 10.
        We end up with orphan .ibd or .ibu files. The server ignores orphan .ibd files.
        But if the tablespace is an undo tablespace, orphan .ibu files are not ignored by the server.
        The server discovers them via the *.ibu scan. This can lead to assertion failures.
    satya-bodapati committed Oct 29, 2024
    ff34687
  24. PXB-3318 : prepare_handle_ren_files(): failed to handle .ren files

    Problem:
    -------
    1. take full backup
    2. create table t1 before incremental backup
    3. Take incremental backup under gdb and pause at backup_start
    4. now rename t1 to t2
    5. let it finish
    6. prepare full
    7. prepare incremental
    
    It happens because tables created between full and incremental are copied as .delta/*.meta files and not as IBD files.
    prepare_handle_ren_files() relies on *.ibd scan but this cannot work as the *.delta files are not yet loaded to fil_cache.
    
    Fix:
    ----
    Use the meta_map generated from *.meta scan. Use the space_id from space_id.ren to identify the correct .meta and delta files.
    Rename the matched .meta and .delta files to the destination name stored in the .ren file
    satya-bodapati committed Oct 29, 2024
    3ddcafd
  25. PXB-3295 : Undo tablespaces are not tracked properly with lock-ddl=RE…

    …DUCED
    
    Problem:
    --------
    1.
    Undo tablespaces are not tracked properly. Since undo tablespaces are not opened via
    fil_open_for_xtrabackup(), they are not tracked as 'copied'. This leads to wrong
    decisions in handle_ddl_operations.
    
    2.
    When new undo tablespaces are created,
    Server doesn't write a MLOG_FILE_CREATE record and so these are missed by the tracking
    system.
    
    Fix:
    ----
    1. track undo tablespaces that xtrabackup copies without lock (before lock state)
    2. After the lock is taken, undo tablespaces are discovered again (after lock state)
    
    With these before and after states, we now determine the undo files to be deleted and
    the undo files to be copied. A truncated undo tablespace uses a different tablespace id, so
    the old undo file is marked as deleted and the new version of the undo tablespace is copied.
    
    For example, if undo_001.ibu with space_id 10 is truncated, the filename remains the same
    but its space_id becomes 11. xtrabackup creates 11.ibu.del for the undo tablespace to be deleted.
    Then undo_001.ibu.new with space_id 11 is copied under lock.
    satya-bodapati committed Oct 29, 2024
    9807166
  26. PXB-3295: fix testcase

    satya-bodapati committed Oct 29, 2024
    336d2b7
  27. f52b166
  28. PXB-3221 : Assertion failure: page0cur.cc:1177:ib::fatal triggered du…

    …ring prepare/or next server startup
    
    Problem:
    --------
    If there are tables created in the system tablespace and ALTER ADD INDEX/DROP INDEX is executed before the
    backup lock is taken, the system tablespace could end up in a corrupted state.
    
    This is because this operation is not redo-logged and we are supposed to recopy the system tablespace files.
    But we don't track the system tablespace, nor reopen and recopy it. Hence this issue.
    
    Fix:
    ----
    1. Track the system tablespace in the list of tables tracked/backed up
    2. Remove tracking for tables in the system tablespace, except for recopy. Other operations can be replayed via the redo log
    3. Reopen the system tablespace files and recopy them as ibdata1.new/ibdata2.new
    satya-bodapati committed Oct 29, 2024
    32fb643
  29. PXB-3331 : Assertion failure: fil0fil.cc:6422:success

    Problem:
    -------
    During prepare, for backups taken with lock-ddl=ON, we did *.ibd scan before
    recovery.
    
    This is allowed only for lock-ddl=REDUCED.
    
    Fix:
    ----
    During prepare, do *.ibd scan and processing of .new, .del, .ren, .corrupt files only if lock-ddl=REDUCED
    satya-bodapati committed Oct 29, 2024
    e737b19
  30. PXB-3349 : Fix compilation errors on some platforms

    Errors are caused by usage of the const iterator cend(). Replaced with
    end().
    satya-bodapati committed Oct 29, 2024
    ed9b360
  31. PXB-3350 : Display verbose DDL states maintained in reduced lock mode

    Print all maps, sets, vector used for reduced lock feature
    satya-bodapati committed Oct 29, 2024
    58b18c0
  32. PXB-3330 : Show the total time that xtrabackup locked the server

    xtrabackup can lock the server with three types of backup locks:
    1. On upstream, LOCK INSTANCE FOR BACKUP
    2. On PS, LOCK TABLES FOR BACKUP
    3. On upstream, if there are MyISAM tables or other non-InnoDB tables, we also
       take FLUSH TABLES WITH READ LOCK (this is stronger than the backup lock,
       under which DMLs are allowed)
    
    Display the total time the server is locked by xtrabackup.
    
    2024-08-16T12:06:06.602134+01:00 0 [Note] [MY-011825] [Xtrabackup] Total time Server is locked by LOCK INSTANCE FOR BACKUP is: 1.634 seconds
    2024-08-16T12:06:06.602153+01:00 0 [Note] [MY-011825] [Xtrabackup] Total time Server is locked by FLUSH TABLES WITH READLOCK is: 134.165 milliseconds
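
    A sketch of accumulating the time a lock was held and formatting it like the messages above. `lock_timer_sketch_t` is a hypothetical class, and switching from milliseconds to seconds at the 1-second mark is an assumption inferred from the two sample lines, not a documented rule.

    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cstdio>
    #include <string>

    class lock_timer_sketch_t {
     public:
      // Add one interval during which the lock was held.
      void add(std::chrono::microseconds held) { m_total += held; }

      // Format the accumulated time with three decimals.
      std::string to_string() const {
        char buf[64];
        const double us = static_cast<double>(m_total.count());
        if (m_total >= std::chrono::seconds(1)) {
          std::snprintf(buf, sizeof buf, "%.3f seconds", us / 1e6);
        } else {
          std::snprintf(buf, sizeof buf, "%.3f milliseconds", us / 1e3);
        }
        return buf;
      }

     private:
      std::chrono::microseconds m_total{0};
    };
    ```

    In real use the intervals would come from a monotonic clock such as std::chrono::steady_clock, sampled when the lock is taken and when it is released.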
    satya-bodapati committed Oct 29, 2024
    83e9ce4
  33. 4524cde
  34. PXB-3351 : Fix undo.sh random failure

    The test wrongly assumes that an open transaction will block all undo
    truncations. Tablespaces that are marked as inactive are
    not blocked by an open transaction.
    
    Modified it to check if the truncated tablespace is recopied after
    truncation.
    satya-bodapati committed Oct 29, 2024
    de534fa
  35. Fix memory leak seen on Jenkins.

    This is caused by fil_space_read_name_and_filepath(), which expects
    the caller to free the memory for the returned values.
    
    Use scope_guard and simplify the freeing logic. It is now auto-freed
    on scope exit.
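
    For readers unfamiliar with the idiom, a scope guard runs a cleanup callable when it goes out of scope, on every exit path. The sketch below is a generic stand-in with hypothetical names (`scope_guard_sketch`, `make_guard_sketch`), not the scope_guard utility used in the codebase.

    ```cpp
    #include <cassert>
    #include <utility>

    // Runs the stored callable exactly once, when the guard is destroyed.
    template <typename F>
    class scope_guard_sketch {
     public:
      explicit scope_guard_sketch(F f) : m_f(std::move(f)), m_active(true) {}
      scope_guard_sketch(scope_guard_sketch &&other)
          : m_f(std::move(other.m_f)), m_active(other.m_active) {
        other.m_active = false;  // the moved-from guard must not fire
      }
      scope_guard_sketch(const scope_guard_sketch &) = delete;
      scope_guard_sketch &operator=(const scope_guard_sketch &) = delete;
      ~scope_guard_sketch() {
        if (m_active) m_f();
      }

     private:
      F m_f;
      bool m_active;
    };

    template <typename F>
    scope_guard_sketch<F> make_guard_sketch(F f) {
      return scope_guard_sketch<F>(std::move(f));
    }
    ```

    A caller would register the free of the returned name and filepath right after the call that allocates them, so early returns no longer leak.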
    satya-bodapati committed Oct 29, 2024
    34165d4
  36. PXB-3352 : Fix jenkins test failures

    Fix t/bug1461735.sh and t/ddl.sh test failures. These tests use
    lock-ddl=OFF and DDL during the backup. Although unsupported,
    the testcase exists.
    
    Regression introduced by the PXB-3246 fix. A part of the fix, which suppresses
    warnings about missing files (they are tracked and deleted by DDL
    handling at startup of prepare), introduced a problem: tablespaces
    are not deleted by processing of MLOG_FILE_DELETE redos (happens only if
    the backup is taken with lock-ddl=OFF).
    
    Fixed the logic: for backups taken with lock-ddl=reduced,
    track the deletions by updating the recv_sys deleted and missing maps.
    satya-bodapati committed Oct 29, 2024
    b6d1ee2
  37. PXB-3352 : Fix jenkins test failures

    Fix memory leak seen by undo.sh
    
    On error path, memory allocated is not freed.
    satya-bodapati committed Oct 29, 2024
    b73ec23
  38. PXB-3334 Code cleanups in reduced lock feature

    made some minor code changes for better readability
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    8b767ff
  39. PXB-3334 Code cleanups in reduced lock feature

    removed argument `prep_handle_ddls` from all using functions, using xtrabackup_prepare instead
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    e9d0d97
  40. f464bbc
  41. PXB-3334 Code cleanups in reduced lock feature

    Add error handling for prepare_handle_rename() and prepare_handle_del() operations and some minor code refactoring
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    2935136
  42. Addressing the PR review comments

    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    9d6961f
  43. 4bf6795
  44. PXB-3269

    1. Move to_string to utils.cc
    2. Add checks to ensure the ddl_trackers maps are not updated after
       we reach handle_ddl_operations()
    satya-bodapati committed Oct 29, 2024
    1985f69
  45. 5a15a2e
  46. e7eff0f
  47. PXB-3269: Address self review comments

    Add #ifdef XTRABACKUP around the code introduced in the innobase codebase.
    (Full coverage is not possible in places that are heavily refactored.)
    satya-bodapati committed Oct 29, 2024
    8dba6c0
  48. e7991db
  49. PXB-3368 Fix jenkins test failures

    Initialize the handle_ddl_ops variable to false.
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    7674243
  50. 181423f
  51. PXB-3269 : Fix debug assertion failure on prepare_handle_ren() files.

    It is possible that, for a space_id.ren file whose content is the
    desired file name, the destination file already exists.
    
    If the source and destination of the rename are the same, skip the rename.
    satya-bodapati committed Oct 29, 2024
    8e6ed60
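The skip-if-identical check from the commit above can be sketched as follows. `RenameAction` and `plan_rename()` are hypothetical names used only for illustration; the real logic lives in xtrabackup's prepare-time rename handling and may compare paths differently.

```cpp
#include <cassert>
#include <string>

enum class RenameAction { SKIP_SAME_NAME, RENAME };

// Hypothetical helper: decide what to do with one .ren entry.
// `source` is the file currently in the backup, `destination` is the name
// recorded in the space_id.ren file.
RenameAction plan_rename(const std::string &source,
                         const std::string &destination) {
  // Renaming a file onto itself is a no-op, so skip it.
  if (source == destination) return RenameAction::SKIP_SAME_NAME;
  return RenameAction::RENAME;
}
```

This sketch compares the two names as plain strings and assumes both are already in the same normalized form.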
  52. PXB-3368 Fix jenkins test failures

    Fix keyring test failures by replacing keyring_file with keyring_component
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    5af6918
  53. PXB-3370 : Full Backup fails while open file limit is reduced to 1024…

    … from 65536
    
    Problem:
    --------
    A regression introduced by 6c9aa00 caused extra opening of files.
    
    Fix:
    ----
    remove extra file open
    satya-bodapati committed Oct 29, 2024
    661ed2c
  54. PXB-3374 : Space ID is missing encryption information

    With lock-ddl=REDUCED, the following sequence can happen (not possible
    with lock-ddl=ON):
    
        1. First, an IBD with all-zero (invalid) encryption info is found.
        2. Its space ID is added to the invalid encryption IDs.
        3. The same IBD is found again (because of concurrent DDL, it is found under two different names).
        4. This time the IBD has proper encryption info, so a fil_space_t is created.
        5. Encryption info from redo is parsed by fil_tablespace_redo_encryption().
        6. Because the fil_space_t exists, the encryption key is not added to the recovery encryption keys map.
        7. Later, at the end of the backup, we check whether a valid encryption key was found for each invalid-encryption space ID.
        8. It was not (at step 6, we skipped it).
        9. The backup is aborted.
    
    Fix:
    ----
    Store keys in the recv_sys->keys map even if the tablespace already exists.
    satya-bodapati committed Oct 29, 2024
    83ac29a
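The sequence in the commit above can be modelled with a small sketch. `BackupState`, `on_redo_encryption()`, and `all_invalid_ids_resolved()` are hypothetical simplifications invented for this example; the real code works on fil_space_t objects and the recv_sys->keys map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model of the state involved.
struct BackupState {
  std::set<uint32_t> invalid_encryption_ids;  // spaces seen with all-zero info
  std::set<uint32_t> loaded_spaces;           // spaces that have a tablespace object
  std::map<uint32_t, std::string> redo_keys;  // keys parsed from redo
};

// Called when encryption info for `space_id` is parsed from redo.
// store_even_if_loaded == false models the old behaviour (step 6),
// store_even_if_loaded == true models the fix.
void on_redo_encryption(BackupState &s, uint32_t space_id,
                        const std::string &key, bool store_even_if_loaded) {
  const bool loaded = s.loaded_spaces.count(space_id) != 0;
  if (loaded && !store_even_if_loaded) return;  // old: key is dropped
  s.redo_keys[space_id] = key;
}

// End-of-backup check (step 7): every invalid-encryption space needs a key.
bool all_invalid_ids_resolved(const BackupState &s) {
  for (uint32_t id : s.invalid_encryption_ids) {
    if (s.redo_keys.count(id) == 0) return false;
  }
  return true;
}
```

With the old behaviour, a space that is both in the invalid set and already loaded never gets a key, so the final check fails. Storing the key unconditionally lets the check pass.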
  55. PXB-3381 : Implement check to fail early if number of open file handl…

    …es is not same as number of files in datadir
    
    1. Verifies ulimit -Sn, ulimit -Hn, and the current --open-files-limit
       parameter, and increases the limit if possible; otherwise throws an
       error early.
    
    2. Despite the limit increase, the backup may still fail: if new files
       appear, we may need more handles than we first calculated.
    satya-bodapati committed Oct 29, 2024
    09dd30d
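A minimal sketch of the early open-file-limit check described above, using the POSIX getrlimit()/setrlimit() calls. `LimitCheck`, `decide()`, `ensure_open_file_limit()`, and the `needed` parameter are hypothetical names for this example; how xtrabackup actually counts the required handles and combines them with --open-files-limit is not shown here.

```cpp
#include <cassert>
#include <sys/resource.h>

enum class LimitCheck { OK, RAISED, TOO_LOW };

// Pure decision logic: compare the soft and hard limits with the number of
// handles needed (assumed to be derived from the files in the datadir).
LimitCheck decide(rlim_t soft, rlim_t hard, rlim_t needed) {
  if (soft == RLIM_INFINITY || soft >= needed) return LimitCheck::OK;
  if (hard == RLIM_INFINITY || hard >= needed) return LimitCheck::RAISED;
  return LimitCheck::TOO_LOW;  // caller should fail early with an error
}

// Reads RLIMIT_NOFILE (the values behind ulimit -Sn / -Hn) and raises the
// soft limit when the hard limit allows it.
LimitCheck ensure_open_file_limit(rlim_t needed) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return LimitCheck::TOO_LOW;
  const LimitCheck result = decide(lim.rlim_cur, lim.rlim_max, needed);
  if (result == LimitCheck::RAISED) {
    lim.rlim_cur = needed;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) return LimitCheck::TOO_LOW;
  }
  return result;
}
```

As the commit notes, passing this check is not a guarantee: files created after the count was taken can still push the process past the limit.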
  56. PXB-3380 Backup fails when using external tablespaces and external un…

    …do files
    
    Fixed the path for .del/.ren/.new files.
    Removed extra scan of external tablespaces during prepare
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    4fb9a1d
  57. PXB-3380 Extending the tests for external tablespaces

    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    078f661
  58. PXB-3380 Addressing review comments

    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    dc9fd6b
  59. PXB-3380 Addressing review comments

    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    3bbdd49
  60. PXB-3387 : Assertion failure: ddl_tracker.cc:122:!handle_ddl_ops

    Problem:
    --------
    Undo tablespaces are encrypted during the backup. Their key is in
    redo but not in the file yet. During the recopy phase, xtrabackup was
    not able to decrypt the pages and tried to add the tablespace to the
    list of corrupted tablespaces. But this is too late, as we are already
    handling DDL operations.
    
    Fix:
    ----
    Read the encryption keys of undo tablespaces from redo and use them
    for decryption.
    satya-bodapati committed Oct 29, 2024
    2787848
  61. 19ff788
  62. PXB-3388 : PXB fails to take backup of general tablespaces created ou…

    …tside data directory when lock-ddl=REDUCED
    
    Problem:
    -------
    When external general tablespaces are present, with reduced mode,
    xtrabackup created the .crpt files in the external dir instead of
    the backup dir.
    
    Fix:
    ---
    Always create files within backupdir.
    satya-bodapati committed Oct 29, 2024
    1a80dd9
  63. PXB-3390: Table Flags Mismatch Causing Assertion Failure in InnoDB Ta…

    …blespace
    
    Problem:
    ========
    For every IBD file, we open the file twice:
    Datafile::open
    Fil_shard::get_file_size
    
    Between these two operations, a general tablespace may change
    ENCRYPTION Y->N or N->Y, so the tablespace flags can mismatch.
    
    Fix:
    ----
    Tolerate the mismatch, because we track this encryption change and
    recopy the tablespace under lock anyway.
    satya-bodapati committed Oct 29, 2024
    0b265e5
  64. PXB-3269: Jenkins fixes

    1. Dumping tablespace keys has two parts: saving the keys during the
       backup and using them during prepare. We should use keys from the
       dump file during prepare only.
    2. Free the memory allocated by fil_path_to_space_name() using a scope
       guard.
    satya-bodapati committed Oct 29, 2024
    1e4ff2f
  65. PXB-3393 : PXB fails to take backup of general tablespaces created ou…

    …tside data directory when lock-ddl=REDUCED
    
    Problem:
    --------
    When taking a backup with lock-ddl=REDUCED of a general tablespace in an
    external data directory, the external directory is assumed to be the
    schema of the general tablespace. For example, a delete DDL for
    `ext_dir/ext_subdir/gen_tbs.ibd` creates `backup/ext_subdir/space_id.del`
    instead of `backup/space_id.del`.
    
    Fix:
    ----
    Check the tablespace type and strip the leading path for general
    tablespaces and undo tablespaces.
    Aibek Bukabayev authored and satya-bodapati committed Oct 29, 2024
    ff3f321
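The path handling described in the commit above can be sketched as below. `SpaceType` and `marker_path()` are hypothetical names for this example. It assumes that marker files are named after the numeric space ID and that file-per-table markers keep their schema directory; the actual xtrabackup naming rules may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class SpaceType { FILE_PER_TABLE, GENERAL, UNDO };

// Hypothetical helper: path of a marker file (e.g. ".del") relative to the
// backup directory.
std::string marker_path(SpaceType type, const std::string &tablespace_path,
                        unsigned space_id, const std::string &ext) {
  std::string dir;  // stays empty for general and undo tablespaces
  if (type == SpaceType::FILE_PER_TABLE) {
    // Keep only the last directory component, which is the schema name.
    const std::size_t last = tablespace_path.rfind('/');
    if (last != std::string::npos && last > 0) {
      const std::size_t prev = tablespace_path.rfind('/', last - 1);
      const std::size_t start = (prev == std::string::npos) ? 0 : prev + 1;
      dir = tablespace_path.substr(start, last - start + 1);  // "schema/"
    }
  }
  // General and undo tablespaces have no schema, so the leading path is
  // stripped and the marker lands directly in the backup directory.
  return dir + std::to_string(space_id) + ext;
}
```

For the example in the commit message, treating `ext_dir/ext_subdir/gen_tbs.ibd` as a general tablespace yields a marker at the top of the backup directory rather than under `ext_subdir/`.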

Commits on Nov 1, 2024

  1. probuild changes

    satya-bodapati committed Nov 1, 2024
    a022a20