๐Ÿ‘บ
Rookie's Notes
  • ๐Ÿ˜Hi there ๐Ÿ‘‹
  • ๐Ÿฆธโ€โ™‚๏ธKey-value Storage
    • ๐Ÿ˜‡MetaData
    • ๐Ÿ˜‡CAP therom
    • ๐Ÿ˜‡LSM Tree
    • ๐Ÿ˜Google BigTable
    • ๐Ÿ˜Google File System
    • ๐Ÿ˜Google MapReduce
    • ๐ŸงBloom Filter
    • ๐ŸงCuckoo Filter๏ผšBetter Than Bloom
    • ๐ŸคฉLevelDB
      • ๐Ÿค“LevelDB & BigTable
      • ๐Ÿค“SSTable in LevelDB
      • ๐Ÿค“Log source code analysis
    • ๐ŸคฉRocksDB
      • ๐Ÿ˜™RocksDB & LevelDB
      • ๐Ÿ˜šGeneral
      • ๐Ÿ˜šOptimization
    • ๐ŸคฉTiKV
      • ๐ŸฅณGeneral
  • ๐ŸฆธDPU Plus
    • ๐Ÿ˜„General
    • ๐Ÿ˜Meson
    • ๐Ÿ˜SSD
    • ๐Ÿ˜†NVMe
    • ๐ŸฅฐRDMA
      • ๐Ÿ˜Import
      • ๐Ÿ˜…RoCE
      • ๐Ÿ˜‹Elements
      • ๐Ÿ˜‚Options
      • ๐ŸฅณService
      • ๐Ÿ˜ƒMemory Region
      • โ˜บ๏ธProtection Domain
      • ๐Ÿ˜Address Handle
      • ๐Ÿ˜…Queue Pair
      • ๐Ÿ˜‚Completion Queue โ€‹
      • ๐Ÿ˜†Shared Receive Queue
      • ๐Ÿ˜†Verbs
      • ๐Ÿฅฒ็”จๆˆทๆ€ไธŽๅ†…ๆ ธๆ€ไบคไบ’
  • ๐Ÿฆนโ€โ™‚๏ธGPU & DB
    • ๐Ÿ˜€Crystal
    • ๐Ÿ˜„GPU Compression
    • ๐Ÿ˜†Mordred
    • ๐Ÿ˜ƒGPU & RDMA & DB
  • ๐ŸฆธDatabases
    • ๐Ÿ˜CMU 15-445
      • ๐Ÿ˜‰Buffer Pool
        • ๐Ÿ˜„Expand
      • ๐Ÿ˜‰B+ Tree Index
        • ๐Ÿ˜ŒPre: B & B+
        • ๐ŸคฃPre: B+Tree
        • ๐Ÿ˜‚Expand
        • ๐Ÿ˜‚Expand2: Delete
        • ๐Ÿ˜‚Expand3: Index_Iterator
      • ๐Ÿ˜‰Query Execution
      • ๐Ÿ˜‰Concurrency Control
    • ๐Ÿ˜…CMU 15-721
      • ๐Ÿ˜‡02 inmemory
      • ๐Ÿ˜‡03 mvcc1
      • ๐Ÿ˜‡04 mvcc2
      • ๐Ÿ˜‡05 mvcc3
      • ๐Ÿ˜‡06 oltpindexes1
  • ๐Ÿฆธโ€โ™‚๏ธBlock Chain
    • ๐Ÿ˜กUniswap-v2 ๅˆ็บฆๆฆ‚่งˆ
    • ๐Ÿ˜ญๅฏนๆŽฅ Uniswap V2 ๅ…‘ๆขไปฃๅธ
    • ๐Ÿค“bignumer.jsไธญๅธธ่ง่ฟ็ฎ—
  • ๐Ÿฆธโ€โ™‚๏ธUtils
    • ๐Ÿ˜…typename or class?
    • ๐Ÿ˜RALL
    • ๐ŸฅฒSmart Pointers
    • ๐ŸคฃParallelism and Concurrency
    • ๐Ÿ˜‡Lock V1
    • ๐Ÿ˜‡Lock V2
    • ๐ŸฅฐThread Pooling
    • ๐Ÿ˜ฉSkiplist
    • ๐Ÿ˜…Miscellaneous Of C++
  • Group 1
    • ๐Ÿซ‚Personal diaries
      • ๐Ÿ˜‘2021ไธญ็ง‹ๆ‚ๆ„Ÿ
      • ๐Ÿ˜–2022็މ็މๆ—ฅ่ฎฐ๏ผˆไธ€๏ผ‰
      • ๐Ÿ˜š2022็މ็މๆ—ฅ่ฎฐ๏ผˆไบŒ๏ผ‰
      • ๐Ÿค”2022็މ็މๆ—ฅ่ฎฐ๏ผˆไธ‰๏ผ‰
      • โ˜น๏ธ2022็މ็މๆ—ฅ่ฎฐ๏ผˆๅ››๏ผ‰
      • ๐Ÿฅฒ2022็މ็މๆ—ฅ่ฎฐ๏ผˆไบ”๏ผ‰
      • ๐Ÿง2023็•™ๅญฆๆ—ฅ่ฎฐ ๏ผˆไธ€๏ผ‰
Powered by GitBook
On this page
  • Introduction
  • MVCC Deletes
  • Indexes with MVCC Tables
  • Garbage Collection: Design Decisions
  • Index Clean-up
  • Version Tracking
  • Granularity
  • Comparison Unit
  • Block Compaction
  • MVCC Deleted Tuples
  • Block Compaction
  • Truncate Operation
  • References
Edit on GitHub
  1. Databases
  2. CMU 15-721

05 mvcc3

Lecture #05: Multi-Version Concurrency Control (Garbage Collection)

Previous04 mvcc2Next06 oltpindexes1

Last updated 2 years ago

Carnegie Mellon University

Introduction

MVCC maintains multiple physical versions of a single logical object in the database. Over time, old versions will become invisible to active transactions under snapshot isolation. These are reclaimable versions that the DBMS must remove in order to reclaim memory.

Up until now, we have assumed that transactions (OLTP) will complete in a short amount of time. This means that the lifetime of an obsolete version is short as well. But HTAP workloads may have long running queries that access old snapshots. Such queries block the previous garbage collection methods (e.g., tuplelevel, transaction-level) that we have discussed.

The DBMS can incur several problems if it is unable to clean up old versions:

  • Increased Memory Usage

  • Memory Allocation Contention

  • Longer Version Chains

  • Garbage Collection CPU Spikes

  • Poor Time-based Version Locality

MVCC Deletes

When the DBMS performs a delete operation, the system will remove the tuple logically removed but it is still physically available. The DBMS physically deletes a tuple from the database only when all versions of a logically deleted tuple are not visible. We need a way to denote that a tuple has been logically deleted at some point in time.

  1. Deleted Flag: Maintain a flag to indicate that the logical tuple has been deleted after the newest physical version. It can either be in tuple header or a separate column. This is the most common method.

  2. Tombstone Flag: Create an empty physical version to indicate that a logical tuple is deleted. To reduce the overhead of creating a full tuple (i.e., with all attributes) in the append-only and timetravel version storage approaches, the DBMS can use a separate pool for tombstone tuples with only a special bit pattern in version chain pointer.

Indexes with MVCC Tables

Most MVCC DBMS do not store version information about tuples in indexes with their keys. An exception is index-organized tables like MySQLโ€™s InnoDB.

Every index must support duplicate keys from different snapshots because the same key may point to different logical tuples in different snapshots. Therefore, the DBMS need to use additional execution logic to perform conditional inserts for primary key / unique indexes. These conditional inserts need to be atomic as well. The DBMSโ€™s execution engine also may get back multiple entries for a single fetch; it then has to follow the pointers to find the proper physical version.

Garbage Collection: Design Decisions

There are four design decisions to make when designing the garbage collection mechanism for an MVCC DBMS [1, 2].

Index Clean-up

The DBMS must remove a tuplesโ€™ keys from indexes when their corresponding versions are no longer visible to active transactions. To achieve this, the DBMS maintains an internal log entry that can track the transactionโ€™s modifications to individual indexes to support GC of older versions on commit and removal modifications on abort.

Version Tracking

The version tracking protocol determines how the DBMS discovers reclaimable versions.

  • Tuple-level: The DBMSโ€™s GC component is responsible for finding old versions by examining tuples directly. The system can use either Background Vacuuming or Cooperative Cleaning to locate old tuples.

  • Transaction-level: Each transaction keeps track their old versions in thread-local storage. This means that the DBMS does not have to scan tuples to determine visibility.

Granularity

Granularity determines how the DBMS should internally organize the expired versions that it needs to check to determine whether they are reclaimable. Different granularity levels balance trade-off between the ability to reclaim versions sooner versus computational overhead.

  • Single Version: The DBMS tracks the visibility of individual versions and reclaims them separately. This approach provides more fine-grained control, but has higher overhead.

  • Group Version: The DBMS organizes versions into groups and reclaim all of them together when the newest tuple in that group is no longer visible to any active transaction. This grouping reduces overhead, but may delay reclamation.

  • Tables: Reclaim all versions from a table if the DBMS determines that active transactions will never access it. This is a special case scenario that requires transactions to execute using either stored procedures or prepared statements since it requires the DBMS knowing what tables a transaction will access in advance.

Comparison Unit

DBMS needs a way to determine whether version(s) are reclaimable. Examining the list of active transactions and reclaimable versions should be latch-free; we want this process to be as efficient as possible to prevent new transactions from committing. As a result, the reclaimable checks might generate false negatives, but this is allowable as long as the versions are eventually reclaimed.

  • Timestamp: Use a global minimum timestamp to determine whether versions are safe to reclaim. This approach is the easiest to implement and execute.

  • Interval: The DBMS identifies ranges of timestamps that are not visible to any active transaction. The lower-bound of this range may not be the lowest timestamp of any active transaction, but range is not visible under snapshot isolation. This approach was first introduced by SAP HANA [2], but it is also used in HyPer [1].

Block Compaction

If the application deletes a tuple, then the slots occupied by that tupleโ€™s versions are available to store new data once the versionsโ€™ storage is reclaimed. Ideally the DBMS should try to reuse those slots to conserve memory. The DBMS also need to deal with the case where the application deletes a bunch of tuples in a short amount of time, which in turn generates a large amount of potentially reusable space.

MVCC Deleted Tuples

The DBMS needs to determine what to do with empty slots after it reclaims tuple versions.

  • Reuse Slot: The DBMS allow workers to insert new tuples in the empty slots. This approach is an obvious choice for append-only storage since there is no distinction between versions. The problem, however, is that it destroys temporal locality of tuples in delta storage.

  • Leave Slot Unoccupied: With this approach, workers can only insert new tuples in slots that were not previously occupied. This ensures that tuples in the same block are inserted into the database at around the same time. Overtime the DBMS will need to perform a background compaction step to combine together less-than-full blocks of data

Block Compaction

A mechanism to reuse empty holes in our database is to consolidate less-than-full blocks into fewer blocks and then returning memory to the OS. The DBMS should move data using DELETE+ INSERT to ensure transactional guarantees during consolidation.

Ideally the DBMS will want to store tuples that are likely to be accessed together within a window of time together in the same block. This will make operations on blocks (e.g., compression) easier to execute because tuples that are unlikely to be updated will be within the same block.

There are different approaches for identifying what blocks to compact:

  1. Time Since Last Update: Leverage the BEGIN-TS field in each tuple to determine when the version was created.

  2. Time Since Last Access: Track the timestamp of every read access to a tuple (e.g., the READ-TS in the basic T/O concurrency control protocol). This expensive to maintain because now every read operation has to perform a write.

  3. Application-level Semantics: The DBMS determines how tuples from the same table are related to each other according to some higher-level construct. This is difficult for the DBMS to figure out automatically unless their are schema hints available (e.g., foreign keys).

Truncate Operation

During block compaction process, the DBMS might want to free up large amount of space with small overhead. TRUNCATE removes all tuples in a table, which is the same as DELETE without a WHERE clause.

The fastest way to execute TRUNCATE is to drop the table and then create it again. This invalidates all versions within that table. We would not need to track the visibility of individual tuples. The GC will free all memory when there are no active transactions that exist before the drop operation. If the catalog is transactional, then this is easy to do since all the operations on metadata are atomic.

References

  1. J. Bottcher, V. Leis, T. Neumann, and A. Kemper. Scalable garbage collection for in-memory mvccยจ systems. Proc. VLDB Endow., 13(2):128โ€“141, Oct. 2019. doi: 10.14778/3364324.3364328.

  2. J. Lee, H. Shin, C. G. Park, S. Ko, J. Noh, Y. Chuh, W. Stephan, and W.-S. Han. Hybrid garbage collection for multi-version concurrency control in sap hana. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD โ€™16, pages 1307โ€“1318, 2016. doi: 10.1145/2882903. 2903734.

๐Ÿฆธ
๐Ÿ˜…
๐Ÿ˜‡
15-721
Advanced Database Systems (Spring
2020) https://15721.courses.cs.cmu.edu/spring2020/
Prof. Andy Pavlo