BlockHammer Preventing RowHammer at Low Cost by Blacklisting
BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows
Abdullah Giray Yağlıkçı, Minesh Patel, Jeremie S. Kim, Roknoddin Azizi, Ataberk Olgun, Lois Orosa, Hasan Hassan, Jisung Park, Konstantinos Kanellopoulos, Taha Shahroodi, Saugata Ghose*, Onur Mutlu
Executive Summary
Motivation: RowHammer is a worsening DRAM reliability and security problem
Problem: Mitigation mechanisms have limited support for current/future chips
- Scalability with worsening RowHammer vulnerability
- Compatibility with commodity DRAM chips
Goal: Efficiently and scalably prevent RowHammer bit-flips without knowledge of or modifications to DRAM internals
Key Idea: Selectively throttle memory accesses that may cause RowHammer bit-flips
Mechanism: BlockHammer
- Tracks activation rates of all rows by using area-efficient Bloom filters
- Throttles row activations that could cause RowHammer bit flips
- Identifies and throttles threads that perform RowHammer attacks
Scalability with Worsening RowHammer Vulnerability:
- Competitive with state-of-the-art mechanisms when there is no attack
- Superior performance and DRAM energy when a RowHammer attack is present
Compatibility with Commodity DRAM Chips:
- No proprietary information of DRAM internals
- No modifications to DRAM circuitry
Outline
DRAM and RowHammer Background
Motivation and Goal
BlockHammer
- RowBlocker
- AttackThrottler
Evaluation
Conclusion
Organizing and Accessing DRAM Cells
A DRAM cell consists of a capacitor and an access transistor
A row needs to be activated to access its content
DRAM Refresh
[Figure: capacitor voltage (Vdd, 0% to 100%, with Vmin marked) versus time, with REF operations spread across a refresh window (tREFW)]
Periodic refresh operations preserve stored data [Patel ISCA’17, Kim ISCA’20]
The RowHammer Phenomenon
[Figure: a DRAM bank with Rows 0 to 4; Row 2 is the aggressor row, alternating between open and closed, and Rows 0, 1, 3, and 4 are victim rows]
Repeatedly opening (activating) and closing (precharging) a DRAM row causes RowHammer bit flips in nearby cells [Kim ISCA’20]
RowHammer Mitigation Approaches
- Increased refresh rate: REF-to-REF time reduces; fewer activations can fit
- Physical isolation: isolation rows keep a large-enough distance between the aggressor row and victim rows
- Reactive refresh: victim rows are refreshed when the aggressor row is rapidly activated (hammered)
- Proactive throttling: fewer activations can be performed
Two Key Challenges
1. Scalability with worsening RowHammer vulnerability
2. Compatibility with commodity DRAM chips
Scalability with Worsening RowHammer Vulnerability
DRAM chips are more vulnerable to RowHammer today
RowHammer bit-flips occur at much lower activation counts (more than an order of magnitude decrease):
- 139.2K [Y. Kim, ISCA 2014]
- 9.6K [J. S. Kim, ISCA 2020]
RowHammer blast radius has increased by 33%:
- 9 rows [Y. Kim, ISCA 2014]
- 12 rows [J. S. Kim, ISCA 2020]
In-DRAM mitigation mechanisms are ineffective [Frigo, S&P 2020]
RowHammer is a more serious problem than ever
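The two trends above can be checked with a few lines of arithmetic. This is only a worked example using the numbers quoted on the slide; the variable names are illustrative.

```python
# Worked arithmetic for the figures quoted on the slide.
nrh_2014 = 139_200   # activations to induce a bit-flip [Y. Kim, ISCA 2014]
nrh_2020 = 9_600     # activations to induce a bit-flip [J. S. Kim, ISCA 2020]
radius_2014 = 9      # blast radius in rows [Y. Kim, ISCA 2014]
radius_2020 = 12     # blast radius in rows [J. S. Kim, ISCA 2020]

threshold_drop = nrh_2014 / nrh_2020
radius_growth_pct = (radius_2020 - radius_2014) / radius_2014 * 100

print(f"RowHammer threshold decreased by {threshold_drop:.1f}x")
print(f"Blast radius grew by {radius_growth_pct:.0f}%")
```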
Mitigation Approaches with Worsening RowHammer Vulnerability
- Increased refresh rate: REF-to-REF time further reduces; even fewer activations can fit
- Physical isolation: larger distance; more isolation rows
- Reactive refresh: refresh victim rows more frequently; refresh more rows
- Proactive throttling: more aggressively throttles row activations
Mitigation mechanisms face the challenge of scalability with worsening RowHammer vulnerability
Compatibility with Commodity DRAM Chips
Visible within the processor:
- Application level: virtual memory address
- System level: physical memory address
- Memory controller: DRAM bus addresses (Channel, Rank, Bank Group, Bank, Row, Col)
Inside the DRAM chip:
- In-DRAM mapping: physical rows and columns
Compatibility with Commodity DRAM Chips
Vendors apply in-DRAM mapping for two reasons:
- Design Optimizations: By simplifying DRAM circuitry to provide better density, performance, and power
- Yield Improvement: By mapping faulty rows and columns to redundant ones
In-DRAM mapping scheme includes insights into chip design and manufacturing quality
In-DRAM mapping is proprietary information
RowHammer Mitigation Approaches
Identifying victim and isolation rows requires proprietary knowledge of in-DRAM mapping
Our Goal
To prevent RowHammer efficiently and scalably without knowledge of or modifications to DRAM internals
BlockHammer: Key Idea
Selectively throttle memory accesses that may cause RowHammer bit-flips
BlockHammer: Overview of Approach
RowBlocker
- Tracks row activation rates using area-efficient Bloom filters
- Blacklists rows that are activated at a high rate
- Throttles activations targeting a blacklisted row
No row can be activated at a high enough rate to induce bit-flips
AttackThrottler
- Identifies threads that perform a RowHammer attack
- Reduces memory bandwidth usage of identified threads
Greatly reduces the performance degradation and energy wastage a RowHammer attack inflicts on a system
RowBlocker
Modifies the memory request scheduler to throttle row activations
Blacklists rows with a high activation rate and delays subsequent activations targeting blacklisted rows
[Figure: RowBlocker consists of Blacklisting Logic and Delaying Logic]
RowBlocker
Blocks a row activation if the row is both blacklisted and recently activated
RowBlocker
When a row activation is performed, both RowBlocker-BL and RowBlocker-HB are updated with the row activation information
RowBlocker-BL: Blacklisting Logic
Blacklists a row when the row’s activation count in a time window exceeds a threshold
Employs two counting Bloom filters for area-efficient activation rate tracking
Counting Bloom Filters
Blacklisting logic counts activations using counting Bloom filters
A row’s observed activation count
- can be higher than the actual count (false positive)
- cannot be lower than the actual count (no false negative)
To avoid saturating counters, we use a time-interleaving approach
[Figure: activations (ACT) of rows A, B, and A pass through hash functions that increment counters; a test of a row returns the minimum of its counters]
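A minimal software sketch of a counting Bloom filter may help make the insert and test operations concrete. It is not the hardware design: the class and method names are hypothetical, salted BLAKE2b stands in for the hardware hash functions, and the default sizes (1K counters, 4 hash functions) are borrowed from the backup slide on hardware complexity.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter (names and hashing are assumptions)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indices(self, row):
        # Assumption: salted BLAKE2b replaces the hardware hash functions.
        indices = set()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{row}".encode(), digest_size=8).digest()
            indices.add(int.from_bytes(digest, "big") % len(self.counters))
        return indices

    def insert(self, row):
        # Record one activation: increment every counter the row maps to.
        for index in self._indices(row):
            self.counters[index] += 1

    def test(self, row):
        # The minimum counter can overestimate the true count (other rows may
        # share counters) but can never underestimate it.
        return min(self.counters[index] for index in self._indices(row))

    def clear(self):
        self.counters = [0] * len(self.counters)


cbf = CountingBloomFilter()
for row in ["A", "B", "A"]:  # same activation order as the slide's figure
    cbf.insert(row)
print(cbf.test("A"), cbf.test("B"))
```

Because every activation increments all of a row's counters, the reported minimum is always at least the true activation count, which is the "no false negative" property the slide relies on.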
RowBlocker-BL: Blacklisting Logic
Blacklisting logic employs two counting Bloom filters
- A new row activation is inserted in both filters
- Only one filter (active filter) responds to test queries
- The active filter changes at every epoch
[Figure: timeline alternating between epochs in which CBFA is active and CBFB is passive, and epochs in which CBFA is passive and CBFB is active]
RowBlocker-BL: Blacklisting Logic
Blacklists a row if its activation count reaches the blacklisting threshold (NBL)
- If it does: assume that the row is activated at a high rate
- If it does not: assume that the row is not activated at a high rate
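The time-interleaved use of two filters can be sketched as follows. This is a simplified model under stated assumptions: exact per-row counters (`collections.Counter`) stand in for the counting Bloom filters, the class and method names are hypothetical, and the choice to clear the filter that was active when an epoch ends is an assumption read off the alternating clear timeline in the slides.

```python
from collections import Counter


class DualFilterBlacklist:
    """Two interleaved activation counters; only the active one answers queries."""

    def __init__(self, n_bl):
        self.n_bl = n_bl                       # blacklisting threshold (NBL)
        self.filters = [Counter(), Counter()]  # stand-ins for CBFA and CBFB
        self.active = 0                        # index of the active filter

    def record_activation(self, row):
        # A new row activation is inserted in both filters.
        for counts in self.filters:
            counts[row] += 1

    def is_blacklisted(self, row):
        # Only the active filter responds to test queries.
        return self.filters[self.active][row] >= self.n_bl

    def end_epoch(self):
        # Assumption: the filter that was active is cleared and becomes
        # passive; the other filter, which already holds the last epoch's
        # activations, becomes active.
        self.filters[self.active].clear()
        self.active ^= 1


blacklist = DualFilterBlacklist(n_bl=3)
for _ in range(3):
    blacklist.record_activation(7)

first = blacklist.is_blacklisted(7)    # active filter saw 3 activations
blacklist.end_epoch()
second = blacklist.is_blacklisted(7)   # new active filter also saw them
blacklist.end_epoch()
third = blacklist.is_blacklisted(7)    # both filters have since been cleared
print(first, second, third)
```

The point of the interleaving is visible in the second query: swapping filters does not forget recent activations, because the newly active filter has been counting in the background.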
Limiting the Row Activation Rate
The activation rate is RowHammer-safe if it is smaller than or equal to RowHammer threshold (NRH) activations in a refresh window (tREFW)
RowBlocker limits the activation count (NCBF) in a CBF’s lifetime (tCBF)
$$\text{Activation rate in a } t_{CBF} \le N_{RH} \text{ activations in a refresh window } (t_{REFW})$$
[Figure: timeline in which CBFA and CBFB are cleared alternately, each with a lifetime of tCBF]
Limiting the Row Activation Rate
RowHammer Safety Constraint:
$$\frac{N_{CBF}}{t_{CBF}} \le \frac{N_{RH}}{t_{REFW}}$$
RowBlocker-HB: Limiting the Row Activation Rate
Ensures that all rows experience a RowHammer-safe activation rate:
$$\frac{N_{CBF}}{t_{CBF}} \le \frac{N_{RH}}{t_{REFW}}$$
[Figure: timeline of one tCBF window with NCBF row activations in total: NBL row activations spaced tRC apart (taking tRC × NBL), followed by blacklisted row activations spaced tDelay apart over the remaining tCBF − (tRC × NBL)]
We limit NCBF by configuring tDelay:
$$N_{CBF} \le N_{BL} + \frac{t_{CBF} - (t_{RC} \times N_{BL})}{t_{Delay}}$$
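Combining the safety constraint with the bound on NCBF gives a lower bound on tDelay: tDelay ≥ (tCBF − tRC × NBL) / (NRH × tCBF / tREFW − NBL). The sketch below evaluates that bound. The function name and all parameter values are illustrative assumptions (32K read as 32 × 1024, tCBF set equal to a 64 ms tREFW, NBL of 8K, tRC of 46.25 ns); none of them are taken from the slides as a configuration.

```python
def min_t_delay(n_rh, t_refw, t_cbf, n_bl, t_rc):
    """Smallest t_delay (same time unit as the inputs) that satisfies
    N_CBF / t_CBF <= N_RH / t_REFW, using the N_CBF bound from the slide."""
    max_n_cbf = n_rh * t_cbf / t_refw      # activations allowed per CBF lifetime
    throttled_budget = max_n_cbf - n_bl    # activations allowed after blacklisting
    if throttled_budget <= 0:
        raise ValueError("n_bl must be below the per-lifetime activation budget")
    remaining_time = t_cbf - t_rc * n_bl   # time left after N_BL back-to-back ACTs
    return remaining_time / throttled_budget


# Illustrative values only (assumptions), all times in nanoseconds.
N_RH = 32 * 1024
T_REFW = 64e6
T_CBF = 64e6
N_BL = 8 * 1024
T_RC = 46.25

t_delay = min_t_delay(N_RH, T_REFW, T_CBF, N_BL, T_RC)
n_cbf = N_BL + (T_CBF - T_RC * N_BL) / t_delay
print(f"t_delay = {t_delay:.2f} ns allows at most {n_cbf:.0f} activations per t_CBF")
```

Any tDelay at or above the returned value keeps the worst-case activation count within the RowHammer-safe rate; a larger NBL leaves fewer throttled activations to spread over the remaining time, so the required tDelay grows.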
RowBlocker-HB: Delaying Row Activations
RowBlocker-HB ensures no subsequent blacklisted row activation is performed sooner than tDelay
RowBlocker-HB implements a history buffer for row activations that can fit in a tDelay time window
A blacklisted row activation is blocked as long as a valid activation record of the row exists in the history buffer
No row can be activated at a high enough rate to induce bit-flips
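The blocking rule can be sketched in software as a time-ordered queue of recent activations. This is a behavioral illustration only: the class, function, and parameter names are hypothetical, time is an abstract integer, and the hardware buffer's size and lookup structure are not modeled.

```python
from collections import deque


class HistoryBuffer:
    """Keeps activation records that are younger than t_delay."""

    def __init__(self, t_delay):
        self.t_delay = t_delay
        self.records = deque()  # (timestamp, row) pairs, oldest first

    def _expire(self, now):
        # A record stops being valid once t_delay has elapsed.
        while self.records and now - self.records[0][0] >= self.t_delay:
            self.records.popleft()

    def record(self, row, now):
        self._expire(now)
        self.records.append((now, row))

    def recently_activated(self, row, now):
        self._expire(now)
        return any(recorded_row == row for _, recorded_row in self.records)


def is_activation_blocked(row, now, is_blacklisted, history):
    # Block only if the row is both blacklisted and recently activated.
    return is_blacklisted(row) and history.recently_activated(row, now)


history = HistoryBuffer(t_delay=100)
history.record(row=5, now=0)

blocked_soon = is_activation_blocked(5, 50, lambda row: True, history)
benign_row = is_activation_blocked(5, 50, lambda row: False, history)
other_row = is_activation_blocked(6, 50, lambda row: True, history)
blocked_late = is_activation_blocked(5, 100, lambda row: True, history)
print(blocked_soon, benign_row, other_row, blocked_late)
```

Rows that are not blacklisted are never delayed, which is why benign access patterns are largely unaffected by the history buffer.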
AttackThrottler
Tackles a RowHammer attack’s performance degradation and energy wastage on a system
A RowHammer attack intrinsically keeps activating blacklisted rows
RowHammer Likelihood Index (RHLI): Number of activations that target blacklisted rows (normalized to maximum possible activation count)
- RHLI = 0.0 (benign application): no blacklisted row activations
- RHLI = 1.0 (RowHammer attack): blacklisted row activation count approaches RowHammer threshold
RHLI is larger when the thread’s access pattern is more similar to a RowHammer attack
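As a rough illustration, RHLI can be computed as a simple ratio. The function and parameter names are hypothetical, and the normalizer (the maximum number of blacklisted-row activations considered possible) is passed in as a parameter because the slide does not spell out how it is derived.

```python
def rhli(blacklisted_activations, max_blacklisted_activations):
    """RowHammer Likelihood Index: blacklisted-row activations normalized
    to the maximum possible count (the normalizer is an assumed input)."""
    if max_blacklisted_activations <= 0:
        raise ValueError("max_blacklisted_activations must be positive")
    return blacklisted_activations / max_blacklisted_activations


# Illustrative per-thread counts (assumptions, not measured values).
benign = rhli(0, 24_576)
suspicious = rhli(12_288, 24_576)
attacker = rhli(24_576, 24_576)
print(benign, suspicious, attacker)
```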
AttackThrottler
Applies a smaller quota to a thread’s in-flight request count as RHLI increases
- RHLI = 0.0 (benign application): no blacklisted row activations; no quota applied
- RHLI = 1.0 (RowHammer attack): blacklisted row activation count approaches RowHammer threshold; no request is allowed
Reduces a RowHammer attack’s memory bandwidth consumption, enabling a larger memory bandwidth for concurrent benign applications
Greatly reduces the performance degradation and energy wastage a RowHammer attack inflicts on a system
RHLI can also be used as a RowHammer attack indicator by the system software
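One way to picture the quota is as a function that shrinks from "unlimited" to zero as RHLI goes from 0 to 1. The slide only fixes the two endpoints and the direction; the linear interpolation in between, the function name, and the `max_inflight` parameter are assumptions made for this sketch.

```python
import math


def inflight_quota(rhli_value, max_inflight):
    """Return the in-flight request quota for a thread, or None for no quota.

    Assumption: the quota shrinks linearly with RHLI between the two
    endpoints given on the slide.
    """
    if rhli_value <= 0.0:
        return None                 # benign application: no quota applied
    if rhli_value >= 1.0:
        return 0                    # RowHammer attack: no request is allowed
    return math.floor(max_inflight * (1.0 - rhli_value))


print(inflight_quota(0.0, 16), inflight_quota(0.25, 16),
      inflight_quota(0.5, 16), inflight_quota(1.0, 16))
```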
Evaluation: BlockHammer’s Hardware Complexity
We analyze six state-of-the-art mechanisms and BlockHammer
We calculate area, access energy, and static power consumption*
NRH = 32K
Mitigation Mechanism | SRAM (KB) | CAM (KB) | Area (mm2) | Area (% CPU) | Access Energy (pJ) | Static Power (mW)
BlockHammer | 51.48 | 1.73 | 0.14 | 0.06 | 20.30 | 22.27
PARA [73] | - | - | 0.01 | - | - | -
ProHIT [137] | - | 0.22 | 0.01 | 0.01 | 3.67 | 0.14
MRLoc [161] | - | 0.47 | 0.01 | 0.01 | 4.44 | 0.21
CBT [132] | 16.00 | 8.50 | 0.20 | 0.08 | 9.13 | 35.55
TWiCe [84] | 23.10 | 14.02 | 0.15 | 0.06 | 7.99 | 21.28
Graphene [113] | - | 5.22 | 0.04 | 0.02 | 40.67 | 3.11
BlockHammer is low cost and competitive with state-of-the-art mechanisms
* Assuming a high-end 28-core Intel Xeon processor system with 4-channel single-rank DDR4 DIMMs with a RowHammer threshold (NRH) of 32K
Evaluation: BlockHammer’s Hardware Complexity
NRH = 1K (compare with the NRH = 32K values on the previous slide)
Mitigation Mechanism | SRAM (KB) | CAM (KB) | Area (mm2) | Area (% CPU) | Access Energy (pJ) | Static Power (mW)
BlockHammer | 441.33 | 55.58 | 1.57 | 0.64 | 99.64 | 220.99
PARA [73] | - | - | 0.01 | - | - | -
ProHIT [137] | x | x | x | x | x | x
MRLoc [161] | x | x | x | x | x | x
CBT [132] | 512.00 | 272.00 | 3.95 | 1.60 | 127.93 | 535.50
TWiCe [84] | 738.32 | 448.27 | 5.17 | 2.10 | 124.79 | 631.98
Graphene [113] | - | 166.03 | 1.14 | 0.46 | 917.55 | 93.96
Growth factors annotated on the slide, from NRH = 32K to NRH = 1K, range from 5x to 35x
BlockHammer’s hardware complexity scales more efficiently than state-of-the-art mechanisms
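To put the scaling claim in numbers, the sketch below computes how much BlockHammer's own costs grow when NRH drops from 32K to 1K. It assumes that the first value in each column of the two flattened tables belongs to BlockHammer; the dictionary keys are illustrative names.

```python
# BlockHammer's costs at the two RowHammer thresholds, read from the slides
# (assumption: the first value of each table column is BlockHammer's).
blockhammer_32k = {"sram_kb": 51.48, "cam_kb": 1.73, "area_mm2": 0.14,
                   "access_energy_pj": 20.30, "static_power_mw": 22.27}
blockhammer_1k = {"sram_kb": 441.33, "cam_kb": 55.58, "area_mm2": 1.57,
                  "access_energy_pj": 99.64, "static_power_mw": 220.99}


def growth_factors(before, after):
    """Ratio of each cost after the threshold drops to its value before."""
    return {key: after[key] / before[key] for key in before}


factors = growth_factors(blockhammer_32k, blockhammer_1k)
threshold_reduction = 32  # NRH shrinks from 32K to 1K

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x (threshold reduced {threshold_reduction}x)")
```

Under that reading, area, access energy, and static power each grow by less than the 32x reduction in the threshold.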
Evaluation: Performance and DRAM Energy
Cycle-level simulations using Ramulator and DRAMPower
System Configuration:
Processor | 3.2 GHz, {1,8} core, 4-wide issue, 128-entry instr. window
LLC | 64-byte cacheline, 8-way set-associative, {2,16} MB
Memory scheduler | FR-FCFS
Address mapping | Minimalistic Open Pages
DRAM | DDR4 1 channel, 1 rank, 4 bank group, 4 banks per bank group
Single-Core Benign Workloads:
- 22 SPEC CPU 2006
- 4 YCSB Disk I/O
- 2 Network Accelerator Traces
- 2 Bulk Data Copy with Non-Temporal Hint (movnti)
Randomly Chosen Multiprogrammed Workloads:
- 125 workloads containing 8 benign applications
- 125 workloads containing 7 benign applications and 1 RowHammer attack thread
Evaluation: Performance and DRAM Energy
We classify single-core workloads into three categories based on row buffer conflicts per thousand instructions (RBCPKI)
- Low (L): RBCPKI from 0.0 to 1.0
- Medium (M): RBCPKI from 1.0 to 5.0
- High (H): RBCPKI above 5.0
No application’s row activation count exceeds BlockHammer’s blacklisting threshold (NBL)
BlockHammer does not incur performance or DRAM energy overheads for single-core benign applications
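The classification can be written down as a small helper. The thresholds 1.0 and 5.0 come from the slide; the function names and the treatment of values that fall exactly on a boundary are assumptions.

```python
def rbcpki(row_buffer_conflicts, instructions):
    """Row buffer conflicts per thousand instructions."""
    return row_buffer_conflicts * 1000 / instructions


def classify(rbcpki_value):
    # Assumption: boundary values fall into the higher category.
    if rbcpki_value < 1.0:
        return "L"
    if rbcpki_value < 5.0:
        return "M"
    return "H"


# Hypothetical workloads: (row buffer conflicts, instructions)
for conflicts, instructions in [(500, 1_000_000), (3_000, 1_000_000), (7_200, 1_000_000)]:
    value = rbcpki(conflicts, instructions)
    print(f"RBCPKI = {value:.1f} -> {classify(value)}")
```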
Evaluation: Performance and DRAM Energy
[Figure panels: system throughput (weighted speedup), job turnaround time (harmonic speedup), unfairness (maximum slowdown), and DRAM energy consumption]
No RowHammer Attack: BlockHammer introduces very low performance (<0.5%) and DRAM energy (<0.4%) overheads
RowHammer Attack Present: BlockHammer significantly increases benign application performance (by 45% on average) and reduces DRAM energy consumption (by 29% on average)
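The three performance metrics named above have commonly used definitions in terms of each application's IPC when running alone and when sharing the system. The sketch below uses those common definitions; the slides do not spell them out, so treat the formulas and the sample IPC values as assumptions.

```python
def weighted_speedup(ipc_alone, ipc_shared):
    """System throughput: sum of per-application speedups."""
    return sum(shared / alone for alone, shared in zip(ipc_alone, ipc_shared))


def harmonic_speedup(ipc_alone, ipc_shared):
    """Job turnaround time: harmonic mean of per-application speedups."""
    slowdowns = [alone / shared for alone, shared in zip(ipc_alone, ipc_shared)]
    return len(slowdowns) / sum(slowdowns)


def maximum_slowdown(ipc_alone, ipc_shared):
    """Unfairness: slowdown of the most-slowed application."""
    return max(alone / shared for alone, shared in zip(ipc_alone, ipc_shared))


# Hypothetical two-application workload.
alone = [2.0, 1.0]
shared = [1.0, 0.8]
print(weighted_speedup(alone, shared),
      harmonic_speedup(alone, shared),
      maximum_slowdown(alone, shared))
```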
Evaluation: Scaling with RowHammer Vulnerability
[Figure panels: system throughput (weighted speedup), job turnaround time (harmonic speedup), unfairness (maximum slowdown), and DRAM energy consumption]
No RowHammer Attack: BlockHammer’s performance and energy overheads remain negligible (<0.6%)
RowHammer Attack Present: BlockHammer scalably provides much higher performance (71% on average) and lower energy consumption (32% on average) than state-of-the-art mechanisms
More in the Paper
Security Proof
- Mathematically represent all possible access patterns
- We show that no row can be activated enough times to induce bit-flips when BlockHammer is configured correctly
Addressing Many-Sided Attacks
Evaluation of 14 mechanisms representing four mitigation approaches
- Comprehensive Protection
- Compatibility with Commodity DRAM Chips
- Scalability with RowHammer Vulnerability
- Deterministic Protection
Conclusion
Motivation: RowHammer is a worsening DRAM reliability and security problem
Problem: Mitigation mechanisms have limited support for current/future chips
- Scalability with worsening RowHammer vulnerability
- Compatibility with commodity DRAM chips
Goal: Efficiently and scalably prevent RowHammer bit-flips without knowledge of or modifications to DRAM internals
Key Idea: Selectively throttle memory accesses that may cause RowHammer bit-flips
Mechanism: BlockHammer
- Tracks activation rates of all rows by using area-efficient Bloom filters
- Throttles row activations that could cause RowHammer bit flips
- Identifies and throttles threads that perform RowHammer attacks
Scalability with Worsening RowHammer Vulnerability:
- Competitive with state-of-the-art mechanisms when there is no attack
- Superior performance and DRAM energy when a RowHammer attack is present
Compatibility with Commodity DRAM Chips:
- No proprietary information of DRAM internals
- No modifications to DRAM circuitry
Backup Slides
Timing Constraints for DRAM Row Activations
Timing row activations is critical to meet reliability and power constraints.
Two timing constraints limit row activation rates.
[Figure: timeline of ACT commands to Rows X and Y in Bank A and to Rows Z, T, U, V, and W in Banks B to F, annotated with the time difference tRC (45-50ns) between the two activations in Bank A and the time difference tFAW (30-35ns) across activations in different banks]
tRC: Minimum delay between two consecutive activations in a bank.
tFAW: Rolling time window in which at most four rows can be activated in a rank.
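These two constraints bound how many activations fit in a refresh window. The sketch below estimates those bounds; the 64 ms refresh window is an assumed typical value, the tRC and tFAW values are picked from the ranges on the slide, and the tFAW bound treats the rolling window as back-to-back fixed windows, which is an approximation.

```python
def max_single_row_activations(t_refw_ns, t_rc_ns):
    """Upper bound on activations in one bank within a refresh window,
    limited only by tRC."""
    return t_refw_ns // t_rc_ns


def max_rank_activations(window_ns, t_faw_ns):
    """Approximate upper bound on activations in a rank within a window:
    at most four activations per tFAW."""
    return 4 * (window_ns // t_faw_ns)


T_REFW_NS = 64_000_000  # assumption: 64 ms refresh window

print(max_single_row_activations(T_REFW_NS, 50))  # tRC = 50 ns
print(max_rank_activations(T_REFW_NS, 32))        # tFAW = 32 ns
```

With these assumed values, the timing constraints alone allow well over a million activations in one bank per refresh window, far above the RowHammer thresholds of 139.2K and 9.6K quoted earlier in the deck, which is why timing constraints by themselves do not prevent RowHammer.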
BlockHammer Hardware Complexity
RowBlocker
- RowBlocker-BL: implemented per bank; 1K counters in a CBF; 4 H3 hash functions
- RowBlocker-HB: implemented per rank; 887 entries
AttackThrottler
- Two counters per (bank, thread) pair
RowHammer Characteristics
RowHammer Threshold (NRH): The minimum row activation count in a refresh window to induce a RowHammer bit-flip.
Blast Radius (rBlast): The maximum physical distance from the aggressor row at which RowHammer bit-flips can be observed.
Blast Impact Factor (ci): Set of coefficients that scale a RowHammer attack’s impact on victim rows based on their physical distance to the aggressor row.
Many-Sided Attacks
NRH: RowHammer threshold for single-sided attack.
NRH*: Maximum activation count that BlockHammer allows in a refresh window.
rBlast: Blast radius
ci: Blast impact factor
We configure NRH* such that hammering all rows NRH* times does not cause bit-flips.
$$2 N_{RH}^{*} \sum_{i=1}^{r_{Blast}} c_i \le N_{RH}$$
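Solving the constraint for NRH* gives NRH* ≤ NRH / (2 × Σ ci). The sketch below evaluates that bound; the function name is hypothetical and the impact factors in the example are made-up values chosen only to show the arithmetic.

```python
import math


def n_rh_star(n_rh, impact_factors):
    """Largest integer NRH* with 2 * NRH* * sum(c_i) <= NRH.

    impact_factors holds c_1 .. c_rBlast, one per distance from the aggressor.
    """
    return math.floor(n_rh / (2 * sum(impact_factors)))


N_RH = 32 * 1024  # assumption: 32K read as 32 * 1024

# Blast radius 1: only immediate neighbors are affected (classic double-sided case).
print(n_rh_star(N_RH, [1.0]))

# Hypothetical blast radius 2 with a weaker effect at distance 2.
print(n_rh_star(N_RH, [1.0, 0.25]))
```

A larger blast radius or larger impact factors shrink NRH*, so BlockHammer has to allow fewer activations per refresh window to stay safe against many-sided attacks.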
DRAM Organization
A DRAM bank is hierarchically organized into subarrays
Columns of cells in subarrays share a local bitline
Rows of cells in a subarray share a wordline
[Figure: a DRAM bank with a global row decoder and a global row buffer; each subarray has a local row decoder and a local row-buffer; a DRAM row is a set of DRAM cells on a local wordline, and each cell connects to a local bitline]
DRAM Operation
[Figure: READ commands go through the row decoder and the local row buffer to return a cache line]
DRAM Command Sequence (over time): ACT R0, RD, RD, RD, PRE R0, ACT R1, RD, RD, RD
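The command sequence on this slide can be reproduced with a small model of a single bank that keeps the last activated row open. This is a simplified sketch: the function name is hypothetical, only reads are modeled, and timing is ignored.

```python
def command_sequence(row_accesses):
    """Translate a stream of row reads into DRAM commands for one bank,
    keeping a row open until a different row is requested."""
    commands = []
    open_row = None
    for row in row_accesses:
        if row != open_row:
            if open_row is not None:
                commands.append(f"PRE R{open_row}")  # close the open row
            commands.append(f"ACT R{row}")           # open the requested row
            open_row = row
        commands.append("RD")                        # read from the row buffer
    return commands


# Three reads to row 0 followed by three reads to row 1, as on the slide.
print(command_sequence([0, 0, 0, 1, 1, 1]))
```

An access pattern that alternates between two rows forces an ACT and a PRE on every access, which is the behavior a RowHammer attack exploits.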
DRAM Cell
Each cell encodes information in leaky capacitors
[Figure: a DRAM cell with a wordline, access transistor, bitline, and capacitor, showing charge leakage paths]
Stored data is corrupted if too much charge leaks (i.e., the capacitor voltage degrades too much) [Patel ISCA’17, Kim ISCA’20]
Security Analysis
No permutation of epochs can satisfy the necessary constraints of a successful attack