Exploiting Zero-Entropy Data for Efficient Deduplication

| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Al Jarah, Mu'men | |
| dc.date.accessioned | 2025-09-15T13:37:19Z | |
| dc.date.available | 2025-09-15T13:37:19Z | |
| dc.date.issued | 2025-09-15 | |
| dc.date.submitted | 2025-09-10 | |
| dc.description.abstract | As the volume of digital data continues to grow rapidly, efficient data reduction techniques such as deduplication are essential for managing storage and bandwidth. A key step in deduplication is file chunking, which is typically performed using Content-Defined Chunking (CDC) algorithms. While these algorithms have been studied on random data, their performance on zero-entropy data, where long sequences of identical bytes appear, has not been explored. Such zero-entropy data are common in real-world datasets and pose challenges for CDC in deduplication systems. This thesis studies the impact of zero-entropy data on the performance of state-of-the-art hash-based and hashless CDC algorithms. The results show that existing algorithms, particularly hash-based ones, are inefficient at detecting and handling zero-entropy blocks, especially small ones, which reduces space savings. To address this issue, I propose ZERO (Zero-Entropy Region Optimization), a system that can be integrated into the deduplication pipeline. ZERO identifies and extracts zero-entropy blocks prior to chunking, compresses them using Run-Length Encoding (RLE), and stores their metadata for later reconstruction. ZERO improves deduplication space savings by up to 29% without impacting throughput. | |
| dc.identifier.uri | https://hdl.handle.net/10012/22414 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.title | Exploiting Zero-Entropy Data for Efficient Deduplication | |
| dc.type | Master Thesis | |
| uws-etd.degree | Master of Mathematics | |
| uws-etd.degree.department | David R. Cheriton School of Computer Science | |
| uws-etd.degree.discipline | Computer Science | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 0 | |
| uws.contributor.advisor | Al-Kiswany, Samer | |
| uws.contributor.affiliation1 | Faculty of Mathematics | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |
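The abstract describes ZERO as extracting zero-entropy blocks before chunking, encoding them with RLE, and keeping metadata for reconstruction. The Python sketch below illustrates that general idea only; it is not the thesis's implementation. The function names `extract_runs` and `restore`, the `MIN_RUN` threshold, and the `(offset, byte value, length)` metadata layout are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical threshold: runs of identical bytes shorter than this stay in the stream.
MIN_RUN = 64

# Hypothetical RLE metadata record: (offset in original data, byte value, run length).
Run = Tuple[int, int, int]


def extract_runs(data: bytes, min_run: int = MIN_RUN) -> Tuple[bytes, List[Run]]:
    """Remove runs of a single repeated byte (length >= min_run) from data.

    Returns the residual bytes (what would be handed to the CDC chunker)
    and the RLE metadata for every removed run, in ascending offset order.
    """
    residual = bytearray()
    runs: List[Run] = []
    i, n = 0, len(data)
    while i < n:
        # Find the end of the run of bytes equal to data[i].
        j = i
        while j < n and data[j] == data[i]:
            j += 1
        if j - i >= min_run:
            runs.append((i, data[i], j - i))  # zero-entropy region: store as RLE
        else:
            residual += data[i:j]  # short run: keep for normal chunking
        i = j
    return bytes(residual), runs


def restore(residual: bytes, runs: List[Run]) -> bytes:
    """Rebuild the original data by re-inserting each run at its original offset."""
    out = bytearray()
    pos = 0  # read position in the residual stream
    for offset, value, length in runs:
        take = offset - len(out)  # residual bytes that precede this run
        out += residual[pos:pos + take]
        pos += take
        out += bytes([value]) * length
    out += residual[pos:]
    return bytes(out)


data = b"header" + b"\x00" * 100 + b"tail" + b"\xff" * 70
residual, runs = extract_runs(data)
print(residual)                      # b'headertail'
print(runs)                          # [(6, 0, 100), (110, 255, 70)]
print(restore(residual, runs) == data)  # True
```

Because each run is reduced to a fixed-size record, the chunker in this sketch sees only the residual bytes, and the original data is recoverable exactly from the residual plus the metadata.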