The University of Waterloo Libraries will be performing maintenance on UWSpace tomorrow, November 5th, 2025, from 10 am – 6 pm EST.
UWSpace will be offline for all UW community members during this time. Please avoid submitting items to UWSpace until November 7th, 2025.

Exploiting Zero-Entropy Data for Efficient Deduplication

dc.contributor.authorAl Jarah, Mu'men
dc.date.accessioned2025-09-15T13:37:19Z
dc.date.available2025-09-15T13:37:19Z
dc.date.issued2025-09-15
dc.date.submitted2025-09-10
dc.description.abstractAs the volume of digital data continues to grow rapidly, efficient data reduction techniques, such as deduplication, are essential for managing storage and bandwidth. A key step in deduplication is file chunking, which is typically performed using Content-Defined Chunking (CDC) algorithms. While these algorithms have been studied under random data, their performance in the presence of zero-entropy data, where long sequences of identical bytes appear, has not been explored. Such zero-entropy data are common in real-world datasets and introduce challenges for CDC in deduplication systems. This thesis studies the impact of zero-entropy data on the performance of both hash-based and hashless state-of-the-art CDC algorithms. The results show that existing algorithms, particularly hash-based ones, are inefficient at detecting and handling zero-entropy blocks, especially when these blocks are small, which reduces space savings. To address this issue, I propose ZERO (Zero-Entropy Region Optimization), a system that can be integrated into the deduplication pipeline. ZERO identifies and extracts zero-entropy blocks prior to chunking, compresses them using Run-Length Encoding (RLE), and stores their metadata for later reconstruction. ZERO improves deduplication space savings by up to 29% without impacting throughput.
dc.identifier.urihttps://hdl.handle.net/10012/22414
dc.language.isoen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.titleExploiting Zero-Entropy Data for Efficient Deduplication
dc.typeMaster Thesis
uws-etd.degreeMaster of Mathematics
uws-etd.degree.departmentDavid R. Cheriton School of Computer Science
uws-etd.degree.disciplineComputer Science
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.embargo.terms0
uws.contributor.advisorAl-Kiswany, Samer
uws.contributor.affiliation1Faculty of Mathematics
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Al Jarah_Mu'men.pdf
Size:
1.73 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
6.4 KB
Format:
Item-specific license agreed upon to submission
Description: