Research Showcase of Anamitra Makur


Document Watermarking for Restoration

Watermarking is the transparent embedding of information in a medium such as a text document. Typically, watermarks are used for document authentication, where any tampering with the document is detected. We look further, to content protection, where the text of the document is protected against any editing. The aim is to restore the original text if any part of the document is modified, deleted, or added. For example, consider the following scanned document:

[Image: the original scanned memo]

Tim gets to carry this memo after it has been watermarked, when it looks like this:

[Image: the watermarked memo]

You should not see much difference, since the watermark is embedded using flippable pixels (some pixels are flipped from black to white or from white to black), and these pixels are chosen to avoid perceptual distortion. The flipped pixels are shown below as black dots:

[Image: the watermarked memo with the flipped pixels marked by black dots]

This signed recommendation, however, reaches the intended recipient after some careful editing, in which some letters were erased and new letters of an identical font were pasted in their place.

[Image: the edited memo]

Unfortunately for Tim, when this document goes through our restoration algorithm, all edited letters are restored to their original versions.

How is this done?

Self-embedding is used for this purpose, where the watermark is the text itself (in a more rudimentary form).

Each letter carries a watermark, which is another letter. For example, the letter a is watermarked with another letter, r. Since r is 1110010 in ASCII, 1110010 is the watermark on a. When this letter a is changed to i by unauthorized editing, the restoration algorithm extracts the watermark of i and finds 1011011, which is not even a valid watermark. So the algorithm knows that i has been tampered with. The original letter a, in turn, has been embedded as the watermark on another letter, j. Extracting the watermark of j therefore recovers the missing letter, a.
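The idea can be illustrated with a toy model in Python. This is only a sketch under several assumptions that are not taken from the scheme itself: the document is modelled as a list of (letter, watermark bits) pairs rather than as pixels, a watermark counts as valid when its 7 bits decode to an ASCII letter, and the carrier of each letter sits a fixed number of positions away. The names OFFSET, embed, and restore are hypothetical.

```python
import string

OFFSET = 3  # assumption: letter i carries the code of letter (i + OFFSET) % n


def to_bits(ch: str) -> str:
    """7-bit ASCII code of a character, e.g. 'r' -> '1110010'."""
    return format(ord(ch), "07b")


def from_bits(bits: str) -> str:
    """Character whose ASCII code is given by the bit string."""
    return chr(int(bits, 2))


def is_valid(bits: str) -> bool:
    """Assumed validity rule: the bits must decode to an ASCII letter."""
    return from_bits(bits) in string.ascii_letters


def embed(text: str) -> list[tuple[str, str]]:
    """Pair every letter with the code of another letter of the same text."""
    n = len(text)
    return [(ch, to_bits(text[(i + OFFSET) % n])) for i, ch in enumerate(text)]


def restore(doc: list[tuple[str, str]]) -> str:
    """Replace letters with an invalid watermark by the copy held by their carrier."""
    n = len(doc)
    out = []
    for i, (ch, wm) in enumerate(doc):
        if is_valid(wm):
            out.append(ch)  # watermark intact: letter is taken as authentic
            continue
        _, carrier_wm = doc[(i - OFFSET) % n]  # position that carries letter i
        out.append(from_bits(carrier_wm) if is_valid(carrier_wm) else "?")
    return "".join(out)


doc = embed("watermark")
doc[1] = ("i", "1011011")  # the letter a is replaced by i; its watermark is destroyed
print(restore(doc))        # prints "watermark"
```

In this toy model a letter is lost (shown as ?) only when its carrier has been tampered with as well.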

Performance:

The restoration algorithm does not succeed in every case. Restoration failure occurs when a tampered letter cannot be restored. Extraneous detection occurs when an authentic letter is declared tampered. For two variations of the self-embedding scheme (cyclic and random), the plots on the right show the probability of such failures (vertical axis) for different probabilities of random substitution/editing of letters (horizontal axis).

This plot tells us that some of the tampered letters will be restored. For example, if 10% of the letters are changed, then on average 2% of the letters cannot be restored, but the remaining 8% are restored.
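A rough feel for why only a fraction of the tampered letters is lost can be had from a small Monte Carlo sketch. It assumes a simplified cyclic layout in which a tampered letter is unrecoverable exactly when the single letter carrying its copy is tampered too. This is an illustrative assumption, not the scheme behind the plots, so the numbers it produces are not expected to match them, and extraneous detection is not modelled at all. The function name simulate and its parameters are hypothetical.

```python
import random


def simulate(n_letters: int, p_tamper: float, offset: int = 1, seed: int = 0) -> float:
    """Fraction of all letters that are tampered AND cannot be restored (toy model)."""
    rng = random.Random(seed)
    # Each letter is independently substituted with probability p_tamper.
    tampered = [rng.random() < p_tamper for _ in range(n_letters)]
    # Assumption: the copy of letter i lives on letter (i - offset) % n_letters,
    # so restoration fails only if that carrier was tampered with as well.
    failures = sum(
        1
        for i in range(n_letters)
        if tampered[i] and tampered[(i - offset) % n_letters]
    )
    return failures / n_letters


for p in (0.05, 0.10, 0.20):
    print(p, simulate(100_000, p))
```

Under this assumption the failure fraction grows roughly as the square of the tampering probability, which is why it stays well below the tampering rate when edits are sparse.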
