Tuesday, August 21, 2007

Computers to Reassemble Shredded East German Secret Police Files


FOXNews.com Computers to Reassemble Shredded East German Secret Police Files: "German researchers said Wednesday that they were launching an attempt to reassemble millions of shredded East German secret police files using complicated computerized algorithms."

"Some 16,250 sacks containing pieces of 45 million shredded documents were found and confiscated after the reunification of Germany in 1990. Reconstruction work began 12 years ago but 24 people have been able to reassemble the contents of only 323 sacks."
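The article's own figures show why the manual effort was hopeless. Below is a rough extrapolation. It is a minimal sketch that assumes the manual team keeps the same pace it has managed so far, which the article does not claim. `projected_years` is a hypothetical helper.

```python
def projected_years(total_sacks: int, done_sacks: int, years_elapsed: float) -> float:
    """Years needed to finish the remaining sacks if the pace so far stays constant."""
    sacks_per_year = done_sacks / years_elapsed  # observed throughput of the manual team
    remaining = total_sacks - done_sacks         # sacks still waiting to be pieced together
    return remaining / sacks_per_year

# Figures from the article: 16,250 sacks found, 323 finished by 24 people in 12 years.
years = projected_years(16_250, 323, 12)
print(f"{years:.0f} more years at the current pace")  # prints "592 more years at the current pace"
```

With these figures the 24-person team would need roughly 590 more years. That is the same order of magnitude as the hand-work estimates quoted in the comments.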

"Using algorithms developed 15 years ago to help decipher barely legible lists of Nazi concentration camp victims, each individual strip of the shredded Stasi files will be scanned on both sides.

"The data then will be fed into the computer for interpretation using color recognition; texture analysis; shape and pattern recognition; machine and handwriting analysis and the recognition of forged official stamps..."
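The quoted description gives no implementation details. The sketch below is a hypothetical illustration of the "color recognition" step only, and it is not the researchers' method. It narrows the search by bucketing scanned strips according to their average color, so that later, more expensive matching only compares strips that probably come from the same kind of paper. The strip representation (lists of RGB tuples), the function names and the tolerance value are all assumptions.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255; hypothetical scan format

def mean_color(pixels: List[Pixel]) -> Tuple[float, float, float]:
    """Average RGB value over all pixels of one scanned strip."""
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )

def group_by_color(strips: Dict[str, List[Pixel]], tolerance: float = 20.0) -> List[List[str]]:
    """Greedily put each strip into the first group whose reference color is within `tolerance`."""
    groups: List[Tuple[Tuple[float, float, float], List[str]]] = []
    for strip_id, pixels in strips.items():
        color = mean_color(pixels)
        for ref, members in groups:
            # Largest difference on any single channel (Chebyshev distance).
            if max(abs(a - b) for a, b in zip(color, ref)) <= tolerance:
                members.append(strip_id)
                break
        else:
            groups.append((color, [strip_id]))  # no close group: start a new one
    return [members for _, members in groups]

strips = {
    "a": [(250, 250, 245)] * 4,  # white office paper
    "b": [(245, 248, 240)] * 4,
    "c": [(230, 200, 150)] * 4,  # yellowish form
}
print(group_by_color(strips))  # [['a', 'b'], ['c']]
```

A greedy, single-feature grouping like this is only a pre-filter. Texture, shape and handwriting features of the kind the article lists would be needed to separate strips that share a paper color.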

Vernor Vinge explored a similar idea in his novel Rainbow's End; see the excerpt at technovelgy.

2 comments:

  1. This seems like a monumental task! The power of computers is incredible: this is a puzzle that would take 30 diligent Germans 600 to 800 years to finish by hand, but, according to one estimate, computers might solve it in seven.

    An excerpt from http://www.pimall.com/nais/n.shred.html explains the computer side of shredded-paper reconstruction:
    A programmer would then want to compare the edges of the images in the computer's memory. The basic idea is to turn the edge of a shred image into a "word" according to its pixel pattern. This "word" would then be sorted with the other "words" and the results would indicate which images are matches. Only a small portion of each edge would be compared, since a close match in one area is a good indicator for the whole. A sample size might be three inches in length, starting one inch down from the top of each shred. Reconstruction would be accomplished by drawing in the images in their relative positions and printing the result, or passing the image to an OCR routine for translation into completed ASCII text pages. This too can be very time consuming, but if many of the documents are in the same format, it will become much faster on the second and following documents than the first.

    That entire website describes how tedious this process is by hand and gives step-by-step instructions for doing it.

  2. Edit: I found a better source that explains the process: http://www.iti.gr/files/isspit04.pdf

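The edge-matching idea in the pimall.com excerpt from the first comment can be sketched in a few lines. This is a minimal illustration, not the method used by the German researchers. It assumes each shred has already been scanned and binarized into rows of '0' (paper) and '1' (ink) characters. It also treats a match as two edge "words" being exactly equal, whereas real scans would need some tolerance for noise. The names `edge_word` and `find_neighbours` are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical representation: a shred is a list of rows, each a string of '0'/'1' pixels.
Shred = List[str]

def edge_word(shred: Shred, column: int, start: int, length: int) -> str:
    """Turn one edge into a 'word': the pixels in `column` for `length` rows from row `start`."""
    return "".join(row[column] for row in shred[start:start + length])

def find_neighbours(shreds: Dict[str, Shred], start: int, length: int) -> List[Tuple[str, str]]:
    """Return (left_shred, right_shred) pairs where the right edge word of the first
    equals the left edge word of the second."""
    entries = []
    for sid, rows in shreds.items():
        entries.append((edge_word(rows, 0, start, length), "L", sid))   # left edge
        entries.append((edge_word(rows, -1, start, length), "R", sid))  # right edge
    entries.sort()  # sorting places identical words next to each other
    pairs = []
    for _, run in groupby(entries, key=lambda entry: entry[0]):
        run = list(run)
        lefts = [sid for _, side, sid in run if side == "L"]
        rights = [sid for _, side, sid in run if side == "R"]
        # A shred whose right edge matches another's left edge sits to its left.
        pairs.extend((r, l) for r in rights for l in lefts if r != l)
    return pairs

shreds = {
    "A": ["00", "01", "01", "00", "11"],
    "B": ["01", "10", "11", "00", "10"],
}
print(find_neighbours(shreds, start=1, length=3))  # [('A', 'B')]
```

Sorting is what keeps this cheap. After one sort, matching edges sit next to each other in the list, so no shred has to be compared against every other shred. At an assumed scan resolution of 300 dpi, the excerpt's suggested sample (three inches long, starting one inch from the top) would correspond to `start=300, length=900`.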