A quick visual "test" in R, the statistical computing package, to see whether a file is likely to be encrypted.




Is your file encrypted? A visual test in R

R is a very powerful statistical package and programming environment. It allows complex statistical analyses, which could take hours to code in other languages, to be done in just a few lines. This is a somewhat quirky example of R usage, but it nonetheless shows R's inherent power and simplicity. Suppose you have a file that you suspect is encrypted. How do you know? From a statistical standpoint, if you look at the values extracted from an encrypted file, their distribution should be (more or less) uniform. In other words, the probability of any value in an encrypted file should be about the same as that of any other. In R, it's extremely easy to 'dissect' any file in binary mode and estimate the density of its values. It only takes a few lines of code:


Figure 1 - R Code

The code opens the file in binary mode and draws a histogram and a kernel density estimate on the same plot. For this test I used a photo (of my cat :) and compared the results for the original file with those for an encrypted zip file of the same photo.
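The steps just described can be sketched as follows. This is a minimal sketch and not necessarily identical to the code in the figure: the helper names read_bytes and plot_byte_distribution are made up for illustration, and it assumes the file is read one byte at a time, so every value falls between 0 and 255.

```r
# Hypothetical helper: read every byte of a file as an integer in 0..255.
read_bytes <- function(path) {
  n <- file.info(path)$size            # file size in bytes
  con <- file(path, open = "rb")       # open in binary mode
  on.exit(close(con))                  # always close the connection
  as.integer(readBin(con, what = "raw", n = n))
}

# Hypothetical helper: histogram of byte values with a kernel
# density estimate drawn on top of it.
plot_byte_distribution <- function(path) {
  bytes <- read_bytes(path)
  # One bin per possible byte value; freq = FALSE puts the histogram
  # on the same (density) scale as the kernel estimate.
  hist(bytes, breaks = seq(-0.5, 255.5, by = 1), freq = FALSE,
       main = basename(path), xlab = "Byte value (0-255)")
  lines(density(bytes), lwd = 2)
  invisible(bytes)
}
```

Calling plot_byte_distribution("cat.jpg") and then plot_byte_distribution("cat_encrypted.zip") would give the two plots to compare. Both file names are placeholders.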


 Figure 1 - The Original File. The most spoiled cat in the neighborhood.


The resulting distribution plots are shown below:



Figure 1 - Original, Unencrypted File



Figure 2 - Encrypted File

It is obvious in this case that the distribution of values in the encrypted file resembles a uniform distribution, whereas that of the original file does not. This is of course not a rigorous test, but it can give a quick visual hint when nothing else is known about a "mystery" file. And R makes this type of test a really simple task!
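If a number is wanted to go along with the plot, one possible complement (my own addition here, not part of the test above) is a chi-squared statistic comparing the observed byte counts with the counts a perfectly uniform file would have. The function name byte_chisq is hypothetical, and the sketch again assumes the file is read byte by byte.

```r
# Hypothetical helper: chi-squared statistic of a file's byte counts
# against a uniform distribution over the 256 possible byte values.
byte_chisq <- function(path) {
  bytes <- as.integer(readBin(path, what = "raw",
                              n = file.info(path)$size))
  counts <- tabulate(bytes + 1L, nbins = 256)  # count of each value 0..255
  expected <- length(bytes) / 256              # count per value if uniform
  sum((counts - expected)^2 / expected)
}
```

For data that really is uniform, this statistic follows a chi-squared distribution with 255 degrees of freedom, so it tends to land near 255. A value far above that suggests the bytes are not uniformly distributed.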


Comments, questions, suggestions? You can reach me at: contact (at sign) paulorenato (dot) com