Steganography By: Joe Jupin Supervised by: Dr. Longin Jan Latecki
Overview Introduction Background Clandestine Communication Digital Applications of Steganography Uncompressed Images Compressed Images Steganalysis The Images Used Finding and Extracting Messages from Bitmaps Detecting Messages in jpegs Future Work
Introduction
Clandestine Communication
  – Cryptography: scrambles the message into cipher
  – Steganography: hides the message in unexpected places
Digital Applications of Steganography
Messages can be hidden in digital data
  – MS Word (doc)
  – Web pages (htm)
  – Executables (exe)
  – Sound files (mp3, wav, cda)
  – Video files (mpeg, avi)
  – Digital images (bmp, gif, jpg)
Background
Uncompressed Images
Grayscale bitmap images (bmp)
  – 256 shades of intensity from black to white
  – Can be obtained from color images
  – Arranged into a 2-D matrix
Messages are hidden in the least significant bits (lsb)
  – Matrix values change slightly
  – Interested in patterns that form messages

Message: Hello Stego! (length 12)

Character   Integer    Binary
Space       32         00100000
0 – 9       48 – 57    00110000 – 00111001
A – Z       65 – 90    01000001 – 01011010
a – z       97 – 122   01100001 – 01111010
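The lsb idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the image is assumed to be a flat list of 0-255 intensities, and the names `message_to_bits`, `embed_lsb` and the `start` parameter are hypothetical.

```python
def message_to_bits(message):
    """Return the message as a list of bits, 8 per character, most significant first."""
    bits = []
    for ch in message:
        code = ord(ch)
        bits.extend((code >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def embed_lsb(pixels, message, start=0):
    """Return a copy of the pixel list with the message written into the lsb's.

    Assumption: pixels is a flat list of grayscale intensities (0-255) and the
    message is written linearly, one bit per pixel, beginning at index start.
    """
    bits = message_to_bits(message)
    if start + len(bits) > len(pixels):
        raise ValueError("message does not fit in the image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[start + i] = (stego[start + i] & ~1) | bit
    return stego


cover = [128] * 200                      # hypothetical flat gray cover image
stego = embed_lsb(cover, "Hello Stego!", start=5)
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Because only the lowest bit of each pixel is touched, no intensity moves by more than one shade, which is why the matrix values change only slightly.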
Background
Compressed Images
Grayscale jpeg images (jpg)
Joint Photographic Experts Group (jpeg)
Converts image to YCbCr colorspace
Divides into 8x8 blocks
Uses Discrete Cosine Transform (DCT)
  – Obtain frequency coefficients
  – Scaled by quantization to remove some frequencies
  – At a high quality setting the loss will not be noticed
Huffman Coding
Affects the image's statistical properties
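The DCT and quantization steps can be illustrated on a single 8x8 block. The sketch below uses the standard 2-D DCT-II formula; the single uniform quantization step of 16 is an assumption made for brevity (real jpeg encoders use an 8x8 quantization table), and the function names are hypothetical.

```python
import math

N = 8  # jpeg works on 8x8 blocks


def dct_8x8(block):
    """Return the 2-D DCT-II coefficients of an 8x8 block (list of lists)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def quantize(coeffs, step):
    """Divide every coefficient by one uniform step (an assumption) and round."""
    return [[round(value / step) for value in row] for row in coeffs]


# A flat block of intensity 138, level-shifted by 128 as jpeg does before the DCT.
block = [[138 - 128] * N for _ in range(N)]
coeffs = dct_8x8(block)
quantized = quantize(coeffs, 16)
```

For a flat block all the energy lands in the DC coefficient at position (0, 0), and every other coefficient quantizes to zero.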
Background Steganalysis The Images Used From Star Trek Website 1,000 color jpeg images 320x240 or 240x320 www.startrek.com There will be Klingons
Finding and Extracting Messages from Bitmaps Problem Messages can be hidden in lsb’s May be anywhere in image Cannot see message in image Would take forever to be processed by a human
Finding and Extracting Messages from Bitmaps
Procedure
Inject messages into images
Take a Boolean snapshot of even and odd pixels
Construct a string of all possible characters
  – An n-pixel image has n-7 individual character enumerations (320 x 240 - 7 = 76,793)
Use character properties to match the message pattern in the enumerated string
  – Define a 'message' (pattern of message characters)
  – Define 'message characters' (used in messages)
  – Use 'stego stems' (patterns)
A test can be performed faster by using tiled samples

"Steganography is the art and science of communicating in a way which hides the existence of the communication. In contrast to cryptography, where the "enemy" is allowed to detect, intercept and modify messages without being able to violate certain security premises guaranteed by a cryptosystem, the goal of steganography is to hide messages inside other "harmless" messages in a way that does not allow any "enemy" to even detect that there is a second secret message present." [Markus Kuhn 1995-07-03]
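The enumeration procedure can be sketched as follows. This is an illustration under stated assumptions, not the presenter's TestMsg code: the image is a flat list of intensities, the set of 'message characters' is a guess, and the stego stems and pattern scores are replaced by a much simpler stand-in (the longest run of message characters at any of the 8 bit alignments).

```python
import string

# Assumed alphabet of 'message characters'; the deck does not list its own.
MESSAGE_CHARS = set(string.ascii_letters + string.digits + " .,!?'")


def lsb_snapshot(pixels):
    """Boolean snapshot: 0 for even pixels, 1 for odd pixels."""
    return [p & 1 for p in pixels]


def enumerate_characters(bits):
    """Slide an 8-bit window over the bits: n bits give n-7 candidate characters."""
    chars = []
    for i in range(len(bits) - 7):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        chars.append(chr(value))
    return chars


def find_message(pixels, min_length=6):
    """Return the longest run of message characters, or None if it is too short."""
    chars = enumerate_characters(lsb_snapshot(pixels))
    best = ""
    for offset in range(8):              # try every bit alignment
        run = ""
        for ch in chars[offset::8]:      # consecutive bytes at this alignment
            if ch in MESSAGE_CHARS:
                run += ch
                if len(run) > len(best):
                    best = run
            else:
                run = ""
    return best if len(best) >= min_length else None


# Demo on a hypothetical all-even cover with a message injected at pixel 13.
cover = [100] * 400
bits = [(ord(ch) >> s) & 1 for ch in "Hello Stego!" for s in range(7, -1, -1)]
stego = list(cover)
for i, bit in enumerate(bits):
    stego[13 + i] = (stego[13 + i] & ~1) | bit

found = find_message(stego)
clean_result = find_message(cover)
```

On a real photograph the lsb's around the message are not all zero, so a run found this way can pick up a few stray characters at the head or tail.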
Finding and Extracting Messages from Bitmaps Observation Only considered linear unencrypted messages Trial performed on 100 grayscale bitmaps Took an average of 9 seconds per image to find with 100% accuracy (no training -- cold) 97 clean 3 stego Occasionally some garbage text at head or tail Took an average of 3 seconds per image to test with 100% accuracy Clean images had pattern scores of less than 10 Stego images had pattern scores of 31 or more
Finding and Extracting Messages from Bitmaps Conclusion Messages are detectable and extractable from non-encrypted uncompressed images Linear messages can be found in any direction with more computation This method can be foiled by hashing the message into the image
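One way to read "hashing the message into the image" is to scatter the message bits over key-dependent pixel positions, so that no linear run of pixels spells out the text. The sketch below is an assumption about what that could look like, not a method taken from the deck; the key string and all function names are hypothetical, and Python's `random.Random` is used only as a convenient keyed position generator.

```python
import random


def to_bits(message):
    """Message as a list of bits, 8 per character, most significant first."""
    return [(ord(ch) >> s) & 1 for ch in message for s in range(7, -1, -1)]


def from_bits(bits):
    """Rebuild a string from groups of 8 bits."""
    chars = []
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        chars.append(chr(value))
    return "".join(chars)


def scatter_positions(key, n_pixels, n_bits):
    """Key-dependent list of distinct pixel indices (illustrative, not secure)."""
    return random.Random(key).sample(range(n_pixels), n_bits)


def embed_scattered(pixels, message, key):
    """Write each message bit into the lsb of a key-selected pixel."""
    bits = to_bits(message)
    stego = list(pixels)
    for pos, bit in zip(scatter_positions(key, len(pixels), len(bits)), bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract_scattered(pixels, n_chars, key):
    """Read the lsb's back in the same key-dependent order."""
    positions = scatter_positions(key, len(pixels), n_chars * 8)
    return from_bits([pixels[p] & 1 for p in positions])


cover = [100] * 2000                      # hypothetical flat cover image
stego = embed_scattered(cover, "Hello Stego!", key="shared secret")
recovered = extract_scattered(stego, 12, key="shared secret")
```

A receiver who knows the key regenerates the same positions and reads the message back; a linear scan of the lsb's sees the bits in image order, where they no longer form consecutive characters.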
Detecting Messages in jpegs Problem Cannot use an enumeration scheme to detect or find a message May only be able to detect because of encoding schemes and encryption Cannot see message in image Statistical properties of an image change when a message is injected
Detecting Messages in jpegs
Procedure
Obtain the 4-level 2-D wavelet decomposition of the images
Obtain the orientation decomposition of frequency space statistics
  – Includes: mean, variance, skewness and kurtosis of coefficients and error for prediction in subband
Normalize the data by 0-1 min-max
Train Fisher Linear Discriminant (FLD)
Test the FLD threshold
[Table: sample feature vectors. Each row holds 72 features plus the class (0 clean, 1 stego): mean, var, skw and krt of the V, H and D coefficients and of the prediction errors Ev, Eh and Ed, with suffixes 12, 23 and 34.]
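The statistical part of the procedure can be sketched without the wavelet step. The code below is a simplified illustration under several assumptions: the wavelet decomposition is omitted entirely, the moments use population formulas, and the Fisher linear discriminant is written out for only two features (the deck uses 72) so that the 2x2 scatter matrix can be inverted by hand. All names are hypothetical.

```python
def moments(values):
    """Return (mean, variance, skewness, kurtosis) using population formulas."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c ** 2 for c in centered) / n
    if var == 0:
        return mean, 0.0, 0.0, 0.0
    skew = sum(c ** 3 for c in centered) / n / var ** 1.5
    kurt = sum(c ** 4 for c in centered) / n / var ** 2
    return mean, var, skew, kurt


def min_max_normalize(rows):
    """Scale every feature column of a list of rows to the range 0-1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def train_fld_2d(clean, stego):
    """Two-feature Fisher discriminant: w = Sw^-1 (m1 - m0), midpoint threshold."""
    def mean2(rows):
        return [sum(r[0] for r in rows) / len(rows),
                sum(r[1] for r in rows) / len(rows)]

    m0, m1 = mean2(clean), mean2(stego)
    s = [[0.0, 0.0], [0.0, 0.0]]          # within-class scatter matrix
    for rows, m in ((clean, m0), (stego, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    threshold = (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2
    return w, threshold


def classify(row, w, threshold):
    """Return 1 (stego) if the projection exceeds the threshold, else 0 (clean)."""
    return 1 if w[0] * row[0] + w[1] * row[1] > threshold else 0


# Made-up, already normalized two-feature examples for illustration only.
clean = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.25, 0.15]]
stego = [[0.8, 0.9], [0.9, 0.8], [0.85, 0.95], [0.95, 0.85]]
w, threshold = train_fld_2d(clean, stego)
```

The threshold here is simply the midpoint between the projected class means; the deck's "Test the FLD threshold" step may have chosen it differently.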
Detecting Messages in jpegs Observation Trials performed on 2000 images 1000 clean and 1000 stego Random selection of 1000 instances without replacement (500 each class) Messages in stego had sufficient size Results show overwhelming accuracy Bior3.1 True Neg 100%, True Pos 98.6% Rbio5.5 True Neg 99.8%, True Pos 98.8%
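The sampling and scoring described on this slide can be sketched as below. This is an illustration only: the seed, the function names and the tiny prediction lists are hypothetical, and the rates are computed as the fraction of each class labeled correctly.

```python
import random


def stratified_sample(labels, per_class, seed=0):
    """Pick per_class indices from each class (0 clean, 1 stego) without replacement."""
    rng = random.Random(seed)
    chosen = []
    for cls in (0, 1):
        indices = [i for i, y in enumerate(labels) if y == cls]
        chosen.extend(rng.sample(indices, per_class))
    return chosen


def rates(actual, predicted):
    """Return (true negative rate, true positive rate)."""
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return tn / actual.count(0), tp / actual.count(1)


labels = [0] * 1000 + [1] * 1000          # 1000 clean and 1000 stego images
sample = stratified_sample(labels, 500)   # 1000 instances, 500 of each class
tn_rate, tp_rate = rates([0, 0, 1, 1], [0, 1, 1, 1])
```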
Detecting Messages in jpegs Conclusion Messages of sufficient size can be detected in stego images with great accuracy Improved accuracy may be due to a large training set 1000 (800/200) 500 (400/100) Restricted domain Many similar images
Detecting Messages in jpegs Problems Authors did not handle log of zero problem Replaced with small value Differing jpeg sizes need differing message sizes Dynamic message injection
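The log-of-zero workaround can be sketched as a guarded logarithm. The floor value `EPSILON`, the base-2 logarithm and the function name are assumptions; the deck only says that zero was replaced with a small value.

```python
import math

EPSILON = 1e-10  # hypothetical small value substituted for a zero magnitude


def safe_log2(magnitude, epsilon=EPSILON):
    """Return log2 of the magnitude, substituting epsilon when it is (near) zero."""
    return math.log2(max(abs(magnitude), epsilon))


values = [safe_log2(x) for x in (8, 0, -0.5)]
```

The choice of floor matters: a very small value produces a large negative logarithm, which can dominate the mean and variance features, so it may be worth checking how sensitive the results are to it.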
Detecting Messages in jpegs Other Classifiers Tests were run on J4.8, SMO, Logistic and Naïve Bayes for bior3.1 and rbio5.5 with 80/20 split and default settings Results
Future Work Would like to find optimal stems Pattern matching Text mining Cryptanalysis Would like to optimize TestMsg code C/assembly code
References
Petitcolas, F.A.P., Anderson, R., Kuhn, M.G., "Information Hiding: A Survey", July 1999, URL: http://www.cl.cam.ac.uk/ fapp2/publications/ieee99infohiding.pdf (11/26/01 17:00)
Farid, Hany, "Detecting Steganographic Messages in Digital Images", Department of Computer Science, Dartmouth College, Hanover NH 03755
Moby Words II, Copyright (c) 1988-93, Grady Ward. All Rights Reserved.
Lyu, Siwei and Farid, Hany, "Steganalysis Using Color Wavelet Statistics and One-Class Support Vector Machines", Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
Farid, Hany, "Detecting Hidden Messages Using Higher Order Statistical Models", Department of Computer Science, Dartmouth College, Hanover NH 03755
Spy Vs. Spy by Antonio Prohias from MAD Magazine
Have a good Winter Break!