Spectrum Steganalysis of WAV Audio Streams


Spectrum Steganalysis of Digital WAV Audios

Qingzhong Liu 1,2, Andrew H. Sung 1,2,*, Mengyu Qiao 1

1 Computer Science and Engineering Department
2 Institute for Complex Additive Systems Analysis
New Mexico Tech, Socorro, NM 87801 USA
{liu, sung, myuqiao}@cs.nmt.edu
* corresponding author

Abstract. In this article, we propose an audio steganalysis method called Fourier Spectrum Steganalysis. The mean values and the standard deviations of the high-frequency spectrum of the second and higher order derivatives are extracted from the testing signals and their reference versions. A Support Vector Machine (SVM) is employed to discriminate innocent signals from steganograms in which covert messages are embedded. Experimental results show that our method delivers very good performance and holds great promise for effective detection of steganograms produced by Hide4PGP, Invisible Secrets, S-tools4 and Steghide.

Keywords: steganalysis, spectrum, audio, derivative, steganography, SVM

1 Introduction

Steganography is the art and science of hiding data in digital images, audio, video, and other media. In recent years, many researchers have designed algorithms for information hiding [30-34]. Conversely, steganalysis is the art and science of detecting information-hiding behavior in these covers. Several steganalysis methods have been presented for detecting information hiding in multiple steganography systems, and most of them focus on digital images. For example, one well-known detector, the Histogram Characteristic Function Center of Mass (HCFCOM), was successful in detecting noise-adding steganography [1]. Another well-known method constructs a high-order moment statistical model in a multi-scale decomposition using a wavelet-like transform and then applies a learning classifier to the high-order feature set [2]. Shi et al. [3] proposed a Markov process based approach to detect information hiding in JPEG images. Building on the Markov approach, Liu et al. [4] expanded the Markov features to the inter-bands of the DCT domain, combined the expanded features with the polynomial fitting of the histogram of the DCT coefficients, and improved the steganalysis performance on multi-class JPEG images. Other work in image steganalysis can be found in [5-10].

The detection of information hiding in digital audio lags behind the activity in digital images. Most steganalysis work has focused on images, and because digital audio and digital images have different characteristics, most of the notable image steganalysis methods do not work on digital audio. Although audio steganalysis methods are few, the following schemes explore the detection of information hiding:

Ru et al. presented a detection method that measures the features between the signal and a self-generated reference signal via linear predictive coding [11, 12].
Avcibas designed content-independent distortion measures as features for classifier design [13].
Ozer et al. constructed a detector based on the characteristics of the denoised residuals of the audio file [14].
Johnson et al. set up a statistical model by building a linear basis that captures certain statistical properties of audio signals [15].
Kraetzer and Dittmann proposed a Mel-cepstrum based analysis to detect embedded hidden messages [16, 17].
Liu et al. utilized and expanded the Markov approach proposed by Shi et al. [3] for digital images, and designed expanded Markov features for steganalysis of digital audio [27].
Zeng et al. designed algorithms to detect phase coding steganography based on the analysis of phase discontinuities [28] and to detect echo steganography based on statistical moments of peak frequency [29].

In this article, we propose an audio steganalysis method named Fourier Spectrum Steganalysis (FSS). The mean values and the standard deviations of the high-frequency spectrum of the second and higher order derivatives are extracted from the testing audio signals and their reference versions, and serve as the detector features. A Support Vector Machine (SVM) with an RBF kernel is employed to distinguish innocent signals from steganograms. Experimental results show that our method is very promising for audio steganalysis. In comparison with the steganalysis method based on high-order statistics derived from the linear prediction error [11, 12], and with the latest method based on the improved Markov approach and expanded features [27], FSS shows a clear advantage.

The rest of the paper is organized as follows: Section 2 presents the derivative based additive noise model for audio steganalysis, and Section 3 describes the generation of the signal reference and the calculation of the difference features of the spectrum. Experiments are presented in Section 4, followed by discussion in Section 5 and conclusions in Section 6.

2 Derivative Based Additive Noise Model for Audio Steganalysis

In image processing, the second order derivative is widely employed for detecting isolated points, edges, etc. [18]. With this approach in mind, we developed a scheme based on the joint distribution and the conditional distribution of the second order derivative for audio steganalysis. Building on that previous work, we extend the second order derivative to higher order derivatives. Specifically, we integrate the second, third, and fourth derivatives of the audio signals, as described below.

An audio signal is denoted as $f(t)$, where $t$ is the sample time and $t = 0, 1, 2, \ldots, N-1$. The second, third, and fourth derivatives, denoted by $D_f^2(\cdot)$, $D_f^3(\cdot)$, and $D_f^4(\cdot)$, respectively, are given as

$$D_f^2(t) \equiv \frac{d^2 f}{dt^2} = f(t+2) - 2 f(t+1) + f(t), \quad t = 0, 1, 2, \ldots, N-3 \tag{1}$$

$$D_f^3(t) \equiv \frac{d^3 f}{dt^3} = f(t+3) - 3 f(t+2) + 3 f(t+1) - f(t), \quad t = 0, 1, 2, \ldots, N-4 \tag{2}$$

$$D_f^4(t) \equiv \frac{d^4 f}{dt^4} = f(t+4) - 4 f(t+3) + 6 f(t+2) - 4 f(t+1) + f(t), \quad t = 0, 1, 2, \ldots, N-5 \tag{3}$$
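The finite differences in Eqs. (1)-(3) can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the helper name `derivative` and the toy sample values are hypothetical and not part of the original method description.

```python
import numpy as np

def derivative(f, n):
    """n-th order forward difference of a 1-D signal, as in Eqs. (1)-(3)."""
    # Cast to float so that integer PCM samples cannot overflow while differencing.
    return np.diff(np.asarray(f, dtype=np.float64), n=n)

# Toy signal with N = 8 samples (hypothetical values, for illustration only).
f = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=np.int16)
d2 = derivative(f, 2)   # defined for t = 0 .. N-3
d3 = derivative(f, 3)   # defined for t = 0 .. N-4
d4 = derivative(f, 4)   # defined for t = 0 .. N-5

# Cross-check against the explicit formulas of Eqs. (1)-(3).
x = f.astype(np.float64)
assert np.allclose(d2, x[2:] - 2 * x[1:-1] + x[:-2])
assert np.allclose(d3, x[3:] - 3 * x[2:-1] + 3 * x[1:-2] - x[:-3])
assert np.allclose(d4, x[4:] - 4 * x[3:-1] + 6 * x[2:-2] - 4 * x[1:-3] + x[:-4])
```

Each additional order shortens the sequence by one sample, which matches the index ranges given with the equations.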

The embedded message is denoted as $h(t)$ and the stego-signal is denoted as $s(t)$. An approximation of $s(t)$ is $s(t) \approx f(t) + h(t)$, but in general it is not exact. To represent $s(t)$ exactly, we assume the error between $s(t)$ and $f(t)$ is $e(t)$, that is,

$$s(t) = f(t) + e(t) \tag{4}$$

The derivatives of the error $e(t)$ and of $s(t)$ are denoted by $D_e^n(\cdot)$ and $D_s^n(\cdot)$ ($n = 2, 3, 4$), respectively. We obtain

$$D_s^n(\cdot) = D_f^n(\cdot) + D_e^n(\cdot), \quad n = 2, 3, 4 \tag{5}$$

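The Discrete Fourier Transforms (DFTs) of $D_s^n(\cdot)$, $D_f^n(\cdot)$, and $D_e^n(\cdot)$ are denoted as $F_k^s$, $F_k^f$, and $F_k^e$, respectively:

$$F_k^s = \sum_{t=0}^{M-1} D_s^n(t)\, e^{-j 2\pi k t / M} \tag{6}$$

$$F_k^f = \sum_{t=0}^{M-1} D_f^n(t)\, e^{-j 2\pi k t / M} \tag{7}$$

$$F_k^e = \sum_{t=0}^{M-1} D_e^n(t)\, e^{-j 2\pi k t / M} \tag{8}$$

where $k = 0, 1, 2, \ldots, M-1$ and $M$ is the number of samples of the derivatives. We have

$$F_k^s = F_k^f + F_k^e \tag{9}$$

Assume that $\theta$ is the angle between the vectors $F_k^f$ and $F_k^e$; then

$$|F_k^s|^2 = |F_k^f|^2 + |F_k^e|^2 + 2\,|F_k^f|\,|F_k^e| \cos\theta \tag{10}$$

The expected value of $|F_k^s|^2$ is

$$E\left(|F_k^s|^2\right) = \frac{\int_0^{\pi} \left(|F_k^f|^2 + |F_k^e|^2 + 2\,|F_k^f|\,|F_k^e| \cos\theta\right) d\theta}{\int_0^{\pi} d\theta} = |F_k^f|^2 + |F_k^e|^2 \tag{11}$$

We also have the following equations:

$$E\left[\left(|F_k^s|^2 - E\left(|F_k^s|^2\right)\right)^2\right] = \frac{\int_0^{\pi} 4\,|F_k^f|^2\,|F_k^e|^2 \cos^2\theta \, d\theta}{\int_0^{\pi} d\theta} = 2\,|F_k^f|^2\,|F_k^e|^2 \tag{12}$$

$$\frac{E\left(|F_k^s|^2\right)}{|F_k^f|^2} = 1 + \frac{|F_k^e|^2}{|F_k^f|^2} \tag{13}$$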
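The averages in Eqs. (11)-(13) can be checked numerically. The sketch below uses only the Python standard library and rests on two assumptions: the angle between the cover and error spectral components is treated as uniformly distributed over [0, pi], and the magnitudes 5.0 and 2.0 are hypothetical values chosen for illustration. The function name `stego_power_stats` is likewise illustrative.

```python
import cmath
import math

def stego_power_stats(mag_f, mag_e, steps=3600):
    """Mean and variance of |F_f + F_e|^2 over angles theta in [0, pi]."""
    powers = []
    for i in range(steps):
        theta = math.pi * (i + 0.5) / steps           # midpoint rule on [0, pi]
        f_s = mag_f + mag_e * cmath.exp(1j * theta)   # cover component on the real axis
        powers.append(abs(f_s) ** 2)
    mean = sum(powers) / steps
    var = sum((p - mean) ** 2 for p in powers) / steps
    return mean, var

mag_f, mag_e = 5.0, 2.0                # hypothetical |F_f| and |F_e|
mean, var = stego_power_stats(mag_f, mag_e)
ratio = mean / mag_f ** 2              # left-hand side of Eq. (13)
print(mean, var, ratio)
```

With these magnitudes the mean should agree with the sum of the two squared magnitudes, the variance with twice their product, and the ratio with one plus the error-to-cover power ratio.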

Since the expected values of all the derivatives are zero, the spectrum at the lowest frequency is zero. The error $e(t)$ can be treated as a random error with an expected value of 0. According to [25, 26], the spectrum $|F^e|$ is approximately described by a Gaussian or Gaussian-like distribution. Its power is zero at the lowest frequency and increases as the frequency increases; that is, the spectrum at high frequencies is higher than at low frequencies. Fig. 1 shows the spectrum distribution of the second to the fourth order derivatives of a random error with the values +1, -1, and 0. It demonstrates that the high-frequency spectrum of the derivatives (the central part) is larger than the other parts.

Normally, digital audio is band-limited; that is, the magnitudes of the high-frequency components are limited, although the high-frequency spectra differ from one audio signal to another. Based on equation (13), at low and middle frequencies the spectrum of the audio signal is greater than the spectrum of the error signal, so the modification caused by embedding is negligible there. At high frequencies, however, the modification may provide the clue for detecting information hiding, since the magnitude of the high-frequency components of the audio signal is limited while the energy of the Fourier transform of the derivative of the error signal is concentrated at the high-frequency components.

This is the key point of our steganalysis: information hiding in audio generally increases the high-frequency spectrum of the derivatives, so we can measure the statistics of the high-frequency spectrum to decide whether a signal carries a covert message. Fig. 2 shows the spectrum distribution of the derivatives of an innocent signal and that of the stego-signal generated by hiding a message in the innocent signal. It clearly shows that the high-frequency spectrum of the second derivative of the stego-signal has higher mean values than that of the cover.
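The concentration of the error energy at high frequencies can be made explicit with a short calculation. It is a sketch under two assumptions that the text does not state: $e(t)$ is zero-mean white noise with variance $\sigma_e^2$, and boundary effects of the finite-length DFT are ignored. The symbols $\sigma_e$, $\omega$, and $H_n$ are introduced here only for illustration. A first-order difference $e(t+1) - e(t)$ acts as a filter with frequency response $e^{j\omega} - 1$, and the $n$-th order derivative applies it $n$ times, so

$$\left|e^{j\omega} - 1\right| = 2\left|\sin\frac{\omega}{2}\right|, \qquad \left|H_n(\omega)\right| = \left(2\left|\sin\frac{\omega}{2}\right|\right)^{n}$$

$$E\left(|F_k^e|^2\right) \approx M\,\sigma_e^2 \left(2 \sin\frac{\pi k}{M}\right)^{2n}, \qquad k = 0, 1, \ldots, M-1$$

The right-hand side vanishes at $k = 0$ and reaches its maximum $4^n M \sigma_e^2$ at $k = M/2$, the highest frequency, which sits in the central part of the unshifted spectrum. A higher derivative order $n$ makes this peak sharper.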

Fig. 1. Random error signals with 500 samples (a) and 10000 samples (b), respectively, and the spectrum distributions (before shifting) of their derivatives. The red dashed rectangles indicate the areas of the high-frequency spectrum.

Fig. 2. Comparison of the spectra (first row: whole frequency range; second row: high frequencies) of the second derivatives of a cover signal and the stego-signal.
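The behaviour illustrated in Fig. 1 can be reproduced with a short experiment. This is a sketch assuming NumPy; the seed, the signal length, and the choice of taking the outer and central 10% of DFT bins as "low" and "high" regions are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, for repeatability only
e = rng.integers(-1, 2, size=10_000)       # random error with values -1, 0, +1
d2 = np.diff(e.astype(np.float64), n=2)    # second derivative, Eq. (1)
spectrum = np.abs(np.fft.fft(d2))          # unshifted DFT magnitude

M = spectrum.size
w = M // 10
# Without shifting, the lowest frequencies sit at both ends of the array
# and the highest frequencies sit in the centre (around k = M/2).
low = np.r_[spectrum[:w], spectrum[-w:]]
high = spectrum[M // 2 - w : M // 2 + w]
print(low.mean(), high.mean())
```

The mean magnitude in the central region comes out far larger than at the ends, in line with the observation that the derivative of a random error concentrates its energy at high frequencies.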

At this point, we present the following procedure to extract the statistical characteristics of the spectrum.

1. Obtain the Fourier spectrum of the derivatives of the testing signal.
2. Calculate the mean values and the standard deviations of the different frequency zones of the spectrum from step 1. In our approach, we divide the whole frequency range equally into Z zones (Z is set to between 20 and 80), from the lowest to the highest frequency. The mean value and the standard deviation of the i-th zone are denoted as $m_i$ and $\sigma_i$, respectively.
3. Choose the $m_i$ and $\sigma_i$ values from the high-frequency spectrum as the features. In our approach, if Z = 80, i ranges from 66 to 80.
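The three steps above can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes NumPy, uses the one-sided magnitude spectrum from `np.fft.rfft` so that the array runs from the lowest to the highest frequency, and splits it into nearly equal zones with `np.array_split`. The function name `zone_features` and the random toy signal are hypothetical.

```python
import numpy as np

def zone_features(signal, order=2, zones=80, first_zone=66):
    """Mean and std of the derivative spectrum in zones first_zone..zones (1-indexed)."""
    d = np.diff(np.asarray(signal, dtype=np.float64), n=order)
    # Step 1: one-sided magnitude spectrum, index 0 = lowest frequency,
    # last index = highest frequency (an assumption about how zones are laid out).
    spectrum = np.abs(np.fft.rfft(d))
    # Step 2: Z nearly equal zones, with a mean and a standard deviation per zone.
    parts = np.array_split(spectrum, zones)
    m = np.array([p.mean() for p in parts])
    sigma = np.array([p.std() for p in parts])
    # Step 3: keep only the high-frequency zones.
    return m[first_zone - 1:], sigma[first_zone - 1:]

rng = np.random.default_rng(1)
toy = rng.integers(-2000, 2000, size=8000)   # stand-in for 16-bit PCM samples
m_hi, sigma_hi = zone_features(toy)
print(m_hi.shape, sigma_hi.shape)
```

With Z = 80 and zones 66 to 80 selected, each derivative order contributes 15 mean values and 15 standard deviations.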

3 Reference Based Solution

Information hiding does modify the statistics of the spectrum of the derivatives, as described in Section 2 and shown in Fig. 2. However, different audio signals have different spectral statistics; in other words, the spectrum statistics vary from one signal to another. Without any reference, it is still difficult to detect some audio stego-systems accurately, and we may even reach an incorrect conclusion. This is especially true for audio steganograms in which the bit-depth modification is limited to the least significant bits and therefore results in very small modifications to the original audio. Considering this point, we generate a reference signal as follows:

1. Randomly modify the least significant bit of the testing signal $g$; the modified version is denoted $r$. According to (1)-(3) and (6)-(8), we obtain $F_k^g$ and $F_k^r$.
2. Obtain the mean values and the standard deviations of the high-frequency spectra, denoted $m_i^g$ and $\sigma_i^g$, and $m_i^r$ and $\sigma_i^r$, associated with $F_k^g$ and $F_k^r$, respectively.
3. Calculate the differences $m_i^d$ and $\sigma_i^d$ in the following way:

$$m_i^d = m_i^r - m_i^g \tag{14}$$

$$\sigma_i^d = \sigma_i^r - \sigma_i^g \tag{15}$$

The values $m_i^d$ and $\sigma_i^d$, extracted from the high-frequency spectrum, are the final features. We may also combine these features with the $m_i$ and $\sigma_i$ from the high-frequency spectrum, described in Section 2, to constitute the final feature set.
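A minimal sketch of the reference-based features in Eqs. (14) and (15) is given below, assuming NumPy. Several details are assumptions rather than statements from the paper: "randomly modify the least significant bit" is read as flipping the LSB of each sample with probability 1/2, the zones are taken on the one-sided spectrum, and the helper names (`zone_stats`, `make_reference`, `difference_features`) and the sine-wave toy cover are hypothetical.

```python
import numpy as np

def zone_stats(x, order=2, zones=80, first_zone=66):
    """High-frequency zone means and stds of the derivative spectrum."""
    d = np.diff(np.asarray(x, dtype=np.float64), n=order)
    parts = np.array_split(np.abs(np.fft.rfft(d)), zones)[first_zone - 1:]
    return (np.array([p.mean() for p in parts]),
            np.array([p.std() for p in parts]))

def make_reference(g, rng):
    """Flip the LSB of each sample with probability 1/2 (assumed reading of step 1)."""
    flips = rng.integers(0, 2, size=g.shape).astype(g.dtype)
    return g ^ flips

def difference_features(g, rng, order=2):
    r = make_reference(g, rng)
    m_g, s_g = zone_stats(g, order)
    m_r, s_r = zone_stats(r, order)
    return m_r - m_g, s_r - s_g        # Eqs. (14) and (15)

rng = np.random.default_rng(2)
# Smooth toy "cover": a slow sine quantised to 16-bit integers (illustrative only).
t = np.arange(8000)
g = np.round(8000 * np.sin(2 * np.pi * t / 400)).astype(np.int16)
m_d, s_d = difference_features(g, rng)
print(m_d.mean(), s_d.mean())
```

For a smooth cover such as this one, the LSB noise raises the high-frequency spectrum of the derivative, so the mean differences come out positive. A signal that already carries LSB-embedded data would be expected to change less, which is the intuition behind using the difference as a feature.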

4 Experiments

4.1 Set up and comparison of features

We have 1000 WAV audio files covering different types such as digital speech, online broadcast, and music. For each hiding tool, we produced the same number of stego-audios by hiding different messages in these audio files. The hiding tools include Hide4PGP V4.0 [20], Invisible Secrets [21], S-tools4 [22], and Steghide [23]. The hidden data include different text messages, audio, and random signals. The data embedded in any two audio files are different. We set Z to 80 and extract 80 mean values and 80 standard deviations, 160 features in total, from the spectrum of the derivatives. Fig. 3 lists the F-statistics of the features $m_i$ and $\sigma_i$ (Fig. 3a) and $m_i^d$ and $\sigma_i^d$ (Fig. 3b), extracted from 215 stego-audios and compared with the statistics from 215 covers.

Fig. 3 clearly demonstrates the following with regard to statistical significance. The values of $m_i$ and $\sigma_i$ and of $m_i^d$ and $\sigma_i^d$ at high frequencies are much better than the values at low and middle frequencies. The standard deviations are slightly better than the mean values. The features associated with Hide4PGP have higher significance scores than those associated with the other three information-hiding systems, which implies that the detection of Hide4PGP will perform best. Comparing Fig. 3(a) with Fig. 3(b), except for the F-statistics of the features from Hide4PGP, the features $m_i^d$ and $\sigma_i^d$ are better than $m_i$ and $\sigma_i$. This implies that the generation of the signal reference benefits the detection of Invisible Secrets, S-tools4, and Steghide, but not of Hide4PGP.

Fig. 3. The F-statistics of the features $m_i$ and $\sigma_i$ (a) and $m_i^d$ and $\sigma_i^d$ (b) of the spectra of the second derivatives (215 covers vs. 215 stego-audios).

4.2 Experimental results

Based on the analysis in 4.1, we formed two types of feature sets. The first is called comb-set, given as

$$\text{COMB-SET}: \left\{ x \mid x \in \{m_i\} \cup \{\sigma_i\} \cup \{m_i^d\} \cup \{\sigma_i^d\},\ i = 66, 67, \ldots, 80 \right\} \tag{16}$$

The second is called diff-set, given by

$$\text{DIFF-SET}: \left\{ x \mid x \in \{m_i^d\} \cup \{\sigma_i^d\},\ i = 66, 67, \ldots, 80 \right\} \tag{17}$$
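As a sketch, the two feature sets in Eqs. (16) and (17) amount to concatenating the selected zone statistics into one vector per audio file. The ordering of the features inside the vector, the function names, and the placeholder values below are assumptions made for illustration; only the zone range 66 to 80 and the membership of each set come from the text.

```python
def comb_set(m, sigma, m_d, sigma_d, first=66, last=80):
    """Eq. (16): m_i, sigma_i, m_i^d, sigma_i^d for zones first..last (1-indexed)."""
    sl = slice(first - 1, last)
    return list(m[sl]) + list(sigma[sl]) + list(m_d[sl]) + list(sigma_d[sl])

def diff_set(m_d, sigma_d, first=66, last=80):
    """Eq. (17): m_i^d and sigma_i^d for zones first..last (1-indexed)."""
    sl = slice(first - 1, last)
    return list(m_d[sl]) + list(sigma_d[sl])

Z = 80
# Placeholder per-zone statistics (hypothetical values, one entry per zone).
m = [float(i) for i in range(1, Z + 1)]
sigma = [0.5 * v for v in m]
m_d = [0.1 * v for v in m]
sigma_d = [0.05 * v for v in m]

comb = comb_set(m, sigma, m_d, sigma_d)
diff = diff_set(m_d, sigma_d)
print(len(comb), len(diff))
```

With 15 selected zones, the comb-set has 60 features and the diff-set has 30 features per derivative order; combining the second, third, and fourth order derivatives triples both counts.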

We apply an SVM with an RBF kernel [24] to the training and testing feature sets. 75% of the feature sets are used for training and the other 25% are used for testing. The training sets and testing sets are randomly chosen in each experiment. For each type of stego-audio, we repeat the experiment 30 times. The testing results consist of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Since the numbers of cover and stego-signal test samples are equal in each experiment, the testing accuracy is calculated as (TP+TN)/(TP+TN+FP+FN). Table 1 lists the average testing accuracy (%) of the experiments. In the table, COMB-SET (2D) means that the feature set is of the first type and the features are extracted from the spectrum of the second order derivatives, and so on.

Table 1. The average testing accuracy (%) of the two types of feature sets of the high-frequency spectrum of the second, the third, and the fourth order derivatives

| Feature set | Hide4PGP | Invisible Secrets | S-tools4 | Steghide |
|---|---|---|---|---|
| COMB-SET (2D) | 99.0 | 90.5 | 87.1 | 81.0 |
| DIFF-SET (2D) | 98.7 | 90.6 | 86.7 | 82.8 |
| COMB-SET (3D) | 99.2 | 91.4 | 88.3 | 83.8 |
| DIFF-SET (3D) | 99.6 | 95.2 | 88.8 | 78.8 |
| COMB-SET (4D) | 99.0 | 90.9 | 84.8 | 85.7 |
| DIFF-SET (4D) | 98.5 | 91.1 | 87.1 | 82.0 |
| COMB-SET (2,3,&4D) | 99.1 | 95.9 | 88.1 | 85.4 |
| DIFF-SET (2,3,&4D) | 99.3 | 98.7 | 91.6 | 81.7 |
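The evaluation protocol (random 75%/25% split, accuracy from the confusion counts) can be sketched with the standard library alone. The classifier itself is left out, and the sample count of 2000 (1000 covers plus 1000 stego-audios for one tool), the label convention, and the toy predictions are assumptions for illustration. The split shown here is a plain random split and does not by itself enforce equal numbers of covers and stego-signals in the test set.

```python
import random

def split_indices(n, train_fraction=0.75, seed=None):
    """Randomly split range(n) into training and testing index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n)
    return idx[:cut], idx[cut:]

def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN), with label 1 = stego and 0 = cover."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)

train, test = split_indices(2000, seed=0)

# Toy labels and predictions (hypothetical), 4 stego and 4 cover test samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
print(len(train), len(test), acc)
```

Repeating the split with different seeds and averaging the accuracies corresponds to the 30 repetitions described in the text.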

Table 1 indicates that the steganalysis of Hide4PGP audio achieved the best performance, followed by Invisible Secrets, S-tools4, and Steghide. The results are consistent with the analysis in 4.1. The best average testing accuracy is 99.6% for Hide4PGP, 98.7% for Invisible Secrets, 91.6% for S-tools4, and 85.7% for Steghide, which shows that our steganalysis is highly successful.

Table 2 compares the testing results of FSS with those of the high-order statistics based on linear prediction coding [11, 12], here called LPC-HOS, and of the Expanding Markov Features [27], here abbreviated EMF. Table 2 shows that the advantage of FSS over LPC-HOS and EMF is dramatic, especially in the steganalysis of Invisible Secrets, S-tools4, and Steghide, where FSS improves the accuracy by about 17 to 37, 19 to 35, and 19 to 29 percentage points, respectively.

Table 2. The testing accuracy (%) of FSS, LPC-HOS [11, 12], and EMF [27]

| Steganalysis method | Hide4PGP | Invisible Secrets | S-tools4 | Steghide |
|---|---|---|---|---|
| FSS (average) | 99.1 | 93.0 | 91.6 | 81.7 |
| LPC-HOS | 80.0 | 56.9 | 57.6 | 53.2 |
| EMF | 99.1 | 76.3 | 72.7 | 62.9 |
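The gaps between FSS and the two compared methods can be recomputed directly from the values in Table 2. The short sketch below only restates that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Testing accuracy (%) copied from Table 2.
table2 = {
    "FSS":     {"Invisible Secrets": 93.0, "S-tools4": 91.6, "Steghide": 81.7},
    "LPC-HOS": {"Invisible Secrets": 56.9, "S-tools4": 57.6, "Steghide": 53.2},
    "EMF":     {"Invisible Secrets": 76.3, "S-tools4": 72.7, "Steghide": 62.9},
}

gains = {}
for tool, fss_acc in table2["FSS"].items():
    # Gap to each compared method, in percentage points, smallest first.
    diffs = sorted(round(fss_acc - table2[other][tool], 1)
                   for other in ("LPC-HOS", "EMF"))
    gains[tool] = tuple(diffs)
    print(tool, gains[tool])
```

The smaller gap in each pair is the margin over EMF and the larger one is the margin over LPC-HOS.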

5 Discussion

To generate the reference signal, we simply modified the least significant bits of the testing signal at random. Hide4PGP does not embed data in the same way, which explains why, in terms of the F-statistics, $m_i^d$ and $\sigma_i^d$ (Fig. 3(b)) are not better than $m_i$ and $\sigma_i$ (Fig. 3(a)) for Hide4PGP. If we generate the reference signal by embedding a random signal in a way that exactly simulates the embedding of each hiding method or tool, and calculate the difference between the features from the testing signals and those from the reference, an improvement of the steganalysis performance can be expected.

The steganalysis performance in detecting Hide4PGP audio steganograms is much better than for the other steganograms. We analyzed the embedding procedures of the hiding tools. The analysis shows that Hide4PGP has a larger embedding capacity and that the modified bits are not restricted to the least significant bit but extend to the last few least significant bits; hence it makes more modifications and causes a more significant change to the derivatives, which makes it highly detectable in the high-frequency spectrum.

Here we want to mention the poor performance of the compared method, LPC-HOS, in detecting Invisible Secrets, S-tools4, and Steghide. In our opinion, the modification caused by data hiding in these systems is very small, whereas the error resulting from the linear prediction may be much larger than the hiding modification, so the detection performance is poor. FSS overcomes this drawback and obtains good detection results.

We did not study feature selection. By employing feature selection methods and choosing an optimal feature set, a further improvement of the steganalysis performance can reasonably be expected.

6 Conclusions

In this paper, we proposed Fourier Spectrum Steganalysis (FSS) for digital audio. We first introduced the second and higher order derivatives of the signals and employed the Fourier transform to obtain the spectra of the derivatives. By randomly modifying the least significant bits, a reference signal is generated and the spectra of its derivatives are produced. After extracting the statistics of the high-frequency spectrum of the derivatives of the signal and the reference, we employ an SVM to discriminate the features of innocent signals from those of stego-signals. Experimental results indicate that the proposed FSS is highly promising and gains a remarkable improvement in detecting information hiding in digital audio, in comparison with the high-order statistics based on linear prediction coding [11, 12] and the latest modified Markov approach with expanded features [27].

7 Acknowledgement The authors gratefully acknowledge the support for this research from ICASA, a research division of New Mexico Tech.

References

[1] J. Harmsen and W. Pearlman. Steganalysis of Additive Noise Modelable Information Hiding. Proc. of SPIE Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents, vol. 5020, pp. 131-142, 2003.
[2] S. Lyu and H. Farid. How Realistic is Photorealistic. IEEE Trans. on Signal Processing, 53(2): 845-850, 2005.
[3] Y. Shi, C. Chen and W. Chen. A Markov process based approach to effective attacking JPEG steganography. Lecture Notes in Computer Sciences, vol. 437, pp. 249-264, 2007.
[4] Q. Liu, A. Sung, B. Ribeiro and R. Ferreira. Steganalysis of Multi-class JPEG Images Based on Expanded Markov Features and Polynomial Fitting. Proc. of 21st International Joint Conference on Neural Networks, pp. 3351-3356, 2008.
[5] Q. Liu and A. Sung. Feature Mining and Neuro-Fuzzy Inference System for Steganalysis of LSB Matching Steganography in Grayscale Images. Proc. of 20th International Joint Conference on Artificial Intelligence, pp. 2808-2813, 2007.
[6] Q. Liu, A. Sung, J. Xu and B. Ribeiro. Image Complexity and Feature Extraction for Steganalysis of LSB Matching Steganography. Proc. of 18th International Conference on Pattern Recognition, ICPR (1), pp. 1208-1211, 2006.
[7] Q. Liu, A. Sung, Z. Chen and J. Xu. Feature Mining and Pattern Classification for Steganalysis of LSB Matching Steganography in Grayscale Images. Pattern Recognition, 41(1): 56-66, 2008.
[8] Q. Liu, A. Sung, B. Ribeiro, M. Wei, Z. Chen and J. Xu. Image Complexity and Feature Mining for Steganalysis of Least Significant Bit Matching Steganography. Information Sciences, 178(1): 21-36, 2008.
[9] J. Fridrich. Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes. Lecture Notes in Computer Science, vol. 3200, Springer-Verlag, pp. 67-81, 2004.
[10] T. Pevny and J. Fridrich. Merging Markov and DCT Features for Multi-Class JPEG Steganalysis. Proc. SPIE Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents IX, vol. 6505, 2007.
[11] X. Ru, H. Zhang and X. Huang. Steganalysis of Audio: Attacking the Steghide. Proc. the Fourth International Conference on Machine Learning and Cybernetics, pp. 3937-3942, 2005.
[12] X. Ru, Y. Zhang and F. Wu. Audio Steganalysis Based on "Negative Resonance Phenomenon" Caused by Steganographic Tools. Journal of Zhejiang University SCIENCE A, 7(4): 577-583, 2006.
[13] I. Avcibas. Audio Steganalysis with Content-independent Distortion Measures. IEEE Signal Processing Letters, 13(2): 92-95, 2006.
[14] H. Ozer, B. Sankur, N. Memon and I. Avcibas. Detection of Audio Covert Channels Using Statistical Footprints of Hidden Messages. Digital Signal Processing, 16(4): 389-401, 2006.
[15] M. Johnson, S. Lyu and H. Farid. Steganalysis of Recorded Speech. Proc. SPIE, vol. 5681, pp. 664-672, 2005.
[16] C. Kraetzer and J. Dittmann. Pros and Cons of Mel-cepstrum Based Audio Steganalysis Using SVM Classification. Lecture Notes in Computer Science, vol. 4567, pp. 359-377, 2008.
[17] C. Kraetzer and J. Dittmann. Mel-cepstrum Based Steganalysis for VoIP-steganography. Proc. SPIE, vol. 6505, San Jose, CA, USA, 2007.
[18] R. Gonzalez and R. Woods. Digital Image Processing. 3rd edition, ISBN: 9780131687288, Prentice Hall, 2008.
[19] T. Hill and P. Lewicki. Statistics: Methods and Applications. ISBN: 1884233597, StatSoft, Inc., 2005.
[20] http://www.heinz-repp.onlinehome.de/Hide4PGP.htm
[21] http://www.invisiblesecrets.com/
[22] http://digitalforensics.champlain.edu/download/s-tools4.zip
[23] http://steghide.sourceforge.net/
[24] V. Vapnik. Statistical Learning Theory. John Wiley, 1998.
[25] A. Oppenheim, R. Schafer and J. Buck. Discrete-Time Signal Processing. Prentice Hall, 1999.
[26] http://mathworld.wolfram.com/FourierTransformGaussian.html
[27] Q. Liu, A. Sung and M. Qiao. Detecting Information-Hiding in WAV Audios. Proc. of 19th International Conference on Pattern Recognition, 2008.
[28] W. Zeng, H. Ai and R. Hu. A Novel Steganalysis Algorithm of Phase Coding in Audio Signal. Proc. the Sixth International Conference on Advanced Language Processing and Web Information Technology (ALPIT), pp. 261-264, 2007.
[29] W. Zeng, H. Ai and R. Hu. An Algorithm of Echo Steganalysis Based on Power Cepstrum and Pattern Classification. Proc. International Conference on Information and Automation (ICIA), pp. 1667-1670, 2008.
[30] F. Zhang, Z. Pan, K. Cao, F. Zheng and F. Wu. The Upper and Lower Bounds of the Information-hiding Capacity of Digital Images. Information Sciences, 178(14): 2950-2959, 2008.
[31] C. Chang, C. Lin, C. Tseng and W. Tai. Reversible Hiding in DCT-based Compressed Images. Information Sciences, 177(13): 2768-2786, 2007.
[32] C. Chang and C. Lin. Reversible Steganographic Method Using SMVQ Approach Based on Declustering. Information Sciences, 177(8): 1796-1805, 2007.
[33] C. Lin, S. Chen and N. Hsueh. Adaptive Embedding Techniques for VQ-compressed Images. Information Sciences, doi:10.1016/j.ins.2008.09.001
[34] C. Liu and S. Liao. High-performance JPEG Steganography Using Complementary Embedding Strategy. Pattern Recognition, 41(9): 2945-2955, 2008.
