Encrypted Content Type identification via Machine Learning

Awan, Zeeshan Mehmood; Supervised by Dr. Fawad Khan

dc.contributor.author	Awan, Zeeshan Mehmood
dc.contributor.author	Supervised by Dr. Fawad Khan
dc.date.accessioned	2022-10-22T07:27:57Z
dc.date.available	2022-10-22T07:27:57Z
dc.date.issued	2022-09
dc.identifier.other	TIS-354
dc.identifier.other	MSIS-17
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/31245
dc.description.abstract	In the current era, machine learning (ML) has become the backbone of IT and is used in almost every system. ML is claimed to solve any problem as long as adequate data is given for training and testing. Cryptography is another widely used technology, which enables communication of data by secure means. In cryptanalysis, a cryptosystem is considered secure if it cannot be broken by any attack. In particular, a cryptosystem that provides indistinguishability is considered secure, meaning that an attacker cannot learn anything from the encrypted data, even under a chosen-ciphertext attack. To check the feasibility of distinguishing the ciphertext of secure block ciphers and identifying the underlying content, we applied cryptanalysis to AES-128 in CBC and ECB modes. We gathered our data mainly from Kaggle and restricted the file size to 11 KB so that the ML classifiers could be run easily. The selected data files have the extensions jpg, xlsx, mp3 and txt. The files were encrypted four times in ECB mode, with both the same key and different keys, and likewise in CBC mode. The classification models are random forest (RF), k-nearest neighbors (KNN) and support vector machine (SVM). Datasets were created using the frequency distribution method and then divided into training and testing sets. The best result came from case I, where the classification models achieved an average accuracy of 69.8%. We observe that, with the restricted file size and AES-128 ECB encryption, the underlying file type can be identified with relatively good accuracy. The worst result came from case IV, where the classification models achieved an average accuracy of 25.05%, which is essentially the same as random guessing among the four file types. We observe that when AES-128 is used in CBC mode, it is difficult to identify the underlying file type. Even with a 16-byte key, identification remains difficult.	en_US
dc.language.iso	en	en_US
dc.publisher	MCS	en_US
dc.title	Encrypted Content Type identification via Machine Learning	en_US
dc.type	Thesis	en_US
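
The abstract says the datasets were built with a "frequency distribution method" but does not give details. The sketch below is one plausible reading, not the thesis's actual procedure: it assumes each ciphertext file is turned into a 256-element vector of relative byte frequencies, which could then be passed to a classifier such as RF, KNN or SVM. The function name `byte_frequency_features` and the choice of 256 byte-value bins are assumptions made for illustration.

```python
from collections import Counter


def byte_frequency_features(data: bytes) -> list[float]:
    """Return a 256-element vector of relative byte frequencies.

    Assumption: one bin per possible byte value (0..255), normalised by
    the length of the input so that files of different sizes compare.
    """
    if not data:
        # An empty input has no bytes to count; return an all-zero vector.
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


if __name__ == "__main__":
    # Tiny stand-in for the contents of an encrypted file.
    sample = b"aab"
    features = byte_frequency_features(sample)
    print(len(features), features[ord("a")], features[ord("b")])
```

In practice the input would be the raw bytes of each encrypted file (for example, read with `open(path, "rb").read()`), and the resulting vectors, labelled by original file type, would form the training and testing sets.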