NUST Institutional Repository

Encrypted Content Type identification via Machine Learning


dc.contributor.author Awan, Zeeshan Mehmood
dc.contributor.author Supervised by Dr. Fawad Khan
dc.date.accessioned 2022-10-22T07:27:57Z
dc.date.available 2022-10-22T07:27:57Z
dc.date.issued 2022-09
dc.identifier.other TIS-354
dc.identifier.other MSIS-17
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/31245
dc.description.abstract Machine learning (ML) has become a backbone of information technology and is now used in almost every kind of system. It is often claimed that ML can solve any problem, provided adequate data is available for training and testing. Cryptography is another widely used technology, used to communicate data securely. In cryptanalysis, a cryptosystem is considered secure if it cannot be broken by any attack. In particular, a cryptosystem that provides indistinguishability is considered secure: under a chosen-ciphertext attack, the attacker cannot learn anything from the encrypted data. To check whether ciphertexts of a secure block cipher can be distinguished, and whether the underlying content can be identified, we applied cryptanalysis to AES-128 in ECB and CBC modes. We gathered our data mainly from Kaggle and restricted the file size to 11 KB so that the ML classifiers could be trained easily. The selected files have the extensions jpg, xlsx, mp3, and txt. The files were encrypted in four cases: ECB mode with the same key and with different keys, and likewise for CBC mode. The classification models are random forest (RF), k-nearest neighbours (KNN), and support vector machine (SVM). Datasets were created using a frequency distribution method and then divided into training and testing sets. The best result came from case I, where the classification models reached an average accuracy of 69.8%. This shows that, with the restricted file size and AES-128 in ECB mode, the underlying file type can be identified with relatively good accuracy. The worst result came from case IV, where the classification models reached an average accuracy of 25.05%, essentially the same as random guessing among the four file types (25%). This shows that with AES-128 in CBC mode it is difficult to identify the underlying file type. We used a 16-byte key, and even then identification is difficult. en_US
dc.language.iso en en_US
dc.publisher MCS en_US
dc.title Encrypted Content Type identification via Machine Learning en_US
dc.type Thesis en_US
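
The abstract contrasts AES-128 in ECB mode (case I) with CBC mode (case IV). The sketch below is a minimal illustration of why the two modes behave so differently, assuming byte-level frequency features. ECB encrypts every 16-byte block independently, so identical plaintext blocks give identical ciphertext blocks. CBC XORs each plaintext block with the previous ciphertext block (starting from an IV) before encrypting it. To stay within the Python standard library, the hypothetical `toy_block_encrypt` replaces AES with a keyed SHA-256 truncated to 16 bytes. It is deterministic like a block cipher but neither invertible nor secure, so only the mode structure is meaningful here. All helper names and the 256-bin `byte_frequency` feature are illustrative assumptions, not the thesis's code.

```python
import hashlib

BLOCK = 16  # AES block size in bytes (128 bits)


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES-128: keyed SHA-256 truncated to 16 bytes.
    Deterministic like a block cipher, but NOT invertible and NOT secure."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def pkcs7_pad(data: bytes) -> bytes:
    """Pad to a multiple of BLOCK bytes (PKCS#7 always adds 1..16 bytes)."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: each block is encrypted independently of all others."""
    data = pkcs7_pad(data)
    return b"".join(toy_block_encrypt(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def cbc_encrypt(key: bytes, data: bytes, iv: bytes) -> bytes:
    """CBC: each block is XORed with the previous ciphertext block first.
    The IV is prepended to the output, as is common practice."""
    if len(iv) != BLOCK:
        raise ValueError("IV must be 16 bytes")
    data = pkcs7_pad(data)
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        mixed = bytes(p ^ c for p, c in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return iv + b"".join(out)


def byte_frequency(data: bytes) -> list[float]:
    """Assumed feature: relative frequency of each of the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1
    return [c / total for c in counts]


def distinct_blocks(ciphertext: bytes) -> int:
    """Count distinct 16-byte blocks; repeats reveal plaintext structure."""
    return len({ciphertext[i:i + BLOCK]
                for i in range(0, len(ciphertext), BLOCK)})


if __name__ == "__main__":
    key, iv = bytes(range(16)), bytes(16)  # fixed illustrative values
    plaintext = b"A" * 160                 # ten identical 16-byte blocks
    print("ECB distinct blocks:", distinct_blocks(ecb_encrypt(key, plaintext)))
    print("CBC distinct blocks:", distinct_blocks(cbc_encrypt(key, plaintext, iv)))
```

With ten identical "A" blocks, ECB yields only two distinct ciphertext blocks (the repeated block plus the padding block), while CBC yields twelve (the IV plus eleven chained blocks). Repetition of this kind is one plausible reason ECB ciphertexts keep enough structure in their frequency distribution for a classifier to exploit.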
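
The classification step can be sketched in the same spirit. The thesis uses RF, KNN, and SVM. To stay dependency-free, the sketch below implements only a plain k-nearest-neighbours classifier over 256-bin byte-frequency vectors. It includes a shuffled train/test split and a chance baseline of 1/(number of classes), which is 25% for four equally represented file types. The synthetic "txt" and "rand" samples, the 512-byte sample size, k = 3, and the 70/30 split are hypothetical choices for illustration, not the thesis's data or settings.

```python
import math
import random
from collections import Counter


def byte_frequency(data: bytes) -> list[float]:
    """Assumed feature: relative frequency of each of the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1
    return [c / total for c in counts]


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote of k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3) -> float:
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def train_test_split(samples, test_ratio=0.3, seed=0):
    """Shuffle a copy of the samples and split off the last test_ratio share."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1 / n_classes


def make_sample(rng, label, size=512) -> bytes:
    """Hypothetical synthetic data: low-entropy 'txt' vs uniform 'rand' bytes."""
    if label == "txt":
        alphabet = b"abcdefghij "
        return bytes(rng.choice(alphabet) for _ in range(size))
    return bytes(rng.randrange(256) for _ in range(size))


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [(byte_frequency(make_sample(rng, label)), label)
               for label in ("txt", "rand") for _ in range(20)]
    train, test = train_test_split(samples, test_ratio=0.3, seed=0)
    print(f"KNN accuracy: {accuracy(train, test, k=3):.2f}")
    print(f"Chance accuracy, 4 classes: {chance_accuracy(4):.2%}")
```

Comparing a classifier's accuracy against `chance_accuracy` is what gives the thesis's figures their meaning: with four balanced classes, an average accuracy of about 25%, as in case IV, indicates that the features carry essentially no usable signal about the file type.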
