Abstract:
An increasing number of people use social media services, which makes these services an increasingly attractive outlet for phishing attacks. We initially focus on Twitter, as it is one of the most popular social media services. Phishers on Twitter craft tweets that lead users to websites that download malware. This is a serious problem because phishers can then gain access to a user's digital identity and use it to perform malicious acts. Phishing attacks may be similar across regions, possibly appearing at different times. We use these characteristics to help identify attacks, and we investigate transfer learning as a way to apply phishing-detection models learned in one region to detect phishing in other regions. We make three major contributions. First, we have developed a novel semi-supervised machine learning algorithm, which we call Pelican, that detects potential phishing attacks on Twitter in real time. Pelican can be used for early detection of potential phishing attacks and can detect potential new attacks without pre-existing assumptions about the type of data or prior knowledge of the attacks' characteristics. The technique uses ensembles and sampling methods to handle the class imbalance found in real-world applications. Second, the technique automatically detects unusual behaviour or changes on Twitter. We have investigated changes in trends across Twitter to detect changes in the online behaviour of potential phishing links. The technique uses a change detector that triggers automatic retraining when unusual behaviour is detected, so Pelican adapts to changes in phishing attacks in real time. Pelican detects 93.94% of the phishing tweets in real-world data that we collected over a 9-month period, a higher detection rate than that of the benchmark algorithms. Finally, we have adapted our system to detect phishing in small populations where data is scarce, such as New Zealand.
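The abstract states that Pelican's change detector triggers retraining when behaviour shifts, but it does not say which detector is used. Purely as a hedged illustration, the sketch below swaps in the well-known Drift Detection Method (DDM) of Gama et al. (2004), which monitors a classifier's streaming error rate and flags drift when that rate rises significantly above its historical minimum. The class name `DriftDetector`, its threshold values and the synthetic error stream are assumptions for illustration, not Pelican's actual design.

```python
import math


class DriftDetector:
    """Minimal DDM-style change detector (illustrative stand-in only).

    Tracks the running error rate p of a classifier over a stream and its
    standard deviation s = sqrt(p * (1 - p) / n). Drift is flagged when
    p + s exceeds the lowest p + s seen so far by `drift_level` standard
    deviations; a smaller excess (`warn_level`) raises a warning.
    """

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        # Forget all statistics, e.g. after the model has been retrained.
        self.n = 0
        self.p = 0.0
        self.s = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed one prediction outcome (1 = misclassified, 0 = correct).

        Returns "drift", "warning" or "stable".
        """
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean error rate
        self.s = math.sqrt(max(0.0, self.p * (1 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"  # too few samples for a reliable estimate
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # hypothetical hook: caller retrains on recent tweets
            return "drift"
        if self.p + self.s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic error stream: a 10% error rate, then the model starts failing.
stream = [1 if i % 10 == 0 else 0 for i in range(100)] + [1] * 50
detector = DriftDetector()
states = [detector.update(e) for e in stream]
first_drift = states.index("drift")
print("drift detected at tweet", first_drift)
```

In this sketch the detector stays quiet while the error rate is stable, raises a warning shortly after the rate starts climbing, and reports drift a few tweets later; a streaming system would use the "drift" signal to retrain its model on recent data.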
We used inductive instance transfer learning from the United States dataset to build the New Zealand (NZ) model by leveraging similar instances of phishing in the US. As a result, we were able to build a more accurate model for NZ. We have also contrasted the types of phishing attacks seen internationally with those targeting New Zealand. We found that, over a 9-month period, New Zealand had a lower rate of phishing than Singapore, Australia and the United States.
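The abstract does not describe how similar US instances are chosen. One simple way to realise instance transfer, assumed here only for illustration, is to keep labelled US instances whose feature vectors have a cosine similarity of at least a threshold to some same-label NZ instance, and then train the NZ model on the union. The function names, the 0.9 threshold, the same-label requirement and the toy three-feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_source_instances(source, target, threshold=0.9):
    """Return source (e.g. US) instances that resemble a same-label target
    (e.g. NZ) instance. Both inputs are lists of (features, label) pairs."""
    return [
        (x_s, y_s)
        for x_s, y_s in source
        if any(y_s == y_t and cosine(x_s, x_t) >= threshold for x_t, y_t in target)
    ]


def build_target_training_set(source, target, threshold=0.9):
    """Scarce target data plus the transferred source instances."""
    return list(target) + select_source_instances(source, target, threshold)


# Hypothetical 3-feature tweets; label 1 = phishing, 0 = benign.
nz = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 0.0], 0)]
us = [
    ([0.9, 0.1, 1.0], 1),  # resembles the NZ phishing tweet: kept
    ([1.0, 1.0, 0.0], 1),  # dissimilar to every NZ tweet: dropped
    ([0.0, 0.9, 0.1], 0),  # resembles the NZ benign tweet: kept
    ([1.0, 0.0, 0.9], 0),  # similar features but conflicting label: dropped
]
train = build_target_training_set(us, nz)
print(len(train), "training instances for the NZ model")
```

Requiring a label match keeps US tweets that look like NZ phishing but are labelled benign from diluting the NZ training data; a real system would likely tune the threshold or weight instances by similarity rather than use a hard cut-off.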