Poster: Lightweight Content-based Phishing Detection. Ardi, C. & Heidemann, J. Technical Report ISI-TR-2015-698, USC/Information Sciences Institute, May, 2015.
@TechReport{Ardi15a,
	author          = "Calvin Ardi and John Heidemann",
	title           = "Poster: Lightweight Content-based Phishing Detection",
	institution     = "USC/Information Sciences Institute",
	year            = 2015,
	sortdate        = "2015-05-01",
	number          = "ISI-TR-2015-698",
	month           = may,
	location        = "johnh: pafile",
	keywords        = "hashing, content reuse, wikipedia, copying, phishing",
	url             = "https://ant.isi.edu/%7ejohnh/PAPERS/Ardi15a.html",
	pdfurl          = "https://ant.isi.edu/%7ejohnh/PAPERS/Ardi15a.pdf",
	otherurl        = "http://www.isi.edu/publications/trpublic/files/tr-698.pdf",
	myorganization  = "USC/Information Sciences Institute",
	copyrightholder = "authors",
	project         = "ant, retrofuture, mega",
	abstract        = "
Increasing use of Internet banking and shopping by a broad spectrum of users
results in greater potential profits from phishing attacks via websites that
masquerade as legitimate sites to trick users into sharing passwords or
financial information. Most browsers today detect potential phishing with URL
blacklists; while effective at stopping previously known threats, blacklists
must react to new threats as they are discovered, leaving users vulnerable for
a period of time. Alternatively, whitelists can be used to identify
``known-good'' websites so that off-list sites (including possible phish) can
never be accessed, but are too limited for many users. Our goal is proactive
detection of phishing websites with neither the delay of blacklist
identification nor the strict constraints of whitelists. Our approach is to
list known phishing targets, index the content at their correct sites, and then
look for this content to appear at incorrect sites. Our insight is that
cryptographic hashing of page contents allows for efficient bulk identification
of content reuse at phishing sites. Our contribution is a system to detect
phish by comparing hashes of visited websites to the hashes of the original,
known-good, legitimate website. We implement our approach as a browser
extension in Google Chrome and show that our algorithms detect a majority of
phish, even with minimal countermeasures to page obfuscation. A small number of
alpha users have been using the extension without issues for several weeks, and
we will be releasing our extension and source code upon publication.",
}
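The abstract's core idea (index hashes of content at legitimate sites, then look for those hashes at other sites) can be illustrated with a small sketch. This is not the authors' extension or algorithm: the per-line chunking, whitespace normalization, SHA-256 as the cryptographic hash, the 0.5 match threshold, and the names `chunk_hashes`, `build_index`, and `check_page` are all hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter


def chunk_hashes(page_text):
    """Hash each non-empty, whitespace-normalized line of a page.

    Assumption: a "chunk" is one line of page source; the report may
    chunk content differently.
    """
    hashes = set()
    for line in page_text.splitlines():
        normalized = " ".join(line.split())
        if normalized:
            hashes.add(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    return hashes


def build_index(legit_pages):
    """Map each chunk hash to the set of legitimate domains it appears on."""
    index = {}
    for domain, text in legit_pages.items():
        for h in chunk_hashes(text):
            index.setdefault(h, set()).add(domain)
    return index


def check_page(domain, page_text, index, threshold=0.5):
    """Return the legitimate domain this page appears to copy, or None.

    A page is flagged when at least `threshold` (hypothetical value) of its
    chunk hashes match a known-good site other than the one being visited.
    """
    hashes = chunk_hashes(page_text)
    if not hashes:
        return None
    votes = Counter()
    for h in hashes:
        for target in index.get(h, ()):
            votes[target] += 1
    for target, count in votes.most_common():
        if target != domain and count / len(hashes) >= threshold:
            return target
    return None


if __name__ == "__main__":
    legit = {
        "bank.example": "<h1>Example Bank</h1>\n<form>Username</form>\n<p>Forgot password?</p>"
    }
    index = build_index(legit)
    phish = legit["bank.example"] + "\n<p>Verify your account now</p>"
    print(check_page("bank-login.example", phish, index))
    print(check_page("bank.example", legit["bank.example"], index))
```

In this toy run, the copied page shares three of its four chunk hashes with `bank.example`, so it is flagged, while the legitimate site itself is not. Exact hashing makes lookups cheap, but any edit to a chunk changes its hash, which is why real pages that obfuscate their content would need additional countermeasures.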