Exposing the Invisible Web: An Analysis of Third-Party HTTP Requests on 1 Million Websites. Libert, T. International Journal of Communication, 9(0):18, October, 2015.
This article provides a quantitative analysis of privacy-compromising mechanisms on 1 million popular websites. Findings indicate that nearly 9 in 10 websites leak user data to parties of which the user is likely unaware; more than 6 in 10 websites spawn third-party cookies; and more than 8 in 10 websites load JavaScript code from external parties onto users’ computers. Sites that leak user data contact an average of nine external domains, indicating that users may be tracked by multiple entities in tandem. By tracing the unintended disclosure of personal browsing histories on the Web, it is revealed that a handful of U.S. companies receive the vast bulk of user data. Finally, roughly 1 in 5 websites are potentially vulnerable to known National Security Agency spying techniques at the time of analysis.
@article{libert_exposing_2015,
	title = {Exposing the {Invisible} {Web}: {An} {Analysis} of {Third}-{Party} {HTTP} {Requests} on 1 {Million} {Websites}},
	volume = {9},
	copyright = {Authors who publish in the International Journal of Communication release their articles under the Creative Commons Attribution Non-Commercial No Derivatives (by-nc-nd) license.},
	issn = {1932-8036},
	shorttitle = {Exposing the {Invisible} {Web}},
	url = {https://ijoc.org/index.php/ijoc/article/view/3646},
	abstract = {This article provides a quantitative analysis of privacy-compromising mechanisms on 1 million popular websites. Findings indicate that nearly 9 in 10 websites leak user data to parties of which the user is likely unaware; more than 6 in 10 websites spawn third-party cookies; and more than 8 in 10 websites load JavaScript code from external parties onto users’ computers. Sites that leak user data contact an average of nine external domains, indicating that users may be tracked by multiple entities in tandem. By tracing the unintended disclosure of personal browsing histories on the Web, it is revealed that a handful of U.S. companies receive the vast bulk of user data. Finally, roughly 1 in 5 websites are potentially vulnerable to known National Security Agency spying techniques at the time of analysis.},
	language = {en},
	number = {0},
	urldate = {2020-02-04},
	journal = {International Journal of Communication},
	author = {Libert, Timothy},
	month = oct,
	year = {2015},
	keywords = {Do Not Track, Internet policy, advertising, behavioral tracking, hidden Web},
	pages = {18}
}
