Modified Cheeger and Ratio Cut Methods Using the Ginzburg-Landau Functional for Classification of High-Dimensional Data.
Merkurjev, E.; Bertozzi, A.; Yan, X.; and Lerman, K.
to appear in Inverse Problems. 2016.
link
bibtex
abstract
@ARTICLE{Merkurjev2016ip,
author = {Ekaterina Merkurjev and Andrea Bertozzi and Xiaoran Yan and Kristina Lerman},
title = {Modified Cheeger and Ratio Cut Methods Using the Ginzburg-Landau Functional for Classification of High-Dimensional Data},
journal = {to appear in Inverse Problems},
year = {2016},
volume = {},
number = {},
pages = {},
abstract={Recent advances in clustering have included continuous relaxations of the Cheeger cut problem and those which address its linear approximation using the graph Laplacian. In this paper, we show how to use the graph Laplacian to solve the fully nonlinear Cheeger cut problem, as well as the ratio cut optimization task. Both problems are connected to total variation minimization, and the related Ginzburg-Landau functional is used in the derivation of the methods. The graph framework discussed in this paper is undirected. The resulting algorithms are efficient ways to cluster the data into two classes, and they can be easily extended to the case of multiple classes, or used on a multiclass data set via recursive bipartitioning. In addition to showing results on benchmark data sets, we also show an application of the algorithm to hyperspectral video data.},
keywords = {social-networks},
}
Recent advances in clustering have included continuous relaxations of the Cheeger cut problem and those which address its linear approximation using the graph Laplacian. In this paper, we show how to use the graph Laplacian to solve the fully nonlinear Cheeger cut problem, as well as the ratio cut optimization task. Both problems are connected to total variation minimization, and the related Ginzburg-Landau functional is used in the derivation of the methods. The graph framework discussed in this paper is undirected. The resulting algorithms are efficient ways to cluster the data into two classes, and they can be easily extended to the case of multiple classes, or used on a multiclass data set via recursive bipartitioning. In addition to showing results on benchmark data sets, we also show an application of the algorithm to hyperspectral video data.
Leveraging the Contributions of the Casual Majority to Identify AppealingWeb Content.
Hogg, T.; and Lerman, K.
In
Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016), November 2016. AAAI
Paper
link
bibtex
abstract
@INPROCEEDINGS{Hogg2016hcomp,
author = {Tad Hogg and Kristina Lerman},
title = {Leveraging the Contributions of the Casual Majority to Identify AppealingWeb
Content},
booktitle = {Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016)},
year = {2016},
month = {November},
publisher = {AAAI},
keywords = {crowdsourcing},
abstract={Users of peer production web sites differ greatly in their activity levels. A small minority are engaged contributors, while the vast majority are only casual surfers. The casual users devote little effort to evaluating the site�s content and many of them visit the site only once. This churn poses a challenge for sites attempting to gauge user interest in their content. The challenge is especially severe for sites focusing on content with subjective quality, including movies, music, restaurants and items in other cultural markets. A key question is whether content evaluation should use opinions of all users or only the minority who devote significant effort to reviewing content? Using Amazon Mechanical Turk, we experimentally address this question by comparing outcomes for these two approaches. We find that the larger numbers of less informed users more than offset their noisy signals on content quality to provide rapid evaluation. However, such users are systematically biased, and the speed of their assessments comes at the expense of limited collective accuracy.},
url={http://www.isi.edu/integration/people/lerman/papers/hogg-hcomp.pdf}
}
Users of peer production web sites differ greatly in their activity levels. A small minority are engaged contributors, while the vast majority are only casual surfers. The casual users devote little effort to evaluating the site�s content and many of them visit the site only once. This churn poses a challenge for sites attempting to gauge user interest in their content. The challenge is especially severe for sites focusing on content with subjective quality, including movies, music, restaurants and items in other cultural markets. A key question is whether content evaluation should use opinions of all users or only the minority who devote significant effort to reviewing content? Using Amazon Mechanical Turk, we experimentally address this question by comparing outcomes for these two approaches. We find that the larger numbers of less informed users more than offset their noisy signals on content quality to provide rapid evaluation. However, such users are systematically biased, and the speed of their assessments comes at the expense of limited collective accuracy.
Capturing the interplay of dynamics and networks through parameterizations of Laplacian operators.
Yan, X.; Teng, S.; Ghosh, R.; and Lerman, K.
PeerJ Computer Science, 2: e57+. 2016.
Paper
doi
link
bibtex
abstract
@article{Yan2016peerj,
abstract = {We study the interplay between a dynamical process and the structure of the network on which it unfolds using the parameterized Laplacian framework. This framework allows for defining and characterizing an ensemble of dynamical processes on a network beyond what the traditional Laplacian is capable of modeling. This, in turn, allows for studying the impact of the interaction between dynamics and network topology on the quality-measure of network clusters and centrality, in order to effectively identify important vertices and communities in the network. Specifically, for each dynamical process in this framework, we define a centrality measure that captures a vertex's participation in the dynamical process on a given network and also define a function that measures the quality of every subset of vertices as a potential cluster (or community) with respect to this process. We show that the subset-quality function generalizes the traditional conductance measure for graph partitioning. We partially justify our choice of the quality function by showing that the classic Cheeger's inequality, which relates the conductance of the best cluster in a network with a spectral quantity of its Laplacian matrix, can be extended to the parameterized Laplacian. The parameterized Laplacian framework brings under the same umbrella a surprising variety of dynamical processes and allows us to systematically compare the different perspectives they create on network structure.},
author = {Yan, Xiaoran and Teng, Shanghua and Ghosh, Rumi and Lerman, Kristina},
doi = {https://doi.org/10.7717/peerj-cs.57},
journal = {PeerJ Computer Science},
pages = {e57+},
title = {Capturing the interplay of dynamics and networks through parameterizations of Laplacian operators.},
url = {https://peerj.com/articles/cs-57/},
volume = {2},
keywords={social-networks, social-dynamics},
year = {2016}
}
We study the interplay between a dynamical process and the structure of the network on which it unfolds using the parameterized Laplacian framework. This framework allows for defining and characterizing an ensemble of dynamical processes on a network beyond what the traditional Laplacian is capable of modeling. This, in turn, allows for studying the impact of the interaction between dynamics and network topology on the quality-measure of network clusters and centrality, in order to effectively identify important vertices and communities in the network. Specifically, for each dynamical process in this framework, we define a centrality measure that captures a vertex's participation in the dynamical process on a given network and also define a function that measures the quality of every subset of vertices as a potential cluster (or community) with respect to this process. We show that the subset-quality function generalizes the traditional conductance measure for graph partitioning. We partially justify our choice of the quality function by showing that the classic Cheeger's inequality, which relates the conductance of the best cluster in a network with a spectral quantity of its Laplacian matrix, can be extended to the parameterized Laplacian. The parameterized Laplacian framework brings under the same umbrella a surprising variety of dynamical processes and allows us to systematically compare the different perspectives they create on network structure.
How the structure of Wikipedia articles influences user navigation.
Lamprecht, D.; Lerman, K.; Helic, D.; and Strohmaier, M.
New Review of Hypermedia and Multimedia,1–22. May 2016.
Paper
doi
link
bibtex
abstract
@article{Lamprecht2016,
abstract = {In this work we study how people navigate the information network of Wikipedia and investigate (i) free-form navigation by studying all clicks within the English Wikipedia over an entire month and (ii) goal-directed Wikipedia navigation by analyzing wikigames, where users are challenged to retrieve articles by following links. To study how the organization of Wikipedia articles in terms of layout and links affects navigation behavior, we first investigate the characteristics of the structural organization and of hyperlinks in Wikipedia and then evaluate link selection models based on article structure and other potential influences in navigation, such as the generality of an article's topic. In free-form Wikipedia navigation, covering all Wikipedia usage scenarios, we find that click choices can be best modeled by a bias towards article structure, such as a tendency to click links located in the lead section. For the goal-directed navigation of wikigames, our findings confirm the zoom-out and the homing-in phases identified by previous work, where users are guided by generality at first and textual similarity to the target later. However, our interpretation of the link selection models accentuates that article structure is the best explanation for the navigation paths in all except these initial and final stages. Overall, we find evidence that users more frequently click on links that are located close to the top of an article. The structure of Wikipedia articles, which places links to more general concepts near the top, supports navigation by allowing users to quickly find the better-connected articles that facilitate navigation. Our results highlight the importance of article structure and link position in Wikipedia navigation and suggest that better organization of information can help make information networks more navigable.},
author = {Lamprecht, Daniel and Lerman, Kristina and Helic, Denis and Strohmaier, Markus},
doi = {10.1080/13614568.2016.1179798},
journal = {New Review of Hypermedia and Multimedia},
keywords = {cognitive-limits},
month = may,
pages = {1--22},
publisher = {Taylor \& Francis},
title = {How the structure of Wikipedia articles influences user navigation},
url = {http://dx.doi.org/10.1080/13614568.2016.1179798},
year = {2016}
}
In this work we study how people navigate the information network of Wikipedia and investigate (i) free-form navigation by studying all clicks within the English Wikipedia over an entire month and (ii) goal-directed Wikipedia navigation by analyzing wikigames, where users are challenged to retrieve articles by following links. To study how the organization of Wikipedia articles in terms of layout and links affects navigation behavior, we first investigate the characteristics of the structural organization and of hyperlinks in Wikipedia and then evaluate link selection models based on article structure and other potential influences in navigation, such as the generality of an article's topic. In free-form Wikipedia navigation, covering all Wikipedia usage scenarios, we find that click choices can be best modeled by a bias towards article structure, such as a tendency to click links located in the lead section. For the goal-directed navigation of wikigames, our findings confirm the zoom-out and the homing-in phases identified by previous work, where users are guided by generality at first and textual similarity to the target later. However, our interpretation of the link selection models accentuates that article structure is the best explanation for the navigation paths in all except these initial and final stages. Overall, we find evidence that users more frequently click on links that are located close to the top of an article. The structure of Wikipedia articles, which places links to more general concepts near the top, supports navigation by allowing users to quickly find the better-connected articles that facilitate navigation. Our results highlight the importance of article structure and link position in Wikipedia navigation and suggest that better organization of information can help make information networks more navigable.
Assessing the Navigational Effects of Click Biases and Link Insertion on the Web.
Geigl, F.; Lerman, K.; Walk, S.; Strohmaier, M.; and Helic, D.
In
Hypertext Conference, 2016.
Paper
link
bibtex
abstract
@inproceedings{Geigl16hypertext,
abstract = {Websites have an inherent interest in steering user navigation in order to,
for example, increase sales of specific products or categories, or to guide
users towards specific information. In general, website administrators can use
the following two strategies to influence their visitors' navigation behavior.
First, they can introduce click biases to reinforce specific links on their
website by changing their visual appearance, for example, by locating them on
the top of the page. Second, they can utilize link insertion to generate new
paths for users to navigate over. In this paper, we present a novel approach
for measuring the potential effects of these two strategies on user navigation.
Our results suggest that, depending on the pages for which we want to increase
user visits, optimal link modification strategies vary. Moreover, simple
topological measures can be used as proxies for assessing the impact of the
intended changes on the navigation of users, even before these changes are
implemented.},
booktitle = {Hypertext Conference},
author = {Geigl, Florian and Lerman, Kristina and Walk, Simon and Strohmaier, Markus and Helic, Denis},
title = {Assessing the Navigational Effects of Click Biases and Link Insertion on the Web},
url = {http://arxiv.org/abs/1603.06200},
year = {2016}
}
Websites have an inherent interest in steering user navigation in order to, for example, increase sales of specific products or categories, or to guide users towards specific information. In general, website administrators can use the following two strategies to influence their visitors' navigation behavior. First, they can introduce click biases to reinforce specific links on their website by changing their visual appearance, for example, by locating them on the top of the page. Second, they can utilize link insertion to generate new paths for users to navigate over. In this paper, we present a novel approach for measuring the potential effects of these two strategies on user navigation. Our results suggest that, depending on the pages for which we want to increase user visits, optimal link modification strategies vary. Moreover, simple topological measures can be used as proxies for assessing the impact of the intended changes on the navigation of users, even before these changes are implemented.
Emotions, Demographics and Sociability in Twitter Interactions.
Lerman, K.; Arora, M.; Gallegos, L.; Kumaraguru, P.; and Garcia, D.
In
International Conference on the Web and Social Media, 2016.
Paper
link
bibtex
abstract
@INPROCEEDINGS{Lerman2016icwsm,
author = {Kristina Lerman and Megha Arora and Luciano Gallegos and Ponnurangam Kumaraguru and David Garcia},
title = {Emotions, Demographics and Sociability in Twitter Interactions},
booktitle = {International Conference on the Web and Social Media},
year = {2016},
keywords = {social-psychology},
abstract = {The social connections people form online affect the quality of information they receive and their online experience. Although a host of socioeconomic and cognitive factors were implicated in the formation of offline social ties, few of them have been empirically validated, particularly in an online setting. In this study, we analyze a large corpus of geo-referenced messages, or tweets, posted by social media users from a major US metropolitan area. We linked these tweets to US Census data through their locations. This allowed us to measure emotions expressed in the tweets posted from an area, the structure of social connections, and also use that area's socioeconomic characteristics in analysis. %We extracted the structure of online social interactions from the people mentioned in tweets from that area.
We find that at an aggregate level, places where social media users engage more deeply with less diverse social contacts are those where they express more negative emotions, like sadness and anger. Demographics also has an impact: these places have residents with lower household income and education levels. Conversely, places where people engage less frequently but with diverse contacts have happier, more positive messages posted from them and also have better educated, younger, more affluent residents. Results suggest that cognitive factors and offline characteristics affect the quality of online interactions. Our work highlights the value of linking social media data to traditional data sources, such as US Census, to drive novel analysis of online behavior.
},
url={http://arxiv.org/abs/1510.07090},
}
The social connections people form online affect the quality of information they receive and their online experience. Although a host of socioeconomic and cognitive factors were implicated in the formation of offline social ties, few of them have been empirically validated, particularly in an online setting. In this study, we analyze a large corpus of geo-referenced messages, or tweets, posted by social media users from a major US metropolitan area. We linked these tweets to US Census data through their locations. This allowed us to measure emotions expressed in the tweets posted from an area, the structure of social connections, and also use that area's socioeconomic characteristics in analysis. %We extracted the structure of online social interactions from the people mentioned in tweets from that area. We find that at an aggregate level, places where social media users engage more deeply with less diverse social contacts are those where they express more negative emotions, like sadness and anger. Demographics also has an impact: these places have residents with lower household income and education levels. Conversely, places where people engage less frequently but with diverse contacts have happier, more positive messages posted from them and also have better educated, younger, more affluent residents. Results suggest that cognitive factors and offline characteristics affect the quality of online interactions. Our work highlights the value of linking social media data to traditional data sources, such as US Census, to drive novel analysis of online behavior.
The "Majority Illusion" in Social Networks.
Lerman, K.; Yan, X.; and Wu, X.
PLoS ONE, 11(2): 1-13. 02 2016.
Paper
link
bibtex
abstract
2 downloads
@article{Lerman2016majority,
author = {Kristina Lerman and Xiaoran Yan and Xin-Zeng Wu},
journal = {PLoS ONE},
publisher = {Public Library of Science},
title = {The "Majority Illusion" in Social Networks},
year = {2016},
month = {02},
volume = {11},
keywords = {social-networks},
url = {http://dx.doi.org/10.1371%2Fjournal.pone.0147617},
pages = {1-13},
abstract = {Individual's decisions, from what product to buy to whether to engage in risky behavior, often depend on the choices, behaviors, or states of other people. People, however, rarely have global knowledge of the states of others, but must estimate them from the local observations of their social contacts. Network structure can significantly distort individual?s local observations. Under some conditions, a state that is globally rare in a network may be dramatically over-represented in the local neighborhoods of many individuals. This effect, which we call the ?majority illusion,? leads individuals to systematically overestimate the prevalence of that state, which may accelerate the spread of social contagions. We develop a statistical model that quantifies this effect and validate it with measurements in synthetic and real-world networks. We show that the illusion is exacerbated in networks with a heterogeneous degree distribution and disassortative structure.</p>},
number = {2},
}
Individual's decisions, from what product to buy to whether to engage in risky behavior, often depend on the choices, behaviors, or states of other people. People, however, rarely have global knowledge of the states of others, but must estimate them from the local observations of their social contacts. Network structure can significantly distort individual?s local observations. Under some conditions, a state that is globally rare in a network may be dramatically over-represented in the local neighborhoods of many individuals. This effect, which we call the ?majority illusion,? leads individuals to systematically overestimate the prevalence of that state, which may accelerate the spread of social contagions. We develop a statistical model that quantifies this effect and validate it with measurements in synthetic and real-world networks. We show that the illusion is exacerbated in networks with a heterogeneous degree distribution and disassortative structure.
Geography of Emotion: Where in a City are People Happier?.
Gallegos, L.; Lerman, K.; Huang, A.; and Garcia, D.
In
WWW workshop on MSM, 2016.
Paper
link
bibtex
abstract
1 download
@INPROCEEDINGS{Gallegos2016msm,
author = {Luciano Gallegos and Kristina Lerman and Arthur Huang and David Garcia},
title = {Geography of Emotion: Where in a City are People Happier?},
booktitle = {WWW workshop on MSM},
year = {2016},
keywords = {social-dynamics},
abstract = {During the last years, researchers explored the geographic and environmental factors that affect happiness. More recently,
location-sharing services provided by the social media has given an unprecedented access to geo-located data
for studying the interplay between these factors on a much bigger scale. Do location-sharing services help in turn at
distinguishing emotions in places within a city? Which aspects contribute better at understanding happier places?
To answer these questions, we use data from Foursquare location-sharing service to identify areas within a major US
metropolitan area with many check-ins, i.e., areas that people like to use. We then use data from the Twitter microblogging
platform to analyze the properties of these areas. Specifically, we have extracted a large corpus of geotagged
messages, called tweets, from a major metropolitan area and linked them US Census data through their locations.
This allows us to measure the sentiment expressed in tweets that are posted from a specific area, and also use that
area's demographic properties in analysis. Our results reveal that areas with many check-ins are diffierent from other
areas within the metropolitan region. In particular, these areas have happier tweets, which also encourage people living in it or from other areas to commute longer distances
to these places. These findings shed light on the influence certain places play within a city regarding people's emotions
and mobility, which in turn can be used for city planners for designing happier and more equitable cities.},
url={http://arxiv.org/abs/1507.07632},
}
During the last years, researchers explored the geographic and environmental factors that affect happiness. More recently, location-sharing services provided by the social media has given an unprecedented access to geo-located data for studying the interplay between these factors on a much bigger scale. Do location-sharing services help in turn at distinguishing emotions in places within a city? Which aspects contribute better at understanding happier places? To answer these questions, we use data from Foursquare location-sharing service to identify areas within a major US metropolitan area with many check-ins, i.e., areas that people like to use. We then use data from the Twitter microblogging platform to analyze the properties of these areas. Specifically, we have extracted a large corpus of geotagged messages, called tweets, from a major metropolitan area and linked them US Census data through their locations. This allows us to measure the sentiment expressed in tweets that are posted from a specific area, and also use that area's demographic properties in analysis. Our results reveal that areas with many check-ins are diffierent from other areas within the metropolitan region. In particular, these areas have happier tweets, which also encourage people living in it or from other areas to commute longer distances to these places. These findings shed light on the influence certain places play within a city regarding people's emotions and mobility, which in turn can be used for city planners for designing happier and more equitable cities.
Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior.
Kooti, F.; Lerman, K.; Aiello, L. M.; Grbovic, M.; Djuric, N.; and Radosavljevic, V.
In
The 9th ACM International Conference on Web Search and Data Mining, 2016.
Paper
link
bibtex
abstract
@INPROCEEDINGS{Kooti16wsdm,
author = {Farshad Kooti and Kristina Lerman and Luca Maria Aiello and Mihajlo Grbovic and Nemanja Djuric and Vladan Radosavljevic},
title = {Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior},
booktitle = {The 9th ACM International Conference on Web Search and Data Mining},
year = {2016},
keywords = {social-dynamics},
abstract = {Consumer spending accounts for a large fraction of US economic
activity. Increasingly, consumer activity is moving online, where
digital traces of shopping and purchases provide valuable data about
consumer behavior. We analyze these data extracted from emails
and combine them with demographic information to characterize,
model, and predict consumer behavior. Breaking purchasing down
by age and gender, we find that the amount of money spent on online
purchases grows sharply with age, peaking in late 30s. Men
are more frequent online purchasers and spend more money compared
to women. Linking online shopping to income, we find that
shoppers from more affluent areas purchase somewhat more expensive
items and buy them more frequently, resulting in significantly
more money spent on online purchases. We also look at dynamics
of purchasing behavior. Similar to other online activities, we
observe daily and weekly cycles in purchasing behavior. More interestingly,
we observe temporal patterns in individual purchasing
behavior suggesting shoppers have finite budgets: the more expensive
an item, the longer the shopper waits since the last purchase
to buy it. We also observe that shoppers who email each other
purchase more similar items than socially unconnected shoppers,
and this effect is particularly evident among women. Finally, we
build a model to predict when shoppers will make a purchase and
how much they will spend on it. We find that temporal features
improve prediction accuracy over competitive baselines. A better
understanding of consumer behavior can help improve marketing
efforts and make online shopping more pleasant and efficient.},
url={http://arxiv.org/abs/1512.04912},
}
Consumer spending accounts for a large fraction of US economic activity. Increasingly, consumer activity is moving online, where digital traces of shopping and purchases provide valuable data about consumer behavior. We analyze these data extracted from emails and combine them with demographic information to characterize, model, and predict consumer behavior. Breaking purchasing down by age and gender, we find that the amount of money spent on online purchases grows sharply with age, peaking in late 30s. Men are more frequent online purchasers and spend more money compared to women. Linking online shopping to income, we find that shoppers from more affluent areas purchase somewhat more expensive items and buy them more frequently, resulting in significantly more money spent on online purchases. We also look at dynamics of purchasing behavior. Similar to other online activities, we observe daily and weekly cycles in purchasing behavior. More interestingly, we observe temporal patterns in individual purchasing behavior suggesting shoppers have finite budgets: the more expensive an item, the longer the shopper waits since the last purchase to buy it. We also observe that shoppers who email each other purchase more similar items than socially unconnected shoppers, and this effect is particularly evident among women. Finally, we build a model to predict when shoppers will make a purchase and how much they will spend on it. We find that temporal features improve prediction accuracy over competitive baselines. A better understanding of consumer behavior can help improve marketing efforts and make online shopping more pleasant and efficient.