| Hours | Shaughnessy I | Shaughnessy II | Pinnacle I | Pinnacle II | Pinnacle III |
|---|---|---|---|---|---|
| 07:30 | Registration (in the foyer outside of the Pinnacle Ballroom) | | | | |
| 08:30-10:30 | READNet Workshop Session I<br>Session Chair: Michele Coscia | SI Workshop Session<br>Session Chair: Jalal Kawash | | | Tutorial I, Part I (08:30-10:30) |
| 10:30-11:00 | | | | | |
| 11:00-12:30 | READNet Workshop Session II<br>Session Chair: Andrea De Salve | SNAST Workshop Session I<br>Session Chair: Thirimachos Bourlai | SNAA Workshop Session I<br>Session Chair: Piotr Bródka | Demo Track Session<br>Session Chair: Tansel Ozyer | Tutorial I, Part II (11:00-11:30)<br>Tutorial II, Part I (11:30-12:30) |
| 12:30-14:00 | | | | | |
| 14:00-16:00 | MSNDS Workshop Session I<br>Session Chair: Min-Yuh Day | SNAST Workshop Session II<br>Session Chair: Panagiotis Karampelas | SNAA Workshop Session II<br>Session Chair: Piotr Bródka | Posters Madness Session<br>Session Chair: Mohammad Tayebi | Tutorial II, Part II (14:00-15:30)<br>Tutorial III, Part I (15:30-16:00) |
| 16:00-16:30 | | | | | |
| 16:30-18:30 | MSNDS Workshop Session II<br>Session Chair: Shih-Hung Wu | PhD Track Session<br>Session Chair: Tansel Ozyer | | | Tutorial III, Part II (16:30-18:30) |
| 19:30-22:00 | | | | | |
| Time | Tutorial Title | Instructor |
|---|---|---|
| 8:30-10:30 and 11:00-11:30 | Tutorial I: Knowledge Discovery in the social network era: An overview of (big) data analytics approaches for social networks | Elio Masciari, University Federico II of Naples, Italy |
| 11:30-12:30 and 14:00-15:30 | Tutorial II: Introduction to Social Network Analysis with NodeXL | Marc A. Smith, Social Media Research Foundation, Redwood City, CA |
| 15:30-16:00 and 16:30-18:30 | Tutorial III: Using RAPIDS for Accelerated Social Network Analysis | Bradley Rees and Corey Nolet, RAPIDS Senior Researchers, NVIDIA, San Jose, CA |
| Hours | Pinnacle I | Pinnacle II | Pinnacle III | Shaughnessy I | Shaughnessy II |
|---|---|---|---|---|---|
| 07:30 | Registration (in the foyer outside of the Pinnacle Ballroom) | | | | |
| 09:00-09:30 | Opening Session for ASONAM (Pinnacle Ballroom) | | | | |
| 09:30-10:30 | Keynote I: Graphons and Machine Learning: Modeling and Estimation of Sparse Networks at Scale<br>Jennifer Tour Chayes, Microsoft Research, Massachusetts, USA<br>Chair: Wei Chen (Pinnacle Ballroom) | | | | |
| 10:30-11:00 | | | | | |
| 11:00-12:30 | 1A - Communities I<br>Session Chair: Michele Coscia | 1B - Misbehavior & Misinformation I<br>Session Chair: Rahul Pandey | 1C - Network Embedding<br>Session Chair: Tyler Derr | Industrial Track Session I<br>Session Chair: Neil Shah | FOSINT-SI Session I<br>Session Chair: Mohammad Tayebi |
| 12:30-14:00 | | | | | |
| 14:00-16:00 | 2A - Information & Influence Diffusion I<br>Session Chair: Elio Masciari | 2B - Misbehavior & Misinformation II<br>Session Chair: Roy Ka-Wei Lee | 2C - Network Analysis with Machine Learning I<br>Session Chair: Ashok Srinivasan | Industrial Track Session II<br>Session Chair: Konstantinos Xylogiannopoulos | FOSINT-SI Session II<br>Session Chair: Uwe Glässer |
| 16:00-16:30 | | | | | |
| 16:30-18:00 | Panel (Pinnacle Ballroom) | | | | |
| Hours | Pinnacle I | Pinnacle II | Pinnacle III | Shaughnessy I | Shaughnessy II |
|---|---|---|---|---|---|
| 07:30 | Registration (in the foyer outside of the Pinnacle Ballroom) | | | | |
| 09:30-10:30 | Keynote II: Friendship Paradox and Information Bias in Networks<br>Kristina Lerman, University of Southern California, USA<br>Chair: Francesca Spezzano (Pinnacle Ballroom) | | | | |
| 10:30-11:00 | | | | | |
| 11:00-12:30 | 3A - Communities II<br>Session Chair: Xiaokui Xiao | 3B - Social Media Analysis<br>Session Chair: Sumeet Kumar | 3C - Network Analysis with Machine Learning II<br>Session Chair: Rahul Pandey | FAB Session I<br>Session Chair: Jon Rokne | FOSINT-SI Session III<br>Session Chair: Francesca Spezzano |
| 12:30-14:00 | | | | | |
| 14:00-16:00 | 4A - Information & Influence Diffusion II<br>Session Chair: Andrea Tagarelli | 4B - Elections and Politics<br>Session Chair: Dimitris Spiliotopoulos | 4C - Network Analysis I<br>Session Chair: Vachik Dave | FAB Session II<br>Session Chair: Panagiotis Karampelas | FOSINT-SI Session IV<br>Session Chair: Andrew Park |
| 16:00-16:30 | | | | | |
| 16:30-18:00 | 5A - Recommendations<br>Session Chair: Chang-Tien Lu | 5B - Applications I<br>Session Chair: Ashok Srinivasan | 5C - Behavioral Modeling<br>Session Chair: Ninareh Mehrabi | FAB Session III<br>Session Chair: Ismail Toroslu | Multidisciplinary Session I<br>Session Chair: Jalal Kawash |
| 19:30-22:00 | Conference Dinner (Pinnacle Ballroom) | | | | |
| Hours | Pinnacle I | Pinnacle II | Pinnacle III | Shaughnessy I | Shaughnessy II |
|---|---|---|---|---|---|
| 07:30 | Registration (in the foyer outside of the Pinnacle Ballroom) | | | | |
| 09:30-10:30 | Keynote III: Graph Neural Networks and Applications<br>Jie Tang, Tsinghua University, China<br>Chair: Xiaokui Xiao (Pinnacle Ballroom) | | | | |
| 10:30-11:00 | | | | | |
| 11:00-12:30 | 6A - Network Algorithms<br>Session Chair: Tyler Derr | 6B - Network Modeling<br>Session Chair: Vanessa Cedeno-Mieles | 6C - Misinformation & Online Content<br>Session Chair: Francesca Spezzano | HI-BI-BI Session I<br>Session Chair: Peter Peng | Multidisciplinary Session II<br>Session Chair: Panagiotis Karampelas |
| 12:30-14:00 | Lunch (Pinnacle Ballroom) | | | | |
| 14:00-16:00 | 7A - Modeling & Algorithms<br>Session Chair: Peng Ni | 7B - Applications II<br>Session Chair: Daniel Zhang | 7C - Network Analysis II<br>Session Chair: Wei Chen | HI-BI-BI Session II<br>Session Chair: Tansel Ozyer | Multidisciplinary Session III<br>Session Chair: Min-Yuh Day |
| 16:10-16:30 | Farewell (Pinnacle Ballroom) | | | | |
Panel: Social Influence in Our Society (16:30-18:00, Pinnacle Ballroom)
Moderator: |
PhD Track Session (16:30-18:30)
Johanna M. Werz, Valerie Varney and Ingrid Isenhardt |
The curse of self-presentation: Looking for career patterns in online CVs |
16:30-16:55 |
Henry Dambanemuya and Ágnes Horvát |
Network-Aware Multi-Agent Simulations of Herder-Farmer Conflicts |
16:55-17:20 |
Pallavi Jain, Robert Ross and Bianca Schoen Phelan |
Estimating Distributed Representation Performance in Disaster-Related Social Media Classification |
17:20-17:45 |
Renny Márquez, Richard Weber and André C.P.L.F. de Carvalho |
17:45-18:10 |
|
Christian Zingg, Giona Casiraghi, Giacomo Vaccario and Frank Schweitzer |
18:10-18:35 |
Industrial Track Session I (11:00-12:30)
Ankit Kumar Saw |
11:00-11:25 |
|
John Piorkowski, Ian McCulloh |
Examining MOOC superposter behavior using social network analysis |
11:25-11:50 |
Alireza Pourali, Fattane Zarrinkalam, Ebrahim Bagheri |
Neural Embedding Features for Point-of-Interest Recommendation |
11:50-12:15 |
Industrial Track Session II (14:00-16:00)
Shreya Jain, Dipankar Niranjan, Hemank Lamba, Neil Shah, Ponnurangam Kumaraguru |
14:00-14:25 |
|
Shin-Ying Huang, Yen-Wen Huang, Ching-Hao Mao |
A multi-channel cybersecurity news and threat intelligent engine - SecBuzzer |
14:25-14:50 |
Yang Zhang, Xiangyu Dong, Daniel Zhang, Dong Wang |
A Syntax-based Learning Approach to Geo-locating Abnormal Traffic Events using Social Sensing |
14:50-15:15 |
Yingtong Dou and Philip Yu |
15:15-15:40 |
FAB Session I (11:00-12:30)
Carson Leung |
Full Paper |
|
Konstantinos Xylogiannopoulos, Panagiotis Karampelas and Reda Alhajj |
Full Paper |
|
Sirui Sun, Bin Wu, Zixing Zhang, Nianwen Ning and Bai Wang |
A Hierarchical Insurance Recommendation Framework Using GraphOLAM Approach |
Full Paper |
Ahmet Anıl Müngen, Emre Doğan and Mehmet Kaya |
Short Paper |
FAB Session II (14:00-16:00)
Emanuela Todeva, David Knoke and Donka Keskinova |
Multi-Stage Clustering with Complementary Structural Analysis of 2-Mode Networks |
Full Paper |
Tung Nguyen, Li Zhang and Aron Culotta |
Estimating Tie Strength in Follower Networks to Measure Brand Perceptions |
Full Paper |
Mahendra Piraveenan, Sheung Yat Law and Dharshana Kasthirirathne |
Full Paper |
|
Rich Takacs and Ian McCulloh |
Dormant Bots in Social Media: Twitter and the 2018 U.S. Senate Election |
Full Paper |
Konstantinos Xylogiannopoulos |
Exhaustive Exact String Matching: The Analysis of the Full Human Genome |
Full Paper |
FAB Session III (16:30-18:00)
Ahmet Engin Bayrak and Faruk Polat |
Full Paper |
|
Lihi Idan and Joan Feigenbaum |
Full Paper |
|
Esen Tutaysalgir, Pinar Karagoz and Ismail Toroslu |
Full Paper |
|
Sharon Grubner, Ian McCulloh and John Piorkowski |
Social Media as a Main Source of Customer Feedback – Alternative to Customer Satisfaction Surveys |
Short Paper |
FOSINT-SI Session I (11:00-12:30)
Zoheb Borbora, Arpita Chandra, Ponnurangam Kumaraguru and Jaideep Srivastava |
Regular Paper |
|
Tor Berglind, Lisa Kaati and Bjorn Pelzer |
Regular Paper |
|
Anu Shrestha and Francesca Spezzano |
Short Paper |
|
Justin Song, Valerie Spicer, Andrew Park, Herbert H. Tsang and Patricia L. Brantingham |
Short Paper |
FOSINT-SI Session II (14:00-16:00)
David Skillicorn, Queen's University, Canada |
FOSINT-SI Keynote (14:00-15:00) |
|
Vivin Paliath and Paulo Shakarian |
Best Paper Candidate |
|
Shao-Fang Wen, Mazaher Kianpour and Stewart Kowalski |
An Empirical Study of Security Culture in Open Source Software Communities |
Best Paper Candidate |
FOSINT-SI Session III (11:00-12:30)
Avishek Bose, Vahid Behzadan, Carlos Aguirre and William Hsu |
Regular Paper |
|
Aditya Pingle, Aritran Piplai, Sudip Mittal, Anupam Joshi, James Holt and Richard Zak |
Regular Paper |
|
Konstantinos Xylogiannopoulos, Panagiotis Karampelas, Reda Alhajj |
Text Mining for Malware Classification Using Multivariate All Repeated Patterns Detection |
Regular Paper |
FOSINT-SI Session IV (14:00-16:00)
Mohammed Almukaynizi, Malay Shah and Paulo Shakarian |
A Hybrid KRR-ML Approach to Predict Malicious Email Campaigns |
Short Paper |
Mohammed Rashed, John Piorkowski and Ian McCulloh |
Evaluation of Extremist Cohesion in a Darknet Forum Using ERGM and LDA |
Short Paper |
João Evangelista, Domingos Napolitano, Márcio Romero and Renato Sassi |
Short Paper |
|
Emily Alfs, Doina Caragea, Dewan Chaulagain, Sankardas Roy, Nathan Albin and Pietro Poggi-Corradini |
Short Paper |
|
Choukri Djellali, Mehdi Adda and Mohamed Tarik Moutacalli |
Short Paper |
READNet Workshop Session I (08:30-10:30)
Workshop Opening |
Chairs: Andrea De Salve, Michele Coscia |
8:30-8:45 |
Keynote Speaker: Michele Coscia |
8:45-9:40 |
|
Meisam Hejazinia, Pavlos Mitsoulis-Ntompos and Serena Zhang |
9:40-10:05 |
|
Mariella Bonomo, Gaspare Ciaccio, Andrea De Salve and Simona E. Rombo |
10:05-10:30 |
READNet Workshop Session II (11:00-12:30)
Dionisis Margaris, Dimitris Spiliotopoulos and Costas Vassilakis |
11:00-11:25 |
|
Carmela Comito |
11:25-11:50 |
|
Gianluca Lax and Antonia Russo |
11:50-12:15 |
|
|
Workshop Closure |
12:15 |
Demo Track Session (11:00-12:30)
Adewale Obadimu, Muhammad Nihal Hussain and Nitin Agarwal |
11:00-11:15 |
|
Mayank Kejriwal and Peilin Zhou |
11:15-11:30 |
|
Thomas Marcoux, Nitin Agarwal, Adewale Obadimu and Nihal Hussain |
11:30-11:45 |
|
Trang Ha, Quyen Hoang and Kyumin Lee |
11:45-12:00 |
|
Ying Zhao, Charles C. Zhou and Sihui Huang |
12:00-12:15 |
|
Zizhen Chen and David Matula |
12:15-12:30 |
Tutorial I (Part I. 08:30-10:30; Part II. 11:00-11:30)
Elio Masciari, University Federico II of Naples, Italy |
Knowledge Discovery in the social network era: An overview of (big) data analytics approaches for social networks |
Tutorial II (Part I. 11:30-12:30; Part II. 14:00-15:30)
Marc A. Smith, Social Media Research Foundation, Redwood City, CA |
Introduction to Social Network Analysis with NodeXL |
Tutorial III (Part I. 15:30-16:00; Part II. 16:30-18:30)
Bradley Rees and Corey Nolet, NVIDIA, San Jose, CA |
Using RAPIDS for Accelerated Social Network Analysis |
MSNDS Workshop Session I (14:00-16:00)
Patryk Pazura: West Pomeranian University of Technology, Szczecin; Jaroslaw Jankowski: West Pomeranian University of Technology |
14:00-14:25 |
|
Li Chen Cheng: National Taipei University of Technology; Song-Lin Tsai: Soochow University |
Deep Learning for Automated Sentiment Analysis of Social Media |
14:25-14:50 |
Takayasu Fushimi: Tokyo University of Technology; Kenichi Kanno: Tokyo University of Technology |
14:50-15:15 |
|
Shih-Hung Wu: Chaoyang University of Technology; Jun-Wei Wang: Chaoyang University of Technology |
Integrating Neural and Syntactic Features on the Helpfulness Analysis of the Online Customer Reviews |
15:15-15:40 |
MSNDS Workshop Session II (16:30-18:30)
K. (Lynn) Putman: LIACS, Leiden University; Hanjo D. Boekhout: LIACS, Leiden University; Frank W. Takes: LIACS, Leiden University |
Fast Incremental Computation of Harmonic Closeness Centrality in Directed Weighted Networks |
16:30-16:55 |
Min-Yuh Day: Tamkang University; Jian-Ting Lin: Tamkang University |
Artificial Intelligence for ETF Market Prediction and Portfolio Optimization |
16:55-17:20 |
Logan Praznik: Brandon University, Brandon, Canada; Gautam Srivastava: Brandon University, Brandon, Canada; Chetan Mendhe: Lakehead University, Thunder Bay, Canada; Vijay Mago: Lakehead University, Thunder Bay, Canada |
Vertex-Weighted Measures for Link Prediction in Hashtag Graphs |
17:20-17:45 |
Gui-Ru Li: National Central University; Chia-Hui Chang: National Central University |
Semantic Role Labeling for Opinion Target Extraction from Chinese Social Network |
17:45-18:10 |
SNAST Workshop Session I (11:00-12:30)
Opening Remarks |
Thirimachos Bourlai |
11:00 - 11:10 |
Lingwei Chen, Shifu Hou, Yanfang Ye, Thirimachos Bourlai, Shouhuai Xu and Liang Zhao |
iTrustSO: An Intelligent System for Automatic Detection of Insecure Code Snippets in Stack Overflow |
11:10 - 11:40 |
Dimitrios Lappas, Panagiotis Karampelas and George Fessakis |
The role of social media surveillance in search and rescue missions |
11:40 - 12:10 |
Sho Tsugawa and Sumaru Niida |
The Impact of Social Network Structure on the Growth and Survival of Online Communities |
12:10 - 12:40 |
SNAST Workshop Session II (14:00-16:00)
Jacob Rose and Thirimachos Bourlai |
Deep Learning Based Estimation of Facial Attributes on Challenging Mobile Phone Face Datasets |
14:00-14:30 |
Dimitris Spiliotopoulos, Costas Vassilakis and Dionisis Margaris |
Data-driven Country Safety Monitoring Terrorist Attack Prediction |
14:30-15:00 |
Kaustav Basu and Arun Sen |
On Augmented Identifying Codes for Monitoring Drug Trafficking Organizations |
15:00-15:30 |
Suha Reddy Mokalla and Thirimachos Bourlai |
On Designing MWIR and Visible Band based DeepFace Detection Models |
15:30-16:00 |
Posters Madness Session (14:00-16:00)
Chung-Chi Chen, Hen-Hsen Huang and Hsin-Hsi Chen |
Next Cashtag Prediction on Social Trading Platforms with Auxiliary Tasks |
Poster |
David Spence, Christopher Inskip, Novi Quadrianto and David Weir |
Poster |
|
Dipanjyoti Paul, Rahul Kumar, Sriparna Saha and Jimson Mathew |
Online Feature Selection for Multi-label Classification in Multi-objective Optimization Framework |
Poster |
Katchaguy Areekijseree, Yuzhe Tang and Sucheta Soundarajan |
Poster |
|
Liang Feng, Qianchuan Zhao and Cangqi Zhou |
An Efficient Method to Find Communities in K-partite Networks |
Poster |
Maryam Ramezani, Mina Rafiei, Soroush Omranpour and Hamid R. Rabiee |
Poster |
|
Masaomi Kimura |
CAB-NC: The Correspondence Analysis Based Network Clustering Method |
Poster |
Meysam Ghaffari, Ashok Srinivasan and Xiuwen Liu |
High-resolution home location prediction from tweets using deep learning with dynamic structure |
Poster |
Parham Hamouni, Taraneh Khazaei and Ehsan Amjadian |
TF-MF: Improving Multiview Representation for Twitter User Geolocation Prediction |
Poster |
Shalini Priya, Saharsh Singh, Sourav Kumar Dandapat, Kripabandhu Ghosh and Joydeep Chandra |
Identifying Infrastructure Damage during Earthquake using Deep Active Learning |
Poster |
Taha Hassan, Bob Edmison, Larry Cox, Matthew Louvet and Daron Williams |
Exploring the Context of Course Rankings on Online Academic Forums |
Poster |
Vivek Singh and Connor Hofenbitzer |
Fairness across Network Positions in Cyberbullying Detection Algorithms |
Poster |
Zhou Yang, Long Nguyen and Fang Jin |
Poster |
|
Adrien Benamira, Benjamin Devillers, Etienne Lesot, Ayush K. Rai, Manal Saadi and Fragkiskos Malliaros |
Semi-Supervised Learning and Graph Neural Networks for Fake News Detection |
Poster |
SNAA Workshop Session I (11:00 –12:30)
|
11:00-11:10 |
|
Christopher Yong, Charalampos Chelmis, Wonhyung Lee and Daphney-Stavroula Zois |
Understanding Online Civic Engagement: A Multi-Neighborhood Study of SeeClickFix |
11:10-11:35 |
Charalampos Chelmis, Mengfan Yao and Wonhyung Lee |
Web and Society: A First Look into the Network of Human Service Providers |
11:35-12:00 |
Do Yeon Kim, Xiaohang Li, Sheng Wang, Yunying Zhuo and Roy Ka-Wei Lee |
Topic Enhanced Word Embedding for Toxic Content Detection in Q&A Sites |
12:00-12:25 |
SNAA Workshop Session II (14:00 –15:30)
Arpita Chandra, Zoheb Borbora, Ponnurangam Kumaraguru and Jaideep Srivastava |
Finding Your Social Space: Empirical Study of Social Exploration in Multiplayer Online Games |
14:00-14:25 |
Abu Saleh Md. Tayeen, Abderrahmen Mtibaa and Satyajayant Misra |
14:25-14:50 |
|
Sandra Mitrovic, Laurent Lecoutere and Jochen De Weerdt |
A Comparison of Methods for Link Sign Prediction with Signed Network Embeddings |
14:50-15:15 |
|
15:15-15:30 |
SI Workshop Session (9:00-10:30)
Jan Hauffa, Wolfgang Bräu and Georg Groh |
Detection of Topical Influence in Social Networks via Granger-Causal Inference: A Twitter Case Study |
9:00-9:30 |
Sukankana Chakraborty, Sebastian Stein, Markus Brede, Ananthram Swami, Geeth de Mel and Valerio Restocchi |
9:30-10:00 |
|
Mihai Valentin Avram, Shubhanshu Mishra, Nikolaus Nova Parulian and Jana Diesner |
Adversarial perturbations to manipulate the perception of power and influence in networks |
10:00-10:30 |
HI-BI-BI Symposium Session I (11:00-12:30)
Joseph De Guia, Madhavi Devaraj and Carson Leung |
DeepGx: Deep Learning Using Gene Expression for Cancer Classification |
Full Paper |
Hanane Grissette and El Habib Nfaoui |
Full Paper |
|
Hüseyin Vural, Mehmet Kaya and Reda Alhajj |
Short Paper |
|
Carmela Comito, Agostino Forestiero and Giuseppe Papuzzo |
A clinical decision support framework for automatic disease diagnoses |
Short Paper |
HI-BI-BI Symposium Session II (14:00-16:00)
Krunal Dhiraj Patel, Andrew Heppner, Gautam Srivastava and Vijay Mago |
Full Paper |
|
Anu Shrestha and Francesca Spezzano |
Full Paper |
|
Farahnaz Golrooy Motlagh, Saeede Shekarpour, Amit Sheth, Thirunarayan Krishnaprasad and Michael L. Raymer |
Predicting Public Opinion on Drug Legalization: Social Media Analysis and Consumption Trends |
Full Paper |
Hankyu Jang, Samuel Justice, Philip M. Polgreen, Alberto M. Segre, Daniel K. Sewell and Sriram V. Pemmaraju |
Evaluating Architectural Changes to Alter Pathogen Dynamics in a Dialysis Unit |
Full Paper |
Multidisciplinary Track Session III (14:00-16:00)
Soumajyoti Sarkar, Paulo Shakarian, Mika Armenta, Danielle Sanchez and Kiran Lakkaraju |
Can social influence be exploited to compromise security: An online experimental evaluation |
14:00-14:20 |
Dionisios Sotiropoulos, Ifigeneia Georgoula and Christos Bilanakos |
Optimal Influence Strategies in an Oligopolistic Competition Network Environment |
14:20-14:40 |
Arunkumar Bagavathi, Pedram Bashiri, Shannon Reid, Matthew Phillips and Siddharth Krishnan |
Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations |
14:40-15:00 |
Fernando Henrique Calderon Alvarado, Li-Kai Cheng, Ming-Jen Lin, Yen Hao Huang and Yi-Shin Chen |
Content-Based Echo Chamber Detection on Social Media Platforms |
15:00-15:20 |
Mattia Gasparini, Giorgia Ramponi, Marco Brambilla and Stefano Ceri |
15:20-15:40 |
|
Shuo Zhang and Mayank Kejriwal |
Concept Drift in Bias and Sensationalism Detection: An Experimental Study |
15:40-16:00 |
Multidisciplinary Track Session II (11:00-12:30)
Maria Camila Rivera and Subrata Acharya |
FastestER: A Web Application to enable effective Emergency Department Service |
11:00-11:20 |
Abigail Garrett and Naeemul Hassan |
Understanding the Silence of Sexual Harassment Victims Through the #WhyIDidntReport Movement |
11:20-11:40 |
Naeemul Hassan, Manash Kumar Mandal, Mansurul Bhuiyan, Aparna Moitra and Syed Ishtiaque Ahmed |
Can Women Break the Glass Ceiling?: An Analysis of #MeToo Hashtagged Posts on Twitter |
11:40-12:00 |
Roland Molontay and Marcell Nagy |
Two Decades of Network Science - as seen through the co-authorship network of network scientists |
12:00-12:20 |
Apratim Das, Alex Aravind and Mark Dale |
12:20-12:40 |
Multidisciplinary Track Session I (16:30-18:00)
Apratim Das, Mike Drakos, Alex Aravind and Darwin Horning |
16:30-16:50 |
|
Michelle Bowman and Subrata Acharya |
16:50-17:10 |
|
Meysam Ghaffari, Ashok Srinivasan, Anuj Mubayi, Xiuwen Liu and Krishnan Viswanathan |
Next-Generation High-Resolution Vector-Borne Disease Risk Assessment |
17:10-17:30 |
Dany Perwita Sari and Yun-Shang Chiou |
Transformation in Architecture and Spatial Organization at Javanese house |
17:30-17:50 |
Marcell Nagy and Roland Molontay |
17:50-18:10 |
Session 1A - Communities I (11:00-12:30)
Michele Coscia |
Long Paper 11:00-11:30 |
|
Neda Zarayeneh and Ananth Kalyanaraman |
A Fast and Efficient Incremental Approach toward Dynamic Community Detection |
Long Paper 11:30-12:00 |
Young D. Kwon, Reza Hadi Mogavi, Ehsan Ul Haq, Youngjin Kwon, Xiaojuan Ma and Pan Hui |
Effects of Ego Networks and Communities on Self-Disclosure in an Online Social Network |
Long Paper 12:00-12:30 |
Session 1B - Misbehavior & Misinformation I (11:00-12:30)
Courtland Vandam, Farzan Masrour, Pang-Ning Tan and Tyler Wilson |
You have been CAUTE! Early Detection of Compromised Accounts on Social Media |
Long Paper 11:00-11:30 |
Thai Le, Kai Shu, Maria D. Molina, Dongwon Lee, S. Shyam Sundar and Huan Liu |
Long Paper 11:30-12:00 |
|
Limeng Cui, Suhang Wang and Dongwon Lee |
SAME: Sentiment-Aware Multi-Modal Embedding for Detecting Fake News |
Long Paper 12:00-12:30 |
Session 1C - Network Embedding (11:00-12:30)
Jundong Li, Liang Wu, Ruocheng Guo, Chenghao Liu and Huan Liu |
Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation |
Long Paper 11:00-11:30 |
Aynaz Taheri and Tanya Berger-Wolf |
Long Paper 11:30-12:00 |
|
Benedek Rozemberczki, Ryan Davies, Rik Sarkar and Charles Sutton |
Long Paper 12:00-12:30 |
Session 2A - Information & Influence Diffusion I (14:00-16:00)
Antoine Tixier, Maria Rossi, Fragkiskos Malliaros and Jesse Read |
Perturb and Combine to Identify Influential Spreaders in Real-World Networks |
Long Paper 14:00-14:30 |
Yang Chen and Jiamou Liu |
Becoming Gatekeepers Together with Allies: Collaborative Brokerage over Social Networks |
Long Paper 14:30-15:00 |
Yu Zhang |
Diversifying Seeds and Audience in Social Influence Maximization |
Short Paper 15:00-15:20 |
Arash Ghayoori and Rakesh Nagi |
Seed Investment Bounds for Viral Marketing under Generalized Diffusion |
Short Paper 15:20-15:40 |
Xiao Yang, Seungbae Kim and Yizhou Sun |
How Do Influencers Mention Brands in Social Media? Sponsorship Prediction of Instagram Posts |
Short Paper 15:40-16:00 |
Session 2B - Misbehavior & Misinformation II (14:00-16:00)
Sooji Han, Jie Gao and Fabio Ciravegna |
Neural Language Model Based Training Data Augmentation for Weakly Supervised Early Rumor Detection |
Long Paper 14:00-14:30 |
Amir Pouran Ben Veyseh, My T. Thai, Thien Huu Nguyen and Dejing Dou |
Rumor Detection in Social Networks via Deep Contextual Modeling |
Long Paper 14:30-15:00 |
Wataru Kudo, Mao Nishiguchi and Fujio Toriumi |
Fraudulent User Detection on Rating Networks Based on Expanded Balance Theory and GCNs |
Short Paper 15:00-15:20 |
Udit Arora, William Scott Pakka and Tanmoy Chakraborty |
Short Paper 15:20-15:40 |
|
Mohammad Raihanul Islam, Sathappan Muthiah and Naren Ramakrishnan |
RumorSleuth: Joint Detection of Rumor Veracity and User Stance |
Short Paper 15:40-16:00 |
Session 2C - Network Analysis with Machine Learning I (14:00-16:00)
Aravind Sankar, Xinyang Zhang and Kevin Chang |
Long Paper 14:00-14:30 |
|
Suhansanu Kumar, Heting Gao, Changyu Wang, Hari Sundaram and Kevin Chang |
Hierarchical Multi-Armed Bandits for Discovering Hidden Populations |
Long Paper 14:30-15:00 |
Yang Zhang, Hongxiao Wang, Daniel Zhang, Yiwen Lu and Dong Wang |
RiskCast: Social Sensing based Traffic Risk Forecasting via Inductive Multi-View Learning |
Short Paper 15:00-15:20 |
Sumeet Kumar and Kathleen M. Carley |
Short Paper 15:20-15:40 |
|
Renhao Cui, Gagan Agrawal and Rajiv Ramnath |
Tweets Can Tell: Activity Recognition using Hybrid Long Short-Term Memory Model |
Short Paper 15:40-16:00 |
Session 3A - Communities II (11:00-12:30)
Yulong Pei, George Fletcher and Mykola Pechenizkiy |
Long Paper 11:00-11:30 |
|
Jean Marie Tshimula, Belkacem Chikhaoui and Shengrui Wang |
HAR-search: A Method to Discover Hidden Affinity Relationships in Online Communities |
Long Paper 11:30-12:00 |
Domenico Mandaglio and Andrea Tagarelli |
Dynamic Consensus Community Detection and Combinatorial Multi-Armed Bandit |
Short Paper 12:00-12:20 |
Session 3B - Social Media Analysis (11:00-12:20)
Marija Stanojevic, Jumanah Alshehri and Zoran Obradovic |
Surveying public opinion using label prediction on social media data |
Long Paper 11:00-11:30 |
Taoran Ji, Xuchao Zhang, Nathan Self, Kaiqun Fu, Chang-Tien Lu and Naren Ramakrishnan |
Feature Driven Learning Framework for Cybersecurity Event Detection |
Long Paper 11:30-12:00 |
Virgile Landeiro and Aron Culotta |
Collecting Representative Social Media Samples from a Search Engine by Adaptive Query Generation |
Short Paper 12:00-12:20 |
Session 3C - Network Analysis with Machine Learning II (11:00-12:20)
Mahsa Ghorbani, Mahdieh Soleymani Baghshah and Hamid R. Rabiee |
MGCN: Semi-supervised Classification in Multi-layer Graphs with Graph Convolutional Networks |
Short Paper 11:00-11:20 |
Vachik Dave, Baichuan Zhang, Pin-Yu Chen and Mohammad Hasan |
Neural-Brane: An inductive approach for attributed network embedding |
Short Paper 11:20-11:40 |
Kun Tu, Jian Li, Don Towsley, Dave Braines and Liam Turner |
gl2vec: Learning Feature Representation Using Graphlets for Directed Networks |
Short Paper 11:40-12:00 |
Caleb Belth, Fahad Kamran, Donna Tjandra and Danai Koutra |
When to Remember Where You Came from: Node Representation Learning in Higher-order Networks |
Short Paper 12:00-12:20 |
Session 4A - Information & Influence Diffusion II (14:00-16:00)
Soumajyoti Sarkar, Ashkan Aleali, Paulo Shakarian, Mika Armenta, Danielle Sanchez and Kiran Lakkaraju |
Impact of Social Influence on Adoption Behavior: An Online Controlled Experimental Evaluation |
Long Paper 14:00-14:30 |
Thiago Silva, Alberto Laender and Pedro Vaz de Melo |
Characterizing Knowledge-Transfer Relationships in Dynamic Attributed Networks |
Long Paper 14:30-15:00 |
Emily Fischer, Souvik Ghosh and Gennady Samorodnitsky |
Long Paper 15:00-15:30 |
|
Jianjun Luo, Xinyue Liu and Xiangnan Kong |
Long Paper 15:30-16:00 |
Session 4B - Elections and Politics (14:00-15:50)
Huyen Le, Bob Boynton, Zubair Shafiq and Padmini Srinivasan |
A Postmortem of Suspended Twitter Accounts in the 2016 U.S. Presidential Election |
Long Paper 14:00-14:30 |
Hamid Karimi, Tyler Derr, Aaron Brookhouse and Jiliang Tang |
Long Paper 14:30-15:00 |
|
Indu Manickam, Andrew Lan, Gautam Dasarthy and Richard Baraniuk |
Tracing Political Ideology on Twitter During the 2016 U.S. Presidential Election |
Long Paper 15:00-15:30 |
Alexandru Topirceanu and Radu-Emil Precup |
A Novel Methodology for Improving Election Poll Prediction Using Time-Aware Polling |
Short Paper 15:30-15:50 |
Session 4C - Network Analysis I (14:00-16:00)
Michele Coscia and Luca Rossi |
The Impact of Projection and Backboning on Network Topologies |
Long Paper 14:00-14:30 |
Katchaguy Areekijseree and Sucheta Soundarajan |
Long Paper 14:30-15:00 |
|
Xiuwen Zheng and Amarnath Gupta |
Short Paper 15:00-15:20 |
|
Laurence Brandenberger, Giona Casiraghi, Vahan Nanumyan and Frank Schweitzer |
Short Paper 15:20-15:40 |
|
Seyed Amin Mirlohi Falavarjani, Ebrahim Bagheri, Jelena Jovanovic and Ali A. Ghorbani |
On the Causal Relation between Users' Real-World Activities and their Affective Processes |
Short Paper 15:40-16:00 |
Session 5A - Recommendations (16:30-18:00)
Chia-Wei Chen, Sheng-Chuan Chou, Chang-You Tai and Lun-Wei Ku |
PGA: Phrase-Guided Attention Web Article Recommendation for Next Clicks and Views |
Long Paper 16:30-17:00 |
Deqing Yang, Ziyi Wang, Junyang Jiang and Yanghua Xiao |
Knowledge Embedding towards the Recommendation with Sparse User-item Interactions |
Long Paper 17:00-17:30 |
Daniel Zhang, Bo Ni, Qiyu Zhi, Thomas Plummer, Qi Li, Hao Zheng, Qingkai Zeng, Yang Zhang and Dong Wang |
Through The Eyes of A Poet: Classical Poetry Recommendation with Visual Input on Social Media |
Long Paper 17:30-18:00 |
Session 5B - Applications I (16:30-18:00)
Henry Dambanemuya, Madhav Joshi and Ágnes Horvát |
Network Perspective on the Efficiency of Peace Accords Implementation |
Long Paper 16:30-17:00 |
Pradyumna Prakhar Sinha, Rohan Mishra, Ramit Sawhney and Rajiv Ratn Shah |
ASASNet - Exploiting Linguistic Homophily for Suicidal Ideation Detection in Social Media |
Short Paper 17:00-17:20 |
Sreeja Nair, Adriana Iamnitchi and John Skvoretz |
Promoting Social Conventions across Polarized Networks: An Empirical Study |
Short Paper 17:20-17:40 |
Mayank Kejriwal and Peilin Zhou |
Low-supervision urgency detection and transfer in short crisis messages |
Short Paper 17:40-18:00 |
Session 5C - Behavioral Modeling (16:30-17:50)
Vanessa Cedeno-Mieles, Zhihao Hu, Yihui Ren, Xinwei Deng, Abhijin Adiga, Christopher Barrett, Saliya Ekanayake, Gizem Korkmaz, Chris Kuhlman, Dustin Machi, Madhav Marathe, S. S. Ravi, Brian Goode, Naren Ramakrishnan, Parang Saraf, Nathan Self, Noshir Contractor, Joshua Epstein and Michael Macy |
Long Paper 16:30-17:00 |
|
Binxuan Huang and Kathleen Carley |
A Large-Scale Empirical Study of Geotagging Behavior on Twitter |
Long Paper 17:00-17:30 |
Rahul Pandey, Carlos Castillo and Hemant Purohit |
Modeling Human Annotation Errors to Design Bias-Aware Systems for Social Stream Processing |
Short Paper 17:30-17:50 |
Session 6A - Network Algorithms (11:00-12:20)
Peng Ni, Masatoshi Hanai, Wen Jun Tan and Wentong Cai |
Efficient Closeness Centrality Computation in Time-Evolving Graphs |
Long Paper 11:00-11:30 |
Huda Nassar, Austin Benson and David Gleich |
Long Paper 11:30-12:00 |
|
Frédéric Simard |
Short Paper 12:00-12:20 |
Session 6B - Network Modeling (11:00-12:20)
Chen Avin, Zvi Lotker, Yinon Nahum and David Peleg |
Long Paper 11:00-11:30 |
|
Julian Müller and Ulrik Brandes |
Long Paper 11:30-12:00 |
|
Haripriya Chakraborty and Liang Zhao |
Short Paper 12:00-12:20 |
Session 6C - Misinformation & Online Content (11:00-12:20)
Jesper Holmström, Daniel Jonsson, Filip Polbratt, Olav Nilsson, Linnea Lundström, Sebastian Ragnarsson, Anton Forsberg, Karl Andersson and Niklas Carlsson |
Do we Read what we Share? Analyzing the Click Dynamic of News Articles Shared on Twitter |
Short Paper 11:00-11:20 |
Alexandre M. Sousa, Jussara M. Almeida and Flavio Figueiredo |
Analyzing and Modeling User Curiosity in Online Content Consumption: A LastFM Case Study |
Short Paper 11:20-11:40 |
Bhavtosh Rath, Wei Gao and Jaideep Srivastava |
Evaluating Vulnerability to Fake News in Social Networks: A Community Health Assessment Model |
Short Paper 11:40-12:00 |
Kai Shu, Xinyi Zhou, Suhang Wang, Reza Zafarani and Huan Liu |
Short Paper 12:00-12:20 |
Session 7A - Modeling & Algorithms (14:00-15:50)
Malik Magdon-Ismail and Kshiteesh Hegde |
Long Paper 14:00-14:30 |
|
Farzan Masrour Shalmani, Pang-Ning Tan and Abdol-Hossein Esfahanian |
OPTANE: An OPtimal Transport Algorithm for NEtwork Alignment |
Short Paper 14:30-14:50 |
Feiyu Long, Nianwen Ning, Chenguang Song and Bin Wu |
Short Paper 14:50-15:10 |
|
Sai Kiran Narayanaswami, Balaraman Ravindran and Venkatesh Ramaiyan |
Short Paper 15:10-15:30 |
|
Yulong Pei, Jianpeng Zhang, George Fletcher and Mykola Pechenizkiy |
Infinite Motif Stochastic Blockmodel for Role Discovery in Networks |
Short Paper 15:30-15:50 |
Session 7B - Applications II (14:00-16:00)

Authors | Title | Presentation |
---|---|---|
Rashid Tahir, Fareed Zaffar, Faizan Ahmad, Christo Wilson, Hammas Saeed and Shiza Ali | | Short Paper 14:00-14:20 |
Pamela Thomas, Rachel Krohn and Tim Weninger | Dynamics of Team Library Adoptions: An Exploration of GitHub Commit Logs | Short Paper 14:20-14:40 |
Lea Baumann and Sonja Utz | | Short Paper 14:40-15:00 |
Kaustav Basu and Arunabha Sen | Monitoring Individuals in Drug Trafficking Organizations: A Social Network Analysis | Short Paper 15:00-15:20 |
Victor S. Bursztyn, Larry Birnbaum and Doug Downey | Thousands of Small, Constant Rallies: A Large-Scale Analysis of Partisan WhatsApp Groups | Short Paper 15:20-15:40 |
Yo-Der Song, Benjamin Farrelly, Yiying Sun, Mingwei Gong and Aniket Mahanti | Measurement and Analysis of an Adult Video Streaming Service | Short Paper 15:40-16:00 |
Session 7C - Network Analysis II (14:00-16:00)

Authors | Title | Presentation |
---|---|---|
Mitchell Goist, Ted Hsuan Yun Chen and Christopher Boylan | Reconstructing and Analyzing the Transnational Human Trafficking Network | Long Paper 14:00-14:30 |
Pujan Paudel, Trung Nguyen and Amartya Hatua | | Long Paper 14:30-15:00 |
Ninareh Mehrabi, Fred Morstatter, Nanyun Peng and Aram Galstyan | Debiasing Community Detection: The Importance of Lowly Connected Nodes | Short Paper 15:00-15:20 |
Obaida Hanteer and Luca Rossi | | Short Paper 15:20-15:40 |
Mahboubeh Ahmadalinezhad, Masoud Makrehchi and Neil Seward | Lineup Performance Prediction Through Network Analysis | Short Paper 15:40-16:00 |
Abstract
Rating platforms provide users with useful information on products or other users. However, fake ratings are sometimes generated by fraudulent users. In this paper, we tackle the task of fraudulent user detection on rating platforms. We propose an end-to-end framework based on Graph Convolutional Networks (GCNs) and expanded balance theory, which properly incorporates both the signs and directions of edges. Experimental results on four real-world datasets show that the proposed framework performs well, and often best, in most settings. In particular, the framework shows remarkable stability in inductive settings, which is associated with the detection of new fraudulent users on rating platforms. Furthermore, using expanded balance theory, we provide new insight into the behavior of users in rating networks: fraudulent users form a faction to deal with the negative ratings from other users. Using the proposed framework, the owner of a rating platform can detect fraudulent users earlier and consistently provide users with more credible information.
Abstract
When studying a network, it is often of interest to understand the robustness of that network to noise. Network robustness has been studied in a variety of contexts, examining network properties such as the number of connected components and the lengths of shortest paths. In this work, we present a new network robustness measure, which we refer to as "sampling robustness". The goal of the sampling robustness measure is to quantify the extent to which a network sample collected from a graph with errors is a good representation of a network sample collected from that same graph, but without errors. These errors may be introduced by humans or by the system (e.g., mistakes from the respondents or a bug in an API program), and may affect the performance of a data collection algorithm and the quality of the obtained sample. Thus, when data analysts analyze the sampled network, they may wish to know whether such errors will affect future analysis results. We demonstrate that sampling robustness is dependent on a few easily computed properties of the network: the leading eigenvalue, the average node degree, and the clustering coefficient. In addition, we introduce regression models for estimating sampling robustness given an obtained sample. As a result, our models can estimate the sampling robustness with an MSE below 0.0015 and an R-squared of up to 0.75.
Abstract
Over the past decade, network analysis has played an important role in understanding several important problems. However, when performing any analysis task, some information may be leaked or scattered among individuals who may not be willing to share their information (e.g., the number of an individual's friends and who they are). Secure multi-party computation (MPC) allows individuals to jointly perform any computation without revealing each individual's input. MPC has been widely used for distributed databases, but there are only a few works on graph mining applications. Here, we present two novel secure frameworks that allow a node to securely compute its clustering coefficient, and we evaluate the trade-off between efficiency and security of several proposed instantiations. Our results show that the cost of secure computation depends strongly on network structure. This work is a step towards developing a library of secure graph operations in the future.
Abstract
In anagram games, players are provided with letters for forming as many words as possible over a specified time duration. Anagram games have been used in controlled experiments to study problems such as collective identity, effects of goal-setting, internal-external attributions, test anxiety, and others. The majority of work on anagram games involves individual players. Recently, work has expanded to group anagram games where players cooperate by sharing letters. In this work, we analyze experimental data from online social networked experiments of group anagram games. We develop mechanistic and data-driven models of human decision-making to predict detailed game player actions (e.g., what word to form next). With these results, we develop a composite agent-based modeling and simulation platform that incorporates the models from data analysis. We compare model predictions against experimental data, which enables us to provide explanations of human decision-making and behavior. Finally, we provide illustrative case studies using agent-based simulations to demonstrate the efficacy of models to provide insights that are beyond those from experiments alone.
Abstract
In the future, analysis of social networks will conceivably move from graphs to hypergraphs. However, theory has not yet caught up with this type of data organizational structure. By introducing and analyzing a general model of preferential attachment hypergraphs, this paper takes a step towards narrowing this gap. We consider a random preferential attachment model H(p,Y) for network evolution that allows arrivals of both nodes and hyperedges of random size. At each time step t, two possible events may occur: (1) [vertex arrival event:] with probability p>0 a new vertex arrives and a new hyperedge of size Y_t, containing the new vertex and Y_t-1 existing vertices, is added to the hypergraph; or (2) [hyperedge arrival event:] with probability 1-p, a new hyperedge of size Y_t, containing Y_t existing vertices, is added to the hypergraph. In both cases, the involved existing vertices are chosen independently at random according to the preferential attachment rule, i.e., with probability proportional to their degree, where the degree of a vertex is the number of edges containing it. Assuming general restrictions on the distribution of Y_t, we prove that the H(p,Y) model generates power-law networks, i.e., the expected fraction of nodes with degree k is proportional to $k^{-1-\Gamma}$, where $\Gamma=\lim_{t\rightarrow\infty}\frac{\sum_{i=0}^{t-1}E[Y_i]}{t(E[Y_t]-p)}\in (0,\infty)$. This extends the special case of preferential attachment graphs, where Y_t=2 for every t, yielding $\Gamma=2/(2-p)$. Therefore, our results show that the exponent of the degree distribution is sensitive to whether one considers the structure of a social network to be a hypergraph or a graph. We discuss, and provide examples for, the implications of these considerations.
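The two-event generative process described above can be simulated directly. The following is a minimal sketch of the H(p,Y) model, not the authors' code; `evolve_hypergraph`, the starting hyperedge, and the guard for early steps (when fewer vertices exist than a hyperedge needs) are illustrative choices:

```python
import random

def evolve_hypergraph(steps, p, edge_size, seed=None):
    """Sketch of the H(p, Y) preferential-attachment hypergraph model.

    degree[v] counts the hyperedges containing v; existing vertices are
    sampled with probability proportional to their degree.
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}          # seed hypergraph: one hyperedge {0, 1}
    hyperedges = [(0, 1)]

    def pick_existing(k):
        # preferential attachment: k distinct vertices, degree-weighted
        verts = list(degree)
        weights = [degree[v] for v in verts]
        k = min(k, len(verts))     # guard for the first few steps
        chosen = set()
        while len(chosen) < k:
            chosen.add(rng.choices(verts, weights=weights)[0])
        return chosen

    for _ in range(steps):
        y = edge_size()            # random hyperedge size Y_t
        if rng.random() < p:       # (1) vertex arrival event
            new_v = max(degree) + 1
            members = pick_existing(y - 1) | {new_v}
            degree[new_v] = 0
        else:                      # (2) hyperedge arrival event
            members = pick_existing(y)
        hyperedges.append(tuple(sorted(members)))
        for v in members:
            degree[v] = degree.get(v, 0) + 1
    return degree, hyperedges
```

Plotting the degree distribution of a long run against $k^{-1-\Gamma}$ is a quick empirical check of the power-law claim.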
Abstract
The prediction of opinion distribution and evolution in real-world scenarios represents a major scientific challenge for current social network analysis. Indeed, multiple prediction solutions based on statistics and economic indices have been proposed over time, but, as we better understand diffusion phenomena, we know that temporal characteristics introduce even more uncertainty. As such, current literature is not yet able to define truly reliable models for the evolution of political opinion, marketing preferences, or social unrest. Inspired by micro-scale opinion dynamics, we develop an original time-aware (TA) methodology which is able to improve the prediction of opinion distribution by modeling opinion as a function that spikes up when opinion is expressed and slowly dampens down otherwise. After a parametric analysis, we validate our TA method on survey data from the US presidential elections of 2012 and 2016. By comparing our time-aware method (TA) with classic survey averaging (SA) and cumulative vote counting (CC), we find our method is substantially closer to the real election outcomes. On average, we measure that SA is 6.3% off, CC is 5.6% off, while TA is only 1.5% off from the final registered election outcomes; this difference translates into a 75% prediction improvement of our TA method. As our work falls in line with studies on the microscopic temporal dynamics of social networks, we find evidence of how macroscopic prediction can be improved using time-awareness.
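The spike-and-dampen idea can be illustrated with a small sketch. Here `time_aware_opinion`, the exponential damping, and the time constant `tau` are illustrative assumptions, not the paper's exact model:

```python
from math import exp

def time_aware_opinion(events, t_now, tau=30.0):
    """Decay-weighted opinion aggregate (hypothetical sketch).

    events: list of (timestamp, opinion_value) pairs. Each expressed
    opinion spikes to its value at its timestamp and then dampens
    exponentially with time constant tau, so recent expressions
    dominate the aggregate at time t_now.
    """
    num = den = 0.0
    for t, value in events:
        w = exp(-(t_now - t) / tau)   # older opinions weigh less
        num += w * value
        den += w
    return num / den if den else 0.0
```

Contrast with plain survey averaging: for one old negative opinion and one fresh positive one, the plain average is 0, while the time-aware aggregate sits near the recent value.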
Abstract
The scarcity and class imbalance of training data are known issues in current rumor detection tasks. We propose a straightforward and general-purpose data augmentation technique which is beneficial to early rumor detection relying on event propagation patterns. The key idea is to exploit massive unlabeled event data sets on social media to augment limited labeled rumor source tweets. This work is based on rumor spreading patterns revealed by recent rumor studies and semantic relatedness between labeled and unlabeled data. A state-of-the-art neural language model (NLM) and large credibility-focused Twitter corpora are employed to learn context-sensitive representations of rumor tweets. Six different real-world events based on three publicly available rumor datasets are employed in our experiments to provide a comparative evaluation of the effectiveness of the method. The results show that our method can expand the size of an existing rumor data set by nearly 200% and the corresponding social context by 100% for conversation threads (and retweets additionally) with reasonable quality. Preliminary experiments with a state-of-the-art deep learning-based rumor detection model show that augmented data can alleviate overfitting and class imbalance caused by limited training data and can help to train complex neural networks (NNs). With augmented data, the performance of rumor detection can be improved by 6.4%. Our experiments also indicate that augmented training data can help to generalize rumor detection models to unseen new rumors.
Abstract
SimRank is a widely studied link-based similarity measure that is known for its simple, yet powerful philosophy that two nodes are similar if they are referenced by similar nodes. While this philosophy has been the basis of several improvements, there is another useful, albeit less frequently discussed interpretation for SimRank known as the Random Surfer-Pair Model. In this work, we show that other well known measures related to SimRank can also be reinterpreted using Random Surfer-Pair Models, and establish a mathematically sound, general and unifying framework for several link-based similarity measures. This also serves to provide new insights into their functioning and allows for using these measures in a Monte Carlo framework, which provides several computational benefits. As an illustration of its utility in designing measures, we develop a new measure based on two existing measures under this framework, and empirically demonstrate its efficacy.
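The Random Surfer-Pair interpretation is what makes the Monte Carlo framework possible: SimRank s(a,b) equals the expected value of c**τ, where τ is the first meeting time of two surfers walking backwards along in-links. A minimal sketch of that estimator (assuming uniform backward steps; `mc_simrank` is a hypothetical name, not from the paper):

```python
import random

def mc_simrank(in_nbrs, a, b, c=0.8, walks=2000, max_len=10, seed=0):
    """Monte Carlo SimRank via the random surfer-pair model (sketch).

    in_nbrs maps each node to a list of its in-neighbors. Two surfers
    start at a and b and step backwards in lockstep; a walk that first
    meets after t steps contributes c**t, and walks that hit a node
    with no in-neighbors (or exceed max_len) contribute 0.
    """
    if a == b:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        u, v = a, b
        for t in range(1, max_len + 1):
            if not in_nbrs.get(u) or not in_nbrs.get(v):
                break
            u = rng.choice(in_nbrs[u])
            v = rng.choice(in_nbrs[v])
            if u == v:
                total += c ** t
                break
    return total / walks
```

On the toy graph where node 0 links to both 1 and 2, every surfer pair meets after one step, so the estimate converges to c itself, matching the exact SimRank value c * s(0,0) = c.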
Abstract
Social trading platforms provide a forum for investors to share their analysis and opinions. Posts on these platforms are characterized by narrative styles quite different from posts on general social platforms, such as tweets. As a result, recommendation systems for social trading platforms should leverage tailor-made latent features. This paper presents a representation for these latent features in both textual data and market information. A real-world dataset is adopted to conduct experiments involving a novel task called next cashtag prediction. We propose a joint learning model with an attentive capsule network. Experiments show positive results with the proposed methods and the corresponding auxiliary tasks.
Abstract
Clickbait is an attractive yet misleading headline that lures readers to commit click-conversion. Development of robust clickbait detection models has been, however, hampered by the shortage of high-quality labeled training samples. To overcome this challenge, we investigate how to exploit human-written and machine-generated synthetic clickbaits. We first ask crowdworkers and journalism students to generate clickbaity news headlines. Second, we utilize deep generative models to generate clickbaity headlines. Through empirical evaluations, we demonstrate that synthetic clickbaits by human entities and deep generative models are consistently useful in improving the accuracy of various prediction models, by as much as 14.5% in AUC, across two real datasets and different types of algorithms. Especially, we observe an improvement in accuracy, up to 8.5% in AUC, even for top-ranked clickbait detectors from the Clickbait Challenge 2017. Our study proposes a novel direction to address the shortage of labeled training data, one of the fundamental bottlenecks in supervised learning, by means of synthetic training data with reinforced domain knowledge. It also provides a solution for distinguishing between bot-generated and human-written clickbaits, thus aiding the work of moderators and better alerting news consumers.
Abstract
For trajectory data that tend to have beyond first-order (i.e., non-Markovian) dependencies, higher-order networks have been shown to accurately capture details lost with the standard aggregate network representation. At the same time, representation learning has shown success on a wide range of network tasks, removing the need to hand-craft features for these tasks. In this work, we propose a node representation learning framework called EVO or Embedding Variable Orders, which captures non-Markovian dependencies by combining work on higher-order networks with work on node embeddings. We show that EVO outperforms baselines in tasks where high-order dependencies are likely to matter, demonstrating the benefits of considering high-order dependencies in node embeddings. We also provide insights into when it does or does not help to capture these dependencies. To the best of our knowledge, this is the first work on representation learning for higher-order networks.
Abstract
Role discovery and community detection in networks are two essential tasks in network analytics, where the role denotes the global structural patterns of nodes in networks and the community represents the local connections of nodes in networks. Previous studies viewed these two tasks orthogonally and solved them independently, while the relation between them has been totally neglected. However, it is intuitive that roles and communities in a network are correlated and complementary to each other. In this paper, we propose a novel model for simultaneous roles and communities detection (REACT) in networks. REACT uses non-negative matrix tri-factorization (NMTF) to detect roles and communities and utilizes the L_{2,1} norm as the regularization to capture the diversity relation between roles and communities. The proposed model has several advantages compared with other existing methods: (1) it incorporates the diversity relation between roles and communities to detect them simultaneously using a unified model, and (2) it provides extra information about the interaction patterns between roles and between communities using NMTF. To analyze the performance of REACT, we conduct experiments on several real-world social networks from different domains. By comparing with state-of-the-art community detection and role discovery methods, the obtained results demonstrate that REACT performs best for both role and community detection tasks. Moreover, our model provides a better interpretation for the interaction patterns between communities and between roles.
Abstract
Role/block discovery is an essential task in network analytics and has attracted significant attention recently. Previous studies on role discovery either relied on first- or second-order structural information to group nodes but neglected higher-order information, or required the number of roles/blocks as input, which may be unknown in practice. To overcome these limitations, in this paper we propose a novel generative model, the infinite motif stochastic blockmodel (IMM), for role discovery in networks. IMM takes advantage of high-order motifs in the generative process, and it is a nonparametric Bayesian model which can automatically infer the number of roles. To validate the effectiveness of IMM, we conduct experiments on synthetic and real-world networks. The obtained results demonstrate that IMM outperforms other blockmodels in the role discovery task.
Abstract
An information broker in a social network acts as a gatekeeper who controls incoming information or resources for her group and decides whether or not unconnected agents in the group have access to that information or those resources. In this paper, we study the problem of identifying a group of agents as information brokers in a social network. We focus on cases where brokers hold heterogeneous influencing capabilities and analyze a social network with directed relationships. To this end, we define a "collaborative broker team" over a directed network, which provides a selection of agents with different influencing power to control the whole network. We ask how to find a smallest collaborative broker team so that the network can be well controlled with fewer ties. By assuming that agents have only two types of control capabilities, we investigate the fundamental case of this problem and formalize it as the "collaborative brokerage" problem. We show that collaborative brokerage is NP-hard in general, yet has polynomial-time optimal solutions over directed trees. We then develop efficient algorithms over arbitrary directed networks. To evaluate the algorithms, we run experiments over networks generated using well-known random graph models and real-world datasets. Experimental results show that our algorithms quickly produce relatively good solutions.
Abstract
News and information spread over social media can have a big impact on thoughts, beliefs, and opinions. It is therefore important to understand the sharing dynamics on these forums. However, most studies trying to capture these dynamics rely only on Twitter's open APIs to measure how frequently articles are shared/retweeted, and therefore do not capture how many users actually read the articles linked in these tweets. To address this problem, in this paper, we first develop a novel measurement methodology, which combines the Twitter streaming API, the Bitly API, and careful sample rate selection to simultaneously collect and analyze the timeline of both the number of retweets and clicks generated by news article links. Second, we present a temporal analysis of the news cycle based on five-day-long traces (containing both clicks and retweets over time) for the news article links discovered during a seven-day period. Among other things, our analysis highlights differences in the relative timelines observed for clicks and retweets (e.g., retweet data often lags and underestimates the bias towards reading popular links/articles), and helps answer important questions regarding differences in how age-based biases and churn affect how frequently news articles shared on Twitter are accessed over time. Our temporal findings are shown to be consistent both when comparing data collected a year apart (2017 vs 2018) and across articles published on news websites with vastly different characteristics.
Abstract
Link Streams were proposed as a model of temporal networks. We seek to understand topological and temporal properties of those objects through efficiently computing the distances, latencies and lengths of shortest fastest paths. We develop different algorithms to compute those values. One purpose of this study is to help develop algorithms to compute centrality functions on link streams, such as the betweenness and the closeness.
Abstract
Consuming news from social media is becoming increasingly popular. Social media appeals to users due to its fast dissemination of information, low cost, and easy access. However, social media also enables the wide spread of fake news. Because of the detrimental societal effects of fake news, detecting fake news has attracted increasing attention. However, the detection performance using only news contents is generally not satisfactory, as fake news is written to mimic true news. Thus, there is a need for an in-depth understanding of the relationship between user profiles on social media and fake news. In this paper, we study the challenging problem of understanding and exploiting user profiles on social media for fake news detection. In an attempt to understand connections between user profiles and fake news, first, we measure users' sharing behaviors on social media and group representative users who are more likely to share fake and real news; then, we perform a comparative analysis of explicit and implicit profile features between these user groups, which reveals their potential to help differentiate fake news from real news. To exploit user profile features, we demonstrate the usefulness of these user profile features in a fake news classification task. We further validate the effectiveness of these features through feature importance analysis. The findings of this work lay the foundation for deeper exploration of user profile features of social media and enhance the capabilities for fake news detection.
Abstract
Community detection and evolution have been largely studied in the last few years, especially for network systems that are inherently dynamic and undergo different types of changes in their structure and organization in communities. Because of the inherent uncertainty and dynamicity in such network systems, we argue that temporal community detection problems can profitably be solved under a particular class of multi-armed bandit problems, namely combinatorial multi-armed bandit (CMAB). More specifically, we propose a CMAB-based methodology for the novel problem of dynamic consensus community detection, i.e., to compute a single community structure that is designed to encompass the whole information available in the sequence of observed temporal snapshots of a network, in order to be representative of the knowledge available from community structures at the different time steps. Unlike existing approaches, our key idea is to produce a dynamic consensus solution for a temporal network which has the unique capability of embedding both long-term changes in the community formation and newly observed community structures.
Abstract
Finding clusters in a network is practically important in many applications and has been studied by many researchers. The most commonly used methods are spectral clustering and Newman's modularity maximization. However, there has been no unified view of them. In this study, we introduce a new guiding principle based on correspondence analysis to obtain nodes' coordinates, and discuss its equivalence to spectral clustering and its relationship to Newman's modularity.
Abstract
Closeness centrality is one of the key indicators of vertex importance in social network analytics. Since social networks are constantly growing, it is essential to monitor a vertex's closeness centrality through time in order to study the trend of its influential power. In this paper, we model dynamic social networks as a time-evolving graph (a sequence of graph snapshots through time) and work on the problem of computing a vertex's closeness centrality for all graph snapshots. We propose novel algorithms that efficiently utilize graph temporal information, and give a theoretical analysis of their time complexity, which is shown to be small for real-world social networks. Experiments on various real-world data sets show speedups of one to two orders of magnitude compared to existing graph topology based algorithms. Lastly, we create synthetic data sets to show how topological and temporal graph features affect the closeness centrality computation time for both existing and proposed approaches.
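For reference, the naive per-snapshot baseline that such temporal algorithms improve on can be sketched as BFS-based closeness recomputed from scratch on every snapshot. This sketch is not the paper's algorithm, which instead reuses temporal overlap between consecutive snapshots:

```python
from collections import deque

def closeness(adj, v):
    """Closeness of v in one snapshot: (reached - 1) / sum of BFS
    distances from v, or 0.0 if v reaches no other vertex."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def closeness_over_time(snapshots, v):
    # baseline: full recomputation per snapshot, O(snapshots * BFS)
    return [closeness(adj, v) for adj in snapshots]
```

On a star graph the center scores 1.0 (distance 1 to every leaf) while each leaf scores 0.6, which is a handy sanity check for any faster temporal variant.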
Abstract
Heterogeneous Information Networks (HINs) comprise nodes of different types inter-connected through diverse semantic relationships. In many real-world applications, nodes in information networks are often associated with additional attributes, resulting in Attributed HINs (or AHINs). In this paper, we study semi-supervised learning (SSL) on AHINs to classify nodes based on their structure, node types and attributes, given limited supervision. Recently, Graph Convolutional Networks (GCNs) have achieved impressive results in several graph-based SSL tasks. However, they operate on homogeneous networks, while being completely agnostic to the semantics of typed nodes and relationships in real-world HINs. In this paper, we seek to bridge the gap between semantic-rich HINs and the neighborhood aggregation paradigm of graph neural networks, to generalize GCNs through metagraph semantics. We propose a novel metagraph convolution operation to extract features from local metagraph-structured neighborhoods, thus capturing semantic higher-order relationships in AHINs. Our proposed neural architecture Meta-GNN extracts features of diverse semantics by utilizing multiple metagraphs, and employs a novel metagraph-attention module to learn personalized metagraph preferences for each node. Our semi-supervised node classification experiments on multiple real-world AHIN datasets indicate significant performance gains of 6% Micro-F1 on average over state-of-the-art AHIN baselines. Visualizations of metagraph attention weights yield interpretable insights into their relative task-specific importance.
Abstract
Community detection in complex networks has attracted lots of interest in scientific fields. However, current community detection algorithms mainly focus on unipartite networks. In this paper, we propose a new definition of $K$-partite modularity and a new method which strictly follows the idea of the original Louvain algorithm. Compared with other algorithms, our method is more intuitive and easier to implement. We evaluate on both synthetic and real-world networks. Experimental results show that our method not only obtains better partitions, but also scales to large data sets.
Abstract
This paper proposes a novel algorithm to discover hidden individuals in a social network. The problem is increasingly important for social scientists as the populations (e.g., individuals with mental illness) that they study converse online. Since these populations do not use the category (e.g., mental illness) to self-describe, directly querying with text is non-trivial. To bypass the limitations of network and query re-writing frameworks, we focus on identifying hidden populations through attributed search. We propose a hierarchical Multi-Arm Bandit (DT-TMP) sampler that uses a decision tree coupled with reinforcement learning to query the combinatorial attributed search space by exploring and expanding along high-yielding decision-tree branches. A comprehensive set of experiments over a suite of twelve sampling tasks on three online web platforms and three offline entity datasets reveals that DT-TMP outperforms all baseline samplers by up to a margin of 54% on Twitter and 48% on RateMDs. An extensive ablation study confirms DT-TMP's superior performance under different sampling scenarios.
Abstract
Detection of compromised social media accounts is an important problem as the compromised accounts can be exploited by hackers to spread false and misleading information. In particular, early detection of compromised accounts is essential to mitigating the damages caused by the hackers' posts, which may range from victim shaming to causing widespread public panic and civil unrest. This paper proposes CAUTE, a deep learning framework that simultaneously learns the feature embeddings of the users and their posts in order to identify which, if any, of their posts were written by a different person, i.e., a hacker. Using Twitter as an example of the social media platform, CAUTE learns a tweet-to-user encoder to infer the user features from tweet features and a user-to-tweet encoder to predict the tweet content from a combination of the user features and the tweet meta features. The residual errors of both encoders are then fed into a fully-connected neural network layer to detect whether a post was published by the specified user or by a hacker. Experimental results showed that the features learned by CAUTE are more informative than those generated by conventional representation learning methods. Additionally, CAUTE outperformed several state-of-the-art baseline algorithms in terms of overall performance and can effectively detect compromised posts early without generating too many false alarms.
Abstract
The relationship extraction and fusion of networks are hotspots of current research in social network mining. Most previous work is based on single-source data. However, the relationships portrayed by single-source data are not sufficient to characterize the relationships of the real world. To solve this problem, we propose a Semi-supervised Fusion framework for Multiple Networks (SFMN), which uses the gradient boosting decision tree (GBDT) algorithm to fuse the information of multi-source networks into a single network. Our framework aims to take advantage of multi-source network fusion to enhance the accuracy of network construction. Experiments show that our method optimizes the structural and community accuracy of social networks, enabling our framework to outperform several state-of-the-art methods.
Abstract
Bipartite networks are a well known strategy to study a variety of phenomena. The commonly used method to deal with this type of network is to project the bipartite data into a unipartite weighted graph and then use a backboning technique to extract only the meaningful edges. Despite the wide availability of different methods both for projection and backboning, we believe that there has been little attention to the effect that the combination of these two processes has on the data and on the resulting network topology. In this paper we study the effect that the possible combinations of projection and backboning techniques have on a bipartite network. We show that the 12 methods group into two clusters producing unipartite networks with very different topologies. We also show that the resulting level of network centralization is highly affected by the combination of projection and backboning applied.
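The projection-then-backboning pipeline can be illustrated with the simplest member of each family: simple-weight projection (edge weight = number of shared groups) followed by naive thresholding. Both function names are illustrative; the paper compares many richer variants (e.g. hyperbolic projection, the disparity filter):

```python
from itertools import combinations

def project_bipartite(memberships):
    """Simple-weight projection of a bipartite network.

    memberships maps each top-level node (e.g. a group) to the
    bottom-level nodes it connects to; the projected edge weight
    between two bottom nodes is the number of groups they share.
    """
    weights = {}
    for nodes in memberships.values():
        for u, v in combinations(sorted(set(nodes)), 2):
            weights[(u, v)] = weights.get((u, v), 0) + 1
    return weights

def threshold_backbone(weights, k):
    # naive backboning: keep only edges of weight >= k; statistical
    # filters replace this global cutoff with per-node significance
    return {e: w for e, w in weights.items() if w >= k}
```

Even this crude pair shows the paper's point: the topology of the final unipartite network depends jointly on how weights are assigned and on how edges are pruned.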
Abstract
Discovering communities in complex networks means grouping nodes similar to each other, to uncover latent information about them. There are hundreds of different algorithms to solve the community detection task, each with its own understanding and definition of what a "community" is. Dozens of review works attempt to order such a diverse landscape by classifying community discovery algorithms by the process they employ to detect communities, by their explicitly stated definition of community, or by their performance on a standardized task. In this paper, we classify community discovery algorithms according to a fourth criterion: the similarity of their results. We create an Algorithm Similarity Network (ASN), whose nodes are the community detection approaches, connected if they return similar groupings. We then perform community detection on this network, grouping algorithms that consistently return the same partitions or overlapping coverage over a span of more than one thousand synthetic and real-world networks. This paper is an attempt to create a similarity-based classification of community detection algorithms based on empirical data. It improves over the state of the art by comparing more than seventy approaches and discovering that the ASN contains well-separated groups, which makes it a sensible tool for aiding practitioners in choosing algorithms that fit their analytic needs.
Abstract
The 2016 United States presidential election has been characterized as a period of extreme divisiveness that was exacerbated on social media by the influence of fake news, trolls, and social bots. However, the extent to which the public became more polarized in response to these influences over the course of the election is not well understood. In this paper we propose IdeoTrace, a framework for (i) jointly estimating the ideology of social media users and news websites and (ii) tracing changes in user ideology over time. We apply this framework to the last two months of the election period for a group of 47,508 Twitter users and demonstrate that both liberal and conservative users became more polarized over time.
Abstract
Research in social network analytics has already extensively explored how engagement on online social networks can lead to observable effects on users' real-world behavior (e.g., changing exercise patterns or dietary habits) and their psychological states. The objective of our work in this paper is to investigate the flip side and examine whether engaging in or disengaging from real-world activities is reflected in users' affective processes, such as anger, anxiety, and sadness, as expressed in their posts on online social media. We collected data from Foursquare and Twitter and found that engaging in or disengaging from a real-world activity, such as frequenting bars or no longer going to a gym, has a direct impact on users' affective processes. In particular, we report that engaging in a routine real-world activity leads to expressing less emotional content online, whereas the reverse is observed when users abandon a regular real-world activity.
Abstract
With the increasing popularity of portable devices with cameras (e.g., smartphones and tablets) and ubiquitous Internet connectivity, travelers can share their experiences instantly during travel by posting photos they took to social media platforms. In this paper, we present a new image-driven poetry recommender system that takes a traveler's photo as input and recommends classical poems that can enrich the photo with aesthetically pleasing quotes. Three critical challenges must be solved in this new problem: i) How to extract the implicit artistic conception embedded in both poems and images? ii) How to identify the salient objects in the image without knowing the creator's intent? iii) How to accommodate the diverse user perceptions of the image and make a diversified poetry recommendation? The proposed iPoemRec system jointly addresses the above challenges by developing heterogeneous information network and neural embedding techniques. Evaluation results from real-world datasets and a user study demonstrate that our system can recommend highly relevant classical poems for a given photo and receives significantly higher user ratings compared to the state-of-the-art baselines.
Abstract
Road traffic accidents are a major challenge in urban transportation systems. An effective countermeasure is to accurately forecast the traffic risks in a city before accidents actually happen. Current traffic accident prediction solutions largely rely on accurate data collected from infrastructure-based sensors, which are not always available due to various resource constraints or privacy and legal concerns. In this paper, we address this limitation by exploring social sensing, a new sensing paradigm that uses humans as sensors to report the states of the physical world. In particular, we consider two types of publicly available social sensing data sources: social media data (e.g., traffic posts on Twitter) and open city data (e.g., traffic data from the city web portal). We develop RiskCast, an inductive multi-view learning approach that accurately forecasts traffic risk by exploiting social sensing data under a principled co-regularization framework. Evaluation results on a real-world dataset from New York City show that RiskCast significantly outperforms the state-of-the-art baselines in forecasting the traffic risks in a city.
Abstract
We propose a novel formalization of roles in social networks that unifies the most commonly used definitions of role equivalence. As one consequence, we obtain a single, straightforward proof that role equivalences form lattices. Our formalization focuses on the evolution of roles from arbitrary initial conditions and thereby generalizes notions of relative and iterated roles that have been suggested previously. In addition to the unified structure result, this provides a micro-foundation for the emergence of roles. Considering the genesis of roles may explain, and help overcome, the problem that social networks rarely exhibit interesting role equivalences of the traditional kind. Finally, we hint at ways to further generalize the role concept to multivariate networks.
Abstract
How to effectively detect fake news and prevent its diffusion on social media has gained much attention in recent years. However, relatively little focus has been given to exploiting user comments left for posts and the latent sentiments therein in detecting fake news. Inspired by the rich information available in user comments on social media, we investigate whether the latent sentiments hidden in user comments can help distinguish fake news from reliable content. We incorporate users' latent sentiments into an end-to-end deep embedding framework for detecting fake news, named SAME. First, we use multi-modal networks to deal with heterogeneous data modalities. Second, to learn semantically meaningful spaces per data source, we adopt an adversarial mechanism. Third, we define a novel regularization loss to bring embeddings of relevant pairs closer. Our comprehensive validation using two real-world datasets, PolitiFact and GossipCop, demonstrates the effectiveness of SAME in detecting fake news, significantly outperforming state-of-the-art methods.
Abstract
The penetration of social media has had deep and far-reaching consequences in information production and consumption. Widespread use of social media platforms has enabled malicious users and attention seekers to spread rumors and fake news. This trend is particularly evident in various microblogging platforms where news becomes viral in a matter of hours and can lead to mass panic and confusion. One intriguing fact regarding rumors and fake news is that very often rumor stories prompt users to adopt different stances about the rumor posts. Understanding user stances in rumor posts is thus very important to identify the veracity of the underlying content. While rumor veracity and stance detection have been viewed as disjoint tasks, we demonstrate here how jointly learning both of them can be fruitful. In this paper, we propose RumorSleuth, a multi-task deep learning model which can leverage both textual information and user profile information to jointly identify the veracity of a rumor along with users' stances. Tests on two publicly available rumor datasets demonstrate that RumorSleuth outperforms current state-of-the-art models and achieves up to 14% performance gain in rumor veracity classification and around 6% improvement in user stance classification.
Abstract
Understanding how much users disclose personal information in Online Social Networks (OSNs) supports various scenarios such as maintaining social relationships and customer segmentation. Prior studies on self-disclosure have relied on surveys or users' direct social networks. These approaches, however, cannot represent the whole population nor consider user dynamics at the community level. In this paper, we conduct a quantitative study at different granularities of networks (ego networks and user communities) to better understand users' self-disclosing behaviors. As our first contribution, we characterize users into three types (open, closed, and moderate) based on Communication Privacy Management theory and extend the analysis of user self-disclosure to a large-scale OSN dataset that represents the entire network structure. As our second contribution, we show that our proposed features of ego networks and the positional and structural properties of communities significantly affect self-disclosing behavior. Based on these insights, we present a possible relation between users' propensity for self-disclosure and the sociological theory of structural holes, i.e., users at a bridge position can leverage advantages among distinct groups. To the best of our knowledge, our study provides the first attempt to shed light on user self-disclosure using the whole network structure, which paves the way to a better understanding of users' self-disclosing behaviors and their relations with overall network structures.
Abstract
Communications on the popular social networking platform Twitter can be mapped in terms of a hashtag graph, where vertices correspond to hashtags and edges correspond to co-occurrences of hashtags within the same distinct tweet. Furthermore, a vertex in a hashtag graph can be weighted with the number of tweets a hashtag has occurred in, and edges can be weighted with the number of tweets both hashtags have co-occurred in. In this paper, we describe additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. We base our novel predictive additions on the assumption that more popular hashtags have a higher probability of appearing with other hashtags in the future. We then apply these improved methods to 3 sets of Twitter data with the intent of predicting hashtag co-occurrences in the future. Experimental results on real-life data sets consisting of over 3,000,000 combined unique tweets and over 250,000 unique hashtags show the effectiveness of the proposed models and algorithms on weighted hashtag graphs.
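The vertex- and edge-weighting scheme described in this abstract can be illustrated with a minimal sketch. The data and the exact scoring function below are hypothetical, not the paper's: a common-neighbors link predictor is extended so that pairs involving more frequently used hashtags receive higher scores.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus: each tweet is a set of hashtags (hypothetical data).
tweets = [
    {"ai", "ml"}, {"ai", "ml", "nlp"}, {"ml", "nlp"},
    {"ai", "data"}, {"data", "nlp"}, {"ai", "ml"},
    {"data", "viz"},
]

# Vertex weight = number of tweets a hashtag occurs in;
# edge weight = number of tweets a hashtag pair co-occurs in.
vertex_w, edge_w = defaultdict(int), defaultdict(int)
for tags in tweets:
    for h in tags:
        vertex_w[h] += 1
    for u, v in combinations(sorted(tags), 2):
        edge_w[(u, v)] += 1

neighbors = defaultdict(set)
for u, v in edge_w:
    neighbors[u].add(v)
    neighbors[v].add(u)

def weighted_common_neighbors(u, v):
    """Common-neighbors score: each shared hashtag z contributes the
    total multiplicity of the edges u-z and v-z, and the whole score is
    boosted by the popularity (vertex weight) of u and v."""
    score = sum(edge_w[tuple(sorted((u, z)))] + edge_w[tuple(sorted((v, z)))]
                for z in neighbors[u] & neighbors[v])
    return score * (vertex_w[u] + vertex_w[v])

# Rank currently unlinked hashtag pairs by predicted co-occurrence.
pairs = [p for p in combinations(sorted(vertex_w), 2) if p not in edge_w]
ranked = sorted(pairs, key=lambda p: weighted_common_neighbors(*p), reverse=True)
print(ranked[0])  # → ('data', 'ml')
```

The popularity boost encodes the abstract's assumption that frequently used hashtags are more likely to co-occur with others in the future; unweighted common neighbors would ignore both the vertex and edge multiplicities.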
Abstract
This paper presents our ongoing research on studying the actors responsible for misinformation spread and identifying potential victims. Preliminary results show that (i) there is a correlation between a fake news publisher's bias and its credibility and (ii) social network properties help in identifying active fake news spreaders. Moreover, we discuss the most vulnerable victims of fake news and report on our experience in educating seniors about online misinformation.
Abstract
Depression is the most common mental illness in the U.S., with 6.7% of all adults having experienced a major depressive episode. Unfortunately, depression extends to teens and young users as well, and researchers have observed an increasing rate in recent years (from 8.7% in 2005 to 11.3% in 2014 in adolescents, and from 8.8% to 9.6% in young adults), especially among girls and women. People themselves are a barrier to fighting this disease, as they tend to hide their symptoms and do not receive treatment. However, protected by anonymity, they share their sentiments on the Web, looking for help. In this paper, we address the problem of detecting depressed users in online forums. We analyze user behavior in the ReachOut.com online forum, a platform providing a supportive environment for young people to discuss their everyday issues, including depression. We examine the linguistic style of user posts in combination with network-based features modeling how users connect in the forum. Our results show that network features are strong predictors of depressed users and, by combining them with linguistic features of user posts, we can achieve an average precision of 0.78 (vs. 0.47 for a random classifier and 0.71 for linguistic features only) and perform better than related work (F1-measure of 0.63 vs. 0.50).
Abstract
Influence maximization in social networks has been intensively studied in recent years, where the goal is to find a small set of seed nodes in a social network that maximizes the spread of influence according to a diffusion model. Recent research on influence maximization mainly focuses on incorporating either user opinions or competitive settings in the influence diffusion model. In many real-world applications, however, the influence diffusion process often involves both real-valued opinions from users and multiple parties that are competing with each other. In this paper, we study the problem of competitive opinion maximization (COM), where the game of influence diffusion includes multiple competing products and the goal is to maximize the total opinions of users activated by each product. This problem is very challenging because it is #P-hard and does not retain the property of submodularity. We propose a novel model, called ICOM (Iterative Competitive Opinion Maximization), that can effectively and efficiently maximize the total opinions in competitive games by taking user opinions as well as the competitor's strategy into account. Different from existing influence maximization methods, we inhibit the spread of negative opinions and search for the optimal response to opponents' choices of seed nodes. We apply iterative inference based on a greedy algorithm to reduce the computational complexity. Empirical studies on real-world datasets demonstrate that, compared with several baseline methods, our approach can effectively and efficiently improve the total opinions achieved by the promoted product in the competitive network.
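ICOM itself is not reproduced here, but the greedy backbone that such methods build on, simulation-based greedy seed selection under the independent cascade model, can be sketched as follows. The graph, activation probability, and run counts are hypothetical.

```python
import random

def ic_spread(adj, seeds, p=0.1, rng=None):
    """One Monte Carlo run of the independent cascade model: each newly
    activated node gets a single chance to activate each neighbor."""
    rng = rng or random
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_seeds(adj, k, p=0.1, runs=200, seed=0):
    """Classic greedy seed selection: repeatedly add the node with the
    largest marginal gain in expected spread (estimated by simulation)."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(k):
        def gain(v):
            return sum(ic_spread(adj, seeds + [v], p, rng) for _ in range(runs))
        best = max((n for n in adj if n not in seeds), key=gain)
        seeds.append(best)
    return seeds

# Hypothetical toy graph: node 0 is a hub and should be picked first.
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
print(greedy_seeds(adj, 2))
```

The competitive, opinion-valued setting of the paper replaces the simple spread count with the total opinion gained by each product and re-optimizes against the opponents' seed choices, which this sketch deliberately omits.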
Abstract
In asset allocation and time-series forecasting studies, little light has been shed on how different machine learning and deep learning models affect investment returns and optimal asset allocation. To fill this research gap, we develop a robo-advisor that applies several machine learning and deep learning forecasting methodologies and feeds the forecasting results into a portfolio optimization model to support investors' decisions. This research integrates several technologies: machine learning, data analytics, and portfolio optimization. We focus on developing a robo-advisor framework that combines machine learning and deep learning approaches with a portfolio optimization algorithm, using our predicted trends and results in place of historical data and investor views. We eliminate extreme fluctuations to keep trading within an acceptable risk coefficient, thereby minimizing investment risk and reaching a relatively stable return. Comparing different algorithms, we find that the F1 score of the model prediction significantly affects the result of the optimized portfolio. Using the deep learning model with the highest winning rate and leveraging its predictions with the portfolio optimization algorithm, we reach a 12% annual return, outperforming both our benchmark index 0050.TW and the optimized portfolio built from historical data.
Abstract
Security analysts who work in a Security Operations Center (SOC) play a major role in ensuring the security of their organization. The amount of background knowledge they have about evolving and new attacks makes a significant difference in their ability to detect attacks. Open-source threat intelligence sources, such as text descriptions of cyber-attacks, can be stored in a structured fashion in a cybersecurity knowledge graph. Such a knowledge graph can be paramount in aiding a security analyst to detect cyber threats because it stores a vast range of cyber threat information in the form of semantic triples that can be queried. A semantic triple contains two cybersecurity entities with a relationship between them. In this work, we propose a system that creates semantic triples over cybersecurity text, using deep learning approaches to extract possible relationships. We use the set of semantic triples generated by our system to populate a cybersecurity knowledge graph. Security analysts can retrieve this data from the knowledge graph and use it to form a decision about a cyber-attack.
Abstract
Networks provide a powerful representation tool for modeling dyadic interactions among interconnected entities in a complex system. For many applications, such as social network analysis, it is common for the entities to appear in more than one network. Network alignment (NA) is an important first step towards learning the entities' behavior across multiple networks by finding the correspondence between similar nodes in different networks. However, learning the proper alignment matrix in noisy networks is a challenge due to the difficulty in preserving both the neighborhood topology and feature consistency of the aligned nodes. In this paper, we present OPTANE, a robust unsupervised network alignment framework inspired by optimal transport theory. The framework provides a principled way to combine node similarity with topology information to learn the alignment matrix. Experimental results on both synthetic and real-world data attest to the effectiveness of the OPTANE framework compared to other baseline approaches.
Abstract
This paper attempts to provide viral marketers guidance on the investment level that could help capture some desired percentage γ of the market share by some target time t with a desired level of confidence. To do this, we first introduce a diffusion model for social networks. A distance-dependent random graph is then considered as a model for the underlying social network, which we use to analyze the proposed diffusion model. Using the fact that vertex degrees have an almost Poisson distribution in distance-dependent random networks, we then provide a lower bound on the probability that the time it takes for an idea (or a product, disease, etc.) to dominate a pre-specified percentage γ of a social network (denoted R_γ) is smaller than some pre-selected target time t > 0, i.e., we find a lower bound on the probability of the event {R_γ ≤ t}. Simulation results over a wide variety of networks, both random and real-world, verify that our bound indeed holds in practice. The Kullback-Leibler divergence is used to evaluate the performance of our lower bound over these groups of networks and, as expected, the bound does worse for networks that deviate more from the Poisson degree distribution.
Abstract
Open source software (OSS) is a core part of virtually all software applications today. Due to the rapidly growing impact of OSS on society and the economy, its security aspects have attracted researchers' attention. Traditionally, research on OSS security has focused on the technical aspects of software development. We argue that while these aspects are important, security practices that also consider the social aspects of OSS development are needed to ensure that security tools are implemented effectively and efficiently. To address this research gap, in this empirical study we explore the current security culture in OSS development using a survey instrument with six evaluation dimensions: attitude, behavior, competency, subjective norms, governance, and communication. By exploring the current security culture in OSS communities, we can begin to understand how security influences participants' security behaviors and decision-making, and thereby make realistic and practical suggestions. In this paper, we present the measurements of security culture adopted in the study and discuss the corresponding security issues that need to be addressed in OSS communities.
Abstract
This paper presents a high-fidelity agent-based simulation of the spread of methicillin-resistant Staphylococcus aureus (MRSA), a serious hospital-acquired infection, within the dialysis unit at the University of Iowa Hospitals and Clinics (UIHC). The simulation is based on ten days of fine-grained healthcare worker (HCW) movement and interaction data collected from a sensor-mote instrumentation of the dialysis unit by our research group in the fall of 2013. The simulation layers a detailed model of MRSA pathogen transfer, die-off, shedding, and infection on top of agent interactions obtained from data. The specific question this paper focuses on is whether there are simple, inexpensive architectural or process changes one can make in the dialysis unit to reduce the spread of MRSA. We evaluate two architectural changes of the nurses' station: (i) splitting the central nurses' station into two smaller distinct nurses' stations, and (ii) doubling the surface area of the nursing station. The first architectural change is modeled as a graph partitioning problem on an HCW contact network obtained from our HCW movement data. Somewhat counter-intuitively, our results suggest that the first architectural modification and the resulting reduction in HCW-HCW contacts have little to no effect on the spread of MRSA and may in fact lead to an increase in MRSA infection counts. In contrast, the second modification leads to a substantial reduction (between 12% and 22% for simulations with different parameters) in the number of patients infected by MRSA. These results suggest that the dynamics of an environmentally mediated infection such as MRSA may be quite different from those of an infection that can be modeled via classical SIR-type models, which do not consider environmental contamination in disease transmission (e.g., respiratory infections or influenza).
Abstract
In the current era of big data, high volumes of valuable data and information can easily be collected. As rich and constant sources of big data, social networks attract an incredible number of people from different social strata, which makes them desirable for many research topics. In social networks, users (or social entities) are often linked by 'following' relationships. As a social network grows, some famous user accounts may come to be followed by a large number of the same other users. We call such famous users frequently followed groups, which researchers (or businesses) may be interested in investigating. However, discovering these frequently followed groups can be difficult and challenging because the following data in social networks are usually very big but sparse (a huge number of users leads to big 'following' data, but each user typically follows only a small number of other users). In this paper, we therefore present a new compression model that can be used when mining these very big but sparse social networks to discover the frequently followed groups of users/social entities.
Abstract
This paper explores the problems associated with classifying cancer in gene expression data using a deep learning model. Our proposed solution for the cancer classification of RNA-Seq data extracted from the Pan-Cancer Atlas is to transform the 1-dimensional (1D) gene expression values into 2-dimensional (2D) images. Embedding the gene expression values into a 2D image preserves the overall features of the genes and allows a convolutional neural network to compute the features needed for the classification task. When training and testing on the 33 cohorts of cancer types, our convolutional classification model achieved an accuracy of 95.65%. This result is reasonably good compared with existing works that use multiclass label classification. We also examine the genes based on their significance to cancer types through a heat map and associate them with biomarkers. Our convolutional neural network for the classification task fosters deep learning frameworks in cancer genome analysis and leads to a better understanding of complex features in cancer disease.
Abstract
In this study, a procedure is proposed for surveying public opinion from big social media domain-specific textual data to minimize the difficulties associated with modeling public behavior. Strategies for labeling posts relevant to a topic are discussed. A two-part framework is proposed in which semi-automatic labeling is applied to a small subset of posts, referred to as the "seed" in further text. This seed is used as the basis for semi-supervised labeling of the rest of the data. The hypothesis is that the proposed method will achieve better labeling performance than existing classification models when applied to small amounts of labeled data. The seed is labeled using posts of users with a known and consistent view on the topic. A semi-supervised multi-class prediction model labels the remaining data iteratively. In each iteration, it adds context-label pairs to the training set if the softmax-based label probabilities are above a threshold. The proposed method is characterized on four datasets by comparison to three popular text modeling algorithms (n-grams + tf-idf, fastText, VDCNN) for different sizes of labeled seeds (5,000 and 50,000 posts) and for several label-prediction significance thresholds. Our proposed semi-supervised method outperformed the alternative algorithms by capturing additional contexts from the unlabeled data. The accuracy of the algorithm increased by 3-10% when a larger fraction of the data was used as the seed. For the smaller seed, a lower label probability threshold was clearly a better choice, while for larger seeds no predominant threshold was observed. The proposed framework, using the fastText library for efficient text classification and representation learning, achieved the best results for the smaller seed, while VDCNN wrapped in the proposed framework achieved the best results for the bigger seed. The performance was negatively influenced by the number of classes.
Finally, the model was applied to characterize a biased dataset of opinions related to gun control/rights advocacy. The proposed semi-automatic seed labeling was used to label 8,448 Twitter posts of 171 gun control/rights advocates. On this application, our approach performed better than existing models, achieving 96.5% accuracy and an F1 score of 0.68.
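The iterative labeling loop described above (add pool items whose softmax label probability clears a threshold, retrain, repeat) can be sketched with a deliberately tiny stand-in classifier. The data, threshold value, and word-overlap scoring model below are hypothetical, not the paper's.

```python
import math

# Hypothetical seed posts (text, label) and an unlabeled pool.
seed = [("ban assault weapons now", "control"),
        ("protect the second amendment", "rights"),
        ("universal background checks save lives", "control"),
        ("right to bear arms shall not be infringed", "rights")]
pool = ["background checks now",
        "second amendment rights matter",
        "weapons ban saves lives",
        "arms are a right"]

def train(examples):
    """Per-class word counts: a toy stand-in for the paper's text models."""
    counts = {}
    for text, label in examples:
        bag = counts.setdefault(label, {})
        for w in text.split():
            bag[w] = bag.get(w, 0) + 1
    return counts

def predict(counts, text):
    """Return (label, probability) from a softmax over word-overlap scores."""
    scores = {c: sum(bag.get(w, 0) for w in text.split())
              for c, bag in counts.items()}
    z = sum(math.exp(s) for s in scores.values())
    probs = {c: math.exp(s) / z for c, s in scores.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]

def self_train(seed, pool, threshold=0.8, max_iters=10):
    """Each iteration, context-label pairs whose softmax probability
    clears the threshold move from the pool into the training set."""
    labeled, remaining = list(seed), list(pool)
    for _ in range(max_iters):
        counts = train(labeled)
        added = []
        for text in remaining:
            label, prob = predict(counts, text)
            if prob >= threshold:
                added.append((text, label))
        if not added:            # no confident predictions left
            break
        labeled.extend(added)
        added_texts = {text for text, _ in added}
        remaining = [t for t in remaining if t not in added_texts]
    return labeled, remaining

labeled, remaining = self_train(seed, pool)
print(len(labeled), len(remaining))  # → 8 0
```

The threshold plays the role of the label-prediction significance threshold studied in the abstract: lowering it pulls more unlabeled contexts into training per iteration, at the cost of noisier labels.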
Abstract
Quantification is the estimation of class proportions in a dataset, as opposed to classification, the estimation of the class label of individual instances. Quantification methods typically assume that the data being quantified has the same class-conditional feature distribution as the data on which the quantifier was trained. In reality this may not be the case, particularly when the dataset being quantified has been compiled as the result of a selection process. In this paper we address the quantification problem when there is class-conditional dataset shift between the training data and the test data. Domain adaptation methods from classification are combined with the Adjusted Count quantification method to form two new methods: AC-FR and AC-IW. In the AC-FR method, Marginalized Stacked Denoising Autoencoders are used to generate a new feature representation that effectively reduces the distributional difference between the training and test datasets. In the AC-IW method, the training data is weighted according to its closeness to the test data. The AC-IW method outperforms a current state-of-the-art quantification method at foreseeable levels of class-conditional dataset shift.
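The Adjusted Count method that both new methods extend is the classic prevalence correction: take the classifier's raw positive-prediction rate on the test set and invert it through the classifier's estimated true- and false-positive rates. A minimal sketch (the numbers are hypothetical):

```python
def adjusted_count(pred_positive_rate, tpr, fpr):
    """Adjusted Count quantification: correct the raw positive-prediction
    rate using the classifier's true/false positive rates (estimated,
    e.g., by cross-validation on the training data), then clip to [0, 1]."""
    if tpr == fpr:
        raise ValueError("tpr == fpr: classifier is uninformative")
    p = (pred_positive_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))

# A classifier with tpr=0.9 and fpr=0.2 labels 55% of a test set positive;
# the adjusted prevalence estimate is (0.55 - 0.2) / (0.9 - 0.2) = 0.5.
print(round(adjusted_count(0.55, 0.9, 0.2), 6))  # → 0.5
```

The correction is only valid when the tpr/fpr estimated at training time still hold on the test data, which is exactly the assumption that breaks under class-conditional dataset shift and that AC-FR and AC-IW are designed to repair.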
Abstract
The number of posts made by a single user account on the social media platform Twitter in any given time interval is usually low. However, there is a subset of users whose volume of posts is much higher than the median. In this paper, we investigate the content diversity and the social neighborhood of these extreme users and others. We define a metric called "interest narrowness" and identify that a subset of extreme users, termed anomalous users, post with very low topic diversity, including posts with no text content. We show that anomalous groups have the strongest within-group interactions compared to their interaction with others, and exhibit different information-sharing behaviors with other anomalous users compared to non-anomalous extreme tweeters.
Abstract
While online communities are important platforms for various social activities, many online communities fail to survive, which motivates researchers to investigate factors affecting the growth and survival of online communities. We comprehensively examine the effects of a wide variety of social network features on the growth and survival of communities in Reddit. We show that several social network features, including clique ratio, density, clustering coefficient, reciprocity, and centralization, have significant effects on the survival of communities. In contrast, we also show that the social network features examined in this paper have only weak effects on the growth of communities. Moreover, we conduct experiments predicting the future growth and survival of online communities from social network features. The results show that social network features are useful for predicting the survival of communities but not for predicting their growth.
Abstract
We study information diffusion modeled by epidemic models on a class of growing preferential attachment networks. We show through a thorough simulation study that there is a fundamental difference in the nature of the epidemic process on growing temporal networks in comparison to the same process on static networks. The empirical distribution of the epidemic lifetime on growing networks has a considerably heavier, and possibly infinite, tail. Furthermore, the notion of the epidemic threshold has only minor significance in this context, since network growth reduces the critical value of the corresponding static graph.
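A toy version of this simulation setup, not the authors' code, can illustrate the contrast: an SIS epidemic run on a preferential-attachment network that either keeps growing during the outbreak or is frozen at a fixed size. All parameter values below are hypothetical.

```python
import random

def epidemic_lifetime(beta=0.1, mu=0.3, m=2, steps=500, grow=True, seed=0):
    """SIS epidemic on a preferential-attachment network that either keeps
    growing while the epidemic unfolds (grow=True) or is frozen at a fixed
    size (grow=False). Returns the number of steps until the infection
    dies out, or `steps` if it survives the whole run."""
    rng = random.Random(seed)
    # Initial clique of m+1 nodes; `ends` lists every edge endpoint, so a
    # uniform draw from it is a degree-proportional (preferential) draw.
    adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
    ends = [i for i in adj for _ in adj[i]]

    def add_node():
        new = len(adj)
        targets = set()
        while len(targets) < m:             # m distinct preferential targets
            targets.add(rng.choice(ends))
        adj[new] = targets
        for t in targets:
            adj[t].add(new)
            ends.extend([new, t])

    if not grow:                            # pre-grow a static network instead
        for _ in range(100):
            add_node()

    infected = {0}
    for t in range(1, steps + 1):
        if grow:
            add_node()                      # network grows during the outbreak
        nxt = set()
        for u in infected:
            if rng.random() > mu:           # u fails to recover this step
                nxt.add(u)
            for v in adj[u]:                # u tries to infect each neighbor
                if v not in infected and rng.random() < beta:
                    nxt.add(v)
        infected = nxt
        if not infected:
            return t
    return steps

print(epidemic_lifetime(grow=True), epidemic_lifetime(grow=False))
```

Repeating such runs over many random seeds and plotting the empirical lifetime distribution is the kind of experiment that exposes the heavier tail on growing networks described in the abstract.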
Abstract
In social networks, edges often form closed triangles, or triads. Standard approaches to measuring triadic closure, however, fail for multi-edge networks because they do not consider that triads can be formed by edges of different multiplicity. We propose a novel measure of triadic closure for multi-edge networks based on a shared-partner statistic and demonstrate that this measure can detect meaningful closure in synthetic and empirical multi-edge networks where conventional approaches fail. This work is a cornerstone in driving inferential network analysis from binary networks towards multi-edge and weighted networks, which offer a more realistic representation of social interactions and relations.
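One plausible instantiation of a multiplicity-aware shared-partner statistic, given here purely as an illustration and not as the authors' exact measure, credits a dyad's closure by the minimum edge multiplicity along each two-path that closes it:

```python
from collections import Counter

# A multi-edge network as an edge list with repeats (hypothetical data):
# repeated entries are repeated interactions between the same pair.
edges = [("a", "b"), ("a", "b"), ("a", "b"),
         ("a", "c"), ("a", "c"),
         ("b", "c"),
         ("c", "d")]

mult = Counter(tuple(sorted(e)) for e in edges)   # edge multiplicities
neigh = {}
for (u, v), w in mult.items():
    neigh.setdefault(u, {})[v] = w
    neigh.setdefault(v, {})[u] = w

def shared_partner_stat(u, v):
    """Multiplicity-aware shared partners: each partner z closing the
    dyad (u, v) contributes the minimum multiplicity of its two legs,
    so triads closed by repeated interactions count more than triads
    closed by single edges."""
    shared = (neigh[u].keys() & neigh[v].keys()) - {u, v}
    return sum(min(neigh[u][z], neigh[v][z]) for z in shared)

stat = {d: shared_partner_stat(*d) for d in mult}
print(stat)
```

A binary shared-partner count would score the dyads (a, b) and (b, c) identically (one shared partner each); the multiplicity-aware version distinguishes them because the two-path closing (b, c) is built from repeated interactions.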
Abstract
ISIS and similar extremist communities are increasingly using forums in the darknet to connect with each other and spread news and propaganda. In this paper, we attempt to understand their network in an online forum by using descriptive statistics, an exponential random graph model (ERGM) and Topic Modeling. Our analysis shows how the cohesion between active members forms and grows over time and under certain thread topics. We find that the top attendants of the forum have high centrality measures and other attributes of influencers. |
Abstract
This paper examines quantity and quality superposter value creation within Coursera Massive Open Online Courses (MOOC) forums using a social network analysis (SNA) approach. The value of quantity superposters (i.e. students who post significantly more often than the majority of students) and quality superposters (i.e. students who receive significantly more upvotes than the majority of students) is assessed using Stochastic Actor-Oriented Modeling (SAOM) and network centrality calculations. Overall, quantity and quality superposting was found to have a significant effect on tie formation within the discussion networks. In addition, quantity and quality superposters were found to have higher-than-average information brokerage capital within their networks. |
Abstract
Geotagging on social media has become an important proxy for understanding people's mobility and social events. Research that uses geotags to infer public opinion relies on several key assumptions about the behavior of geotagged and non-geotagged users. However, these assumptions have not been fully validated, and this lack of understanding of geotagging behavior limits its further use. In this paper, we present an empirical study of geotagging behavior on Twitter based on more than 40 billion tweets collected from 20 million users. There are three main findings that may challenge these common assumptions. First, different groups of users have different geotagging preferences. For example, fewer than 3% of users tweeting in Korean are geotagged, while more than 40% of users tweeting in Indonesian use geotags. Second, users who report their locations in their profiles are more likely to use geotags, which may affect the generalizability of location prediction systems to non-geotagged users. Third, a strong homophily effect exists in users' geotagging behavior: users tend to connect to friends with similar geotagging preferences. |

Abstract
Twitter bots have evolved from easily detectable, unsophisticated looking content spammers and intrusive identities to deceptive key players embedded in deep levels of the social networks, silently promoting affiliate campaigns, marketing premium versions of online products, and orchestrating coordinated political movements. Recently, multiple works on social bots on Twitter have discussed this paradigm shift, moving from building highly accurate machine learning classifiers to identifying individual bots towards focusing on the operations and existence of bots in a collective manner. In this work, we study two different families of Twitter bots which have been studied before for showing spamming activities through advertisement and political campaigns and perform an evolutionary comparison with the new waves of bots recently identified. We uncover various evolved tendencies of the new wave of social bots under social, communication, and behavioral patterns. Results show that those bots demonstrate evolved core-periphery structure, deeply embedded and robust communication networks, complex information diffusion patterns, heterogeneous content authoring patterns, mobilization of leaders across communication roles, and presence of niche topic communities, which have made them highly deceptive as well as more effective in their operations than their traditional counterparts. Finally, we conclude our work by discussing possible applications of the discovered behavioral and social traits of the evolved bots to build highly robust and effective bot detection systems. |
Abstract
In this paper, we present a framework for predicting personality traits by analyzing tweets written in Turkish. The prediction model is constructed with a clustering-based approach. Since the model is based on linguistic features, it is language specific. The prediction model uses features applicable to the Turkish language and related to the writing style of Turkish Twitter users. Our approach uses anonymous BIG5 questionnaire scores of volunteer participants as the ground truth in order to generate a personality model from Twitter posts. Experimental results show that the constructed model can predict personality traits of Turkish Twitter users with relatively small errors. |
Abstract
The focus of point-of-interest recommendation techniques is to suggest a venue to a given user that would match the users' interests and is likely to be adopted by the user. Given the multitude of venues and the sparsity of user check-ins, the problem of recommending venues has shown to be a difficult task. Existing literature has already explored various types of features such as geographical distribution, social structure and temporal behavioral patterns to make a recommendation. In this paper, we propose a new set of features derived based on the neural embeddings of venues and users. We show how the neural embeddings for users and venues can be jointly learnt based on the prior check-in sequence of users and then be used to define three types of features, namely user, venue, and user-venue interaction features. These features are integrated into a feature-based matrix factorization model. Our experiments show that the features defined over the user and venue embeddings are effective for venue recommendation. |
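The three feature types can be illustrated once embeddings are in hand: user features are the user vector, venue features the venue vector, and interaction features combine the two. The vectors below are invented placeholders, whereas the paper learns them jointly from users' prior check-in sequences:

```python
from math import sqrt

# Hypothetical pre-trained embeddings; in the paper these are learnt
# jointly from users' check-in sequences.
user_vec  = {"u1": [0.2, 0.5, 0.1]}
venue_vec = {"v1": [0.4, 0.4, 0.0]}

def interaction_features(u, v):
    """User-venue interaction features: elementwise product of the two
    embeddings plus their cosine similarity."""
    a, b = user_vec[u], venue_vec[v]
    prod = [x * y for x, y in zip(a, b)]
    cos = sum(prod) / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))
    return prod + [cos]
```

Features like these can then be fed into a feature-based matrix factorization model as the abstract describes.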
Abstract
Social media has grown to be the place for voicing one's opinions, sharing information, and shaping discourse. Individuals use social media as a platform to mobilize, coordinate, and conduct cyber campaigns ranging from raising awareness of diseases or disorders to deviant acts threatening democratic principles and institutions. The blogosphere has continued to rise and affords an effective medium for content framing. With no restriction on the number of characters, many use blogs to set narratives and then use other social media channels, like Twitter and Facebook, to steer their audience to their blogs. Blog content is unstructured and harder to collect than content from other social media channels. Blog monitoring and analysis could be of great use to sociologists, political scientists, communication researchers, journalists, and information scientists for examining events. Toward this direction, we present the Blogtrackers tool, which is designed to explore the blogosphere and gain insights into various events. Blogtrackers can help identify leading information actors, influential bloggers, and popular and emerging trends; assess tones, sentiments, and opinions; extract entities; and analyze their networks. |
Abstract
YouTube is the second most popular website in the world. Over 300 hours' worth of videos are uploaded every single minute and 5 billion videos are watched every day - almost one video per person worldwide. Because videos can deliver a complex message in a way that captures the audience's attention more effectively than text-based platforms, YouTube has become one of the most relevant platforms in the age of digital mass communication. This makes the analysis of YouTube content and user behavior invaluable not only to information scientists but also to communication researchers, journalists, sociologists, and many more. A number of YouTube analysis tools exist, but none of them provide in-depth qualitative and quantitative insights into user behavior or networks. To that end, we introduce YouTubeTracker - a tool designed to gather YouTube data and gain insights on content and users. It can help identify leading actors, networks and spheres of influence, emerging popular trends, as well as user opinion. The analysis of user engagement and social networks can even reveal suspicious and inorganic behaviors (e.g., trolling, botting) causing algorithmic manipulations. |
Abstract
In recent times we have seen the ideologies of the two dominant political parties in the U.S. growing further and further apart. Simultaneously, we have entered the age of big data, raising enormous interest in computational approaches to problems in many domains, such as political elections. However, an overlooked problem lies in predicting what happens once our elected officials take office; more specifically, predicting congressional votes, which are perhaps the most influential decisions being made in the U.S. This, nevertheless, is far from a trivial task, since the congressional system is highly complex and heavily influenced by both ideological and social factors. Thus, dedicated efforts are required to first effectively identify and represent these factors, and then capture the interactions between them. To this end, we propose a robust end-to-end framework, Multi-Factor Congressional Vote Prediction (MFCVP), that defines and encodes features from indicative ideological factors while also extracting novel social features. This allows for a principled, expressive representation of the complex system, which ultimately leads to MFCVP making accurate vote predictions. Experimental results on a dataset from the U.S. House of Representatives show the superiority of MFCVP over several representative approaches when predicting votes for individual representatives as well as the overall outcome of the bill voted on. Finally, we perform a factor analysis to understand the effectiveness of and interplay between the different factors. |
Abstract
Because most language production is outside conscious control, it provides a channel that is of considerable interest in intelligence contexts. Analytics applied to natural language operates in two modes. The first might be called reverse engineering: it tries to infer properties of speakers and writers such as their attitudes, personality, affective state (moods and emotions), and mental health. The second looks at how language is being leveraged, both consciously and unconsciously, for purposes such as influence, propaganda, and abuse. There is some overlap between these two modes. For example, understanding the language production mechanisms of abusive language helps us detect and block it, but also understand the mental state of those who produce it, and their tactics for using it. |
Abstract
Climbing the career ladder to a senior executive position is a long and complex process that, nevertheless, many people try to master. Over the last decades, the number of people providing their CVs on professional online social networks, such as LinkedIn, has been growing. New methods of pattern detection raise the question of whether online CVs provide insights into career patterns and paths. The respective hypothesis is that online CVs map people's careers and therefore constitute the ideal data set for detecting career patterns. To test this hypothesis, 100,006 online CVs were downloaded and preprocessed. This paper presents initial results for one educational and one internship variable. Whereas a higher degree positively predicts career level, having done an internship relates negatively to career level. These results reveal that, rather than objectively mirroring people's career trajectories, online career platforms provide selective information. The information in online CVs and the respective career level are intermingled, i.e. people with a high career level present different parts of their careers than people at lower levels. Furthermore, self-presentational effects might have an impact. Effects on similar research and possible implications are discussed. |
Abstract
TBA |
Abstract
The structure of the urban setting determines crime patterns. This research explores street profile analysis, a new method for analyzing crime in relation to street networks. Street profile analysis can be used to identify crime surges or heavy concentrations of crime along roadways. In this study, the street profile technique is combined with a discrete calculus approach to locate the boundaries of small criminal spaces in the City of Vancouver, British Columbia, Canada. This experimental technique utilizes open-source property crime data from the Vancouver Police Department to analyze crime patterns within Vancouver. This computational crime analysis technique is described in detail and its utility is explored. The new technique is a valuable tool for the intelligence and security informatics communities. |
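The discrete-calculus step can be illustrated with first differences of crime counts along consecutive street segments; large jumps in the derivative mark the boundaries of a concentrated criminal space. The counts and threshold below are invented for illustration, not taken from the Vancouver data:

```python
# Property-crime counts per consecutive segment along one roadway (toy data).
counts = [2, 3, 2, 14, 15, 13, 12, 3, 2]

# Discrete first derivative of the street profile.
diffs = [b - a for a, b in zip(counts, counts[1:])]

# Boundaries of the hot zone: where |derivative| exceeds an assumed cutoff.
threshold = 8
boundaries = [i for i, d in enumerate(diffs) if abs(d) >= threshold]
```

Here the hot zone is delimited by the sharp rise after segment 2 and the sharp fall after segment 6.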
Abstract
In most cases, feature sets available to machine learning algorithms require a feature engineering approach to pick the subset that gives optimal performance. During our link prediction research, we observed the same challenge for features of Location Based Social Networks (LBSNs). We applied multiple reduction approaches to avoid performance issues caused by redundancy and relevance interactions between features. One of these was a custom two-step method: it starts by clustering features based on a proposed interaction-related similarity measure and ends by non-monotonically selecting an optimal feature subset from those clusters. In this study, we applied well-known generic feature reduction algorithms together with our custom method for LBSNs to evaluate novelty and verify the contributions. Results from multiple data groups show that our custom feature reduction approach yields larger and more stable effectiveness gains for link prediction than the others. |
Abstract
Online social media platforms have made the world more connected than ever before, thereby making it easier for everyone to spread their content across a wide variety of audiences. Twitter is one such popular platform where people publish tweets to spread their messages to everyone. Twitter allows users to Retweet other users' tweets in order to broadcast it to their network. The more retweets a particular tweet gets, the faster it spreads. This creates incentives for people to obtain artificial growth in the reach of their tweets by using certain blackmarket services to gain inorganic appraisals for their content. In this paper, we attempt to detect such tweets that have been posted on these blackmarket services in order to gain artificially boosted retweets. We use a multitask learning framework to leverage soft parameter sharing between a classification and a regression based task on separate inputs. This allows us to effectively detect tweets that have been posted to these blackmarket services, achieving an F1-score of 0.89 when classifying tweets as blackmarket or genuine. |
Abstract
Bots are often identified on social media due to their behavior. How easily are they identified, however, when they are dormant and exhibit no measurable behavior at all, except for their silence? We identified “dormant bot networks” positioned to influence social media discourse surrounding the 2018 U.S. senate election. A dormant bot is a social media persona that does not post content yet has large follower and friend relationships with other users. These relationships may be used to manipulate online narratives and elevate or suppress certain discussions in the social media feed of users. Using a simple structure-based approach, we identify a large number of dormant bots created in 2017 that begin following the social media accounts of numerous US government politicians running for re-election in 2018. Findings from this research were used by the U.S. Government to suspend dormant bots prior to the elections to prevent any malign influence campaign. Application of this approach by social media providers may provide a novel method to reduce the risk of content manipulation for online platforms. |
Abstract
When a group of people strives to understand new information, struggle ensues as various ideas compete for attention. Steep learning curves are surmounted as teams learn together. To understand how these team dynamics play out in software development, we explore Git logs, which provide a complete change history of software repositories. In these repositories, we observe code additions, which represent successfully implemented ideas, and code deletions, which represent ideas that have failed or been superseded. By examining the patterns between these commit types, we can begin to understand how teams adopt new information. We specifically study what happens after a software library is adopted by a project, i.e., when a library is used for the first time in the project. We find that a variety of factors, including team size, library popularity, and prevalence on Stack Overflow, are associated with how quickly teams learn and successfully adopt new software libraries. |
Abstract
Online discussions assemble people to talk about various types of topics and to share information. People progressively develop affinity, growing closer the more frequently they mention one another in messages and send positive messages to one another. We propose an algorithm, named HAR-search, for discovering hidden affinity relationships between individuals. Based on Markov chain models, we derive the affinity scores amongst individuals in an online community. We show that our method allows us to track the evolution of affinity over time and to predict affinity relationships arising from the influence of certain community members. Comparison with the state-of-the-art method shows that our method results in robust discovery while accounting for minute details. |
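The Markov-chain ingredient can be sketched as follows (message counts invented; the actual HAR-search scoring is more involved): pairwise message counts are row-normalised into a transition matrix, and multi-step matrix products give indirect interaction probabilities that can serve as affinity scores.

```python
# Message counts between members of an online community (toy data).
msgs = {"a": {"b": 8, "c": 2}, "b": {"a": 5, "c": 5}, "c": {"a": 1, "b": 9}}
people = sorted(msgs)

# Row-normalise counts into a Markov transition matrix P.
P = [[msgs[i].get(j, 0) / sum(msgs[i].values()) for j in people] for i in people]

def step(A, B):
    """Matrix product: composes interaction probabilities over extra steps."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B))] for i in range(len(A))]

# Two-step probabilities capture indirect affinity via intermediaries.
P2 = step(P, P)
```

Tracking how these scores change as new messages arrive gives the temporal evolution of affinity the abstract mentions.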
Abstract
We investigate via simulation the conditions under which a new social convention promoted by a persistent minority can spread in a network polarized into two communities. Our experiments show how the fractional size of the persistent minority and its network location affect the network's tipping point. Specifically, we discover that i) previous research on the size of the critical mass needed for widespread adoption does not apply in the presence of polarized communities; ii) spread is more affected by the position of the persistent minority than by its size; and iii) topological properties of the nodes in the persistent minority, such as leverage centrality and degree, are relevant for the diffusion of the new social convention. |
Abstract
We use the Twitter streaming API for many purposes, such as monitoring brands and discovering events. Because the Twitter Streaming API only allows tracking words (commonly called "search terms"), the data collection goal needs to be formulated in terms of search terms. Twitter limits the number of search terms that can be tracked through the API, and the number of tweets retrieved per search term depends on the terms being tracked. It is therefore crucial to use a small set of highly relevant terms for tracking. Because social media is very dynamic and conversations evolve fast, the search terms that are relevant now might be less useful in as little as an hour. Manual monitoring of such discussions to update the search terms is cumbersome, error-prone and expensive. Can we have an algorithm that updates the search terms based on the goals of the dataset collection? Taking inspiration from the knapsack bandits problem, which effectively handles exploration (new search terms to explore) and exploitation (keep using useful search terms) when resources (network bandwidth, disk capacity or number of search terms) are constrained, we propose a new approach to dynamically update the search terms based on the goals of the data collection. |
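One simple instance of the exploration/exploitation trade-off described above (a sketch, not the authors' algorithm) is an epsilon-greedy update over a budget of k tracked terms; the terms and reward estimates below are invented:

```python
import random

random.seed(7)

# Running estimate of relevant-tweets-per-term, with a budget of k tracked terms.
reward = {"flood": 9.0, "rescue": 7.5, "cat": 0.2, "stormsurge": 0.0}
k, eps = 2, 0.25

def pick_terms(reward, k, eps):
    """epsilon-greedy: mostly exploit the current best terms,
    occasionally give one slot to a randomly chosen other term."""
    ranked = sorted(reward, key=reward.get, reverse=True)
    chosen = ranked[:k]
    if ranked[k:] and random.random() < eps:
        chosen[-1] = random.choice(ranked[k:])   # explore a new term
    return chosen
```

The reward estimates would be updated after each collection window, so terms whose relevance decays are gradually dropped from the budget.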
Abstract
In this paper we examine the effectiveness of a range of pre-trained language representations in determining the informativeness and information type of social media in a disaster scenario. Within the context of disaster tweet analysis, we aim to accurately analyse tweets while minimising both false positives and false negatives in the automated information analysis. We perform our investigation across a number of well-known disaster-related Twitter datasets. We examine performance using models that are built from pre-trained word embeddings from Word2Vec, GloVe, ELMo and BERT. Given the relative ubiquity of BERT as a standout language representation in recent times, we expected BERT to dominate our results. However, we found the results were more diverse, with classical Word2Vec and GloVe both displaying strong results. As part of the analysis we discuss some challenges related to automated Twitter analysis, including the fine-tuning of language models to disaster-related scenarios. |
Abstract
Community detection is a discovery tool used by network scientists to analyze the structure of real-world networks. It seeks to identify natural divisions that may exist in the input networks that partition the vertices into coherent modules (or communities). While this problem space is rich with efficient algorithms and software, most of this literature caters to the static use-case where the underlying network does not change. However, many emerging real-world use-cases give rise to a need to incorporate dynamic graphs as inputs. In this paper, we present a fast and efficient incremental approach toward dynamic community detection. The key contribution is a generic technique called ∆-screening, which examines the most recent batch of changes made to an input graph and selects a subset of vertices to reevaluate for potential community (re)assignment. This technique can be incorporated into any of the community detection methods that use modularity as its objective function for clustering. For demonstration purposes, we incorporated the technique into two well-known community detection tools. Our experiments demonstrate that our new incremental approach is able to generate performance speedups without compromising on the output quality (despite its heuristic nature). For instance, on a real-world network with 63M temporal edges (over 12 time steps), our approach was able to complete in 1056 seconds, yielding a 3× speedup over a baseline implementation. In addition to demonstrating the performance benefits, we also show how to use our approach to delineate appropriate intervals of temporal resolutions at which to analyze an input network. |
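The core of ∆-screening, as described, is choosing which vertices to reevaluate after a batch of changes. A minimal sketch of that selection step (toy undirected graph, one batch of edge insertions; the real technique applies further modularity-based filtering):

```python
# Current adjacency and a batch of edge insertions (toy data).
adj = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}, "e": {"f"}, "f": {"e"}}
batch = [("b", "c")]

def delta_screen(adj, batch):
    """Select the endpoints of changed edges plus their one-hop
    neighbourhoods for community (re)assignment; every other vertex
    keeps its current community label untouched."""
    affected = set()
    for u, v in batch:
        affected |= {u, v} | adj[u] | adj[v]
    return affected
```

In this example vertices e and f are never reevaluated, which is where the incremental speedup comes from.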
Abstract
Influence maximization (IM) has been extensively studied for better viral marketing. However, previous works put less emphasis on how evenly the audience is affected across different communities and how diversely the seed nodes are selected. In this paper, we incorporate audience diversity and seed diversity into the IM task. From the model perspective, in order to characterize both influence spread and diversity in our objective function, we adopt three commonly used utilities in economics (i.e., Perfect Substitutes, Perfect Complements and Cobb-Douglas). We validate our choices of these three functions by showing their nice properties. From the algorithmic perspective, we present various approximation strategies to maximize the utilities. In audience diversification, we propose a solution-dependent approximation algorithm to circumvent the hardness results. In seed diversification, we prove a (1/e - ε) approximation ratio based on non-monotonic submodular maximization. Experimental results show that our framework outperforms other natural heuristics both in utility maximization and result diversification. |
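The three economics utilities named above have standard closed forms; a sketch, with influence spread s and diversity d assumed normalised to [0, 1] and an assumed weight a:

```python
def substitutes(s, d, a=0.5):
    """Perfect Substitutes: a linear trade-off a*s + (1-a)*d."""
    return a * s + (1 - a) * d

def complements(s, d):
    """Perfect Complements: min(s, d) - utility is capped by the
    weaker of spread and diversity."""
    return min(s, d)

def cobb_douglas(s, d, a=0.5):
    """Cobb-Douglas: s**a * d**(1-a) - zero utility if either is zero."""
    return s ** a * d ** (1 - a)
```

The choice between them encodes how strictly the marketer insists on diversity: Perfect Complements is the strictest, Perfect Substitutes the most lenient.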
Abstract
University students routinely use the tools provided by online course ranking forums to share and discuss their satisfaction with the quality of instruction and content in a wide variety of courses. Student perception of the efficacy of pedagogies employed in a course is a reflection of a multitude of decisions by professors, instructional designers and university administrators. This complexity has motivated a large body of research on the utility, reliability and behavioral correlates of course rankings. There is, however, little investigation of the (potential) implicit student bias on these forums towards desirable course outcomes at the institution level. To that end, we examine the connection between course outcomes (student-reported GPA) and the overall ranking of the primary course instructor, as well as rating disparity by nature of course outcomes, based on data from two popular academic rating forums. Our experiments with ranking data about over ten thousand courses taught at Virginia Tech and its 25 SCHEV-approved peer institutions indicate that there is a discernible albeit complex bias towards course outcomes in the professor ratings registered by students. |
Abstract
Some of the most effective influential spreader detection algorithms are unstable to small perturbations of the network structure. Inspired by bagging in Machine Learning, we propose the first Perturb and Combine (P&C) procedure for networks. It (1) creates many perturbed versions of a given graph, (2) applies a node scoring function separately to each graph, and (3) combines the results. Experiments conducted on real-world networks of various sizes with the k-core, generalized k-core, and PageRank algorithms reveal that P&C brings substantial improvements. Moreover, this performance boost can be obtained at almost no extra cost through parallelization. Finally, a bias-variance analysis suggests that P&C works mainly by reducing bias, and that therefore, it should be capable of improving the performance of all vertex scoring functions, including stable ones. |
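The three-step P&C procedure is easy to prototype. The sketch below uses a plain power-iteration PageRank as the node scoring function and random edge dropping as the perturbation (toy graph, parameters invented; the dangling-node mass is simply dropped for brevity):

```python
import random

random.seed(0)

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "a"), ("b", "d")]
nodes = sorted({v for e in edges for v in e})

def pagerank(edge_list, d=0.85, iters=50):
    """Plain power iteration over a directed edge list."""
    out = {v: [w for (u, w) in edge_list if u == v] for v in nodes}
    pr = dict.fromkeys(nodes, 1 / len(nodes))
    for _ in range(iters):
        nxt = dict.fromkeys(nodes, (1 - d) / len(nodes))
        for u in nodes:
            for w in out[u]:
                nxt[w] += d * pr[u] / len(out[u])
        pr = nxt
    return pr

def perturb_and_combine(edge_list, n_graphs=20, keep=0.85):
    """(1) perturb: drop each edge with prob. 1-keep; (2) score each
    perturbed graph; (3) combine: average scores across all versions."""
    combined = dict.fromkeys(nodes, 0.0)
    for _ in range(n_graphs):
        sample = [e for e in edge_list if random.random() < keep]
        for v, s in pagerank(sample).items():
            combined[v] += s / n_graphs
    return combined
```

Because each perturbed graph is scored independently, step (2) parallelizes trivially, which is the "almost no extra cost" observation in the abstract.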
Abstract
Today, with the emergence of various business review sites such as Yelp, Trip Advisor, and Zomato, people can write reviews and provide an assessment (often as a 1-5 star rating). The success of a business on a crowd-sourced review platform takes the form of positive reviews and high star ratings (failures are associated with negative reviews and low star ratings). It is often claimed that location plays a major role in determining the success or failure of a given business. This paper attempts to verify this claim and quantifies the impact of location, solely, on business success, using two data sets: a Yelp dataset for business information and reviews, and a Location dataset that gathers location-based information in a city or an area. We perform an empirical study to quantify the impact of (i) relative location to well-known landmarks and (ii) parameterized location (such as cost of living in a given zip code) on the success of restaurants. In our study, we found that parameterized location, using location-characteristic parameters such as housing affordability, correlates highly with restaurant success, with a correlation ratio above 0.81. We also observe that the closer a restaurant is to a landmark (relative location), the more likely it is to succeed. |
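The reported correlation is a standard Pearson coefficient; a minimal sketch on invented affordability/rating pairs (not the paper's Yelp data):

```python
from math import sqrt

# Toy pairs: housing-affordability index vs. average restaurant rating.
afford = [0.9, 0.7, 0.8, 0.3, 0.4, 0.6]
rating = [4.5, 4.0, 4.2, 2.9, 3.1, 3.8]

def pearson(xs, ys):
    """Pearson correlation: covariance over the product of std deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```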
Abstract
Understanding the spread of false information in social networks has gained a lot of attention recently. In this paper, we explore the community structures in determining how people get exposed to fake news. We are inspired by the idea of Computational Trust in social networks to propose a novel Community Health Assessment model. We define the concepts of neighbor, boundary and core nodes of a community and propose appropriate metrics to quantify the vulnerability of nodes (individual-level) and communities (group-level) to spreading fake news. |
Abstract
In recent years, substantial effort has been devoted to learning representations of static graphs and their substructures. A few studies have explored utilizing temporal information available in a dynamic setting in order to address node representation learning. However, the representation learning problem for the entire graph in a dynamic context is yet to be addressed. In this paper, we propose an unsupervised encoder-decoder framework that projects a dynamic graph at each time step into a d-dimensional space, taking into account both the graph's topology and dynamics. We investigate two different strategies. First, we address the representation learning problem by auto-encoding the graph dynamics. Second, we formulate a graph prediction problem and enforce the encoder to learn the representation that an autoregressive decoder then uses to predict the future of a dynamic graph. Gated graph neural networks (GGNNs) are incorporated to learn the topology of the graph at each time step, and long short-term memory networks (LSTMs) are leveraged to propagate the temporal information among the nodes through time. We demonstrate the efficacy of our approach on a graph classification task using two real-world datasets of animal behaviour and brain networks. |
Abstract
Community detection is an important task in social network analysis, allowing us to identify and understand the communities within the social structures provided by the network. However, many community detection approaches either fail to assign low-degree (or lowly connected) users to communities, or assign them to trivially small communities that prevent them from being included in analysis. In this work we investigate how excluding these users can bias analysis results. We then introduce an approach that is more inclusive for lowly connected users by incorporating them into larger groups. Experiments show that our approach outperforms the existing state-of-the-art in terms of F1 and Jaccard similarity scores while reducing the bias towards low-degree users. |
Abstract
Social media platforms have become a common venue for sharing experiences and knowledge about health-related topics. This research focuses on examining social-media-based communication patterns related to diabetes on the Twitter platform. Specifically, we apply an updated methodology to examine changes in the current use of hashtags, trending hashtags, and the frequency of diabetes-related tweets, using a previous study as a baseline. Our results show significant growth in the diabetes community on Twitter over time, and also evidence that this community is increasing in its capacity to spread awareness of diabetes-related health topics. Our methodological contributions include an improved framework for collecting, cleaning and analyzing Twitter data related to diabetes, as well as the application of regular expressions to categorize subsets of tweets. |
Abstract
Fake news and rumors have recently become a major problem in social networks. Due to the fast information propagation in social networks, it is inefficient to use human labor to detect suspicious news. Automatic rumor detection is thus necessary to prevent the devastating effects of rumors on individuals and society. Previous work has shown that in addition to the content of the news/posts and their contexts (i.e., replies), the relations or connections among those components are important to boost the rumor detection performance. In order to induce such relations between posts and contexts, prior work has mainly relied on the inherent structures of the social networks (e.g., direct replies), ignoring the potential semantic connections between those objects. In this work, we demonstrate that such semantic relations are also helpful, as they can reveal the implicit structures needed to better capture the patterns in the contexts for rumor detection. We propose to employ the self-attention mechanism in neural text modeling to achieve the semantic structure induction for this problem. In addition, we introduce a novel method to preserve the important information of the main news/posts in the final representations of the entire threads to further improve the performance for rumor detection. Our method matches the main post representations and the thread representations by ensuring that they predict the same latent labels in a multi-task learning framework. Extensive experiments demonstrate the effectiveness of the proposed model for rumor detection, yielding state-of-the-art performance on recent datasets for this problem. |
Abstract
Cybersecurity event detection is a crucial problem for mitigating effects on various aspects of society. Social media has become a notable source of indicators for the detection of diverse events. Though previous social media based strategies for cybersecurity event detection focus on mining certain event-related words, the dynamic and evolving nature of online discourse limits the performance of these approaches. Further, because these are typically unsupervised or weakly supervised learning strategies, they do not perform well in an environment of biased samples, noisy context, and informal language, which is routine for online, user-generated content. This paper takes a supervised learning approach by proposing a novel multi-task learning based model. Our model can handle diverse structures in feature space by learning models for different types of potential high-profile targets simultaneously. For parameter optimization, we develop an efficient algorithm based on the alternating direction method of multipliers. Through extensive experiments on a real-world Twitter dataset, we demonstrate that our approach consistently outperforms existing methods at encoding and identifying cybersecurity incidents.
Abstract
Differentiating between real and fake news propagating through online social networks is an important issue in many applications. Minimizing the time gap between a news item's release and the detection of its label is a significant step toward broadcasting real information and avoiding fake news. Therefore, one of the challenging tasks in this area is to identify fake and real news in the early stages of propagation. However, there is a trade-off between minimizing the time gap and maximizing accuracy. Despite recent efforts in fake news detection, there has been no significant work that explicitly incorporates early detection in its model. The proposed method utilizes recurrent neural networks with a novel loss function and a new stopping rule. Experiments on real datasets demonstrate the effectiveness of our model in terms of both early labelling and accuracy, compared to state-of-the-art baselines and models.
Abstract
This paper presents a non-linear optimization methodology for determining the Nash Equilibrium (NE) solutions of a non-cooperative two-player game. Each player, in particular, tries to maximize a rational profit function within a continuous action space. The game arises in the context of a duopolistic network environment where two identical rival firms compete to maximize their influence over a single consumer. Specifically, we consider a weighted and strongly connected network which mediates the opinion formation processes concerning the perceived qualities of their products. Obtaining the NE solutions for such a game is an extremely difficult task which cannot be addressed analytically, even if additional simplifying assumptions are imposed on the exogenous parameters of the model. Our approach obtains the required NE solutions by combining the Karush-Kuhn-Tucker (KKT) conditions associated with the original optimization tasks into a single-objective nonlinear maximization problem under nonlinear constraints. The resulting optimization problem is ultimately solved through the Sequential Quadratic Programming (SQP) algorithm, a state-of-the-art method for nonlinear optimization problems. The validity of our work is justified through a series of experiments in which we simulated the best-response-based dynamical behaviour of the two strategically deciding agents in the network. Juxtaposing the intersection points of the acquired best response curves against the NE solutions obtained by the proposed non-linear optimization methodology verifies that the corresponding solution points coincide.
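The general workflow of solving a nonlinear program with an SQP-family solver can be sketched with SciPy's SLSQP method; the toy objective and constraint below are stand-ins, not the paper's game-theoretic formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the kind of constrained nonlinear program that the combined
# KKT system yields: maximize x*y subject to staying inside the unit disk.
# SciPy minimizes, so we negate the objective.
objective = lambda x: -(x[0] * x[1])
constraints = [{"type": "ineq", "fun": lambda x: 1.0 - x[0]**2 - x[1]**2}]

res = minimize(objective, x0=[0.5, 0.5], method="SLSQP",
               bounds=[(0, None), (0, None)], constraints=constraints)
# Analytic optimum: x = y = 1/sqrt(2), objective value 0.5.
```

SLSQP is a sequential (least-squares) quadratic programming method, so this mirrors the solution strategy described above, applied to a problem small enough to check by hand.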
Abstract
Graph embedding is an important approach for graph analysis tasks such as node classification and link prediction. The goal of graph embedding is to find a low-dimensional representation of graph nodes that preserves the graph information. Recent methods like the Graph Convolutional Network (GCN) consider node attributes (if available) besides node relations and learn node embeddings for unsupervised and semi-supervised tasks on graphs. On the other hand, multi-layer graph analysis has received attention recently. However, existing methods for multi-layer graph embedding cannot incorporate all available information (such as node attributes). Moreover, most of them consider only one type of node or edge, and they do not treat within-layer and between-layer edges differently. In this paper, we propose a method called MGCN that utilizes the GCN for multi-layer graphs. MGCN embeds the nodes of multi-layer graphs using both within-layer and between-layer relations as well as node attributes. We evaluate our method on the semi-supervised node classification task. Experimental results demonstrate the superiority of the proposed method over other multi-layer and single-layer competitors and also show the positive effect of using cross-layer edges.
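The GCN building block that MGCN extends is the standard propagation rule H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W) of Kipf and Welling. A minimal single-layer sketch (not the MGCN model itself, which adds multi-layer handling):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: add self-loops, symmetrically normalize the
    adjacency matrix, mix neighbor features, apply a linear map and ReLU.
    A minimal sketch of the standard GCN rule, not the MGCN extension."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1], [1, 0]], dtype=float)              # two connected nodes
H = np.eye(2)                                            # one-hot node features
W = np.ones((2, 1))                                      # single output channel
out = gcn_layer(A, H, W)
```

For a multi-layer graph, the within-layer and between-layer adjacency matrices would each feed such a propagation step, which is where MGCN's differential treatment of the two edge types comes in.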
Abstract
With the ever-increasing importance of computer-mediated communication in our everyday life, understanding the effects of social influence in online social networks has become a necessity. In this work, we argue that cascade models of information diffusion do not adequately capture attitude change, which we consider to be an essential element of social influence. To address this concern, we propose a topical model of social influence and attempt to establish a connection between influence and Granger-causal effects on a theoretical and empirical level. While our analysis of a social media dataset finds effects that are consistent with our model of social influence, evidence suggests that these effects can be attributed largely to external confounders. The dominance of external influencers, including mass media, over peer influence raises new questions about the correspondence between objectively measurable information diffusion and social influence as perceived by human observers.
Abstract
Modern graph embedding procedures can efficiently process graphs with millions of nodes. In this paper, we propose GEMSEC - a graph embedding algorithm which learns a clustering of the nodes simultaneously with computing their embedding. GEMSEC is a general extension of earlier work in the domain of sequence-based graph embedding. GEMSEC places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, and it incorporates known social network properties through a machine learning regularization. We present two new social network datasets and show that by simultaneously considering the embedding and clustering problems with respect to social properties, GEMSEC extracts high-quality clusters competitive with or superior to other community detection algorithms. In experiments, the method is found to be computationally efficient and robust to the choice of hyperparameters.
Abstract
As deep learning models gain popularity, there is strong demand to upgrade retrieval-based content recommendation systems to learning-based ones. However, efficiency is a critical issue. For article recommendation, an effective neural network which generates a good representation of the article content could prove useful. Hence, we propose PGA-Recommender, a phrase-guided article recommendation model which mimics the process of human behavior -- first browsing, then guided by key phrases, and finally aggregating the gleaned information. As this can be performed independently offline, it is compatible with current commercial retrieval-based (keyword-based) article recommender systems. A total of six months of real logs -- from Apr 2017 to Sep 2017 -- were used for experiments. Results show that PGA-Recommender outperforms different state-of-the-art schemes including session-, collaborative-filter-, and content-based recommendation models. Moreover, it suggests a diverse mix of articles while maintaining superior performance in terms of both click and view predictions. The results of A/B tests show that simply using the backward version of PGA-Recommender yields 40% greater click-through rates compared to the retrieval-based system when deployed to a language of which we have zero knowledge.
Abstract
Social networks have become the main platforms for information dissemination. Nevertheless, due to the increasing number of users, social media platforms tend to be highly vulnerable to the propagation of disinformation, making the detection of fake news a challenging task. In this work, we focus on content-based methods for detecting fake news, casting the problem as a binary text classification task (an article corresponds either to fake news or not). In particular, our work proposes a graph-based, semi-supervised fake news detection method based on graph neural networks. The experimental results indicate that the proposed methodology achieves better performance than traditional classification techniques, especially when trained on a limited number of labeled articles.
Abstract
Due to climate change and the effects of geopolitical and social challenges like the refugee crisis in Europe, the world is facing an unprecedented set of humanitarian problems. According to the United Nations, there is a projected funding shortfall of more than 20 billion dollars in addressing these needs. Technology can play a vital role in mitigating this burden, especially with the advent of real-time social media and advances in areas like Natural Language Processing and machine learning. An important problem addressed by machine learning in current crisis informatics platforms is situation labeling, which can be intuitively defined as semi-automatically assigning one or more actionable labels (such as food, medicine, or water) to tweets or documents from a controlled vocabulary. Despite multiple advances, current situation labeling systems are noisy and do not generalize very well to arbitrary crisis data. Consequently, consumers of these outputs (who include humanitarian responders) are unwilling to trust them without due diligence or provenance. In this paper, we demonstrate an interactive visualization platform called SAVIZ that provides non-technical first responders with such capabilities. SAVIZ is completely built using open-source technologies, can be rendered in a web browser, and is backward-compatible with several pre-existing crisis intelligence platforms. We use two real-world scenarios (the 2015 earthquake in Nepal, and the unfolding Ebola crisis in Africa) to illustrate the potential of SAVIZ.
Abstract
Download fraud is a prevalent threat in mobile App markets, where fraudsters manipulate the number of downloads of Apps via various cheating approaches. Purchased fake downloads can mislead recommendation and search algorithms and further lead to a bad user experience in App markets. In this paper, we investigate the download fraud problem based on a company's App Market, one of the most popular Android App markets. We release a honeypot App on the App Market and purchase fake downloads from fraudster agents to track fraud activities in the wild. Based on our interactions with the fraudsters, we categorize download fraud activities into three types according to their intentions: boosting front-end downloads, optimizing App search ranking, and enhancing the user acquisition and retention rate. For the download fraud aimed at optimizing App search ranking, we select, evaluate, and validate several features for identifying fake downloads based on billions of download records. To get a comprehensive understanding of download fraud, we further gather the stances of App marketers, fraudster agencies, and market operators on download fraud. The ensuing analysis and suggestions shed light on ways to mitigate download fraud in App markets and other social platforms. To the best of our knowledge, this is the first work to investigate the download fraud problem in mobile App markets.
Abstract
Many researchers have extensively studied the importance of social media in every aspect of human life in recent years. There are several instances in which information collected from social media can help not only companies and political parties to design their strategies, but can also save human lives if appropriately analyzed and used. There are numerous cases in which information posted to social media during emergencies and natural disasters was used by emergency responders to get immediate access to the areas in need, or by authorities to acquire a better understanding of the affected areas. In our work, we test whether information from social media can improve the effectiveness of first responders in areas of the refugee crisis, especially in the Mediterranean Sea. Hundreds of thousands of people attempt to cross the Mediterranean Sea and enter Europe, but several of them lose their lives because of the dangerous boats that smugglers use. We simulated a Search and Rescue (SAR) mission in a virtual environment set in the 3D world of a real Greek island, and we tested our hypothesis that social media can help rescuers locate people in need by applying visual search theory. The experimental results were very promising for this specific application of social media surveillance.
Abstract
This paper proposes a novel approach to efficiently compute the exact closeness centrality values of all nodes in dynamically evolving directed and weighted networks. Closeness centrality is one of the most frequently used centrality measures in the field of social network analysis. It uses the total distance to all other nodes to determine node centrality. Previous work has addressed the problem of dynamically updating closeness centrality values either for undirected networks or only for the top-k nodes in terms of closeness centrality. Here, we propose a fast approach for exactly computing all closeness centrality values at each timestamp of directed and weighted evolving networks. Such networks are prevalent in many real-world situations. The main ingredients of our approach are a combination of work-filtering methods and efficient incremental updates that avoid unnecessary recomputation. We tested the approach on several real-world datasets of dynamic small-world networks and found mean speed-ups of about 33 times. In addition, the method is highly parallelizable.
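The per-node quantity being maintained is standard closeness centrality on a weighted directed graph, which a baseline would recompute from scratch with Dijkstra's algorithm at every timestamp. A minimal sketch of that baseline (the paper's filtering and incremental-update machinery is not reproduced here):

```python
import heapq

def closeness(graph, source):
    """Closeness of `source` in a weighted directed graph given as
    {node: [(neighbor, weight), ...]}: (reached - 1) / sum of shortest-path
    distances, via Dijkstra. Unreachable nodes are simply not counted in
    this sketch."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    total = sum(d for n, d in dist.items() if n != source)
    return (len(dist) - 1) / total if total else 0.0

g = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": []}
print(closeness(g, "a"))  # distances: b=1, c=2, so 2 / 3
```

Running this for every node at every timestamp is exactly the cost the proposed work filtering and incremental updates avoid.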
Abstract
Due to the easy dissemination of news on social media and the Web, there has been a rise in disinformation on important political issues like elections in recent years. Computational solutions for automatic bias and sensationalism detection in news articles can have tremendous impact if used in the right way. Because news is an ever-shifting domain, concept drift is an issue that must be dealt with in any real-world computational news classification system that relies on features and trained machine learning models. Yet, an empirical study of concept drift in such systems, especially popular systems recently released as open source and used within organizations, has been lacking thus far. This short paper reports results from an empirical study specifically designed to assess concept drift, using an open-source, popular computational news classification system on real news data crawled from the Web. We find that even a gap of two years (2017 vs. 2019) can lead to significant concept drift, a far narrower gap than observed in traditional machine learning domains, making the deployment of pre-trained or openly available computational news classification models an ethically suspect issue.
Abstract
Sexual violence is a serious problem across the globe. Many victims, particularly women, go through this experience. Unfortunately, not all of these violent incidents come to light; a large portion of victims don't disclose their experience. In September 2018, people started revealing on Twitter why they didn't report a sexual violence experience, using the hashtag #WhyIDidntReport. We collect about 40K such tweets and conduct a large-scale supervised analysis of why victims don't report. Our study finds the extent to which people shared their reasons and categorizes them into finer-grained reasons. We also analyze users who engaged with the victims and compare our findings with the existing literature.
Abstract
As opposed to manual feature engineering which is tedious and difficult to scale, network embedding has attracted a surge of research interests as it automates the feature learning on graphs. The learned low-dimensional node vectors ease the knowledge discovery on graphs by enabling various off-the-shelf machine learning tools to be directly applied. Recent research has shown that the past decade of network embedding approaches either explicitly factorize a carefully designed matrix or are closely related to implicit matrix factorization, with the fundamental assumption that the factorized node connectivity matrix is low-rank. Nonetheless, the global low-rank assumption does not necessarily hold especially when the factorized matrix encodes complex node interactions, and the resultant single low-rank embedding matrix is insufficient to capture all the observed connectivity patterns. In this regard, we propose a novel multi-level network embedding framework BoostNE, which can learn multiple node embeddings of different granularity from coarse to fine without imposing the prevalent global low-rank assumption. The proposed BoostNE method is also in line with the successful gradient boosting method in ensemble learning. We demonstrate the superiority of the proposed BoostNE framework by comparing it with existing state-of-the-art network embedding methods on various datasets.
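The boosting idea, learning successive low-rank factorizations where each stage fits the residual left by the previous one, can be sketched with plain truncated SVD; this is an illustration of the multi-level principle, not the paper's exact formulation of BoostNE.

```python
import numpy as np

def boosted_embeddings(M, rank, stages):
    """Learn `stages` successive rank-`rank` factorizations, each fitted to the
    residual of the previous one, and concatenate the per-stage factors into a
    coarse-to-fine embedding. Sketched with plain SVD as a stand-in for the
    connectivity-matrix factorization used by BoostNE."""
    residual, parts = M.astype(float), []
    for _ in range(stages):
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank]
        parts.append(U_k * s_k)                      # this stage's embedding
        residual = residual - (U_k * s_k) @ Vt_k     # boost on what is left
    return np.hstack(parts)

rng = np.random.default_rng(0)
M = rng.random((6, 6))                               # toy connectivity matrix
E = boosted_embeddings(M, rank=2, stages=3)          # 6 nodes x (2*3) dims
```

No single stage needs the global low-rank assumption to hold: later stages capture the connectivity patterns the earlier coarse factorizations missed.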
Abstract
This paper offers a synthesis of a new analytical procedure based on the complementary use of a large number of methods and techniques for object categorization, pattern recognition, and structural analysis. It represents an example of functional clustering [17] and an extension of the 'posterior methods' for clustering [31]. We call this approach Multi-Stage Clustering (MSC), as it applies cluster analysis methods at three distinct stages. We present the MSC and demonstrate its application to a business dataset of 275 multinational corporations (MNCs), aiming to address the inherent weaknesses of existing industrial classification tools designed to capture the diversification of firms. We evaluate the outcomes of the MSC using a combination of complementary methods for structural analysis and data visualization, such as multi-dimensional scaling (MDS), network mapping (NM), and multiple correspondence analysis (MCA). The MSC is designed for the analysis of diversification patterns of MNCs, enabling the measurement of group competitiveness and performance across these patterns, known as industry segments or strategic industry groups (SIGs).
Abstract
Understanding the evolution of large-scale cooperation is important for the social welfare and stability of economic and social networks. Therefore, there is a need to model real-world scenarios that involve a trade-off between self-interest and social welfare with minimal artificial assumptions or constraints in a versatile framework. In this paper, we build an agent-based model to simulate the dynamics of a multi-agent, bilateral, resource-exchange network. We analyze how various strategies employed by communities can improve or hurt community payoffs as well as the overall social welfare of the network. We also analyze the role of common knowledge in inducing cooperation in the network. Our experimental evidence from simulations confirms that carefully-designed trading mechanisms can indeed encourage cooperation among communities with various motivations.
Abstract
In October 2017, an unprecedented online movement arose on social media: women across the world started publicly sharing their untold stories of being sexually harassed, along with the hashtag #MeToo (or some variant of it). Those stories not only broke the silence that had long hidden the perpetrators, but also allowed women to discharge some of their bottled-up grievances, and revealed much important information surrounding sexual harassment. In this paper, we present our analysis of about one million such tweets collected between October 15 and October 31, 2017, which reveals some interesting patterns and attributes of the people, places, emotions, actions, and reactions related to the tweeted stories. Based on our analysis, we also advance the discussion on the potential role of online social media in breaking the silence of women by factoring in the strengths and limitations of these platforms.
Abstract
Twitter provides important information for emergency responders during the rescue process in disasters. However, tweets containing relevant information are sparse and usually hidden in a vast amount of noisy content. This makes it inherently challenging to generate the training data required by neural network models. In this paper, we study the problem of retrieving infrastructure damage information from tweets generated in different locations during a crisis, using a model actively trained on past but similar events. We combine an RNN- and GRU-based model with active learning that trains on the most uncertain samples and captures the latent features of different data distributions. It requires around 90% less training data, thereby significantly reducing manual annotation effort. We use the model pre-trained with this active learning approach to retrieve infrastructure damage tweets originating from different regions. We obtain a minimum 18% gain in F1-measure, and considerable gains on other metrics, over recent state-of-the-art IR techniques.
Abstract
Cyber threats such as malware and exploits have caused significant losses to the economy and have become a lucrative form of illicit business, leveraging the darkweb as a communication channel. To better understand emerging cyber threats, attack tools, and their actors, we propose a threat intelligence collecting mechanism for identifying emerging threats. With crowdsourced intelligence and public threat intelligence sources such as NVD and CERT, it leverages multiple sources of information and provides domain-specific security intelligence. In addition, we propose a network-based darkweb cyberthreat alert model, which represents and visualizes actors' similarity and thus uncovers the vulnerable vendors (organizations) exposed in the underground markets.
Abstract
In this paper, we propose two approaches to the problem of finding users similar to a set of champions representing domains of interest on social media. The first approach is based on the content shared by the users, while the second relies on social network connections (following, followers, and mentions). Given a small set of champion accounts, we construct a centroid and rank candidates by computing their distance from it. We compare the two approaches in terms of both computational cost and performance. Experiments show that social network connection features provide better performance, but they are computationally much more intensive. This approach can be used to provide highly reliable recommendations of the top-k instances most similar to a given target set, specified through examples rather than through specific properties.
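The centroid-and-rank step is simple once feature vectors exist. A minimal sketch using cosine similarity (the feature extraction itself, whether from shared content or from connections, is assumed already done, and the distance measure here is an illustrative choice):

```python
import numpy as np

def rank_by_centroid(champions, candidates):
    """Build the centroid of the champion feature vectors and rank candidate
    vectors by cosine similarity to it, best match first. Feature extraction
    (content- or connection-based) is assumed to have produced the rows."""
    centroid = champions.mean(axis=0)
    centroid /= np.linalg.norm(centroid)             # unit-length centroid
    norms = np.linalg.norm(candidates, axis=1)
    sims = candidates @ centroid / norms             # cosine similarity per row
    return np.argsort(-sims)                         # indices, best match first

champs = np.array([[1.0, 0.0], [0.9, 0.1]])          # two champion profiles
cands = np.array([[0.0, 1.0], [1.0, 0.05]])          # two candidate profiles
order = rank_by_centroid(champs, cands)              # candidate 1 ranks first
```

Taking the first k indices of `order` yields the top-k recommendation described above.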
Abstract
With professional social networking sites (PSNS), networking opportunities basically have no limits. Industry experts, influencers, and knowledgeable people from all sorts of fields and from all over the world can potentially become part of your business network, providing access to new perspectives and new information. Connecting with a diverse set of people with different expertise and knowledge can enhance personal work performance and career advancement. Yet, many people online are mainly connected with others they know from their direct work environment. Despite much research concerning the influences on networking behavior, no research has investigated whether people who know that networking can be beneficial actually network more. More importantly, no research has investigated whether people who know that a diverse network is important network more diversely, and whether affective and technical influences interfere with the relationship between knowing and doing. In an experimental study (n = 316), we examine the effects of knowledge and website functionalities on professional networking in order to draw implications for how to improve PSNS to encourage people to build diverse business networks. We find that people who know that networking is beneficial do, in fact, network more. Moreover, people who know that diversity is important network more diversely. In addition, technical features of the website (e.g., who is recommended) can influence people's networking behavior. Finally, results are discussed and implications for improving PSNS are drawn.
Abstract
Increasing use of social media in campaigns raises the question of whether one can predict the voting behavior of social-network users who do not disclose their political preferences in their online profiles. Prior work on this task only considered users who generate politically oriented content or voluntarily disclose their political preferences online. We avoid this bias by using a novel Bayesian-network model that combines demographic, behavioral, and social features; we apply this novel approach to the 2016 U.S. Presidential election. Our model is highly extensible and facilitates the use of incomplete datasets. Furthermore, our work is the first to apply a semi-supervised approach for this task: Using the EM algorithm, we combine labeled survey data with unlabeled Facebook data, thus obtaining larger datasets as well as addressing self-selection bias.
Abstract
Link prediction is a common problem in network science that transects many disciplines. The goal is to forecast the appearance of new links or to find links missing in the network. Typical methods for link prediction use the topology of the network to predict the most likely future or missing connections between a pair of nodes. However, network evolution is often mediated by higher-order structures involving more than pairs of nodes; for example, cliques on three nodes (also called triangles) are key to the structure of social networks, but the standard link prediction framework does not directly predict these structures. To address this gap, we propose a new link prediction task called "pairwise link prediction" that directly targets the prediction of new triangles, where one is tasked with finding which nodes are most likely to form a triangle with a given edge. We develop two PageRank-based methods for our pairwise link prediction problem and make natural extensions to existing link prediction methods. Our experiments on a variety of networks show that diffusion-based methods are less sensitive to the type of graphs used and more consistent in their results. We also show how our pairwise link prediction framework can be used to get better predictions within the context of standard link prediction evaluation.
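One natural PageRank-style approach to pairwise link prediction is to personalize the restart distribution on both endpoints of the given edge and score the remaining nodes; the following power-iteration sketch illustrates that idea under the assumption of an unweighted graph, and is not the paper's exact method.

```python
import numpy as np

def edge_seeded_pagerank(A, edge, alpha=0.85, iters=100):
    """Personalized PageRank with restart mass split over both endpoints of
    `edge`; high-scoring nodes are candidates to close a triangle with that
    edge. A minimal power-iteration sketch of the pairwise-prediction idea."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)         # row-stochastic transition matrix
    s = np.zeros(n)
    s[list(edge)] = 0.5                          # restart distribution on the edge
    r = s.copy()
    for _ in range(iters):
        r = alpha * P.T @ r + (1 - alpha) * s    # standard PageRank update
    return r

# 4-node path graph 0-1-2-3; which node best completes a triangle with edge (1, 2)?
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
scores = edge_seeded_pagerank(A, (1, 2))
```

Ranking non-endpoint nodes by `scores` answers the pairwise question directly, rather than scoring each node pair independently as standard link prediction would.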
Abstract
Learning network representations has a variety of applications, such as network classification. Most existing work in this area focuses on static undirected networks and does not account for the presence of directed edges or temporal changes. Furthermore, most work focuses on node representations that do poorly on tasks like network classification. In this paper, we propose a novel network embedding methodology, gl2vec, for network classification in both static and temporal directed networks. gl2vec constructs feature vectors using static or temporal network graphlet distributions and a null model for comparing them against random graphs. We demonstrate the efficacy and usability of gl2vec over existing state-of-the-art methods on network classification tasks such as network type classification and subgraph identification in several real-world static and temporal directed networks. We argue that gl2vec provides additional network features that are not captured by state-of-the-art methods and can significantly improve their classification accuracy by up to 10% in real-world applications.
Abstract
Cyberbullying, which often has a deeply negative impact on the victim, has grown into a serious issue in online social networks. Recently, researchers have created automated machine learning algorithms to detect cyberbullying using social and textual features. However, the very algorithms intended to fight off one threat (cyberbullying) may inadvertently fall prey to another important one (bias in the automatic detection algorithms). This is exacerbated by the fact that while the current literature on algorithmic fairness offers multiple empirical results, metrics, and algorithms for countering bias across immediately observable demographic characteristics (e.g., age, race, gender), there have been no efforts to empirically quantify the variation in algorithmic performance based on the network role or position of individuals. We audit an existing cyberbullying algorithm using Twitter data for disparity in detection performance based on the network centrality of the potential victim, and then demonstrate how this disparity can be countered using an Equalized Odds post-processing technique. The results pave the way for more accurate and fair cyberbullying detection algorithms.
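The spirit of Equalized Odds post-processing is to adjust a trained classifier's decisions per group so that error rates match across groups. The sketch below is a deliberately simplified stand-in that only aligns per-group true-positive rates via thresholds; the full technique of Hardt et al. also balances false-positive rates using randomized prediction mixing.

```python
import numpy as np

def equalize_tpr_thresholds(scores, labels, groups, grid=101):
    """For each group, pick the score threshold whose true-positive rate is
    closest to a common target (the overall TPR at threshold 0.5). A
    simplified illustration of the Equalized Odds post-processing idea,
    not the full randomized-mixing formulation."""
    target = ((scores >= 0.5) & (labels == 1)).sum() / max((labels == 1).sum(), 1)
    thresholds = {}
    for g in np.unique(groups):
        pos = (groups == g) & (labels == 1)          # this group's true positives
        best, best_gap = 0.5, float("inf")
        for t in np.linspace(0.0, 1.0, grid):
            tpr = (scores[pos] >= t).mean() if pos.any() else 0.0
            if abs(tpr - target) < best_gap:
                best, best_gap = t, abs(tpr - target)
        thresholds[g] = best
    return thresholds
```

In the audit setting above, the groups would be centrality buckets of potential victims rather than demographic categories.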
Abstract
Social sensing has emerged as a new sensing paradigm that observes the physical world by exploring the "wisdom of the crowd" on social media. This paper focuses on the abnormal traffic event localization problem using social media sensing. Two critical challenges exist in the state of the art: i) "content-only inference": the limited and unstructured content of a social media post provides little clue for accurately inferring the locations of the reported traffic events; ii) "informal and scarce data": the language of the social media post (e.g., a tweet) is informal, and the number of posts that report abnormal traffic events is often quite small. To address these challenges, we develop SyntaxLoc, a syntax-based probabilistic learning framework that accurately identifies location entities by exploring the syntax of social media content. We perform extensive experiments to evaluate the SyntaxLoc framework through real-world case studies in both New York City and Los Angeles. Evaluation results demonstrate significant performance gains of the SyntaxLoc framework over state-of-the-art baselines in accurately identifying location entities that can be directly used to locate abnormal traffic events.
Abstract
Winning a game is the most significant matter for a professional sports team. All teams strive to bring their best performance to a game, and this requires considering all the possible lineups coaches have available. Determining the lineup is therefore increasingly significant for a team in its winning endeavour. The ongoing result during a game defines the next decision coaches have to make to maintain or improve the outcome. Adaptively changing a team's lineup requires a complex decision-making system that must consider the advantages, drawbacks, and previous experience of both teams' performance under similar situations. To analyze and predict lineup performance, the authors create a directed, weighted, and signed network of all lineups that teams used against each other in National Basketball Association (NBA) games during the 2007-2016 seasons. The proposed model uses machine learning and network analysis techniques to predict the performance of a lineup in a given situation by utilizing graph theory and an Inverse Squared Metric. To evaluate the performance of the proposed method, several baseline models are established and the results compared. The final results over the span of ten years show that the proposed method improves the baseline results by 10% in accuracy: the best baseline results average 58% accuracy in lineup outcome prediction, whereas the new method yields an accuracy of 68%.
Abstract
Recently, the use of crowdsourcing platforms (e.g., Amazon Mechanical Turk) has boomed because of their flexible and cost-effective nature, which benefits both requestors and workers. However, some requestors have misused the power of crowdsourcing platforms by creating malicious tasks aimed at manipulating search results, leaving fake reviews, etc. Crowdsourced manipulation reduces the quality of online social media and threatens the social values and security of the cyberspace as a whole. To help solve this problem, we build a classification model that filters out malicious campaigns from a large number of campaigns crawled from several popular crowdsourcing platforms. We then build a blacklist web service, which provides users with a keyword-based search so that they can understand, moderate, and eliminate potential malicious campaigns from the Web.
Abstract
Livestreaming platforms enable content producers, or streamers, to broadcast creative content to a potentially large viewer base. Chatrooms form an integral part of such platforms, enabling viewers to interact both with the streamer, and amongst themselves. Streams with many viewers and active chatters are typically considered engaging, and are often promoted to end users by recommendation algorithms and exposed to better monetization opportunities via revenue share from platform advertising, viewer donations, and third-party sponsorships. Given such incentives, some streamers resort to fraudulent means to increase perceived engagement by simulating chatter via fake ``chatbots'' that can be purchased from shady online marketplaces. This inauthentic engagement can negatively influence recommendation, hurt streamer and viewer trust in the platform, and improve monetization for fraudulent streamers at the cost of honest ones. In this paper, we tackle the novel problem of automatically detecting chatbots on livestreaming platforms. To this end, we first formalize the livestreaming chatbot detection problem and characterize differences between botted and legitimate chat behavior. Next, we propose a strategy for collecting a labeled, synthetic chatter dataset (typically unavailable) from such platforms, which enables evaluation of proposed detection approaches against chatbot behaviors with varying signatures. We then propose SHERLOCK, a two-stage approach that first detects chatbotted streams and subsequently detects the constituent chatbots.
Abstract
Civil wars are as frequent and debilitating now as ever. More often than not, their resolution consists of the negotiation of a peace accord that involves a number of provisions. Although previous work in political science indicates an underlying interdependence between provision implementation sequences, it is unclear how the structure and dynamics of this interdependence relate to the successful implementation of peace accords. To fill this gap, we systematically study peace process implementation activity from 34 peace accords containing 51 provisions negotiated between 1989 and 2015. We begin by constructing a bipartite network between peace accords and their provisions' implementation and explore statistical properties of the structural underpinnings of peace processes. Then, we examine motifs (i.e., significantly frequent patterns) in provision implementation activity and uncover higher-order correlations between provisions. Finally, we identify provision implementation sequences (i.e., meta-groups) that are most strongly associated with successful peace processes. Our empirical findings provide new insights for the implementation of peace accords by revealing temporal sequences of peace process implementation that help build confidence, enhance security, and ultimately prevent negative cascading effects in different stages of the peacebuilding process.
Abstract
With the advent of child-centric content-sharing platforms such as YouTube Kids, thousands of children from all age groups are consuming gigabytes of content on a daily basis. With PBS Kids, Disney Jr., and countless others joining the fray, this consumption of video data stands to grow further in quantity and diversity. However, content unsuitable for children has increasingly been observed to slip through the cracks and land on such platforms. To investigate this phenomenon in more detail, we collect a first-of-its-kind dataset of inappropriate videos hosted on such children-focused apps and platforms. Alarmingly, our study finds that a noticeable percentage of such videos is currently being watched by kids, with some inappropriate videos having millions of views already. To address this problem, we develop a deep learning architecture that can flag such videos and report them. Our results show that the proposed system can be successfully applied to various types of animations, cartoons, and CGI videos to detect any inappropriate content within them.
Abstract
Human service organizations (HSOs) operate in an environment considered to be prohibitive of collaboration. To understand how HSOs come together to address the grand challenges associated with meeting human needs, we attempted to automatically construct the network of HSOs based on the information publicly available through each organization's website, the medium that people use to find relevant information to access services. Our analysis of the complex system of relationships among HSOs in Albany, New York suggests that the network of HSOs in this area exhibits a multipolar structure with few super connectors, and strong relations between organizations that serve similar functions. We quantitatively evaluate the quality of the HSO network constructed from Web data based on structured, in-person interviews we conducted with HSOs.
Abstract
Studies in computational social science often require collecting data about users via a search engine interface: a list of keywords is provided as a query to the interface and documents matching this query are returned. The validity of a study will hence critically depend on the representativeness of the data returned by the search engine. In this paper, we develop a multi-objective approach to build queries yielding documents that are both relevant to the study and representative of the larger population of documents. We then specify measures to evaluate the relevance and the representativeness of documents retrieved by a query system. Using these measures, we experiment on three real-world datasets and show that our method outperforms baselines commonly used to solve this data collection problem.
Abstract
The relationship between local governments and the general public is being redefined by the increasing use of online platforms that enable participatory reporting of non-emergency urban issues, such as potholes and illegal graffiti, by concerned citizens to their local authorities. In this work, we study, for the first time, participatory reporting data together with neighborhood-level demographics, socioeconomic indicators, pedestrian friendliness, and transit and bike scores, across multiple neighborhoods in the Capital District of New York State. Our data-driven approach offers a large-scale, low-cost alternative to traditional survey methods, and provides insights on citizen participation and satisfaction, and public value creation on such platforms. Our findings can be used to guide government service departments to work more closely with each neighborhood to improve the offline and online communication channels through which citizens can report urban issues.
Abstract
Human trafficking is a global problem that impacts countless individuals every year. In this project, we demonstrate how machine learning techniques and qualitative reports can be used to generate new, valuable quantitative information on human trafficking. Our approach generates original data, which we release publicly, on the directed trafficking relationships between countries that can be used to reconstruct the global transnational human trafficking network. Using this new data and statistical network analysis, we identify the most influential countries in the network and analyze how different factors and network structures influence transnational trafficking. Most importantly, our methods and data can be employed by policymakers, non-governmental organizations, and researchers to help combat the problem of human trafficking.
Abstract
Characterizing dynamic interactions is currently an important issue when analyzing complex social networks. In this paper, we reinforce the importance of social concepts such as the strategic positioning of an actor in a social structure, thus bringing new insights to the analysis of complex networks. Specifically, we propose a new method to characterize relationships based on temporal node attributes that captures how knowledge is transferred across the network. As a result, we unveil the differences between social relationships in different academic social networks and Q&A communities. We also validate our social definitions in terms of the importance of the edges as assessed by the betweenness centrality metric and compare our results with those of two existing methods. Finally, we apply our method to a ranking task in order to measure the academic importance of researchers.
Abstract
Network embedding methodologies, which learn a distributed vector representation for each vertex in a network, have been shown to achieve superior performance in many real-world applications, such as node classification, link prediction, and community detection. However, existing methods for network embedding are unable to generate representation vectors for unseen vertices; moreover, these methods utilize only topological information from the network, ignoring the rich set of nodal attributes that is abundant in all real-life networks. In this paper, we present a novel network embedding approach called Neural-Brane, which overcomes both of these limitations. For a given network, Neural-Brane extracts a latent feature representation of its vertices using a designed neural network model that unifies network topological information and nodal attributes. Additionally, Neural-Brane is an inductive embedding approach, enabling the generation of embedding vectors for unseen future vertices of the attributed network. We evaluate the quality of the vertex embeddings produced by Neural-Brane by solving the node classification task on four real-world graph datasets. Experimental results demonstrate the superiority of Neural-Brane over state-of-the-art existing methods.
Abstract
Curiosity is a natural trait of human behavior. Considering the time we spend consuming content online, it is expected that at least a fraction of that time is driven by curious behavior. Aiming to understand how curiosity drives online information consumption, we propose a model that captures user curiosity by relying on several stimulus metrics. Our model relies on the well-established Wundt curve from psychology and is based on metrics capturing Novelty, Complexity and Uncertainty as key stimuli driving one's curiosity. As a case study, we apply our model to a dataset of online music consumption from LastFM. We find that there are four main types of user behavior in terms of how the curiosity stimulus metrics drive user accesses to online music. These are characterized based on the diversity of the songs, artists and musical genres accessed.
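The Wundt curve invoked above is commonly modeled as the difference between a reward response and an aversion response to stimulus intensity, so that curiosity peaks at intermediate stimulation. A minimal Python sketch of this shape (the gains and midpoints are illustrative assumptions, not the paper's fitted parameters):

```python
import math

def wundt_curiosity(stimulus, reward_mid=0.3, aversion_mid=0.7, steepness=20.0):
    """Wundt-style curiosity for a stimulus in [0, 1]: a logistic reward
    response minus a logistic aversion response (illustrative parameters)."""
    reward = 1.0 / (1.0 + math.exp(-steepness * (stimulus - reward_mid)))
    aversion = 1.0 / (1.0 + math.exp(-steepness * (stimulus - aversion_mid)))
    return reward - aversion

# Curiosity is highest for moderate stimulation and drops off at both extremes.
curve = [wundt_curiosity(s / 10) for s in range(11)]
```

Under this shape, a novelty/complexity/uncertainty score near the middle of the range would be the strongest curiosity driver, which is the qualitative behavior the model relies on.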
Abstract
There is growing concern about the use of social platforms to push political narratives during elections. One very recent case is Brazil's, where WhatsApp is now widely perceived as a key enabler of the far right's rise to power. In this paper, we perform a large-scale analysis of partisan WhatsApp groups to shed light on how both right-wingers and left-wingers used the platform in the 2018 Brazilian presidential election. Across its two rounds, we collected over 2.8M messages from over 45k users in 232 public groups (175 right-wing vs. 57 left-wing). After describing how we obtained a sample that is many times larger than those of previous works, we contrast right-wingers and left-wingers on their social network metrics, regional distribution of users, content-sharing habits, and most characteristic news sources.
Abstract
Timely and high-resolution estimates of the home locations of a sufficiently large subset of the population are critical for applications such as disaster response and public health. However, conventional data sources, such as census and surveys, have a substantial time lag and cannot capture seasonal trends. Recently, social media data has been exploited to address this problem by leveraging its large user base and real-time nature. However, inherent sparsity and noise, along with large estimation uncertainty in home locations, have limited their effectiveness. In this paper, we develop a deep-learning solution that uses a two-phase dynamic structure to deal with sparse and noisy social media data. We obtained over 90% accuracy for large subsets on a commonly used dataset. Systematic comparisons show that our method gives the highest accuracy both for the entire sample and for subsets.
Abstract
Humanitarian disasters have been on the rise in recent years due to the effects of climate change and socio-political situations such as the refugee crisis. Technology can be used to best mobilize resources such as food and water in the event of a natural disaster, by semi-automatically flagging tweets and short messages as indicating an urgent need. The problem is challenging not just because of the sparseness of data in the immediate aftermath of a disaster, but because of the varying characteristics of disasters in developing countries (making it difficult to train just one system) and the noise and quirks in social media. In this paper, we present a robust, low-supervision social media urgency system that adapts to arbitrary crises by leveraging both labeled and unlabeled data in an ensemble setting. The system is also able to adapt to new crises where an unlabeled background corpus may not be available yet by utilizing a simple and effective transfer learning methodology. Experimentally, our transfer learning and low-supervision approaches are found to outperform viable baselines with high significance on myriad disaster datasets. In this short paper, we provide details on the low-supervision approach for urgency detection on Twitter feeds.
Abstract
This paper presents techniques to detect "offline" activities of a person when she is tweeting in order to create a dynamic profile of the user, for uses such as better targeting of advertisements. To this end, we propose a hybrid LSTM model for rich contextual learning, along with studies on the effects of applying and combining multiple LSTM-based methods with different contextual features. The hybrid model outperforms a set of baselines as well as state-of-the-art methods.
Abstract
The United Nations, in its 2018 World Drug Report, reported that the production of opium, cocaine, cannabis, etc. all reached record highs, indicating the ever-growing demand for these drugs. Social networks of individuals associated with Drug Trafficking Organizations (DTOs) have been created and studied by various research groups to identify key individuals in order to disrupt a DTO's operations. With drug offenses increasing globally, the list of suspect individuals has also been growing over the past decade. As it takes a significant amount of technical and human resources to monitor a suspect, a growing list entails higher resource requirements on the part of law enforcement agencies, and monitoring all suspects soon becomes an impossible task. In this paper, we present a novel methodology that reduces the resources required of law enforcement authorities without compromising the ability to uniquely identify a suspect when they become ``active'' in drug-related activities. Our approach utilizes the mathematical notion of Identifying Codes, which generates a unique identification for every node in a network. We find that monitoring only the important individuals in the network wastes resources, and we show how our approach overcomes this shortcoming. Finally, we evaluate the efficacy of our approach on real-world datasets.
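An identifying code is a vertex subset C such that every vertex's closed neighborhood intersects C in a non-empty set that is distinct from every other vertex's. The brute-force sketch below (an illustrative toy, not the paper's method, which would need to scale to real suspect networks) finds a smallest identifying code for a small undirected graph:

```python
from itertools import combinations

def closed_neighborhood(adj, v):
    """N[v]: the vertex together with its neighbors."""
    return frozenset(adj[v]) | {v}

def is_identifying_code(adj, code):
    """Every vertex must get a non-empty, unique signature N[v] & code."""
    signatures = set()
    for v in adj:
        sig = closed_neighborhood(adj, v) & code
        if not sig or sig in signatures:
            return False
        signatures.add(sig)
    return True

def min_identifying_code(adj):
    """Smallest identifying code by exhaustive search over subset sizes."""
    nodes = list(adj)
    for size in range(1, len(nodes) + 1):
        for cand in combinations(nodes, size):
            if is_identifying_code(adj, frozenset(cand)):
                return set(cand)
    return None  # twin vertices present: no identifying code exists

# On the path 0-1-2-3, no two monitored nodes suffice; three are needed.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
code = min_identifying_code(path)
```

Once such a code is placed, any single suspect becoming "active" triggers a unique combination of monitored nodes, which is the property the abstract relies on.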
Abstract
Recently, many researchers in recommender systems have realized that encoding user-item interactions with deep neural networks (DNNs) improves collaborative filtering (CF) performance. Nonetheless, the performance of these DNN-based models is still limited when observed user-item interactions are very sparse, because the training samples distilled from these interactions are critical for deep learning models. To address this problem, we resort to a wealth of features distilled from knowledge graphs (KGs) to profile users and items precisely and sufficiently, rather than relying on observed user-item interactions alone. In this paper, we propose a knowledge-embedding-based recommendation framework to alleviate the problem of sparse user-item interactions in recommendation. In our framework, each user and each item is first represented by the combination of an item embedding and a tag embedding. Specifically, item embeddings are learned by Metapath2Vec, a graph embedding model capable of embedding heterogeneous information networks. Tag embeddings are learned by a Skip-gram model similar to word embedding. We regard these embeddings as knowledge embeddings because they both capture knowledge about the latent movie-movie and user-movie relationships. Finally, a target user's representation and a candidate movie's representation are both fed into a multi-layer perceptron to output the probability that the user likes the item, which can be further used to produce top-n recommendations. Extensive experiments on a movie recommendation dataset demonstrate our framework's superiority over several state-of-the-art recommendation models, especially in the scenario of sparse user-movie interactions.
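The final scoring step described above can be sketched in a few lines: join each side's item and tag embeddings, then squash a weighted sum through a sigmoid. This single-layer stand-in for the paper's multi-layer perceptron uses made-up toy embeddings and weights purely for illustration:

```python
import math

def knowledge_representation(item_emb, tag_emb):
    """A user or movie is represented by concatenating its item and tag embeddings."""
    return list(item_emb) + list(tag_emb)

def like_probability(user_vec, movie_vec, weights, bias):
    """Single-layer stand-in for the MLP: sigmoid of a weighted sum of the
    concatenated user/movie representations."""
    x = user_vec + movie_vec
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-d item and tag embeddings (illustrative values only).
user = knowledge_representation([0.2, 0.1], [0.5, 0.3])
movie = knowledge_representation([0.4, 0.0], [0.6, 0.2])
p = like_probability(user, movie, weights=[1.0] * 8, bias=0.0)
```

In the actual framework the weights are learned end to end and the network is deeper, but the input/output contract is the same: two knowledge-embedding representations in, one like-probability out.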
Abstract
Social media sites such as Twitter have faced significant pressure to mitigate spam and abuse on their platforms in the aftermath of congressional investigations into Russian interference in the 2016 U.S. presidential election. Twitter publicly acknowledged the exploitation of its platform and has since conducted aggressive cleanups to suspend the involved accounts. To shed light on Twitter's countermeasures, we conduct a postmortem analysis of about one million Twitter accounts that engaged in the 2016 U.S. presidential election but were later suspended by Twitter. To systematically analyze the coordinated activities of these suspended accounts, we group them into communities based on their retweet/mention network and analyze different characteristics such as popular tweeters, domains, and hashtags. The results show that suspended and regular communities exhibit significant differences in terms of popular tweeters and hashtags. Our qualitative analysis also shows that suspended communities are heterogeneous in terms of their characteristics. We further find that accounts suspended by Twitter's new countermeasures are tightly connected to the original suspended communities.
Abstract
Opioid addiction is a severe public health threat in the U.S., causing massive deaths and many social problems. Accurate relapse prediction is of practical importance for recovering patients, since it promotes timely relapse preventions that help patients stay clean. In this paper, we introduce a Generative Adversarial Network (GAN) model to predict addiction relapses based on sentiment images and social influences. Experimental results on real social media data from Reddit.com demonstrate that the GAN model delivers better performance than comparable alternative techniques. The sentiment images generated by the model show that relapse is closely connected with two emotions, `joy' and `negative'. This work is one of the first attempts to predict relapses using massive social media (Reddit.com) data and generative adversarial nets. The proposed method, combined with knowledge of social media mining, has the potential to revolutionize the practice of opioid addiction prevention and treatment.
Abstract
High-quality human annotations are necessary to create effective machine learning systems for social media. Low-quality human annotations indirectly contribute to the creation of inaccurate or biased learning systems. We show that human annotation quality depends on the ordering of instances shown to annotators (referred to as the 'annotation schedule'), and can be improved by local changes in the instance ordering provided to the annotators, yielding a more accurate annotation of the data stream for efficient real-time social media analytics. We propose an error-mitigating active learning algorithm that is robust with respect to some cases of human error when deciding an annotation schedule. We validate the human error model and evaluate the proposed algorithm against strong baselines by experimenting on classification tasks of relevant social media posts during crises. According to these experiments, considering the order in which data instances are presented to human annotators leads both to an increase in accuracy for machine learning and to awareness of some potential biases in human learning that may affect the automated classifier.
Abstract
This paper addresses the online feature selection problem in a multi-label classification framework, where multi-labelled data with features arriving in an online fashion is taken as input. The proposed approach works in two phases. In the first phase, the best subset of features is selected from the initially available set of features using a multi-objective feature selection technique. In the second phase, a newly arrived feature is accepted or rejected based on its redundancy with respect to the already selected set of features and its relevance with respect to the class labels. To show the efficacy of the proposed algorithm, it is tested on seven multi-label datasets from different domains, such as text, biology, and audio. The obtained results outperform those of state-of-the-art approaches in the majority of cases.
Abstract
The purpose of this project was the development of a web application that allows users to find the Emergency Department (ED) that provides the best timely and effective care in a defined area. The inputs the user must provide are a location ZIP code and a maximum radius in miles. The output is a list of EDs in the defined area, each associated with a “timely and effective care index” that can be used as an indicator of quality of care, along with an estimated travel time to the corresponding ED.
Abstract
Software for electronic prescriptions offers healthcare providers numerous benefits, including increased patient safety, prescribing effectiveness, workflow efficiencies, and financial savings. Particularly due to recent legislation, an increasing number of organizations are adopting this software. Yet, there are many concerns and risks associated with e-prescription technologies. Despite increased prescribing effectiveness, there is still the potential for human error. Many software concerns, including integration and the implementation of specific transactions like CancelRx, still need to be addressed. Additionally, some patients have negative perceptions of e-Rx that must be overcome. To this effect, this research conducts a risk assessment of two real-world cases (CVS Pharmacy, a retail setting, and Ascension Seton, an inpatient setting) using the NIST risk model. The study concludes that the top three challenges in the pharmacy domain are a lack of implementation of essential software functionalities, the loss and theft of portable devices, and errors due to email phishing attacks.
Abstract
Profile images play an important role in partner selection on a matrimony or dating site. The hypothesis of this paper is that the perceived beauty of a profile image is a subjective opinion based on who is viewing the image. We validate this hypothesis by showing that this subjective bias for attractiveness can be learnt from sender-receiver image pairs. We train a bi-channel CNN-based deep architecture that incorporates the visual features of both users and learns the attractiveness of the sender as perceived by the receiver. This network was trained and tested on 3.5 million image pairs and achieved an accuracy of 69% with images alone, thus proving that rather than the eye, beauty lies in the face of the beholder. When this network was used in conjunction with other profile features such as age, city, and caste, it further improved the accuracy of the system by 5% relative.
Abstract
As the world is flooded with a deluge of data, the demand for mining data to gain insights is increasing. One effective technique is to model the data as networks (graphs) and then apply graph mining techniques to uncover useful patterns. Several graph mining techniques have been studied in the literature, and graphlet-based analysis is gaining popularity due to its power in exposing hidden structure and interaction within networks. The concept of graphlets for basic (undirected) networks was introduced around 2004 by Pržulj et al. Graphlet-based network analysis subsequently gained traction when Pržulj added the concept of graphlet orbits and applied it to biological networks. A decade later, Sarajlić et al. introduced graphlets and graphlet orbits for directed networks, illustrating their application to fields beyond biology such as world trade networks, brain networks, and communication networks. Directed graphlets are more powerful in exposing hidden structures of a network than undirected graphlets of the same size, due to the added information on the edges. Taking this approach further, graphlets and orbits for signed networks have more recently been introduced by Dale. This paper presents a simple algorithm to enumerate signed graphlets and orbits. It then demonstrates an application of signed graphlets and orbits to a metabolic network.
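As a toy instance of signed-graphlet enumeration, the sketch below (an illustrative simplification, not the paper's algorithm) counts the smallest interesting signed graphlet, the triangle, grouped by its multiset of edge signs:

```python
from itertools import combinations

def signed_triangle_census(edges):
    """Count triangles in a signed undirected graph, keyed by the sorted
    multiset of their edge signs ('+'/'-'), e.g. '++-' for one negative edge."""
    sign = {}
    adj = {}
    for u, v, s in edges:
        sign[frozenset((u, v))] = s
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    census = {}
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            key = ''.join(sorted(sign[frozenset(p)] for p in ((a, b), (a, c), (b, c))))
            census[key] = census.get(key, 0) + 1
    return census

# A 4-node signed graph with one all-positive and one unbalanced triangle.
counts = signed_triangle_census(
    [(1, 2, '+'), (2, 3, '+'), (1, 3, '-'), (3, 4, '+'), (2, 4, '+')])
```

A full signed-graphlet enumeration would extend this idea to all connected 2-5-node subgraphs and track the orbit of each node within them, not just the triangle counts.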
Abstract
Growing population, urbanization, increased sophistication in life, global warming, and climate change are some of the factors that can worsen the scarcity and quality of water in the coming decades. Sustainable water governance is therefore essential across the world to face this challenge. In this paper, we study two water governance networks constructed from ground truth using a network analysis technique called graphlet analysis. Graphlet analysis is gaining popularity in network analysis due to its power in exposing network structure and functions. To the best of our knowledge, this is the first work to apply graphlet analysis to the study of water governance networks.
Abstract
Indonesia’s cities are growing at a rate of 4.1% per year, faster than those of other Asian countries. Traditional Javanese houses in Indonesian urban areas have been undergoing dramatic physical changes since the beginning of the 20th century. From 1920 to 2018, houses in the cities of Central Java Province have transformed into three types: the vernacular house, the contemporary house, and the modern house. This paper focuses on the process of house plan transformation and its impact on user behavior in adapting to the surrounding environment. Three different Javanese houses in Java, Indonesia were analyzed by applying Social Network Analysis (SNA). The findings reveal a socio-cultural factor from Javanese culture that still influences all architectural types of Javanese housing.
Abstract
Twitter is used by many users, and posted tweets include users' candid, genuine opinions. Therefore, we can obtain various opinions on items and events by collecting tweets. However, since tweets are posted one after another over time and are represented as text, it is difficult to grasp the overall picture of opinions on an item; visualizing these opinions makes the whole picture easier to grasp. In this study, we collect tweets that include item names and construct a graph connecting similar tweets. Then, from the connected components, we attempt to extract expressions related to user demands. When constructing a similar-tweet graph, it is necessary to set the similarity threshold appropriately: if the threshold is too low, unrelated tweets will be connected and a connected component will contain different demand expressions; on the other hand, if the threshold is too high, demand expressions with the same meaning will be split into separate connected components due to notation fluctuations. In this paper, by focusing on the occurrence probability of the demand expressions appearing in each connected component and defining purity and cohesiveness measures, we propose a method for setting the appropriate similarity threshold. In our experimental evaluations using a large number of tweets for two games, ``Mario tennis ace'' and ``Dairanto smash brothers SPECIAL'', we confirmed that opinions such as ``interesting'' or ``difficult'' can be extracted from the similar-tweet graph constructed with the appropriate similarity threshold, and that we can obtain an overview of the demands related to the items.
Abstract
Despite the apparent benefits of modern social coding paradigms such as Stack Overflow, their potential security risks have been largely overlooked (e.g., insecure code can be easily embedded and distributed). To address this imminent issue, in this paper we leverage both social coding properties and code content for the automatic detection of insecure code snippets on Stack Overflow. To determine whether given code snippets are insecure, we not only analyze the code content, but also utilize various kinds of relations among users, badges, questions, answers, and code snippets on Stack Overflow. To model these rich semantic relationships, we first introduce a structured heterogeneous information network (HIN) for representation and then use a meta-path based approach to incorporate higher-level semantics and build up relatedness over code snippets. We then propose a novel hierarchical attention-based sequence learning model named CodeHin2Vec to seamlessly integrate node (i.e., code snippet) content with HIN-based relations for representation learning. After that, a classifier is built for insecure code snippet detection. Integrating our proposed method, an intelligent system named iTrustSO is developed to address code security issues on modern software coding platforms. Comprehensive experiments on data collected from Stack Overflow validate the effectiveness of our developed system iTrustSO by comparison with alternative methods.
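The meta-path idea used above can be illustrated with a tiny path-counting sketch: relatedness between two code snippets is the number of path instances connecting them that follow a typed sequence such as code-answer-user-answer-code. The type names and adjacency layout below are assumptions for illustration, not the paper's HIN schema:

```python
def metapath_relatedness(adj, metapath, src, dst):
    """Count path instances from src to dst that follow the given meta-path.
    adj[(t1, t2)][node] lists the type-t2 neighbors of a type-t1 node."""
    frontier = {src: 1}  # node -> number of partial paths reaching it
    for t1, t2 in zip(metapath, metapath[1:]):
        nxt = {}
        for node, count in frontier.items():
            for nb in adj.get((t1, t2), {}).get(node, ()):
                nxt[nb] = nxt.get(nb, 0) + count
        frontier = nxt
    return frontier.get(dst, 0)

# Toy HIN: two code snippets whose answers were posted by the same user.
hin = {
    ('code', 'answer'): {'c1': ['a1'], 'c2': ['a2']},
    ('answer', 'user'): {'a1': ['u1'], 'a2': ['u1']},
    ('user', 'answer'): {'u1': ['a1', 'a2']},
    ('answer', 'code'): {'a1': ['c1'], 'a2': ['c2']},
}
path = ['code', 'answer', 'user', 'answer', 'code']
```

Snippets sharing an author become related under this meta-path even if their code content differs, which is exactly the kind of higher-level semantics the HIN contributes beyond content analysis alone.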
Abstract
Many models have been proposed to address learning tasks. Most deep learning models are affected by presentation order, complex shapes, architecture configuration, and learning instability. This paper provides a comparative study of deep learning for pattern recognition. Two supervised learning techniques were tested for comparison: Batch Gradient Descent and Stochastic Gradient Descent. In order to obtain accurate results with both methods, we used a re-sampling method based on k-fold cross-validation. Experimental results show that Stochastic Gradient Descent gives good results in comparison to Batch Gradient Descent; recognition accuracies improve significantly when Stochastic Gradient Descent is applied for intrusion detection.
Abstract
Brand mentioning is a type of word-of-mouth advertising method where a brand name is disclosed by social media users in posts. Recently, brand mentioning by influencers has attracted great attention because of the strong viral effects on influencers' huge fan bases. In this paper, we study the brand mentioning practice of influencers. More specifically, we analyze a brand mentioning social network built on 18,523 Instagram influencers and 804,397 brand mentioning posts. As a result, we report four findings: (i) most influencers mention only a few brands in their posts; (ii) popular influencers tend to mention only popular brands, while micro-influencers show no preference regarding brand popularity; (iii) audiences have highly similar reactions to sponsored and non-sponsored posts; and (iv) compared to non-sponsored posts, sponsored brand mentioning posts favor fewer usertags and more hashtags with longer captions, to exclusively promote specific products. Moreover, we propose a neural network-based model to classify the sponsorship of posts utilizing network embedding and social media features. The experimental results show that our model achieves 80% accuracy and significantly outperforms baseline methods.
Abstract
Graphs have been widely used for modeling complex relational datasets in different application fields. Social-network-based recommendation systems have obtained satisfactory results in Business Intelligence (BI). However, current personalized recommendation methods based on graph structure generally lack interactivity and seldom consider efficient data management. To address these problems, Graph On-Line Analytical Mining (GraphOLAM), which combines OLAP technology with social networks, is a promising method. We first propose an efficient recommendation framework based on GraphOLAM data cube technology for recommendation in insurance services. Based on this framework, a new algorithmic framework for insurance named RU-GOLAM is proposed, which combines GraphOLAM dimensional aggregation operations with specific recommendation methods. A series of graphs can be generated by GraphOLAM dimensional aggregation operations, reflecting the relationships of nodes under the constraints of different hierarchical dimensions. Node similarities are calculated over all of these graphs to generate Top-N sequential recommendations, achieving a balance between the topology of the original graph and the high-dimensional information of the nodes. Experiments show that our approach outperforms baseline algorithms on an insurance service dataset.
Abstract
Increasingly, users are adopting community question-and-answer (Q&A) sites to exchange information. Detecting and eliminating toxic and divisive content in these Q&A sites are paramount tasks to ensure a safe and constructive environment for the users. Insincere questions, which are founded upon false premises, are one type of toxic content in Q&A sites. In this paper, we propose a novel deep learning framework that enhances pre-trained word embeddings with topical information for insincere question classification. We evaluated our proposed framework on a large real-world dataset from the Quora Q&A site and showed that the topically enhanced word embedding achieves better results in toxic content classification. An empirical study was also conducted to analyze the topics of the insincere questions on Quora, and we found that topics on ``religion'', ``gender'' and ``politics'' have a higher proportion of insincere questions.
Abstract
Before purchasing a product online, customers often read the reviews posted by people who have already bought the product. Customer reviews provide opinions and relevant information such as comparisons among similar products or usage experiences. Previous studies addressed the prediction of the helpfulness of customer reviews from the helpfulness voting results. However, the voting result of an online review is not constant over time, so predicting the voting result from text analysis alone is not practical. Therefore, we collect the voting results of the same online customer reviews over time, and observe whether the number of votes increases. We construct a dataset of 10,195 online reviews in six product categories (Computer Hardware, Drink, Makeup, Pen, Shoes, and Toys) from Amazon.cn, together with the helpfulness voting results, and monitor the helpfulness voting over six weeks. Experiments are conducted on this dataset to predict whether the helpfulness voting result of each review will increase. We propose a classification system that identifies the more helpful online reviews based on a set of syntactic features and neural features trained via CNN. The results show that integrating the syntactic features with the neural features yields better results.
Abstract
Hate speech in online environments is a severe problem for many reasons. The space for reasoning and argumentation shrinks, individuals refrain from expressing their opinions, and polarization of views increases. Hate speech contributes to a climate where threats and even violence are increasingly regarded as acceptable. The amount and the intensity of hate expressions vary greatly between different digital environments. To analyze the level of hate in a given online environment, to study its development over time and to compare the level of hate across online environments, we have developed the notion of a hate level. The hate level encapsulates the level of hate in a given digital environment. We present methods to automatically determine the hate level, utilizing transfer learning on pre-trained language models with annotated data to create automated hate detectors. We evaluate our approaches on a set of websites and discussion forums.
Abstract
Social network advertising is currently one of the most effective advertising types available to promote a product or a brand. The problem discussed in this paper concerns the possibility of ensuring that advertising reaches genuinely interested users, and of proving this. To this aim, we propose the use of Blockchain to store users' interests and to obtain an assertion that a user is interested in a product before the advertising is shown. The proposal has been implemented as a Solidity smart contract on Ethereum and has been shown to be effective and cheap.
Abstract
As public entities like brands and politicians increasingly rely on social media to engage their constituents, analyzing who follows them can reveal information about how they are perceived. Whereas most prior work considers following networks as unweighted directed graphs, in this paper we use a tie strength model to place weights on follow links to estimate the strength of relationship between users. We use conversational signals (retweets, mentions) as a proxy class label for a binary classification problem, using social and linguistic features to estimate tie strength. We then apply this approach to a case study estimating how brands are perceived with respect to certain issues (e.g., how environmentally friendly is Patagonia perceived to be?). We compute weighted follower overlap scores to measure the similarity between brands and exemplar accounts (e.g., environmental non-profits), finding that the tie strength scores can provide more nuanced estimates of consumer perception.
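A weighted follower overlap of the kind described above can be sketched as a weighted Jaccard similarity over follower sets, where each follower carries an estimated tie strength; the data and the exact scoring rule here are illustrative assumptions, not the paper's definition:

```python
def weighted_overlap(followers_a, followers_b):
    """Weighted follower overlap between two accounts. Each dict maps a
    follower to an estimated tie strength in [0, 1]; the score is the
    weighted Jaccard similarity (sum of minima over sum of maxima)."""
    users = set(followers_a) | set(followers_b)
    num = sum(min(followers_a.get(u, 0.0), followers_b.get(u, 0.0)) for u in users)
    den = sum(max(followers_a.get(u, 0.0), followers_b.get(u, 0.0)) for u in users)
    return num / den if den else 0.0

# Hypothetical tie-strength estimates for followers of a brand and a non-profit.
brand = {"alice": 0.9, "bob": 0.2, "carol": 0.5}
nonprofit = {"alice": 0.8, "dave": 0.7}
```

A shared follower with strong ties to both accounts (like `alice` here) contributes far more to the score than many weak-tie followers, which is what makes the weighted score more nuanced than raw follower overlap.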
Abstract
Cyber adversaries employ a variety of malware and exploits to attack computer systems, usually via sequential or “chained” attacks that take advantage of vulnerability dependencies. In this paper, we introduce a formalism to model such attacks. We show that the determination of the set of capabilities gained by an attacker, which also translates to the extent to which the system is compromised, corresponds with the convergence of a simple fixed-point operator. We then address the problem of determining the optimal/most-dangerous strategy for a cyber-adversary with respect to this model and find it to be an NP-Complete problem. To address this complexity we utilize an A*-based approach with an admissible heuristic that incorporates the result of the fixed-point operator and uses memoization for greater efficiency. We provide an implementation and show through a suite of experiments, using both simulated and actual vulnerability data, that this method performs well in practice for identifying adversarial courses of action in this domain. On average, we found that our techniques decrease runtime by 82%.
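The fixed-point operator mentioned above can be sketched as a closure computation: keep applying every exploit whose preconditions are already satisfied until no new capability is gained. The exploit chain below is a hypothetical toy example, not the paper's formalism:

```python
def attacker_closure(initial_caps, exploits):
    """Least fixed point of the capability operator: repeatedly apply every
    exploit whose preconditions are already held, until nothing new is gained.
    `exploits` is a list of (preconditions, gained_capability) pairs."""
    caps = set(initial_caps)
    changed = True
    while changed:
        changed = False
        for pre, gain in exploits:
            if gain not in caps and pre <= caps:
                caps.add(gain)
                changed = True
    return caps

# Hypothetical chained attack: network access enables exploiting a service,
# which yields a user shell; combined with a local vulnerability, that
# enables privilege escalation to root.
exploits = [
    ({"net_access"}, "user_shell"),
    ({"user_shell", "local_vuln"}, "root"),
]
```

The returned set is monotone in the initial capabilities, which is why the operator has a unique least fixed point and why its result can safely feed an admissible A* heuristic.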
Abstract
The spread of information on Facebook and Twitter is much more efficient than on traditional media platforms. For word-of-mouth (WOM) marketing, social media have become a rich information source for companies and scholars to design models to examine this repository and mine useful insights for marketing strategies. However, social media language is relatively short and contains special words and symbols. Most natural language processing (NLP) methods focus on processing formal sentences and are not well-suited to such short messages. In this study we propose a novel sentiment analysis framework based on deep learning models to extract sentiment from social media. We collect data from which we compile a dataset. After processing these special terms, we seek to establish a semantic dataset for further research. The extracted information will be useful for many future applications. The experimental data have been obtained by crawling several social media platforms.
Abstract
We propose a general framework for the recommendation of possible customers (users) to advertisers (e.g., brands) based on the comparison between On-Line Social Network profiles. In particular, we associate suitable categories and sub-categories to both user and brand profiles in the considered On-line Social Network. When categories involve posts and comments, the comparison is based on word embedding, which allows us to take into account the similarity between the topics of particular interest for a brand and the user preferences. Furthermore, user personal information, such as age, job or gender, is used for targeting specific advertising campaigns. Results on a real Facebook dataset show that the proposed approach successfully identifies the most suitable set of users to target for a given advertising campaign.
Abstract
Predicting booking probability and value at the traveler level plays a central role in computational advertising for massive two-sided vacation rental marketplaces. These marketplaces host millions of travelers with long shopping cycles, who spend a lot of time in the discovery phase. The footprint travelers leave during discovery is a useful data source to help these marketplaces predict shopping probability and value. However, there is no one-size-fits-all solution for this purpose. In this paper, we propose a hybrid model that infuses deep and shallow neural network embeddings into a gradient boosting tree model. This approach allows the latent preferences of millions of travelers to be automatically learned from sparse session logs. In addition, we present the architecture that we deployed into our production system. We find that there is a pragmatic sweet spot between expensive complex deep neural networks and simple shallow neural networks that can increase the prediction performance of a model by seven percent, based on offline analysis.
Abstract
In recent years, disasters attributed to climate change have been reported very frequently, and several reports suggest that these extreme phenomena will further affect people not only as weather disasters but also indirectly, through shortages of natural resources such as water or food. In this direction, there is ongoing research that studies weather phenomena by collecting data not only at the surface of the globe but also at different levels of the atmosphere. With such a large volume of data, traditional numerical weather prediction models may not be able to assimilate the data and extract knowledge useful for predicting extreme phenomena. Thus, the analysis of weather data has been transformed into a big data analytics problem, which may enable weather scientists to better understand the interrelations of weather variables and use the knowledge discovered to improve their prediction models. In this context, the current paper proposes a big data analytics methodology that is able to detect all common patterns between different weather variables at neighboring or distant points in a specific time window, revealing useful associations between weather variables that are not possible to detect with traditional numerical methods. The proposed methodology is based on a data structure that stores the magnitude of the weather data in different dimensions, and a pattern detection algorithm that detects all common patterns. The experimental results, using weather data from the National Oceanic and Atmospheric Administration (NOAA), revealed interesting, otherwise unknown patterns in two weather variables for the two specific locations studied.
Abstract
Exact string matching has been a fundamental problem in computer science for decades because of its many practical applications. Some are related to common procedures, such as searching in files and text editors, and others, more recently, to more advanced problems such as pattern detection in Artificial Intelligence and Bioinformatics. Tens of algorithms and methodologies have been developed for pattern matching, and several programming languages, packages, applications and online systems exist that can perform exact string matching in biological sequences. These techniques, however, are limited to searching for specific and predefined strings in a sequence. In this paper a novel methodology (called Ex2SM) is presented: a pipeline of advanced data structures and algorithms, explicitly designed for text mining, that can detect every possible repeated string in multivariate biological sequences. In contrast to known algorithms in the literature, the methodology presented here is string agnostic, i.e., it does not require an input string to search for; rather, it can detect every string that exists at least twice, regardless of attributes such as length, frequency, alphabet or overlapping. The complexity of the problem solved and the potential of the proposed methodology are demonstrated through an experimental analysis performed on the entire human genome. More specifically, all repeated strings with a length of up to 50 characters have been detected, an achievement which is practically impossible with other algorithms due to the exponential number of possible permutations of such long strings.
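The string-agnostic task described above (find every substring that occurs at least twice, with no query string given) can be stated precisely in a few lines of brute force. This toy sketch only works on short sequences; the paper's Ex2SM pipeline uses advanced data structures to make the same task feasible on whole genomes:

```python
def repeated_strings(seq, max_len=None):
    """Return every substring of `seq` that occurs at least twice
    (overlapping occurrences allowed), regardless of length or frequency.
    Brute force for illustration only: quadratic in len(seq)."""
    max_len = max_len or len(seq) - 1
    seen, repeated = set(), set()
    for length in range(1, max_len + 1):
        for i in range(len(seq) - length + 1):
            s = seq[i:i + length]
            if s in seen:
                repeated.add(s)
            else:
                seen.add(s)
    return repeated
```

For example, in the sequence `ATATA` the repeated strings are `A`, `T`, `AT`, `TA` and `ATA`; note `ATA` is counted because its two occurrences may overlap.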
Abstract
A staggering 450,000 people died due to drug consumption in 2015, and a third of these deaths were a direct result of drug overdosing. Illicit manufacturing of Cocaine, Heroin, Cannabis, etc., by Drug Trafficking Organizations (DTOs) all peaked recently, a major indication of their worldwide demand. With drug offenses increasing globally, the list of suspect individuals associated with drug trafficking organizations has also been growing over the past few decades. As it takes a significant amount of technical and human resources to monitor a suspect, a growing list entails greater resource requirements on the part of law enforcement agencies; soon, monitoring all the suspects on the list becomes an impossible task. In this paper, we present a novel methodology called Augmented Identifying Codes (AIC), an extension of the mathematical notion of Identifying Codes. We show that our method requires significantly fewer resources on the part of law enforcement agencies, when compared to strategies adopting standard network centrality measures, for monitoring individuals associated with drug trafficking organizations. Finally, we evaluate the efficacy of our approach on real-world datasets.
Abstract
We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a previously detected event). While some existing approaches to event detection measure novelty and trendiness, typically as independent criteria and occasionally as a holistic measure, this work focuses on detecting both novel and developing events using an unsupervised machine learning approach. Furthermore, our proposed approach enables the ranking of cyber threat events based on an importance score by extracting the tweet terms that are characterized as named entities, keywords, or both. We also impute influence to users in order to assign a weighted score to noun phrases in proportion to user influence and the corresponding event scores for named entities and keywords. To evaluate the performance of our proposed approach, we measure the efficiency and detection error rate for events over a specified time interval, relative to human annotator ground truth.
Abstract
Great success has been witnessed in the last few years for approaches combining Machine Learning (ML) with Knowledge Representation and Reasoning (KRR) to predict cybersecurity events. These approaches benefited from the high accuracy of ML and the inherent transparency of KRR. In this paper, we develop a multi-layered, hybrid system that benefits from both approaches. When the developed system is fused with an existing statistical forecasting model, it demonstrates an average recall improvement of more than 14% while maintaining precision.
Abstract
Evaluating a community detection method involves measuring the extent to which the resulting solution, i.e. a clustering, is similar to an optimal solution, a ground truth. Different normalised similarity indices have been proposed in the literature to quantify the extent to which two clusterings are similar, where 1 refers to perfect agreement (the two clusterings are identical) and 0 refers to perfect disagreement. While interpreting a similarity score of 1 is intuitive, interpreting any other score is not, as it suggests some level of disagreement between the compared clusterings; there is no universal definition of dissimilarity when comparing two clusterings. In this paper, we address this issue by first providing a taxonomy of similarity indices commonly used for evaluating community detection solutions. We then elaborate on the meaning of clustering dissimilarity and the types of dissimilarity that can exist between two clusterings in the context of community detection. We perform an extensive evaluation to study the behaviour of different similarity indices as a function of the dissimilarity type, with both disjoint and non-disjoint clusterings. We finally provide practitioners with some insights on which similarity indices to use for the task at hand and how to interpret their values.
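One concrete member of the family of indices surveyed above is the Rand index, which makes the agreement/disagreement question precise at the level of item pairs. A minimal sketch (the unadjusted index, which, unlike chance-corrected variants such as ARI, does not reach 0 in expectation for random clusterings):

```python
from itertools import combinations

def rand_index(c1, c2):
    """Rand index between two clusterings given as label lists: the fraction
    of item pairs on which the clusterings agree (together in both, or
    separated in both). 1 means the partitions are identical."""
    agree, total = 0, 0
    for i, j in combinations(range(len(c1)), 2):
        total += 1
        if (c1[i] == c1[j]) == (c2[i] == c2[j]):
            agree += 1
    return agree / total
```

Note that two maximally different partitions of four items already score 1/3 rather than 0, which illustrates the paper's point that scores below 1 are hard to interpret without knowing the type of dissimilarity involved.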
Abstract
Facial attribute analysis is an important step in many biometric algorithms, including face recognition based human authentication. Detecting the state of face attributes is becoming even more popular in mobile based applications, where a biometric authentication application is used to quickly and accurately verify the claimed identity of the device's owner, and also to keep the device secure if an intruder attempts to gain unauthorized access to it. While there is a large number of facial attributes that can be automatically detected to support a face recognition system, in this paper we focus on detecting three specific ones that are useful during both enrollment and authentication: (1) determining whether the eyes of a subject are open or closed, (2) determining whether a subject is wearing glasses or not, and, finally, (3) detecting whether a subject's facial pose is frontal or non-frontal. These attributes are associated with face image quality control, which is a very useful component of modern FR systems in the following context: during live authentication, by limiting low quality face image data, we can enhance face-based authentication accuracy, while at the same time improving user satisfaction via improved system efficiency (i.e. fewer false match attempts during authentication). Thus, to automatically and efficiently detect all of the aforementioned facial attributes, we developed both conventional and deep learning based models. These models are trained and tested on diverse and challenging face image datasets, using data captured from traditional cameras and mobile devices, operating at multiple standoff distances, in either indoor or outdoor conditions. Our proposed attribute-specific detection models are robust, yielding up to 100% accuracy (in terms of F1 score) depending on the attribute tested, as well as the model and dataset(s) used for training and testing.
Abstract
In this work, we propose a competitive solution for face detection when operating in the thermal and visible bands. Our aim is to train, fine tune, optimize and validate pre-existing object detection models using thermal and visible data separately, in order to efficiently detect face images independent of the spectrum of the original face images used as input (visible or thermal). Thus, we perform an empirical study to determine the most efficient band-specific DeepFace detection model in terms of a set of performance metrics including F1 scores. The original object detection models selected for our study are Faster R-CNN (Region based Convolutional Neural Network), SSD (Single-shot Multi-Box Detector) and R-FCN (Region-based Fully Convolutional Network). The dual-band dataset used for this work is composed of two challenging MWIR and visible band face datasets, where the faces were captured under variable conditions, i.e. indoors, outdoors, different standoff distances (5 and 10 meters) and poses. Experimental results show that the proposed detection model yields high detection performance, independent of the band we operate in and the face dataset scenario tested. Specifically, we show that a modified and tuned Faster R-CNN architecture with ResNet-101 is the most promising model compared to all the other models evaluated, tuned and tested. The proposed model yields accuracies of 99.2% and 98.4% when tested on thermal and visible face data respectively. Finally, while the proposed model is relatively slower than its competitors, our study shows that the speed of this network can be increased by reducing the number of proposals in the RPN (Region Proposal Network); the computational complexity challenge is thus significantly mitigated, reducing the time taken for a single detection by 66%.
Abstract
The proliferation of Android applications has resulted in many malicious apps entering the market and causing significant damage. Robust techniques that determine if an app is malicious are greatly needed. We propose the use of network-based approaches to effectively separate malicious from benign apps, based on a small labeled dataset. The apps in our dataset come from the Google Play Store and have been scanned for malicious behavior using VirusTotal to produce a ground truth dataset with labels malicious or benign. The apps in the resulting dataset have been represented in the form of binary feature vectors (where the features represent permissions, intent actions, discriminative APIs, obfuscation signatures, and native code signatures). We have used these vectors to build a weighted network that captures the ``closeness'' between apps. We propagate labels from the labeled apps to unlabeled apps, and evaluate the effectiveness of the approaches studied using the F1-measure. We have conducted experiments to compare three variants of the label propagation approaches on datasets that consist of increasingly larger amounts of labeled data.
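The label propagation step described above can be sketched with one simple variant: unlabeled nodes repeatedly take the weighted average of their neighbours' scores while labeled nodes stay clamped. The graph, weights and this particular update rule are illustrative assumptions, not the paper's exact variants:

```python
def propagate_labels(weights, labels, iters=20):
    """Simple label propagation on a weighted app-similarity graph.
    `weights[u][v]` is the edge weight between apps u and v; `labels` maps
    known apps to 1.0 (malicious) or 0.0 (benign). Unlabeled nodes start at
    0.5 and converge toward the weighted average of their neighbours."""
    score = {n: labels.get(n, 0.5) for n in weights}
    for _ in range(iters):
        for n in weights:
            if n in labels:
                continue  # clamp known labels
            total = sum(weights[n].values())
            if total:
                score[n] = sum(w * score[m] for m, w in weights[n].items()) / total
    return score

# Hypothetical similarity graph over four apps; a1 is known malicious,
# a4 known benign, and the a2-a3 edge is weak.
graph = {
    "a1": {"a2": 1.0},
    "a2": {"a1": 1.0, "a3": 0.2},
    "a3": {"a2": 0.2, "a4": 1.0},
    "a4": {"a3": 1.0},
}
labels = {"a1": 1.0, "a4": 0.0}
```

After propagation, `a2` (strongly tied to the malicious app) scores well above 0.5 while `a3` (strongly tied to the benign app) scores well below it.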
Abstract
While social media enables users and organizations to obtain useful information about technology like software and security feature usage, it can also allow an adversary to exploit users by obtaining information from them or influencing them towards injurious decisions. Prior research indicates that security technology choices are subject to social influence and that these decisions are often influenced by peer decisions and the number of peers in a user's network. In this study we investigated whether peer influence dictates users' decisions by manipulating social signals from peers in an online, controlled experiment. Human participants recruited from Amazon Mechanical Turk played a multi-round game in which they selected a security technology from among six of differing utilities. We observe that a strategy that exposes users to a high quantity of peer signals reflecting suboptimal choices in the later stages of the game successfully influences users to deviate from the optimal security technology by the end of the game. This strategy influences almost 1.5 times as many users as a strategy where users receive a constant low quantity of similar peer signals in all rounds of the game.
Abstract
It is widely believed that the adoption behavior of a decision-maker in a social network is related to the number of signals it receives from its peers in the social network. It is unclear if the same principles hold when the ``pattern'' by which they receive these signals varies and when potential decisions have different utilities. To investigate this, we manipulate social signal exposure in an online controlled experiment with human participants. Specifically, we change the number of signals and the pattern through which participants receive them over time. We analyze the effect through a controlled game in which each participant selects one option from six choices with differing utilities, one of which has the highest utility. We avoided network effects by holding the neighborhood network of the users constant. Over multiple rounds of the game, we observe the following: (1) even in the presence of monetary risks and previously acquired knowledge of the six choices, decision-makers tend to deviate from the obvious optimal decision when their peers make similar choices; (2) when the quantity of social signals varies over time, the probability that a participant selects the decision reflected by the social signals, i.e. is responsive to social influence, does not necessarily correlate proportionally with the absolute quantity of signals; and (3) early exposure to a higher quantity of peer social signals turned out to be a more effective strategy of social influence when aggregated over the rounds.
Abstract
Online advertising benefits from recommender systems, since the latter analyse reviews and ratings of products, providing useful insight into buyers' perception of products and services. When traditional recommender system information is enriched with social network information, more successful recommendations are produced, since more user aspects are taken into consideration. However, social network information may be unavailable, since some users may not have social network accounts or may not consent to their use for recommendations, while rating data may be unavailable due to the cold start phenomenon. In this paper, we propose an algorithm that combines limited collaborative filtering information, comprising only users' ratings on items, with limited social network information, comprising only users' social relations, in order to simultaneously improve (1) prediction accuracy and (2) prediction coverage in collaborative filtering recommender systems. The proposed algorithm considerably improves rating prediction accuracy and coverage, and can be easily integrated in recommender systems.
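The combination described above can be sketched with a simple blending rule: predict from the user's social relations when their ratings exist, from the global item average otherwise, and from a weighted mix when both are available. This illustrative rule and its data are assumptions, not the paper's exact algorithm:

```python
def predict_rating(user, item, ratings, friends, w=0.5):
    """Predict `user`'s rating for `item` by blending the average rating of
    the user's social relations with the overall item average. `w` weighs
    the social component. Falls back to whichever side has data, which is
    how the combination improves coverage as well as accuracy."""
    all_r = [r[item] for r in ratings.values() if item in r]
    soc_r = [ratings[f][item] for f in friends.get(user, []) if item in ratings.get(f, {})]
    if soc_r and all_r:
        return w * sum(soc_r) / len(soc_r) + (1 - w) * sum(all_r) / len(all_r)
    if soc_r:
        return sum(soc_r) / len(soc_r)
    if all_r:
        return sum(all_r) / len(all_r)
    return None  # true cold start: no rating or social information at all

# Hypothetical data: u3 has rated nothing (cold start) but is friends with u1.
ratings = {"u1": {"i1": 5.0}, "u2": {"i1": 1.0}, "u3": {}}
friends = {"u3": ["u1"]}
```

Here the cold-start user `u3` still gets a prediction (pulled toward the friend's rating), while a user with no social data falls back to the plain item average.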
Abstract
Terrorism is a key risk for prospective visitors of tourist destinations. This work reports on the analysis of past terrorist attack data, focusing on tourist-related attacks and attack types in the Mediterranean EU area, and the development of algorithms to predict terrorist attack risk levels. Data on attacks in 10 countries have been analyzed to quantify the threat level of tourism-related terrorism based on data from 2000 to 2017 and to formulate predictions for subsequent periods. Results show that predictions on potential target types can be derived with adequate accuracy. Such results are useful for initiating, shifting and validating active terrorism surveillance based on predicted attack and target types per country from real past data.
Abstract
Online social networks (e.g., Facebook, Twitter) allow users to tag their posts with geographical coordinates collected through the GPS interface of smartphones. The time- and geo-coordinates associated with a sequence of tweets manifest people's spatial-temporal movements in real life. This paper presents an approach to recommend travel routes to social media users exploiting historic mobility data, social features of users and geographic characteristics of locations. Travel route recommendation is formulated as a ranking problem aiming at mining the top interesting locations and the travel sequences among them, and exploiting such information to recommend the most suitable travel routes to a target user. A ranking function that exploits users' similarity in visiting locations and in travelling along mobility paths is used to predict places the user could like. The experimental results obtained using a real-world dataset of tweets show that the proposed method is effective in recommending travel routes, achieving remarkable precision and recall rates.
Abstract
Detecting diseases at an early stage can help to overcome and treat them accurately. Identifying the appropriate treatment depends on the method used to diagnose the disease. A Clinical Decision Support System (CDS) can greatly help in identifying diseases and methods of treatment. In this paper we propose a CDS framework that can integrate heterogeneous health data from different sources, such as laboratory test results, basic patient information, and health records. Using the electronic health data thus collected, innovative machine learning and deep learning approaches are employed to implement a set of services that recommend a list of diseases, assisting physicians in diagnosing or treating their patients' health issues more efficiently.
Abstract
Complex networks have attracted a great deal of research interest in the last two decades since Watts & Strogatz, Barabási & Albert and Girvan & Newman published their highly-cited seminal papers on small-world networks, on scale-free networks and on the community structure of complex networks, respectively. These fundamental papers initiated a new era of research establishing an interdisciplinary field called network science. Due to the multidisciplinary nature of the field, a diverse but not divided network science community has emerged in the past 20 years. This paper honors the contributions of network science by exploring the evolution of this community as seen through the growing co-authorship network of network scientists (here the notion refers to a scholar with at least one paper citing at least one of the three aforementioned milestone papers). After investigating various characteristics of 29,528 network science papers, we construct the co-authorship network of 52,406 network scientists and we analyze its topology and dynamics. We shed light on the collaboration patterns of the last 20 years of network science by investigating numerous structural properties of the co-authorship network and by using enhanced data visualization techniques. We also identify the most central authors and the largest communities, investigate the spatiotemporal changes, and compare the properties of the network to scientometric indicators.
Abstract
Data-driven analysis of large social networks has attracted a great deal of research interest. In this paper, we investigate 120 real social networks and their measurement-calibrated synthetic counterparts generated by four well-known network models. We investigate the structural properties of the networks, revealing the correlation profiles of graph metrics across various social domains (friendship networks, communication networks, and collaboration networks). We find that the correlation patterns differ across domains. We identify a non-redundant set of metrics to describe social networks. We study which topological characteristics of real networks the models can or cannot capture. We find that the goodness-of-fit of the network models depends on the domain. Furthermore, while 2K and stochastic block models lack the capability of generating graphs with large diameter and high clustering coefficient at the same time, they can still be used to mimic social networks relatively efficiently.
Abstract
With the development of Natural Language Processing, automatic question-answering systems such as Watson, Siri, and Alexa have become one of the most important NLP applications. Nowadays, enterprises try to build automatic customer service chatbots to save human resources and provide 24-hour customer service. Evaluation of chatbots currently relies heavily on human annotation, which costs plenty of time. Thus, [32] initiated a new Short Text Conversation subtask called Dialogue Quality (DQ) and Nugget Detection (ND), which aims to automatically evaluate dialogues generated by chatbots. In this paper, we solve the DQ and ND subtasks with deep neural networks. We propose two models, one for each of the DQ and ND subtasks, constructed with a hierarchical structure: an embedding layer, utterance layer, context layer, and memory layer, to hierarchically learn dialogue representations from the word level, sentence level, and context level up to the long-range context level. Furthermore, we apply gating and attention mechanisms at the utterance layer and context layer to improve performance. We also try replacing the embedding layer and utterance layer with BERT as the sentence representation. The results show that BERT produces a better utterance representation than a multi-stack CNN for both the DQ and ND subtasks and outperforms models proposed by other researchers. The evaluation measures, proposed by [26], are NMD and RSNOD for DQ and JSD and RNSS for ND, rather than traditional measures such as accuracy, precision, recall, and F1-score. Thus, we also conduct a series of experiments using the traditional evaluation measures and analyze the performance and errors.
Abstract
Social networks are a good resource for collecting public opinions, considering the diversity and variety in fashion, especially of user-generated content (UGC). Extracting opinions from UGC can form the basis of commercial policy, so extracting them correctly is an important problem. However, two questions arise: are the opinions really about the target entities, and is the number of opinions sufficient for network volume analysis? In this study, we combine a rule-based method and Semantic Role Labeling (SRL) for opinion target detection (OTD) in UGC. For the SRL task, we design a neural network model to extract opinion words. To address the vanishing gradient problem, we use highway connections to control the percentage of data passed through. To improve the performance of SRL, we use additional features to help the model learn the relations between the verb and other phrases. Finally, we design a rule to filter the SRL results for the OTD task. The experimental results show that our SRL model obtains 71% F1, and our method on the OTD task obtains 73% precision, which outperforms LTP. This research can serve as the basis of hot topic prediction for sponsors and help them decide marketing strategies.
Abstract
Twitter user geolocation detection can inform and benefit a range of downstream geospatial tasks such as event and venue recommendation, local search, and crisis planning and response. In this paper, we take into account users' shared tweets as well as their social network, and run extensive comparative studies to systematically analyze the impact of a variety of language-based, network-based, and hybrid methods in predicting user geolocation. In particular, we evaluate different text representation methods to construct text views that capture the linguistic signals available in tweets that are specific to and indicative of geographical locations. In addition, we investigate a range of network-based methods, such as embedding approaches and graph neural networks, in predicting user geolocation based on the user interaction network. Our findings provide valuable insights into the design of effective and efficient geolocation identification engines. Finally, our best model, called TF-MF, substantially outperforms state-of-the-art approaches under minimal supervision.
Index Terms: Twitter, social computing, neural networks, graph, geolocation
Abstract
In this paper, we focus on the collection and analysis of relevant Twitter data on a state-by-state basis for (i) measuring public opinion on marijuana legalization by mining sentiment in Twitter data and (ii) determining the usage trends for six distinct types of marijuana. We overcome the challenges posed by the informal and ungrammatical nature of tweets to analyze a corpus of 306,835 relevant tweets collected over the four-month period preceding the November 2015 Ohio Marijuana Legalization ballot and the four months after the election, for all states in the US. Our analysis revealed two key insights: (i) people in states that have legalized recreational marijuana express more positive sentiment about marijuana than people in states that have either legalized only medicinal marijuana or have not legalized marijuana at all; (ii) states with a high percentage of positive sentiment about marijuana are more inclined to authorize legal usage (e.g., by allowing medical marijuana) or broaden it (e.g., by allowing recreational marijuana in addition to medical marijuana). Our analysis shows that social media can provide reliable information and can serve as an alternative to traditional polling of public opinion in drug use and epidemiology research.
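The per-state sentiment aggregation described above can be sketched in a few lines. The word list and scores below are hypothetical, standing in for whatever sentiment classifier the authors actually used, and the tweets are invented examples:

```python
from collections import defaultdict

# Tiny illustrative polarity lexicon (hypothetical; not the paper's classifier).
LEXICON = {"legalize": 1, "great": 1, "support": 1, "ban": -1, "dangerous": -1}

def tweet_score(text):
    """Sum lexicon polarities over the tokens of a tweet."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

def state_sentiment(tweets):
    """Average tweet polarity per state from (state, text) pairs."""
    totals, counts = defaultdict(int), defaultdict(int)
    for state, text in tweets:
        totals[state] += tweet_score(text)
        counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

tweets = [
    ("CO", "great to legalize"),   # score +2
    ("CO", "support this"),        # score +1
    ("OH", "ban it dangerous"),    # score -2
]
scores = state_sentiment(tweets)  # e.g. {"CO": 1.5, "OH": -2.0}
```

Averaging per state rather than pooling all tweets keeps high-volume states from dominating the comparison, which is the point of the state-by-state design.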
Abstract
In recent years, bustling online communities have attracted a great deal of attention from research on information spreading. Knowledge about the characteristics of information spreading processes lets us influence their dynamics, either by enhancing propagation properties or by changing them to decrease spread within a network. One approach is to add or remove connections within a network. While optimal linking within complex networks requires high computational resources, in this investigation we focus on optimizing the topology of small graphs within larger network structures. The study shows how enhancements of propagation properties within small networks are preserved in larger networks built from connected smaller graphs. We compare results from combined small graphs whose added links provide optimal spread against networks with additional random linking. The results show that optimal linking within small sub-graphs improves the diffusion properties of the whole network.
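The search for an optimal added link in a small graph can be illustrated by brute force. In this sketch, average shortest-path length stands in as a crude proxy for propagation speed; that objective, and the toy path graph, are assumptions of the illustration, not the paper's actual setup:

```python
from itertools import combinations
from collections import deque

def avg_shortest_path(nodes, edges):
    """Mean shortest-path length over all node pairs (BFS from each node)."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = pairs = 0
    for s in nodes:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        for v, d in dist.items():
            if v != s:
                total += d
                pairs += 1
    return total / pairs

def best_link_to_add(nodes, edges):
    """Missing edge whose addition minimises the average shortest path."""
    existing = {frozenset(e) for e in edges}
    candidates = [e for e in combinations(nodes, 2) if frozenset(e) not in existing]
    return min(candidates, key=lambda e: avg_shortest_path(nodes, edges + [e]))

# Path graph 0-1-2-3-4: the best shortcut bridges the two ends.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
link = best_link_to_add(nodes, edges)  # -> (0, 4)
```

For small sub-graphs this exhaustive search is cheap, which is exactly why optimizing them separately and then composing the network sidesteps the cost of optimal linking at full scale.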
Abstract
“Echo chamber” is a metaphorical description of a situation in which beliefs are amplified inside a closed network, and social media platforms provide an environment that is well-suited to this phenomenon. Depending on the scale of the echo chamber, a user's judgment of different opinions may be restricted. The current study focuses on detecting echoing interaction between a post and its related comments to then quantify the predominating degree of echo chamber behavior on Facebook pages. To enable such detection, two content-based features are designed: the first aids stance representation of comments on a particular discussion topic, and the second focuses on the type and intensity of emotion elicited by a subject. This work also introduces data-driven semi-supervised approaches to extract such features from social media data.
Abstract
Social dynamics are based on human needs for trust, support, and resource sharing, irrespective of whether they operate in real life or in a virtual setting. Massively multiplayer online role-playing games (MMORPGs) serve as enablers of leisurely social interaction and are important tools for studying it. Past research has shown that socially dense gaming environments like MMORPGs can be used to study important social phenomena that may operate in real life, too. We describe the process of social exploration as entailing the following components: 1) finding the balance between personal and social time, 2) choosing between a large number of weak ties and a few strong social ties, and 3) finding a social group. In general, these are the major determinants of an individual's social life. This paper looks into the phenomenon of social exploration in an activity-based online social environment. We study this process through the lens of the following research questions: 1) What are the different social behavior types? 2) Is there a change in a player's social behavior over time? 3) Are certain social behaviors more stable than others? 4) Can longitudinal research on player behavior help shed light on the social dynamics and processes in the network? We use an unsupervised machine learning approach to derive four social behavior types: Lone Wolf, Pack Wolf of a Small Pack, Pack Wolf of a Large Pack, and Social Butterfly. The types represent the degree of socialization of players in the game. Our research reveals that social behaviors change with time: lone wolf and pack wolf of a small pack are more stable social behaviors, whereas pack wolf of a large pack and social butterfly are more transient. We also observe that players progressively move from large groups with weak social ties to settle in small groups with stronger ties.
Abstract
Massively Multiplayer Online Role-Playing Games (MMORPGs) are persistent virtual environments where millions of players interact online. We study the problem of player churn and social contagion using MMORPG game logs by analyzing the impact of a node's churn behavior on its immediate neighborhood or group. The two key research questions in this paper are: when an active node (ego) becomes dormant, what is the impact on the activity behavior of ego's immediate neighbor (alter), 1) based on ego's characteristics and ego's relationship with alter, and 2) based on the activity behavior of alter's remaining neighbors? We use a supervised learning framework to study the impact of player churn and social contagion. Experimental results show that the classification models perform substantially better than random for both research problems. Finally, we use a data-driven approach to propose a player typology based on degree of socialization and analyze churn behavior among these player types. Experimental results show that the loner player type is much more likely to churn than the socializer player types, and that as the degree of socialization decreases among socializers, the propensity to churn increases.
Abstract
Almost all academic studies include a literature review section. This section is significant for presenting the value of the researcher's proposed method and for making comparisons. Due to the increasing number of academic papers and the emergence of various directories and indices, finding related previous studies consumes a significant amount of the researcher's time. By means of the suggested method, researchers can access various types of featured publications related to a keyword, from different years, from a single address. The system also helps to produce an exemplary and guiding literature review from the retrieved publications by performing text generation. The system uses the TF-IDF method for keyword-based publication search and the “Template-Based Text Generation” method for the text generation algorithm. In this study, the largest open-access journal platform, TÜBİTAK Dergipark, and the SOBIAD Citation Index were used as the data set. As a result of the conducted tests, a method that supports the literature review process, even helping with the writing of the literature review itself, is suggested. Although there is no equivalent of the suggested study, the success comparisons for “Text Generation” and “Literature Review” were independently calculated and presented.
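The keyword-based search component can be illustrated with a textbook TF-IDF ranking. The corpus below is invented; the paper's exact weighting over the DergiPark/SOBIAD data is not reproduced:

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents (indices, best first) by summed TF-IDF weight
    of the query terms. A standard tf * log(N/df) sketch."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))

    def score(toks):
        tf = Counter(toks)
        return sum(
            (tf[t] / len(toks)) * math.log(n / df[t])
            for t in query.lower().split() if t in tf
        )

    scores = [score(toks) for toks in tokenized]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

docs = [
    "community detection in social networks",
    "deep learning for image recognition",
    "social networks and community structure analysis",
]
ranking = tfidf_rank("community networks", docs)  # the unrelated doc ranks last
```

Terms that occur in every document get an IDF of zero, so the ranking is driven by the discriminative keywords, which is the property the publication search relies on.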
Abstract
Recent studies show that circRNAs have critical roles in many biological processes. Knowing the associations between circRNAs and diseases may contribute to understanding the mechanism of circRNAs and to diagnostic and therapeutic methods for diseases at the molecular level. Only a small number of computational models have been developed to estimate circRNA-disease associations. Therefore, a computational model is developed in this study. Similarity matrices were obtained for circRNAs and diseases, respectively, by applying a Gaussian kernel to the data obtained from the circRNADisease database. Then, a random walk with restart algorithm was applied to the combined matrices. The AUC value obtained by 5-fold cross-validation is 0.861, which demonstrates the reliability of the model.
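A generic random-walk-with-restart iteration of the kind referred to above can be sketched as follows. The small symmetric similarity matrix is a toy stand-in for the combined circRNA-disease matrices, and the restart parameter is an assumed default:

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Power-iterate p <- (1 - r) * W p + r * e_seed, where W is the
    column-normalised transition matrix built from similarity matrix `adj`.
    Returns the stationary proximity of every node to the seed."""
    n = len(adj)
    # Column sums: total similarity leaving each node j.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(iters):
        p = [
            (1 - restart) * sum(
                adj[i][j] * p[j] / col_sums[j] for j in range(n) if col_sums[j]
            ) + (restart if i == seed else 0.0)
            for i in range(n)
        ]
    return p

# Toy symmetric similarity matrix: node 1 is most similar to the seed 0,
# node 3 is reachable only through node 2.
sim = [
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
p = random_walk_with_restart(sim, seed=0)
```

Because the transition matrix is column-stochastic, `p` stays a probability distribution, and nodes closer to the seed in the similarity graph accumulate more mass; ranking candidate diseases by `p` is the scoring step such models use.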
Abstract
Customer satisfaction surveys, which have been the most common way of gauging customer feedback, involve high costs, require active customer participation, and typically yield low response rates. The tremendous growth of social media platforms such as Twitter gives businesses an opportunity to continuously gather and analyze customer feedback, with the goal of identifying and rectifying issues. This paper examines the alternative of replacing traditional customer satisfaction surveys with social media data. To evaluate this approach, the following steps were taken using customer feedback data extracted from Twitter: 1) applying sentiment analysis to each tweet to compare the overall sentiment across different products and/or services; 2) constructing a hashtag co-occurrence network to further optimize the customer feedback query process on Twitter; 3) comparing customer feedback from survey responses with social media feedback, while considering content and added value. We find that social media provides advantages over traditional surveys.
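Step 2, the hashtag co-occurrence network, can be sketched directly from raw tweet text; the tweets and hashtags below are invented examples, not the paper's data:

```python
import re
from itertools import combinations
from collections import Counter

def hashtag_cooccurrence(tweets):
    """Count how often each pair of hashtags appears in the same tweet.
    The weighted edge list can seed a co-occurrence graph whose strong
    edges suggest additional hashtags to add to a feedback query."""
    edges = Counter()
    for text in tweets:
        # Deduplicate and sort tags so each pair is counted once per tweet.
        tags = sorted(set(re.findall(r"#(\w+)", text.lower())))
        for pair in combinations(tags, 2):
            edges[pair] += 1
    return edges

tweets = [
    "Loving the new phone #acme #phone",
    "#acme #phone battery dies fast #battery",
    "#acme support never answers",
]
edges = hashtag_cooccurrence(tweets)
```

Here the edge (`acme`, `phone`) has weight 2 while the battery-related edges have weight 1, so a query seeded with `#acme` would be expanded with `#phone` first.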
Abstract
Pornography can be distributed in multiple forms on the Internet. Online pornography forms a non-negligible fraction of the total Internet traffic, with adult video streaming gaining significant traction among the most visited global websites. Similar to the rise of User Generated Content (UGC) on general Web 2.0 services, adult video service providers have also promoted social interaction and UGC in what is called Porn 2.0. Discovering the characteristics of Porn 2.0 allows for a better understanding of Internet traffic in general and of UGC services in particular. In this paper, using trace-driven analysis, we examine the characteristics of one of the most well-known Porn 2.0 service providers, XHamster. We find that a large proportion of the currently available videos were uploaded in recent years, coinciding with a rapid growth in the use of video categories. Compared to non-adult UGC services, we find user interaction on XHamster to revolve more strongly around ratings than comments, and the average duration and views per video are higher.
Abstract
Millions of health-related messages and fresh communications can reveal important public health issues. New drugs, diseases, and Adverse Drug Reactions (ADRs) keep appearing on social media in ever-new written forms. In particular, generative models for both Sentiment Analysis (SA) and Natural Language Understanding (NLU) require medically labeled human data, or else rely on weak-supervision resources that cannot identify medication-related targets, resulting in inaccurate sentiment prediction performance. The frequent use of informal medical language, non-standard formats and abbreviations, as well as typos in social media messages, has to be taken into account [6]. We probe a transition-based approach between the patient language used in social media messages and the formal medical language used in the descriptions of medical concepts in a standard ontology, which serves as the formal input of our neural network model. To this end, we propose a sentiment analysis model for patients' daily-life, medication-related text, based on a hybrid embedding vocabulary under distributed dependency, together with a concept translation methodology that incorporates medical knowledge from social media and real-life medical science systems. The proposed neural network layers are shared between the medical concept normalization model and the sentiment prediction model in order to understand and leverage sentiment-related information behind conceptualized features in multiple contexts. The experiments were performed on various real-world scenarios with limited resources.
Abstract
Vector-borne diseases cause more than 1 million deaths annually. Estimates of epidemic risk at high spatial resolutions can enable effective public health interventions. Our goal is to identify the risk of importation of such diseases into vulnerable cities at the granularity of neighborhoods. Conventional models cannot achieve such spatial resolution, especially in real time. Besides, they lack real-time data on demographic heterogeneity, which is vital for accurate risk estimation. Social media, such as Twitter, promise data from which demographic and spatial information can be inferred in real time. On the other hand, such data can be noisy and inaccurate. Our novel approach leverages Twitter data, using machine learning techniques at multiple spatial scales to overcome these limitations and deliver results at the desired resolution. We validate our method against the Zika outbreak in Florida in 2016. Our main contribution lies in proposing a novel approach that uses machine learning on social media data to identify the risk of vector-borne disease importation at a spatial resolution fine enough to permit effective intervention. It will lead to a new generation of epidemic risk assessment models, promising to transform public health by identifying specific locations for targeted intervention.
Abstract
We identify optimal strategies for maximising influence within a social network in competitive settings under budget constraints. While existing work has focussed on simple threshold models, we consider more realistic settings, where (i) states are dynamic, i.e., nodes oscillate between influenced and uninfluenced states, and (ii) continuous amounts of resources (e.g., incentives or effort) can be expended on the nodes. We propose a mathematical model using voting dynamics to characterise optimal strategies in a prototypical star topology against known and unknown adversarial strategies. In cases where the adversarial strategy is unknown, we characterise the Nash Equilibrium. To generalise the work further, we introduce a fixed cost incurred to gain access to nodes, together with the dynamic cost proportional to the influence exerted on the nodes, constrained by the same budget. We observe that, as the cost changes, the system interpolates between the historic discrete and the current continuous case.
Abstract
Social systems are increasingly being modelled as complex networks, and the interactions and decision making of individuals in such systems can be modelled using game theory. Therefore, networked game theory can be effectively used to model social dynamics. Individuals can use pure or mixed strategies in their decision making, and recent research has shown that there is a connection between the topological placement of an individual within a social network and the best strategy they can choose to maximise their returns. Therefore, if certain individuals have a preference to employ a certain strategy, they can be swapped or moved around within the social network to more desirable topological locations where their chosen strategies will be more effective. To this end, it has been shown that to increase the overall public good, the cooperators should be placed at the hubs, and the defectors should be placed at the peripheral nodes. In this paper, we tackle a related question, which is the time (or number of swaps) it takes for individuals who are randomly placed within the network to move to optimal topological locations which ensure that the public utility satisfies a certain utility threshold. We show that this time depends on the topology of the social network, and we analyse this topological dependence in terms of topological metrics such as scale-free exponent, assortativity, clustering coefficient, and Shannon information content. We show that the higher the scale-free exponent, the quicker the public utility threshold can be reached by swapping individuals from an initial random allocation. On the other hand, we find that assortativity has negative correlation with the time it takes to reach the public utility threshold. 
We also find that, in terms of the correlation between information content and the time it takes to reach a public utility threshold from a random initial assignment, there is a bifurcation: one class of networks shows a positive correlation, while another shows a negative correlation. Our results highlight that, by designing networks with appropriate topological properties, one can minimise the need for the movement of individuals within a network before a certain public good threshold is achieved. This result has obvious implications for defence strategies in particular.
Abstract
We define the intrinsic scale at which a network begins to reveal its identity as the scale at which subgraphs in the network (created by a random walk) are distinguishable from similar-sized subgraphs in a perturbed copy of the network. We conduct an extensive study of intrinsic scale for several networks, ranging from structured (e.g. road networks) to ad-hoc and unstructured (e.g. crowd-sourced information networks), to biological. We find: (a) the intrinsic scale is surprisingly small (7-20 vertices), even though the networks are many orders of magnitude larger; (b) the intrinsic scale quantifies “structure” in a network: networks which are explicitly constructed for specific tasks have smaller intrinsic scale; (c) the structure at different scales can be fragile (easy to disrupt) or robust.
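The subgraph-creation step above can be sketched as follows. The walk-sampling routine and the toy cycle graph are illustrative only; the distinguishability test against the perturbed copy is not shown:

```python
import random

def random_walk_subgraph(adj, k, rng):
    """Run a random walk until k distinct vertices have been visited,
    then return those vertices and the subgraph they induce."""
    current = rng.choice(sorted(adj))
    seen = {current}
    while len(seen) < k:
        current = rng.choice(sorted(adj[current]))
        seen.add(current)
    # Induced edges: both endpoints in the visited set (u < v avoids duplicates).
    induced = {(u, v) for u in seen for v in adj[u] if v in seen and u < v}
    return seen, induced

# A 6-cycle: any 3 vertices visited consecutively by a walk form a 2-edge path.
adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
rng = random.Random(7)
vertices, edges = random_walk_subgraph(adj, 3, rng)
```

Comparing the distribution of such walk-induced subgraphs between the original network and its perturbed copy, at increasing `k`, is how one would locate the scale at which the two become distinguishable.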
Abstract
We propose a network-aware multi-agent simulation approach to understanding the interlacing connections between herder-farmer communities in open property regimes. Specifically, we model herder-farmer conflicts in agent-based terms, whereby individual decision-making, pastoral mobility, and symbiotic herder-farmer relations result in the emergence of a complex adaptive system in which communal resources are managed in ways that lead either to peaceful coexistence or to conflict. From a theoretical perspective, we hope to further our understanding of how individual decision-making and coordination produce complex adaptive systems, as well as how emergent structures shape individual action. In practice, we anticipate that this study will help shed light on how herder-farmer communities can cooperate and coordinate their activity and mobility patterns to manage common pool resources in sustainable ways that mitigate violent conflict. Broadly, our work aims to contribute new insights towards multi-agent modelling of traditional small-scale societies.
Abstract
Community detection looks for groups of nodes in networks, mainly using topological, link-based network features and not taking into account the features associated with each node. Clustering algorithms, on the other hand, look for groups of objects using features describing each object. Recently, link features and node attributes have been combined to improve community detection. Community detection methods can be designed to identify communities that are disjoint or overlapping, crisp or soft, and static or dynamic. In this paper, we propose a dynamic community detection method for finding soft overlapping groups in temporal networks with node attributes. Our approach is based on a non-negative matrix factorization model that uses automatic relevance determination to detect the number of communities. Preliminary results on toy and artificial networks are promising. To the extent of our knowledge, a dynamic approach that includes link and node information for soft overlapping community detection has not been proposed before.
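The factorization at the core of such a model can be illustrated with plain multiplicative-update NMF on an adjacency matrix. This sketch omits the temporal coupling and the automatic relevance determination prior that the paper adds, and the block-diagonal toy matrix is invented:

```python
import random

def matmul(X, Y):
    """Dense matrix product on nested lists."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(c) for c in zip(*X)]

def nmf(A, k, iters=300, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for A ~ W @ H, W and H non-negative."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, A), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        HT = transpose(H)
        num, den = matmul(A, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

# Block-diagonal adjacency matrix with two obvious communities (3 nodes each).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
W, H = nmf(A, k=2)
R = matmul(W, H)
err = sum((A[i][j] - R[i][j]) ** 2 for i in range(6) for j in range(6)) ** 0.5
```

The rows of `W` give each node's (soft) membership weight in each of the `k` communities; automatic relevance determination would additionally shrink unneeded columns of `W` to zero so that `k` need not be fixed in advance.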
Abstract
Online social media periodically serves as a platform for cascading, polarizing topics of conversation. The inherent community structure present in online social networks (homophily) and the advent of fringe outlets like Gab have created online “echo chambers” that amplify the effects of polarization, which fuels detrimental behavior. In October 2018, Gab made headlines when it was revealed that Robert Bowers, the individual behind the Pittsburgh synagogue massacre, was an active member of this social media site and used it to express his anti-Semitic views and discuss conspiracy theories. Thus, to address the need for automated data-driven analyses of such fringe outlets, this research proposes novel methods to discover the topics that are prevalent in Gab and how they cascade within the network. Specifically, using approximately 34 million posts and 3.7 million cascading conversation threads with close to 300k users, we demonstrate that there are essentially five cascading patterns that manifest in Gab, and the most “viral” ones begin with an echo-chamber pattern and grow out to the entire network. We also show empirically, through two models, viz. Susceptible-Infected and Bass, how the cascades structurally evolve from one of the five patterns to another based on the topic of the conversation, with up to 84% accuracy.
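A discrete-time Susceptible-Infected simulation of the kind fitted above can be sketched as follows. The toy graph (a dense core with a short periphery) is invented for illustration; the Gab data and the Bass model are not reproduced:

```python
import random

def si_cascade(adj, seed_node, beta, steps, rng):
    """Discrete-time Susceptible-Infected spread from a single seed.
    Each step, every infected node independently infects each susceptible
    neighbour with probability beta. Returns the infected set after each step."""
    infected = {seed_node}
    history = [set(infected)]
    for _ in range(steps):
        new = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    new.add(v)
        infected |= new
        history.append(set(infected))
    return history

# Dense core (nodes 0-4) resembling an echo chamber, plus a periphery (5, 6).
adj = {
    0: {1, 2, 3, 4, 5}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4},
    3: {0, 1, 2, 4}, 4: {0, 1, 2, 3}, 5: {0, 6}, 6: {5},
}
history = si_cascade(adj, seed_node=0, beta=1.0, steps=3, rng=random.Random(1))
```

With `beta=1.0` the cascade saturates the core in one step and reaches the periphery a step later, which mirrors the "echo chamber first, whole network later" growth pattern described above; fitting `beta` to observed thread growth is the model-estimation step.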
Abstract
We demonstrate a machine learning method, namely lexical link analysis (LLA), which can be used to discover high-value information in financial data. LLA is an unsupervised learning method that does not require manually labeled training data. We also demonstrate how to formulate LLA in a game-theoretic framework. We show that, with game theory, the high-value information selected by LLA reaches a Nash equilibrium by superpositioning popular and anomalous information while at the same time generating high social welfare, and therefore contains higher intrinsic value. We show the results of LLA on two sets of financial data, validating and correlating them with the ground truth.
Abstract
Mobile phones have nowadays become a commodity for the majority of people. Using them, people are able to access the Internet and connect with their friends, their colleagues at work, or even unknown people with common interests. This proliferation of mobile devices has also been seen as an opportunity for cyber criminals to deceive smartphone users and steal their money, directly or indirectly, by accessing their bank accounts through their smartphones, by blackmailing them, or by selling their private data, such as photos and credit card details, to third parties. This is usually achieved by installing malware on smartphones, masking its malevolent payload as a legitimate application and advertising it to users in the hope that they will install it on their devices. Thus, any existing application can easily be modified by integrating malware into it and then presenting it as a legitimate one. In response, scientists have proposed a number of malware detection and classification methods using a variety of techniques. Even though several of them achieve relatively high precision in malware classification, there is still room for improvement. In this paper, we propose a text-mining, all-repeated-pattern detection method that uses the decompiled files of an application in order to classify a suspicious application into one of the known malware families. In experiments on a real malware dataset, the methodology correctly classifies (without any misclassification) all randomly selected malware applications from 3 categories with 3 different families each.
Abstract
Observed social networks are often considered as proxies for underlying social networks. The analysis of observed networks oftentimes involves the identification of influential nodes via various centrality metrics. Our work is motivated by recent research on the investigation and design of adversarial attacks on machine learning systems. We apply the concept of adversarial attacks to social networks by studying strategies by which an adversary can minimally perturb the observed network structure to achieve their target function of modifying the ranking of nodes according to centrality measures. This can represent the attempts of an adversary to boost or demote the degree to which others perceive them as influential or powerful. It also allows us to study the impact of adversarial attacks on targets and victims, and to design metrics and security measures that help to identify and mitigate adversarial network attacks. We conduct a series of experiments on synthetic network data to identify attacks that allow the adversarial node to achieve their objective with a single move. We test this approach on different common network topologies and for common centrality metrics. We find that there is a small set of moves that result in the adversary achieving their objective, and this set is smaller for decreasing centrality metrics than for increasing them. These results can help with assessing the robustness of centrality measures. The notion of changing social network data to yield adversarial outcomes has practical implications, e.g., for information diffusion on social media, influence and power dynamics in social systems, and improving network security.
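The single-move attack search can be illustrated by brute force for degree centrality. The toy graph, the rank function, and the restriction to one metric are assumptions of this sketch, not the paper's exact setup:

```python
from itertools import combinations

def degree_rank(edges, node):
    """Rank of `node` by degree (1 = highest); ties do not worsen the rank."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    d = deg.get(node, 0)
    return 1 + sum(1 for v, dv in deg.items() if v != node and dv > d)

def single_moves_boosting(nodes, edges, adversary):
    """All single edge additions or removals that improve the adversary's
    degree-centrality rank, found by exhaustive trial."""
    before = degree_rank(edges, adversary)
    existing = {frozenset(e) for e in edges}
    moves = []
    for e in combinations(nodes, 2):
        if frozenset(e) in existing:
            trial, kind = [x for x in edges if frozenset(x) != frozenset(e)], "remove"
        else:
            trial, kind = edges + [e], "add"
        if degree_rank(trial, adversary) < before:
            moves.append((kind, e))
    return moves

nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]   # node 0 is the hub; node 3 ranks last
moves = single_moves_boosting(nodes, edges, adversary=3)
```

On this graph the adversary (node 3) can climb the ranking either by adding an edge to itself or by removing an edge between its competitors, which illustrates why the set of rank-improving single moves, though small, is not limited to the adversary's own edges.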
Abstract
In many real-world networks, it is important to explicitly differentiate between positive and negative links, thus considering the observed networks as signed. To derive useful features, just as in the case of unsigned networks, representation learning can be used to learn meaningful representations of a network that characterize its underlying topology. Several methods for learning representations of signed networks have already been proposed but have not previously been systematically benchmarked together. Hence, in this paper, we bridge this literature gap by providing a quantitative and qualitative benchmark of the four most prominent representation learning methods for signed networks. Results on three different datasets for link sign prediction showcase the superiority of the StEM method over its competitors from both a predictive performance and a runtime perspective.