SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

SESSION: Keynote - Jeff Dean

Building Machine Learning Systems that Understand

SESSION: Session 1 - Scalable Analytics and Machine Learning

Learning Linear Regression Models over Factorized Joins

To Join or Not to Join?: Thinking Twice about Joins before Feature Selection

Real-time Video Recommendation Exploration

Towards Globally Optimal Crowdsourcing Quality Management: The Uniform Worker Setting

Building the Enterprise Fabric for Big Data with Vertica and Spark Integration

Truss Decomposition of Probabilistic Graphs: Semantics and Algorithms

Efficient and Progressive Group Steiner Tree Search

SESSION: Session 2 - Privacy and Security

Publishing Attributed Social Graphs with Formal Privacy Guarantees

Publishing Graph Degree Distribution with Node Differential Privacy

Principled Evaluation of Differentially Private Algorithms using DPBench

PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions

Adaptive Indexing over Encrypted Numeric Data

Practical Private Range Search Revisited

Privacy Preserving Subgraph Matching on Large Graphs in Cloud

SESSION: Session 3 - Logical and Physical Database Design

The Snowflake Elastic Data Warehouse

Closing the Functional and Performance Gap between SQL and NoSQL

Have Your Data and Query It Too: From Key-Value Caching to Big Data Management

Ambry: LinkedIn's Scalable Geo-Distributed Object Store

SQL Schema Design: Foundations, Normal Forms, and Normalization

SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment

Automatic Generation of Normalized Relational Schemas from Nested Key-Value Data

SESSION: Session 4 - New Storage and Network Architectures

Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation

GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices

SHARE Interface in Flash Storage for Relational and NoSQL Databases

Accelerating Relational Databases by Leveraging Remote Memory and RDMA

FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory

Micro-architectural Analysis of In-memory OLTP

SESSION: Session 5 - Graphs 1: Infrastructure and Processing on Modern Hardware

iBFS: Concurrent Breadth-First Search on GPUs

Tornado: A System For Real-Time Iterative Analysis Over Evolving Data

EmptyHeaded: A Relational Engine for Graph Processing

GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs

Graph Analytics Through Fine-Grained Parallelism

Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing

SESSION: Session 6 - Streaming 1: Systems and Outlier Detection

Scalable Pattern Sharing on Event Streams

How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates

Sharing-Aware Outlier Analytics over High-Volume Data Streams

THEMIS: Fairness in Federated Stream Processing under Overload

SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures

Range Thresholding on Streams

SESSION: Session 7 - Approximate Query Processing

Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads

An Effective Syntax for Bounded Relational Queries

Wander Join: Online Aggregation via Random Walks

Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters

A Study of Sorting Algorithms on Approximate Memory

Distributed Wavelet Thresholding for Maximum Error Metrics

Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee

SESSION: Session 8 - Networks and the Web

Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

Spheres of Influence for More Effective Viral Marketing

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users?

Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models

Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale

Robust and Noise Resistant Wrapper Induction

SESSION: Session 9 - Data Discovery and Extraction

Goods: Organizing Google's Datasets

Multi-Source Uncertain Entity Resolution at Yad Vashem: Transforming Holocaust Victim Reports into People

A Hybrid Approach to Functional Dependency Discovery

Ontological Pathfinding

Extracting Databases from Dark Data with DeepDive

Estimating the Impact of Unknown Unknowns on Aggregate Query Results

SESSION: Session 10 - Data Integration / Cleaning

Constraint-Variance Tolerant Data Repairing

Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples

Sequential Data Cleaning: A Statistical Approach

Learning-Based Cleansing for Indoor RFID Data

PrivateClean: Data Cleaning and Differential Privacy

RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach

SESSION: Session 11 - Spatio / Temporal Databases

Topic Exploration in Spatio-Temporal Document Collections

ParTime: Parallel Temporal Aggregation

Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets

Distributed Evaluation of Top-k Temporal Joins

AT-GIS: Highly Parallel Spatial Query Processing with Associative Transducers

Towards Best Region Search for Data Exploration

Simba: Efficient In-Memory Spatial Analytics

SESSION: Session 12 - Distributed Data Processing

Realtime Data Processing at Facebook

SparkR: Scaling R Programs with Spark

VectorH: Taking SQL-on-Hadoop to the Next Level

Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases

Big Data Analytics with Datalog Queries on Spark

An Efficient MapReduce Cube Algorithm for Varied Data Distributions

SESSION: Session 13 - Graphs 2: Subgraph-based Optimization Techniques

Diversified Top-k Subgraph Querying in a Large Graph

Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs

Efficient Subgraph Matching by Postponing Cartesian Products

Adding Counting Quantifiers to Graph Patterns

DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine

Distributed Set Reachability

SESSION: Session 14 - Main Memory Analytics

Fast Multi-Column Sorting in Main-Memory Column-Stores

Elastic Pipelining in an In-Memory Database Cluster

Page As You Go: Piecewise Columnar Access In SAP HANA

Hybrid Garbage Collection for Multi-Version Concurrency Control in SAP HANA

UpBit: Scalable In-Memory Updatable Bitmap Indexing

SESSION: Session 15 - Interactive Analytics

FluxQuery: An Execution Framework for Highly Interactive Query Workloads

iOLAP: Managing Uncertainty for Efficient Incremental OLAP

Dynamic Prefetching of Data Tiles for Interactive Visualization

Expressive Query Construction through Direct Manipulation of Nested Relational Results

Shasta: Interactive Reporting At Scale

Datometry Hyper-Q: Bridging the Gap Between Real-Time and Historical Analytics

SESSION: Session 16 - Streaming 2: Sketches

Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams

Streaming Algorithms for Robust Distinct Elements

Augmented Sketch: Faster and More Accurate Stream Processing

Matrix Sketching Over Sliding Windows

Graph Stream Summarization: From Big Bang to Big Crunch

Scalable Approximate Query Tracking over Highly Distributed Data Streams