Research & Development
We have written 30+ peer-reviewed papers that propel research and development forward. Find out more about some of them.
NoSQL databases power modern web applications like social networks by efficiently partitioning data across nodes for scalability. However, current systems ignore data access patterns, leading to load imbalances and inefficient configurations. This paper demonstrates that optimizing HBase partition placement based on access patterns can boost throughput by 35%. Additionally, it proposes improvements over elastic systems.
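To illustrate the idea of access-aware placement, here is a minimal sketch that assigns partitions to nodes using a greedy "hottest partition first, least-loaded node next" heuristic. It is not the algorithm from the paper and it does not use any HBase API; the function names, the per-partition request rates, and the node count are all hypothetical values chosen for the example.

```python
import heapq


def place_partitions(request_rates, num_nodes):
    """Greedy placement: hottest partition first, onto the least-loaded node.

    request_rates maps a partition name to its observed requests per second
    (hypothetical measurements). Returns {node_id: [partition, ...]}.
    """
    # Min-heap of (current load, node id), so the least-loaded node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = {node: [] for node in range(num_nodes)}

    hottest_first = sorted(request_rates.items(), key=lambda kv: kv[1], reverse=True)
    for partition, rate in hottest_first:
        load, node = heapq.heappop(heap)
        placement[node].append(partition)
        heapq.heappush(heap, (load + rate, node))
    return placement


def node_loads(placement, request_rates):
    """Total request rate served by each node under a given placement."""
    return {node: sum(request_rates[p] for p in parts)
            for node, parts in placement.items()}


if __name__ == "__main__":
    # Hypothetical access pattern: one very hot partition and several cooler ones.
    rates = {"r1": 900, "r2": 500, "r3": 400, "r4": 100, "r5": 100}
    layout = place_partitions(rates, num_nodes=2)
    print(layout)
    print(node_loads(layout, rates))
```

A placement that only balances the number of partitions per node would ignore the fact that `r1` receives far more traffic than the others; weighting by request rate is what lets the load even out.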
This article provides an overview of CumuloNimbo, a platform for multi-tier applications that supports scalable and fault-tolerant OLTP workloads. Its key innovation is offering a standard SQL interface with full transactional support, without sharding or prior workload knowledge. Scalability is achieved by distributing request execution and transaction control across multiple nodes, with data stored in a distributed system.
NoSQL databases were originally designed for specific large-scale applications, where limited query capabilities were acceptable because queries were custom-coded for each application. As NoSQL use broadens, hand-crafted queries are less appealing. This paper addresses this by integrating a full SQL engine on HBase, preserving scalability and schema flexibility. The result is an ANSI SQL-compliant system that, under a TPC-C workload, scales linearly with the number of nodes and outperforms an optimized NoSQL TPC-C implementation for HBase.
The shift to cloud storage raises two challenges: ensuring privacy, confidentiality, and integrity, and tailoring security and performance to diverse needs. This paper presents SafeFS, a modular storage architecture that uses stackable blocks like encryption, replication, and coding to build secure distributed file systems. Implemented with FUSE for remote data access, SafeFS lets users customize storage to their needs. Evaluations show that while each layer adds overhead, it’s possible to create secure, efficient architectures with some surprising tradeoffs.
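The stackable-layer idea can be sketched in a few lines: every layer exposes the same read/write interface and forwards to the layer (or layers) beneath it, so layers can be combined in any order. This is only an illustration under stated assumptions, not SafeFS's actual interface, and it does not use FUSE. All class names are hypothetical, and the XOR layer is a toy stand-in for encryption that provides no real security.

```python
class MemoryBackend:
    """Bottom of the stack: keeps blobs in a dict (stands in for a remote store)."""

    def __init__(self):
        self.blobs = {}

    def write(self, name, data):
        self.blobs[name] = data

    def read(self, name):
        return self.blobs[name]


class XorLayer:
    """Toy 'encryption' layer. XOR with a repeating key is NOT secure."""

    def __init__(self, lower, key):
        if not key:
            raise ValueError("key must not be empty")
        self.lower = lower
        self.key = key

    def _xor(self, data):
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))

    def write(self, name, data):
        self.lower.write(name, self._xor(data))

    def read(self, name):
        return self._xor(self.lower.read(name))


class ReplicationLayer:
    """Writes to every lower layer; reads from the first one that has the blob."""

    def __init__(self, lowers):
        self.lowers = lowers

    def write(self, name, data):
        for lower in self.lowers:
            lower.write(name, data)

    def read(self, name):
        for lower in self.lowers:
            try:
                return lower.read(name)
            except KeyError:
                continue
        raise KeyError(name)


if __name__ == "__main__":
    site_a, site_b = MemoryBackend(), MemoryBackend()
    # Stack: toy encryption on top of replication across two backends.
    fs = XorLayer(ReplicationLayer([site_a, site_b]), key=b"k")
    fs.write("notes.txt", b"hello")
    print(fs.read("notes.txt"))
```

Because each layer only depends on the shared interface, swapping the order (for example, replicating first and transforming each replica separately) changes the cost and guarantees of the stack without changing any layer's code, which is the kind of tradeoff the evaluation explores.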
Very large-scale distributed systems present key research challenges and are crucial for modern applications. Current data stores can’t handle the high churn and dynamics of such environments. This paper introduces a data store based on epidemic (gossip-based) protocols, ensuring data persistence in massive, dynamic systems. An open-source prototype and evaluation are included.
Consensus is crucial in distributed systems, but practical implementation often requires trade-offs. This paper implements and analyzes mutable consensus, a protocol offering flexible trade-offs between decision latency and message complexity. Tested in a large-scale environment, the analysis identifies and addresses practical issues without affecting the protocol’s correctness.
Testing large-scale distributed systems is costly and difficult due to scale and non-determinism. This paper introduces Minha, a framework that simulates distributed environments by virtualizing multiple JVM instances within a single JVM. Minha runs JVM bytecode programs on thousands of virtual nodes and allows global observation in standard testing frameworks. Experiments demonstrate its ability to detect errors and scale tests efficiently with the same hardware.
Cloud storage services like Dropbox and Google Drive are popular for their convenience, but users often overlook privacy risks such as data leaks, subpoenas, or provider access. Solutions like encryption help but reduce features like easy sharing. This report evaluates security mechanisms for cloud storage, analyzing design choices and testing techniques. Results show trade-offs between security guarantees and user privacy needs on commercial platforms.