My (almost) full publication list can be accessed here. The following is a list of papers with slides and/or source code available.
[SIGMOD’21] Top-K Deep Video Analytics: A Probabilistic Approach
– Z. Lai, C. Han, C. Liu, P. Zhang, Eric Lo, B. Kao
Abstract: The impressive accuracy of deep neural networks (DNNs) has created great demand for practical analytics over video data. Although efficient and accurate, the latest video analytics systems have not supported analytics beyond selection and aggregation queries. In data analytics, Top-K is a very important analytical operation that enables analysts to focus on the most important entities. In this paper, we present Everest, the first system that supports efficient and accurate Top-K video analytics. Everest ranks and identifies the most interesting frames/moments from videos with probabilistic guarantees. Everest is a system built with a careful synthesis of deep computer vision models, uncertain data management, and Top-K query processing. Evaluations on real-world videos and the latest Visual Road benchmark show that Everest achieves 14.3× to 20.6× higher efficiency than baseline approaches with high result accuracy.
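To give a feel for what "Top-K with probabilistic guarantees" means, the sketch below treats each frame's score as uncertain (Gaussian) and keeps only frames whose estimated probability of belonging to the true Top-K exceeds a threshold. This is a toy Monte Carlo illustration of the idea, not Everest's actual algorithm; the function name, score distributions, and parameters are all hypothetical.

```python
import random

def probabilistic_topk(frame_scores, k, threshold=0.9, trials=1000, seed=0):
    """Estimate, by Monte Carlo sampling, the probability that each frame
    belongs to the true top-k when per-frame scores are uncertain.

    frame_scores: dict frame_id -> (mean, std) of the frame's score.
    Returns {frame_id: probability} for frames with probability >= threshold.
    """
    rng = random.Random(seed)
    hits = {f: 0 for f in frame_scores}
    for _ in range(trials):
        # Draw one plausible "world": a concrete score per frame.
        sampled = {f: rng.gauss(mu, sd) for f, (mu, sd) in frame_scores.items()}
        for f in sorted(sampled, key=sampled.get, reverse=True)[:k]:
            hits[f] += 1
    return {f: hits[f] / trials for f in hits if hits[f] / trials >= threshold}

# Hypothetical per-frame score distributions from some upstream detector.
scores = {"frame_a": (0.9, 0.01), "frame_b": (0.5, 0.05), "frame_c": (0.2, 0.05)}
confident = probabilistic_topk(scores, k=1)
```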
[VLDB’20] Dash: Scalable Hashing on Persistent Memory
– B. Lu, X. Hao, T. Wang, Eric Lo
Abstract: Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence, and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new hash table designs have been proposed, but most of them were based on emulation and perform sub-optimally on real PM. They were also piece-wise and partial solutions that side-step many important properties, in particular good scalability, high load factor, and instant recovery.
We present Dash, a holistic approach to building dynamic and scalable hash tables on real PM hardware with all the aforementioned properties. Based on Dash, we adapted two popular dynamic hashing schemes (extendible hashing and linear hashing). On a 24-core machine with Intel Optane DCPMM, we show that compared to state-of-the-art, Dash-enabled hash tables can achieve up to ∼3.9× higher performance with up to over 90% load factor and an instant recovery time of 57 ms regardless of data size.
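For readers unfamiliar with extendible hashing, one of the two dynamic schemes Dash adapts, here is a minimal in-DRAM sketch: a directory indexed by the low bits of the hash, buckets with local depths, and directory doubling on bucket overflow. It ignores everything that makes Dash interesting on real PM (fingerprinting, load balancing, persistence, instant recovery); all names and the tiny bucket capacity are illustrative.

```python
class Bucket:
    def __init__(self, depth, capacity=2):
        self.depth = depth          # local depth of this bucket
        self.capacity = capacity
        self.items = {}

class ExtendibleHash:
    """Toy in-DRAM extendible hash table; not Dash, just the base scheme."""
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # Use the low `global_depth` bits of the hash as the directory index.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket.items or len(bucket.items) < bucket.capacity:
            bucket.items[key] = value
            return
        self._split(bucket)
        self.put(key, value)        # retry after the split

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory: every entry gains a twin.
            self.directory += self.directory
            self.global_depth += 1
        bucket.depth += 1
        new = Bucket(bucket.depth, bucket.capacity)
        # Re-point the directory entries whose distinguishing bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.depth - 1)) & 1:
                self.directory[i] = new
        # Redistribute the old bucket's items between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v

table = ExtendibleHash()
for i in range(16):
    table.put(i, i * 10)
```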
[AAAI’20] High-Performance Depthwise and Pointwise Convolutions on Mobile Devices
– P. Zhang, Eric Lo, B. Lu
Abstract: Lightweight convolutional neural networks (e.g., MobileNets) are specifically designed to carry out inference directly on mobile devices. Among the various lightweight models, depthwise convolution (DWConv) and pointwise convolution (PWConv) are their key operations. In this paper, we observe that the existing implementations of DWConv and PWConv do not fully utilize the ARM processors in mobile devices: they exhibit many cache misses under multi-core execution and poor data reuse at the register level. We propose techniques to re-optimize the implementations of DWConv and PWConv based on the ARM architecture. Experimental results show that our implementation achieves speedups of up to 5.5× and 2.1× against TVM on DWConv and PWConv, respectively. Using MobileNetV1 as an example, our optimized implementation carries out inference at 64 GFLOPS, a figure that almost hits the roofline of ARM processors.
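To make the two operations concrete: a depthwise convolution filters each channel independently with its own kernel, while a pointwise (1×1) convolution mixes channels per pixel. The naive pure-Python sketch below shows only the semantics; the paper's contribution is how to lay out and schedule these loops for ARM caches, registers, and SIMD, none of which is reflected here.

```python
def depthwise_conv2d(inp, kernels):
    """Naive depthwise convolution: each input channel is convolved with
    its own kernel (no cross-channel summation, unlike standard conv).
    inp:     list of C channels, each an H x W list of lists.
    kernels: list of C kernels, each a K x K list of lists.
    Returns C output channels of size (H-K+1) x (W-K+1) ('valid' padding).
    """
    out = []
    for channel, kernel in zip(inp, kernels):
        h, w, k = len(channel), len(channel[0]), len(kernel)
        plane = [[sum(channel[i + di][j + dj] * kernel[di][dj]
                      for di in range(k) for dj in range(k))
                  for j in range(w - k + 1)]
                 for i in range(h - k + 1)]
        out.append(plane)
    return out

def pointwise_conv(inp, weights):
    """Naive pointwise (1x1) convolution: a per-pixel linear mix of the
    C_in input channels into C_out channels. weights: C_out x C_in matrix."""
    h, w = len(inp[0]), len(inp[0][0])
    return [[[sum(wgt[c] * inp[c][i][j] for c in range(len(inp)))
              for j in range(w)]
             for i in range(h)]
            for wgt in weights]
```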
[TODS’19 (SIGMOD’19)] Historic Moments Discovery in Sequence Data
– Ran Bai, WK Hon, Eric Lo, Zhian He, Kenny Zhu
Abstract: Many emerging applications are based on finding interesting subsequences in sequence data. Finding "prominent streaks", sets of longest contiguous subsequences with values all above (or below) a certain threshold, is one such problem that has received much attention. Motivated by real applications, we observe that prominent streaks alone are not insightful enough and call for a companion concept, which we coin "historic moments". In this paper, we present an algorithm to efficiently compute historic moments from sequence data. The algorithm is incremental and space-optimal, meaning that on new data arrival it can efficiently refresh the results while keeping only minimal information. Case studies show that historic moments can significantly improve the insights offered by prominent streaks alone. Furthermore, experiments show that our algorithm outperforms the baseline in both time and space.
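The flavor of the problem can be conveyed with a one-pass, constant-space scan that reports each point at which the current above-threshold streak becomes the longest seen so far. This is a simplified illustration in the spirit of incremental streak maintenance, not the paper's actual definition of historic moments or its algorithm.

```python
def streak_moments(values, threshold):
    """Single pass over a sequence, reporting every point at which the
    running streak of values above `threshold` becomes the longest seen
    so far. Constant extra space: only the best and current streak lengths
    are kept between arrivals. Returns a list of (index, streak_length).
    """
    best, current, moments = 0, 0, []
    for i, v in enumerate(values):
        current = current + 1 if v > threshold else 0
        if current > best:              # a new record-setting streak
            best = current
            moments.append((i, current))
    return moments
```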
[SIGMOD’16] Fast Multi-column Sorting in Main-Memory Column-Stores
– Wenjian Xu, ZiQiang Feng, Eric Lo
Abstract: Sorting is a crucial operation that can be used to implement SQL operators such as GROUP BY, ORDER BY, and SQL:2003 PARTITION BY. Queries with multiple attributes in those clauses are common in real workloads. When executing queries of that kind, state-of-the-art main-memory column-stores require one round of sorting per input column. With the advent of recent fast scans and denormalization techniques, that kind of multi-column sorting could become a bottleneck. In this paper, we propose a new technique called "code massaging", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data-level parallelism. Empirical results show that a main-memory column-store with code massaging can achieve speedups of up to 4.7X, 4.7X, 4X, and 3.2X on TPC-H, TPC-H skew, TPC-DS, and a real workload, respectively.
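The core intuition behind eliminating rounds of sorting can be illustrated with simple bit-packing: concatenate several narrow column values into a single integer key so that one sort pass orders rows by all columns at once. The sketch below shows only this packing idea; the paper's code massaging additionally manipulates bits to raise SIMD data-level parallelism, which plain Python cannot show.

```python
def pack_keys(columns, bit_widths):
    """Bit-concatenate several narrow column values into one integer per
    row, so a single sort orders rows by (col1, col2, ...) instead of one
    sorting round per column. `bit_widths[i]` is the bit budget of column i.
    """
    packed = []
    for row in zip(*columns):
        key = 0
        for value, width in zip(row, bit_widths):
            assert 0 <= value < (1 << width), "value exceeds its bit budget"
            key = (key << width) | value
        packed.append(key)
    return packed

c1 = [3, 1, 1, 2]       # major sort column (e.g. a GROUP BY key)
c2 = [0, 7, 2, 5]       # minor sort column
# One sort on the packed keys replaces a sort round per column.
order = sorted(range(4), key=pack_keys([c1, c2], [4, 4]).__getitem__)
```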
[SIGMOD’15] ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout
– ZiQiang Feng, Eric Lo, Ben Kao, Wenjian Xu
Abstract: Scan and lookup are two core operations in main-memory column stores. A scan operation scans a column and returns a result bit vector that indicates which records satisfy a filter. Once a column scan is completed, the result bit vector is converted into a list of record numbers, which is then used to look up values from other columns of interest for a query. Recently, several in-memory data layouts have been proposed to improve the performance of in-memory data processing. However, these solutions all stand at one end of a trade-off: each is good in either lookup performance or scan performance, but not both. In this paper we present ByteSlice, a new main-memory storage layout that supports both highly efficient scans and lookups. ByteSlice is a byte-level columnar layout that fully leverages SIMD data-parallelism. Micro-benchmark experiments show that ByteSlice achieves a data scan speed of less than 0.5 processor cycles per column value, a new limit of main-memory data scanning, without sacrificing lookup performance. Our experiments on TPC-H data and real data show that ByteSlice offers significant performance improvement over all state-of-the-art approaches.
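The idea of a byte-level layout can be sketched as follows: each 32-bit value is split across four "slices", one per byte, and a predicate is evaluated slice by slice from the most significant byte, so most rows are decided after touching only one or two of their bytes. This scalar Python sketch conveys only that early-stopping idea; the real system evaluates many values per instruction with SIMD.

```python
def byteslice(values, width=4):
    """Vertically partition `width`-byte values into byte slices:
    slice j holds byte j (most significant first) of every value."""
    return [bytes((v >> (8 * (width - 1 - j))) & 0xFF for v in values)
            for j in range(width)]

def scan_less_than(slices, constant, width=4):
    """Evaluate `value < constant` over byte slices, most significant byte
    first. A row is decided as soon as one of its bytes differs from the
    constant's byte, so later slices are only consulted for ties."""
    n = len(slices[0])
    result = [False] * n
    undecided = set(range(n))
    for j in range(width):
        cbyte = (constant >> (8 * (width - 1 - j))) & 0xFF
        for i in list(undecided):
            b = slices[j][i]
            if b != cbyte:               # decided at this byte: stop early
                result[i] = b < cbyte
                undecided.discard(i)
    # Rows still undecided equal the constant, hence are not less-than.
    return result
```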
[SIGMOD’13] Parallel Analytics as a Service
– Petrie Wong, Zhian He, Eric Lo
Abstract: Recently, massively parallel processing relational database systems (MPPDBs) have gained much momentum in the big data analytics market. With the advent of hosted cloud computing, we envision that the offering of MPPDB-as-a-Service (MPPDBaaS) will become attractive to companies that run analytical tasks on only hundreds of gigabytes to tens of terabytes of data, because they can enjoy high-end parallel analytics at a low cost. This paper presents Thrifty, a prototype implementation of MPPDB-as-a-Service. The major research issue is how to achieve a lower total cost of ownership by consolidating thousands of MPPDB tenants onto a shared hardware infrastructure, with a performance SLA that guarantees the tenants obtain their query results as if they were executing their queries on dedicated machines. Thrifty achieves this goal with a tenant-driven design that includes (1) a cluster design that carefully arranges the nodes in the cluster into groups and creates an MPPDB for each group of nodes, (2) a tenant placement that assigns each tenant to several MPPDBs (for high-availability service through replication), and (3) a query routing algorithm that routes a tenant's query to the proper MPPDB at run-time. Experiments show that in an MPPDBaaS with 5000 tenants, where each tenant requests an MPPDB of 2 to 32 nodes to query against 200GB to 3.2TB of data, Thrifty can serve all the tenants with a 99.9% performance SLA guarantee and a high-availability replication factor of 3, using only 18.7% of the nodes requested by the tenants.
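The run-time routing step (3) can be illustrated minimally: each tenant is placed on several MPPDBs, and an incoming query is routed to the least-loaded replica. The placement table, load metric, and function below are hypothetical simplifications; Thrifty's actual router must also honor the performance SLA.

```python
def route(tenant, placement, load):
    """Route a tenant's query to the least-loaded MPPDB holding a replica
    of that tenant's data, and charge one unit of load to it.

    placement: dict tenant -> list of MPPDB ids hosting its replicas.
    load:      dict MPPDB id -> current load (an illustrative metric).
    """
    replicas = placement[tenant]
    target = min(replicas, key=lambda m: load[m])   # least-loaded replica
    load[target] += 1
    return target

placement = {"t1": ["A", "B", "C"]}   # replication factor 3
load = {"A": 5, "B": 2, "C": 9}
target = route("t1", placement, load)
```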
[ICDE’12] Answering Why-Not Questions on Top-K Queries
– Zhian He, Eric Lo
Abstract: After decades of effort on database performance, the quality and usability of database systems have received more attention in recent years. In particular, the feature of explaining missing tuples in a query result, the so-called "why-not" questions, has recently become an active topic. In this paper, we study the problem of answering why-not questions on top-k queries. Our motivation is that many users like to use top-k queries when making multi-criteria decisions. However, they often feel frustrated when asked to quantify their preferences as a set of numeric weightings, and even more frustrated when the query results do not include their expected answers. In this paper, we use the query-refinement method to approach the problem. Given as inputs the original top-k query and a set of missing tuples, our algorithm returns to the user a refined top-k query that includes the missing tuples. A case study and experimental results show that our approach returns high-quality explanations to users efficiently.
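One dimension of query refinement can be sketched directly: keeping the weights fixed, find the smallest k' such that the top-k' result contains the missing tuple. The paper's method also refines the weight vector itself; the function and data below only illustrate the simpler dimension and are hypothetical.

```python
def refine_k(tuples, weights, k, missing):
    """For a linear top-k query score(t) = sum(w_i * t_i), return the
    smallest k' >= k whose top-k' result contains `missing`, with the
    weight vector held fixed. `missing` must be a tuple in the data."""
    score = lambda t: sum(w * x for w, x in zip(weights, t))
    ranked = sorted(tuples, key=score, reverse=True)
    rank = ranked.index(missing) + 1      # 1-based rank under these weights
    return max(k, rank)

tuples = [(9, 1), (6, 3), (2, 8), (1, 2)]
# Under weights (1, 1) the tuple (1, 2) ranks last, so a top-2 query
# must be refined to top-4 before (1, 2) appears in the result.
```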
Notice: The materials contained here are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders.
Disclaimer: By accessing the code, it is understood that all code was written by my research personnel, who are fully accountable for the validity of the results.