Lecture Series by Professor Song Jiang

Published: 2016-07-06

Talk 1: Design and Implementation of Effective Key-Value Systems for Large-scale Data Centers

Speaker: Prof. Song Jiang, Wayne State University

Time: Thursday, July 14, 2016, 2:00 PM

Venue: Lecture Hall, Basement Level 1, Xingzhi Building, Qingyuan Campus, Anhui University

Talk 2: Co-locating Metadata and Data to Improve Efficiency of Virtual Disks

Speaker: Prof. Song Jiang, Wayne State University

Time: Friday, July 15, 2016, 9:00 AM

Venue: Lecture Hall, Basement Level 1, Xingzhi Building, Qingyuan Campus, Anhui University

Host: School of Computer Science and Technology

All faculty and students are welcome to attend!

Office of Science and Technology

July 5, 2016

Abstract of Talk 1:

Data management systems in large-scale data centers are designed for high performance, scalability, and reliability. They play important roles in supporting Internet-wide data-centric computing. A design principle critical to their success is to design according to workload characteristics: the general-purpose, one-size-fits-all approach once used in small-scale systems is no longer cost-effective. Examples of modern, carefully engineered systems include Google’s GFS file system, Facebook’s Haystack photo storage, and Baidu’s Atlas cloud storage system.

In this talk we will describe how rigorous workload characterization is used to design and implement a key-value (KV) system for large-scale data centers. In collaboration with Facebook, our team collected week-long KV access traces from Facebook’s production Memcached system and systematically characterized the workloads. This study revealed several distinct access patterns with significant implications for the design of KV systems: (1) very small KV items are widespread; (2) accesses are highly skewed towards a small set of hot keys in the KV cache; and (3) access traffic can be highly dynamic, with request traffic varying by a factor of two.

Using our understanding of real-world workloads, we designed and implemented the high-performance and resource-efficient zExpander KV cache and the LSM-trie KV store. We will detail how the designs of the two systems were motivated by an understanding of their targeted workloads. Evaluation results show substantially, and sometimes dramatically, improved performance over other state-of-the-art systems. For example, the LSM-trie system can improve the read and write throughput of Google’s LevelDB by up to 10 and 20 times, respectively. We will conclude with a brief overview of our ongoing projects and future visions.

 

Abstract of Talk 2:

Virtual block devices are widely used to provide a block interface to virtual machines (VMs). A virtual block device manages an indirection mapping from the virtual address space presented to a VM to a storage image hosted on a file system or storage volume. This indirection is recorded as metadata on the image, which must be updated immediately upon each space allocation for data safety. Though each update involves only a few bytes of metadata, it demands a random write of an entire block. Furthermore, data consistency demands that the correct order of metadata and data writes be enforced, usually by inserting expensive FLUSH commands between them. These metadata operations compromise the efficiency of virtual devices.

This talk will introduce Selfie, a virtual disk format that eliminates frequent metadata writes by embedding metadata into data blocks. Selfie completes the write of a data block and its associated metadata in one atomic block operation. This is made possible by opportunistically compressing the data in a block to make room for the metadata. Experiments show that Selfie achieves performance improvements of as much as 5x over existing mainstream virtual disks. It delivers near-raw performance and scales well for concurrent I/O workloads.
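The idea of compressing a block to make room for its metadata can be illustrated with a minimal Python sketch. This is not Selfie's actual on-disk format: the 4096-byte block size, the 8-byte metadata field (a single virtual block address), the 2-byte length header, the use of zlib, and the names pack_block and unpack_block are all assumptions made for illustration. A real format would also need a way to tell embedded blocks from raw ones on disk; here that flag is simply returned to the caller.

```python
import os
import struct
import zlib

BLOCK_SIZE = 4096                        # assumed block size
META_FMT = "<Q"                          # assumed metadata: one 64-bit virtual address
META_SIZE = struct.calcsize(META_FMT)
LEN_FMT = "<H"                           # assumed header: length of compressed payload
LEN_SIZE = struct.calcsize(LEN_FMT)


def pack_block(data: bytes, virtual_addr: int):
    """Try to embed metadata in the data block.

    Returns (block, embedded). If the data does not compress enough,
    the raw block is returned with embedded=False, and the caller
    would have to fall back to a separate metadata write.
    """
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    compressed = zlib.compress(data)
    room = BLOCK_SIZE - LEN_SIZE - META_SIZE
    if len(compressed) > room:
        return data, False               # not compressible enough
    header = struct.pack(LEN_FMT, len(compressed))
    padding = bytes(room - len(compressed))
    meta = struct.pack(META_FMT, virtual_addr)
    # Data and metadata now travel together in one block-sized write.
    return header + compressed + padding + meta, True


def unpack_block(block: bytes):
    """Recover (data, virtual_addr) from a block built with embedded=True."""
    (n,) = struct.unpack_from(LEN_FMT, block, 0)
    (virtual_addr,) = struct.unpack_from(META_FMT, block, BLOCK_SIZE - META_SIZE)
    data = zlib.decompress(block[LEN_SIZE:LEN_SIZE + n])
    return data, virtual_addr


if __name__ == "__main__":
    compressible = (b"selfie" * 700)[:BLOCK_SIZE]
    block, embedded = pack_block(compressible, 42)
    print(embedded, len(block), unpack_block(block)[1])

    # Random data is effectively incompressible, so embedding fails.
    _, embedded = pack_block(os.urandom(BLOCK_SIZE), 7)
    print(embedded)
```

In this sketch the block stays exactly one block in size whether or not the metadata is embedded, so the combined write remains a single block operation in the embedded case.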

 

About the speaker:

Dr. Song Jiang is currently an associate professor in the ECE department at Wayne State University. His research interests include system infrastructure for big data processing, such as file and storage systems and data management systems, as well as I/O systems for high-performance computing. He received a US National Science Foundation (NSF) CAREER award in 2009, and his research activities have been continuously supported by the NSF. He has served on numerous conference program committees and proposal review panels. He has collaborated on projects at Facebook and Baidu on providing high-quality Internet-wide services based on big data, resulting in many significant publications at top-tier conferences. Dr. Jiang’s research has had substantial impact in industry, where several of his proposed algorithms for memory and storage management have been officially adopted into mainstream systems, including the Linux kernel, the NetBSD kernel, and the storage engine of MySQL.

He received his B.S. and M.S. degrees from the University of Science and Technology of China, and his Ph.D. in computer science from the College of William and Mary in 2004. From 2004 to 2006 he was a post-doctoral researcher at the Los Alamos National Laboratory, where his research work was cited at the national level as a “success story” of the NNSA Laboratory Directed Research and Development program.

More information about his research can be found at http://www.ece.eng.wayne.edu/~sjiang/

 
