Tachyon Overview

Tachyon is a memory-centric distributed storage system that enables reliable data sharing at memory speed across cluster frameworks such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory, so frequently read datasets do not have to be loaded from disk. This lets different jobs, queries, and frameworks access cached files at memory speed.

Tachyon is Hadoop-compatible. Existing Spark and MapReduce programs can run on top of it without any code changes. The project is open source (Apache License 2.0) and is deployed at multiple companies. It has more than 80 contributors from over 30 institutions, including Yahoo, Intel, Red Hat, and Tachyon Nexus. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and is also part of the Fedora distribution.
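Because Tachyon is Hadoop-compatible, pointing an existing job at it typically comes down to changing the file path rather than the program. The sketch below is only an illustration of that idea: the helper name `to_tachyon_uri`, the master address `localhost:19998`, and the example HDFS path are all assumptions, not part of Tachyon itself.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed address of a Tachyon master, used purely for illustration.
TACHYON_AUTHORITY = "localhost:19998"


def to_tachyon_uri(path, authority=TACHYON_AUTHORITY):
    """Rewrite a Hadoop-style URI so it uses the tachyon:// scheme.

    Hypothetical helper: only the scheme and host:port are swapped,
    the file path itself is left unchanged.
    """
    parts = urlsplit(path)
    return urlunsplit(("tachyon", authority, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Example input path is made up for this sketch.
    print(to_tachyon_uri("hdfs://namenode:9000/data/events.log"))
```

In a Spark program, the resulting URI could then be passed wherever a Hadoop path is accepted, for example `sc.textFile(...)`, assuming the Tachyon client is available on the cluster's classpath.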

GitHub Repository | Releases and Downloads | User Documentation | Developer Documentation | Meetup Group | JIRA | User Mailing List

Current Features

User Documentation

Deployment Guide:

Configuration:

Frameworks on Tachyon:

Others:

Tachyon Presentations:

Developer Documentation

Contributing to Tachyon

Building Tachyon Master Branch

External resources

Tachyon Mini Courses:

Hot Rod Hadoop With Tachyon on Fedora 21

Support or Contact

You are welcome to join our mailing list to discuss questions and make suggestions. We use JIRA to track development and issues. If you are interested in trying out Tachyon in your cluster, please contact Haoyuan.

Acknowledgement

Tachyon is an open source project started in the UC Berkeley AMP Lab. This research is supported in part by NSF CISE Expeditions Award CCF-1139158, LBNL Award 7076018, and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adatao, Adobe, Apple, Inc., Blue Goji, Bosch, C3Energy, Cisco, Cray, Cloudera, EMC, Ericsson, Facebook, Guavus, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Splunk, Virdata, VMware, and Yahoo!.

We would also like to thank our project contributors.

Related Projects

Berkeley Data Analytics Stack (BDAS) from AMPLab at UC Berkeley