Tachyon is a fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory, avoiding trips to disk to load frequently read datasets. This lets different jobs, queries, and frameworks access cached files at memory speed.
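To make the caching idea concrete, here is a minimal Java sketch of a read-through, in-memory cache placed in front of a slower store. It assumes least-recently-used (LRU) eviction purely for illustration. The text above does not say which eviction policy Tachyon uses. The class and method names (WorkingSetCache, read, diskReads) are hypothetical and are not Tachyon's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a read-through LRU cache in front of a slow store.
// Not Tachyon code; names are illustrative only.
class WorkingSetCache {
    private final Function<String, byte[]> backingStore; // stands in for a disk read
    private final LinkedHashMap<String, byte[]> memory;
    private int diskReads = 0;

    WorkingSetCache(int capacity, Function<String, byte[]> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true keeps entries ordered by most recent access,
        // so the eldest entry is the least recently used file.
        this.memory = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    byte[] read(String path) {
        byte[] data = memory.get(path);
        if (data == null) {
            // Miss: fetch from the slow store once, then keep it in memory.
            data = backingStore.apply(path);
            diskReads++;
            memory.put(path, data);
        }
        return data;
    }

    int diskReads() { return diskReads; }

    public static void main(String[] args) {
        WorkingSetCache cache = new WorkingSetCache(2, p -> p.getBytes());
        cache.read("/logs/a");
        cache.read("/logs/a"); // served from memory
        cache.read("/logs/b");
        System.out.println(cache.diskReads()); // 2
    }
}
```

Repeated reads of a hot file never reach the backing store. Only misses (and files evicted to make room) pay the slow path.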
Java-like File API: Tachyon’s native API is similar to that of the java.io.File class, providing InputStream and OutputStream interfaces and efficient support for memory-mapped I/O. We recommend using this API to get the best performance from Tachyon.
Compatibility: Tachyon implements the Hadoop FileSystem interface. Therefore, Hadoop MapReduce and Spark can run with Tachyon without modification. However, close integration is required to take full advantage of Tachyon, and we are working towards that. End-to-end latency speedup depends on the workload and the framework, since different frameworks have different execution overheads.
Pluggable underlayer file system: Tachyon checkpoints in-memory data to the underlayer file system. It has a generic interface that makes plugging in an underlayer file system easy. It currently supports HDFS, S3, and single-node local file systems. Support for many other file systems is coming.
Native support for raw tables: Table data with hundreds of columns or more is common in data warehouses. Tachyon provides native support for multi-column data, and users can choose to put only hot columns in memory.
Web UI: Users can easily browse the file system through the web UI. In debug mode, administrators can view detailed information about each file, including its locations, checkpoint path, etc.
Command line interaction: Users can use ./bin/tachyon tfs to interact with Tachyon, e.g. to copy data in and out of the file system.
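The Java-like File API item mentions stream interfaces and memory-mapped I/O. The sketch below shows those two access styles using only standard JDK classes (Files, FileChannel.map) on a local temporary file. It is not Tachyon client code, since Tachyon's own class and method names are not given here. It only illustrates the java.io/java.nio idioms the API is described as resembling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Standard JDK code only; this is not Tachyon's client API.
class FileAccessStyles {
    // Stream-style write through an OutputStream.
    static void writeWithStream(Path path, String text) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Stream-style read through an InputStream (readAllBytes needs Java 9+).
    static String readWithStream(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Memory-mapped read: the file is exposed as a buffer backed by the OS
    // page cache instead of being fetched with explicit read() calls.
    static String readMapped(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes); // copied out here only to build a String
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".txt");
        tmp.toFile().deleteOnExit();
        writeWithStream(tmp, "hello");
        System.out.println(readWithStream(tmp) + " " + readMapped(tmp)); // hello hello
    }
}
```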
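The pluggable underlayer file system item can be illustrated with a small Java sketch. It assumes a hypothetical UnderStore interface (not Tachyon's real one) with a single local-disk implementation. The in-memory tier depends only on that interface, so checkpointing works the same way whichever backend is plugged in. An HDFS- or S3-backed implementation would be another class implementing UnderStore.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: UnderStore, LocalUnderStore and InMemoryTier are
// invented names for illustration and do not match Tachyon's classes.
interface UnderStore {
    void write(String path, byte[] data) throws IOException;
    byte[] read(String path) throws IOException;
}

// One pluggable backend: a single-node local file system under a root directory.
class LocalUnderStore implements UnderStore {
    private final Path root;
    LocalUnderStore(Path root) { this.root = root; }

    private Path resolve(String path) {
        // Drop a leading '/' so the path resolves inside root.
        return root.resolve(path.startsWith("/") ? path.substring(1) : path);
    }

    public void write(String path, byte[] data) throws IOException {
        Path target = resolve(path);
        Path parent = target.getParent();
        if (parent != null) Files.createDirectories(parent);
        Files.write(target, data);
    }

    public byte[] read(String path) throws IOException {
        return Files.readAllBytes(resolve(path));
    }
}

// The memory tier only sees the UnderStore interface.
class InMemoryTier {
    private final Map<String, byte[]> memory = new HashMap<>();
    private final UnderStore under;
    InMemoryTier(UnderStore under) { this.under = under; }

    void write(String path, byte[] data) { memory.put(path, data); }

    // Persist an in-memory file to the underlayer file system.
    void checkpoint(String path) throws IOException {
        byte[] data = memory.get(path);
        if (data == null) throw new IOException("not in memory: " + path);
        under.write(path, data);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("under");
        InMemoryTier tier = new InMemoryTier(new LocalUnderStore(dir));
        tier.write("/data/part-0", "abc".getBytes());
        tier.checkpoint("/data/part-0");
        System.out.println(Files.exists(dir.resolve("data/part-0"))); // true
    }
}
```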
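For the raw-tables item, the following hypothetical sketch shows why storing columns separately helps. A user marks hot columns, which are loaded once and stay in memory, while cold columns are re-read from the slower store on every access. ColumnarTable, markHot, column and slowLoads are illustrative names only, not Tachyon's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of column-granular caching; not Tachyon code.
class ColumnarTable {
    private final Function<String, List<Long>> loadFromDisk;
    private final Map<String, List<Long>> hotColumns = new HashMap<>();
    private int slowLoads = 0;

    ColumnarTable(Function<String, List<Long>> loadFromDisk) {
        this.loadFromDisk = loadFromDisk;
    }

    // Load a column once and keep it resident in memory.
    void markHot(String column) {
        hotColumns.computeIfAbsent(column, c -> {
            slowLoads++;
            return loadFromDisk.apply(c);
        });
    }

    List<Long> column(String column) {
        List<Long> data = hotColumns.get(column);
        if (data != null) return data; // hot column: served from memory
        slowLoads++;                    // cold column: read from the slow store
        return loadFromDisk.apply(column);
    }

    int slowLoads() { return slowLoads; }

    public static void main(String[] args) {
        ColumnarTable t = new ColumnarTable(c -> List.of(1L, 2L, 3L));
        t.markHot("price");
        t.column("price");
        t.column("price");
        t.column("comment");
        System.out.println(t.slowLoads()); // 2
    }
}
```

Memory use scales with the number of hot columns rather than with the full width of the table.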
Running Tachyon Locally: Get Tachyon up and running on a single node for a quick spin in about 5 minutes.
Running Tachyon on a Cluster: Get Tachyon up and running on your own cluster.
Fault Tolerant Tachyon Cluster: Make your cluster fault tolerant.
Running Spark on Tachyon: Get Spark running on Tachyon.
Running Shark on Tachyon: Get Shark running on Tachyon.
Running Hadoop MapReduce on Tachyon: Get Hadoop MapReduce running on Tachyon.
Configuration Settings: How to configure Tachyon.
Command-Line Interface: Interact with Tachyon through the command line.
Syncing the Underlying Filesystem: Make Tachyon aware of an existing underlayer file system.
Tachyon Presentation at Strata and Hadoop World 2013 (October 2013)
Support or Contact
You are welcome to join our mailing list to discuss questions and make suggestions. We use JIRA to track development and issues. If you are interested in trying out Tachyon in your cluster, please contact Haoyuan.
Tachyon is an open source project started in the UC Berkeley AMP Lab. This research and development is supported in part by NSF CISE Expeditions award CCF-1139158 and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, Apple, Inc., Cisco, Clearstory Data, Cloudera, Ericsson, Facebook, GameOnTalis, General Electric, Hortonworks, Huawei, Intel, Microsoft, NetApp, Oracle, Samsung, Splunk, VMware, WANdisco and Yahoo!.
We would also like to thank our project contributors.
Berkeley Data Analysis Stack (BDAS) from AMPLab at Berkeley