BitTorious volunteer: server-side extensions for centrally-managed volunteer storage in BitTorrent swarms
© Lee and Dinu. 2015
Received: 13 March 2015
Accepted: 14 October 2015
Published: 4 November 2015
Our publication of the BitTorious portal  demonstrated the ability to create a privatized distributed data warehouse of sufficient magnitude for real-world bioinformatics studies using minimal changes to the standard BitTorrent tracker protocol. In this second phase, we release a new server-side specification to accept anonymous philantropic storage donations by the general public, wherein a small portion of each user’s local disk may be used for archival of scientific data. We have implementated the server-side announcement and control portions of this BitTorrent extension into v3.0.0 of the BitTorious portal, upon which compatible clients may be built.
Automated test cases for the BitTorious Volunteer extensions have been added to the portal’s v3.0.0 release, supporting validation of the “peer affinity” concept and announcement protocol introduced by this specification. Additionally, a separate reference implementation of affinity calculation has been provided in C++ for informaticians wishing to integrate into libtorrent-based projects.
The BitTorrent “affinity” extensions as provided in the BitTorious portal reference implementation allow data publishers to crowdsource the extreme storage prerequisites for research in “big data” fields. With sufficient awareness and adoption of BitTorious Volunteer-based clients by the general public, the BitTorious portal may be able to provide peta-scale storage resources to the scientific community at relatively insignificant financial cost.
KeywordsData transfer Bioinformatics Big data Software Open source BitTorrent BitTorious Citizen scientist
Individual .torrent files hold compartmentalized metadata treating the torrent payload as a sequence of equally-sized “pieces”, calculated at time of torrent creation. Hashes for each piece are included in the .torrent metadata file, which allow clients to validate the correctness and completeness of transferred data.
Under traditional BitTorrent use, users run individual BitTorrent clients with the intention of downloading the entirety of a given torrent’s payload to local disk, often seeding those data to other peers. That is, every piece listed in the .torrent file will be download and assembled by each client using the long-standardized and well understood BitTorrent specification. Our BitTorious v2.x.x portal release  supports these use cases while introducing strict publisher and subscriber role semantics as an access control layer not present in most BitTorrent networks. (The reader is encouraged to review this work  for a deeper explanation of core BitTorious concepts, user role types and their relation to standard BitTorrent.)
The needs of a client in a volunteer storage grid are fundamentally different from most studies of peer behavior [2–4]. We cannot assume the user desires to provide storage for the entirety of a single torrent, even if they have sufficient disk space as well as available bandwidth. For example, a user donating storage towards a typical WGS project will almost certainly not want to contribute more than a mere fraction of the disk space required for even a single patient study. For this reason alone, existing BitTorrent portals oriented toward distribution of scientific data cannot reasonably expect participation from citizen scientists, as there are no built-in mechanisms for centrally access controlled, partial replication.
Additionally, piece selection mechanisms vary between client implementations and target audience. A client oriented towards distribution of video files may, for example, deviate from standard piece selection by favoring pieces at the beginning of each file, thus supporting real-time video streaming prior to download completion. Any such piece and torrent prioritization algorithms active within a given BitTorrent client are or little, if any, use to a BitTorious client, and are likely to impair the resiliency of the BitTorious network by disproportionately over-replicating certain pieces since BitTorrent client developers tend to assume that the user chooses the torrents to join, not the tracker.
BitTorious v3.0.0 addresses these core usage differences using the publisher/subscriber security model unchanged from 2.x.x releases.
Building upon the existing role-based, per-user-per-feed security model supported by the BitTorious portal , we introduce several new concepts and capabilities in order to support controlled, partial replication.
Publisher feed configuration
Enable Public Archiving (default: disabled) - When enabled, all volunteer clients will be able to donate storage to the feed. When enabled, the “private” flag will not be set for each torrent, which is automatic in BitTorrious v2.x.x.
Target Replication Percentage (default: 20 %, min: 1 %, max: 100 %) – Approximate percentage of pieces that a volunteer should replicate for a given torrent. The effective piece count will always be rounded up, and applies to all torrents published to the feed. A low percentage increases the capacity of the network at the expense of limited piece replication, while a higher value favors high piece replication to maximize availability at the expense of capacity.
BitTorious v3.0.0 introduces a concept we call “piece affinity”, which must be understood and calculated by the portal as well as fully compatible peers. The affinity of a peer declares the pieces it intends to replicate and seed. Affinity values are calculated via a simple, pre-defined function based on publisher-defined feed configuration as well as peer ID, thus enabling coordinated partial replication across the entire network and a limited degree of peer-peer affinity enforcement as every peer is able to calculate the affiny of every other peer without querying the portal.
All BitTorrent swarms require that all N pieces of a torrent must be of the same length, in bytes. (BitTorious recommends a piece size of exactly 4MiB, i.e., 2 ^ 22 bytes). Unlike standard BitTorrent, volunteer clients should only download pieces to which they are affine, though this cannot be enforced by the server since P2P transfers do not, by definition, route through the server. Also, strict server enforcement would likely introduce incompatibilities with normal, non-BitTorious-aware BitTorrent clients.
Inform the tracker of configuration parameters when announcing.
Obey a user-configurable local storage limit.
Delete non-affine pieces if and only if disk space is required for affine pieces or requested by the user.
Keep all portal/tracker interactions over HTTPS.
Delete all pieces for a given torrent when the time since last successful announce passes a pre-set time period.
Only send a given peers pieces to which it’s affine, regardless of what peers request.
Outside of these core semantic differences, a volunteer client uses the existing BitTorrent peer wire protocol.
Volunteer announce parameters
Only a value of “1” will be recognized by the tracker as a volunteer announcement.
The hard storage limit, in bytes, as set by the user.
The amount of storage, in bytes, used by the client on the user’s machine.
To facilitate the tracking process, the BitTorious tracker observes a “volunteer = 1” option. Note that the semantic of the “left” value (the number of missing bytes across the entire torrent) submitted on peer announcement is unchanged, even if the client never intends on replicating remaining pieces. That is, it is perfectly normal for a volunteer client’s “left” to never reach zero, and a value below a calculatable threshold to be a symptom of a misbehaving client.
A client should attempt to locally replicate a given piece number, X, if and only if the following function evaluates to true:
should_replicate (X) : = (X > = A & & X < = L) || (N < = L & & X < = L ‐ N)
The portal may, in future releases, provide a piece prioritization component by returning a dynamic piece acquistion priority order as part of the volunteer announcement response, but expects no specific piece prioritization in v3.x.x releases. Leaving piece order in the hands of client implementations allows for use of existing piece selection and peer choke/unchoke logic inherent to every BitTorrent client, given those pieces are within the assigned affiny range (see Fig. 5). In such event, the tracker will return an ordered array within the root level of the dictionary keyed by “piece_priority” when the client “volunteer” bit is present in the announcement as defined above, with higher priority pieces listed first. The client should obey this request when observed, but is not strictly required to do so since implementation inherently requires further modification of peer choking/unchocking determination.
We demonstrate affinity calculation with two reference implementations, both distributed as part of the portal source code. The first is written in Ruby and built into the portal’s peer registration logic. Automated regression test cases are provided to validate correct volunteer announcement extension behavior as well as server-side affinity calculation.
The second is a standalone command-line application written in C++ to quickly test how affinity will be calculated by the server (and compatible clients) for any valid feed configuration that may be easily integrated into existing C/C++ BitTorrent clients, such as those based on the popular “libtorrent” C++ library. Both implementations are released under unencumbered Open Source-compatible licenses.
Notable existing computing platforms such as BOINC have long proved the viability of using crowd sourced resources in real-world research, but have evolved with a strong skew towards time-heavy computing: algorithms that divide a problem into small distributed computing tasks later assembled to conquer the whole. Users of BOINC must abide by BOINC-specific API and prototcol mechanisms [5, 6] only well known within scientific communities. BitTorious is equal and opposite in this regard, providing centralized control of a storage “grid”, but designed foremost to address space-heavy storage of scientific data, not CPU cycles. BitTorious maintains direct compatibility with the most popular, widely known and best understood algorithms – BitTorrent – and leaves the existing core “peer wire protocol” unchanged, allowing voluminous amounts of research from other BitTorrent-based systems to be adapted far easier .
Developers implementing compatible clients are strongly encouraged to allow user control of when the client runs, where on local disk to put the data, a hard upper donation limit, GUI mechanism for selecting the feed(s) to which the resources will be donated, and automatic disabling of transfers while on “metered” Internet connections. Additionally, support for uPnP of NAT devices is very strongly recommended to facilitate inbound network connections without manual port forwarding configuration . Failure to address the reality of home firewalls will significantly limit the overall network performance of a BitTorious-compatible client.
In addition to clear functional potential, we observe our place in human evolution as marked by the advent of the citizen scientist. Legal, philosophical and ethical battles regarding regulation of “medical devices” (as they pertain to bioinformatics specifically) and digital governance (of BitTorrent networks in general [9, 10]) in American government are far from over, but the expectation of direct engagement with scientific communities has been set. Philanthropically, users expect voluntary contributions to be direct, online, and with clear accountability of how contributions are allocated. BitTorious’ v3.0.0 extensions fall in line with these cultural expectations by enabling the public to individually donate local storage resources to specific projects of their choosing via standards-compliant clients provided by the research community. Such a simple, direct and individually actionable mechanism for volunteer storage does not exist in practice.
The value of networks such as BitTorious, while not constrained by any fundamental limits of technological possibility, is limited by the magnitude of its user base. Any such effort to build a significantly sized storage network based on BitTorious must be met with a proportional effort in volunteer recruitment. Notwithstanding, introduction of a simple piece affinity mechanism as presented here is paramount to respecting the generous but limited contributions of volunteer peers. Without such a partial replication function, any general-purpose P2P technology is unlikely to be met with success in big data fields where payload size often exceeds locally available storage resources by an order of magnitude or more.
The adaptation of BitTorrent algorithms and protocols to public volunteer computing is important in its own right, but more importantly, has far-reaching potential to change the fundamental economics of 21st-century science. To the scientific community, the potential cost savings resultant of using a large volunteer network of partial-replication peers is most dramatic when considered at scale. For example, a theoretical personalized medicine trial syndicating a total of 1 PB of data into a single feed across 1000 torrents would average 1 TB of individual study data per torrent. At a minimum 1 % default Target Replication Percentage and volunteer maximum donation mode of 20GB, peers would hit their local 20GB device limit before reaching the 1 %-per-torrent (10GB) target threshold. To achieve 4x redundancy of each byte in the entire syndicated feed – that is, 4 distributed copies exclusive of the original seed copy – approximately 2 million volunteer devices are needed on the network. While lofty, this is significantly less than the 3 million active volunteer compute devices on the World Community Grid  network. Using 1 PB of Amazon S3 reduced-redundancy pricing as a baseline, we estimate the annual storage savings of this network alone to be in excess of $300,000 (USD) annually.
Openly available P2P technologies have disrupted the entertainment industry enough to create entirely new means of distributing artistic works, and BitTorious aims to do the same for scientific data. BitTorious decouples distributed storage from any singular domain, geographic, institutional or national interest, and if run at an adoption scale similar to World Community Grid would provide peta-scale distribution and archival resources to scientists in every domain worldwide.
Availability and requirements
Project name: BitTorious
Operating system(s): Both portal and clients are platform independent.
Programming language: Ruby on Rails, AngularJS, Bootstrap, PostgreSQL.
Other requirements: Ruby 2.2.1 or higher.
Any restrictions to use by non-academics: None
We would like to thank the Arizona State University College of Health Solutions. BitTorious research and development is self-funded.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Lee PV, Dinu V. BitTorious: global controlled genomics data publication, research and archiving via BitTorrent extensions. BMC Bioinformatics. 2014;15(1):424. doi:10.1186/s12859-014-0424-9.View ArticlePubMedGoogle Scholar
- Kash IA, Lai JK, Zhang H, Zohar A. Economics of BitTorrent Communities. New York, New York, USA: ACM; 2012. p. 221–30. doi:10.1145/2187836.2187867.Google Scholar
- Adamsky F, Khayam SA, Jager R, Rajarajan M. Who Is Going to Be the Next BitTorrent Peer Idol? 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing (EUC). 2014:293–298. doi:10.1109/EUC.2014.50.
- Cai QC, Lo KT. An analysis of user behavior in a private BitTorrent community. Int J Commun Syst. 2014;27(10):1572–81. doi:10.1002/dac.2420.View ArticleGoogle Scholar
- Anderson DP. Boinc: A system for public-resource computing and storage. GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing. 2004;4-10. doi:10.1109/GRID.2004.14.
- Elwaer A, Harrison A, Kelley I, Taylor I. Attic: A Case Study for Distributing Data in BOINC Projects. Distributed Processing, Workshops and Phd Forum (IPDPSW). 2011:1863–1870. doi:10.1109/IPDPS.2011.348.
- Leon X, Chaabouni R, Sanchez-Artigas M, Garcia-Lopez P. Smart Cloud Seeding for BitTorrent in Datacenters. IEEE Internet Computing. 2014;18(4):47–54. doi:10.1109/MIC.2014.43.View ArticleGoogle Scholar
- Liu Y, Chang L, Pan J. On the performance and fairness of BitTorrent-like data swarming systems with NAT devices. Comput Netw. 2014;59:197–212. doi:10.1016/j.bjp.2013.11.005.View ArticleGoogle Scholar
- Druckerman JA. The Uncertifiable Swarm: Why Defendant Class Actions and Mass BitTorrent Copyright Litigation Don't Mix. NYL Sch L Rev. Hein Online. 2013;931.Google Scholar
- Foreman VS. Problems with BitTorrent Litigation in the United states: Personal Jurisdiction, Joinder, Evidentiary Issues, and Why the Dutch Have a Better System. Wash U Global Stud L Rev. Hein Online. 2014;127.Google Scholar
- World Community Grid: Global Statistics. Downloaded September 3, 2015 from http://www.worldcommunitygrid.org/stat/viewGlobal.do; 2015.