Over the past two to three years a number of scientific disciplines have purchased large instruments capable of generating very large datasets. Alongside this increase in data production comes a requirement for the data to be processed and analysed. Because generation and processing capabilities are located at different sites around the UK, there is a growing need within the community to transfer large (>1 TB) files, or many such files, in the shortest possible time.
The US Energy Sciences Network (ESnet) operates a number of throughput test servers that Janet users can use to check transatlantic throughput between hosts. These data transfer nodes (DTNs) use GridFTP as their data transfer application and host test files ranging in size from 1 MB to 100 GB. The servers are located at Lawrence Berkeley National Laboratory, California; Argonne National Laboratory, near Chicago; and Brookhaven National Laboratory, New York.
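As a rough illustration of the kind of check these test servers enable, here is a small sketch that times a single-stream download and reports the achieved throughput. The host name and test-file path are hypothetical placeholders, and the sketch uses plain HTTP purely for simplicity; in practice you would point a GridFTP client at one of the DTNs listed above.

```python
import time
import urllib.request

# Hypothetical placeholder URL -- substitute the address and test-file path
# of the DTN you actually want to test against.
TEST_FILE_URL = "http://example-dtn.es.net/testfiles/1G.dat"

def measure_throughput(url: str, chunk_size: int = 1 << 20) -> float:
    """Download a test file and return the achieved throughput in Mbit/s."""
    start = time.monotonic()
    total_bytes = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.monotonic() - start
    return (total_bytes * 8) / (elapsed * 1_000_000)

if __name__ == "__main__":
    print(f"Achieved throughput: {measure_throughput(TEST_FILE_URL):.1f} Mbit/s")
```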
Here is an archive of a historic working group organised by Mark Leese at the Council for the Central Laboratory of the Research Councils (CCLRC), now part of STFC.
*The following text is taken from the web archive of the group*
Networks for Non-Networkers 2 (NFNN2) was a one-and-a-half-day workshop for people working at the technical level in high-bandwidth-dependent science. It was not aimed at network researchers or networking experts, but at people trying to use the network for science; people with questions like:
GridFTP is a high-performance, reliable extension of the widely used File Transfer Protocol (FTP). GridFTP uses single or multiple parallel streams to transfer large volumes of data between hosts. Here is a link to best practice on using and tuning GridFTP from Argonne National Laboratory in the US.
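As a rough sketch of how those parallel streams are requested in practice, the snippet below drives the globus-url-copy client from Python with two commonly cited tuning options (-p for the number of parallel streams, -tcp-bs for the TCP buffer size). The endpoints are hypothetical and the option values are illustrative rather than recommendations; the Argonne guide linked above is the authoritative reference.

```python
import subprocess

# Hypothetical endpoints -- replace with real GridFTP URLs for your sites.
SOURCE = "gsiftp://source.example.ac.uk/data/large-file.dat"
DEST = "gsiftp://dest.example.ac.uk/data/large-file.dat"

cmd = [
    "globus-url-copy",
    "-vb",                      # report transfer performance while running
    "-p", "4",                  # number of parallel TCP streams (illustrative)
    "-tcp-bs", "16777216",      # TCP buffer size per stream, in bytes (16 MB, illustrative)
    SOURCE,
    DEST,
]

subprocess.run(cmd, check=True)
```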
bbcp, the BaBar copy program, is widely used on *nix operating systems and can be used to move large amounts of data. Getting the configuration right can significantly increase the throughput between two hosts. Here is a guide from Caltech in the US on how to do this:
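As a hedged sketch of the sort of invocation the Caltech guide discusses, the snippet below calls bbcp from Python with multiple streams (-s) and an enlarged TCP window (-w). The host name, paths, and values are placeholders for illustration only, not tuned recommendations.

```python
import subprocess

# Hypothetical source path and destination host -- substitute your own.
SOURCE_PATH = "/data/large-file.dat"
DEST = "user@dest.example.ac.uk:/data/large-file.dat"

cmd = [
    "bbcp",
    "-P", "10",   # print progress every 10 seconds
    "-s", "8",    # number of parallel TCP streams (illustrative)
    "-w", "8M",   # TCP window size per stream (illustrative)
    SOURCE_PATH,
    DEST,
]

subprocess.run(cmd, check=True)
```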
Here is an extremely well put together and useful web resource. The Energy Sciences Network (ESnet) developed this knowledge base, which provides advice on:
- Network Architecture, including the Science DMZ model
- Host Tuning
- Network Tuning
- Data Transfer Tools
- Network Performance Testing
The knowledge base can be found at http://fasterdata.es.net/. A small host-tuning sketch follows below.
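Much of the fasterdata host-tuning advice concerns TCP buffer sizes. As a small application-level sketch (system-wide tuning is normally done through sysctl on Linux rather than in code), the snippet below requests larger socket buffers and prints what the kernel actually granted. The size shown is illustrative, not a recommendation; follow the fasterdata guidance for real values.

```python
import socket

# Illustrative buffer size request (16 MB); real values should follow the
# bandwidth-delay product of your path and the fasterdata guidance.
REQUESTED_BYTES = 16 * 1024 * 1024

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED_BYTES)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED_BYTES)

# The kernel may clamp the request to its configured maximum
# (net.core.wmem_max / net.core.rmem_max on Linux).
print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

sock.close()
```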
Hi all,
I am grateful to Stephen Booth from EPCC, who has forwarded a blog article he recently posted on file transfer technologies for HPC. EPCC is based at the University of Edinburgh and operates the UK's national supercomputer, HECToR.
The article can be found here