CollecTor fetches data from various nodes and services in the public Tor network and makes it available to the world. If you're doing research on the Tor network, or if you're developing an application that uses Tor network data, this is your place to start.
Descriptors are available in two different file formats: recent descriptors that were published in the last 72 hours are available as plain text, and archived descriptors covering over 10 years of Tor network history are available as compressed tarballs.
Descriptor Type | Type Annotation | Recent Descriptors | Archived Descriptors | Data Format |
---|---|---|---|---|
Relay Server Descriptors | @type server-descriptor 1.0 | recent | archive | format |
Relay Extra-info Descriptors | @type extra-info 1.0 | recent | archive | format |
Network Status Consensuses | @type network-status-consensus-3 1.0 | recent | archive | format |
Network Status Votes | @type network-status-vote-3 1.0 | recent | archive | format |
Directory Key Certificates | @type dir-key-certificate-3 1.0 | | archive | format |
Microdescriptor Consensuses | @type network-status-microdesc-consensus-3 1.0 | recent | archive | format |
Microdescriptors | @type microdescriptor 1.0 | recent | archive | format |
Version 2 Network Statuses | @type network-status-2 1.0 | | archive | format |
Version 1 Directories | @type directory 1.0 | | archive | format |
Bridge Network Statuses | @type bridge-network-status 1.1 | recent | archive | format |
Bridge Server Descriptors | @type bridge-server-descriptor 1.2 | recent | archive | format |
Bridge Extra-info Descriptors | @type bridge-extra-info 1.3 | recent | archive | format |
Hidden Service Descriptors | @type hidden-service-descriptor 1.0 | | | format |
Bridge Pool Assignments | @type bridge-pool-assignment 1.0 | | archive | format |
Exit Lists | @type tordnsel 1.0 | recent | archive | format |
Torperf Measurement Results | @type torperf 1.0 | recent | archive | format |
Each descriptor provided here contains an @type annotation using the format @type $descriptortype $major.$minor. Any tool that processes these descriptors can safely parse files with a known descriptor type and the same major version number, should not parse files with a known descriptor type and a higher major version number, and parses files without metadata or with an unknown descriptor type at its own risk.
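The version rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of any existing library: the `KNOWN_TYPES` table and the function names are hypothetical, and the case of a lower major version is reported as unspecified because the rule does not cover it.

```python
import re
from typing import Optional, Tuple

# Matches "@type $descriptortype $major.$minor" as described above.
ANNOTATION = re.compile(r"^@type (\S+) (\d+)\.(\d+)$")

# Hypothetical table: descriptor types this tool understands, mapped to the
# major version it was written for.
KNOWN_TYPES = {"server-descriptor": 1, "extra-info": 1, "bridge-extra-info": 1}


def parse_annotation(first_line: str) -> Optional[Tuple[str, int, int]]:
    """Return (descriptor type, major, minor), or None if there is no annotation."""
    match = ANNOTATION.match(first_line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


def parsing_advice(first_line: str) -> str:
    """Apply the rule from the text to the first line of a descriptor file."""
    annotation = parse_annotation(first_line)
    if annotation is None:
        return "at own risk"  # no metadata at all
    name, major, _minor = annotation
    if name not in KNOWN_TYPES:
        return "at own risk"  # unknown descriptor type
    if major == KNOWN_TYPES[name]:
        return "safe"  # known type, same major version
    if major > KNOWN_TYPES[name]:
        return "do not parse"  # known type, higher major version
    return "unspecified"  # lower major version: not covered by the rule


if __name__ == "__main__":
    for line in ("@type bridge-extra-info 1.3", "@type server-descriptor 2.0"):
        print(line, "->", parsing_advice(line))
```

Minor versions are ignored in the decision, which matches the rule that only the major version number determines whether parsing is safe.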
Relays and directory authorities publish relay descriptors, so that clients can select relays for their paths through the Tor network. All these relay descriptors are specified in the Tor directory protocol, version 3 specification document (or in the earlier protocol version 2 or version 1).
Server descriptors contain information that relays publish about themselves. Tor clients once downloaded this information, but now they use microdescriptors instead. The server descriptors in the descriptor archives contain one descriptor per file, whereas the recently published files contain all descriptors collected in an hour concatenated into a single file.
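Because a recently published file holds many descriptors back to back, a tool usually has to split it before parsing. The sketch below assumes that each descriptor in such a file is preceded by its own @type annotation line; that is an assumption made for illustration, so check a real file before relying on it. The function name and the sample content are hypothetical.

```python
from typing import Iterator, List

# Hypothetical, heavily shortened stand-in for a recent file with two descriptors.
SAMPLE = (
    "@type server-descriptor 1.0\n"
    "router relayA\n"
    "@type server-descriptor 1.0\n"
    "router relayB\n"
)


def split_descriptors(text: str) -> Iterator[str]:
    """Yield one chunk per descriptor, starting a new chunk at every @type line."""
    current: List[str] = []
    for line in text.splitlines():
        # Assumption: an "@type " line marks the start of the next descriptor.
        if line.startswith("@type ") and current:
            yield "\n".join(current) + "\n"
            current = []
        current.append(line)
    if current:
        yield "\n".join(current) + "\n"


if __name__ == "__main__":
    for number, descriptor in enumerate(split_descriptors(SAMPLE), start=1):
        print(number, descriptor.splitlines()[1])
```

Files from the descriptor archives need no such step, since they contain one descriptor per file.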
Extra-info descriptors contain relay information that Tor clients do not need in order to function. These are self-published, like server descriptors, but not downloaded by clients by default. The extra-info descriptors in the descriptor archives contain one descriptor per file, whereas the recently published files contain all descriptors collected in an hour concatenated into a single file.
Though Tor relays are decentralized, the directories that track the overall network are not. These central points are called directory authorities, and every hour they publish a document called a consensus, or network status document. The consensus is made up of router status entries containing flags, heuristics used for relay selection, etc.
The directory authorities exchange votes every hour to come up with a common consensus. Vote documents are by far the largest documents provided here.
The directory authorities sign votes and the consensus with their key that they publish in a key certificate. These key certificates change once every few months, so they are only available in a single descriptor archive tarball.
Tor clients used to download all server descriptors of active relays, but now they only download the smaller microdescriptors which are derived from server descriptors. The microdescriptor consensus lists all active relays and references their currently used microdescriptor. The descriptor archive tarballs contain both microdescriptor consensuses and referenced microdescriptors together.
Microdescriptors are minimalistic documents that include just the information necessary for Tor clients to work. The descriptor archive tarballs contain both microdescriptor consensuses and referenced microdescriptors together. The microdescriptors in descriptor archive tarballs contain one descriptor per file, whereas the recently published files contain all descriptors collected in an hour concatenated into a single file.
Version 2 network statuses were published by the directory authorities before consensuses were introduced. In contrast to consensuses, each directory authority published its own authoritative view of the network, and clients combined these documents locally. We stopped archiving version 2 network statuses in 2012.
The first directory protocol version combined the list of active relays with server descriptors in a single directory document. We stopped archiving version 1 directories in 2007.
Bridges and the bridge authority publish bridge descriptors that are used by censored clients to connect to the Tor network. We cannot, however, make bridge descriptors available as we do with relay descriptors, because that would defeat the purpose of making bridges hard to enumerate for censors. We therefore sanitize bridge descriptors by removing all potentially identifying information and publish sanitized versions here. The sanitizing steps are as follows:
Sanitized bridge network statuses are similar to version 2 relay network statuses, but with only a published line in the header and without any lines in the footer. The format has changed over time to accommodate changes to the sanitizing process, with earlier versions being:
Bridge server descriptors follow the same format as relay server descriptors, except for the sanitizing steps described above. The bridge server descriptor archive tarballs contain one descriptor per file, whereas recently published bridge server descriptor files contain all descriptors collected in an hour concatenated into a single file to reduce the number of files. The format has changed over time to accommodate changes to the sanitizing process, with earlier versions being:
Bridge extra-info descriptors follow the same format as relay extra-info descriptors, except for the sanitizing steps described above. The format has changed over time to accommodate changes to the sanitizing process, with earlier versions being:
The bridge extra-info descriptor archive tarballs contain one descriptor per file, whereas recently published bridge extra-info descriptor files contain all descriptors collected in an hour concatenated into a single file to reduce the number of files.
Tor hidden services make it possible for users to hide their locations while offering various kinds of services, such as web publishing or an instant messaging server. A hidden service assembles a hidden service descriptor to make its service available in the network. This descriptor gets stored on hidden service directories and can be retrieved by hidden service clients. Hidden service descriptors are not formally archived, but some libraries support parsing these descriptors when obtaining them from a locally running Tor instance.
Hidden service descriptors contain all details that are necessary for clients to connect to a hidden service. Despite the version number being 1.0, these descriptors are part of the version 2 hidden service protocol.
The bridge distribution service BridgeDB publishes bridge pool assignments describing which bridges it has assigned to which distribution pool. BridgeDB receives bridge network statuses from the bridge authority, assigns these bridges to persistent distribution rings, and hands them out to bridge users. BridgeDB periodically dumps the list of running bridges to a local file, together with information about the rings, subrings, and file buckets to which they are assigned. Sanitized versions of these lists, which contain SHA-1 hashes of bridge fingerprints instead of the original fingerprints, are available for statistical analysis.
The document below shows a BridgeDB pool assignment file from March 13, 2011. Every such file begins with a line containing the timestamp when BridgeDB wrote this file. Subsequent lines start with the SHA-1 hash of a bridge fingerprint, followed by ring, subring, and/or file bucket information. There are currently three distributor ring types in BridgeDB:
    bridge-pool-assignment 2011-03-13 14:38:03
    00b834117566035736fc6bd4ece950eace8e057a unallocated
    00e923e7a8d87d28954fee7503e480f3a03ce4ee email port=443 flag=stable
    0103bb5b00ad3102b2dbafe9ce709a0a7c1060e4 https ring=2 port=443 flag=stable
    [...]
As of December 8, 2014, bridge pool assignment files are no longer archived.
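For statistical analysis of the archived files, the layout described above (a timestamp line followed by one line per hashed bridge fingerprint) can be read with a short parser. The following is a sketch only: the function name is hypothetical, the sample repeats the three entries shown above, and storing the key=value details in a dictionary assumes that no key occurs twice on a line.

```python
from datetime import datetime
from typing import Dict, Tuple

SAMPLE = """\
bridge-pool-assignment 2011-03-13 14:38:03
00b834117566035736fc6bd4ece950eace8e057a unallocated
00e923e7a8d87d28954fee7503e480f3a03ce4ee email port=443 flag=stable
0103bb5b00ad3102b2dbafe9ce709a0a7c1060e4 https ring=2 port=443 flag=stable
"""

Assignment = Tuple[str, Dict[str, str]]


def parse_pool_assignments(text: str) -> Tuple[datetime, Dict[str, Assignment]]:
    """Return the write timestamp and a map from hashed fingerprint to assignment."""
    lines = [line for line in text.splitlines() if line.strip()]
    keyword, date, time = lines[0].split()
    if keyword != "bridge-pool-assignment":
        raise ValueError("not a bridge pool assignment file")
    written = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")

    assignments: Dict[str, Assignment] = {}
    for line in lines[1:]:
        hashed_fingerprint, ring, *details = line.split()
        # Assumption: remaining tokens are key=value pairs with unique keys.
        assignments[hashed_fingerprint] = (
            ring,
            dict(detail.split("=", 1) for detail in details),
        )
    return written, assignments


if __name__ == "__main__":
    written, assignments = parse_pool_assignments(SAMPLE)
    print(written, len(assignments))
```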
The exit list service TorDNSEL publishes exit lists containing the IP addresses of relays that it found when exiting through them.
Tor Check makes the list of known exits and corresponding exit IP addresses available in a specific format. The document below shows an entry of the exit list written on December 28, 2010 at 15:21:44 UTC. This entry means that the relay with fingerprint 63BA.. which published a descriptor at 07:35:55 and was contained in a version 2 network status from 08:10:11 uses two different IP addresses for exiting. The first address 91.102.152.236 was found in a test performed at 07:10:30. When looking at the corresponding server descriptor, one finds that this is also the IP address on which the relay accepts connections from inside the Tor network. A second test performed at 10:35:30 reveals that the relay also uses IP address 91.102.152.227 for exiting.
    ExitNode 63BA28370F543D175173E414D5450590D73E22DC
    Published 2010-12-28 07:35:55
    LastStatus 2010-12-28 08:10:11
    ExitAddress 91.102.152.236 2010-12-28 07:10:30
    ExitAddress 91.102.152.227 2010-12-28 10:35:30
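An entry like the one above can be read keyword by keyword. The sketch below assumes one keyword per line and that every entry starts with an ExitNode line. The function and field names are hypothetical, and lines before the first ExitNode line are skipped rather than interpreted.

```python
from datetime import datetime
from typing import Any, Dict, List

SAMPLE = """\
ExitNode 63BA28370F543D175173E414D5450590D73E22DC
Published 2010-12-28 07:35:55
LastStatus 2010-12-28 08:10:11
ExitAddress 91.102.152.236 2010-12-28 07:10:30
ExitAddress 91.102.152.227 2010-12-28 10:35:30
"""


def _timestamp(date: str, time: str) -> datetime:
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")


def parse_exit_list(text: str) -> List[Dict[str, Any]]:
    """Return one dictionary per ExitNode entry found in the text."""
    entries: List[Dict[str, Any]] = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        keyword = parts[0]
        if keyword == "ExitNode":
            current = {"fingerprint": parts[1], "exit_addresses": []}
            entries.append(current)
        elif current is None:
            continue  # ignore anything before the first entry
        elif keyword == "Published":
            current["published"] = _timestamp(parts[1], parts[2])
        elif keyword == "LastStatus":
            current["last_status"] = _timestamp(parts[1], parts[2])
        elif keyword == "ExitAddress":
            # Each ExitAddress line carries the address and the time of the test.
            current["exit_addresses"].append((parts[1], _timestamp(parts[2], parts[3])))
    return entries


if __name__ == "__main__":
    for entry in parse_exit_list(SAMPLE):
        print(entry["fingerprint"], [address for address, _ in entry["exit_addresses"]])
```

Keeping the exit addresses as a list preserves the fact that a single relay can exit from more than one IP address, as in this entry.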
The performance measurement service Torperf publishes performance data from making simple HTTP requests over the Tor network. Torperf uses a trivial SOCKS client to download files of various sizes over the Tor network and notes how long substeps take.
A Torperf results file contains a single line per Torperf run with key=value pairs. Such a result line is sufficient to learn about 1) the Tor and Torperf configuration, 2) measurement results, and 3) additional information that might help explain the results. Known keys are explained below.
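Since each run is a single line of key=value pairs, a result line can be turned into a dictionary with very little code. The sketch below is illustrative only: the sample line and its key names (SOURCE, FILESIZE, START, DATACOMPLETE) are assumptions chosen for the example, not a statement of the full key set, and the parser assumes that values contain no spaces.

```python
from typing import Dict

# Hypothetical result line; the key names and values are illustrative assumptions.
SAMPLE_LINE = "SOURCE=example FILESIZE=51200 START=1338357901.42 DATACOMPLETE=1338357902.91"


def parse_torperf_line(line: str) -> Dict[str, str]:
    """Split one result line into a key -> value dictionary (values kept as strings)."""
    return dict(pair.split("=", 1) for pair in line.split())


def download_seconds(result: Dict[str, str]) -> float:
    """Elapsed time between two timestamp keys, assuming both hold seconds."""
    return float(result["DATACOMPLETE"]) - float(result["START"])


if __name__ == "__main__":
    result = parse_torperf_line(SAMPLE_LINE)
    print(result["FILESIZE"], round(download_seconds(result), 2))
```

Values are left as strings on purpose, so that a tool can decide per key how to interpret them and can pass through keys it does not know.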
Recently published Torperf measurement result files accumulate all new Torperf measurements of a given day, which means that they may change throughout the day. This is different from some of the other recently published files provided here which do not change once they are written.
There are multiple ways to download descriptors from this site. Of course, the obvious way is to browse the directories and download contained files using your browser. However, this method cannot be automated very well.
A more elaborate way to automatically download descriptors is to use Unix tools like wget which support recursively downloading files from this site. Example:
    # --recursive             turn on recursive retrieving
    # --reject "index.html*"  don't retrieve directory listings
    # --no-parent             don't ascend to parent directory
    # --no-host-directories   don't generate host-prefixed directories
    # --directory-prefix      set directory prefix
    wget --recursive \
         --reject "index.html*" \
         --no-parent \
         --no-host-directories \
         --directory-prefix descriptors \
         https://collector.torproject.org/recent/relay-descriptors/microdescs/
Another automated way to download descriptors is to develop a tool that uses the provided index.json file or one of its compressed versions index.json.gz, index.json.bz2, or index.json.xz. These files contain a machine-readable representation of all descriptor files available on this site. Index files use the following custom JSON data format that might still be extended at a later time:
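A tool that consumes the index first has to handle whichever of the four file variants it downloaded. The sketch below picks a decompressor from the file name suffix and returns the parsed JSON as is. It makes no assumption about the fields inside the index, and the function name is hypothetical.

```python
import bz2
import gzip
import json
import lzma
from typing import Any

# Suffix -> standard-library decompression function for the compressed variants.
DECOMPRESSORS = {
    ".gz": gzip.decompress,
    ".bz2": bz2.decompress,
    ".xz": lzma.decompress,
}


def load_index(name: str, raw: bytes) -> Any:
    """Parse the bytes of index.json or one of its compressed versions."""
    for suffix, decompress in DECOMPRESSORS.items():
        if name.endswith(suffix):
            raw = decompress(raw)
            break
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Round-trip a tiny stand-in document instead of fetching the real index.
    document = {"example": [1, 2, 3]}
    packed = gzip.compress(json.dumps(document).encode("utf-8"))
    print(load_index("index.json.gz", packed) == document)
```

The bytes can come from any HTTP client or from a file on disk; keeping the download out of this function makes it easy to test without network access.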
There are three parsing libraries available to facilitate processing the descriptors provided on this site:
If you're unclear which library to pick and if you're flexible regarding the programming language, be sure to look at the library comparison on the Stem website.
If you developed a descriptor parsing library for another language and want it to be listed here, please let us know!
A couple of applications have been developed and plenty of research papers have been written using the Tor network data provided here. The following list is not at all exhaustive:
If you wrote an application or research paper that uses Tor network data and that is not yet listed here, please let us know! Please include a short description of what your application does or what your research was about.
If you have any questions about the Tor network data provided here, we'd like to hear from you! Of course, suggestions or other feedback are welcome, too.