1 - Overview

Goals of the Project

What is it?

The Internet History Initiative explores decentralized, community-driven approaches to the preservation, curation, and analysis of Internet History: the accumulated corpus of historical network measurements from many sources that document the day-to-day evolution of today’s global Internet over many decades.

Internet History at this level is not about Web content in particular; it’s about the history of the infrastructure that supports all Internet applications, starting with fundamental infrastructure (BGP routing, DNS, traceroute measurement) and ascending over time to capture the emergence of the Internet’s overlay networks (CDNs, P2P content, privacy-protecting services, federated social media).

Why do we need it?

The Internet underpins much of modern civilization, yet researchers outside the Internet’s core technical community have never had good access to well-interpreted data about how the regional and global Internet has evolved over time. If we can put quantitative tools and time series into the hands of researchers who study, for example, violence, employment, public health, and political science, we can help tackle bigger research questions collaboratively.

What are the goals?

There are multiple levels to the Internet History challenge:

  • Preserving data sources that may be hosted by only a single institution
  • Curating datasets that describe not only the past structure of the Internet, but the kinds of functions that Internet hosts were providing at the time
  • Extending recorded history with interpretative derived data sets that are useful to researchers beyond the Internet technical community
  • Building tools and visualizations that provide interpretation of Internet history for nontechnical audiences

Where should I go next?

  • Sign up: Let us know of your interest in Internet History
  • Follow IHI: Connect to the initiative on Mastodon
  • Read More: Peruse the blogs that set this initiative in motion

2 - Sources

A growing list of the public sources for datasets relevant to the Internet History Initiative.

These sections are a collection of minimally documented deep links into public repositories maintained by their originating institutions. Their inclusion in this index does not imply endorsement of this project by any of these institutions.

This rough index is supplied purely as a convenience for researchers, in preparation for constructing a more carefully curated index (and potential mirrored storage) to follow.

Please consult the original sources for complete documentation, and abide by their original acceptable use policies and licenses in your use of these datasets. Finally, please support these organizations and let them know that you appreciate their continuing stewardship and research over the years!

2.1 - BGP Sources

Public sources of historical BGP data, generally RIBs and Updates.

2.1.1 - Lost Sources of BGP Data

Some sources of BGP RIBs and updates are known to have existed, but no surviving copy has been located. If you have a copy of any of these, please share!

Renesys (2000-2014)

Internet2 backbone (2012-2014?)

2.1.2 - Packet Clearing House

“Packet Clearing House investigates technological, economic, and policy issues in areas related to Internet traffic exchange.

“PCH’s longest-running research project is the Internet Routing Topology Archive, a database of Internet topology measurements begun in 1997. This archive of routing data from all major and many minor Internet provider networks is available to academic and commercial researchers and the operations community, to aid in the understanding of the dynamic nature and topology of the Internet.

“Other topics of ongoing research include the economic impact of local traffic exchange in developing nations, inter-provider notifications and communication, and database schemas for topological data.

“Packet Clearing House facilitates research, instruments the Internet, collects, archives, and disseminates information and data, and creates a climate conducive to analytical examination of all aspects of Internet topology, operational practice, and economics. Although we have collected and maintain the world’s largest database of Internet routing information, we prefer to put researchers in academia together with the data that they need, rather than analyze all the data we collect ourselves - a task beyond the capacity of any one organization. We believe that, by facilitating partnerships between industry and academia and enabling communication between the two communities, we can achieve a more enduring beneficial effect.”

Collector  URL  Location  Start Date  Notes

2.1.3 - RIPE Routing Information Service

“The RIPE Routing Information Service (RIS) is a RIPE NCC service. With the help of network operators all over the world, RIS employs a globally distributed set of Remote Route Collectors (RRCs), typically located at Internet Exchange Points, to collect and store Internet routing data. Volunteers peer with the RRCs using the BGP protocol and RIS stores the update and withdraw messages. RIS data can be accessed via:

  • RIPEstat, the “one-stop shop” for all available information about Internet number resources. RIPEstat uses individual widgets to display routing and other information;
  • RIS Live, a real time BGP streaming API allowing server-side filtering of BGP messages by prefix or autonomous system;
  • RIS Raw Data, available for each route collector, with state dumps and batches of updates made available periodically;
  • RISwhois, that searches the latest RIS data for details of an IP address using a plaintext “whois”-style interface. It is useful when querying RIS data using scripts.”

Terms of Service

Collector  URL                               Location                      Start Date   Notes
rrc00      https://data.ris.ripe.net/rrc00/  RIPE-NCC Multihop, Amsterdam  03 Sep 1999
rrc01      https://data.ris.ripe.net/rrc01/  LINX / LONAP, London          27 Jul 2000
rrc02      https://data.ris.ripe.net/rrc02/  retired                       24 Mar 2001  ends 2 Oct 2008
rrc03      https://data.ris.ripe.net/rrc03/  AMS-IX / NL-IX, Amsterdam     17 Jan 2001
rrc04      https://data.ris.ripe.net/rrc04/  CIXP, Geneva                  04 Apr 2001
rrc05      https://data.ris.ripe.net/rrc05/  VIX, Vienna                   13 Jun 2001
rrc06      https://data.ris.ripe.net/rrc06/  DIX-IE / JPIX, Tokyo          30 Aug 2001
rrc07      https://data.ris.ripe.net/rrc07/  Netnod, Stockholm             04 Apr 2002
rrc08      https://data.ris.ripe.net/rrc08/  retired                       07 May 2002  ends 2 Sep 2004
rrc09      https://data.ris.ripe.net/rrc09/  retired                       10 May 2003  ends 4 Feb 2004
rrc10      https://data.ris.ripe.net/rrc10/  MIX, Milan                    10 May 2003
rrc11      https://data.ris.ripe.net/rrc11/  NYIIX, New York City          13 Feb 2004
rrc12      https://data.ris.ripe.net/rrc12/  DE-CIX, Frankfurt             06 Jul 2004
rrc13      https://data.ris.ripe.net/rrc13/  MSK-IX, Moscow                24 May 2005
rrc14      https://data.ris.ripe.net/rrc14/  PAIX, Palo Alto               01 Jan 2005
rrc15      https://data.ris.ripe.net/rrc15/  PTTMetro, Sao Paulo           14 Dec 2005
rrc16      https://data.ris.ripe.net/rrc16/  NOTA, Miami                   01 Feb 2008
rrc18      https://data.ris.ripe.net/rrc18/  Catnix, Barcelona             04 Nov 2015
rrc19      https://data.ris.ripe.net/rrc19/  NAP Africa JB, Johannesburg   28 Jan 2016
rrc20      https://data.ris.ripe.net/rrc20/  SwissIX, Zurich               04 Nov 2015
rrc21      https://data.ris.ripe.net/rrc21/  France-IX, Paris              04 Nov 2015
rrc22      https://data.ris.ripe.net/rrc22/  InterLAN, Bucharest           22 Dec 2017  RIBs only before 08 Jan 2018
rrc23      https://data.ris.ripe.net/rrc23/  Equinix SG, Singapore         22 Dec 2017  RIBs only before 08 Jan 2018
rrc24      https://data.ris.ripe.net/rrc24/  LACNIC Multihop, Montevideo   22 Feb 2019
rrc25      https://data.ris.ripe.net/rrc25/  RIPE-NCC Multihop, Amsterdam  18 Feb 2021
rrc26      https://data.ris.ripe.net/rrc26/  UAE-IX, Dubai                 1 Jul 2021
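
When planning a historical study, it can help to know which collectors were recording on a given day. The following is a minimal Python sketch, not an official RIPE tool: it copies a few rows from the table above into a list and filters them by date. The names `Collector`, `COLLECTORS`, and `active_on` are hypothetical, and the list is deliberately incomplete.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Collector(NamedTuple):
    name: str
    location: str
    start: date
    end: Optional[date] = None  # None: no end date is listed in the table

    @property
    def url(self) -> str:
        # Every row in the table uses this URL pattern.
        return f"https://data.ris.ripe.net/{self.name}/"


# A few rows copied from the table above; extend as needed.
COLLECTORS = [
    Collector("rrc00", "RIPE-NCC Multihop, Amsterdam", date(1999, 9, 3)),
    Collector("rrc02", "retired", date(2001, 3, 24), date(2008, 10, 2)),
    Collector("rrc08", "retired", date(2002, 5, 7), date(2004, 9, 2)),
    Collector("rrc12", "DE-CIX, Frankfurt", date(2004, 7, 6)),
    Collector("rrc26", "UAE-IX, Dubai", date(2021, 7, 1)),
]


def active_on(day: date, collectors: List[Collector] = COLLECTORS) -> List[str]:
    """Return the names of collectors whose listed lifetime includes `day`."""
    return [
        c.name
        for c in collectors
        if c.start <= day and (c.end is None or day <= c.end)
    ]
```

For example, `active_on(date(2003, 1, 1))` selects only the sample rows that had started and not yet ended on that day. The start dates are the ones listed in this index, so please check the collector's own archive before relying on them.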

2.1.4 - Oregon Routeviews

“The University’s RouteViews project was initially conceived as a tool for Internet operators to obtain real-time information about the global routing system from the perspectives of several different backbones and locations around the Internet. Although other tools handle related tasks, such as the various Looking Glass Collections (see e.g., TRACEROUTE.ORG), they typically either provide only a constrained view of the routing system (e.g., either a single provider or the route server) or they do not provide real-time access to routing data.

“While the RouteViews project was initially motivated by interest on the part of operators in determining how the global routing system viewed their prefixes and/or AS space, there have been many other interesting uses of this RouteViews data. For example, NLANR has used RouteViews data for AS path visualization and to study IPv4 address space utilization (archive). Others have used RouteViews data to map IP addresses to origin AS for various topological studies. CAIDA has used it in conjunction with the NetGeo database in generating geographic locations for hosts, functionality that both CoralReef and the Skitter project support.”

Collector  URL  Location  Start Date  Notes

“Note: MRT RIB and UPDATE files have internal timestamps in the standard Unix format, however the file names are constructed based on the time zone setting of the collector. The collectors had their time zones set to Pacific Time prior to Feb 3, 2003 at approximately 19:00 UTC. At that time all but one of the existing collectors had their time zones reset to UTC. The one exception was routeviews.eqix which was not reset to UTC until Feb 1, 2006 at approximately 21:00 UTC.”
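
The time zone rule in the note above can be sketched in Python. This is an illustration under stated assumptions, not RouteViews code: the function name `filename_time_to_utc` and the collector label `route-views2` used in the usage note are hypothetical, and the two cutover instants are the approximate ones quoted in the note. A timestamp read from a file name is first interpreted as Pacific Time; if the resulting instant falls before the collector's cutover, that conversion is kept, and otherwise the timestamp is taken to be UTC already.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

PACIFIC = ZoneInfo("America/Los_Angeles")

# Cutover instants quoted in the note above (both are "approximately").
DEFAULT_CUTOVER = datetime(2003, 2, 3, 19, 0, tzinfo=timezone.utc)
EQIX_CUTOVER = datetime(2006, 2, 1, 21, 0, tzinfo=timezone.utc)


def filename_time_to_utc(collector: str, stamp: datetime) -> datetime:
    """Convert a naive timestamp taken from a file name to an aware UTC time.

    `stamp` must be naive (no tzinfo), exactly as parsed from the file name.
    """
    cutover = EQIX_CUTOVER if collector == "routeviews.eqix" else DEFAULT_CUTOVER

    # Interpret the name as Pacific Time and see where that lands in UTC.
    as_pacific = stamp.replace(tzinfo=PACIFIC).astimezone(timezone.utc)
    if as_pacific < cutover:
        return as_pacific

    # At or after the cutover the file name is assumed to be UTC already.
    return stamp.replace(tzinfo=timezone.utc)
```

For example, `filename_time_to_utc("route-views2", datetime(2002, 1, 15, 8, 0))` treats the name as Pacific Time, while the same call with a 2004 timestamp leaves the clock time unchanged. File names within a few hours of a cutover are ambiguous under this rule, so the timestamps inside the MRT files remain the safer reference.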

2.2 - DNS Data Sources

Public sources of historical DNS data of various sorts. Formats may vary considerably.

2.2.1 - CAIDA ARK DNS Names

“This public dataset contains all IPv4 measurements from Archipelago (Ark) that are older than approximately one year, and all IPv6 measurements (up to the present).”

“The IPv4 Routed /24 DNS Names Dataset provides fully-qualified domain names for IP addresses seen in the traces of the IPv4 Routed /24 Topology Dataset.

“DNS names are useful for obtaining additional information about routers and hosts making up the Internet topology. For example, DNS names of routers often encode the link type (backbone vs. access), link capacity, Point of Presence (PoP), and geographic location. We have DNS Names data starting March 2008.”

2.2.2 - RIPE Reverse DNS Zones

“This dataset consists of a daily snapshot of the reverse zones (in-addr.arpa and ip6.arpa) as published by the RIRs on their public FTP servers in the /pub/zones/ directory. The RIRs use this data operationally. The zones under in-addr.arpa and ip6.arpa are assembled from these files.”
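
For readers unfamiliar with the reverse zones, the short Python sketch below shows how an IP address maps to a name under in-addr.arpa or ip6.arpa. It uses the standard library's `ipaddress` module; the wrapper name `reverse_name` is hypothetical, and the addresses in the usage note come from the documentation ranges rather than from the dataset.

```python
import ipaddress


def reverse_name(addr: str) -> str:
    """Return the reverse-DNS name (the PTR owner name) for an IP address."""
    # reverse_pointer reverses the octets (IPv4) or nibbles (IPv6) and
    # appends the matching reverse zone suffix.
    return ipaddress.ip_address(addr).reverse_pointer
```

Calling `reverse_name("192.0.2.1")` gives a name ending in `in-addr.arpa`, and `reverse_name("2001:db8::1")` gives one ending in `ip6.arpa` with one label per hexadecimal digit of the address.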

2.3 - Internet Exchange Point Data

Mirrored public sources of historical data about peering and IXPs. Formats may vary considerably.

2.3.1 - CAIDA PeeringDB Dumps

“This dataset is a repository of daily snapshots of historic PeeringDB data. The repository consists of two parts, version 1 and version 2.

“The old v1 format is available from July 29, 2010 through March 13, 2016 as sql and sqlite files. The new v2 format is available from May 27, 2016 to March 10, 2018 as sqlite files, and from March 11, 2018 onwards as json files.”
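
The date ranges above determine which file format to expect for a given snapshot day. The following Python sketch encodes them as a lookup; the function name `peeringdb_dump_formats` and the returned labels are hypothetical. It assumes the JSON files begin on March 11, 2018, the day after the last v2 sqlite file, and it returns an empty list for days outside the described ranges, including the gap between March 13 and May 27, 2016.

```python
from datetime import date
from typing import List


def peeringdb_dump_formats(day: date) -> List[str]:
    """Return the dump formats described for `day`, or [] if none is listed."""
    if date(2010, 7, 29) <= day <= date(2016, 3, 13):
        return ["v1 sql", "v1 sqlite"]
    if date(2016, 5, 27) <= day <= date(2018, 3, 10):
        return ["v2 sqlite"]
    if day >= date(2018, 3, 11):  # assumed JSON start date, see lead-in
        return ["v2 json"]
    return []
```

A loader can call this first and pick a parser for the matching format, for example treating an empty result as "no snapshot expected" rather than as an error.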

2.4 - Traceroute Sources

Mirrored public sources of historical traceroute data. Formats may vary considerably.

2.4.1 - CAIDA ARK Prefix Probing

“This public dataset contains all IPv4 measurements from Archipelago (Ark) that are older than approximately one year, and all IPv6 measurements (up to the present).”

2.4.2 - RIPE Atlas Daily Dumps

“RIPE Atlas measurement data is always available via the API, but many distinct measurements run every single day and aggregating the results manually can be a pain. We collect a rich set of measurement data and, as good citizens, we don’t want to generate more measurement traffic than is necessary. It’s important that we make this data as easy to retrieve as possible.

“And you all want a ton of data, right? Right.

“Last year, we prototyped a service to expose this data for around one month from the day of collection. The feedback has all been positive, so we’ve worked on making it more permanent.”