Linked Open Data

An overview of the Linked Open Data datasets.

A group for Linked Open Data datasets. The initial import of data for this group was done in October 2009 from the list of RDF dataset dumps provided by the W3C Linked Open Data Interest Group.


DBpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.
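As a concrete illustration of such queries, the sketch below builds a request URL for DBpedia's public SPARQL endpoint (https://dbpedia.org/sparql) and parses a response in the standard SPARQL JSON results format. The query and the sample response are illustrative only, not actual endpoint output.

```python
import json
import urllib.parse

# Example SPARQL query: cities in Germany with a population over one million.
QUERY = """
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Germany ;
        dbo:populationTotal ?population .
  FILTER (?population > 1000000)
}
"""

def endpoint_url(query, endpoint="https://dbpedia.org/sparql"):
    """Build the GET URL for a SPARQL query, requesting JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    return endpoint + "?" + params

def parse_bindings(response_text):
    """Extract variable bindings from a SPARQL 1.1 JSON results document."""
    doc = json.loads(response_text)
    return [{var: b[var]["value"] for var in b}
            for b in doc["results"]["bindings"]]

# Illustrative response in the SPARQL JSON results shape (not real output).
sample = json.dumps({
    "head": {"vars": ["city", "population"]},
    "results": {"bindings": [
        {"city": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Berlin"},
         "population": {"type": "literal", "value": "3520031"}}
    ]}})

rows = parse_bindings(sample)
```

Fetching `endpoint_url(QUERY)` over HTTP and feeding the body to `parse_bindings` would yield one dictionary per result row.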

The English version of the DBpedia knowledge base currently describes 6.0M entities, of which 4.6M have abstracts, 1.53M have geo coordinates and 1.6M have depictions. In total, 5.2M resources are classified in a consistent ontology, consisting of 1.5M persons, 810K places (including 505K populated places), 490K works (including 135K music albums, 106K films and 20K video games), 275K organizations (including 67K companies and 53K educational institutions), 301K species and 5K diseases. The total number of resources in English DBpedia is 16.9M, which, besides the 6.0M resources, includes 1.7M SKOS concepts (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M intermediate nodes.

Altogether the DBpedia 2016-04 release consists of 9.5 billion (2015-10: 8.8 billion) pieces of information (RDF triples), of which 1.3 billion (2015-10: 1.1 billion) were extracted from the English edition of Wikipedia, 5.0 billion (2015-10: 4.4 billion) from other language editions, and 3.2 billion (2015-10: 3.2 billion) from DBpedia Commons and Wikidata. In general, we observed a growth in mapping-based statements of about 2%.

Thorough statistics, along with general information on the DBpedia datasets, can be found on the DBpedia website.


Lexvo

Linguistic data. Size of dump and dataset: ~40MB.

Download dump: available under the CC-BY-SA 3.0 license.

The web service additionally provides some parts that are not fully open; for example, English language names taken from the Ethnologue Language Codes database are subject to specific redistribution conditions. Please see the lexvo dataset.

General Multilingual Environmental Thesaurus

A thesaurus in 20+ languages for terms related to the environment and environmental data. Published by the European Environment Agency.

Available in RDF without reuse constraints.



"Freebase is an open database of the world's information. It is built by the community and for the community: free for anyone to query, contribute to, build applications on top of, or integrate into their websites."

Openness: OPEN

  • License: CC-BY for data. Variety of open source licenses (or PD) for text blurbs & images.
  • Access: API + bulk.
    • bulk: yes.
    • api: yes.

Triple count and link statistics are provided by Freebase contributor Tom Morris.

RDF data and URIs

Freebase has an RDF service that exposes URIs and generates RDF descriptions for all Freebase topics.
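Such RDF descriptions are typically serialized as triples. The sketch below splits a simple N-Triples line, in the shape of what the rdf.freebase.com service returned, into its subject, predicate and object; the topic identifier in the sample line is illustrative, and the parser handles only plain URI references and simple string literals.

```python
import re

def parse_ntriple(line):
    """Split one simple N-Triples line into (subject, predicate, object).

    Handles only the simple cases: URI references in <...> and plain
    string literals in "..."; no blank nodes, datatypes or language tags.
    """
    m = re.match(r'\s*(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"[^"]*")\s*\.\s*$',
                 line)
    if not m:
        raise ValueError("not a simple N-Triples line: " + line)
    # Strip the surrounding <> or "" from each term.
    return tuple(t.strip('<>"') for t in m.groups())

# Illustrative triple in the shape of the Freebase RDF service output.
line = ('<http://rdf.freebase.com/ns/m.02mjmr> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Barack Obama" .')
s, p, o = parse_ntriple(line)
```

A full-featured parser (blank nodes, typed literals, escapes) would use an RDF library such as rdflib rather than a regular expression.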

Linked Railway Data Project

Bringing together data on the United Kingdom's railway network under Linked Data principles. Please see the dataset.

This effort started in the context of the Linking Open Data community project of the Semantic Web Education and Outreach Interest Group.

Its main purpose is to make freely available data concerning music available on the Semantic Web, such as Magnatune, Jamendo, Dogmazic and Mutopia, and to create links between them and other available Semantic Web repositories, such as Frederick Giasson's MusicBrainz dump, or DBpedia and GeoNames. Please see DBTune – Music-Related RDF.


Schema.org markup schema to provide structured metadata for websites and beyond. A Linked Data variant is provided by http://schema.rdfs.org/. Please see the schema.org dataset.

The CIA World Factbook

US government profiles of countries and territories around the world. Information on geography, people, government, transportation, economy, communications, etc.

Please see The CIA World Factbook dataset.

U.S. Census

2000 U.S. Census converted into over a billion RDF triples.

Population statistics at various geographic levels, from the U.S. as a whole, down through states, counties, and sub-counties (roughly, cities and incorporated towns).
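Data of this shape lends itself to simple triple-based aggregation. The sketch below sums population counts per geographic level over toy (subject, predicate, object) triples; the `ex:level` and `ex:population` predicates are hypothetical, not the Census dataset's actual vocabulary, and the figures are illustrative.

```python
# Toy triples in the shape of the Census conversion: population counts
# attached to geographic entities. Predicates and figures are illustrative,
# not the dataset's actual vocabulary or data.
triples = [
    ("ex:California", "ex:level", "state"),
    ("ex:California", "ex:population", 33871648),
    ("ex:Texas", "ex:level", "state"),
    ("ex:Texas", "ex:population", 20851820),
    ("ex:Austin", "ex:level", "city"),
    ("ex:Austin", "ex:population", 656562),
]

def population_at_level(triples, level):
    """Sum ex:population values for subjects whose ex:level matches."""
    subjects = {s for s, p, o in triples
                if p == "ex:level" and o == level}
    return sum(o for s, p, o in triples
               if p == "ex:population" and s in subjects)
```

In practice the same aggregation would be written as a single SPARQL `GROUP BY` query against the dataset's endpoint.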

Notes: also found in the list of SPARQL Endpoints.

Please see the U.S. Census dataset.

SIMILE Data Collection

Various datasets including the CIA's World Factbook, the Library of Congress' Thesaurus of Graphic Materials, the National Cancer Institute's cancer thesaurus, and the Web Consortium's Technical Reports.

Please see the SIMILE Data Collection.


From home page:

OpenVocab is a community-maintained vocabulary intended for use on the Semantic Web.

OpenVocab is ideal for properties and classes that don't warrant the effort of creating or maintaining a full schema. OpenVocab allows anyone to create and modify vocabulary terms using their web browser. Each term is described using appropriate elements of RDF, RDFS and OWL. OpenVocab allows you to create any properties and classes; assign labels, comments and descriptions; declare domains and ranges; and much more.

Please see the OpenVocab dataset.

The Mondial Database

From home page:

The MONDIAL database has been compiled from geographical Web data sources listed below:

  • CIA World Factbook,
  • a predecessor of Global Statistics, which was collected by Johan van der Heijden,
  • additional textual sources for coordinates,
  • the International Atlas by Kümmerly & Frey, Rand McNally, and Westermann,
  • and some geographical data of the Karlsruhe TERRA database.

Please see The Mondial Database.

Linked ISO 3166-2 Data

ISO 3166-2 defines codes for the principal subdivisions (e.g. provinces or states) of the countries coded in ISO 3166-1.
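The code structure is easy to work with programmatically. The sketch below splits an ISO 3166-2 code into its two-letter ISO 3166-1 country part and its subdivision part; it checks only the format, not membership in the official code list, so the helper name and validation are this sketch's assumptions.

```python
def split_iso3166_2(code):
    """Split an ISO 3166-2 code (e.g. "US-CA") into its ISO 3166-1
    alpha-2 country part and its subdivision part (1-3 characters).

    Validates only the shape of the code, not whether it is actually
    assigned in the official ISO 3166-2 code list.
    """
    country, sep, subdivision = code.partition("-")
    if sep != "-" or len(country) != 2 or not (1 <= len(subdivision) <= 3):
        raise ValueError("not an ISO 3166-2 code: " + code)
    return country.upper(), subdivision.upper()

# "US-CA" -> ("US", "CA"): California, United States
# "GB-ENG" -> ("GB", "ENG"): England, United Kingdom
```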

Please see Linked ISO 3166-2 Data.

Historical Events Markup Language

Historical Event Markup and Linking Project (Heml) provides an XML schema for historical events and a Java web app that transforms conforming documents into hyperlinked timelines, maps and tables. It aims to provide an information-rich interchange format for historical data, and thus to add a historical component to the growing movement toward a 'Semantic Web.'

For our purposes, the main item of interest is the data provided with the project: about 30 timelines in the form of RDF files on historical themes, ranging from Alexander the Great to the Meiji Restoration.

Please see Historical Events Markup Language.

A Short Biographical Dictionary of English Literature (RKBExplorer)


This repository contains data processed by Hugh Glaser as a birthday present to Ian Davis.

It is a very quick and dirty job of processing the data provided by Project Gutenberg from A Short Biographical Dictionary of English Literature by John W. Cousin.

Please see A Short Biographical Dictionary of English Literature (RKBExplorer).

Billion Triples Challenge Dataset 2008

  • Data exposed: various dumps
  • Size of dump and data set: 1 billion triples

Please see the Billion Triples Challenge Dataset 2008.