In this tutorial, we explain how you can use the `data` tool to extract information about remote datasets, preview tabular data, and download it. We assume you already have `data` installed. If not, please visit this page: https://datahub.io/docs/getting-started/installing-data.
For this tutorial, we'll use the Global CO2 Emissions dataset from DataHub: https://datahub.io/core/co2-fossil-global
Extract summary about a dataset
With the `info` command, you can extract summary information about a dataset from its URL:
data info https://datahub.io/core/co2-fossil-global
which prints the dataset's README followed by a summary table of the available resources:
| Name                  | Format | Size  | Title |
|-----------------------|--------|-------|-------|
| validation_report     | json   | 511   |       |
| global_csv            | csv    | 6714  |       |
| global_json           | json   | 37857 |       |
| co2-fossil-global_zip | zip    | 11080 |       |
| global                | csv    | 6453  |       |
You can see that it contains the `global` CSV file, derived CSV and JSON versions of it, a validation report, and a ZIP version of the dataset.
:::info Read more about the derived CSV and JSON versions of tabular data and the ZIP version of datasets: https://datahub.io/docs/features/auto-generated-csv-json-and-zip :::
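A summary table like the one above can be built from the `resources` list in a dataset's `datapackage.json` descriptor. The Python below is a minimal sketch of that idea, not how `data info` is actually implemented. It assumes each resource may carry the Data Package fields `name`, `format`, `bytes` (size in bytes) and `title`; the function name `summarize_resources` and the trimmed sample descriptor are hypothetical.

```python
import json


def summarize_resources(descriptor: dict) -> list[str]:
    """Render a Markdown-style summary table of a Data Package's resources.

    Sketch assumption: each resource may have `name`, `format`, `bytes`
    and `title`; missing fields are rendered as empty cells.
    """
    header = ["Name", "Format", "Size", "Title"]
    rows = [
        [
            str(res.get("name", "")),
            str(res.get("format", "")),
            str(res.get("bytes", "")),
            str(res.get("title", "")),
        ]
        for res in descriptor.get("resources", [])
    ]
    lines = ["| " + " | ".join(header) + " |",
             "|" + "|".join("---" for _ in header) + "|"]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return lines


if __name__ == "__main__":
    # Hypothetical, trimmed descriptor; sizes copied from the table above.
    sample = json.loads("""
    {"name": "co2-fossil-global",
     "resources": [
       {"name": "global", "format": "csv", "bytes": 6453},
       {"name": "validation_report", "format": "json", "bytes": 511}
     ]}
    """)
    print("\n".join(summarize_resources(sample)))
```

Reading the metadata first like this lets you decide which resources are worth downloading before fetching any data.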
Preview tabular data
Let's preview the `global` CSV file so we know what the data looks like before downloading it. We can do this with the `cat` command:
data cat https://datahub.io/core/co2-fossil-global/r/global.csv
which prints the data as a table so you can inspect it.
:::info If you wonder how we constructed the above URL, read these docs about "r" links: https://datahub.io/docs/getting-started/getting-data#perma-urls-for-data. :::
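If you want a similar preview from a script, the "r" link can be fetched like any other URL. The sketch below assumes Python 3.9+ and only the standard library; `fetch_text` and `preview_csv` are hypothetical helpers, and the column alignment only approximates what `data cat` prints.

```python
import csv
import io
import itertools
import urllib.request


def fetch_text(url: str) -> str:
    """Download a URL and decode the body as UTF-8 (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def preview_csv(text: str, limit: int = 5) -> str:
    """Return the header plus the first `limit` data rows as left-aligned columns."""
    rows = list(itertools.islice(csv.reader(io.StringIO(text)), limit + 1))
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest cell among the previewed rows.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in rows]
    return "\n".join(lines)
```

For example, `print(preview_csv(fetch_text("https://datahub.io/core/co2-fossil-global/r/global.csv")))` would print the header and the first five rows. Only the previewed rows are parsed, but note that `fetch_text` still downloads the whole file.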
Finally, download the dataset using the `get` command:
data get https://datahub.io/core/co2-fossil-global
which saves all available files in the `./core/co2-fossil-global` directory. If you run `tree core/co2-fossil-global/`, you should see the following output:
    core/co2-fossil-global/
    ├── README.md
    ├── archive
    │   └── global.csv
    ├── data
    │   ├── global_csv.csv
    │   ├── global_json.json
    │   └── validation_report.json
    └── datapackage.json

    2 directories, 6 files
You can find the original data in the `archive` directory, while the `data` directory contains all derived files. If you don't know what `datapackage.json` is, please read this document: https://datahub.io/docs/data-packages#datapackagejson.
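Once the package is on disk, its `datapackage.json` describes where each resource lives. As a minimal sketch (hypothetical helper names, standard library only), the Python below maps resource names to local files and reads a CSV resource into dictionaries. It assumes the descriptor written by `data get` gives each resource a single string `path` relative to the package directory; the Data Package spec also allows lists of paths and remote URLs, which this sketch skips.

```python
import csv
import json
from pathlib import Path


def local_resources(package_dir: str) -> dict[str, Path]:
    """Map resource names in <package_dir>/datapackage.json to local file paths.

    Sketch assumption: each resource has a single string `path` relative to
    the package directory. List-valued paths and http(s) URLs are skipped.
    """
    root = Path(package_dir)
    descriptor = json.loads((root / "datapackage.json").read_text(encoding="utf-8"))
    paths: dict[str, Path] = {}
    for res in descriptor.get("resources", []):
        path = res.get("path")
        if not isinstance(path, str) or path.startswith(("http://", "https://")):
            continue  # not a single local file; out of scope for this sketch
        paths[res.get("name", path)] = root / path
    return paths


def load_rows(csv_path: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of dicts keyed by its header row."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

For example, `local_resources("core/co2-fossil-global")` should map each locally stored resource to its path, and you can pass any `.csv` entry to `load_rows`. Going through the descriptor rather than hard-coding `archive/` or `data/` keeps the code working if the layout changes.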
We hope this tutorial helps you get the most out of the `data` tool. If you run into bugs or have suggestions for improvements, feel free to open an issue at https://github.com/datahq/datahub-qa/issues.