JSON API data interface
What is JSON data?
JSON is a data transfer syntax used to pass data from a data provider to a data consumer.
See also: JSON Introduction
What is the access URL?
The access URL https://api.genome.ucsc.edu/ is used to reach
the endpoint functions. For example:
wget -O- 'https://api.genome.ucsc.edu/list/publicHubs'
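The same kind of request can be made from a script. The following is a minimal Python sketch using only the standard library; the names API_BASE, parse_body and fetch_json are illustrative and not part of the API, and the 30 second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_BASE = "https://api.genome.ucsc.edu"  # access URL given above


def parse_body(raw: bytes):
    """Decode a UTF-8 JSON response body into Python objects."""
    return json.loads(raw.decode("utf-8"))


def fetch_json(path: str, timeout: float = 30.0):
    """GET an endpoint path such as '/list/publicHubs' and decode the reply.

    This performs a network request when called.
    """
    with urllib.request.urlopen(API_BASE + path, timeout=timeout) as response:
        return parse_body(response.read())
```

Calling fetch_json("/list/publicHubs") would issue the same request as the wget example above and return the decoded JSON.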
What type of data can be accessed?
The following data sets can be accessed at this time:
- List of available public hubs
- List of available UCSC Genome Browser genome assemblies
- List of genomes from a specified assembly or track hub
- List of available data tracks from a specified hub or UCSC Genome Browser genome assembly
(see also: track definition help)
- List of chromosomes contained in an assembly hub or UCSC Genome Browser genome assembly
- List of chromosomes contained in a specific track of an assembly or track hub, or UCSC Genome
Browser genome assembly
- Return DNA sequence from an assembly hub 2bit file, or UCSC Genome Browser assembly
- Return track data from a specified assembly or track hub, or UCSC Genome Browser assembly
Endpoint functions to return data
The URL https://api.genome.ucsc.edu/ is used to access
the endpoint functions. For example:
curl -L 'https://api.genome.ucsc.edu/list/ucscGenomes'
- /list/publicHubs - list public hubs
- /list/ucscGenomes - list UCSC Genome Browser database genomes
- /list/hubGenomes - list genomes from specified hub
- /list/tracks - list data tracks available in specified hub or database genome
(see also: track definition help)
- /list/chromosomes - list chromosomes from a data track in specified hub or database
genome
- /getData/sequence - return sequence from specified hub or database genome
- /getData/track - return data from specified track in hub or database genome
Parameters to endpoint functions
- hubUrl=<url> - specify track hub or assembly hub URL
- genome=<name> - specify genome assembly in UCSC Genome Browser or track/assembly hub
- track=<trackName> - specify data track in track/assembly hub or UCSC database genome
assembly
- chrom=<chrN> - specify chromosome name for sequence or track data
- start=<123> - specify start coordinate (0 relative) for data from track or sequence
retrieval (start and end required together). See also: UCSC browser coordinate counting systems
- end=<456> - specify end coordinate (1 relative) for data from track or sequence
retrieval (start and end required together). See also: UCSC browser coordinate counting systems
- maxItemsOutput=1000 - limit number of items to output, default: 1,000, maximum limit:
1,000,000 (use -1 to get maximum output)
- trackLeavesOnly=1 - on /list/tracks function, only show tracks, do not show
composite container information
- jsonOutputArrays=1 - on /getData/track function, JSON format is array type
for each item of data, instead of the default object type
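Because start is 0 relative and end is 1 relative, a range of bases numbered from 1 (as positions are usually written) needs its first coordinate reduced by one before it is passed to the API. A small Python sketch of that conversion follows; the function names are illustrative only.

```python
def to_api_range(first_base: int, last_base: int) -> tuple[int, int]:
    """Convert a 1-based, inclusive base range to (start, end) for the API.

    start is 0 relative, end is 1 relative, so only the first coordinate shifts.
    """
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    return first_base - 1, last_base


def range_length(start: int, end: int) -> int:
    """Number of bases covered by an API (start, end) pair."""
    return end - start
```

For example, bases 4322 through 5678 correspond to start=4321 and end=5678, which covers 1357 bases.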
The parameters are appended to the endpoint URL after a
question mark ?, and multiple parameters are separated by
a semicolon ;. For example:
https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM
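The URL syntax described above can be assembled programmatically. This is a sketch in Python; build_api_url is a hypothetical helper name, and values are inserted as given without percent-encoding, which mirrors the examples in this document but is an assumption for values containing special characters.

```python
API_BASE = "https://api.genome.ucsc.edu"


def build_api_url(endpoint: str, **params) -> str:
    """Join parameters as name=value pairs after '?', separated by ';'.

    Parameters whose value is None are skipped. Values are not
    percent-encoded (assumption: they contain no ';' or '?').
    """
    pairs = [f"{name}={value}" for name, value in params.items() if value is not None]
    if not pairs:
        return API_BASE + endpoint
    return API_BASE + endpoint + "?" + ";".join(pairs)
```

With this helper, build_api_url("/getData/sequence", genome="hg38", chrom="chrM") produces the example URL shown above.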
Required and optional parameters
Endpoint function | Required | Optional |
/list/publicHubs | (none) | (none) |
/list/ucscGenomes | (none) | (none) |
/list/hubGenomes | hubUrl | (none) |
/list/tracks | genome or (hubUrl and genome) | trackLeavesOnly=1 |
/list/chromosomes | genome or (hubUrl and genome) | track |
/getData/sequence | (genome or (hubUrl and genome)) and chrom | start and end |
/getData/track | (genome or (hubUrl and genome)) and track | chrom, (start and end), maxItemsOutput, jsonOutputArrays |
The hubUrl and genome parameters are required together to
specify a unique genome in an assembly or track hub. The genome for
a track hub will usually be a UCSC database genome. Assembly hubs will
have their own unique genome sequences. Specify genome without
a hubUrl to refer to a UCSC Genome Browser assembly.
Using the chrom=<name> parameter will limit the request
to the single specified chromosome. To limit the request to a specific
position, both start=4321 and end=5678 must be given together.
Any extra parameters not allowed in a function will be flagged as an error.
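Since the API rejects parameters that a function does not allow, it can be convenient to check a request before sending it. The sketch below transcribes the table above into a Python dictionary; ENDPOINT_PARAMS and check_params are hypothetical names, the check is a client-side approximation, and the server remains the authority on what is valid.

```python
# (required, optional) parameter names per endpoint, transcribed from the table.
# hubUrl is treated as optional wherever the table says
# "genome or (hubUrl and genome)", because genome is needed in both cases.
ENDPOINT_PARAMS = {
    "/list/publicHubs": (set(), set()),
    "/list/ucscGenomes": (set(), set()),
    "/list/hubGenomes": ({"hubUrl"}, set()),
    "/list/tracks": ({"genome"}, {"hubUrl", "trackLeavesOnly"}),
    "/list/chromosomes": ({"genome"}, {"hubUrl", "track"}),
    "/getData/sequence": ({"genome", "chrom"}, {"hubUrl", "start", "end"}),
    "/getData/track": (
        {"genome", "track"},
        {"hubUrl", "chrom", "start", "end", "maxItemsOutput", "jsonOutputArrays"},
    ),
}


def check_params(endpoint: str, params: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    if endpoint not in ENDPOINT_PARAMS:
        return [f"unknown endpoint: {endpoint}"]
    required, optional = ENDPOINT_PARAMS[endpoint]
    given = set(params)
    problems = []
    for name in sorted(required - given):
        problems.append(f"missing required parameter: {name}")
    for name in sorted(given - required - optional):
        problems.append(f"parameter not allowed: {name}")
    # start and end are only meaningful as a pair.
    if ("start" in given) != ("end" in given):
        problems.append("start and end must be given together")
    return problems
```

For instance, check_params("/list/tracks", {}) reports that genome is missing, and passing start without end to /getData/sequence reports that the two must be given together.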
Supported track types for getData functions
Example data access
Your web browser can be configured to interpret JSON data and display
it in a convenient browsing format. Firefox has this function built in;
other browsers have add-ons that can be turned on to format JSON data.
With your browser thus configured, the following links demonstrate
the functions of the API interface.
Listing functions
- list public hubs -
api.genome.ucsc.edu/list/publicHubs
- list UCSC database genomes -
api.genome.ucsc.edu/list/ucscGenomes
- list genomes from specified hub -
api.genome.ucsc.edu/list/hubGenomes?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt
- list tracks from specified hub and genome -
api.genome.ucsc.edu/list/tracks?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ
- list tracks from UCSC database genome -
api.genome.ucsc.edu/list/tracks?genome=hg38
- list chromosomes from UCSC database genome -
api.genome.ucsc.edu/list/chromosomes?genome=hg38
- list chromosomes from specified track in UCSC database genome -
api.genome.ucsc.edu/list/chromosomes?genome=hg38;track=gold
- list chromosomes from assembly hub genome -
api.genome.ucsc.edu/list/chromosomes?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ
- list chromosomes from specified track in assembly hub genome -
api.genome.ucsc.edu/list/chromosomes?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly
getData functions
- Get DNA sequence from specified chromosome in UCSC database genome -
api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM
- Get DNA sequence from specified chromosome and start,end coordinates in UCSC database genome -
api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM;start=4321;end=5678
- Get DNA sequence from a track hub where 'genome' is a UCSC database -
api.genome.ucsc.edu/getData/sequence?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=mm10;chrom=chrM;start=4321;end=5678
- Get DNA sequence from specified chromosome and start,end coordinates in an assembly hub genome -
api.genome.ucsc.edu/getData/sequence?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;chrom=chr1;start=4321;end=5678
- Get track data for specified track in UCSC database genome -
api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;maxItemsOutput=100
- Get track data for specified track and chromosome in UCSC database genome -
api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chrM
- Get track data for specified track, chromosome and start,end coordinates in UCSC database genome -
api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chr1;start=47000;end=48000
- Get track data for specified track in an assembly hub genome -
api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly
- Get track data for specified track and chromosome in an assembly hub genome -
api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly;chrom=chr1
- Get track data for specified track in a track hub -
api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=ensGene
- Get track data for specified track and chromosome in a track hub -
api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=ensGene;chrom=chr1
- Wiggle track data for specified track, chromosome with start and end limits in an assembly hub genome -
api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=gc5Base;chrom=chr1;start=4321;end=5678
- Wiggle track data for specified track in a UCSC database genome -
api.genome.ucsc.edu/getData/track?genome=galGal6;track=gc5BaseBw;maxItemsOutput=100
- bigBed data from a UCSC database, chrom and start,end limits -
api.genome.ucsc.edu/getData/track?genome=galGal6;track=ncbiRefSeqOther;chrom=chr1;start=750000;end=55700000
Error return examples
- Request track data for non-existent chromosome in an assembly hub genome -
api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly;chrom=chrI;start=43521;end=54321
- Request track data from a restricted track. See FAQ -
api.genome.ucsc.edu/getData/track?genome=hg19;track=decipherSnvs
Practical examples
Looking up the schema of a specific track
When querying track data with the /getData/track function, the jsonOutputArrays=1
parameter can be added to see the track schema, which includes a description of each field
present in the track. The data items will also be returned as JSON arrays instead of objects.
Request data from hg38 gold track in array type -
api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chrM;jsonOutputArrays=1
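When items arrive as arrays, the field names from the schema can be paired back with the values on the client side. The Python sketch below assumes the caller has already pulled the ordered field names and the item arrays out of the response; the keys under which the response stores them are not assumed here, and the sample values in the usage note are made up for illustration.

```python
def rows_to_dicts(field_names: list, rows: list) -> list:
    """Pair each array-type item with the schema's field names, in order."""
    records = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError(
                f"row has {len(row)} values but schema has {len(field_names)} fields"
            )
        records.append(dict(zip(field_names, row)))
    return records
```

For example, rows_to_dicts(["chrom", "chromStart", "chromEnd"], [["chrM", 0, 100]]) yields a single record with those three keys, which is the object-type view of the same item.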
Hide track container information with trackLeavesOnly parameter
When using the /list/tracks function to see the available tracks in an assembly, it can
be useful to return all tracks at the same hierarchical level. By default, composite tracks and
supertracks have their subtracks nested below them. The trackLeavesOnly=1 parameter
can be passed to hide the container information and display all tracks and subtracks at the
same level.
In the following example, the first link does not include the trackLeavesOnly parameter. The
output can be compared to the second link to see the difference, which can be observed in the
conservation track. In the first link, the multiz20way track is nested within the
cons20way track. In the second link, however, the multiz20way subtrack is seen at
an equivalent level with all other tracks, and the container, cons20way, is not
present in the list.
Request available tracks in the rn6 genome -
api.genome.ucsc.edu/list/tracks?genome=rn6
Request available tracks in the rn6 genome, hiding container information -
api.genome.ucsc.edu/list/tracks?genome=rn6;trackLeavesOnly=1
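The difference between the two outputs can also be checked in a script by locating a track name inside the decoded JSON. The following Python sketch assumes that tracks appear as object keys in the response; that layout, the helper name find_key_paths, and the miniature sample structures in the usage note are assumptions for illustration, not a description of the actual response.

```python
def find_key_paths(node, wanted, path=()):
    """Return every path (tuple of keys/indexes) where `wanted` is a dict key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = path + (key,)
            if key == wanted:
                found.append(here)
            # Keep descending so nested occurrences are reported too.
            found.extend(find_key_paths(value, wanted, here))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_key_paths(value, wanted, path + (index,)))
    return found
```

On a hypothetical nested structure such as {"rn6": {"cons20way": {"multiz20way": {}}}}, the path to multiz20way passes through cons20way; on a flattened structure such as {"rn6": {"multiz20way": {}}}, it does not, and searching for cons20way returns an empty list.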
Requesting track data with over one million (1M) items in output
Certain tracks may contain over 1M items. When these tracks are queried using the
/getData/track function, only the first million items are returned. The API assumes
this default value of 1M unless a different value (less than 1M) is specified with the
parameter maxItemsOutput.
One of these tracks is the knownGene track for hg19. Removing the maxItemsOutput
parameter from the following link will lead to a 384Mb download, and may cause certain
web browsers to time out.
Request items in knownGene track of hg19, remove maxItemsOutput parameter for 1M max return -
api.genome.ucsc.edu/getData/track?genome=hg19;track=knownGene;maxItemsOutput=5
There are different ways around this item limit, depending on how many items are in the track. For
the knownGene track, breaking it down to component chromosome queries using the chrom
parameter will suffice. In order to get a listing of the chrom names, and what chroms have data
for that track, the /list/chromosomes function can be used.
Request listing of chroms that have data for the knownGene track in hg19 -
api.genome.ucsc.edu/list/chromosomes?genome=hg19;track=knownGene
With the list of chrom names that have data, the /getData/track function can be used
again while specifying the chrom parameter. In the following example, chr1 is queried
and the itemsReturned field shows a total of 7967 items in the output, well below the
1M limit, meaning all data for chr1 has been extracted. This can then be repeated for all
chroms of interest.
Request items in knownGene track of hg19, only for chr1 -
api.genome.ucsc.edu/getData/track?genome=hg19;track=knownGene;chrom=chr1
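The per-chromosome approach can be scripted. This Python sketch builds one /getData/track URL per chromosome and checks the itemsReturned field mentioned above against the item limit; the helper names are hypothetical, the list of chromosome names is supplied by the caller, and the limit to compare against is whatever limit is in effect for the request.

```python
API_BASE = "https://api.genome.ucsc.edu"
MAX_ITEMS = 1_000_000  # maximum limit stated for maxItemsOutput


def track_url_for_chrom(genome: str, track: str, chrom: str) -> str:
    """URL requesting all items of one track on one chromosome."""
    return f"{API_BASE}/getData/track?genome={genome};track={track};chrom={chrom}"


def per_chrom_urls(genome: str, track: str, chroms: list) -> list:
    """One request URL per chromosome name, in the order given."""
    return [track_url_for_chrom(genome, track, chrom) for chrom in chroms]


def looks_complete(response: dict, limit: int = MAX_ITEMS) -> bool:
    """True when itemsReturned is below the limit, so nothing was cut off."""
    return response["itemsReturned"] < limit
```

A response reporting 7967 items, as in the chr1 example, passes looks_complete; a response reporting exactly the limit should be treated as possibly truncated and split further.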
For tracks that have even more items, such as SNP tracks, the query can be broken
down further using the start and end parameters in addition to chrom.
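Splitting a chromosome into start and end windows might look like the Python sketch below. The chromosome size and window size are supplied by the caller, the helper names and the track name in the usage note are hypothetical, and items that span a window boundary may be returned by more than one request, so de-duplication on the client side may be needed.

```python
API_BASE = "https://api.genome.ucsc.edu"


def windows(chrom_size: int, window_size: int):
    """Yield (start, end) pairs covering 0..chrom_size.

    start is 0 relative and end is 1 relative, so consecutive windows
    share a boundary value without overlapping.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, chrom_size, window_size):
        yield start, min(start + window_size, chrom_size)


def window_urls(genome, track, chrom, chrom_size, window_size):
    """One /getData/track URL per window of the chromosome."""
    return [
        f"{API_BASE}/getData/track?genome={genome};track={track};"
        f"chrom={chrom};start={start};end={end}"
        for start, end in windows(chrom_size, window_size)
    ]
```

For example, window_urls("hg19", "someSnpTrack", "chr1", 25, 10) produces three URLs covering 0 to 10, 10 to 20 and 20 to 25; real chromosome sizes and a much larger window would be used in practice.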