The HTRU Medlat Training Data set is a collection of labeled pulsar candidates from the intermediate Galactic latitude part of the High Time Resolution Universe (HTRU) survey. It was assembled to train the SPINN pulsar classifier described in:
SPINN: a straightforward machine learning solution to the pulsar candidate selection problem
http://arxiv.org/abs/1406.3627
This dataset contains exactly 1,196 candidates of known pulsars, from 521 distinct sources, and 89,996 non-pulsar candidates. The file format used is Pulsar Hunter Candidate XML, or PHCX, which is human-readable XML with some data arrays (folded profile, sub-bands, sub-integrations) encoded as hexadecimal strings.
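As a rough illustration of what "encoded as hexadecimal strings" can mean in practice, the sketch below decodes such a string into a list of floats. It assumes that each sample is stored as two hex digits (one byte, 0 to 255) and that the values are rescaled linearly to a chosen range. Both points are assumptions made for illustration, and `decode_hex_block` is a hypothetical helper, not part of phcx.py. The reader in phcx.py remains the reference for the actual layout.

```python
def decode_hex_block(hex_string, lo=0.0, hi=1.0):
    """Decode a hex-encoded array into floats between lo and hi.

    Assumption (not confirmed by the dataset description): every sample
    is stored as two hex digits, i.e. one byte in the range 0..255.
    """
    # Drop spaces and newlines that XML pretty-printing may introduce.
    compact = "".join(hex_string.split())
    raw = bytes.fromhex(compact)  # iterating yields one int (0..255) per sample
    scale = (hi - lo) / 255.0
    return [lo + scale * byte for byte in raw]


# Synthetic input, not taken from a real candidate file.
samples = decode_hex_block("00 80 FF\n40")
print(samples)
```

A two-dimensional array such as the sub-bands would then be obtained by splitting the decoded list into rows of equal length, with the row length read from the file itself.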
The dataset is approximately 2 GB in size and can be downloaded directly from the following locations:
Google Drive
The Python script phcx.py, found in the main data folder, contains useful code for reading and plotting candidate files.
You may freely copy and edit the code provided in phcx.py to suit your needs, and integrate any part of it into your own work. The Candidate class provided has the following attributes:
Attribute name | Attribute description |
candidate.snr | Best folded signal-to-noise ratio returned by PDMP (part of the PSRCHIVE software package) |
candidate.topo_period | Best topocentric period returned by PDMP |
candidate.bary_period | Best barycentric period returned by PDMP |
candidate.width | Best pulse width returned by PDMP, expressed as a fraction of the barycentric period |
candidate.dm | Best dispersion measure, in pc cm⁻³, returned by PDMP |
candidate.accn | Best acceleration value, in m/s², returned by the PEASOUP pulsar search software |
candidate.rajd | Right Ascension (J2000) in degrees |
candidate.decjd | Declination (J2000) in degrees |
candidate.fftsnr | Best FFT signal-to-noise ratio returned by PEASOUP |
candidate.profile | Folded profile at best candidate parameters (Period, DM, acceleration) found by PDMP |
candidate.subints | Two-dimensional array containing the candidate sub-integrations, also called the phase-time diagram. Note that every sub-integration has been individually normalized by PDMP to values between 0 and 1. |
candidate.subbands | Two-dimensional array containing the candidate sub-bands, also called the phase-frequency diagram. Note that every sub-band has been individually normalized by PDMP to values between 0 and 1. |
candidate.dm_curve | Tuple of arrays (DmValues, SnrValues) representing the evolution of the FFT signal-to-noise ratio found by PEASOUP as a function of trial DM values. Only points with S/N > 6 are given. |
candidate.accn_curve | Tuple of arrays (AccnValues, SnrValues) representing the evolution of the FFT signal-to-noise ratio found by PEASOUP as a function of trial acceleration values. Only points with S/N > 6 are given. |
candidate.rank | Rank of the candidate within the beam in which it was found, as returned by PEASOUP. The brightest candidate in a beam has a rank of 0. |
candidate.hits | Number of (DM, Acceleration) trial pairs at which the candidate, or any of its harmonics, was found with FFT S/N > 6 by PEASOUP |
candidate.pdm_plane | Tuple of arrays (PeriodCorrections, DmValues, SnrValues) representing the evolution of the folded signal-to-noise ratio found by PDMP as a function of a grid of trial delta-period and DM values. |
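The attributes above can be combined into simple derived quantities. The following is a minimal sketch that uses only the attribute names listed in the table. The helper names (`peak_of_curve`, `summarize`) are hypothetical, and the candidate is a stand-in object with made-up values, since the way phcx.py constructs a Candidate is not described here.

```python
from types import SimpleNamespace


def peak_of_curve(curve):
    """Return (trial_value, snr) at the highest S/N of a (values, snrs) tuple.

    Returns None if the curve is empty, which could happen because only
    points with S/N > 6 are stored.
    """
    values, snrs = curve
    if len(snrs) == 0:
        return None
    best = max(range(len(snrs)), key=lambda i: snrs[i])
    return values[best], snrs[best]


def summarize(candidate):
    """Collect a few derived quantities from a candidate-like object."""
    return {
        # width is a fraction of the barycentric period, so the product
        # is the pulse width in the same units as bary_period.
        "width_in_period_units": candidate.width * candidate.bary_period,
        # Trial DM at which PEASOUP reported the highest FFT S/N.
        "peak_fft_dm": peak_of_curve(candidate.dm_curve),
        # Ratio of folded S/N (PDMP) to FFT S/N (PEASOUP).
        "fold_to_fft_snr": candidate.snr / candidate.fftsnr,
    }


# Stand-in object with synthetic values, NOT a real candidate.
fake = SimpleNamespace(
    snr=12.0,
    fftsnr=8.0,
    bary_period=0.5,
    width=0.04,
    dm_curve=([10.0, 20.0, 30.0], [6.5, 9.0, 7.2]),
)
summary = summarize(fake)
print(summary)
```

The same `peak_of_curve` helper applies unchanged to `candidate.accn_curve`, which has the same (values, S/N) layout.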
Coming soon.