HTRU Medlat Training Data

Description

The HTRU Medlat Training Data set is a collection of labeled pulsar candidates from the intermediate Galactic latitude ("medlat") part of the High Time Resolution Universe (HTRU) survey. It was assembled to train the SPINN pulsar classifier described in:

SPINN: a straightforward machine learning solution to the pulsar candidate selection problem
V. Morello, E.D. Barr, M. Bailes, C.M. Flynn, E.F. Keane and W. van Straten
http://arxiv.org/abs/1406.3627

This dataset contains 1,196 candidates from known pulsars, representing 521 distinct sources, and 89,996 non-pulsar candidates. The file format is Pulsar Hunter Candidate XML (PHCX): human-readable XML in which some data arrays (folded profile, sub-bands, sub-integrations) are encoded as hexadecimal strings.
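To illustrate what "encoded as hexadecimal strings" can mean in practice, here is a minimal sketch of decoding such a string into numbers. It is not taken from phcx.py. The function name `decode_hex_block`, the one-byte-per-sample layout, and the linear rescaling between `vmin` and `vmax` are all assumptions made for illustration; the reader provided in phcx.py is the reference for the actual convention.

```python
def decode_hex_block(hex_text, vmin=0.0, vmax=1.0):
    """Decode a hex-encoded data array into a list of floats.

    ASSUMPTION (not confirmed by the dataset description): each sample is
    one byte, i.e. two hex digits in the range 00..FF, mapped linearly
    onto the interval [vmin, vmax].
    """
    # Whitespace and line breaks inside an XML text node are not data.
    cleaned = "".join(hex_text.split())
    raw = bytes.fromhex(cleaned)
    # Rescale each byte value from 0..255 to vmin..vmax.
    return [vmin + (vmax - vmin) * b / 255.0 for b in raw]


# Hypothetical three-sample block: 0x00, 0x80, 0xFF
values = decode_hex_block("00 80\nFF", vmin=0.0, vmax=1.0)
print(values)
```

If the real files use a different sample width or scaling, only the body of the function needs to change; the calling pattern stays the same.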

Getting the data

The dataset is approximately 2 GB in size and can be downloaded directly from the following locations:
Google Drive

Reading and using the data

The Python script phcx.py, found in the main data folder, contains code for reading and plotting candidate files.

You may freely copy and edit the code provided in phcx.py to suit your needs, and integrate any part of it into your own work. The Candidate class provided has the following attributes:

| Attribute name | Attribute description |
|---|---|
| candidate.snr | Best folded signal-to-noise ratio returned by PDMP (part of the PSRCHIVE software package) |
| candidate.topo_period | Best topocentric period returned by PDMP |
| candidate.bary_period | Best barycentric period returned by PDMP |
| candidate.width | Best pulse width returned by PDMP, expressed as a fraction of the barycentric period |
| candidate.dm | Best dispersion measure in pc cm^-3 returned by PDMP |
| candidate.accn | Best acceleration value in m/s^2 returned by the PEASOUP pulsar search software |
| candidate.rajd | Right Ascension (J2000) in degrees |
| candidate.decjd | Declination (J2000) in degrees |
| candidate.fftsnr | Best FFT signal-to-noise ratio returned by PEASOUP |
| candidate.profile | Folded profile at the best candidate parameters (period, DM, acceleration) found by PDMP |
| candidate.subints | Two-dimensional array containing the candidate sub-integrations, also called the phase-time diagram. Note that every sub-integration has been individually normalized by PDMP to values between 0 and 1. |
| candidate.subbands | Two-dimensional array containing the candidate sub-bands, also called the phase-frequency diagram. Note that every sub-band has been individually normalized by PDMP to values between 0 and 1. |
| candidate.dm_curve | Tuple of arrays (DmValues, SnrValues) representing the FFT signal-to-noise ratio found by PEASOUP as a function of trial DM. Only points with S/N > 6 are given. |
| candidate.accn_curve | Tuple of arrays (AccnValues, SnrValues) representing the FFT signal-to-noise ratio found by PEASOUP as a function of trial acceleration. Only points with S/N > 6 are given. |
| candidate.rank | Rank of the candidate within the beam in which it was found, as returned by PEASOUP. The brightest candidate in a beam has a rank of 0. |
| candidate.hits | Number of (DM, acceleration) trial pairs at which the candidate, or any of its harmonics, was found with FFT S/N > 6 by PEASOUP |
| candidate.pdm_plane | Tuple of arrays (PeriodCorrections, DmValues, SnrValues) representing the folded signal-to-noise ratio found by PDMP over a grid of trial delta-period and DM values. |
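
As a rough sketch of how these attributes might be combined, the snippet below derives a pulse width in period units and locates the peak of the DM and acceleration curves. It does not use phcx.py: the `candidate` object is a stand-in built with `SimpleNamespace`, every number in it is made up, and `peak_of_curve` is a hypothetical helper. Only the attribute names and their meanings come from the list above.

```python
from types import SimpleNamespace

# Stand-in for a Candidate object loaded with phcx.py.
# All values below are invented for illustration only.
candidate = SimpleNamespace(
    bary_period=0.5,
    width=0.04,  # fraction of the barycentric period
    dm_curve=([10.0, 20.0, 30.0], [6.5, 11.2, 7.9]),     # (DmValues, SnrValues)
    accn_curve=([-5.0, 0.0, 5.0], [8.0, 11.2, 7.1]),     # (AccnValues, SnrValues)
)


def peak_of_curve(curve):
    """Return (trial_value, snr) at the highest S/N point of a (values, snrs) pair."""
    trial_values, snr_values = curve
    best = max(range(len(snr_values)), key=lambda i: snr_values[i])
    return trial_values[best], snr_values[best]


# width is a fraction of the barycentric period, so the product is
# expressed in whatever unit bary_period uses.
width_in_period_units = candidate.width * candidate.bary_period

best_dm, best_dm_snr = peak_of_curve(candidate.dm_curve)
best_accn, best_accn_snr = peak_of_curve(candidate.accn_curve)

print(width_in_period_units, best_dm, best_dm_snr, best_accn, best_accn_snr)
```

Because both curves only contain points with S/N > 6, a real candidate may have very short curves; code that consumes them should not assume a fixed length.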

Example plots and code snippets

Coming soon.
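
Until then, the following is an unofficial sketch of one simple operation on the phase-time diagram: summing the sub-integrations bin by bin to obtain a phase profile and locating its peak. It assumes that `subints[i][j]` holds phase bin `j` of sub-integration `i`; that orientation, the function name `collapse_over_time`, and the toy data are assumptions, not something stated in the attribute descriptions.

```python
def collapse_over_time(subints):
    """Sum sub-integrations bin by bin to get a phase profile.

    ASSUMPTION: subints[i][j] is phase bin j of sub-integration i.
    """
    n_bins = len(subints[0])
    return [sum(row[j] for row in subints) for j in range(n_bins)]


# Toy phase-time diagram: 3 sub-integrations, 4 phase bins.
# Each row spans 0 to 1, mimicking the per-sub-integration normalization.
toy_subints = [
    [0.0, 1.0, 0.25, 0.0],
    [0.0, 0.75, 1.0, 0.0],
    [0.0, 1.0, 0.5, 0.0],
]

summed = collapse_over_time(toy_subints)
peak_bin = max(range(len(summed)), key=lambda j: summed[j])
print(summed, peak_bin)
```

Since every sub-integration is individually normalized, this sum weights all sub-integrations equally and need not match candidate.profile.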