Scan

class pylidc.Scan(**kwargs)[source]

The Scan model class refers to the top-level XML file from the LIDC. A scan has many pylidc.Annotation objects, which correspond to the unblindedReadNodule XML attributes for the scan.

study_instance_uid

string – DICOM attribute (0020,000D).

series_instance_uid

string – DICOM attribute (0020,000E).

patient_id

string – Identifier of the form “LIDC-IDRI-dddd” where dddd is a string of digits.

slice_thickness

float – DICOM attribute (0018,0050). Note that this may not be equal to the slice_spacing attribute (see below).

slice_zvals

ndarray – The “z-values” for the slices of the scan (i.e., the last coordinate of the ImagePositionPatient DICOM attribute) as a NumPy array sorted in increasing order.

slice_spacing

float – This computed property is the median of the differences between consecutive slice coordinates, i.e., scan.slice_zvals.

Note

This attribute is typically (but not always!) the same as the slice_thickness attribute. Furthermore, slice_spacing does NOT necessarily imply that all the slices are separated by exactly this spacing (although they often are).

pixel_spacing

float – DICOM attribute (0028,0030). This is normally two values. All scans in the LIDC have equal resolutions in the transverse plane, so only one value is used here.

contrast_used

bool – If the DICOM file for the scan had any Contrast tag, this is marked as True.

is_from_initial

bool – Indicates whether or not this PatientID was tagged as part of the initial 399 release.

sorted_dicom_file_names

string – This attribute is no longer used, and can be ignored.

Example

A short example of Scan class usage:

import pylidc as pl

scans = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1)
print(scans.count())
# => 97

scan = scans.first()
print(scan.patient_id,
      scan.pixel_spacing,
      scan.slice_thickness,
      scan.slice_spacing)
# => LIDC-IDRI-0066, 0.63671875, 0.6, 0.5

print(len(scan.annotations))
# => 11

cluster_annotations(metric='min', tol=None, factor=0.9, min_tol=0.1, return_distance_matrix=False, verbose=True)[source]

Estimate which annotations refer to the same physical nodule in the CT scan. This method clusters all nodule Annotations for a Scan by computing a distance measure between the annotations.

Parameters:
  • metric (string or callable, default 'min') –

    If string, see:

    from pylidc.annotation_distance_metrics import metrics
    print(metrics.keys())
    

    for available metrics. If callable, the provided function should take two Annotation objects and return a float, i.e., isinstance(metric(ann1, ann2), float).

  • tol (float, default=None) – A distance in millimeters. Annotations are grouped when the minimum distance between their boundary contour points is less than tol. If tol = None (the default), then tol = scan.pixel_spacing is used.
  • factor (float, default=0.9) – If tol results in any group with more than 4 Annotations, then tol is multiplied by factor and the grouping is performed again.
  • min_tol (float, default=0.1) – If tol is reduced below min_tol (see the factor parameter), then the routine exits, because we conclude that the annotations cannot be automatically partitioned into groups of at most 4 Annotations each (as expected with LIDC data).
  • return_distance_matrix (bool, default=False) – Optionally return the distance matrix that was used to produce the clusters.
  • verbose (bool, default=True) – If True, a warning message is printed when tol < min_tol occurs.
Returns:

clusters – clusters[i] is a list of pylidc.Annotation objects that refer to the same physical nodule in the Scan. len(clusters) estimates the number of unique physical nodules in the Scan.

Return type:

list of lists.

Note

The “distance” matrix, d[i,j], between all Annotations for the Scan is first computed using the provided metric parameter. Annotations are said to be adjacent when d[i,j]<=tol. Annotation groups are determined by finding the connected components of the graph associated with this adjacency matrix.
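The grouping described in this note, together with the tol, factor, and min_tol parameters, can be sketched in plain Python. This is only an illustration under stated assumptions, not pylidc's actual implementation: the names connected_components and cluster are hypothetical, the items are arbitrary objects rather than Annotation instances, and the behavior after tol falls below min_tol (returning the last grouping) is assumed.

```python
from itertools import combinations


def connected_components(n, adjacent):
    """Group indices 0..n-1 into connected components of `adjacent`."""
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and adjacent(i, j):
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


def cluster(items, metric, tol, factor=0.9, min_tol=0.1, max_size=4):
    """Hypothetical stand-in for the grouping loop (not pylidc's code)."""
    n = len(items)
    # Symmetric distance matrix d[i][j] built from the supplied metric.
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = float(metric(items[i], items[j]))

    while True:
        # Items are adjacent when d[i][j] <= tol; groups are the components.
        groups = connected_components(n, lambda i, j: d[i][j] <= tol)
        if all(len(g) <= max_size for g in groups):
            break
        tol *= factor  # shrink the tolerance and try again
        if tol < min_tol:
            break  # assumed: give up and keep the last grouping
    return [[items[i] for i in g] for g in groups]


# Toy 1-D "annotations" with absolute difference as the metric.
points = [0.0, 0.5, 1.0, 10.0, 10.2]
groups = cluster(points, lambda a, b: abs(a - b), tol=0.6)
```

Note that 0.0 and 1.0 end up in the same group even though their distance exceeds tol, because both are adjacent to 0.5; this is the effect of using connected components rather than pairwise thresholds.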

Example

An example:

import pylidc as pl

scan = pl.query(pl.Scan).first()
nodules = scan.cluster_annotations()

print("This scan has %d nodules." % len(nodules))
# => This scan has 4 nodules.

for i,nod in enumerate(nodules):
    print("Nodule %d has %d annotations." % (i+1,len(nod)))
# => Nodule 1 has 4 annotations.
# => Nodule 2 has 4 annotations.
# => Nodule 3 has 1 annotations.
# => Nodule 4 has 4 annotations.

get_path_to_dicom_files()[source]

Get the path to where the DICOM files are stored for this scan, relative to the root path set in the pylidc configuration file (i.e., ~/.pylidc on Mac and Linux).

  1. In older downloads, the DICOM data would download as:

    [...]/LIDC-IDRI/LIDC-IDRI-dddd/uid1/uid2/dicom_file.dcm
    

    where [...] is the base path set in the pylidc configuration file; uid1 is Scan.study_instance_uid; and uid2 is Scan.series_instance_uid.

  2. However, in more recent downloads, the data is downloaded like:

    [...]/LIDC-IDRI/LIDC-IDRI-dddd/???
    

    where “???” is some unknown folder hierarchy convention used by TCIA.

We first check option 1. If that path does not exist, we check whether the “LIDC-IDRI-dddd” folder exists in the root path. If so, then we recursively search the “LIDC-IDRI-dddd” directory until we find the subfolder that contains a DICOM file with the correct study_instance_uid and series_instance_uid.

Option 2 is less efficient than 1; however, option 2 is robust.
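The two-step lookup described above can be sketched with the standard library. This is a sketch under assumptions, not the library's code: find_scan_dir is a hypothetical helper, root is assumed to be the folder that directly contains the “LIDC-IDRI-dddd” folders, and is_match is an assumed caller-supplied predicate that reports whether a given DICOM file carries the expected study and series UIDs (reading the file is left out here).

```python
import os


def find_scan_dir(root, patient_id, study_uid, series_uid, is_match):
    """Hypothetical helper mirroring the option 1 / option 2 lookup."""
    # Option 1: the older, fully predictable layout.
    direct = os.path.join(root, patient_id, study_uid, series_uid)
    if os.path.isdir(direct):
        return direct

    # Option 2: walk the patient folder until a matching file turns up.
    patient_dir = os.path.join(root, patient_id)
    if not os.path.isdir(patient_dir):
        return None
    for dirpath, _dirnames, filenames in os.walk(patient_dir):
        for name in filenames:
            if name.endswith(".dcm") and is_match(os.path.join(dirpath, name)):
                return dirpath
    return None
```

Option 1 costs a single directory check, while option 2 may touch every file under the patient folder, which is why it is tried second.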

load_all_dicom_images(verbose=True)[source]

Load all the DICOM images associated with this scan and return them as a list.

Parameters:verbose (bool) – Turn the status messages printed while loading on/off.

Example

An example:

import pylidc as pl
import matplotlib.pyplot as plt

scan = pl.query(pl.Scan).first()

images = scan.load_all_dicom_images()
zs = [float(img.ImagePositionPatient[2]) for img in images]
print(zs[1] - zs[0], images[0].SliceThickness, scan.slice_thickness)

plt.imshow( images[0].pixel_array, cmap=plt.cm.gray )
plt.show()

slice_spacing

This computes the median of the differences between consecutive slice coordinates, i.e., scan.slice_zvals.

Note

This attribute is typically (but not always!) the same as the slice_thickness attribute. Furthermore, slice_spacing does NOT necessarily imply that all the slices are separated by exactly this spacing (although they often are).

slice_zvals

The “z-values” for the slices of the scan (i.e., the last coordinate of the ImagePositionPatient DICOM attribute) as a NumPy array sorted in increasing order.
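The relationship between slice_zvals and slice_spacing can be illustrated with a small plain-Python sketch. The z-values below are made up, and slice_spacing here is a hypothetical stand-alone function rather than the library's property; it only shows how a median of consecutive differences can hide an irregular gap.

```python
from statistics import median


def slice_spacing(zvals):
    """Median gap between consecutive sorted z-values (illustrative only)."""
    z = sorted(zvals)
    return median(b - a for a, b in zip(z, z[1:]))


# Made-up z-values with one irregular gap between 2.5 and 5.0.
zvals = [0.0, 1.25, 2.5, 5.0, 6.25]
spacing = slice_spacing(zvals)

# Check whether every gap actually equals the median spacing.
z = sorted(zvals)
gaps = [b - a for a, b in zip(z, z[1:])]
uniform = all(abs(g - spacing) < 1e-6 for g in gaps)
```

Here spacing is the typical gap, yet uniform is False because one pair of slices is farther apart, which is the situation the note above warns about.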

to_volume(verbose=True)[source]

Return the scan as a 3D numpy array volume.

visualize(annotation_groups=None)[source]

Visualize the scan.

Parameters:annotation_groups (list of lists of Annotation objects, default=None) – This argument should be the object returned by the cluster_annotations method.

Example

An example:

import pylidc as pl

scan = pl.query(pl.Scan).first()
nodules = scan.cluster_annotations()

scan.visualize(annotation_groups=nodules)