Scan
-
class pylidc.Scan(**kwargs)
The Scan model class refers to the top-level XML file from the LIDC. A scan has many pylidc.Annotation objects, which correspond to the unblindedReadNodule XML attributes for the scan.
-
study_instance_uid
string – DICOM attribute (0020,000D).
-
series_instance_uid
string – DICOM attribute (0020,000E).
-
patient_id
string – Identifier of the form “LIDC-IDRI-dddd” where dddd is a string of integers.
-
slice_thickness
float – DICOM attribute (0018,0050). Note that this may not be equal to the slice_spacing attribute (see below).
-
slice_zvals
ndarray – The “z-values” for the slices of the scan (i.e., the last coordinate of the ImagePositionPatient DICOM attribute) as a NumPy array sorted in increasing order.
-
slice_spacing
float – This computed property is the median of the differences between consecutive slice coordinates, i.e., between consecutive entries of scan.slice_zvals.
Note
This attribute is typically (but not always!) the same as the slice_thickness attribute. Furthermore, slice_spacing does NOT imply that all the slices are separated by exactly this spacing (although they often are).
-
pixel_spacing
float – DICOM attribute (0028,0030). This attribute normally holds two values. All scans in the LIDC have equal resolutions in the transverse plane, so only one value is stored here.
-
contrast_used
bool – If the DICOM file for the scan had any Contrast tag, this is marked as True.
-
is_from_initial
bool – Indicates whether or not this PatientID was tagged as part of the initial 399 release.
-
sorted_dicom_file_names
string – This attribute is no longer used and can be ignored.
Example
A short example of Scan class usage:
import pylidc as pl

scans = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1)
print(scans.count())
# => 97

scan = scans.first()
print(scan.patient_id, scan.pixel_spacing, scan.slice_thickness, scan.slice_spacing)
# => LIDC-IDRI-0066, 0.63671875, 0.6, 0.5

print(len(scan.annotations))
# => 11
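The slice_spacing attribute described above can be illustrated without a database. The following is a minimal sketch using NumPy only; the z-values are made up for illustration and stand in for scan.slice_zvals, and this is not pylidc's own implementation. It shows why the median gap does not guarantee uniformly spaced slices.

```python
import numpy as np

# Hypothetical z-values in mm, standing in for scan.slice_zvals.
# One slice is missing between 2.5 and 5.0.
slice_zvals = np.array([0.0, 1.25, 2.5, 5.0, 6.25, 7.5])

# Distances between consecutive slices.
gaps = np.diff(slice_zvals)

# The median gap is what the documentation calls slice_spacing.
slice_spacing = float(np.median(gaps))

# The median hides the single larger gap, so check uniformity separately.
uniform = bool(np.allclose(gaps, slice_spacing))

print(slice_spacing)
print(uniform)
```

A check like `uniform` is worth running before treating the scan as a regularly sampled volume.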
-
cluster_annotations(metric='min', tol=None, factor=0.9, min_tol=0.1, return_distance_matrix=False, verbose=True)
Estimate which annotations refer to the same physical nodule in the CT scan. This method clusters all nodule Annotations for a Scan by computing a distance measure between the annotations.
Parameters:
- metric (string or callable, default 'min') – If a string, see:
from pylidc.annotation_distance_metrics import metrics
print(metrics.keys())
for available metrics. If callable, the provided function should take two Annotation objects and return a float, i.e., isinstance(metric(ann1, ann2), float).
- tol (float, default=None) – A distance in millimeters. Annotations are grouped when the minimum distance between their boundary contour points is less than tol. If tol = None (the default), then tol = scan.pixel_spacing is used.
- factor (float, default=0.9) – If tol results in any group with more than 4 Annotations, then tol is multiplied by factor and the grouping is performed again.
- min_tol (float, default=0.1) – If tol is reduced below min_tol (see the factor parameter), the routine exits, because we conclude that the annotations cannot be automatically grouped so that every group has at most 4 Annotations (as expected with LIDC data).
- return_distance_matrix (bool, default=False) – Optionally return the distance matrix that was used to produce the clusters.
- verbose (bool, default=True) – If True, a warning message is printed when tol falls below min_tol.
Returns: clusters – clusters[i] is a list of pylidc.Annotation objects that refer to the same physical nodule in the Scan. len(clusters) estimates the number of unique physical nodules in the Scan.
Return type: list of lists
Note
The “distance” matrix, d[i,j], between all Annotations for the Scan is first computed using the provided metric parameter. Annotations are said to be adjacent when d[i,j]<=tol. Annotation groups are determined by finding the connected components of the graph associated with this adjacency matrix.
Example
An example:
import pylidc as pl

scan = pl.query(pl.Scan).first()
nodules = scan.cluster_annotations()

print("This scan has %d nodules." % len(nodules))
# => This scan has 4 nodules.

for i, nod in enumerate(nodules):
    print("Nodule %d has %d annotations." % (i+1, len(nod)))
# => Nodule 1 has 4 annotations.
# => Nodule 2 has 4 annotations.
# => Nodule 3 has 1 annotations.
# => Nodule 4 has 4 annotations.
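The grouping rule described in the Note and in the factor and min_tol parameters can be sketched independently of pylidc. This is an illustrative reimplementation under stated assumptions, not the library's code: the function name cluster_by_distance and the max_size argument are hypothetical, the distance matrix is built from made-up 1-D positions, and connected components are found with a plain depth-first search.

```python
import numpy as np

def cluster_by_distance(d, tol, factor=0.9, min_tol=0.1, max_size=4):
    """Group indices i, j as adjacent when d[i, j] <= tol, then return the
    connected components. tol is shrunk by `factor` while any group has
    more than `max_size` members. (Hypothetical helper, not pylidc's API.)"""
    n = d.shape[0]
    while True:
        adjacent = d <= tol
        labels = [-1] * n
        clusters = []
        for start in range(n):
            if labels[start] != -1:
                continue
            # Depth-first search over the adjacency graph from `start`.
            labels[start] = len(clusters)
            stack, members = [start], []
            while stack:
                i = stack.pop()
                members.append(i)
                for j in range(n):
                    if adjacent[i, j] and labels[j] == -1:
                        labels[j] = labels[start]
                        stack.append(j)
            clusters.append(sorted(members))
        if all(len(c) <= max_size for c in clusters):
            return clusters, tol
        tol *= factor
        if tol < min_tol:
            # Give up: return the last grouping that was computed.
            return clusters, tol

# Made-up 1-D positions in mm; the first five form a chain at tol = 1.0.
pos = np.array([0.0, 1.0, 2.0, 3.0, 3.5, 20.0])
d = np.abs(pos[:, None] - pos[None, :])

clusters, final_tol = cluster_by_distance(d, tol=1.0)
print(clusters)
# => [[0], [1], [2], [3, 4], [5]]
```

In this toy input the first pass chains five items into one group, which exceeds the limit of 4, so the tolerance is reduced once and the chain breaks apart.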
-
get_path_to_dicom_files()
Get the path to where the DICOM files are stored for this scan, relative to the root path set in the pylidc configuration file (i.e., ~/.pylidcrc on Mac and Linux).
In older downloads, the DICOM data would download as:
[...]/LIDC-IDRI/LIDC-IDRI-dddd/uid1/uid2/dicom_file.dcm
where [...] is the base path set in the pylidc configuration file, uid1 is Scan.study_instance_uid, and uid2 is Scan.series_instance_uid.
However, in more recent downloads, the data is downloaded like:
[...]/LIDC-IDRI/LIDC-IDRI-dddd/???
where “???” is some unknown folder hierarchy convention used by TCIA.
We first check the older layout (option 1). If that path does not exist, we check whether the “LIDC-IDRI-dddd” folder exists in the root path (option 2). If so, we recursively search the “LIDC-IDRI-dddd” directory until we find the subfolder that contains a DICOM file with the correct study_instance_uid and series_instance_uid.
Option 2 is less efficient than option 1; however, option 2 is robust to the folder layout.
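The two-step lookup described above can be sketched with the standard library. This is a hedged illustration, not pylidc's implementation: find_dicom_dir and the matches_uids predicate are hypothetical names, root is assumed to be the folder that directly contains the “LIDC-IDRI-dddd” folders, and the sketch returns a full path. The demo builds an empty directory tree and uses a stand-in predicate instead of reading real DICOM headers.

```python
import os
import tempfile

def find_dicom_dir(root, patient_id, study_uid, series_uid, matches_uids):
    """Return the directory holding the scan's DICOM files, or None.

    `matches_uids(path)` is a hypothetical caller-supplied predicate that
    reads one .dcm file and checks its study and series UIDs.
    """
    base = os.path.join(root, patient_id)

    # Option 1: the older, predictable layout.
    candidate = os.path.join(base, study_uid, series_uid)
    if os.path.isdir(candidate):
        return candidate

    # Option 2: search the patient folder recursively.
    if os.path.isdir(base):
        for dirpath, _dirnames, filenames in os.walk(base):
            dcm = [f for f in filenames if f.lower().endswith(".dcm")]
            if dcm and matches_uids(os.path.join(dirpath, dcm[0])):
                return dirpath
    return None

# Demo on a throwaway directory tree (no real DICOM data involved).
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "LIDC-IDRI-0001", "study-uid", "series-uid")
    new = os.path.join(root, "LIDC-IDRI-0002", "some", "other", "layout")
    os.makedirs(old)
    os.makedirs(new)
    open(os.path.join(new, "000001.dcm"), "w").close()

    def accept_any(path):
        # Stand-in for a real UID check on the file at `path`.
        return True

    found_old = os.path.relpath(
        find_dicom_dir(root, "LIDC-IDRI-0001", "study-uid", "series-uid", accept_any),
        root)
    found_new = os.path.relpath(
        find_dicom_dir(root, "LIDC-IDRI-0002", "study-uid", "series-uid", accept_any),
        root)
    missing = find_dicom_dir(root, "LIDC-IDRI-0003", "study-uid", "series-uid", accept_any)

print(found_old)
print(found_new)
```

Checking only the first .dcm file in each folder keeps the search cheap, on the assumption that a folder holds files from a single series.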
-
load_all_dicom_images(verbose=True)
Load all the DICOM images associated with this scan and return them as a list.
Parameters: verbose (bool) – Turn the loading messages on/off.
Example
An example:
import pylidc as pl
import matplotlib.pyplot as plt

scan = pl.query(pl.Scan).first()

images = scan.load_all_dicom_images()
zs = [float(img.ImagePositionPatient[2]) for img in images]
# Compare the first inter-slice gap with the recorded slice thickness.
print(zs[1] - zs[0], images[0].SliceThickness, scan.slice_thickness)

plt.imshow(images[0].pixel_array, cmap=plt.cm.gray)
plt.show()
-
visualize(annotation_groups=None)
Visualize the scan.
Parameters: annotation_groups (list of lists of Annotation objects, default=None) – This argument should be supplied by the returned object from the cluster_annotations method.
Example
An example:
import pylidc as pl

scan = pl.query(pl.Scan).first()
nodules = scan.cluster_annotations()

scan.visualize(annotation_groups=nodules)
-