Importing AliGater & the AliGater config

AliGater will attempt to detect whether you are running an interactive Python session (IPython/Jupyter) or a script (terminal mode), and prints a corresponding message on standard error.

[1]:
import aligater as ag
AliGater started in Jupyter mode

The detected mode mainly switches plotting on and off.

When aligater is imported, the config file is run. It is located in aligater/aligater/AGConf.py
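As a rough illustration of how this kind of detection can work, the sketch below relies on the fact that IPython injects a `get_ipython` function into the namespace. This is not taken from AliGater's source; the function name `detect_exec_mode` and the returned labels are hypothetical.

```python
def detect_exec_mode():
    """Guess whether the code runs in Jupyter, plain IPython, or a script."""
    try:
        # get_ipython only exists when IPython started the session; in a
        # plain Python interpreter the name is undefined (NameError).
        shell_name = get_ipython().__class__.__name__
    except NameError:
        return "terminal"
    # Jupyter notebooks and consoles run on a ZMQ-based shell class.
    if shell_name == "ZMQInteractiveShell":
        return "jupyter"
    return "ipython"


mode = detect_exec_mode()
# Assumption for this sketch: plots are only shown in interactive modes.
plotting_enabled = mode != "terminal"
print(mode, plotting_enabled)
```

Run as a script this reports terminal mode with plotting off; in a notebook it reports Jupyter mode with plotting on.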

This contains some settings that are well worth inspecting before going on to batch processing. For loading single files and exploring, the defaults are usually fine.

You can always access and change settings in the AGConf file after aligater has been imported, if needed:

[2]:
ag.AGConfig.execMode = 'terminal'

AliGater attempts to detect its root directory on start:

[3]:
ag.AGConfig.ag_home
[3]:
'/media/ludvig/Project_Storage/BloodVariome/aligater'

Another useful path is the ag_tmp property. It defines AliGater’s ‘scratch space’, where it stores intermediate files, downsampled images etc. The space requirements can be rather large for big batch runs. By default it is set to a “temp” folder in AliGater’s home directory. If you have installed AliGater on a drive with limited space, you may need to point ag_tmp to a different folder.

[4]:
ag.AGConfig.ag_tmp
[4]:
'/media/ludvig/Project_Storage/BloodVariome/aligater/temp'
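Before pointing the scratch space at another drive, it can help to check how much room that drive has. The sketch below uses only the Python standard library; the helper name `has_free_space` and the use of the system temp folder as a candidate are assumptions made for illustration.

```python
import shutil
import tempfile


def has_free_space(folder, required_gb):
    """Return True if the drive holding `folder` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= required_gb * 1024**3


# Hypothetical candidate: the system temp folder stands in for a larger drive.
candidate = tempfile.gettempdir()
print(candidate, has_free_space(candidate, required_gb=0))
```

If the check passes, the folder could then be assigned to ag.AGConfig.ag_tmp in the same way execMode was changed above, assuming ag_tmp can be reassigned like that.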

Finally, there is a default output directory, ag_out, used to store results if no output path, or an invalid one, was provided when running an analysis.

[5]:
ag.AGConfig.ag_out
[5]:
'/media/ludvig/Project_Storage/BloodVariome/aligater/out'
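The fallback behaviour described above can be pictured with a small sketch. It is not AliGater's implementation: the helper `resolve_output_dir` is hypothetical, and treating "invalid" as "not an existing folder" is an assumption.

```python
import os
import tempfile


def resolve_output_dir(requested, default_dir):
    """Return `requested` if it is an existing folder, otherwise `default_dir`."""
    if requested and os.path.isdir(requested):
        return requested
    return default_dir


default_dir = tempfile.gettempdir()  # stand-in for ag.AGConfig.ag_out
print(resolve_output_dir(None, default_dir))               # no path given
print(resolve_output_dir("/no/such/folder", default_dir))  # invalid path given
```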

Single file i/o

AliGater file i/o is mainly done in one of two ways: loading a single file with the loadFCS function, or batch loading many files by setting up an AGExperiment object.

By default, the loadFCS function returns the data as a pandas DataFrame

[6]:
ag.loadFCS(path=ag.AGConfig.ag_home+"/tutorial/data/example1.fcs",
           compensate=True,
           flourochrome_area_filter=True)
Opening file example1 from folder /tutorial/data
[6]:
FSC 488/10-H FSC 488/10-A FSC 488/10-W SSC 488/10-H SSC 488/10-A SSC 488/10-W BB515 CD39-A PE-Cy7 CD25-A PE CD127-A PE-Dazzle 594 CCR6-A BV650 HLA-DR-A BV711 CCR7-A BV786 CXCR5-A BV421 CXCR3-A BV605 CD194-A BV510 CD4-A Alexa Fluor 700 CD3-A APC-H7 CD8-A APC CD45RA-A
0 88768.6144 115694.1312 78764.4416 23343.1808 25840.0768 66093.0560 120.370272 239.641722 58.923444 103.918683 93.028168 141.317305 258.044027 114.663754 398.293056 80.621713 206.555734 152.830524 4.659925
1 45153.5104 67816.0128 99375.5136 167609.8816 191416.7296 71129.4976 200.210499 693.326859 249.931336 295.704729 103.487257 752.449007 370.677672 1679.466896 3827.976214 45.983813 -84.886763 652.231400 53.660000
2 60145.6384 77661.2352 80288.1536 13294.7456 14618.8800 67498.8032 81.190419 132.194142 112.163318 3543.683607 1742.238333 3796.207810 1613.339615 400.131389 800.681072 35.508496 -776.301108 375.489958 195.150474
3 68961.9200 86361.0624 75399.1680 12068.2240 13162.2400 66538.7008 84.735906 161.763261 216.086232 147.602239 -33.096730 1246.019463 147.300612 207.899467 428.570031 505.577913 4579.946665 -108.898032 367.448467
4 131969.7664 208002.1504 96872.0384 214000.0000 214000.0000 83997.4912 275.688918 1305.255277 169.837805 139.485735 219.137415 878.919580 294.435400 1796.582907 4844.724148 484.844886 92.258240 610.634542 154.902509
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
999995 89871.8720 113879.0912 77515.9808 17263.6160 18954.9568 67662.6432 163.243621 291.009162 61.342903 243.272295 -64.492626 388.948296 269.014414 335.925523 680.944393 102.200811 -84.231826 700.249344 569.440784
999996 80852.3520 107713.5616 80537.1904 20498.0224 22555.4944 67698.6880 59.597661 291.044370 87.118466 -101.975519 54.008898 561.816350 323.532090 341.294551 566.059552 55.544961 1156.231582 3450.219284 1382.457310
999997 75933.3376 95031.7056 76677.1200 22532.5056 24762.0352 67380.8384 155.831188 252.498242 349.383662 126.954442 -100.255948 336.092223 210.900084 565.794824 586.369531 43.023669 8508.503913 394.399828 127.832572
999998 157514.2144 213773.6960 84456.2432 140626.7904 157082.8032 69477.9904 211.728988 390.228888 315.458599 335.544054 114.022993 332.122552 416.700949 374.146323 771.626994 150.280757 363.049544 75.812150 66.347037
999999 93161.3440 110947.7632 73423.2576 22326.5536 24479.6160 66915.5328 61.286748 372.204139 449.375425 95.652379 202.722344 319.965699 386.453953 373.172388 963.899741 31.233489 1467.968905 7456.991510 129.638480

1000000 rows × 19 columns

Normally I’d recommend loading it into an aligater.AGSample object, which holds a dataframe internally with extra metadata

[7]:
sample = ag.loadFCS(path=ag.AGConfig.ag_home+"/tutorial/data/example1.fcs",
                    compensate=True,
                    flourochrome_area_filter=True,
                    return_type="agsample")
Opening file example1 from folder /tutorial/data
[8]:
type(sample)
[8]:
aligater.AGClasses.AGSample

You can always access the pandas dataframe by calling the sample object

[9]:
sample()
[9]:
FSC 488/10-H FSC 488/10-A FSC 488/10-W SSC 488/10-H SSC 488/10-A SSC 488/10-W BB515 CD39-A PE-Cy7 CD25-A PE CD127-A PE-Dazzle 594 CCR6-A BV650 HLA-DR-A BV711 CCR7-A BV786 CXCR5-A BV421 CXCR3-A BV605 CD194-A BV510 CD4-A Alexa Fluor 700 CD3-A APC-H7 CD8-A APC CD45RA-A
0 88768.6144 115694.1312 78764.4416 23343.1808 25840.0768 66093.0560 120.370272 239.641722 58.923444 103.918683 93.028168 141.317305 258.044027 114.663754 398.293056 80.621713 206.555734 152.830524 4.659925
1 45153.5104 67816.0128 99375.5136 167609.8816 191416.7296 71129.4976 200.210499 693.326859 249.931336 295.704729 103.487257 752.449007 370.677672 1679.466896 3827.976214 45.983813 -84.886763 652.231400 53.660000
2 60145.6384 77661.2352 80288.1536 13294.7456 14618.8800 67498.8032 81.190419 132.194142 112.163318 3543.683607 1742.238333 3796.207810 1613.339615 400.131389 800.681072 35.508496 -776.301108 375.489958 195.150474
3 68961.9200 86361.0624 75399.1680 12068.2240 13162.2400 66538.7008 84.735906 161.763261 216.086232 147.602239 -33.096730 1246.019463 147.300612 207.899467 428.570031 505.577913 4579.946665 -108.898032 367.448467
4 131969.7664 208002.1504 96872.0384 214000.0000 214000.0000 83997.4912 275.688918 1305.255277 169.837805 139.485735 219.137415 878.919580 294.435400 1796.582907 4844.724148 484.844886 92.258240 610.634542 154.902509
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
999995 89871.8720 113879.0912 77515.9808 17263.6160 18954.9568 67662.6432 163.243621 291.009162 61.342903 243.272295 -64.492626 388.948296 269.014414 335.925523 680.944393 102.200811 -84.231826 700.249344 569.440784
999996 80852.3520 107713.5616 80537.1904 20498.0224 22555.4944 67698.6880 59.597661 291.044370 87.118466 -101.975519 54.008898 561.816350 323.532090 341.294551 566.059552 55.544961 1156.231582 3450.219284 1382.457310
999997 75933.3376 95031.7056 76677.1200 22532.5056 24762.0352 67380.8384 155.831188 252.498242 349.383662 126.954442 -100.255948 336.092223 210.900084 565.794824 586.369531 43.023669 8508.503913 394.399828 127.832572
999998 157514.2144 213773.6960 84456.2432 140626.7904 157082.8032 69477.9904 211.728988 390.228888 315.458599 335.544054 114.022993 332.122552 416.700949 374.146323 771.626994 150.280757 363.049544 75.812150 66.347037
999999 93161.3440 110947.7632 73423.2576 22326.5536 24479.6160 66915.5328 61.286748 372.204139 449.375425 95.652379 202.722344 319.965699 386.453953 373.172388 963.899741 31.233489 1467.968905 7456.991510 129.638480

1000000 rows × 19 columns

An AGSample object knows which file it was loaded from (filePath) and also stores a shortened name made up of the two parent folders and the file name (sample).

Folder structure is a common way to sort files into case/control, cohorts etc…

[10]:
sample.filePath
[10]:
'/media/ludvig/Project_Storage/BloodVariome/aligater/tutorial/data/example1.fcs'
[11]:
sample.sample
[11]:
'tutorial/data/example1'
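To show how such a shortened name relates to the full path, here is a minimal sketch using pathlib. It only mirrors the output shown above; the helper `short_sample_name` is hypothetical and not part of AliGater.

```python
from pathlib import PurePosixPath


def short_sample_name(file_path, n_parents=2):
    """Keep the last `n_parents` folders plus the file name without extension."""
    path = PurePosixPath(file_path)
    # Drop the extension and the root ("/") so only real folder names remain.
    parts = [p for p in path.with_suffix("").parts if p != path.anchor]
    return "/".join(parts[-(n_parents + 1):])


full_path = "/media/ludvig/Project_Storage/BloodVariome/aligater/tutorial/data/example1.fcs"
print(short_sample_name(full_path))
```

When files are sorted into folders such as case/control, the first parts of this short name carry that grouping.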

In the function call above, two parameters were passed, compensate and flourochrome_area_filter, which merit some explanation.

The compensate flag tells aligater to apply compensation data available in the fcs metadata - more on that shortly.

The flourochrome_area_filter flag (spelled this way in the code) is only relevant to certain .fcs files, depending on the flow machine and how the data was exported. As with forward and side scatter, each fluorochrome channel can be reported with height and width in addition to area. In most setups these extra channels are not used. The filter drops them and keeps only the area (-A) channel, which is what’s typically used in flow gating.
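The effect of the area filter can be sketched on a list of column names. This is an illustration, not AliGater's code: the helper `keep_area_channels`, the scatter prefixes, and the "-H"/"-W" fluorochrome column names in the example are assumptions.

```python
def keep_area_channels(columns, scatter_prefixes=("FSC", "SSC")):
    """Keep all scatter channels, but only the area (-A) fluorochrome channels."""
    kept = []
    for name in columns:
        is_scatter = name.startswith(scatter_prefixes)
        is_area = name.endswith("-A")
        if is_scatter or is_area:
            kept.append(name)
    return kept


# Hypothetical export where one fluorochrome has height and width channels too.
columns = [
    "FSC 488/10-H", "FSC 488/10-A", "FSC 488/10-W",
    "BV510 CD4-A", "BV510 CD4-H", "BV510 CD4-W",
]
print(keep_area_channels(columns))
```

The scatter channels keep their height and width, matching the loaded DataFrame above, while only the -A column of the fluorochrome survives.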

Metadata & Compensation

Supplying the metadata flag to the loadFCS function makes it return two things: a Python dictionary with the metadata, followed by the AGSample or pandas DataFrame.

[12]:
metadata, sample = ag.loadFCS(path=ag.AGConfig.ag_home+"/tutorial/data/example1.fcs",
                              metadata=True,
                              compensate=True,
                              flourochrome_area_filter=True,
                              return_type="agsample")
Opening file example1 from folder /tutorial/data
[13]:
type(metadata)
[13]:
dict

It’s a pretty big dictionary and would be too clunky to show the entire content in this tutorial. Feel free to browse the content yourself. One of the most important parts of the dictionary, however, is the spill matrix:

[14]:
metadata['$SPILLOVER']
[14]:
'13,FL02-A,FL08-A,FL12-A,FL14-A,FL15-A,FL16-A,FL17-A,FL19-A,FL20-A,FL21-A,FL22-A,FL23-A,FL25-A,1.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0008,0.0065,0.0002,0.0001,0.0004,0.0012,1.0000,0.0303,0.0088,0.0003,0.0038,0.1847,0.0000,0.0013,0.0000,0.0114,0.4024,0.0003,0.0002,0.0041,1.0000,0.2845,0.0103,0.0021,0.0007,0.0000,0.0418,0.0000,0.0001,0.0001,0.0001,0.0004,0.0391,0.0695,1.0000,0.0439,0.0125,0.0049,0.0000,0.1181,0.0000,0.0015,0.0002,0.0013,0.0000,0.0019,0.0006,0.0115,1.0000,0.2716,0.1033,0.0319,0.1924,0.0009,0.1265,0.0168,0.1011,0.0000,0.0180,0.0000,0.0000,0.0899,1.0000,0.5779,0.0465,0.0013,0.0046,0.9123,0.3237,0.0216,0.0000,0.0100,0.0000,0.0000,0.0029,0.0326,1.0000,0.0294,0.0017,0.0031,0.0071,0.1445,0.0006,0.0001,0.0000,0.0000,0.0000,0.0027,0.0004,0.0004,1.0000,0.0101,0.0935,0.0001,0.0000,0.0000,0.0000,0.0126,0.0187,0.2286,0.5096,0.1328,0.0603,0.0221,1.0000,0.0017,0.0006,0.0001,0.0010,0.0011,0.0001,0.0000,0.0002,0.3372,0.0959,0.0510,0.0010,0.8689,1.0000,0.0007,0.0004,0.0000,0.0001,0.0107,0.0003,0.0010,0.0013,0.0307,0.0167,0.0000,0.0000,0.0000,1.0000,0.3095,0.0110,0.0000,0.0534,0.0000,0.0000,0.0005,0.0003,0.0517,0.0000,0.0000,0.0000,0.0227,1.0000,0.0070,0.0000,0.0126,0.0000,0.0019,0.1090,0.0199,0.0066,0.0000,0.0000,0.0000,0.9262,0.2261,1.0000'

This is the matrix defining how much signal from one fluorochrome spills over into the other channels and needs to be corrected, i.e. it’s used for compensation.

Note that it will not always be called $SPILLOVER; there are several aliases depending on the machine/software used for exporting (SPILL, for example, appears further down).

To make it properly readable you need to reshape it by the number of colors, which is given in the first element. That number is followed by the channel names and then the matrix values, row by row.

Below is a somewhat complicated one-liner to push it into a more readable pandas DataFrame
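Since the key name varies, one way to locate it is to search the metadata keys. The sketch below is a heuristic only: the helper `find_spill_keys` is hypothetical, and matching on the substring "SPILL" is an assumption that covers the two names seen in this tutorial, not a complete list of aliases.

```python
def find_spill_keys(metadata):
    """Return all metadata keys whose name contains 'SPILL' (case-insensitive)."""
    return [key for key in metadata if "SPILL" in str(key).upper()]


# Toy dictionaries standing in for the metadata returned by loadFCS.
print(find_spill_keys({"$TOT": "1000000", "$SPILLOVER": "2,FL1-A,FL2-A,1,0,0,1"}))
print(find_spill_keys({"$TOT": "500000", "SPILL": "2,FL1-A,FL2-A,1,0,0,1"}))
```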

[15]:
import numpy as np
import pandas as pd
[16]:
pd.DataFrame(np.array(metadata['$SPILLOVER'].split(',')[13+1:]).reshape(13, 13).astype(float))
[16]:
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0008 0.0065 0.0002 0.0001 0.0004
1 0.0012 1.0000 0.0303 0.0088 0.0003 0.0038 0.1847 0.0000 0.0013 0.0000 0.0114 0.4024 0.0003
2 0.0002 0.0041 1.0000 0.2845 0.0103 0.0021 0.0007 0.0000 0.0418 0.0000 0.0001 0.0001 0.0001
3 0.0004 0.0391 0.0695 1.0000 0.0439 0.0125 0.0049 0.0000 0.1181 0.0000 0.0015 0.0002 0.0013
4 0.0000 0.0019 0.0006 0.0115 1.0000 0.2716 0.1033 0.0319 0.1924 0.0009 0.1265 0.0168 0.1011
5 0.0000 0.0180 0.0000 0.0000 0.0899 1.0000 0.5779 0.0465 0.0013 0.0046 0.9123 0.3237 0.0216
6 0.0000 0.0100 0.0000 0.0000 0.0029 0.0326 1.0000 0.0294 0.0017 0.0031 0.0071 0.1445 0.0006
7 0.0001 0.0000 0.0000 0.0000 0.0027 0.0004 0.0004 1.0000 0.0101 0.0935 0.0001 0.0000 0.0000
8 0.0000 0.0126 0.0187 0.2286 0.5096 0.1328 0.0603 0.0221 1.0000 0.0017 0.0006 0.0001 0.0010
9 0.0011 0.0001 0.0000 0.0002 0.3372 0.0959 0.0510 0.0010 0.8689 1.0000 0.0007 0.0004 0.0000
10 0.0001 0.0107 0.0003 0.0010 0.0013 0.0307 0.0167 0.0000 0.0000 0.0000 1.0000 0.3095 0.0110
11 0.0000 0.0534 0.0000 0.0000 0.0005 0.0003 0.0517 0.0000 0.0000 0.0000 0.0227 1.0000 0.0070
12 0.0000 0.0126 0.0000 0.0019 0.1090 0.0199 0.0066 0.0000 0.0000 0.0000 0.9262 0.2261 1.0000
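The one-liner above hard-codes the number of colors (13) and discards the channel names. A slightly more general sketch, assuming the same layout (count, then names, then values row by row), reads the count from the string and keeps the names. The function `parse_spillover` is a hypothetical helper, not an AliGater function.

```python
def parse_spillover(spill_string):
    """Split a spillover string into channel names and a square matrix."""
    fields = spill_string.split(",")
    n = int(fields[0])                      # number of colors
    names = fields[1:n + 1]                 # the n channel names
    values = [float(v) for v in fields[n + 1:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} matrix values, got {len(values)}")
    # Cut the flat list of values into n rows of n values each.
    matrix = [values[row * n:(row + 1) * n] for row in range(n)]
    return names, matrix


# Small made-up two-color example with the same layout.
names, matrix = parse_spillover("2,FL1-A,FL2-A,1.0,0.2,0.1,1.0")
print(names)
print(matrix)
```

Passing the result to pd.DataFrame(matrix, index=names, columns=names) gives a table labelled with the channel names instead of 0 to 12.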

Typically you would only need to inspect this when there are compensation issues.

AliGater will report that compensation information is missing if this matrix is equal to the identity matrix.

In that case you might want to compensate your flow data using external compensation information, such as from another sample.

‘Manual’ compensation

Below is such a sample, where compensation hasn’t been applied for some reason, together with the warning AliGater shows for it.

[17]:
metaDict, fcsDF = ag.loadFCS(ag.AGConfig.ag_home+"/tutorial/data/Uncompensated.fcs", compensate=True, metadata=True)
Opening file Uncompensated from folder /tutorial/data
WARNING: No compensation data available in sample!

Using the same ‘hack’ from before we can inspect the compensation data

[18]:
pd.DataFrame(np.array(metaDict['SPILL'].split(',')[8+1:]).reshape(8, 8).astype(float))
[18]:
0 1 2 3 4 5 6 7
0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
6 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
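A matrix like this one carries no compensation information. A minimal sketch of such an identity check, on a plain list-of-lists matrix, could look like the following; the helper `is_identity` and its tolerance are assumptions, and AliGater's own check may differ.

```python
def is_identity(matrix, tol=1e-9):
    """Return True if `matrix` is square with ones on the diagonal and zeros elsewhere."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            return False  # not square
        for j, value in enumerate(row):
            expected = 1.0 if i == j else 0.0
            if abs(value - expected) > tol:
                return False
    return True


print(is_identity([[1.0, 0.0], [0.0, 1.0]]))  # no spillover recorded
print(is_identity([[1.0, 0.2], [0.1, 1.0]]))  # real spillover values
```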

If another sample from the same run has a correct compensation matrix, AliGater lets you use the compensation data from that second sample.

For single files this can be achieved as shown below. There are ways to automate this process for batch runs.

[19]:
metaDict, fcsDF = ag.loadFCS(ag.AGConfig.ag_home+"/tutorial/data/Compensated.fcs",metadata=True)
marker_labels,compensation_matrix = ag.getCompensationMatrix(fcsDF, metaDict)
compensation_matrix
Opening file Compensated from folder /tutorial/data
[19]:
array([[1.00000000e+00, 1.32450863e-01, 1.02665103e-02, 2.01250917e-03,
        0.00000000e+00, 1.08101399e-02, 0.00000000e+00, 9.71868171e-03],
       [0.00000000e+00, 1.00000000e+00, 1.15291878e-01, 2.48027420e-02,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.18944635e-01],
       [0.00000000e+00, 9.32454529e-03, 1.00000000e+00, 3.97515291e-02,
        1.48551524e-04, 0.00000000e+00, 1.05581201e-04, 4.48574031e-02],
       [2.82618619e-04, 3.41073206e-04, 4.06687202e-02, 1.00000000e+00,
        1.08666204e-03, 8.90738978e-04, 1.41479627e-04, 8.85263402e-01],
       [1.78071188e-03, 6.43905570e-05, 0.00000000e+00, 0.00000000e+00,
        1.00000000e+00, 1.09196710e-01, 0.00000000e+00, 0.00000000e+00],
       [1.88402026e-02, 5.74699525e-03, 7.81401221e-05, 0.00000000e+00,
        7.51734389e-02, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [1.28217849e-02, 3.15879976e-01, 1.92875902e-04, 0.00000000e+00,
        0.00000000e+00, 1.04141284e-03, 1.00000000e+00, 6.52849114e-02],
       [3.20313049e-04, 1.50062611e-03, 0.00000000e+00, 1.57187596e-02,
        8.37001753e-05, 0.00000000e+00, 1.66419571e-03, 1.00000000e+00]])
[20]:
marker_labels
[20]:
Index(['IgA', 'CD34', 'IgD', 'CD45', 'CD38', 'CD24', 'CD27', 'CD19'], dtype='object')

As seen above, the function getCompensationMatrix extracts the compensation matrix, as well as the marker labels, from the given DataFrame and metadata dictionary.

We can supply this compensation matrix to ag.loadFCS through the comp_matrix parameter when we load the uncompensated sample. A confirmation message will be shown if successful

[21]:
ag.loadFCS(ag.AGConfig.ag_home+"/tutorial/data/Uncompensated.fcs",
           compensate=True,
           comp_matrix=compensation_matrix)
Opening file Uncompensated from folder /tutorial/data
External compensation matrix passed, applying
Applied passed compensation matrix
[21]:
FSC-A FSC-H SSC-A SSC-H IgA CD34 IgD CD45 CD38 CD24 CD27 CD19
0 161583.906250 126673.0 180250.359375 148789.0 419.890919 704.346825 745.353422 515.768723 460.933154 558.299844 56.922603 75.548921
1 154024.296875 126671.0 163986.625000 142751.0 493.929533 471.400704 932.985111 2309.102986 419.945958 1026.249939 35.534087 -125.756470
2 137975.109375 107581.0 193105.234375 155385.0 625.095272 571.084435 698.768740 1418.433455 476.331344 975.560317 52.880820 86.962923
3 206777.421875 159966.0 151537.015625 118430.0 531.943217 94.538583 536.822357 3645.957853 632.636837 596.313700 3078.929818 -662.435862
4 87524.781250 75609.0 36023.210938 32077.0 128.671438 81.821096 152.881221 1019.520290 3005.939069 48.312547 -9.880112 -18.189542
... ... ... ... ... ... ... ... ... ... ... ... ...
499995 182131.468750 146053.0 169496.078125 137509.0 493.504654 333.079739 709.160040 1269.270643 476.413507 546.679594 50.072443 -166.383163
499996 136246.515625 103334.0 106858.562500 86930.0 242.504396 52.082154 213.620474 561.401434 399.520540 484.466408 82.275759 103.506316
499997 114188.773438 103168.0 25389.810547 23776.0 48.862930 -217.537407 -26.185809 5248.727350 18.933493 61.405944 2077.097106 127.915029
499998 115908.664062 106740.0 24942.669922 23264.0 106.790166 -15.989950 -15.702818 7588.493731 3461.494050 49.971828 5867.613467 -291.864247
499999 172545.781250 144780.0 138276.859375 121640.0 460.486252 258.253951 582.292534 650.547175 235.437385 534.648082 34.140457 93.752117

500000 rows × 12 columns
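To give a feel for what applying a compensation matrix means, here is a two-channel worked example in plain Python. It follows the usual flow cytometry convention that the observed signal is the true signal multiplied by the spillover matrix, so compensation multiplies by the inverse. Whether AliGater computes it exactly this way is an assumption, and all numbers and helper names below are made up.

```python
def apply_matrix(signal, matrix):
    """Row vector times matrix: result[j] = sum_i signal[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(signal[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and cannot be inverted")
    return [[d / det, -b / det], [-c / det, a / det]]


# Made-up spillover: 20% of channel 1 leaks into channel 2, 10% the other way.
spill = [[1.0, 0.2], [0.1, 1.0]]
true_signal = [100.0, 50.0]

observed = apply_matrix(true_signal, spill)           # what the detectors record
compensated = apply_matrix(observed, invert_2x2(spill))  # spillover undone

print(observed)
print(compensated)
```

For a full-size matrix such as the 8 x 8 one above, the inverse would normally be computed with np.linalg.inv rather than by hand.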