
Pipeline

Pipeline helper functions.

parsimony_only(data_set, full_prioritization=False)

Run the XLRanker pipeline with only the parsimonious selection step.

Running only this step will likely leave many protein pairs labeled PARSIMONY_AMBIGUOUS. To resolve the ambiguity, set full_prioritization to True, which randomly selects one pair as the representative of each ambiguous group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data_set | XLDataSet | Cross-linking dataset that needs prioritization | required |
| full_prioritization | bool | If True, randomly select representative pairs for ambiguous groups. | False |

Returns:

| Type | Description |
|---|---|
| XLDataSet | The input XLDataSet (modified in place) with only parsimonious selection performed. |

Source code in src/xlranker/pipeline.py
```python
def parsimony_only(data_set: XLDataSet, full_prioritization: bool = False) -> XLDataSet:
    """Run the XLRanker pipeline with only the parsimonious selection step.

    This will likely result in many PARSIMONY_AMBIGUOUS protein pairs. To avoid ambiguity, you can set full_prioritization to True. This will select one random pair as the representative pair for ambiguous groups.

    Args:
        data_set (XLDataSet): Cross-linking dataset that needs prioritization
        full_prioritization (bool): Defaults to False. If True, randomly select representative pairs for ambiguous groups.

    Returns:
        XLDataSet: XLDataSet with only parsimonious selection performed.

    """
    data_set.build_proteins()  # TODO: Determine if this should be done when loaded/initialized
    parsimony = ParsimonySelector(data_set)
    parsimony.run()
    if full_prioritization:
        select_random(data_set)
    return data_set
```
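The random tie-break that `full_prioritization=True` enables can be pictured with a small standalone sketch. It does not use xlranker: the `ProteinPair` alias, the `select_random_representatives` helper, and the dict-of-groups layout are hypothetical stand-ins for the library's own data model, and the `seed` argument is an assumption added for reproducibility. Whether xlranker's `select_random` seeds its choice is not stated here.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for an xlranker protein pair: (protein_a, protein_b).
ProteinPair = Tuple[str, str]


def select_random_representatives(
    ambiguous_groups: Dict[str, List[ProteinPair]],
    seed: Optional[int] = None,
) -> Dict[str, ProteinPair]:
    """Pick one representative pair per ambiguous group, uniformly at random.

    Passing a seed makes the choice reproducible across runs (an assumption
    of this sketch, not documented xlranker behavior).
    """
    rng = random.Random(seed)
    return {
        group_id: rng.choice(pairs)
        for group_id, pairs in ambiguous_groups.items()
        if pairs  # skip empty groups; rng.choice raises IndexError on them
    }


# Toy input: group_1 is ambiguous between two pairs, group_2 has a single
# candidate, and group_3 is empty and therefore skipped.
groups = {
    "group_1": [("P1", "P2"), ("P1", "P3")],
    "group_2": [("P4", "P5")],
    "group_3": [],
}
print(select_random_representatives(groups, seed=0))
```

Seeding a dedicated `random.Random` instance, rather than the module-level generator, keeps the tie-break reproducible without disturbing other code that draws from `random`.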

run_full_pipeline(data_set, threshold=0.5)

Run the full XLRanker pipeline.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data_set | XLDataSet | Cross-linking dataset that needs prioritization | required |
| threshold | float | Score threshold for the expanded report | 0.5 |

Returns:

| Type | Description |
|---|---|
| XLDataSet | The input XLDataSet (modified in place) with full prioritization applied. |

Source code in src/xlranker/pipeline.py
```python
def run_full_pipeline(data_set: XLDataSet, threshold: float = 0.5) -> XLDataSet:
    """Run the full XLRanker pipeline.

    Args:
        data_set (XLDataSet): Cross-linking dataset that needs prioritization
        threshold (float): Score threshold for the expanded report

    Returns:
        XLDataSet: XLDataSet with full prioritization

    """
    data_set.build_proteins()  # TODO: Determine if this should be done when loaded/initialized
    parsimony = ParsimonySelector(data_set)
    parsimony.run()
    model = PrioritizationModel(data_set)
    model.run_model()
    get_final_network(data_set, ThresholdSelector(threshold))
    make_all_reports(list(data_set.protein_pairs.values()))
    return data_set
```
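`run_full_pipeline` passes `threshold` to `ThresholdSelector` before building the final network. The sketch below shows one plausible reading of that step as a plain score cutoff. The `ScoredPair` class and `apply_threshold` function are hypothetical, and both the inclusive `>=` comparison and the idea that each pair carries a single float score produced by `PrioritizationModel` are assumptions, not documented behavior of `ThresholdSelector`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredPair:
    """Hypothetical stand-in for a protein pair carrying a model score."""

    a: str
    b: str
    score: float


def apply_threshold(pairs: List[ScoredPair], threshold: float = 0.5) -> List[ScoredPair]:
    """Keep pairs whose score meets the threshold (inclusive >= is an assumption)."""
    return [pair for pair in pairs if pair.score >= threshold]


pairs = [
    ScoredPair("P1", "P2", 0.91),
    ScoredPair("P1", "P3", 0.50),
    ScoredPair("P4", "P5", 0.12),
]
kept = apply_threshold(pairs)  # uses the same 0.5 default as run_full_pipeline
print([(p.a, p.b) for p in kept])  # prints [('P1', 'P2'), ('P1', 'P3')]
```

Raising the threshold trades recall for precision: fewer pairs survive into the final network, but those that do have higher scores.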