
Match STR profiles between bloodmeals and humans, using the similarity of the most similar human-human pair as the matching threshold. Twins are excluded when computing the threshold. The bloodmeal peak height threshold is optional here because it is used only for filtering. If rm_twins = FALSE, a match to a twin returns multiple rows for that bloodmeal.
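The thresholding idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the similarity values are toy numbers, the column names hu1 and hu2 are hypothetical, identical profiles are assumed to score 1 and to stand for likely twins, and a match is assumed to require a similarity strictly above the threshold.

```r
# Toy human-human similarities (hypothetical values, not package output).
# P3 and P4 have identical profiles, so they are treated as likely twins.
hu_hu <- data.frame(
  hu1 = c("P1", "P1", "P2", "P3"),
  hu2 = c("P2", "P3", "P3", "P4"),
  similarity = c(0.10, 0.05, 0.00, 1.00)
)

# Toy bloodmeal-human similarities (hypothetical values).
bm_hu <- data.frame(
  bloodmeal_id = c("bm1", "bm1", "bm2"),
  human_id = c("P1", "P2", "P3"),
  similarity = c(0.60, 0.05, 0.10)
)

# Assumption: a similarity of 1 marks a likely twin pair; drop those pairs
# so they do not push the threshold to its maximum.
non_twin <- hu_hu[hu_hu$similarity < 1, ]

# Threshold: similarity of the most similar remaining human-human pair.
threshold <- max(non_twin$similarity)

# Assumption: a bloodmeal-human pair is a match only if strictly above it.
matches <- bm_hu[bm_hu$similarity > threshold, ]
matches
```

If the twin pair were left in, the threshold in this toy example would be 1 and no bloodmeal could exceed it, which is the motivation for excluding twins from the threshold.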

Usage

match_similarity(
  bloodmeal_profiles,
  human_profiles,
  bloodmeal_ids = NULL,
  human_ids = NULL,
  peak_thresh = NULL,
  rm_twins = TRUE,
  rm_markers = NULL,
  return_similarities = FALSE
)

Arguments

bloodmeal_profiles

Tibble or data frame with alleles for all bloodmeals in reference database including 4 columns: SampleName, Marker, Allele, Height. Height must be numeric or coercible to numeric.

human_profiles

Tibble or data frame with alleles for all humans in reference database including three columns: SampleName, Marker, Allele.

bloodmeal_ids

Vector of bloodmeal ids from the SampleName column in bloodmeal_profiles for which to compute similarities. If NULL, all ids in the input dataframe will be used. Default: NULL

human_ids

Vector of human ids from the SampleName column in human_profiles for which to compute similarities. If NULL, all ids in the input dataframe will be used. Default: NULL

peak_thresh

Allele peak height threshold in RFUs. All peaks under this threshold will be filtered out. If prior filtering was performed, this number should be equal to or greater than the threshold used for that filtering. Also used for threshT argument in euroformix::contLikSearch(). Default: NULL

rm_twins

A boolean indicating whether or not to remove likely twins (identical STR profiles) from the human database prior to identifying matches. Default: TRUE

rm_markers

A vector indicating what markers should be removed prior to calculating log10LRs. NULL to include all markers. By default, for the bistro function AMEL is removed as it is not standard to include it in LR calculations.

return_similarities

A boolean indicating whether or not to return human-human and bloodmeal-human similarities. Default: FALSE
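As a rough illustration of the filtering that peak_thresh and rm_markers describe, the sketch below applies both to a small hand-made table in base R. The data values are invented for the example, and the code is an assumption about what the filtering amounts to, not the function's internal code.

```r
# Hand-made bloodmeal table with the four documented columns.
# Height is stored as text here to show the "coercible to numeric" case.
bloodmeal_profiles <- data.frame(
  SampleName = c("bm1", "bm1", "bm1", "bm1"),
  Marker = c("AMEL", "D3S1358", "D3S1358", "TH01"),
  Allele = c("X", "15", "16", "9.3"),
  Height = c("1200", "850", "90", "400"),
  stringsAsFactors = FALSE
)

peak_thresh <- 200       # RFUs; peaks under this value are dropped
rm_markers <- c("AMEL")  # markers to exclude before comparing profiles

# Coerce Height to numeric before comparing against the threshold.
bloodmeal_profiles$Height <- as.numeric(bloodmeal_profiles$Height)

# Keep peaks at or above the threshold that are not on an excluded marker.
keep <- bloodmeal_profiles$Height >= peak_thresh &
  !(bloodmeal_profiles$Marker %in% rm_markers)
filtered <- bloodmeal_profiles[keep, ]
filtered
```

In this toy table the AMEL row and the 90 RFU peak are dropped, leaving one allele each at D3S1358 and TH01.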

Value

Dataframe with four columns:

  • bloodmeal_id: bloodmeal ID

  • human_id: human ID of match (or NA)

  • match: whether or not a match was identified (yes or no)

  • similarity: similarity value if a match was found

If return_similarities = TRUE, then a named list of length 4 is returned:

  • matches: the dataframe described above

  • max_hu_hu_similarity: maximum human-human similarity (the threshold used for matching)

  • hu_hu_similarities: all human-human similarity values

  • bm_hu_similarities: all bloodmeal-human similarities for profiles that have identical alleles at one or more markers

Examples

match_similarity(bloodmeal_profiles, human_profiles)
#> Calculating human-human similarities
#> Maximum similarity between people: 0.117647058823529
#> Calculating bloodmeal-human similarities
#> Identifying matches
#> # A tibble: 4 × 4
#>   bloodmeal_id human_id match similarity
#>   <chr>        <chr>    <chr>      <dbl>
#> 1 evid1        P1       yes       0.294 
#> 2 evid3        NA       no        0.0588
#> 3 evid2        NA       no       NA     
#> 4 evid4        NA       no       NA
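
A short sketch of working with the returned table follows. To stay self-contained, it rebuilds the output shown above by hand as a plain data frame rather than calling the package. Reading an NA similarity as "no bloodmeal-human similarity was computed for this bloodmeal" is an assumption based on the description of the returned values.

```r
# Hand-built copy of the example output above (values as printed).
matches <- data.frame(
  bloodmeal_id = c("evid1", "evid3", "evid2", "evid4"),
  human_id = c("P1", NA, NA, NA),
  match = c("yes", "no", "no", "no"),
  similarity = c(0.294, 0.0588, NA, NA),
  stringsAsFactors = FALSE
)

# Bloodmeals with an identified human match.
matched <- matches[matches$match == "yes", c("bloodmeal_id", "human_id")]

# Bloodmeals with no similarity value at all (assumed: nothing to compare).
no_similarity <- matches$bloodmeal_id[is.na(matches$similarity)]

matched
no_similarity
```

When rm_twins = FALSE, a bloodmeal can appear on more than one row, so counting rows per bloodmeal_id (for example with table(matches$bloodmeal_id)) is a quick way to spot matches to twins.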