Identify similarity matches between STR profiles from bloodmeals and a human database
Source: R/match_similarity.R
match_similarity.Rd
Match STR profiles between bloodmeals and humans, using the similarity of the
most similar human-human pair as the threshold. Twins are not included when
computing the threshold. The bloodmeal peak height threshold is optional here
because it is only used for filtering. If rm_twins = FALSE, a match to a twin
will result in multiple rows being returned for that bloodmeal.
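The thresholding rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the similarity measure (the fraction of markers at which two profiles carry identical allele sets), the strict "greater than" comparison, the helper `profile_similarity()`, and all toy data are hypothetical.

```r
# Hypothetical helper: fraction of markers at which two profiles carry
# exactly the same set of alleles (an ASSUMED similarity measure).
profile_similarity <- function(p1, p2) {
  markers <- union(p1$Marker, p2$Marker)
  same <- vapply(markers, function(m) {
    a1 <- sort(unique(p1$Allele[p1$Marker == m]))
    a2 <- sort(unique(p2$Allele[p2$Marker == m]))
    length(a1) > 0 && identical(a1, a2)
  }, logical(1))
  mean(same)
}

# Toy human database: three people, two markers (made-up values).
humans <- data.frame(
  SampleName = rep(c("H1", "H2", "H3"), each = 4),
  Marker = rep(c("M1", "M1", "M2", "M2"), times = 3),
  Allele = c("10", "11", "7", "9",    # H1
             "10", "11", "8", "8",    # H2 (homozygous at M2)
             "12", "13", "6", "7"),   # H3
  stringsAsFactors = FALSE
)

# Toy bloodmeal profile; Height is not needed for the comparison itself.
bloodmeal <- data.frame(
  SampleName = "B1",
  Marker = c("M1", "M1", "M2", "M2"),
  Allele = c("10", "11", "7", "9"),
  stringsAsFactors = FALSE
)

hu <- split(humans, humans$SampleName)

# Threshold: similarity of the most similar human-human pair.
pairs <- combn(names(hu), 2)
hu_hu <- apply(pairs, 2, function(ids) {
  profile_similarity(hu[[ids[1]]], hu[[ids[2]]])
})
threshold <- max(hu_hu)

# A bloodmeal matches any human whose similarity exceeds the threshold
# (strict inequality is an assumption).
bm_hu <- vapply(hu, function(h) profile_similarity(bloodmeal, h), numeric(1))
matches <- names(bm_hu)[bm_hu > threshold]
```

In this toy data H1 and H2 share one of two markers, so the threshold is 0.5, and only H1 (identical at both markers) exceeds it.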
Usage
match_similarity(
bloodmeal_profiles,
human_profiles,
bloodmeal_ids = NULL,
human_ids = NULL,
peak_thresh = NULL,
rm_twins = TRUE,
rm_markers = NULL,
return_similarities = FALSE
)
Arguments
- bloodmeal_profiles
Tibble or data frame with alleles for all bloodmeals in reference database including four columns: SampleName, Marker, Allele, Height. Height must be numeric or coercible to numeric.
- human_profiles
Tibble or data frame with alleles for all humans in reference database including three columns: SampleName, Marker, Allele.
- bloodmeal_ids
Vector of bloodmeal ids from the SampleName column in bloodmeal_profiles for which to compute log10_lrs. If NULL, all ids in the input dataframe will be used. Default: NULL
- human_ids
Vector of human ids from the SampleName column in human_profiles for which to compute log10_lrs. If NULL, all ids in the input dataframe will be used. Default: NULL
- peak_thresh
Allele peak height threshold in RFUs. All peaks under this threshold will be filtered out. If prior filtering was performed, this number should be equal to or greater than that number. Also used for threshT argument in euroformix::contLikSearch().
- rm_twins
A boolean indicating whether or not to remove likely twins (identical STR profiles) from the human database prior to identifying matches. Default: TRUE
- rm_markers
A vector indicating what markers should be removed prior to calculating log10LRs. NULL to include all markers. By default, for the bistro function AMEL is removed as it is not standard to include it in LR calculations.
- return_similarities
A boolean indicating whether or not to return human-human and bloodmeal-human similarities. Default: FALSE
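The filtering that peak_thresh and rm_markers describe can be illustrated with a short base R sketch. This is an assumption about the general idea, not the package's code: the toy profile and its values are made up, and exact string matching on marker names is assumed.

```r
# Toy bloodmeal profile; all values are made up for illustration.
bloodmeal_toy <- data.frame(
  SampleName = "bm1",
  Marker = c("AMEL", "AMEL", "D3S1358", "D3S1358", "TH01"),
  Allele = c("X", "Y", "15", "16", "9.3"),
  Height = c("900", "850", "790", "120", "1200"),  # coercible to numeric
  stringsAsFactors = FALSE
)

peak_thresh <- 200    # RFUs; peaks under this value are dropped
rm_markers <- "AMEL"  # markers to exclude before comparing profiles

# Keep peaks at or above the threshold that are not in excluded markers.
heights <- as.numeric(bloodmeal_toy$Height)
keep <- heights >= peak_thresh & !(bloodmeal_toy$Marker %in% rm_markers)
filtered <- bloodmeal_toy[keep, ]
```

Here the two AMEL rows and the 120 RFU peak are removed, leaving one allele each at D3S1358 and TH01. With the function itself, the equivalent request would be passed as `peak_thresh = 200, rm_markers = "AMEL"`.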
Value
Dataframe with four columns:
- bloodmeal_id: bloodmeal ID
- human_id: human ID of match (or NA)
- match: whether or not a match was identified (yes or no)
- similarity: similarity value if a match was found
If return_similarities = TRUE, then a named list of length 4 is returned:
- matches: the dataframe described above
- max_hu_hu_similarity: maximum human-human similarity (the threshold used for matching)
- hu_hu_similarities: all human-human similarity values
- bm_hu_similarities: all bloodmeal-human similarities for profiles that have identical alleles at at least one marker
Examples
match_similarity(bloodmeal_profiles, human_profiles)
#> Calculating human-human similarities
#> Maximum similarity between people: 0.117647058823529
#> Calculating bloodmeal-human similarities
#> Identifying matches
#> # A tibble: 4 × 4
#>   bloodmeal_id human_id match similarity
#>   <chr>        <chr>    <chr>      <dbl>
#> 1 evid1        P1       yes       0.294
#> 2 evid3        NA       no        0.0588
#> 3 evid2        NA       no       NA
#> 4 evid4        NA       no       NA
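The returned table can be post-processed like any data frame. The sketch below re-creates the printed result by hand as a base R data frame (so it runs without the package) and keeps only the matched bloodmeals; the object names `res`, `matched`, and `above` are illustrative, and the threshold is the value printed in the log above.

```r
# Hand-copied version of the printed result (rounded as displayed).
res <- data.frame(
  bloodmeal_id = c("evid1", "evid3", "evid2", "evid4"),
  human_id = c("P1", NA, NA, NA),
  match = c("yes", "no", "no", "no"),
  similarity = c(0.294, 0.0588, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only bloodmeals for which a match was identified.
matched <- res[res$match == "yes", ]

# Maximum human-human similarity reported in the log.
max_hu_hu <- 0.117647058823529

# In this output, the rows flagged "yes" are the ones whose similarity
# exceeds the reported threshold.
above <- !is.na(res$similarity) & res$similarity > max_hu_hu
```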