Package ‘hdImpute’

August 7, 2023

Type Package
Title A Batch Process for High Dimensional Imputation
Version 0.2.1
BugReports /pdwaggoner/hdImpute/issues
Maintainer Philip Waggoner <*************************>
Description A correlation-based batch process for fast, accurate imputation for
    high dimensional missing data problems via chained random forests.
    See Waggoner (2023) for more on 'hdImpute',
    Stekhoven and Bühlmann (2012) for more on 'missForest',
    and Mayer (2022) for more on 'missRanger'.
License MIT + file LICENSE
Encoding UTF-8
Imports missRanger, plyr, purrr, magrittr, tibble, dplyr, tidyselect,
    tidyr, cli
Suggests testthat (>= 3.0.0), knitr, rmarkdown, usethis, missForest,
    tidyverse
VignetteBuilder knitr
RoxygenNote 7.2.3
Config/testthat/edition 3
URL /pdwaggoner/hdImpute
NeedsCompilation no
Author Philip Waggoner [aut, cre]
Repository CRAN
Date/Publication 2023-08-07 21:20:02 UTC

R topics documented:

    check_feature_na
    check_row_na
    feature_cor
    flatten_mat
    hdImpute
    impute_batches
    mad
    Index

check_feature_na    Find features with (specified amount of) missingness

Description

Find features with (specified amount of) missingness

Usage

check_feature_na(data, threshold)

Arguments

data         A data frame or tibble.
threshold    Missingness threshold in a given column/feature as a proportion
             bounded between 0 and 1. Default set to sensitive level at 1e-04.

Value

A vector of column/feature names that contain missingness greater than threshold.

Examples

## Not run:
check_feature_na(data = any_data_frame, threshold = 1e-04)
## End(Not run)
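As a rough illustration of the rule described above, the following base R sketch flags columns whose proportion of missing values exceeds a threshold. It is not the package's implementation: `feature_na_sketch` and the toy data frame are hypothetical names introduced only for this example.

```r
# Hypothetical helper (not part of hdImpute): return the names of columns
# whose proportion of NA values is greater than `threshold`.
feature_na_sketch <- function(data, threshold = 1e-04) {
  stopifnot(is.data.frame(data), threshold >= 0, threshold <= 1)
  na_share <- colMeans(is.na(data))  # proportion of NAs per column
  names(na_share)[na_share > threshold]
}

# Toy data: `a` is 25% missing, `b` is complete, `c` is 50% missing.
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", "z", "w"),
  c = c(NA, NA, 1, 2)
)

flagged_default <- feature_na_sketch(toy)                # sensitive default
flagged_strict  <- feature_na_sketch(toy, threshold = 0.3)
```

With the sensitive default, any column containing at least one missing value in this toy data is flagged; raising the threshold keeps only the heavily incomplete column.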

check_row_na    Find number of and which rows contain any missingness

Description

Find number of and which rows contain any missingness

Usage

check_row_na(data, which)

Arguments

data     A data frame or tibble.
which    Logical. Should a list be returned with the row numbers corresponding
         to each row with missingness? Default set to FALSE.

Value

Either an integer value corresponding to the number of rows in data with any missingness (if which = FALSE), or a tibble containing: 1) number of rows in data with any missingness, and 2) a list of which rows/row numbers contain missingness (if which = TRUE).

Examples

## Not run:
check_row_na(data = any_data_frame, which = FALSE)
## End(Not run)
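The two return modes described above can be mimicked in base R. This is only a sketch under stated assumptions: `row_na_sketch` is a hypothetical name, and it returns a plain list where the package is documented to return a tibble.

```r
# Hypothetical helper (not part of hdImpute): count rows with any NA and,
# optionally, report which row numbers they are.
row_na_sketch <- function(data, which = FALSE) {
  stopifnot(is.data.frame(data))
  # complete.cases() is TRUE for rows without any NA; negate to find the rest.
  rows <- base::which(!stats::complete.cases(data))
  if (which) {
    list(n_rows = length(rows), rows = rows)
  } else {
    length(rows)
  }
}

# Toy data: rows 1 and 2 each contain at least one NA.
toy <- data.frame(
  a = c(1, NA, 3, 4),
  c = c(NA, NA, 1, 2)
)

n_missing   <- row_na_sketch(toy)                # count only
row_details <- row_na_sketch(toy, which = TRUE)  # count plus row numbers
```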

feature_cor    High dimensional imputation via batch processed chained random
               forests. Build correlation matrix

Description

High dimensional imputation via batch processed chained random forests. Build correlation matrix

Usage

feature_cor(data, return_cor)

Arguments

data          A data frame or tibble.
return_cor    Logical. Should the correlation matrix be printed? Default set
              to FALSE.

Value

A cross-feature correlation matrix

References

Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics. doi: <10.1007/s00180-023-01325-9>

van Buuren S, Groothuis-Oudshoorn K (2011). "mice: Multivariate Imputation by Chained Equations in R." Journal of Statistical Software, 45(3), 1-67. doi: <10.18637/jss.v045.i03>

Examples

## Not run:
feature_cor(data = data, return_cor = FALSE)
## End(Not run)

flatten_mat    Flatten and arrange cor matrix to be df

Description

Flatten and arrange cor matrix to be df

Usage

flatten_mat(cor_mat, return_mat)

Arguments

cor_mat       A correlation matrix output from running feature_cor()
return_mat    Logical. Should the flattened matrix be printed? Default set
              to FALSE.

Value

A vector of correlation-based ranked features

Examples

## Not run:
flatten_mat(cor_mat = cor_mat, return_mat = FALSE)
## End(Not run)
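To make the correlation and flattening steps more concrete, here is a base R sketch of one plausible reading: build a pairwise correlation matrix, flatten its upper triangle to one row per feature pair, sort by absolute correlation, and read off the features in order of first appearance. The ranking rule, the use of `pairwise.complete.obs`, and all object names are assumptions for illustration, not the internals of `feature_cor()` or `flatten_mat()`.

```r
# Toy numeric data with one missing value in `x`.
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, 8, 10),
  z = c(5, 3, 4, 1, 2)
)

# Cross-feature correlation matrix, using all pairwise-complete rows.
cor_mat <- stats::cor(toy, use = "pairwise.complete.obs")

# Flatten the upper triangle: one row per unordered feature pair.
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
flat <- data.frame(
  row = rownames(cor_mat)[idx[, "row"]],
  col = colnames(cor_mat)[idx[, "col"]],
  cor = cor_mat[idx]
)

# Sort pairs from strongest to weakest absolute correlation.
flat <- flat[order(-abs(flat$cor)), ]

# Ranked features: order in which each feature first appears in the sorted pairs.
ranked <- unique(as.vector(t(as.matrix(flat[, c("row", "col")]))))
```

Sorting on the absolute value treats strong negative and strong positive correlations alike, which is one reasonable choice when the goal is to place related features in the same batch.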

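hdImpute    Complete hdImpute process: correlation matrix, flatten, rank,
            create batches, impute, join

Description

Complete hdImpute process: correlation matrix, flatten, rank, create batches, impute, join

Usage

hdImpute(data, batch, pmm_k, n_trees, seed, save)

Arguments

data       Original data frame or tibble (with missing values)
batch      Numeric. Batch size.
pmm_k      Integer. Number of neighbors considered in imputation. Default
           set at 5.
n_trees    Integer. Number of trees used in imputation. Default set at 15.
seed       Integer. Seed to be set for reproducibility.
save       Should the list of individual imputed batches be saved as .rdata
           file to working directory? Default set to FALSE.

Details

Step 1. Group data by dividing the row_number() by batch size (batch, number of batches set by user) using integer division.
Step 2. Pass through group_split() to return a list.
Step 3. Impute each batch individually and time.
Step 4. Generate completed (unlisted/joined) imputed data frame.

Value

A completed, imputed data set

References

Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics. doi: <10.1007/s00180-023-01325-9>

Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: <10.1093/bioinformatics/btr597>

Examples

## Not run:
hdImpute(data = data,
         batch = 2, pmm_k = 5, n_trees = 15,
         seed = 123, save = FALSE)
## End(Not run)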
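The grouping step in the Details can be sketched in base R without dplyr. The example below assumes that `batch` is a batch size (features per batch) and uses `split()` in place of `group_split()`; the feature names are hypothetical, and the exact arithmetic inside the package may differ.

```r
# Hypothetical ranked feature vector, e.g. as produced by a ranking step.
ranked <- paste0("v", 1:7)
batch  <- 3  # assumed here to mean "features per batch"

# Integer division of the zero-based position assigns consecutive
# features to the same group: 0, 0, 0, 1, 1, 1, 2.
group_id <- (seq_along(ranked) - 1) %/% batch

# split() plays the role of group_split(): one list element per batch.
batches <- unname(split(ranked, group_id))
```

Each element of `batches` would then be imputed on its own, and the completed batches joined back into a single data frame. When the number of features is not a multiple of `batch`, the last batch is simply shorter.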

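impute_batches    Impute batches and return completed data frame

Description

Impute batches and return completed data frame

Usage

impute_batches(data, features, batch, pmm_k, n_trees, seed, save)

Arguments

data        Original data frame or tibble (with missing values)
features    Correlation-based vector of ranked features output from running
            flatten_mat()
batch       Numeric. Batch size.
pmm_k       Integer. Number of neighbors considered in imputation. Default
            set at 5.
n_trees     Integer. Number of trees used in imputation. Default set at 15.
seed        Integer. Seed to be set for reproducibility.
save        Should the list of individual imputed batches be saved as .rdata
            file to working directory? Default set to FALSE.

Details

Step 1. Group data by dividing the row_number() by batch size (batch, number of batches set by user) using integer division.
Step 2. Pass through group_split() to return a list.
Step 3. Impute each batch individually and time.
Step 4. Generate completed (unlisted/joined) imputed data frame.

Value

A completed, imputed data set

References

Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics. doi: <10.1007/s00180-023-01325-9>

Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: <10.1093/bioinformatics/btr597>

Examples

## Not run:
impute_batches(data = data, features = flat_mat,
               batch = 2, pmm_k = 5, n_trees = 15, seed = 123,
               save = FALSE)
## End(Not run)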

mad    Compute variable-wise mean absolute differences (MAD) between
       original and imputed dataframes.

Description

Compute variable-wise mean absolute differences (MAD) between original and imputed dataframes.

Usage

mad(original, imputed, round)

Arguments

original    A data frame or tibble with original values.
imputed     A data frame or tibble that has been imputed/completed.
round       Integer. Number of places to round MAD scores. Default set to 3.

Value

`mad_scores` as a `p` x 2 tibble, with one row for each variable in original, from 1 to `p`. Two columns: first is variable names (`var`) and second is associated MAD score (`mad`) as percentages for each variable.

Examples

## Not run:
mad(original = original_data, imputed = imputed_data, round = 3)
## End(Not run)
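One way to read "variable-wise mean absolute difference as a percentage" is to compare, for each variable, the share of observations taking each value before and after imputation. The base R sketch below implements that reading only. It is an assumption about the metric, not the package's code: `mad_sketch` and the `digits` argument are hypothetical names, and the result is a plain data frame rather than a tibble.

```r
# Hypothetical helper (not part of hdImpute): for each variable, average the
# absolute differences between value proportions in the original (NAs ignored)
# and in the imputed data, expressed in percent.
mad_sketch <- function(original, imputed, digits = 3) {
  stopifnot(identical(names(original), names(imputed)))
  scores <- vapply(names(original), function(v) {
    # The imputed column is complete, so its values cover every observed level.
    lv <- sort(unique(imputed[[v]]))
    p_orig <- prop.table(table(factor(original[[v]], levels = lv)))
    p_imp  <- prop.table(table(factor(imputed[[v]], levels = lv)))
    mean(abs(p_orig - p_imp)) * 100
  }, numeric(1))
  data.frame(var = names(original), mad = round(unname(scores), digits))
}

# Toy data: one missing value per variable, filled in by hand.
original_data <- data.frame(g = c("a", "a", "b", NA), h = c(1, 2, NA, 2))
imputed_data  <- data.frame(g = c("a", "a", "b", "b"), h = c(1, 2, 2, 2))

mad_scores <- mad_sketch(original_data, imputed_data)
```

Under this reading, a score near zero means the imputed values left the distribution of that variable essentially unchanged.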

Index

check_feature_na
check_row_na
feature_cor
flatten_mat
hdImpute
impute_batches
mad

