Estimating the Support of a High-Dimensional Distribution

Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, Robert C. Williamson

Microsoft Research Ltd, 1 Guildhall Street, Cambridge CB2 3NH, UK
Microsoft Research, 1 Microsoft Way, Redmond, WA, USA
Royal Holloway, University of London, Egham, UK
Department of Engineering, Australian National University, Canberra 0200, Australia

27 November 1999

Technical Report MSR-TR-99-87
Microsoft Research, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052

Abstract

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.

Keywords: Support Vector Machines, Kernel Methods, Density Estimation, Unsupervised Learning, Novelty Detection

1 Introduction

During recent years, a new set of kernel techniques for supervised learning has been developed (Vapnik, 1995; Schölkopf et al., 1999a). Specifically, support vector (SV) algorithms for pattern recognition and regression estimation have received considerable attention. There have been a few attempts to transfer the idea of using kernels to other domains of learning. The problems in that domain are, however, less precisely specified. Generally, they can be characterized as estimating functions of the data which tell you something interesting about the underlying distribution. For instance, kernel PCA can be characterized as computing functions which on the training data produce unit variance outputs while having minimum norm in feature space (Schölkopf et al., 1999b). Another kernel-based unsupervised learning technique, regularized principal manifolds (Smola et al., 1999), computes functions which give a mapping onto a lower-dimensional manifold minimizing a regularized quantization error. Clustering algorithms are further examples of unsupervised learning techniques which can be kernelized (Schölkopf et al., 1999b). An extreme point of view is that unsupervised learning is about estimating densities. Clearly, knowledge of the density of P would then allow us to solve whatever problem can be solved on the basis of the data.

The present work addresses an easier problem: it proposes an algorithm which computes a binary function which is supposed to capture regions in input space where the probability density lives (its support), i.e. a function such that most of the data will live in the region where the function is nonzero (Schölkopf et al., 1999). In doing so, it is in line with Vapnik's principle never to solve a problem which is more general than the one we actually need to solve. Moreover, it is applicable also in cases where the density of the data's distribution is not even well-defined. After a review of some previous work in Sec. 2, we propose SV algorithms for the above problem. Sec. 4 gives details on the implementation of the optimization procedure, followed by theoretical results characterizing the present approach. In Sec. 6, we apply the algorithm to artificial as well as real-world data. We conclude with a discussion.

2 Previous Work

Part of the motivation for the present work was a paper of Ben-David and Lindenbaum (1997). It turns out that there is a considerable amount of prior work in the statistical literature, and in this section we briefly review it. We do not attempt a detailed comparison of the proof techniques or of the specific results achieved, but confine ourselves to a summary of the methods. In order to summarize the methods, it is convenient to introduce the following definition of a (multi-dimensional) quantile function (introduced by Einmahl and Mason (1992)). Let C be a class of measurable subsets of X and let λ be a real-valued function defined on C. The quantile function with respect to (P, λ, C) is

U(α) = inf{λ(C) : P(C) ≥ α, C ∈ C},  0 < α ≤ 1.

If P_ℓ is the empirical distribution (P_ℓ(C) = (1/ℓ) Σ_i 1(x_i ∈ C)), the empirical quantile function is obtained by replacing P with P_ℓ. We denote by C(α) and C_ℓ(α) the (not necessarily unique) C ∈ C that attains the infimum (when it is achievable). The most common choice of λ is Lebesgue measure, in which case C(α) is the minimum volume set C ∈ C containing at least a fraction α of the probability mass. Estimators of the form C_ℓ(α) are called minimum volume estimators.
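In the simplest one-dimensional setting, the empirical quantile function can be computed exactly. The following sketch is not from the paper; it takes C to be the class of closed intervals on the real line and λ to be interval length, and finds the shortest interval containing at least a fraction α of the sample:

```python
import math

def empirical_quantile_interval(xs, alpha):
    """Empirical quantile function for C = closed intervals on R and
    lambda = interval length: the shortest interval containing at least
    a fraction alpha of the sample."""
    xs = sorted(xs)
    l = len(xs)
    kk = math.ceil(alpha * l)             # number of points the interval must cover
    best = None
    for i in range(l - kk + 1):           # candidate interval xs[i] .. xs[i+kk-1]
        length = xs[i + kk - 1] - xs[i]
        if best is None or length < best[0]:
            best = (length, xs[i], xs[i + kk - 1])
    return best                            # (lambda(C), left end, right end)

# The shortest interval covering 60% (3 of 5) of the points below is [4, 6].
print(empirical_quantile_interval([0, 4, 5, 6, 10], 0.6))
```

Note that such an exhaustive search is only feasible for very simple classes C; the algorithm proposed in this paper works with an implicitly defined class in a kernel feature space instead.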

Observe that for C being all Borel measurable sets, C(1) is the support of the density p corresponding to P, assuming it exists. (Note that C(1) is well defined even when p does not exist.) For smaller classes C, C(1) is the minimum volume C ∈ C containing the support of p.

The problem of estimating C(1) appears to have first been studied by Geffroy (1964). There have been a number of works studying a natural nonparametric estimator of C(1) (e.g. Chevalier (1976); Devroye and Wise (1980); Cuevas (1990); see (Gayraud, 1997) for further references). The nonparametric estimator is simply

Ĉ_ℓ = ∪_{i=1}^{ℓ} B(x_i, ε_ℓ)   (1)

where B(x, ε) is the ball of radius ε centered at x and ε_ℓ is an appropriately chosen decreasing sequence. Devroye and Wise (1980) showed the asymptotic consistency of (1) with respect to the symmetric difference between Ĉ_ℓ and C(1). Cuevas (1990) did the same, and Cuevas and Fraiman (1997) studied the asymptotic consistency of a plug-in estimator of C(1). The most recent work relating to the estimation of C(1) is by Gayraud (1997), who has made an asymptotic minimax study of estimators of functionals of C(1). Two examples are vol C(1) or the center of C(1). (See also (Korostelev and Tsybakov, 1993, Chapter 8).)

Estimating high probability regions (α ≠ 1). Turning to the case α < 1, it seems the first work was reported by Sager (1979) and then Hartigan (1987), who considered X = R² with C being the class of closed convex sets in X. (They actually considered density contour clusters; see below for a definition.) Nolan (1991) considered ellipsoids in higher dimensions. Tsybakov (1997) has studied an estimator based on piecewise polynomial approximation of C(α) and has shown it attains the asymptotically minimax rate for certain classes of densities. Polonik (1997) has studied the estimation of C(α) by the empirical minimizer C_ℓ(α). He derived asymptotic rates of convergence in terms of various measures of richness of C. He considers both VC classes and classes with a log covering number with bracketing of polynomial order.

His paper summarizes a number of other previous results. Polonik (1995b) has also studied the use of the "excess mass approach" (Müller, 1992) to construct an estimator of "generalized α-clusters" related to C(α). Define the excess mass over C at level α as

E_C(α) = sup{H_α(C) : C ∈ C},

where H_α(C) = (P − αλ)(C) and λ again denotes Lebesgue measure. Replacing P by the empirical measure P_ℓ in these definitions yields their empirical counterparts E_{C,ℓ}(α) and H_{α,ℓ}(C). In other words, whilst the excess mass functional is clearly different from the quantile function U, it is related to it in that it attempts to find small regions with as much excess mass as possible (which is similar to finding small regions with a given amount of probability mass). Actually, E_C is more closely related to the determination of density contour clusters at level α,

C_α := {x : p(x) ≥ α}.

Simultaneously, and independently, Ben-David and Lindenbaum (1997) studied the problem of estimating C(α). They too made use of VC classes but stated their results in a stronger form which is meaningful for finite sample sizes.

Finally, we point out a curious connection between minimum volume sets of a distribution and its differential entropy. Let X be a random variable with density p on X. Define the differential entropy of X by

h(X) = −∫_X p(x) log₂ p(x) dx.

For ε > 0 and ℓ ∈ N, define the typical set A_ε^(ℓ) with respect to p by

A_ε^(ℓ) = {(x_1, ..., x_ℓ) ∈ X^ℓ : |−(1/ℓ) log₂ p(x_1, ..., x_ℓ) − h(X)| ≤ ε},

where p(x_1, ..., x_ℓ) = Π_{i=1}^{ℓ} p(x_i).

If (a_ℓ) and (b_ℓ) are sequences, the notation a_ℓ ≐ b_ℓ means lim_{ℓ→∞} (1/ℓ) log₂(a_ℓ/b_ℓ) = 0. Let vol S_δ^(ℓ) denote the smallest volume of a set in X^ℓ containing probability at least 1 − δ. Cover and Thomas (1991, p. 227) show that for all ε, δ < 1/2,

vol S_δ^(ℓ) ≐ vol A_ε^(ℓ) ≐ 2^{ℓ h(X)}.

They point out that this result "indicates that the volume of the smallest set that contains most of the probability is approximately 2^{ℓh}. This is an ℓ-dimensional volume, so the corresponding side length is 2^h. This provides an interpretation of differential entropy."

The applications that have been considered include problems in medical diagnosis (Tarassenko et al., 1995), marketing (Ben-David and Lindenbaum, 1997), condition monitoring of machines (Devroye and Wise, 1980), estimating manufacturing yields (Stoneking, 1999), econometrics and generalized nonlinear principal curves (Tsybakov, 1997; Korostelev and Tsybakov, 1993), regression and spectral analysis (Polonik, 1997), tests for multimodality and clustering (Polonik, 1995b) and others (Müller, 1992).

Polonik (1995a) has also suggested the use of such estimators for density estimation under qualitative shape assumptions. A point of doing this is that it allows one to encode a range of prior assumptions about the true density. He has shown asymptotic consistency and rates of convergence for densities belonging to VC classes.

The present paper describes an algorithm which finds regions close to C(α). Our class C is defined implicitly via a kernel k, and the region is estimated by thresholding a kernel expansion. We do not try to estimate a density and we do not try to minimize the volume of C in input space. On the other hand, our algorithm has tractable computational complexity, even in high-dimensional cases. Our theory, which uses very similar tools to those used by Polonik, gives results that we expect will be of more use in a finite sample size setting.

3 Algorithms

We first consider training data

x_1, ..., x_ℓ ∈ X   (2)

where ℓ ∈ N is the number of observations and X is some set. For simplicity, we think of it as a compact subset of R^N. Let Φ: X → F be a

feature map, i.e. a map into a dot product space F such that the dot product in the image of Φ can be computed by evaluating some simple kernel (Boser et al., 1992; Vapnik, 1995; Schölkopf et al., 1999a)

k(x, y) = (Φ(x) · Φ(y)),   (3)

such as the Gaussian kernel

k(x, y) = exp(−||x − y||² / c).   (4)

Indices i and j are understood to range over 1, ..., ℓ (in compact notation: i, j ∈ [ℓ]). Boldface greek letters denote ℓ-dimensional vectors whose components are labelled using normal face typeset.

In the remainder of this section, we shall develop an algorithm which returns a function f that takes the value +1 in a "small" region capturing most of the data points, and −1 elsewhere. Our strategy is to map the data into the feature space corresponding to the kernel, and to separate them from the origin with maximum margin. For a new point x, the value f(x) is determined by evaluating which side of the hyperplane it falls on, in feature space. Via the freedom to utilize different types of kernel functions, this simple geometric picture corresponds to a variety of nonlinear estimators in input space.

To separate the data set from the origin, we solve the following quadratic program:

min_{w ∈ F, ξ ∈ R^ℓ, ρ ∈ R}  (1/2)||w||² + (1/(νℓ)) Σ_i ξ_i − ρ   (5)

subject to  (w · Φ(x_i)) ≥ ρ − ξ_i,  ξ_i ≥ 0.   (6)

Here, ν ∈ (0, 1) is a parameter whose meaning will become clear later. Since nonzero slack variables ξ_i are penalized in the objective function, we can expect that if w and ρ solve this problem, then the decision function

f(x) = sgn((w · Φ(x)) − ρ)   (7)

will be positive for most examples x_i contained in the training set, while the SV-type regularization term ||w|| will still be small. The actual trade-off between these two goals is controlled by ν. Using multipliers α_i, β_i ≥ 0, we introduce a Lagrangian

L(w, ξ, ρ, α, β) = (1/2)||w||² + (1/(νℓ)) Σ_i ξ_i − ρ − Σ_i α_i ((w · Φ(x_i)) − ρ + ξ_i) − Σ_i β_i ξ_i   (8)
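The Gaussian kernel (4) is straightforward to sketch in code. The following illustration (plain Python, not from the paper; the width parameter is named c as in the text) also checks two of its properties used later in Sec. 7: k(x, x) = 1 and positivity of all kernel values:

```python
import math

def gaussian_kernel(x, y, c):
    """k(x, y) = exp(-||x - y||^2 / c), Eq. (4)."""
    return math.exp(-sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / c)

def kernel_matrix(X, c):
    """Gram matrix K_ij = k(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, c) for xj in X] for xi in X]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = kernel_matrix(X, c=2.0)
print(all(abs(K[i][i] - 1.0) < 1e-12 for i in range(3)))     # k(x, x) = 1: unit length in F
print(all(K[i][j] > 0 for i in range(3) for j in range(3)))  # all dot products positive
```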

and set the derivatives with respect to the primal variables w, ξ, ρ equal to zero, yielding

w = Σ_i α_i Φ(x_i),   (9)

α_i = 1/(νℓ) − β_i ≤ 1/(νℓ),  Σ_i α_i = 1.   (10)

In (9), all patterns {x_i : α_i > 0} are called Support Vectors (SVs). Together with (3), the SV expansion transforms the decision function (7) into a kernel expansion

f(x) = sgn(Σ_i α_i k(x_i, x) − ρ).   (11)

Substituting (9)-(10) into (8), and using (3), we obtain the dual problem:

min_α  (1/2) Σ_{ij} α_i α_j k(x_i, x_j)  subject to  0 ≤ α_i ≤ 1/(νℓ),  Σ_i α_i = 1.   (12)

One can show that at the optimum, the two inequality constraints (6) become equalities if α_i and β_i are nonzero, i.e. if 0 < α_i < 1/(νℓ). Therefore, we can recover ρ by exploiting that for any such α_i, the corresponding pattern x_i satisfies

ρ = (w · Φ(x_i)) = Σ_j α_j k(x_j, x_i).   (13)

Note that if ν approaches 0, the upper boundaries on the Lagrange multipliers tend to infinity, i.e. the first inequality constraint in (12) becomes void. The problem then resembles the corresponding hard margin algorithm, since the penalization of errors becomes infinite, as can be seen from the primal objective function (5). It is still a feasible problem, since we have placed no restriction on ρ, so ρ can become a large negative number in order to satisfy (6). If we had required ρ ≥ 0 from the start, we would have ended up with the constraint Σ_i α_i ≥ 1 instead of the corresponding equality constraint in (12).

To conclude this section, we note that one can also use balls to describe the data in feature space, close in spirit to the algorithms of Schölkopf et al. (1995), with hard boundaries, and Tax and Duin (1999), with "soft margins." Again, we try to put most of the data into a small ball by solving, for ν ∈ (0, 1),

min_{R ∈ R, ξ ∈ R^ℓ, c ∈ F}  R² + (1/(νℓ)) Σ_i ξ_i

subject to  ||Φ(x_i) − c||² ≤ R² + ξ_i,  ξ_i ≥ 0  for i ∈ [ℓ].   (14)
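Given any feasible coefficients α for the dual (12), the decision function (11) and the recovery of ρ via (13) can be sketched as follows. The coefficients below are made up for illustration: they satisfy the constraints of (12) for ν = 0.5 but are not the solution of any particular training problem.

```python
import math

def k(x, y, c=2.0):
    """Gaussian kernel, Eq. (4)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / c)

def recover_rho(alphas, X, nu):
    """Eq. (13): rho = sum_j alpha_j k(x_j, x_i) for any i with 0 < alpha_i < 1/(nu*l)."""
    upper = 1.0 / (nu * len(X))
    for i, a in enumerate(alphas):
        if 1e-9 < a < upper - 1e-9:        # non-bound SV: sits exactly on the hyperplane
            return sum(aj * k(xj, X[i]) for aj, xj in zip(alphas, X))
    raise ValueError("no non-bound support vector found")

def decision(x, alphas, X, rho):
    """Eq. (11): f(x) = sgn(sum_i alpha_i k(x_i, x) - rho)."""
    val = sum(a * k(xi, x) for a, xi in zip(alphas, X)) - rho
    return 1 if val >= 0 else -1

X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.2), (5.0, 5.0)]
alphas = [0.5, 0.3, 0.2, 0.0]   # feasible for nu = 0.5: 0 <= alpha_i <= 0.5, sum = 1
rho = recover_rho(alphas, X, nu=0.5)
print(decision(X[1], alphas, X, rho))        # a non-bound SV lies on the boundary: +1
print(decision((9.0, 9.0), alphas, X, rho))  # far from all data: -1
```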

This leads to the dual

min_α  Σ_{ij} α_i α_j k(x_i, x_j) − Σ_i α_i k(x_i, x_i)   (15)

subject to  0 ≤ α_i ≤ 1/(νℓ)   (16)

and  Σ_i α_i = 1,   (17)

and the solution c = Σ_i α_i Φ(x_i), corresponding to a decision function of the form

f(x) = sgn(R² − Σ_{ij} α_i α_j k(x_i, x_j) + 2 Σ_i α_i k(x_i, x) − k(x, x)).   (18)

Similar to the above, R² is recovered such that for any x_i with 0 < α_i < 1/(νℓ), the argument of the sgn is zero. For kernels k(x, y) which only depend on x − y, k(x, x) is constant. In this case, the equality constraint implies that the linear term in the dual target function is constant, and the problem (15)-(16) turns out to be equivalent to (12). It can be shown that the same holds true for the decision function, hence the two algorithms coincide in that case.

4 Optimization

The last section has formulated quadratic programs (QPs) which can in principle be solved using standard packages for constrained optimization problems. They do, however, possess features that set them apart from generic QPs, most notably the simplicity of the constraints. In the present section, we describe an algorithm which takes advantage of these features and empirically scales better to large data set sizes than a standard QP solver with time complexity of order O(ℓ³) (cf. Platt, 1999). The algorithm is a modified version of SMO (Sequential Minimal Optimization), an SV training algorithm originally proposed for classification (Platt, 1999), and subsequently adapted to regression estimation (Smola and Schölkopf, 1998).

The strategy of SMO is to break up the constrained minimization of (12) into the smallest optimization steps possible. Due to the constraint on the sum of the dual variables, it is impossible to modify individual variables separately without possibly rendering the constraint unsatisfiable. We therefore resort to optimizing over pairs of variables.
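The claimed equivalence of the ball and hyperplane formulations for constant k(x, x) can be checked numerically: with the same α, and R² fixed by requiring the argument of (18) to vanish at a non-bound SV, the argument of (18) is exactly twice the argument of (11). A sketch (Gaussian kernel, made-up feasible coefficients; not from the paper):

```python
import math, random

def k(x, y, c=2.0):
    """Gaussian kernel, Eq. (4); k(x, x) = 1 for all x."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / c)

X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.2), (5.0, 5.0)]
alphas = [0.5, 0.3, 0.2, 0.0]   # feasible for nu = 0.5 (box [0, 0.5], sum = 1)
sv = 1                          # index of a non-bound SV

rho = sum(a * k(xi, X[sv]) for a, xi in zip(alphas, X))        # Eq. (13)
quad = sum(ai * aj * k(xi, xj) for ai, xi in zip(alphas, X)
                               for aj, xj in zip(alphas, X))
# R^2 chosen so that the argument of (18) vanishes at the same non-bound SV
R2 = quad + k(X[sv], X[sv]) - 2 * rho

random.seed(0)
for _ in range(100):
    x = (random.uniform(-2, 8), random.uniform(-2, 8))
    plane = sum(a * k(xi, x) for a, xi in zip(alphas, X)) - rho   # argument of (11)
    ball = (R2 - quad
            + 2 * sum(a * k(xi, x) for a, xi in zip(alphas, X))
            - k(x, x))                                            # argument of (18)
    assert abs(ball - 2 * plane) < 1e-12  # same sign everywhere, since k(x, x) = 1
print("ball and hyperplane decisions coincide")
```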

Elementary optimization step. For instance, consider optimizing over α_1 and α_2 with all other variables fixed. Using the shorthand K_{ij} := k(x_i, x_j), (12) then reduces to

min_{α_1, α_2}  (1/2) Σ_{i,j=1}^{2} α_i α_j K_{ij} + Σ_{i=1}^{2} α_i C_i + C,   (19)

with C_i := Σ_{j=3}^{ℓ} α_j K_{ij} and C := Σ_{i,j=3}^{ℓ} α_i α_j K_{ij}, subject to

0 ≤ α_1, α_2 ≤ 1/(νℓ),  Σ_{i=1}^{2} α_i = Δ,   (20)

where Δ := 1 − Σ_{i=3}^{ℓ} α_i. We discard C, which is independent of α_1 and α_2, and eliminate α_1 to obtain

min_{α_2}  (1/2)(Δ − α_2)² K_{11} + (Δ − α_2) α_2 K_{12} + (1/2) α_2² K_{22} + (Δ − α_2) C_1 + α_2 C_2,   (21)

with the derivative

−(Δ − α_2) K_{11} + (Δ − 2α_2) K_{12} + α_2 K_{22} − C_1 + C_2.   (22)

Setting this to zero and solving for α_2, we get

α_2 = (Δ(K_{11} − K_{12}) + C_1 − C_2) / (K_{11} − 2K_{12} + K_{22}).   (23)

Once α_2 is found, α_1 can be recovered from α_1 = Δ − α_2. If the new point (α_1, α_2) is outside of [0, 1/(νℓ)]², the constrained optimum is found by projecting α_2 from (23) into the region allowed by the constraints, and then recomputing α_1.

Additional insight can be obtained by rewriting the last equation in terms of the outputs of the kernel expansion on the examples x_1 and x_2 before the optimization step. Let α_1*, α_2* denote the values of their Lagrange parameters before the step. Then the corresponding outputs (cf. (11)) read

O_i := K_{1i} α_1* + K_{2i} α_2* + C_i.   (24)

Using the latter to eliminate the C_i, we end up with an update equation for α_2 which does not explicitly depend on α_1*,

α_2 = α_2* + (O_1 − O_2) / (K_{11} − 2K_{12} + K_{22}),   (25)

which shows that the update is essentially the fraction of first and second derivative of the objective function along the direction of ν-constraint satisfaction. Clearly, the same elementary optimization step can be applied to any pair of two variables, not just α_1 and α_2. We next briefly describe how to do the overall optimization.
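The elementary step, Eqs. (20)-(25), with the clipping into the feasible region, can be sketched as a small routine (Python, not from the paper; the fall-back heuristics for degenerate directions are omitted here):

```python
def smo_pair_step(alphas, K, i, j, nu):
    """One analytic step on (alpha_i, alpha_j) for the dual (12); cf. Eqs. (20)-(25).

    K is the Gram matrix; the box is [0, 1/(nu*l)] and alpha_i + alpha_j
    (Delta in Eq. (20)) is preserved."""
    l = len(alphas)
    upper = 1.0 / (nu * l)
    delta = alphas[i] + alphas[j]
    # outputs of the kernel expansion before the step, Eq. (24)
    O_i = sum(a * K[n][i] for n, a in enumerate(alphas))
    O_j = sum(a * K[n][j] for n, a in enumerate(alphas))
    eta = K[i][i] - 2 * K[i][j] + K[j][j]   # second derivative along the feasible direction
    if eta <= 0:
        return                               # degenerate; a full implementation picks another pair
    new_j = alphas[j] + (O_i - O_j) / eta    # unconstrained optimum, Eq. (25)
    # clip into the region allowed by the constraints (20)
    new_j = min(max(new_j, max(0.0, delta - upper)), min(upper, delta))
    alphas[j] = new_j
    alphas[i] = delta - new_j

K = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
alphas = [0.4, 0.3, 0.3]        # feasible for nu = 0.5, l = 3 (upper bound 2/3)
smo_pair_step(alphas, K, 0, 1, nu=0.5)
print(alphas)                    # alpha_0 + alpha_1 is preserved; the sum stays 1
```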

Initialization of the algorithm. We start by setting a fraction ν of all α_i, randomly chosen, to 1/(νℓ). If νℓ is not an integer, then one of the examples is set to a value in (0, 1/(νℓ)) to ensure Σ_i α_i = 1. Moreover, we set the initial ρ to max{O_i : i ∈ [ℓ], α_i > 0}.

Optimization algorithm. We then select a first variable for the elementary optimization step in one of the two following ways. Here, we use the shorthand SV_nb for the indices of variables which are not at bound, i.e. SV_nb := {i : 0 < α_i < 1/(νℓ)}. At the end, these correspond to points that will sit exactly on the hyperplane, and that will therefore have a strong influence on its precise position.

(i) We scan over the entire data set [1] until we find a variable violating a KKT condition (Bertsekas, 1995). Once we have found one, say α_i, we pick α_j according to

j = arg max_{n ∈ SV_nb} |O_i − O_n|.   (26)

(ii) Same as (i), but the scan is only performed over SV_nb.

In practice, one scan of type (i) is followed by multiple scans of type (ii), until there are no KKT violators in SV_nb, whereupon the optimization goes back to a single scan of type (i). If the type (i) scan finds no KKT violators, the optimization terminates.

In unusual circumstances, the choice heuristic (26) cannot make positive progress. Therefore, a hierarchy of other choice heuristics is applied to ensure positive progress. These heuristics are the same as in the case of pattern recognition, cf. (Platt, 1999), and have been found to work well in our experiments with SMO applied to distribution support estimation. However, to ensure convergence even in rare pathological conditions, the algorithm can be modified slightly, cf. (Keerthi et al., 1999).

We end this section by stating a trick which is of importance in practical implementations. In practice, one has to use a nonzero accuracy tolerance when checking whether two quantities are equal. In particular, comparisons of this type are used in determining whether a point lies on the margin. Since we want the final decision function to evaluate to 1 for points which lie on the margin, we need to subtract this tolerance from ρ at the end.

[1] This scan can be accelerated by not checking patterns which are on the correct side of the hyperplane by a large margin, using the method of Joachims (1999).
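A toy end-to-end solver can be written by repeatedly applying the elementary step to pairs of variables. The sketch below deliberately simplifies the procedure described above: initialization is uniform rather than random, and pairs are chosen at random instead of by the KKT scan and heuristic (26), so it is slower, but it converges to the same optimum on small problems:

```python
import math, random

def gaussian(x, y, c):
    """Gaussian kernel, Eq. (4)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / c)

def train_one_class(X, nu, c, sweeps=200, seed=0):
    """Approximately solve the dual (12) by pairwise coordinate descent."""
    l = len(X)
    upper = 1.0 / (nu * l)
    K = [[gaussian(xi, xj, c) for xj in X] for xi in X]
    alphas = [1.0 / l] * l               # feasible: sums to 1, below 1/(nu*l) for nu <= 1
    rng = random.Random(seed)
    for _ in range(sweeps):
        for i in range(l):
            j = rng.randrange(l - 1)
            if j >= i:
                j += 1                   # pick j != i uniformly
            delta = alphas[i] + alphas[j]
            O_i = sum(a * K[n][i] for n, a in enumerate(alphas))
            O_j = sum(a * K[n][j] for n, a in enumerate(alphas))
            eta = K[i][i] - 2 * K[i][j] + K[j][j]
            if eta < 1e-12:
                continue
            new_j = alphas[j] + (O_i - O_j) / eta   # Eq. (25)
            new_j = min(max(new_j, max(0.0, delta - upper)), min(upper, delta))
            alphas[j], alphas[i] = new_j, delta - new_j
    nb = [i for i, a in enumerate(alphas) if 1e-8 < a < upper - 1e-8]
    i0 = nb[0] if nb else max(range(l), key=lambda i: alphas[i])
    rho = sum(a * K[n][i0] for n, a in enumerate(alphas))       # Eq. (13)
    return alphas, rho, K

random.seed(1)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
alphas, rho, K = train_one_class(X, nu=0.2, c=2.0)
outputs = [sum(a * K[n][i] for n, a in enumerate(alphas)) - rho for i in range(len(X))]
print("outliers:", sum(1 for o in outputs if o < -1e-3), "of", len(X))
```

Consistent with Proposition 4 below, the number of training points with negative output should not exceed a fraction ν of the sample.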

5 Theory

In this section, we analyse the algorithm theoretically, starting with the uniqueness of the hyperplane (Proposition 2). We then describe the connection to binary classification (Proposition 3), and show that the parameter ν characterizes the fractions of SVs and outliers (Proposition 4). Following that, we give a robustness result for the soft margin (Proposition 5) and finally we briefly state error bounds (Theorem 9). In this section, we will use italic letters to denote the feature space images of the corresponding patterns in input space, i.e.

x_i := Φ(x_i).   (27)

Definition 1 A data set

x_1, ..., x_ℓ   (28)

is called separable if there exists some w ∈ F such that (w · x_i) > 0 for i ∈ [ℓ].

Proposition 2 If the data set (28) is separable, then there exists a unique supporting hyperplane with the properties that (1) it separates all data from the origin, and (2) its distance to the origin is maximal among all such hyperplanes. For any ρ > 0, it is given by

min_{w ∈ F}  (1/2)||w||²  subject to  (w · x_i) ≥ ρ,  i ∈ [ℓ].   (29)

Proof Due to the separability, the convex hull of the data does not contain the origin. The existence and uniqueness of the hyperplane then follows from the supporting hyperplane theorem (e.g. Bertsekas, 1995). Moreover, separability implies that there actually exists some ρ > 0 and w ∈ F such that (w · x_i) ≥ ρ for i ∈ [ℓ] (by rescaling w, this can be seen to work for arbitrarily large ρ). By the Cauchy-Schwartz inequality, the distance of the hyperplane {z : (w · z) = ρ} to the origin is ρ/||w||. Therefore the optimal hyperplane is obtained by minimizing ||w|| subject to these constraints, i.e. by the solution of (29).

The following result elucidates the relationship between single-class classification and binary classification.

Proposition 3 (i) Suppose (w, ρ) parametrizes the supporting hyperplane for the data (28). Then (w, 0) parametrizes the optimal separating hyperplane (passing through the origin (Vapnik, 1995)) for the labelled data set

{(x_1, +1), ..., (x_ℓ, +1), (−x_1, −1), ..., (−x_ℓ, −1)}.   (30)

(ii) Suppose (w, 0) parametrizes the optimal separating hyperplane passing through the origin for a labelled data set

{(x_1, y_1), ..., (x_ℓ, y_ℓ)},  y_i ∈ {±1} for i ∈ [ℓ],   (31)

aligned such that (w · x_i) is positive whenever y_i = +1. Suppose, moreover, that ρ/||w|| is the margin of the optimal hyperplane. Then (w, ρ) parametrizes the supporting hyperplane for the unlabelled data set

{y_1 x_1, ..., y_ℓ x_ℓ}.   (32)

Proof Ad (i). Observe that (w, 0) parametrizes the supporting hyperplane for the data set reflected through the origin, and that it is parallel to the one given by (w, ρ). It thus provides an optimal separation of the two sets, with distance 2ρ/||w||. Ad (ii). By assumption, w is the shortest vector satisfying y_i (w · x_i) ≥ ρ (note that the offset is 0) (cf. Vapnik, 1995). Hence, equivalently, it is also the shortest vector satisfying (w · y_i x_i) ≥ ρ for i ∈ [ℓ].

Note that the relationship is similar for nonseparable problems. In that case, margin errors in binary classification (which are either on the wrong side of the separating hyperplane or which fall inside the margin) translate into outliers in single-class classification, i.e. points on the wrong side of the hyperplane. Proposition 3 then holds, cum grano salis, for the training sets with margin errors and outliers, respectively, removed.

The utility of Proposition 3 lies in the fact that it allows us to recycle certain results proven for binary classification (Schölkopf et al., 1999c) for use in the single-class scenario. The following, explaining the significance of the parameter ν, is such a case.

Proposition 4 Assume the solution of (5)-(6) satisfies ρ ≠ 0. The following statements hold:

(i) ν is an upper bound on the fraction of outliers.

(ii) ν is a lower bound on the fraction of SVs.

(iii) Suppose the data (28) were generated independently from a distribution P which does not contain discrete components. Suppose, moreover, that the kernel is analytic and non-constant. With probability 1, asymptotically, ν equals both the fraction of SVs and the fraction of outliers.

Proof (i) and (ii) follow directly from Proposition 3 and the fact that outliers are dealt with in exactly the same way as margin errors in the optimization problem for the binary classification case (Schölkopf et al., 1999c). The basic idea is that (10) imposes constraints on the fraction of patterns that can have α_i = 1/(νℓ), upper bounding the fraction of outliers, and on the fraction of patterns that must have α_i > 0, lower bounding the fraction of SVs. Alternatively, the result can be proven directly based on the primal objective function (5), as sketched presently: to this end, note that when changing ρ, the term Σ_i ξ_i will change proportionally to the number of points that have a nonzero ξ_i (the outliers), plus, when changing ρ in the positive direction, the number of points which are just about to get a nonzero ξ_i, i.e. which sit on the hyperplane (the SVs). At the optimum of (5), we therefore have (i) and (ii). Part (iii) can be proven by a uniform convergence argument showing that since the covering numbers of kernel expansions regularized by a norm in some feature space are well-behaved, the fraction of points which lie exactly on the hyperplane is asymptotically negligible (cf. Schölkopf et al., 1999c).

Proposition 5 (Resistance) Local movements of outliers parallel to w do not change the hyperplane.

Proof Suppose x_o is an outlier, i.e. ξ_o > 0, hence by the KKT conditions (e.g. Bertsekas, 1995) α_o = 1/(νℓ). Transforming it into x_o' := x_o + δ · w, where |δ| < ξ_o/||w||, leads to a slack which is still nonzero, i.e. ξ_o' > 0; hence we still have α_o = 1/(νℓ). Therefore, α is still feasible, as is the primal solution (w, ξ', ρ). Here, we use ξ_i' = ξ_i for i ≠ o, and ξ_o' as computed from (6). Finally, the KKT conditions are still satisfied, as still α_o = 1/(νℓ) (Bertsekas, 1995). Thus, although the outlier has moved, the hyperplane does not change.

We now move on to the subject of generalization. Our goal is to bound the probability that a novel point drawn from the same underlying distribution lies outside of the estimated region. We start by introducing a common tool for measuring the capacity of a class F of functions that map X to R.
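The counting argument behind (i) and (ii) can be checked mechanically: for any feasible α in (12), m coefficients at the upper bound 1/(νℓ) force m/(νℓ) ≤ Σ_i α_i = 1, i.e. m/ℓ ≤ ν, while n nonzero coefficients force n/(νℓ) ≥ Σ_i α_i = 1, i.e. n/ℓ ≥ ν. A sketch with made-up feasible coefficients (not from the paper):

```python
def nu_property(alphas, nu):
    """Fractions of at-bound coefficients (outlier candidates) and of
    nonzero coefficients (SVs) for a feasible alpha of the dual (12)."""
    l = len(alphas)
    upper = 1.0 / (nu * l)
    at_bound = sum(1 for a in alphas if abs(a - upper) < 1e-12)
    nonzero = sum(1 for a in alphas if a > 1e-12)
    return at_bound / l, nonzero / l

# feasible for nu = 0.25, l = 8: box [0, 0.5], coefficients sum to 1
alphas = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
out_frac, sv_frac = nu_property(alphas, nu=0.25)
print(out_frac, "<=", 0.25, "<=", sv_frac)   # 0.125 <= 0.25 <= 0.375
```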

Definition 6 Let (X, d) be a pseudo-metric space [2], let A be a subset of X and ε > 0. A set B ⊆ X is an ε-cover for A if, for every a ∈ A, there exists b ∈ B such that d(a, b) ≤ ε. The ε-covering number of A, N_d(ε, A), is the minimal cardinality of an ε-cover for A (if there is no such finite cover then it is defined to be ∞).

The idea is that A should be finitely describable up to precision ε with respect to d. We will use the l_∞ distance over a finite sample X = (x_1, ..., x_ℓ) as the pseudo-metric in the space of functions,

d_X(f, g) := max_{i ∈ [ℓ]} |f(x_i) − g(x_i)|,   (33)

and write N(ε, F, ℓ) := sup_{X ∈ X^ℓ} N_{d_X}(ε, F). A first result of this kind (Lemma 7), whose proof skeleton is (Shawe-Taylor et al., 1998, Lemma 3.9), states, informally, that with high probability over an ℓ-sample, if we find f ∈ F with f(x_i) ≥ γ for all i ∈ [ℓ], then the probability that a new point falls below the level is small, at a rate governed by the covering number N(γ/2, F, 2ℓ). We now consider the case where a small number of training points may fall below that level; in the algorithm, this corresponds to having a non-zero slack variable ξ_i, where we take θ = ρ and use the class of functions f_w(x) = (w · Φ(x)).

Definition 8 Let f be a real-valued function on a space X. Fix θ ∈ R. For x ∈ X, define d(x, f, θ) := max(0, θ − f(x)). Similarly, for a training sequence X = (x_1, ..., x_ℓ), we define D(X, f, θ) := Σ_{x ∈ X} d(x, f, θ). Consider a fixed but unknown distribution P on X.

[2] i.e. a distance function that differs from a metric in that it is only semidefinite.

Theorem 9 Fix θ ∈ R. With probability 1 − δ over randomly drawn training sequences X of size ℓ, for all γ > 0 and any f ∈ F, the probability that a new point x drawn from P satisfies f(x) < θ − γ is bounded by a quantity of order

(1/ℓ) (log N(γ/2, F, 2ℓ) + D(X, f, θ)/γ + log(1/δ)).

The proof is based on similar proofs for the classification case in (Shawe-Taylor and Cristianini, 1999, Theorem 3). The theorem bounds the probability of a new point falling in the region for which f has value less than θ − γ, i.e. a slightly smaller region than the one for which it has value less than θ. In the algorithm described in this paper, one would use the hyperplane shifted by γ towards the origin to define such a region. Note that there is no restriction placed on the class of functions F. The choice of γ gives a trade-off between the size of the region over which the bound holds (increasing γ increases the size of the region) and the size of the probability with which it holds (increasing γ decreases the size of the log covering numbers).

The result shows that we can bound the probability of points falling outside the region of estimated support by a quantity involving the ratio of the log covering numbers (which can be bounded by the fat shattering dimension at scale proportional to γ) and the number of training examples, plus a factor involving the margin slack vector. This result is stronger than related results given by Ben-David and Lindenbaum (1997), since their bound involves the square root of the ratio of the Pollard dimension (the fat shattering dimension when γ tends to 0) and the number of training examples.

The above bounds are, nevertheless, not entirely satisfactory, and their inclusion here is much more as a sanity check than as a "closed case". Whilst most of the apparent technical gaps can be readily filled (for example, determining the covering numbers for the class of functions induced by use of a particular kernel, using methods as in Williamson et al. (1999)), the gaps do not invalidate the algorithm; they simply indicate an incomplete theory, a situation similar to the difficulty of choosing parameters in the support vector machine case. Furthermore, whilst not immediately apparent, the results stated do not actually give guidance as to how to choose the kernel parameter. Also, the bounds do not depend on the margin of separation from the origin; although a tighter connection is not necessary, it could be motivated by noting that it

ν, kernel width c      0.5, 0.5     0.5, 0.5     0.1, 0.5     0.5, 0.1
fraction of SVs/OLs    0.54, 0.43   0.59, 0.47   0.24, 0.03   0.65, 0.38
margin ρ/||w||         0.84         0.70         0.62         0.48

Figure 1: First two pictures: A single-class SVM applied to two toy problems; ν = c = 0.5, domain: [−1, 1]². Note how in both cases, at least a fraction of 1 − ν of all examples is in the estimated region (cf. table). The large value of ν causes the additional data points in the upper left corner to have almost no influence on the decision function. For smaller values of ν, such as 0.1 (third picture), the points cannot be ignored anymore. Alternatively, one can force the algorithm to take these 'outliers' (OLs) into account by changing the kernel width (4): in the fourth picture, using c = 0.1, ν = 0.5, the data is effectively analyzed on a different length scale, which leads the algorithm to consider the outliers as meaningful points.

is plausible that if we obtain a very large margin of separation to the origin, we would be more likely to accept a large γ (with the associated risk of ending up with more false positives from the "unknown" class). Measuring γ relative to the margin would then lead to bounds which depend on the margin. Equivalently, we could argue that we try to maximize the margin in order to have the freedom to use a large γ, leading to smaller values of the error bounds, while still not including the "unknown" class in the estimated region. Admittedly, this argument implicitly makes prior assumptions about the unknown class, in particular that it is in some sense concentrated around the origin. The algorithm could be modified to accommodate this case, but presently, we shall not go into further detail on that matter.

6 Experiments

We apply the method to artificial and real-world data. Figure 1 displays 2-D toy examples, and shows how the parameter settings influence the solution. Figure 2 shows a plot of the outputs on training and test sets of the US postal service database of handwritten digits. The database contains 9298 digit images of size 16 × 16 = 256; the last 2007 constitute the test set. We trained the

Figure 2: Experiments on the US postal service database of handwritten digits. Output histograms for the recognizer of digit 0: exemplars of 0 in the training and test set, and test exemplars of other digits, with the threshold marked in each panel. The x-axis gives the output values, i.e. the argument of the sgn function in (11). For ν = 50% (top), we get 50% SVs and 49% outliers (consistent with Proposition 4), 44% true positive test examples, and zero false positives from the "other" class. For ν = 5% (bottom), we get 6% and 4% for SVs and outliers, respectively. In that case, the true positive rate is improved to 91%, while the false positive rate increases to 7%. Note, finally, that the plots smooth the histograms; in reality, many examples sit exactly at the threshold value (the non-bound SVs). Since this peak is smoothed out by the estimator, the fraction of outliers in the training set appears slightly larger than it should be.

algorithm, using a Gaussian kernel (4) of width c = 0.5 · 256 (a common value for SVM classifiers on that data set, cf. Schölkopf et al. (1995)), on the training set exemplars of digit 0. As shown in Figure 2, ν = 50% leads to zero false positives (although the learning machine has not seen any non-0's during training, it correctly identifies all non-0's as such), while still recognizing 44% of the 0's in the test set. Higher recognition rates can be achieved using smaller values of ν: for ν = 5%, we get 91% correct recognition of the 0's in the test set, with a false positive rate of 7%. Although leading to encouraging results, this experiment did not really address the actual task the algorithm was designed for. Therefore, we next focused on a problem of novelty detection. Again, we utilized the USPS set; however, this time we trained the algorithm on the test set and used it to identify outliers — it is folklore in the community that the USPS test set (Fig. 3) contains a number of patterns which are hard or impossible to classify, due to segmentation errors or mislabelling (e.g. Vapnik, 1995). In the experiment, we augmented the input patterns by ten extra dimensions corresponding to the class labels of the digits. The rationale for this is that if we disregarded the labels, there would be no hope to identify mislabelled patterns as outliers; with the labels, the algorithm has the chance to identify both unusual patterns and usual patterns with unusual labels. Fig. 4 shows the 20 worst outliers for the USPS test set. Note that the algorithm indeed extracts patterns which are very hard to assign to their respective classes. In the experiment, we used the same kernel width as above, and a ν value of 5%. In the last experiment, we tested the scaling behaviour of the proposed SMO solver which is used for training the learning machine (Fig. 5). It turned out that, for the small values of ν which are typically used in outlier detection tasks, the algorithm scales very well to larger data sets, with a dependency of training times on the sample size which is at most quadratic.

Figure 3: A subset of 20 examples randomly drawn from the USPS test set, with class labels.

ν                  1%     2%     3%     4%     5%     6%     7%     8%     9%    10%    20%    30%    40%    50%    60%    70%    80%    90%
fraction of OLs   0.0%   0.0%   0.1%   0.6%   1.4%   1.8%   2.6%   4.1%   5.4%   6.2%  16.9%  27.5%  37.1%  47.4%  58.2%  68.3%  78.5%  89.4%
fraction of SVs  10.0%  10.0%  10.0%  10.1%  10.6%  11.2%  11.5%  12.0%  12.9%  13.7%  22.6%  31.8%  41.7%  51.2%  61.0%  70.7%  80.5%  90.1%

Table 1: Experimental results for various values of the outlier control constant ν (USPS test set). Note that ν bounds the fractions of outliers and support vectors from above and below, respectively (cf. Proposition 4). As we are not in the asymptotic regime, there is some slack in the bounds; nevertheless, ν can be used to control the above fractions. Note, moreover, that training times (CPU time in seconds on a Pentium II running at 450 MHz) increase as ν approaches 1. This is related to the fact that almost all Lagrange multipliers will be at the upper bound in that case (cf. Sec. 4). The system used in the outlier detection experiments is shown in boldface.

Outlier scores: −5139, −5071, −4580, −3771, −2827, −2162, −2003, −1869, −1795, −1620, −1533, −1436, −1286, −1230, −1177, −935, −780, −587, −526, −483

Figure 4: Outliers identified by the proposed algorithm, ranked by the negative output of the SVM (the argument of the sgn in (11)). The outputs (for convenience in units of 10⁻⁵) are written underneath each image in italics, together with the (alleged) class labels. Note that most of the examples are "difficult" in that they are either atypical or even mislabelled.

7 Discussion

One could view the present work as an attempt to provide a new algorithm which is in line with Vapnik's principle never to solve a problem which is more general than the one that one is actually interested in. E.g., in situations where one is only interested in detecting novelty, it is not always necessary to estimate a full density model of the data. Indeed, density estimation is more difficult than what we are doing, in several respects.

Mathematically speaking, a density will only exist if the underlying probability measure possesses an absolutely continuous distribution function. However, the general problem of estimating the measure for a large class of sets, say the sets measurable in Borel's sense, is not solvable (for a discussion, see e.g. Vapnik, 1998). Therefore we need to restrict ourselves to making a statement about the measure of some sets. Given a small class of sets, the simplest estimator which accomplishes this task is the empirical measure, which simply looks at how many training points fall into the region of interest. Our algorithm does the opposite. It starts with the number of training points that are supposed to fall into the region, and then estimates a region with the desired property. Often, there will be many such regions — the solution becomes unique only by applying a regularizer, which in our case enforces that the region be small in a feature space associated with the kernel. Therefore, we must keep in mind that the measure of smallness in this sense depends on the kernel used, in a way that is no different from any other method that regularizes in a feature space. A similar problem, however, appears already in classical density estimation: if we

Figure 5: Training times vs. data set sizes ℓ (both axes depict logs at base 2; CPU time in seconds on a Pentium II running at 450 MHz). Left plot: ν = 1%, 2%, 5%, 10%; right plot: ν = 20%, 30%, 40%, 50%. As in Table 1, it can be seen that larger values of ν generally lead to longer training times (note that the plots use different y-axis ranges). However, they also differ in their scaling behaviour: for large values of ν, training times are roughly proportional to the sample size raised to a fixed power (right plot); for the small values of ν (left plot) typically used in outlier detection experiments (in Fig. 4, we used ν = 5%), the scaling is at most quadratic. The exponents can be directly read off from the slope of the graphs, as they are plotted in log scale with equal axis spacing. Note that the scalings are better than the cubic one that one would expect when solving the optimization problem using all patterns at once. As in the other experiments, we used the same kernel width; here, however, we only trained on subsets of the USPS test set.

perform a (nonlinear) coordinate transformation in the input domain, then the density values will change; loosely speaking, what remains constant is p(x) · dx, while dx is transformed. When directly estimating the probability measure of regions, we are not faced with this problem, as the regions are transformed accordingly.

An attractive property of the measure of smallness that we chose to use is that it can also be placed in the context of regularization theory, leading to an interpretation of the solution as maximally smooth in a sense which depends on the specific kernel used. More specifically, let us assume that k is Green's function of P*P for an operator P mapping into some dot product space (Smola et al., 1998; Girosi, 1998), and take a look at the dual objective function that we minimize: it equals the squared norm ||Pf||² of the function f = Σ_i α_i k(x_i, ·) under the regularization operator P. The regularization operators of common kernels can be shown to correspond to derivative operators (Poggio and Girosi, 1990) — therefore, minimizing the dual objective function corresponds to maximizing the smoothness of the function f (which is, up to a thresholding operation, the function we estimate). This, in turn, is related to a prior over the function class favouring smooth estimates.

Interestingly, as the minimization of the dual objective function also corresponds to a maximization of the margin in feature space, an equivalent interpretation is in terms of a prior on the distribution of the unknown other class (the "novel" class in a novelty detection problem) — trying to separate the data from the origin amounts to assuming that the novel examples lie around the origin.

The main inspiration for our approach stems from the earliest work of Vapnik and collaborators. In 1962, they proposed an algorithm for characterizing a set of unlabelled data points by separating it from the origin using a hyperplane (Vapnik and Lerner, 1963; Vapnik and Chervonenkis, 1974). However, they quickly moved on to two-class classification problems, both in terms of algorithms and in terms of the theoretical development of statistical learning theory which originated in those days.

From an algorithmic point of view, we can identify two shortcomings of the original approach which may have caused research in this direction to stop for more than three decades. Firstly, the original algorithm in (Vapnik and Chervonenkis, 1974) was limited to linear decision rules in input space; secondly, there was no way of dealing with outliers. In conjunction, these restrictions are indeed severe — a generic data set need not be separable from the origin by a hyperplane in input space.

The two modifications that we have incorporated dispose of these shortcomings. Firstly, the kernel trick allows for a much larger class of functions by nonlinearly mapping into a high-dimensional feature space, and thereby increases the chances of separability from the origin. In particular, using a Gaussian kernel (4), such a separation exists for any data set: to see this, note that k(x_i, x_j) > 0 for all i, j, thus all dot products between mapped patterns are positive, implying that all mapped patterns lie inside the same orthant. Moreover, since k(x_i, x_i) = 1 for all i, they all have unit length. The second modification directly allows for the possibility of outliers. We have incorporated this 'softness' of the decision rule using the ν-trick (Schölkopf et al., 1999c) and thus obtained a direct handle on the fraction of outliers.

We believe that our approach, proposing a concrete algorithm with well-behaved computational complexity (convex quadratic programming) for a problem that so far has mainly been studied from a theoretical point of view, has abundant practical applications. To turn the algorithm into an easy-to-use black-box method for practitioners, questions like the selection of kernel parameters (such as the width of a Gaussian kernel) have to be tackled. It is our expectation that the theoretical results which we have briefly outlined will provide a foundation for this formidable task.

Acknowledgement. This work was supported by the ARC and the DFG (# Ja 379/9-1). Parts of it were done while BS and AS were with GMD FIRST, Berlin.

References

S. Ben-David and M. Lindenbaum. Learning distributions by their density levels: a paradigm for learning without a teacher. Journal of Computer and System Sciences, 55:171–182, 1997.

D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.
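The separability argument for Gaussian kernels can also be verified numerically: since all kernel values are positive, the mean of the mapped patterns, w = (1/ℓ) Σ_j Φ(x_j), satisfies (w · Φ(x_i)) = (1/ℓ) Σ_j k(x_j, x_i) > 0 for every i, so it already separates the mapped data from the origin. A sketch (random data; the width c = 0.5 is an arbitrary choice for illustration):

```python
import math, random

def k(x, y, c=0.5):
    """Gaussian kernel, Eq. (4)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / c)

random.seed(0)
X = [tuple(random.uniform(-2, 2) for _ in range(3)) for _ in range(50)]

# (w . Phi(x_i)) for w = mean of the mapped patterns, via the kernel trick
margins = [sum(k(xj, xi) for xj in X) / len(X) for xi in X]
print(all(m > 0 for m in margins))                     # separable from the origin
print(all(abs(k(xi, xi) - 1.0) < 1e-12 for xi in X))   # all mapped patterns have unit length
```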

B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, 1992. ACM Press.

J. Chevalier. Estimation du support et du contour du support d'une loi de probabilité. Annales de l'Institut Henri Poincaré. Section B. Calcul des Probabilités et Statistique. Nouvelle Série, 12(4):339–364, 1976.

T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991.

A. Cuevas. [...], 19(6):26–33, 1990.

A. Cuevas and R. Fraiman. A plug-in approach to support estimation. Annals of Statistics, 25(6):2300–2312, 1997.

L. Devroye and G. L. Wise. Detection of abnormal behavior via nonparametric estimation of the support. SIAM Journal on Applied Mathematics, 38(3):480–488, 1980.

J. H. J. Einmahl and D. M. Mason. Generalized quantile processes. Annals of Statistics, 20(2):1062–1078, 1992.

G. Gayraud. Estimation of functionals of density support. Mathematical Methods of Statistics, 6(1):26–46, 1997.

J. Geffroy. Sur un problème d'estimation géométrique. Publications de l'Institut de Statistique de l'Université de Paris, 13:191–210, 1964.

F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455–1480, 1998.

J. A. Hartigan. Estimation of a convex density contour in two dimensions. Journal of the American Statistical Association, 82:267–270, 1987.

T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184. MIT Press, Cambridge, MA, 1999.

S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Technical Report CD-99-14, Dept. of Mechanical and Production Engineering, National University of Singapore, Singapore, 1999.

A. P. Korostelev and A. B. Tsybakov. Minimax Theory of Image Reconstruction. Springer, New York, 1993.

D. W. Müller. The excess mass approach in statistics. Beiträge zur Statistik, Universität Heidelberg, 1992.

D. Nolan. The excess mass ellipsoid. Journal of Multivariate Analysis, 39:348–371, 1991.

J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 185–208. MIT Press, Cambridge, MA, 1999.

T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1990.

W. Polonik. Density estimation under qualitative assumptions in higher dimensions. Journal of Multivariate Analysis, 55(1):61–81, 1995a.

W. Polonik. Measuring mass concentrations and estimating density contour clusters — an excess mass approach. Annals of Statistics, 23(3):855–881, 1995b.

W. Polonik. Minimum volume sets and generalized quantile processes. Stochastic Processes and their Applications, 69:1–24, 1997.

T. W. Sager. An iterative method for estimating a multivariate mode and isopleth. Journal of the American Statistical Association, 74(366):329–339, 1979.

B. Schölkopf, C. Burges, and V. Vapnik. Extracting support data for a given task. In U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First International Conference on Knowledge Discovery & Data Mining. AAAI Press, Menlo Park, CA, 1995.

B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors. Advances in Kernel Methods — Support Vector Learning. MIT Press, Cambridge, MA, 1999a.

B. Schölkopf, A. J. Smola, and K.-R. Müller. Kernel principal component analysis. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 327–352. MIT Press, Cambridge, MA, 1999b.

B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 1999c. To appear. Also: NeuroCOLT2 Technical Report NC2-TR-1998-031.

B. Schölkopf, R. C. Williamson, A. J. Smola, and J. Shawe-Taylor. Single-class support vector machines. In J. Buhmann, W. Maass, H. Ritter, and N. Tishby, editors, Unsupervised Learning, Dagstuhl-Seminar-Report 235, pages 19–20, 1999.

J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.

J. Shawe-Taylor and N. Cristianini. Margin distribution bounds on generalization. In Paul Fischer and Hans Ulrich Simon, editors, Computational Learning Theory, 4th European Conference, EuroCOLT'99. Springer, 1999. Also: NeuroCOLT2 Technical Report NC2-TR-1998-030.

A. J. Smola and B. Schölkopf. A tutorial on support vector regression. NeuroCOLT Technical Report, Royal Holloway College, University of London, UK, 1998.

A. J. Smola, B. Schölkopf, and K.-R. Müller. The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649, 1998.

A. J. Smola, R. C. Williamson, S. Mika, and B. Schölkopf. Regularized principal manifolds. In Computational Learning Theory: 4th European Conference, volume 1572 of Lecture Notes in Artificial Intelligence, pages 214–229. Springer, 1999.

D. Stoneking. Improving the manufacturability of electronic designs. IEEE Spectrum, 36(6):70–76, 1999.

L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection for the identification of masses in mammograms. In Proceedings, Fourth IEE International Conference on Artificial Neural Networks, pages 442–447, Cambridge, 1995.

D. M. J. Tax and R. P. W. Duin. Data domain description by support vectors. In M. Verleysen, editor, Proceedings ESANN, pages 251–256, Brussels, 1999.

A. B. Tsybakov. On nonparametric estimation of density level sets. Annals of Statistics, 25(3):948–969, 1997.

V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.

V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.

V. Vapnik and A. Chervonenkis. Theory of Pattern Recognition [in Russian]. Nauka, Moscow, 1974. (German translation: W. Wapnik & A. Tscherwonenkis, Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979.)

V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Avtomatika i Telemekhanika, 24:774–780, 1963.

R. C. Williamson, A. J. Smola, and B. Schölkopf. Entropy numbers, operators and support vector kernels. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 127–144. MIT Press, Cambridge, MA, 1999.

