Table 2 Examples of tokens eliminated and retained during the feature selection process on drug "warfarin"

From: BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

| r² | Examples of tokens |
| --- | --- |
| ≈ 1 (perfect correlation) | only, when, however, well, another, same, results, other, observed, possible, different, since, even, could, though, occurring, therefore, high, although, also, both, so, result, appeared |
| ≈ 0.90 | restricted, controls, implicated, followed, diverse, stable, display, rate, plays, indicative, inhibit, typically, describe, excluded, terminal, excessive, largest, knowledge, employing, se |
| ≈ 0.80 | life, mature, loading, preincubation, problem, failure, binds, resolved, physiology, shock, signs, molecule, bind, elevations, chinese, usual, surface, aid, unit, accurate |
| ≈ 0.70 | intervention, stimulus, transition, closed, enable, bands, requiring, ester, nervous, sizes, electrophoresis, polymorphonuclear, aging, associations, accounts, practical, selective, choice, routine, attached |
| ≈ 0.60 | subset, undergoes, success, antagonist, artery, mr, depolarization, fields, suppression, precipitation, temperatures, records, mg2, adjustment, oxygen, picture, assembly, transcripts, encoded, organic |
| ≈ 0.50 | hydrogen, coated, glycol, antisense, coronary, adsorbed, histology, scan, formulation, foods, holding, resorption, gestational, filling, locus, memory, atrophy, ringer, prospectively, recruitment |
| ≈ 0.40 | diuretics, atrial, lysis, spinal, camp, bmax, vein, proteases, chelator, arachidonic, alzheimer, ascorbic, histamine, rhythm, ouabain, gas, preoperative, bladder, menopause, pertussis |
| ≈ 0.33 (moderate correlation) | chromatographic, endothelin, relaxed, acceptable, stenosis, withdrawal, january, trypsin, oxidized, infiltration, forearm, et-1, enrolled, electrochemical, peroxidation, mothers, phosphodiesterase, cystic, compression, countries |
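
The r² values above are squares of Pearson's correlation coefficient. As a rough illustration of how such a value is obtained, the sketch below computes r and r² for two numeric sequences. The function names and the sample vectors are hypothetical; this table does not state which two quantities were correlated for each token, so the inputs here are placeholders and not the authors' data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Square of Pearson's r, the quantity tabulated as r² above."""
    return pearson_r(xs, ys) ** 2


# Hypothetical placeholder vectors, not data from the paper.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
print(r_squared([1, 2, 3], [1, 3, 2]))
```

Because r is squared, a perfectly negative relationship also yields r² = 1, so r² alone does not indicate the direction of the correlation.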

  1. r = Pearson's correlation coefficient
  2. Examples of tokens retained by feature selection (in decreasing order of document frequency): Warfarin (8060), anticoagulation (2508), anticoagulant (1953), heparin (1699), thrombosis (1651), bleeding (1633), international (1324), venous (1238), aspirin (1231), fibrillation (1191), inr (1106), prothrombin (1035), thromboembolism (1017), anticoagulants (864), thromboembolic (860), coagulation (790), embolism (706), deep (698), prophylaxis (636), antithrombotic (606)
  3. Examples of rare tokens eliminated by feature selection: vestige, bacteroides, ca-laurell, gd2, idaho, i475s, h2-blocker, depots, viic/viiam, left-hemispheric, p = .37, laboratory-developed, cardio, frames, thistle, thy1, homolog, videotapes, u-105665, five-years, cold-labeled, workups, fviiic
  4. Examples of common tokens eliminated by feature selection
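
The counts in parentheses after the retained tokens are document frequencies, listed in decreasing order. A minimal sketch of how such a ranking could be produced is shown here, assuming lowercased whitespace tokenization and a toy corpus; the function names, the tokenizer, and the sample documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def document_frequencies(documents):
    """Count, for each token, how many documents contain it at least once."""
    df = Counter()
    for doc in documents:
        # A set ensures each token is counted once per document.
        # Assumption: simple lowercased whitespace tokenization.
        df.update(set(doc.lower().split()))
    return df


def rank_by_document_frequency(df):
    """Return (token, count) pairs, highest count first, ties alphabetical."""
    return sorted(df.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus for illustration only.
docs = [
    "warfarin heparin warfarin",
    "warfarin bleeding",
    "heparin warfarin",
]
for token, count in rank_by_document_frequency(document_frequencies(docs)):
    print(f"{token} ({count})")
```

Document frequency counts documents, not occurrences, so the repeated "warfarin" in the first toy document contributes only once.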