Title:

Kind Code:

A1

Abstract:

A method that yields more accurate Bayesian network classifiers when learning from unlabeled data in combination with labeled data includes learning a set of parameters for a structure of a classifier using a set of labeled data, learning a set of parameters for the structure using the labeled data and a set of unlabeled data, and then modifying the structure if the parameters based on the labeled and unlabeled data lead to less accuracy in the classifier in comparison to the parameters based on the labeled data only. The present technique enables an increase in the accuracy of a statistically learned Bayesian network classifier when unlabeled data are available and reduces the likelihood of degrading the accuracy of the classifier when using unlabeled data.

Inventors:

Cohen, Ira (Urbana, IL, US)

Cozman, Fabio G. (Boqueirao Santos, BR)

Bronstein, Alexandre (Palo Alto, CA, US)

Duro, Marsha Prescott (Los Altos Hills, CA, US)


Application Number:

10/425463

Publication Date:

11/04/2004

Filing Date:

04/29/2003

Assignee:

COHEN IRA

COZMAN FABIO G.

BRONSTEIN ALEXANDRE

DURO MARSHA PRESCOTT


Primary Class:

International Classes:

Primary Examiner:

BUSS, BENJAMIN J

Attorney, Agent or Firm:

HP Inc. (Fort Collins, CO, US)

Claims:

1. A method for generating a classifier, comprising the steps of: learning a set of parameters for a structure of the classifier using a set of labeled data; learning a set of parameters for the structure using the labeled data and a set of unlabeled data; and modifying the structure if the parameters based on the labeled and unlabeled data lead to less accuracy in the classifier in comparison to the parameters based on the labeled data only.

2. The method of claim 1, wherein the step of learning a set of parameters for a structure of the classifier using a set of labeled data comprises the step of learning the parameters in response to a set of labeled records each comprising a value for each of a set of features and a corresponding label.

3. The method of claim 1, wherein the step of learning a set of parameters for the structure using the labeled data and a set of unlabeled data comprises the step of learning the parameters in response to a set of labeled records each comprising a value for each of a set of features and a corresponding label and a set of unlabeled records each comprising a value for a subset of the features.

4. The method of claim 1, wherein the step of modifying the structure if the parameters based on the labeled and unlabeled data lead to less accuracy in the classifier in comparison to the parameters based on the labeled data only comprises the steps of: generating a first classifier based on the structure using the parameters derived from the labeled data only; generating a second classifier based on the structure using the parameters derived from the labeled data and the unlabeled data; determining an accuracy of the first classifier and an accuracy of the second classifier; and modifying the structure if the accuracy of the second classifier is less than the accuracy of the first classifier.

5. The method of claim 4, further comprising the step of learning the parameters for the second classifier using a set of additional data if the accuracy of the second classifier is not less than the accuracy of the first classifier.

6. The method of claim 5, wherein the step of determining an accuracy comprises the step of determining the accuracy using a set of labeled test data.

7. A method for generating a classifier, comprising the steps of: generating an initial structure for the classifier; generating a first classifier by learning a set of parameters for the initial structure in response to a set of labeled data; determining a second classifier by learning a set of parameters for the initial structure in response to the labeled data and a set of unlabeled data; modifying the initial structure for the classifier if the second classifier is less accurate than the first classifier.

8. The method of claim 7, further comprising the step of determining whether the second classifier is less accurate by testing the first and second classifiers using a set of test data.

9. The method of claim 8, wherein the step of testing the first and second classifiers using a set of test data comprises the step of testing the first and second classifiers using a set of labeled test data.

10. The method of claim 7, further comprising the step of learning the parameters for the second classifier using a set of additional data if the accuracy of the second classifier is not less than the accuracy of the first classifier.

11. A Bayesian network learning system, comprising: a set of labeled data; a set of unlabeled data; and a Bayesian network generator that determines a set of parameters for a structure of a classifier in response to the labeled data and a set of parameters for the structure in response to a combination of the labeled data and the unlabeled data and that modifies the structure if the parameters based on the labeled and the unlabeled data lead to less accuracy in the classifier in comparison to the parameters based on the labeled data only.

12. The Bayesian network learning system of claim 11, wherein the labeled data includes a set of labeled records each comprising a value for each of a set of features and a corresponding result to be determined by the classifier.

13. The Bayesian network learning system of claim 12, wherein the unlabeled data includes a set of unlabeled records each comprising a value for a subset of the features.

14. The Bayesian network learning system of claim 11, wherein the Bayesian network generator determines a first classifier based on the structure using the parameters derived from the labeled data only and determines a second classifier based on the structure using the parameters derived from the labeled data and the unlabeled data and modifies the structure if an accuracy of the second classifier is less than an accuracy of the first classifier.

15. The Bayesian network learning system of claim 14, wherein the Bayesian network generator determines the parameters for the second classifier using a set of additional data if the accuracy of the second classifier is not less than the accuracy of the first classifier.

16. The Bayesian network learning system of claim 15, further comprising a set of labeled test data.

17. The Bayesian network learning system of claim 16, wherein the Bayesian network generator determines the accuracy in response to the labeled test data.

Description:

[0001] 1. Field of Invention

[0002] The present invention pertains to the field of Bayesian network classifiers. More particularly, this invention relates to learning Bayesian network classifiers using labeled and unlabeled data.

[0003] 2. Art Background

[0004] Bayesian network classifiers may be employed in a wide variety of applications. Examples of applications of Bayesian network classifiers include diagnostic systems, decision making systems, event predictors, etc.

[0005] A typical Bayesian network classifier may be represented as a graph structure having a set of nodes and interconnecting arcs that define parent-child relationships among the nodes. A Bayesian network classifier usually includes a set of Bayesian network parameters which are associated with the nodes of the graph structure. The Bayesian network parameters usually specify the probabilities that each child node in the graph structure is in a particular state given that its parent nodes in the graph structure are in a particular state. Typically, the nodes of a Bayesian network classifier are associated with variables of an underlying application and the Bayesian network parameters indicate the strength of dependencies among the variables. Typically, the variables of a Bayesian network classifier include a set of features and a classification result.

[0006] The process of generating a Bayesian network classifier usually includes determining a structure of nodes and interconnecting arcs and then learning the Bayesian network parameters for the structure. The Bayesian network parameters are usually learned using a set of data that pertains to an application for which the classifier is being designed. The data that may be used to learn Bayesian network parameters may include labeled data and/or unlabeled data. Labeled data may be defined as a set of values for the features for which a classification result is known. The classification result is usually referred to as a label. Unlabeled data may be defined as a set of values for the features for which a classification result is not known.
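For illustration, labeled and unlabeled records as defined above might be encoded as follows. This is a minimal sketch; the field names and dictionary representation are assumptions, not a format prescribed by the text.

```python
# A labeled record: a value for each feature plus a known classification
# result (the label); an unlabeled record: values for a subset of the
# features with no label. Field names here are hypothetical.
labeled_record = {"F1": "Y", "F2": "N", "F3": "N", "F4": "N", "R": "Y"}
unlabeled_record = {"F1": "Y", "F3": "N"}  # subset of features, no label

def is_labeled(record, label_key="R"):
    """A record is labeled when its classification result is present."""
    return label_key in record
```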

[0007] Prior methods for learning Bayesian network parameters may use only labeled data. Unfortunately, labeled data are often difficult and/or expensive to obtain. Moreover, labeled data are usually required in large quantities to yield an accurate Bayesian network classifier which renders the task of acquiring labeled data even more daunting.

[0008] Prior methods for learning Bayesian network parameters may use only unlabeled data. Unfortunately, methods for learning from unlabeled data are usually computationally expensive and may not yield an accurate Bayesian network classifier.

[0009] Prior methods for learning Bayesian network parameters may use a combination of labeled and unlabeled data. Unfortunately, prior methods for learning from a combination of unlabeled and labeled data usually lead to inconsistent results. Sometimes such methods yield a more accurate Bayesian network classifier and sometimes such methods yield a less accurate Bayesian network classifier.

[0010] A method is disclosed that yields more accurate Bayesian network classifiers when learning from unlabeled data in combination with labeled data. The present technique enables an increase in the accuracy of a statistically learned Bayesian network classifier when unlabeled data are available and reduces the likelihood of degrading the accuracy of the Bayesian network classifier when using unlabeled data.

[0011] A method according to the present teachings includes learning a set of parameters for a structure of a classifier using a set of labeled data only and learning a set of parameters for the given structure using the labeled data and a set of unlabeled data and then modifying the structure if the parameters based on the labeled and unlabeled data lead to less accuracy in the classifier in comparison to the parameters based on the labeled data only.
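The method of [0011] can be sketched as a compare-and-backtrack loop. The helper functions below (learn_params, accuracy, enrich_structure) are caller-supplied stand-ins and are assumptions for illustration, not names from the disclosure.

```python
def build_classifier(structure, labeled, unlabeled, test_data,
                     learn_params, accuracy, enrich_structure):
    """Sketch of the disclosed method: compare a labeled-only classifier
    with a labeled-plus-unlabeled one, and modify (enrich) the structure
    if the unlabeled data degraded accuracy."""
    # Parameters from labeled data only, then from labeled + unlabeled.
    params_l = learn_params(structure, labeled)
    params_lu = learn_params(structure, labeled, unlabeled)
    # If the combined data yield a less accurate classifier, the structure
    # is suspect: modify it and re-learn with all available data.
    if accuracy(structure, params_lu, test_data) < accuracy(structure, params_l, test_data):
        structure = enrich_structure(structure)
        params_lu = learn_params(structure, labeled, unlabeled)
    return structure, params_lu
```

The same skeleton supports the structure-search usage described later: each unlabeled batch either confirms the current structure or triggers a backtrack to a richer one.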

[0012] Other features and advantages of the present invention will be apparent from the detailed description that follows.

[0013] The present invention is described with respect to particular exemplary embodiments thereof and reference is accordingly made to the drawings in which:

[0014]-[0018] (Brief descriptions of the drawings; the figure captions are not reproduced in this copy.)

[0019] The Bayesian network generator

[0020] The Bayesian network learning system

[0021]

[0022]

[0023] Each of the nodes 20-24 has an associated conditional probability table in the initial structure from step

P (R = Y)
P (R = N)

[0024] The result R associated with the node 20 in this example is a binary result, i.e. Yes/No. The conditional probability table for the node 20 includes the probability that R=Yes (P(R=Y)) and the probability that R=No (P(R=N)).

[0025] The following illustrates an example conditional probability table for the node 21.

P (F1 = Y|R = Y) | P (F1 = N|R = Y)
P (F1 = Y|R = N) | P (F1 = N|R = N)

[0026] The feature F1 associated with the node 21 in this example is a binary value of Yes/No. The conditional probability table for the node 21 includes the probability that F1=Yes given that R=Yes (P(F1=Y|R=Y)) and the probability that F1=Yes given that R=No (P(F1=Y|R=N)) and the probability that F1=No given that R=Yes (P(F1=N|R=Y)) and the probability that F1=No given that R=No (P(F1=N|R=N)).

[0027] The following illustrates an example conditional probability table for the node 22 which is associated with a binary feature F2.

P (F2 = Y|R = Y) | P (F2 = N|R = Y)
P (F2 = Y|R = N) | P (F2 = N|R = N)

[0028] The conditional probability table for the node 22 includes the probability that F2=Yes given that R=Yes (P(F2=Y|R=Y)) and the probability that F2=Yes given that R=No (P(F2=Y|R=N)) and the probability that F2=No given that R=Yes (P(F2=N|R=Y)) and the probability that F2=No given that R=No (P(F2=N|R=N)).

[0029] The nodes 23-24 have similar arrangements for the probabilities associated with the features F3-F4, respectively.
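The tables above can be combined, naive-Bayes style, to classify a record: pick the result R that maximizes P(R) multiplied by the product of P(Fi|R) over the observed features. The numeric probability values below are hypothetical (the text gives no trained values at this point); only the table shapes come from the disclosure.

```python
# Hypothetical CPT entries for the structure described above:
# a prior P(R) for node 20 and P(Fi = Y | R) for nodes 21-24.
prior = {"Y": 0.5, "N": 0.5}
cpt = {
    "F1": {"Y": 0.9, "N": 0.4},
    "F2": {"Y": 0.6, "N": 0.5},
    "F3": {"Y": 0.3, "N": 0.7},
    "F4": {"Y": 0.8, "N": 0.2},
}

def classify(record):
    """Return the result R maximizing P(R) * prod_i P(Fi | R)."""
    scores = {}
    for r in ("Y", "N"):
        score = prior[r]
        for feature, value in record.items():
            p_yes = cpt[feature][r]
            # Binary features: P(Fi = N | R) = 1 - P(Fi = Y | R).
            score *= p_yes if value == "Y" else 1.0 - p_yes
        scores[r] = score
    return max(scores, key=scores.get)
```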

[0030] At step

[0031] The following is a set of example records (Record 1-4) of the labeled data

         | F1 | F2 | F3 | F4 | R
Record 1 | Y  | N  | N  | N  | Y
Record 2 | Y  | Y  | Y  | Y  | N
Record 3 | Y  | Y  | N  | Y  | Y
Record 4 | N  | N  | Y  | N  | N

[0032] The Bayesian network generator

[0033] The Records 1-4 yield the following probabilities for the conditional probability table of the node 21.

P (F1 = Y|R = Y) = 2/2
P (F1 = Y|R = N) = 1/2
P (F1 = N|R = Y) = 0/2
P (F1 = N|R = N) = 1/2

[0034] The Records 1-4 yield the following probabilities for the conditional probability table of the node 22.

P (F2 = Y|R = Y) = 1/2
P (F2 = Y|R = N) = 1/2
P (F2 = N|R = Y) = 1/2
P (F2 = N|R = N) = 1/2

[0035] The probability values in the conditional probability tables for the nodes 23-24 may be determined in a similar manner.
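The relative-frequency computation behind the tables in [0033]-[0035] can be reproduced directly from Records 1-4; a minimal sketch (the dictionary encoding of the records is an illustrative assumption):

```python
# Records 1-4 from the labeled-data table above.
records = [
    {"F1": "Y", "F2": "N", "F3": "N", "F4": "N", "R": "Y"},  # Record 1
    {"F1": "Y", "F2": "Y", "F3": "Y", "F4": "Y", "R": "N"},  # Record 2
    {"F1": "Y", "F2": "Y", "F3": "N", "F4": "Y", "R": "Y"},  # Record 3
    {"F1": "N", "F2": "N", "F3": "Y", "F4": "N", "R": "N"},  # Record 4
]

def cond_prob(feature, f_val, r_val):
    """P(feature = f_val | R = r_val) as a relative frequency over the
    records whose result matches r_val (maximum-likelihood estimate)."""
    matching_r = [rec for rec in records if rec["R"] == r_val]
    hits = sum(1 for rec in matching_r if rec[feature] == f_val)
    return hits / len(matching_r)
```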

[0036] At step

[0037] The unlabeled data

[0038] A technique for learning from the unlabeled data
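One common way to fold unlabeled records into parameter learning is expectation-maximization (EM): score each unlabeled record fractionally under the current model (the E-step), then re-estimate the parameters from the combined hard and fractional counts (the M-step). The sketch below shows one such update for the prior P(R=Y); it is a generic illustration under that assumption, not necessarily the exact procedure referenced in the text.

```python
def em_prior(labeled_labels, unlabeled_posteriors):
    """Re-estimate P(R = Y) from hard labels plus the posterior weights
    P(R = Y | record) computed for each unlabeled record (E-step output).
    Unlabeled records contribute fractional 'Y' counts."""
    hard = sum(1.0 for r in labeled_labels if r == "Y")
    soft = sum(unlabeled_posteriors)
    total = len(labeled_labels) + len(unlabeled_posteriors)
    return (hard + soft) / total
```

In a full EM loop, this re-estimation would alternate with recomputing the posteriors until the parameters converge.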

[0039] At step

[0040] At step

[0041]

[0042] The Bayesian network generator

[0043] At step

[0044] A variety of known learning methods may be used for processing labeled and unlabeled records at step

[0045] The present technique for learning Bayesian network classifiers benefits from the observation that learning from unlabeled data in prior methods can degrade the accuracy of a classifier. Given this observation, if learning from unlabeled data degrades a classifier, then it may be inferred that the classifier structure does not match the structure of the underlying reality. The observation that additional data, albeit unlabeled, degrade classification performance may be counter-intuitive but may nevertheless be demonstrated by experimentation and theoretical analysis.

[0046] The present technique may be employed when exploring a space of Bayesian network structures for a particular classification application. In such activities, the effect of processing a particular batch of unlabeled data may be used to decide whether to keep processing the training data to improve the parameters in the conditional probability tables of the current structure or alternatively to backtrack to a different possibly richer structure and start over.

[0047] The present systematic technique for learning from unlabeled data in combination with labeled data may yield more accurate Bayesian network classifiers. This technique may be used to increase the accuracy of statistically learned Bayesian network classifiers when unlabeled data are available, as is frequently the case. This technique also reduces the likelihood of degrading the resulting Bayesian network classifier when using unlabeled data, as is common in prior techniques.

[0048] The present technique provides a systematic method to leverage a moderate amount of labeled data in the presence of a large amount of unlabeled data to reach a more accurate classifier. As such, this technique advances the state of the art in the field of semi-supervised learning and thereby broadens the applicability of Bayesian network classifiers to circumstances where only a moderate amount of labeled data are available.

[0049] The foregoing detailed description of the present invention is provided for the purposes of illustration and is not intended to be exhaustive or to limit the invention to the precise embodiment disclosed. Accordingly, the scope of the present invention is defined by the appended claims.