IBM Research

Machine learning for healthcare (EuResist)

Machine Learning


Generative-discriminative Hybrid Technique


We plan to use a hybrid technique that combines two kinds of learning algorithms: generative and discriminative. We plan to employ Bayesian networks in the generative phase and Support Vector Machines (SVM) in the discriminative phase.

Algorithms under the generative framework try to find a statistical model that best represents the data. The predictions are then based on the likelihood scores derived from the model. This category includes algorithms such as Hidden Markov Models (HMM) [1], Gaussian Mixture Models (GMM) [2] and more complicated graphical models such as Bayesian networks [3].
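As a rough illustration of prediction from likelihood scores, the sketch below fits one univariate Gaussian per class and assigns a query to the class with the highest log-likelihood plus log-prior. It is a minimal toy, not the project's model: the class labels, the data values, and the function names are all invented for the example.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

def log_likelihood(x, mean, var):
    """Log density of x under a Gaussian with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_generative(samples_by_class):
    """Fit one Gaussian per class and store the log class prior."""
    total = sum(len(values) for values in samples_by_class.values())
    model = {}
    for label, values in samples_by_class.items():
        mean, var = fit_gaussian(values)
        model[label] = (mean, var, math.log(len(values) / total))
    return model

def predict(model, x):
    """Pick the class whose likelihood score (plus log prior) is highest."""
    scores = {
        label: log_likelihood(x, mean, var) + log_prior
        for label, (mean, var, log_prior) in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data; labels and numbers are assumptions for illustration.
toy_data = {
    "responder": [1.0, 1.2, 0.8, 1.1],
    "non_responder": [3.0, 2.7, 3.3, 3.1],
}
model = fit_generative(toy_data)
```

Calling `predict(model, x)` on a new value returns the label of the better-fitting class model; richer generative models such as HMMs or Bayesian networks replace the Gaussian but keep the same score-and-compare step.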

Bayesian networks offer an efficient modeling scheme by providing a compact representation of the complex (possibly stochastic) relationships among a set of variables, together with the ability to encode essential domain knowledge. Their capacity for bidirectional inference (reasoning from causes to effects as well as from effects to causes), combined with a rigorous probabilistic foundation, led to the rapid emergence of Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems.
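Bidirectional inference can be sketched with the smallest possible network, a single edge from a cause to an effect. The node names and every probability below are invented for illustration; a real network would have many more variables and would be learned from data or elicited from domain experts.

```python
# Hypothetical two-node network: Mutation -> Resistance.
# All probabilities are made-up values for illustration only.
P_MUTATION = 0.2
P_RESISTANCE_GIVEN = {True: 0.9, False: 0.1}

def joint(mutation, resistance):
    """Joint probability from the network factorization P(M) * P(R | M)."""
    p_m = P_MUTATION if mutation else 1 - P_MUTATION
    p_r = P_RESISTANCE_GIVEN[mutation]
    return p_m * (p_r if resistance else 1 - p_r)

def predictive(mutation):
    """Cause to effect: P(resistance = True | mutation)."""
    return joint(mutation, True) / sum(joint(mutation, r) for r in (True, False))

def diagnostic(resistance):
    """Effect to cause: P(mutation = True | resistance), via Bayes' rule."""
    return joint(True, resistance) / sum(joint(m, resistance) for m in (True, False))
```

Both queries are answered from the same factorized joint distribution, which is what allows inference to run in either direction along an edge.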

In contrast, algorithms in the discriminative framework are aimed directly at the classification task. These algorithms try to parameterize the optimal boundary separating the classes. Over the last decade, a classification methodology based on Support Vector Machines (SVM) [4] has attracted increasing interest in the machine learning community. SVMs have several favorable properties that make them an attractive way of enhancing the standard methods: an inherently class-discriminative model structure and the use of nonlinear kernel methods, combined with relatively low computational complexity. To date, SVM is widely regarded as one of the most powerful discriminative machine learning techniques.

When comparing the two strategies in the practical sense of classification, the discriminative algorithms usually prevail. On the other hand, the generative algorithms perform better when the available data is incomplete, or when its labeling is uncertain.

More advanced methods try to combine algorithms from the two paradigms in order to exploit the advantages of both. One way is to incorporate the statistical analysis through the SVM kernel function [5,6]. The basic SVM algorithm finds the maximum-margin linear separator between the classes (maximizing the margin has been shown to yield good generalization on previously unseen data [7]). The role of the kernel function in SVM is to map the input vectors, implicitly, into a higher-dimensional space in which the classes are more likely to be linearly separable. Using kernel functions, the class of possible separators is extended to more complicated nonlinear separators in the original input space. A more recent view by Lafferty and Lebanon presents the kernel function as a similarity measure between examples in a non-Euclidean geometric space [8].
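The effect of a kernel can be sketched on XOR-style data, which no straight line separates in the input plane. The example below uses a homogeneous degree-2 polynomial kernel, chosen only because its feature map can be written out explicitly; the data, the function names, and the hand-picked separator are assumptions for illustration, and no margin optimization is performed.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit map phi such that phi(x) . phi(z) equals poly2_kernel(x, z)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

# XOR-style toy data: not linearly separable in the 2-D input space.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# In the 3-D feature space, the middle coordinate alone separates the classes,
# so the hand-picked hyperplane w = (0, 1, 0), b = 0 is a linear separator there.
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, feature_map(p)) > 0 else -1 for p in points]
```

An SVM never needs `feature_map` itself: it works only with kernel values such as `poly2_kernel(x, z)`, which equal the inner products in the feature space, so the separator is linear there and nonlinear in the original inputs.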

Jaakkola and Haussler [6] describe a general formalism for deriving a kernel function from a generative probabilistic model. The kernel function, called the Fisher kernel, specifies a similarity score between examples based on the parameters of the generative model. The Fisher score of a query example x is defined as the gradient of the log-likelihood of x with respect to the model's parameters. In essence, the process first fits the generative model, then maps each example to its Fisher score, and finally finds the maximum-margin boundary that best separates the classes in that score space. In this way, the generative model is used as a distance measure between the examples. Additional kernel functions derived from generative models have been suggested, including the family of probability product kernels [9], the heat kernel [10], and kernels based on the Kullback-Leibler divergence [11].
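The Fisher kernel can be sketched for a very simple generative model: independent Bernoulli features with parameters p_j. For this model the Fisher score has components (x_j - p_j) / (p_j (1 - p_j)), the Fisher information matrix is diagonal with entries 1 / (p_j (1 - p_j)), and the kernel is K(x, z) = U_x^T I^{-1} U_z. The choice of model, the smoothing constant, and the toy data are assumptions for illustration; the approach described above uses Bayesian networks as the generative model.

```python
import math

def fit_bernoulli(samples, smoothing=1.0):
    """Smoothed estimates of p_j = P(feature j = 1) from binary vectors."""
    n, d = len(samples), len(samples[0])
    return [
        (sum(s[j] for s in samples) + smoothing) / (n + 2 * smoothing)
        for j in range(d)
    ]

def log_likelihood(x, p):
    """log P(x | p) for independent Bernoulli features."""
    return sum(
        xj * math.log(pj) + (1 - xj) * math.log(1 - pj) for xj, pj in zip(x, p)
    )

def fisher_score(x, p):
    """Gradient of log P(x | p) with respect to each parameter p_j."""
    return [(xj - pj) / (pj * (1 - pj)) for xj, pj in zip(x, p)]

def fisher_kernel(x, z, p):
    """K(x, z) = U_x^T I^{-1} U_z with a diagonal Fisher information matrix."""
    ux, uz = fisher_score(x, p), fisher_score(z, p)
    # The inverse of the diagonal entry 1 / (p_j (1 - p_j)) is p_j (1 - p_j).
    return sum(a * b * pj * (1 - pj) for a, b, pj in zip(ux, uz, p))

# Hypothetical binary data (for example, presence or absence of three features).
samples = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
params = fit_bernoulli(samples)
gram = [[fisher_kernel(a, b, params) for b in samples] for a in samples]
```

The resulting Gram matrix `gram` is symmetric and could be passed to any SVM implementation that accepts a precomputed kernel; that final step is omitted here.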

References

  1. L. R. Rabiner and B. H. Juang (1986) An introduction to hidden Markov models. IEEE ASSP Magazine 3(1):4-16.

  2. D. M. Titterington, A. F. M. Smith, and U. E. Makov (1985) Statistical Analysis of Finite Mixture Distributions. Wiley.

  3. J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

  4. V. Vapnik (1995) The Nature of Statistical Learning Theory. Springer-Verlag.

  5. T. S. Jaakkola and D. Haussler (1999) Probabilistic kernel regression models. In D. Heckerman and J. Whittaker, editors, Workshop on Artificial Intelligence and Statistics 7. Morgan Kaufmann.

  6. T. S. Jaakkola and D. Haussler (1999) Exploiting generative models in discriminative classifiers. Proceedings of the 1998 Conference on AI and Statistics 487-493.

  7. N. Cristianini and J. Shawe-Taylor (2000) An Introduction to Support Vector Machines, Cambridge University Press.

  8. J. Lafferty and G. Lebanon (2005) Diffusion kernels on statistical manifolds. Journal of Machine Learning Research 6:129-163.

  9. T. Jebara, R. Kondor, and A. Howard (2004) Probability product kernels. Journal of Machine Learning Research 5:819-844.

  10. J. Lafferty and G. Lebanon (2002) Information diffusion kernels. Neural Information Processing Systems 2002.

  11. P. J. Moreno, P. P. Ho, and N. Vasconcelos (2004) A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. Neural Information Processing Systems 2003.
