Machine Learning

CLERK - Clustering with ExteRnal Knowledgebase

CLERK is software developed jointly by the Machine Learning group and the Information Retrieval group to structure large collections of unstructured or partially structured documents. It is developed under the TextECM2Clust project in collaboration with the ILSL software group. CLERK combines several novel techniques: probabilistic, unsupervised, and semi-supervised clustering, together with cluster labeling based on the structure of an external knowledge base such as Wikipedia. It has been tested on several datasets and outperformed K-means in all tests.
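As a rough illustration of cluster labeling against an external knowledge base, the sketch below picks the category whose term set overlaps most with a cluster's terms. This is not CLERK's actual method. The function `label_cluster`, the `CATEGORIES` table, and the Jaccard overlap score are all assumptions made for the example.

```python
def label_cluster(cluster_terms, categories):
    """Pick the category whose term set has the highest Jaccard overlap with the cluster."""
    cluster = set(cluster_terms)

    def jaccard(terms):
        terms = set(terms)
        union = cluster | terms
        return len(cluster & terms) / len(union) if union else 0.0

    # Iterate in sorted order so that ties are broken alphabetically.
    return max(sorted(categories), key=lambda name: jaccard(categories[name]))


# Hypothetical stand-in for categories taken from an external knowledge base.
CATEGORIES = {
    "Machine learning": ["clustering", "classifier", "training", "kernel"],
    "Virology": ["virus", "genome", "infection", "treatment"],
}

label = label_cluster(["kernel", "clustering", "svm"], CATEGORIES)
print(label)
```

A real system would score against a much larger category structure and weight terms by importance. The toy version only shows the matching step.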

Contact: Michal Rosen-Zvi (rosen@il.ibm.com)

AdAd - Addressable Advertizing

Addressable Advertizing is a joint project with the Search Technologies Development group. It is devoted to developing Tailor, software that provides profiling analysis and a recommendation platform. Tailor's input is information about products and customers (such as their preferences for movies or music). Once Tailor has digested this information, it can produce several kinds of output. For example, a server answers queries about which additional items to recommend to a customer who has just purchased Item X. Alternatively, a user interface supports analysis of which products to recommend to a subpopulation, say, customers from a particular geographical location. Tailor combines two main modules: one is search-based, and the other is a machine learning module based on state-of-the-art probabilistic graphical models that analyze customers' preferences through the interests shared by different people, while exploring connections between different products.
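The "customer who just purchased Item X" query can be sketched with plain co-occurrence counting. This is a deliberately simpler stand-in for the probabilistic graphical models that Tailor is described as using. The function names and the toy purchase histories below are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets):
    """For every ordered item pair (a, b), count how many customers have both."""
    counts = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often seen together with `item`."""
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["movie_a", "movie_b", "album_c"],
    ["movie_a", "movie_b"],
    ["movie_a", "album_c", "album_d"],
    ["movie_b", "album_d"],
]
counts = build_cooccurrence(baskets)
print(recommend(counts, "movie_a"))
```

Co-occurrence counts ignore customer attributes such as location, so the subpopulation analysis mentioned above would need the counts to be built per customer segment.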

Contact: Michal Rosen-Zvi (rosen@il.ibm.com)

HYPERGENES

The HYPERGENES project is an FP7 initiative that aims to construct a genetic-epidemiological model for a complex disease, that is, a disease associated with many genomic variants. Specifically, the focus is on essential hypertension and relevant target organ damage, known to affect 50% of the population over the age of 60. In the course of the project, IBM will develop machine learning techniques for genome-wide association analysis of high-throughput genomic data. This information will be integrated into a disease model that links genotypic variants with phenotypic observations. On the technological side, the project will develop a holistic architecture for a biomedical information infrastructure that integrates data storage capabilities with advanced analysis tools.

Contact: Hani Neuvirth-Telem (hani@il.ibm.com)

Anatomic and Symbolic Mapper Engine

Anatomic and Symbolic Mapper Engine (ASME) is a first-of-a-kind initiative led by a research team at IBM's Zurich Research Lab. ASME will allow physicians to search through a patient's electronic health records by looking at a 3-D multi-layer model of the human body. When clicking on a part of the body, all the relevant information found in those records will be displayed. The physician will also be able to refine the search by supplying keywords. Our group's contribution to the project is a plug-in that will suggest keywords for searching through the patient's records. The plug-in will determine those keywords most likely to return good results by analyzing the health records in conjunction with recently typed keywords, the selected part of the body, and additional information.

Contact: Ehud Aharoni (aehud@il.ibm.com)

EuResist

EuResist is a pharmacogenomics project that integrates viral genomics with clinical data to predict responses to anti-HIV treatment. The system will give clinicians a prediction of how HIV patients will respond to antiretroviral treatment, helping them choose the best drugs and drug combinations for any given HIV genetic variant.

Contact: Michal Rosen-Zvi (rosen@il.ibm.com)

PML - Parallel Machine Learning

The Parallel Machine Learning (PML) Toolbox, a joint effort of the Machine Learning group at the IBM R&D Labs in Israel and the Data Analytics department at the IBM Watson Lab, provides tools for running data mining and machine learning algorithms in multiprocessor environments or on multithreaded machines. The toolbox comprises two main components: an API for running users' own machine learning algorithms, and several pre-programmed algorithms, which serve both as examples and as baselines for comparison. The pre-programmed algorithms include parallel versions of the Support Vector Machine (SVM) classifier, linear regression, transform regression, nearest neighbors, k-means, fuzzy k-means, kernel k-means, PCA, and kernel PCA.
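To make the idea of a parallel k-means concrete, here is a minimal data-parallel sketch. It does not use the PML API, which is not described here. The functions `partial_stats` and `parallel_kmeans` are hypothetical, and Python threads are used only to show the split-compute-merge structure, not to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk, centroids):
    """Assign each point in the chunk to its nearest centroid.

    Returns per-centroid coordinate sums and point counts for this chunk only.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        j = min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(points, centroids, workers=2, iterations=10):
    """Run a fixed number of k-means iterations, splitting the points across workers."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # Map step: every worker summarizes its own chunk.
            results = list(pool.map(partial_stats, chunks, [centroids] * workers))
            # Reduce step: merge the partial sums into new centroids.
            new_centroids = []
            for j, old in enumerate(centroids):
                n = sum(counts[j] for _, counts in results)
                if n == 0:
                    new_centroids.append(old)  # keep an empty cluster's centroid unchanged
                    continue
                new_centroids.append(
                    [sum(sums[j][d] for sums, _ in results) / n for d in range(len(old))]
                )
            centroids = new_centroids
    return centroids


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers = parallel_kmeans(points, centroids=[[0.0, 0.0], [10.0, 10.0]])
print(centers)
```

Only the per-centroid sums and counts cross worker boundaries, so the same pattern carries over to separate processes or machines, where each worker holds its chunk locally.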

Contact: Elad Yom-Tov (yomtov@il.ibm.com)

IR - Information Retrieval

Our work with the Information Retrieval team at the IBM R&D Labs in Israel centers on assessing the amount of content in an information repository. We are developing methods to assess gaps in knowledge and to identify content that users cannot find, either because of its format or because it is masked by other information. Our work has been published at several leading conferences and won the SIGIR 2005 Best Paper award. For details, see: http://www.haifa.il.ibm.com/projects/verification/ml_ir/index.html

Contact: Elad Yom-Tov (yomtov@il.ibm.com)

300mm

The aim of this project is to improve manufacturing at the East Fishkill fab using machine learning methods. The project is part of the 300mm OR Joint Project, led by Robert Baseman from Watson. It covers three sectors of the fab: photolithography (Litho), reactive ion etching (RIE), and chemical mechanical polishing (CMP).

Contact: Noam Slonim (noams@il.ibm.com)

MeLoDy - Machine Learning for Dynamic System Analysis

The MeLoDy project aims to improve xServer problem diagnosis by applying machine learning techniques to the configuration data of operational xServers.

This is a joint project of the Machine Learning group in HRL and the xServer Tool and Support Center in RTP. For details, see: http://www.haifa.il.ibm.com/projects/verification/ml_melody/index.html

Contact: Sivan Sabato (sivans@il.ibm.com)

Vigilant (virtual guest inspection, learning, and control)

PC hardware virtualization is a maturing technology that makes an operating system "believe" it is running on real hardware, while in fact it is controlled by another piece of software called a "hypervisor". This opens new possibilities in personal productivity, reliability, security, consolidation, disaster recovery, availability, and more. In the Machine Learning group, we are looking at ways to help the hypervisor identify both extreme and mundane conditions of the running ("guest") OS.

Contact: Dan Pelleg (dpelleg@il.ibm.com)

MilePost - MachIne Learning for Embedded PrOgramS opTimization

MilePost uses machine learning methods to improve the compilation of computer programs, with a focus on embedded and reconfigurable architectures. Several approaches are being investigated, including these:

  • Iterative compilation for high-performance libraries - Compiling multiple versions using machine-learning-aided iterative compilation and rapidly selecting the best version at runtime.
  • Predictive modeling for code transformations - Learning a predictive model that can later be queried to provide an optimal set of compiler optimizations based on code/data/architecture features.
  • Continuous optimization - Exploiting data collected across different code/data/hardware to continuously improve compiler performance. This also covers recompilation during execution.
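The predictive modeling approach in the list above can be illustrated with a 1-nearest-neighbour lookup from program features to a flag set. This is a minimal sketch, not the project's model. The feature definitions and the `TRAINING` table are made up for illustration, and the flags are ordinary GCC options chosen only as examples.

```python
import math

# Hypothetical training set: (feature vector, best flag set found by an earlier search).
# Features (illustrative only): [loop count, average loop depth, memory-op ratio].
TRAINING = [
    ([12.0, 1.0, 0.2], ("-O2",)),
    ([40.0, 3.0, 0.6], ("-O3", "-funroll-loops")),
    ([5.0, 1.0, 0.8], ("-Os",)),
]


def predict_flags(features, training=TRAINING):
    """Return the flag set of the closest training program (1-nearest-neighbour)."""
    _, flags = min(training, key=lambda row: math.dist(row[0], features))
    return flags


print(predict_flags([38.0, 2.5, 0.5]))
```

The features are compared without scaling, so the loop count dominates the distance. A practical model would normalize the features and be trained on measured compilation results.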

Contact: Elad Yom-Tov (yomtov@il.ibm.com)

CDG - Coverage Directed Test Generation

One of the main bottlenecks of the verification process in general, and of coverage analysis in particular, is closing the loop between coverage results and the directives given to the stimuli generators. To address this bottleneck, we research and develop coverage-directed generation (CDG) methodology and technology, designed to automate the use of feedback from coverage analysis to steer stimuli generation toward areas that have not been adequately verified. CDG casts the problem as one of statistical inference and uses Bayesian networks to encode the complex joint input-output distribution, from which it ultimately infers generation directives for the stimuli generator. For details, see: http://www.haifa.il.ibm.com/projects/verification/ml_cdg/index.html
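The inference step can be illustrated with the smallest possible network: one directive node pointing to one coverage-event node. This is a sketch under stated assumptions, not the CDG implementation. The directive names and all probabilities are hypothetical.

```python
# Hypothetical two-node network: Directive -> CoverageEvent.
PRIOR = {"short_packets": 0.5, "long_packets": 0.5}  # P(directive)
HIT_GIVEN_DIRECTIVE = {"short_packets": 0.02, "long_packets": 0.30}  # P(hit | directive)


def posterior_given_hit(prior, likelihood):
    """Bayes' rule: P(directive | event hit) for every directive."""
    joint = {d: prior[d] * likelihood[d] for d in prior}
    evidence = sum(joint.values())
    return {d: joint[d] / evidence for d in joint}


post = posterior_given_hit(PRIOR, HIT_GIVEN_DIRECTIVE)
best = max(post, key=post.get)  # directive most likely to hit the uncovered event
print(best, post[best])
```

Given a target event that has not yet been covered, the directive with the highest posterior is the one to feed back to the stimuli generator. A realistic network would have many directive and event nodes, with conditional probabilities learned from simulation data.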

Contact: Avi Ziv (aziv@il.ibm.com)

CodeGuru

IBM sponsors two computer contests: CodeGuru, for youngsters aged 15-18, and CodeGuru Xtreme, in which contestants write an assembly program that must survive the longest. For more details, see: http://www.codeguru.co.il

Contact: Oded Margalit (odedm@il.ibm.com)