Learning a taxonomy from company data

The words used inside a company are an important key to finding information: information should be described in words (terms) that the employees know.

As part of an OrganiK rollout, terms can be extracted from documents to build a tree of terms, a taxonomy.

First Time Taxonomy Learning

  • The first step is to follow the instructions on ImportYourData to get your existing data into OrganiK.
  • Once the data is imported, download the most recent OrganikOntologyLearning zip file from here:
  • Unzip it and run a command like the following, replacing the parameters with your own database settings.
    • Linux:
       sh bin/ --skosOutput mytaxonomy.skos --dbUrl jdbc:mysql://localhost --dbUser root --dbPassword password 
    • Windows:
      bin/DrupalTaxonomyLearner.bat --skosOutput mytaxonomy.skos --dbUrl jdbc:mysql://localhost --dbUser root --dbPassword password 
  • For large amounts of data the process may take a long time.
  • By default a taxonomy with about 250 terms is built. You can change this by providing the --taxonomyNoTerms option on the command line.
  • Upload the SKOS file into your OrganiK installation in the taxonomy administration settings, i.e. http://your.drupal.root/?q=admin/content/taxonomy/import
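
The options from the steps above can be combined into one argument list. The following is a minimal sketch, not part of the OrganiK distribution: the helper name build_learner_args is hypothetical, the value 500 is only an example, and it assumes that --taxonomyNoTerms takes a plain integer separated by a space, like the other options. It only prints the arguments so they can be reviewed before the real run.

```shell
#!/bin/sh
# Database settings copied from the example commands above; replace with your own.
DB_URL="jdbc:mysql://localhost"
DB_USER="root"
DB_PASSWORD="password"

# Hypothetical helper: prints the learner arguments.
#   $1 = SKOS output file
#   $2 = number of terms (assumed to be a plain integer)
build_learner_args() {
    printf '%s' "--skosOutput $1 --taxonomyNoTerms $2 --dbUrl $DB_URL --dbUser $DB_USER --dbPassword $DB_PASSWORD"
}

# Print the arguments for a taxonomy of about 500 terms instead of the default 250.
build_learner_args mytaxonomy.skos 500
echo
```

Append the printed arguments to the learner script for your platform. Because the output is a single string, values containing spaces (for example a password with a space) would need extra quoting.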

Taxonomy Refinement

After a while, new terms may have been added to the system that are not yet fitted into the taxonomy. By running the Taxonomy Refiner, the learning process can be repeated for these new terms. As above, run the script bin/ or bin/DrupalTaxonomyRefiner.bat. These scripts change your Drupal DB directly, so there is no need to reimport the SKOS file.
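
Because the refiner writes to the Drupal DB directly, it may be worth taking a database dump before running it. The sketch below is a suggestion, not part of the OrganiK tooling: the database name drupal and the backup file name are hypothetical, and mysqldump is the standard MySQL client tool. The script only prints the command so it can be checked before it is run by hand.

```shell
#!/bin/sh
# Hypothetical settings; replace with the values of your own installation.
DB_NAME="drupal"
DB_USER="root"
BACKUP_FILE="drupal-before-refine-$(date +%Y%m%d).sql"

# Prints a mysqldump command; --password without a value makes mysqldump prompt for it.
backup_command() {
    printf '%s\n' "mysqldump --user=$DB_USER --password --result-file=$BACKUP_FILE $DB_NAME"
}

backup_command
```

Run the printed command first, then start the refiner. If the refined taxonomy is not what you expected, the dump can be loaded back with the mysql client.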

You can find more technical details about the internal workings of the taxonomy learning under TaxonomyLearningImplementation and wiki:Tutorials/AdminTaxonomyXml.

Last modified on 06/08/10 19:07:02