Data::Storage

NAME
ABSTRACT ;-)
INTENTIONS
SYNOPSIS

ACCESS
SYNCHRONIZATION

proposal V1
proposal V2

NOTE

REQUIREMENTS
DESCRIPTION

Data::Storage
Why?
What else?

AUTHORS / COPYRIGHT
ACKNOWLEDGEMENTS
SUPPORT / WARRANTY
TODO

BUGS
FEATURES

LINKS / REFERENCES

NAME

  Data::Storage - Interface for accessing various storage implementations from Perl in an implementation-independent way

ABSTRACT ;-)

   Data Storage 
   
   "Where is the wisdom? Lost in the knowledge.
   Where is the knowledge? Lost in the information." 
   - T.S. Eliot 
   
   "Where is the information? Lost in the data.
   Where is the data? Lost in the #@$%?!& database." 
   - Joe Celko

 
  from: MacPerl: Power and Ease - Chapter 15
  url: http://www.macperl.com/ptf_book/r/MP/330.Data_Storage.html

INTENTIONS

  - should encapsulate Tangram, DBI, DBD::CSV and the LWP:: modules to access them in an unusual (more convenient) way ;)
  - introduce a generic layered structure, refactor *SUBLAYER*-stuff, make (e.g.) this possible:
    Perl Data::Storage[DBD::CSV]  ->  Perl LWP::  ->  Internet HTTP/FTP/*  ->  Host Daemon  ->  csv-file
  - provide generic synchronization mechanisms across arbitrary/multiple storages based on ident/checksum
    maybe it's possible to have schema, structural and semantic modifications synchronized???
  - might be similar to http://sourceforge.net/projects/perl-repository
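
  To illustrate the ident/checksum idea from the list above, here is a minimal sketch.
  It is not the code used by Data::Storage or Data::Transfer::Sync; the function names
  (record_checksum, diff_by_checksum) and the record layout are assumptions made for
  this example only. Each record is looked up by its ident, and a checksum over its
  payload fields decides whether it is new, changed or unchanged on the other side.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Checksum over a record's payload fields. Bookkeeping fields (for example
# the ident and a stored checksum) are passed in @exclude and skipped.
# Assumes plain byte strings; md5_hex croaks on wide characters.
sub record_checksum {
    my ($record, @exclude) = @_;
    my %skip = map { $_ => 1 } @exclude;
    # Sort the keys so the result does not depend on hash ordering.
    my @parts = map  { "$_=" . (defined $record->{$_} ? $record->{$_} : '') }
                grep { !$skip{$_} }
                sort keys %$record;
    return md5_hex(join "\x1f", @parts);
}

# Classify every source record relative to the target.
# Both arguments are hashes of the form { ident => record }.
sub diff_by_checksum {
    my ($source, $target, @exclude) = @_;
    my %result = (new => [], changed => [], unchanged => []);
    for my $id (sort keys %$source) {
        my $state;
        if (!exists $target->{$id}) {
            $state = 'new';
        }
        elsif (record_checksum($source->{$id}, @exclude)
            ne record_checksum($target->{$id}, @exclude)) {
            $state = 'changed';
        }
        else {
            $state = 'unchanged';
        }
        push @{ $result{$state} }, $id;
    }
    return \%result;
}

# Hypothetical sample data: two small "Currency" storages.
my $source = {
    1 => { id => 1, key => 'EUR', text => 'Euro' },
    2 => { id => 2, key => 'USD', text => 'US Dollar' },
    3 => { id => 3, key => 'GBP', text => 'Pound' },
};
my $target = {
    1 => { id => 1, key => 'EUR', text => 'Euro', cs => 'abc' },
    2 => { id => 2, key => 'USD', text => 'Dollar' },
};

my $diff = diff_by_checksum($source, $target, qw( id cs ));
print "$_: @{ $diff->{$_} }\n" for qw( new changed unchanged );
```

  Excluding the id and cs fields mirrors the source_exclude option shown in proposal V1
  below; whether the real module computes its checksums this way is not stated here.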

SYNOPSIS

ACCESS

  # connect to LDAP
  my $ldapLocator = Data::Storage::Locator->new(
    ldap => {
      type => "NetLDAP",
      dsn => "ldap:host=192.168.10.150;binddn='cn=root, o=netfrag.org, c=de';pass=secret",
      basedn => "o=netfrag.org, c=de",
      want_transactions => 0,
      syncable => 1,
    },
  );
  my $ldapStorage = Data::Storage->new($ldapLocator);
  $ldapStorage->connect();

  # connect to MAPI
  my $mapiLocator = Data::Storage::Locator->new(
    outlook => {
      type => "MAPI",
      showProfileChooser => $self->{config}->get("mapi_showProfileChooser"),
      ProfileName => $self->{config}->get("mapi_ProfileName"),
      ProfilePass => $self->{config}->get("mapi_ProfilePass"),
      syncable => 1,
    },
  );
  my $mapiStorage = Data::Storage->new($mapiLocator);
  $mapiStorage->connect();

SYNCHRONIZATION

  This functionality is now provided by the Data::Transfer::Sync module.

proposal V1

  my $nodemapping = {
    'LangText' => 'langtexts.csv',
    'Currency' => 'currencies.csv',
    'Country'  => 'countries.csv',
  };

  my $propmapping = {
    'LangText' => [
      [ 'source:lcountrykey'  =>  'target:country' ],
      [ 'source:lkey'         =>  'target:key' ],
      [ 'source:lvalue'       =>  'target:text' ],
    ],
    'Currency' => [
      [ 'source:ckey'         =>  'target:key' ],
      [ 'source:cname'        =>  'target:text' ],
    ],
    'Country' => [
      [ 'source:ckey'         =>  'target:key' ],
      [ 'source:cname'        =>  'target:text' ],
    ],
  };

  sub syncResource {

    my $self = shift;
    my $node_source = shift;
    my $mode = shift;
    my $opts = shift;
    
    $mode ||= '';
    $opts->{erase} ||= 0;
    
    $logger->info( __PACKAGE__ . "->syncResource( node_source $node_source mode $mode erase $opts->{erase} )");
  
    # resolve metadata for syncing requested resource
    my $node_target = $nodemapping->{$node_source};
    my $mapping = $propmapping->{$node_source};
    
    if (!$node_target || !$mapping) {
      # TODO: report this via the logger: "no target, sorry!"
      print "error while resolving resource metadata", "\n";
      return;
    }
    
    if ($opts->{erase}) {
      $self->_erase_all($node_source);
    }
  
    # create new sync object
    my $sync = Data::Transfer::Sync->new( 
      storages => {
        L => $self->{storage}->{backend},
        R => $self->{storage}->{resources},
      },
      id_authorities        =>  [qw( L ) ],
      checksum_authorities  =>  [qw( L ) ],
      write_protected       =>  [qw( R ) ],
      verbose               =>  1,
    );
    
    # sync
    # todo: filter!?
    $sync->syncNodes( {
      direction       =>  $mode,                 # | +PUSH | +PULL | -FULL | +IMPORT | -EXPORT
      method          =>  'checksum',            # | -timestamp | -manual
      source          =>  "L:$node_source",
      source_ident    =>  'storage_method:id',
      source_exclude  =>  [qw( id cs )],
      target          =>  "R:$node_target",
      target_ident    =>  'property:oid',
      mapping         =>  $mapping,
    } );

  }
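
  The $propmapping structure above pairs 'source:...' properties with 'target:...'
  properties. As a rough sketch of how such a mapping could be applied to a single
  record (the helper apply_mapping and the sample record are assumptions for this
  example, not part of Data::Transfer::Sync):

```perl
use strict;
use warnings;

# Translate one source record into a target record, following a list of
# [ 'source:<property>' => 'target:<property>' ] pairs.
# Properties that are not mentioned in the mapping are not copied.
sub apply_mapping {
    my ($mapping, $source) = @_;
    my %target;
    for my $pair (@$mapping) {
        my ($from, $to) = @$pair;
        (my $source_prop = $from) =~ s/^source://;
        (my $target_prop = $to)   =~ s/^target://;
        $target{$target_prop} = $source->{$source_prop};
    }
    return \%target;
}

# The 'LangText' mapping from proposal V1.
my $langtext_mapping = [
    [ 'source:lcountrykey'  =>  'target:country' ],
    [ 'source:lkey'         =>  'target:key' ],
    [ 'source:lvalue'       =>  'target:text' ],
];

# Hypothetical source record.
my $row = apply_mapping(
    $langtext_mapping,
    { id => 42, lcountrykey => 'de', lkey => 'greeting', lvalue => 'Hallo' },
);

print join(', ', map { "$_=$row->{$_}" } sort keys %$row), "\n";
```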

proposal V2

  # create a new synchronization object
    my $sync = Data::Transfer::Sync->new( 'sync_version' => $sync_version, __parent => $self );

  # configure the synchronization-object
    $sync->configure(
      source => {
        storage => {
          handle => $mapiStorage,
          #isIdentAuthority => 1,
          #isChecksumAuthority => 1,
          #writeProtected => 1,
        },
      },
      target => {
        storage => {
          handle => $ldapStorage,
          #idAuthority => 1,
          #isChecksumAuthority => 1,
          #isWriteProtected => 0,
        },
      },
      verbose => 1,
    );

NOTE

  This module heavily relies on DBI and Tangram, but adds a lot of additional bugs and quirks. 
  Please look at their documentation and/or this code for additional information.

REQUIREMENTS

  For full functionality:
    DBI              from CPAN
    DBD::mysql       from CPAN
    Tangram 2.04     from CPAN         (hmmm, 2.04 won't do in some cases)
    Tangram 2.05     from http://...   (2.05 seems okay but there are also additional patches from our side)
    Class::Tangram   from CPAN
    DBD::CSV         from CPAN
    MySQL::Diff      from http://adamspiers.org/computing/mysqldiff/
    ... and all their dependencies

DESCRIPTION

Data::Storage

  Data::Storage is a module for accessing various kinds of structured data stored inside
  various "data containers".
  We tried to apply the Adapter pattern to implement a wrapper layer around well-known CPAN modules
  (e.g. DBI, Tangram, XML::Simple).
  References:
  - http://c2.com/cgi/wiki?AdapterPattern
  - http://home.earthlink.net/~huston2/dp/adapter.html
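
  A minimal sketch of the Adapter pattern in Perl follows. All package names
  (Sketch::Storage, Sketch::Handler::Hash, Sketch::Handler::Lines) and the
  store/fetch methods are hypothetical and chosen for this example; they are not
  the actual Data::Storage API. Two back ends keep their data in different ways,
  while the calling code only ever talks to the common front end.

```perl
use strict;
use warnings;

# Handler 1: keeps values in a plain hash.
package Sketch::Handler::Hash;
sub new   { my ($class) = @_; return bless { data => {} }, $class }
sub store { my ($self, $key, $value) = @_; $self->{data}{$key} = $value; return 1 }
sub fetch { my ($self, $key) = @_; return $self->{data}{$key} }

# Handler 2: keeps values as "key,value" text lines (a toy stand-in for a
# CSV file). Keys must not contain a comma.
package Sketch::Handler::Lines;
sub new { my ($class) = @_; return bless { lines => [] }, $class }
sub store {
    my ($self, $key, $value) = @_;
    # Drop an existing line for this key, then append the new one.
    @{ $self->{lines} } = grep { index($_, "$key,") != 0 } @{ $self->{lines} };
    push @{ $self->{lines} }, "$key,$value";
    return 1;
}
sub fetch {
    my ($self, $key) = @_;
    for my $line (@{ $self->{lines} }) {
        my ($k, $v) = split /,/, $line, 2;
        return $v if $k eq $key;
    }
    return;
}

# Front end: picks a handler by type and delegates every call to it.
package Sketch::Storage;
my %handlers = (
    Hash  => 'Sketch::Handler::Hash',
    Lines => 'Sketch::Handler::Lines',
);
sub new {
    my ($class, $type) = @_;
    my $handler_class = $handlers{$type} or die "unknown storage type '$type'";
    return bless { handler => $handler_class->new }, $class;
}
sub store { my $self = shift; return $self->{handler}->store(@_) }
sub fetch { my $self = shift; return $self->{handler}->fetch(@_) }

package main;

# The calling code is identical for both back ends.
for my $type (qw( Hash Lines )) {
    my $storage = Sketch::Storage->new($type);
    $storage->store(EUR => 'Euro');
    print "$type: ", $storage->fetch('EUR'), "\n";
}
```

  Switching the back end means changing only the type passed to the constructor,
  which is the property the "Why?" section relies on.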

Why?

  You get a better code structure (which helps later maintenance) in growing Perl projects,
  especially when using multiple database connections at the same time.
  You can switch between different _kinds_ of implementations used for storing data.
  Your code uses the very same API to access these storage layers.
      ... for now, switching still requires changing the implementation in code
  Maybe in the future you will be able to switch "on-the-fly" without changing a single bit of code ...
      ... but that's not the focus

What else?

  Having this, we were able to implement a generic data synchronization module more easily;
  please look at Data::Transfer.

AUTHORS / COPYRIGHT

  The Data::Storage module is Copyright (c) 2002 Andreas Motl.
  All rights reserved.
  You may distribute it under the terms of either the GNU General Public
  License or the Artistic License, as specified in the Perl README file.

ACKNOWLEDGEMENTS

  Larry Wall for Perl, Tim Bunce for DBI, Jean-Louis Leroy for Tangram and Set::Object, 
  Sam Vilain for Class::Tangram, Jochen Wiedmann and Jeff Zucker for DBD::CSV & Co.,
  Adam Spiers for MySQL::Diff and all contributors.

SUPPORT / WARRANTY

  Data::Storage is free software. IT COMES WITHOUT WARRANTY OF ANY KIND.

TODO

  o interface with Jeff Zucker's AnyData:: modules, e.g. AnyData::Storage::RAM
  o what about DBD::RAM? (DBD::RAM - a DBI driver for files and data structures)
  o use DBD::Proxy!
  o what about DBIx::AnyDBD?
  o enhance schema information:
    - DBIx::SystemCatalog
    - DBIx::SystemCatalog::MSSQL?
    - Data::Reporter

BUGS

``DBI-Error [Tangram]: DBD::mysql::st execute failed: Unknown column 't1.requestdump' in 'field list'''

  ... occurs when operating on object attributes that have not been introduced into the database yet.
  This should be detected, and the message should be extended or replaced by:
  "Schema-Error detected, maybe (just) an inconsistency.
  Please check if your declaration in schema-module "a" matches structure in database "b" or try to run"
  db_setup.pl --dbkey=import --action=deploy

Compare schema (structure diff) with database ...

  ... when issuing "db_setup.pl --dbkey=import --action=deploy"
  on a database with an already deployed schema, add "--update"
  to lift the schema inside the database to the currently declared schema.
  You will have to approve removals and changes at field level, while
  new objects and new fields are introduced silently without any interaction.
  In future versions there may be additional options to control silent processing of
  removals and changes.
  The following CRUD table applies to the actions occurring on Classes and Class variables when deploying schemas.
  Don't mix this up with CRUD actions on Objects; these are already handled by (e.g.) Tangram itself.
  Classes:
    C create    ->  yes, handled automatically
    R retrieve  ->  no, not subject of this aspect since it is about deployment only
    U update    ->  yes, automatically for Class meta-attributes, yes/no for Class variables (see the rules below)
    D delete    ->  yes, just by user-interaction
  Class variables:
    C create    ->  yes, handled automatically
    R retrieve  ->  no, not subject of this aspect since it is about deployment only
    U update    ->  yes, just by user-interaction; maybe automatically if it can be determined that data wouldn't be lost
    D delete    ->  yes, just by user-interaction
  
  The point is to make it hard to lose data by accident while this is in pre-alpha stage.
  And losing data is definitely quite easy when schemas can be modified and redeployed this easily.
  
  As we can see, the creation of Classes and new Class variables is handled
  automatically, and this is believed to be the most common case under normal circumstances.
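
  The rules in the CRUD table can be sketched as a small planning function. This is an
  illustration only and not the logic of db_setup.pl; the function name plan_deployment
  and the schema layout ({ Class => { field => type } }) are assumptions made for this
  example. Creations are collected as automatic actions, while updates and deletions
  are collected for approval by the user.

```perl
use strict;
use warnings;

# Compare the declared schema with the deployed one and sort the resulting
# actions into "automatic" and "needs_approval", following the CRUD table.
sub plan_deployment {
    my ($declared, $deployed) = @_;
    my (@auto, @ask);

    for my $class (sort keys %$declared) {
        if (!exists $deployed->{$class}) {
            push @auto, "create class $class";       # C on Classes
            next;
        }
        my ($want, $have) = ($declared->{$class}, $deployed->{$class});
        for my $field (sort keys %$want) {
            if (!exists $have->{$field}) {
                push @auto, "create field $class.$field";   # C on Class variables
            }
            elsif ($want->{$field} ne $have->{$field}) {
                push @ask, "update field $class.$field";    # U on Class variables
            }
        }
        for my $field (sort keys %$have) {
            push @ask, "delete field $class.$field"         # D on Class variables
                unless exists $want->{$field};
        }
    }
    for my $class (sort keys %$deployed) {
        push @ask, "delete class $class"                    # D on Classes
            unless exists $declared->{$class};
    }
    return { automatic => \@auto, needs_approval => \@ask };
}

# Hypothetical schemas.
my $declared = {
    Country  => { iso => 'string', key => 'string', text => 'string' },
    Currency => { key => 'string', rate => 'real' },
};
my $deployed = {
    Country => { key => 'string', text => 'int', old => 'string' },
    Legacy  => { id => 'int' },
};

my $plan = plan_deployment($declared, $deployed);
print "automatic:      $_\n" for @{ $plan->{automatic} };
print "needs approval: $_\n" for @{ $plan->{needs_approval} };
```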

FEATURES

  - Get this stuff together with UML (Unified Modeling Language) and/or standards from ODMG.
  - Make it possible to load/save schemas in XMI (XML Metadata Interchange), 
    which seems to be most commonly used today, perhaps handle objects with OIFML.
    Integrate/bundle this with a web-/html-based UML modeling tool or 
    some other interesting stuff like the "Co-operative UML Editor" from Uni Darmstadt. (web-/java-based)
  - Enable Round Trip Engineering. Keep code and diagrams in sync. Don't annoy/bother the programmers.
  - Add support for some more handlers/locators to be able to 
     access the following standards/protocols/interfaces/programs/apis transparently:
    +  DBD::CSV (via Data::Storage::Handler::DBI)
   (-) Text::CSV, XML::CSV, XML::Excel
    -  MAPI
    -  LDAP
    -  DAV (look at PerlDAV: http://www.webdav.org/perldav/)
    -  Mbox (use formail for separating/splitting entries/nodes)
    -  Cyrus (cyrdeliver - what about cyrretrieve (export)???)
    -  use File::DiffTree, use File::Compare
    -  Hibernate
    -  "Win32::UserAccountDb"
    -  "*nix::UserAccountDb"
    -  .wab - files (Windows Address Book)
    -  .pst - files (Outlook Personal Storage Table)
    -  XML (e.g. via XML::Simple?)
  - Move to t3, look at InCASE
  - some kind of security layer for methods/objects
    - acls (stored via tangram/ldap?) for functions, methods and objects (entity- & data!?)
    - where are the hooks needed then?
      - is Data::Storage & Co. okay, or do we have to touch the innards of DBI and/or Tangram?
      - an attempt to start could be: 
         - 'sub getACLByObjectId($id, $context)'
         - 'sub getACLByMethodname($id, $context)'
         - 'sub getACLByName($id, $context)'
            ( would require a kinda registry to look up these very names pointing to arbitrary locations (code, data, ...) )
  - add more hooks and various levels
  - better integrate introduced 'getObjectByGuid'-mechanism from Data::Storage::Handler::Tangram

LINKS / REFERENCES

  Specs:
    UML 1.3 Spec: http://cgi.omg.org/cgi-bin/doc?ad/99-06-08.pdf
    XMI 1.1 Spec: http://cgi.omg.org/cgi-bin/doc?ad/99-10-02.pdf
    XMI 2.0 Spec: http://cgi.omg.org/docs/ad/01-06-12.pdf
    ODMG: http://odmg.org/
    OIFML: http://odmg.org/library/readingroom/oifml.pdf

  CASE Tools:
    Rational Rose (commercial): http://www.rational.com/products/rose/
    Together (commercial): http://www.oi.com/products/controlcenter/index.jsp
    InCASE - Tangram-based Universal Object Editor 
    Sybase PowerDesigner: http://www.sybase.com/powerdesigner
  
  UML Editors:
    Fujaba (free, university): http://www.fujaba.de/
    ArgoUML (free): http://argouml.tigris.org/
    Poseidon (commercial): http://www.gentleware.com/products/poseidonDE.php3
    Co-operative UML Editor (research): http://www.darmstadt.gmd.de/concert/activities/internal/umledit.html
    Metamill (commercial): http://www.metamill.com/
    Violet (university, research, education): http://www.horstmann.com/violet/
    PyUt (free): http://pyut.sourceforge.net/
    (Dia (free): http://www.lysator.liu.se/~alla/dia/)
    UMLet (free, university): http://www.swt.tuwien.ac.at/umlet/index.html
    Voodoo (free): http://voodoo.sourceforge.net/
    Umbrello UML Modeller: http://uml.sourceforge.net/

  UML Tools:
    http://www.objectsbydesign.com/tools/umltools_byPrice.html

  Further readings:
    http://www.google.com/search?q=web+based+uml+editor&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N
    http://www.fernuni-hagen.de/DVT/Aktuelles/01FHHeidelberg.pdf
    http://www.enhyper.com/src/documentation/
    http://cis.cs.tu-berlin.de/Dokumente/Diplomarbeiten/2001/skinner.pdf
    http://citeseer.nj.nec.com/vilain00diagrammatic.html
    http://archive.devx.com/uml/articles/Smith01/Smith01-3.asp
