Monday, April 19, 2010

PALO MOLAP SERVER and Excel Add-in

 
What is Palo MOLAP Server:
  • Palo is cell-related
  • Palo is an in-memory database
  • Palo is multi-dimensional
  • Support Hierarchies in Dimension
  • Support Attributes for Dimension member (e.g. to allow for localization, provide longer name … ) in addition to Element names. Note that these Attributes are stored in separate tables.
 
A local server is installed on your computer when you install the Palo client. You can log into the local servers of other computers in a network. In practice, this makes each local server public. You can create users and groups and also assign passwords to manage necessary access rights to your server. But you can't add another server on the same computer.
 
A Server can host many different Databases which contain one or more Data Cubes. Dimensions can be shared among different Cubes. A server can be installed and run privately (Localhost, by default: http://127.0.0.1:7777) or on a shared server for concurrent multi-users access.
 
Palo OLAP Server with Excel Add-In is open source designed to help working with large data quantities represented multidimensionally. Palo server can host multiple end-users on a network and thus eliminate the need to distribute all Excel tables with all well known problems associated to that.
 
Although its main focus is to work on top of Excel, it seemed to be supporting the OpenOffice Calc equivalent as well, although I did not try it.
 
 
My observations done on the Palo BI Server 3.0 & Excel Add-in:
 
Very robust and well designed with a full integration with Excel. So if you are a big Excel fan and don’t like the mess that this lead too in big organization this can help mitigate the often disastrous Excel over-used in today’s large enterprise.

Who has not suffered from Excel hell where merely all managers start their own mini Reporting app initiative within Excel…another illustration of the Peter Principle applied to software application ;-)!!!

So if you are very good with Excel, then you can basically create Cube right-off the bath without merely leaving your comfort zone. Even all Cube management and Administration is done through a dedicated Menu Palo added in Excel.

Ok, at one point you will need to import data from external source, but even then you can easily use Excel built-in function to load a cube from a simple CSV file and leverage Palo provided Excel function to generate your hierarchies levels, your calculated metrics, your ragged dimensions (for example you could easily create a time dimension from a simple date format by deriving the quarter, week, months and year data using normal Excel functions).
…It is quite cool you actually see these import in real-time with cells being refreshed in front of you (ok yes not so good for performance but you have the option to stop the refresh).
Concerning the dimension, I’ve encountered a few limitations concerning its dimensions structure and Elements. Besides the documented naming restrictions concerning Elements (see below), consolidated Elements cannot have the same names (case insensitive, in contradiction with user-doc) as one of its sub-elements (i.e. parent can’t have the same name as children). It seems to be applicable throughout all the dimension hierarchy, no duplicates can exist at any level! (I remember Essbase having similar constraints?)
Also, complex dimension structure having multiple independent hierarchies in parallel tends to degrade performance considerably.
 
Documented limitation and rules about Characters allowed in element-names:
Characters allowed in element name:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-.
Rules to follow for name of an element:
- The length cannot be 0.
- It cannot begin with a blank.
- It cannot end with a blank.
- Names in Palo are case sensitive!
 
 
The most commons functions you would use within Excel are:
 
--- PALO.DATA()
This is used to represent data of a cube inside Excel (work in both read/write mode ) based on the arguments listed. The arguments are used to identify the host, database and the cube. Obviously we must include the dimension/measures elements in order to address the cube cell precisely.
Other similar function exist to represent data (PALO.DATAC and .DATAV) but these differ only by how the data is fetched: each sheet cell individually (PALO.DATA being the slowest), all sheet cells having PALO.DATAC formulas will be calculated in one-pass, whereas the PALO.DATAV is even faster because it leverages Excel array formula (not always possible).
---PALO.EADD()
This is used to Add Elements within existing Dimension. We have also arguments to provide server/database as well one to provide the Dimension name. Other argument is to give the Type (‘n’ : base element, ‘c’: consolidated) specify the type of elements. The use of type "n” can always be used since it will anyway be converted to ‘c’ later on during import when sub elements are assigned to parent element.
We need obviously to provide the Element name, the Parent name (optional since top parent may not have any) and Weight (number used to describe the consolidation factor). A final important argument is Clear : this decides if the dimension should be cleared for all elements before importing, so when this parameter is set to TRUE all elements are deleted before importing and when it is set to FALSE it just updates old and inserts new elements.
So you should change this parameter in all your PALO.EADD functions from TRUE to FALSE to avoid clearing the dimensions.
Possible parameters values are 0 (or FALSE), 1 (or TRUE) and 2
0: Removes before importing no elements of the dimension.
1: Removes before importing all elements of the existing dimension.
2: Removes before importing all existing elements of the C-dimension, basic
elements are not deleted.

Attention with Problem of removing Dimension element with PALO.EADD:
I had this problem until I realised that PALO imports work in the entire workbook (not only the active worksheet)! So when you are importing using the PALO.SETDATA function in one worksheet any PALO.EADD function in other open workbook will also be run. But since new rows are only imported in the active sheet, only the currently displayed (last) row in inactive sheets are imported again using PALO.EADD.
---PALO.EADD()
And obviously you also have a function to load fact data into the Cube: PALO.SETDATA. This function works similarly as PALO.EADD (must specify server/database/cube), as well as all the coordinates elements (for which dimension member the value is loaded) and obviously the cell value (the value of the measure specify in the coordinate). It has also a Splash (a funny terminology ?!!) argument which indicates whether we can write to a consolidated element (and distribute data evenly among children afterwards..).
 
 
Important Design PALO features:
 
1.  Attributes
Attributes are used to provide additional element properties (e.g. other languages, elements description like Product description, code and other attributes, and also special number like Product Unit Prices, etc..).
A very Palo-specific way to manage these Attributes is done through a dedicated special Cube (Attribute-type) stored independently of the Cubes having the fact data. These independencies between Attributes and Elements data is flexible, but force you to switch back and forth between different Cubes during normal Reporting….
 
2.  Hierarchies
Palo support Hierarchy inside a Dimension, and these hierarchies can be Balance or Unbalanced and even Ragged,. You can even have multiple hierarchies within the same dimension (same elements can belong to different consolidations element, i.e. different Parent).
3.  Comments
You can add Comment elements (defined as normal elements in Dimension but important to set in String format and not Numeric format typically used for normal Elements).
4.  Calculated Elements
To create calculated Elements (typically Calculated measures), we have two options:
1) set up so-called server Enterprise rules in the database where computations are done within the database… these Enterprise rules may involve complex models and large data volumes impacting potentially the speed as a great deal of computing may be required to calculate the entire cube).
2) a simpler approach and sometimes good alternative is the use of Push Rules, i.e. operations, which are performed in Excel and the results of which are then written back into the cube, so no need for Server-processing. This is done through the use of PALO.SETDATA function to upload into the cube any elements cells and then leverage all Excel functions used to calculate the Elements.


5.  Consolidation Factors
With hierarchical structures you may factor elements differently among all children of a Parent. Typically you normally have an equal factor (=1), but you may need to weight the elements differently: negative weightings for Measure-type element for which you would derive a consolidated Parent (e.g., when consolidating expenditures and revenue to have a Profit-type consolidated member).


 
How-tos:
- Creating new cubes


The following sequences are normally necessary to Create a complete new Cube:
I. Create a new database
II. Create dimensions
III. Create a cube with the dimensions
IV. Importing elements into the dimensions
V. Import data cells
Important: Dimensions are created independently of Cube (they are stored at the Database-level), so that we can delete complete Cube without loosing any Dimension data. On the other hand, updating a Dimension structure/data will impact all associated Cubes.
- Importing External Data
Although it is quite simple to populate Dimension and Cube directly doing data entry Excel, it is certainly not practical with large volume of data. So other options exist.
Import options (through special Excel built-in functions) from different source:
- text files
- other Palo cubes
- database through ODBC driver
- from direct excel Worksheet
- some external ETL standard tools (including PALO ETL Server application)
Final option is through Database back-up: although Palo is memory-based, it obviously keeps a copy of each database into dedicated directory. Each Database-directory has three different files types (1-, 2- and 3- ) directly stored within the server file system.
It is convenient to do backup copies of database by simply copying the full directory content (after having stopped the PaloServerService)…that you can also use to transfer full Database (with all their cubes/dimensions) into a different host.
- Working with Snapshot
Also convenient is to work on a normal Excel Worksheet without being connected to the Palo server. This is possible on any worksheet tied to a PaloServer with the options Save as Snapshot… this will change all Palo-specific formulas in cells by normal data values… as any standard Excel sheet.
Similarly but not quite the same, you can also work on an offline mode (so the cells keep their Palo-specific formulas) to be able to see data without any connection to the Palo MOLAP Server (this requires to activate the Use client Cache in the Database settings).
 
Martin

No comments: