Friday, April 27, 2012

Generic Delta Mechanism


Generic delta - How does it work (and not).

Generic delta in custom extractors allows you to incorporate delta capability into your datasource. If this is set up for your datasource, it will let BW request and extract deltas. Now, if it is as easy as choosing this option in RSO2, why would I not enable all my custom datasources with delta (regardless of whether I use delta or not) and use it when needed? Let us see if it is as easy as it seems.

How does it work?

You define a generic extractor based on a table/view, an InfoSet query, or on top of a function module. The S-API, called when a BW extraction is scheduled, gets the data from the underlying extractor (e.g. the view) and applies the delta functionality to filter delta records before passing them to BW. For identifying delta, it needs an identifier - a field that is part of the extract structure. It also needs to know how to apply this delta filter on the specified field. It cannot know what is going on in the underlying data tables or what has changed there; it can only build selections based on some marker value up to which data was extracted, and the next selection needs to be greater than that.
Let us see how it works in each of the cases it allows you to choose:

Limitations of each of the delta identifier options

Numeric pointer

You specify a field (numeric type) from the extract structure as the delta identifier. Every time a delta extraction is called, the S-API filters out records which have a numeric pointer value greater than the maximum extracted value in the last run. Let us dig into what the system does here (true for other delta identifiers also), so we know where to debug when we don't get our delta:
When the first initialization call is made, a delta queue entry is created (RSA7), in addition to selecting all the data under the datasource and sending it to BW. The current value of the delta identifier field is written to the delta table ROOSGENDLM (in the current pointer field). The same is visible in RSA7 if we click the status icon for the datasource. It tells us the maximum value extracted to BW (for the delta field).
In the next delta extraction, this value is read to filter records with values greater than it (with safety interval adjustments, which are mentioned later). The pointer value is updated at the end of the run, and this continues with each delta extraction.
An additional field, 'repeat pointer', stores the last run's value (in the same table), so we can do a 'repeat delta' if our delta does not succeed.
This process is similar for other delta identifier types.
A numeric pointer field is not easy to find in most cases. Sure, we have fields like document number, but such a field will not capture a document that was extracted earlier and has since been changed (as the pointer value can only go up). Records are extracted only once in this case, and it is not easy to find examples of such delta extraction. An additional problem: a record out of sequence will also create trouble. Say records 1001 and 1002 were for one type of document, and 2002 and 2003 for another type, and these were extracted. The pointer value will now be 2003, and if documents 1004, 1005 and 2004 are created afterwards, the next delta will only extract 2004. This is quite a common scenario with tables that have multiple document number ranges.
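To make the mechanism concrete, here is a minimal ABAP sketch of the selection the S-API effectively builds for a numeric pointer delta; the table and field names (ZDOCUMENTS, DOCNO) are hypothetical, and the pointer value stands in for what the S-API reads from ROOSGENDLM.

* Sketch only: ZDOCUMENTS / DOCNO are assumed names for illustration.
DATA: lv_last_pointer TYPE n LENGTH 10 VALUE '0000002003', " pointer stored after the last run
      lt_docs         TYPE STANDARD TABLE OF zdocuments.

* Only records above the stored pointer are selected. Documents 1004 and
* 1005 from the example above lie below 2003 and are never picked up again.
SELECT * FROM zdocuments INTO TABLE lt_docs
  WHERE docno GT lv_last_pointer.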
Let us now look at the other two options

Calendar day

This seems like a good candidate, what with most data tables having fields like 'change date'. It is. However, before using it, we need to understand how BW/S-API uses it. Let us say AEDAT is the field in the view from which we are extracting the delta.
For one, the S-API filters data based on the system date and extracts data only up to today (if there is no safety interval specified). Another thing to note (for both timestamp and date) is that the delta pointer stores the current calendar day (or system timestamp), not the largest value in the extracted data. To clarify: let us say we run the extraction on 08/04 (MM/DD); records will be selected up to an AEDAT value of 08/04. Say we had records for 08/01 and 08/02, and these were extracted successfully. The pointer in RSA7 will not hold 08/02, but 08/04. If, for any reason, records are now added with a value of 08/03 (because they are updated from some other system and there is a one-day delay), these will not be extracted in the next run. The next run extracts data from 08/05 onwards, and you may wonder how you got data up to 08/02 and then for 08/05 while the 08/03 data was never selected. But that is logical, as it is the calendar day, not AEDAT, that the system essentially uses to store the pointer.
Also, if you post data for a future date expecting it to be selected (as it is higher than the current pointer value), that will not work either. In effect, the S-API applies the filter as 'AEDAT GT pointer_value AND AEDAT LE current_date'.
One limitation is that you can get data only up to a date, so it can bring delta only once a day. Another thing to consider is what happens if records are posted on the same date after the extraction has run: these records will not be selected. To avoid this, it makes sense to only extract data up to 'yesterday'. This is achieved by setting the upper safety interval limit to 1, which adjusts the selection to 'up to current date minus one'.
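Expressed as code, the effective CALDAY selection looks roughly like the sketch below; ZSALES_VIEW is a hypothetical extraction view, the pointer value stands in for the entry in ROOSGENDLM, and the upper safety interval of 1 excludes today.

* Sketch of the CALDAY delta filter with an upper safety interval of 1 day.
DATA: lv_pointer TYPE d VALUE '20060802',      " calendar-day pointer from the last run
      lv_upper   TYPE d,
      lt_delta   TYPE STANDARD TABLE OF zsales_view.

lv_upper = sy-datum - 1.                       " current date minus the upper safety interval

SELECT * FROM zsales_view INTO TABLE lt_delta
  WHERE aedat GT lv_pointer
    AND aedat LE lv_upper.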

Timestamp

Timestamp seems to be the most obvious preference, since you can get the latest delta to BW. However, in most cases it is not easy to find a timestamp field (you do have change_date and change_time fields, but most tables do not have a single field combining change date and time). You also need to be aware of the fact that SAP applies system timestamp values for selecting the data. So, similar to the example above, let us say you are extracting at 20060804000000: you get all data up to 20060804000000, and this value is stored in the pointer. However, suppose your source table is being updated with a timestamp value from another system (say via a file or an RFC call) that is not in sync with the SAP server time. Now, after this extraction, another record arrives in the underlying table with a timestamp value of 20060803230001 (because it is coming from Puerto Rico instead of Hawaii, say). This record is not going to be extracted in the next delta (or any delta afterwards), because its timestamp value is lower than the pointer from the last extraction.
The case where the timestamp field is updated with a value other than the source system's current timestamp is a different issue from the system taking a finite amount of time to update the record; the latter is handled by using the safety interval limits.
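A small sketch of this behaviour in code (ZEXTERNAL_FEED and CHANGE_TS are assumed names): the record stamped 20060803230001 by the external system lies below the stored pointer and is silently skipped by every subsequent delta.

* Sketch of the timestamp delta filter; no safety intervals applied here.
DATA: lv_pointer TYPE timestamp VALUE '20060804000000', " pointer stored after the last run
      lv_now     TYPE timestamp,
      lt_delta   TYPE STANDARD TABLE OF zexternal_feed.

GET TIME STAMP FIELD lv_now.                            " system timestamp of this extraction

SELECT * FROM zexternal_feed INTO TABLE lt_delta
  WHERE change_ts GT lv_pointer                         " 20060803230001 fails this condition
    AND change_ts LE lv_now.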

Safety intervals: How safe?

Safety interval limits are introduced to take into account the fact that records are being modified in the source system at the time of delta extraction. If an update is happening while the delta extraction runs, how do we make sure it is extracted? The safety interval adjusts the selection filter on the delta field to a 'reasonable' extent (specified by the user), thus making sure the datasource takes care of such a possibility.
Safety intervals are applicable for all delta types; however, they are mostly useful with 'timestamp' only (with CALDAY, it is normal to leave the lower limit blank and set the upper limit to '1', as explained earlier). An understanding of how safety intervals work will make this clear.
Upper limit (5), lower limit (1) [specified in safety intervals]
Case 1 - Numeric pointer (last extracted value 10011; data in the underlying table has a maximum value of 10025)
Data would be extracted from a minimum value of 10010 (the lower limit adjusts the last pointer value 10011 down by the specified value of 1) to a maximum value of 10020 (after adjusting by the upper safety limit of 5). The current pointer would now show 10020 in RSA7.
Case 2 - Timestamp (Last extracted value 20060804113000; current system time: 20060805080000)
The next extraction will select records from timestamp value 20060804112959 (after adjusting by 1 second as specified in the lower limit) to timestamp value 20060805075955 (after adjusting by 5 seconds as specified in the upper limit).
Similar would be the case for calendar days; however, since records are not likely to overlap across days, the limits are normally not used for this delta type.
For a better understanding of these limits, press F1 on these fields in RSO2. The help also explains why and when one may need to specify them.
Since the lower safety interval limit adjusts the lower selection value of the delta identifier field, it is possible that some records are brought in twice. One needs to be mindful of this fact (which is why the data should go to an 'overwrite' type of ODS first), and for this reason the lower limit is only available if the datasource is defined with 'new status for changed records' and not with 'additive delta'.
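The arithmetic of Case 1 can be summed up in a few lines (a sketch only; the variable names are illustrative):

DATA: lv_last_pointer TYPE i VALUE 10011,  " pointer stored after the last run
      lv_current_max  TYPE i VALUE 10025,  " highest value currently in the source table
      lv_lower        TYPE i VALUE 1,      " lower safety interval
      lv_upper        TYPE i VALUE 5,      " upper safety interval
      lv_sel_low      TYPE i,
      lv_sel_high     TYPE i.

lv_sel_low  = lv_last_pointer - lv_lower.  " 10010: some already-extracted records may come again
lv_sel_high = lv_current_max  - lv_upper.  " 10020: becomes the new pointer in RSA7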

When should I be using these safety interval limits?

If you are extracting on a timestamp, you can think of using them. In most cases the upper limit will suffice, provided you can correctly estimate what the value should be. If your system takes at most 2 minutes to post a corresponding record, specify 120 (for a timestamp) as the upper limit. If it is a numeric pointer type and a maximum of 20 transactions can be posted to the system in those 2 minutes, you should specify 20. In the case of a CALDAY type field, it is pretty much standard to set '1' as the upper safety limit and leave the lower limit blank.
You will not be extracting everything changed in the source system if you specify an upper limit (since the upper limit of the selection is being adjusted), so if you see records in the source which should come through in your delta but do not, check whether it is due to the safety interval settings.
If you still want to make sure you are getting everything (since there can be cases where the assumption above does not hold), you can set the lower limit. If you think that at most 100 new records are created while your extraction is taking place, you can specify 100 (for a numeric pointer) as the lower limit. Remember that in this case there is a possibility that you extract some records that have already been extracted in the last run.

What is this option of additive delta/new records?

This sets the delta property of the datasource. Depending on its value, the datasource may be incompatible with certain data targets. For example, if it is additive delta (ROOSOURCE-DELTA = 'ADD'), it may not be fit for updating into an ODS with overwrite. For the 'new image' option, the field ROOSOURCE-DELTA is set to 'AIE'.
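If you want to check which delta property your datasource ended up with, a quick lookup on ROOSOURCE will show it (a sketch; the datasource name is hypothetical and the active version 'A' is assumed):

DATA lv_delta TYPE roosource-delta.

SELECT SINGLE delta FROM roosource INTO lv_delta
  WHERE oltpsource = 'ZMY_DATASOURCE'
    AND objvers    = 'A'.
* 'ADD' = additive delta, 'AIE' = new status for changed records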

What do I get my function module to work with in case of delta (in addition to template FM)?

This is not in the scope of this document. However, one point is worth mentioning briefly. You get the selections coming from the InfoPackage (or RSA3) in the internal table I_T_SELECT when the first call to the FM is made. What is not obvious is that you also get the selection for the delta pointer field in this same table, added by the S-API.
Let us say you specified AEDAT as the delta identifier field. Your InfoPackage is sending a request to select data based on the CO_CODE and CURR_TYPE fields. You will get corresponding entries in the I_T_SELECT table with the selection values for CO_CODE and CURR_TYPE, which you can read and use to select data accordingly in your FM. In addition, I_T_SELECT will also have selection values for AEDAT. These are appended by the S-API, based on the 'current pointer' entry that you see in RSA7 and the current date (in the case of a CALDAY type delta).
The case would be similar for any other type of delta identifier field.
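As an illustration, here is a minimal sketch of how a function module extractor built from the RSAX_BIW_GET_DATA_SIMPLE template could pick up the AEDAT selection that the S-API appends; everything apart from I_T_SELECT and the SRSC type is illustrative.

DATA: l_s_select TYPE srsc_s_select,
      lt_r_aedat TYPE RANGE OF sy-datum,
      ls_r_aedat LIKE LINE OF lt_r_aedat.

* Besides CO_CODE and CURR_TYPE, I_T_SELECT also carries the AEDAT selection
* that the S-API derived from the current pointer (and the current date).
LOOP AT i_t_select INTO l_s_select WHERE fieldnm = 'AEDAT'.
  MOVE-CORRESPONDING l_s_select TO ls_r_aedat.
  APPEND ls_r_aedat TO lt_r_aedat.
ENDLOOP.

* lt_r_aedat can now be used in the data selection: ... WHERE aedat IN lt_r_aedat.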

What of deleted records? Well, what of it?

You do not want everything to work perfectly, do you? You would be jobless otherwise.
Before you get happy with your testing and declare it working as it should, consider the case where data is deleted from the source tables. If it is not physically deleted, just marked as deleted, you can include this marker field in your extractor and build your update accordingly. However, what if the data in the source table is deleted physically? You do not have the record any more, so there is no chance of getting it as a delta.
There are no easy solutions for this. One option is to check whether the deletion is captured in the change document tables. If it is, you can build the delta record from them (at least the key), which will further need to be handled in the update rules so as to properly cancel out the existing record. This, of course, means building an FM-based extractor.
Another option is to enhance the application logic (at the point where the transaction deletes the record) to capture the deletion information in a Z table and read/refresh it while extracting, as sketched below.
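A rough sketch of that enhancement idea, assuming a hypothetical custom table ZDEL_LOG with the document key, a deletion timestamp and a record mode field that the update rules can use to reverse the record:

* Called from the application enhancement at the point of deletion (sketch only;
* ZDEL_LOG and its fields are assumed names).
DATA: ls_del_log       TYPE zdel_log,
      lv_deleted_docno TYPE belnr_d VALUE '0000001004'. " key of the record being deleted

ls_del_log-docno      = lv_deleted_docno.
GET TIME STAMP FIELD ls_del_log-del_ts.                 " when the deletion happened
ls_del_log-recordmode = 'R'.                            " reverse image for the update rules
INSERT zdel_log FROM ls_del_log.

The delta extractor then reads ZDEL_LOG in addition to the main table and clears (or flags) the processed entries.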

Thursday, April 26, 2012

Write-Optimized DSO


The concept of the Write-Optimized DSO was introduced in BI 7.0. Unlike a Standard DSO, a Write-Optimized DSO has only one relational table, the active table, and no SIDs are generated for it; hence loading data from a DataSource into a Write-Optimized DSO takes less time and occupies less disk space.
Business Case:
A data store is needed for holding data at a detailed level with immediate reporting or further update capability; no overwrite functionality is required.

Limitation of Standard DSO

- A Standard DSO allows information to be stored at a detailed level; however, the activation process is mandatory.
- Reporting or further update is not possible until activation is completed.
  
Write Optimized DSO - Properties

- Primarily designed for initial staging of source system data.
- Business rules are only applied when the data is updated to additional InfoProviders.
- Data is stored in its most granular form.
- Can be used for faster uploads.
- Records with the same key are not aggregated but inserted as new records, as every record gets a new technical key.
- Data is available in the active version immediately for further processing.
- There is no change log table or activation queue.
- Data is saved quickly.
- Data is stored at request level, as in the PSA table.
- Every record has a new technical key; only inserts take place.
- It is partitioned on request ID (automatically).
- It allows parallel loads, which saves time during data loading.
- It can be included in a process chain, and no activation step is needed for it.
- It supports archiving.

Write-Optimized DSO - Semantic Keys

The semantic key identifies erroneous or duplicate incoming records.
Semantic keys protect data quality: all subsequent records with the same key are written into the error stack along with the incorrect data records.
To process the erroneous or duplicate records, a semantic group is defined in the DTP.

Note: if we are sure there are no incoming duplicate or erroneous records, semantic groups need not be defined.

Write-Optimized DSO - Data Flow
1. Construct the data flow model.
2. Create the DataSource.
3. Create the transformation.
4. Create the InfoPackage.
5. Create the DTP.

Write-Optimized-Settings

If we do not select the checkbox 'Do Not Check Uniqueness of Data', the data coming from the source is checked for duplicates, i.e. if a record with the same semantic key already exists in the DSO, the current load is terminated.
If we select the checkbox, duplicate records are loaded as new records. Semantic keys have no relevance in this case.

When is a Write-Optimized DSO Recommended?

- For faster data loads, DSOs can be configured as write-optimized.
- When the access window to the source system is short.
- It can be used as a first staging layer.
- In cases where delta is not enabled in the DataSource, we first load data into a Write-Optimized DSO and then a delta load can be done to a Standard DSO.
- When we need to load a large volume of data into InfoProviders, a Write-Optimized DSO helps in executing complex transformations.
- A Write-Optimized DSO can be used to fetch history at request level, instead of going to the PSA archive.

Write-Optimized DSO - Functionality

- It contains only one table, the active data table (DSO key: request ID, data packet number, and record number).
- It does not have a change log table or an activation queue.
- Every record in a Write-Optimized DSO has a new technical key, and delta works record-wise.
- In a Write-Optimized DSO, data is stored at request level, as in the PSA table.
- In a Write-Optimized DSO, no SIDs are generated.
- Reporting on a Write-Optimized DSO is possible, but it is not a good practice as it will affect the performance of the DSO.
- In a Write-Optimized DSO, BEx reporting is switched off.
- A Write-Optimized DSO can be included in an InfoSet or MultiProvider.
- With a Write-Optimized DSO, performance is better during data loads as there is no activation step involved; the system generates a unique technical key instead.
- The technical key in a Write-Optimized DSO consists of the Request GUID field (0REQUEST), the Data Package field (0DATAPAKID) and the Data Record Number field (0RECORD).

Write-Optimized DSO - Points to Remember

- Generally, a Write-Optimized DSO is not preferred for reporting, but if we want to use it for reporting, it is recommended to define a semantic key in order to ensure the uniqueness of the data.
- Write-optimized DSOs can force a check of the semantic key for uniqueness when data is stored.
- If this option is active and duplicate records are loaded with regard to the semantic key, these are logged in the error stack of the Data Transfer Process (DTP) for further evaluation.
- If we need to use the error stack in our flow, we need to define the semantic key at the DSO level.
- A semantic group definition is necessary for parallel loads.

Write-Optimized DSO - Reporting

If we want to use a write-optimized DataStore object in BEx queries (not preferred), it is recommended to:
1. Have a semantic key, and
2. Ensure that the data is unique.
The technical key is not visible for reporting, so the object looks like any regular DSO.

Write-Optimized DSO - Useful OSS Notes

- OSS Note 1077308: In a write-optimized DSO, 0FISCVARNT is treated as a key, even though it is only a semantic key.
- OSS Note 1007769: Parallel updating in write-optimized DSO.
- OSS Note 966002: Integration of write-optimized DSO in DTP error stack.
- OSS Note 1054065: Archiving support.

Steps to Run a Stuck Process in a Process Chain


Scenario(s):- 


1) A process chain is stuck at a particular step and we want to restart it.
2) Data loading is successful, but the next step of the chain is not executed.
Reason: The event that should be registered in the back-end database is not set properly or goes into a hold state; as a result, the status has to be changed manually.

 Solution:-

Step 1: At the step where the process chain is stuck,
go to Display Messages >>> Chain tab >>> copy the variant, instance & start date.
Step 2: Go to SE16 >>> table RSPCPROCESSLOG.
Enter the details below:
Variant >>> VARIANTE
Instance >>> INSTANCE
Batch date >>> start date
This will return a single entry from the table RSPCPROCESSLOG.
Step 3: Execute the function module RSPC_PROCESS_FINISH.
Enter the details below:
RSPCPROCESSLOG-INSTANCE >>> I_INSTANCE
RSPCPROCESSLOG-VARIANTE >>> I_VARIANT
RSPCPROCESSLOG-LOG_ID >>> I_LOGID
RSPCPROCESSLOG-TYPE >>> I_TYPE
Enter 'G' for the parameter I_STATE and press Execute (F8).

After executing the above FM with the stated parameters, the stuck process will be set to green, the following process in the chain will be started, and the chain can run to the end.
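For reference, the same steps expressed as a small ABAP sketch (the selection values for VARIANTE and INSTANCE are placeholders for the values copied from the log display; verify the RSPCPROCESSLOG field names in SE16 on your system):

DATA ls_log TYPE rspcprocesslog.

* Step 2: find the log entry of the stuck step (placeholder selection values).
SELECT SINGLE * FROM rspcprocesslog INTO ls_log
  WHERE variante = 'ZVARIANT_FROM_LOG'
    AND instance = 'ZINSTANCE_FROM_LOG'.

* Step 3: mark the process as finished successfully so the chain continues.
CALL FUNCTION 'RSPC_PROCESS_FINISH'
  EXPORTING
    i_logid    = ls_log-log_id
    i_type     = ls_log-type
    i_variant  = ls_log-variante
    i_instance = ls_log-instance
    i_state    = 'G'.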


The function module I was referring to, which progresses those 'failed' processes in the process chain that are not repeatable, is RSPC_PROCESS_FINISH.

In SE37, when you type in the above function module name and execute it, you should see the screen below.


As an example, you know that the ZNOTIFICATN_ATTR InfoPackage failed yesterday due to an ALEREMOTE lock. Right-click on this failed InfoPackage and choose 'Display Messages'. Then go to the 'Chain' tab to find the 'Variant' & 'Instance' of the process that failed.


After getting the 'Variant' & 'Instance', we go to the table RSPCPROCESSLOG, enter the 'Variant' & 'Instance' identified above in the corresponding fields, and execute.


From this table, we can get the values of the LOG_ID & TYPE fields that are needed for the function module RSPC_PROCESS_FINISH.


As you can see, the selected Instance [which is usually the request number generated by the InfoPackage in the data target] has a STATE of 'R' [ended with errors].

We now enter values for the fields LOGID, TYPE, VARIANT, INSTANCE in the function module RSPC_PROCESS_FINISH and select the value 'G' [successfully completed] for the field STATE, as shown below.


Clicking on Execute will send a message to the system that the entered 'Instance' of the 'Variant' [i.e. the InfoPackage] has completed successfully. The failed InfoPackage process that was 'red' will now turn 'green', and the subsequent processes will be automatically kicked off.

Note: We must make sure that the manual action needed to correct the error [in this case, running the InfoPackage for ZNOTIFICATN_ATTR manually] has completed successfully before we run RSPC_PROCESS_FINISH to progress to the subsequent steps in the process chain.



On a separate note, processes like ODS Activation, Attribute Change Run and Index Deletion can be 'repeated' when they fail for some reason in a process chain. But as of now, we do not have the option to 'repeat' an InfoPackage that has failed. I believe we have to turn on a setting in RSPC to make InfoPackages 'repeatable'.

On entering RSPC, click on any of the process chains and then select the 'Process Type' button.

 

In the following screen, choose Settings → Maintain Process Types.


Choose the Process Type ‘Loading’ and click on ‘Details’.


As you can see in the below screen, the ‘Repeatable’ option has not been selected for the LOAD process.


This is the reason why a ‘Repeat’ option doesn’t show up when an InfoPackage fails in a Process Chain.

I have also shown below the settings for the DSO Activation & Attribute Change Run process where the ‘Repeatable’ option has been selected.



I believe making the InfoPackage process 'repeatable' would help save us some time and effort in case of a failure at the InfoPackage step, as we would just need to 'repeat' the failed InfoPackage rather than running it manually, and the subsequent steps would be automatically triggered.