The computers have the bad habit of doing exactly what you tell them to do, not what you want them to do. So, when your computer is doing something else than what you expect, means that it’s something wrong with what you told him to do. Or not?
What you expect is based on your previous experiences. Having an intuitive software is a very important aspect of any producer. But, sometimes, better functionality supersede intuitive behavior.
So, QlikView has some “against intuition” behaviors:
1) Tables Auto-link
How do you define table links in QlikView? Answer: you don’t! Tables will automatically link when they have fields with the same name.
Why?
Because usually “CustomerID” have the same meaning anywhere. And it’s easier to remove two or three links than to create all of them.
How to prevent?
Rename fields using aliases when loading “LOAD a AS b FROM c” or use the dedicated command “RENAME FIELDS a TO b”
2) Auto concatenate
Where did my new table disappeared? I clearly gave it a new name, loaded the data, I can see the rows loading when running the script, yet it’s gone when the script is finished. Well, the answer is: you already had a table with the exact same structure and the new rows were concatenated to old table than just create a new table.
Why?
Because two or more tables with the same structure means they have similar information. And if it didn’t auto concatenate, you would end up with a huge synthetic key (remember autolink).
How to prevent?
Use the “NOCONCATENATE” keyword before your “LOAD” command or change the structure (add a dummy field or rename one)
3) Preceding LOAD
Let’s say you have this script where I want to strip down the time information:
TABLE1: SELECT MyDateTime from MyTable; FINAL_TABLE: LOAD Date(M
yDateTime) as MyDate
RESIDENT TABLE1;
DROP TABLE TABLE1;
You can use a preceding LOAD like this:
FINAL_TABLE:
LOAD Date(M
yDateTime) as MyDate;
SELECT MyDateTime from MyTable;
Preceding LOAD is an elegant way to perform several transformations in one step. It might be confusing the fact that the SELECT is executed first, but it becomes clear if you think that the SELECT is a sub-query. Also, you can nest several “preceding LOAD” to perform linked transformations.
4) EXIST() function
Let’s say I want to do incremental load of invoices. I have an old table containing all the invoices until yesterday and a new table with new or modified invoices. So, I load the new table and then, using “WHERE NOT EXISTS(InvoiceID)” I load the old table to avoid data duplication, just like I would do
"SELECT * FROM OLD_INVOICES WHERE InvoiceID NOT IN (SELECT InvoiceID FROM NEW_INVOICES)"
in SQL, right? Wrong!
You will get only the first row of each invoice, because for the other rows, EXISTS() will return TRUE. This behavior is better illustrated using this script:
TABLE1:
LOAD * INLINE [
F1, F2
1, a
2, b
3, c
];
LOAD * INLINE [
F1, F2
4, d
4, e
]
WHERE NOT Exists(F1)
;
The last line from second table (4, e) will not be loaded despite the fact that 4 does not exists in TABLE1.
Why?
Because EXIST() function reevaluates existence of value for each row.
How to prevent?
Use a copy of the target field as second parameter for the EXISTS() function
TABLE1:
LOAD *, F1 as F3 INLINE [
F1, F2
1, a
2, b
3, c
];
concatenate
LOAD * INLINE [
F1, F2
4, d
4, e
]
WHERE NOT Exists(F3, F1);
DROP FIELD F3;
5) PEEK() function
Besides the fact that it is a very handy function which can reference previous loaded rows, it has a very interesting property: it can reference output fields, aka fields that do not exist in source table.
Here is an example of creating total stock:
LOAD
Product,
Date,
Qty,
Price,
if(Product=peek('Product'), peek('TotalStock')+Qty, Qty) as TotalStock
FROM Transactions
ORDER BY Product, Date;
Note how we create the TotalStock field: we have a reference to itself, even if the field doesn’t exists in “Transactions” table.
Why?
Because it’s a function that should existed in SQL in the first place years ago 🙂
6) DISTINCT keyword
We all know it from SQL syntax, right? It removes the duplicate rows from the source table.
What you don’t know is that in QlikView it has a strange behavior, as it “infects” the tables he comes in contact with. This was covered in detail by Oleg Troyansky and Barry Harmsen.
Lets say we have a fact table (TABLE1):
TABLE1: LOAD * INLINE [ ID, Qty a, 1 a, 1 ];
This scenario, having valid duplicate rows, is very common in certain industries. Multiple deliveries of the same product, same quantity, to the same customer, in the same day happen often in bread making business, for example.
We want to join a dimension table to this fact table:
LEFT JOIN (TABLE1) LOAD DISTINCT * INLINE [ ID, Name a, Name_a ];
You might be expecting that the DISTINCT keyword should act on the rows of the dimension table. When running the script, we observe that TABLE1 have now only 1 row, which is incorrect.
But wait, there’s more. Now we want to add historical transactions:
CONCATENATE (TABLE1) LOAD * INLINE [ ID, Amount, Name b, 1, Name_b b, 1, Name_b ];
Run the script and observe that also the new transactions concatenated to TABLE1 have DISTINCT applied.
Why?
Because DISTINCT is applied not to the source table but to the resulting table and somehow it sticks as a table attribute.
How to prevent?
Use a preceding load statement like this:
LEFT JOIN (TABLE1) LOAD *; LOAD DISTINCT * INLINE [ ID, Name a, Name_a ];
Happy Qliking!
Leave a Reply