Archive

Author Archive

DB2 for i › Living Large

March 12th, 2013 Comments off

There is a difference - a BIG difference - between “doing database” and doing “very large database”.

Tom McKinley, my longtime partner in this business of database regularly states, “bugs don’t scale” - meaning: what works just fine at a small size doesn’t necessarily work at a large size, and more than likely will NEVER work at a very large size.


It’s vitally important to first recognize and acknowledge that you are living in the realm of very large database (which is commonly abbreviated as “VLDB” by the way). Once recognized and acknowledged, you can begin to understand. That is, understand the critical success factors and best practices for maintaining and exploiting an IBM i VLDB environment.


Some History


I’ve been working in and around very large database since 1997, having helped to establish and lead the AS/400 Teraplex Center in Rochester Minnesota.  During the five year reign of our terrorTeraplex, we had the privilege to privately own and use the largest AS/400 systems built to date. Embarrassingly huge things. We not only had one, but we had several maxed out configurations. What can you do with a very large system at your disposal? Well, our mission was to research, quantify and explain very large database capabilities in the OS/400 environment. The driving force at the time was primarily data warehousing and business intelligence, as well as the emerging technology of parallelism. We were the first to demonstrate the positive benefits of exploiting 20GB of memory, a new system maximum at the time. I know, I know, 20GB doesn’t sound like much today but remember, WinTel database servers were stuck at 4GB and hitting that limit required buying another server for the farm (pause for effect and reflection). Of course today, an IBM Power system can have 16TB of memory. Wow, have times changed.

Back in the day, we were the first to build and use a 2.15 billion row table, then a 4.3 billion row table. And the first to build and use a 512GB table, then a 1TB table and then a 1.7TB table. We were the first to build large indexes using DB2 Symmetrical Multiprocessing feature to take advantage of all 8, then 12, then 24 processors. Ahhh, the days when a single job could light up all the front panel lights while pushing 24 CPUs to the max in one big whoosh of power! I really miss the lights…

Lo and behold, through the infinite power of the internet, I managed to dig up an article from 1998 that announced the results from my team’s creation and use of the largest single DB2/400 table to date – an unimaginable 2.5 terabytes comprised of 10 billion rows in a single table. For a blast from the past, you can find and read the article here

As you can see, the database capabilities of IBM i running on past and present Power systems are historic and vast – and unfortunately, little known (but you’re now coming up to speed, and spreading the word, right?).


What is VLDB, and Do I Have It?


The definition of very large database varies greatly, with new high water marks and attributes being established almost daily. This is especially true given the explosive expansion of data from all sorts of sources and events (see Big Data). But what about in the IBM i environment, what do we consider to be very large? Or better yet, when should you start to think and behave differently due to the size of a table or index?
 

Generally speaking, we start to seriously consider VLDB techniques and practices when a single table (or physical file) approaches 500GB in total size and/or the number of rows approaches or exceeds 1 billion (that’s 1 000 000 000 for my UK friends).

Another major consideration is the rate of data growth and the trend line illustrating when the table or physical file size limit will be reached. An event to watch out for is acquisition; when your company decides to branch out and/or acquire a large set of clients, customers or market share. If the current state of the database is unknown or unaccounted for, the acquisition and corresponding mass addition of data can push you into VLDB territory, or throw you up against a limit in one fell swoop.

Limit! What Limit?


Yes, Virginia, there is a limit. Even though DB2 for i enjoys some of the highest limits of any relational database system available, there are indeed maximums for a single data space (i.e. the container that physically holds the data). The two limits we refer to when talking about very large database are: size and number of rows.

Drum roll please... AND THE LIMITS ARE:

     1.7 terabytes or 4.3 billion (4 294 967 288) rows for a table / physical file

     1 terabyte for an index / keyed logical file

By the way, all of the DB2 for i limits can be found in the SQL Reference - Appendix A. 

Hey… wait a minute. If the limit is 1.7TB and/or 4.3 billion rows, how did you create a table that was 2.5TB in size containing 10 billion rows?

Simple, my team not only knows the limits, but more importantly, knows how and when to overcome them.

And so should you!


What Not To Do


In the past 2 months, the DB2 for i Center of Excellence has been involved in no less than four critical situations where IBM i users unknowingly hit the table limit. Sorry, let me rephrase that. They knew they hit something, ‘cause their application stopped working. But they didn’t know what they hit.  And frankly, THAT is the issue. These folks had no idea that a) there is a limit, b) they were really close to it, and c) when the limit is reached, the business stops. Have a nice day. 


What To Do


If you think your tables are getting big, or you are just curious, ask the system to show the metadata (i.e. data about data).  Try this:

Get the latest version of IBM i Navigator (aka IBM i Access - System i Navigator) – this would be the IBM i 7.1 Windows client. Install it, along with the latest service pack. Connect to your DB2 for i system and invoke Health Center.



The DB2 for i Health Center can show you a lot of things, but we are interested in the Size Limits for objects in a particular schema.



If you want to change the schema of interest, hit the button and do so:



When selecting the Size Limits tab, we get the opportunity to view the table and index current size compared to the maximum size or limit:




Take a look at both the table size and number of rows. If either of these are approaching the limit, you'll see a warning indicating a critical state. Please do be aware that if the schema has a lot of tables in it, this operation might take a few minutes.




If you are interested in where the index size is at, select this object type:



If you want a programmatic approach to reviewing your table and index sizes as compared to the limits, there is a stored procedure called QSYS2.Health_Size_Limits () and the description can be found here.


Do More


Strategically speaking, here is my list of IBM i Very Large Database critical success factors (aka “do or die”): 

2: get educated - obtain skills and expertise in all things DB2 for i
3: assess the current state of data, and identify the data growth and usage trends
4: build and implement a plan to handle large tables and indexes if approaching limits
5: build and implement a plan to modernize and re-engineer database objects and data access 

_____


The IBM DB2 for i Center of Excellence is the foremost (and dare I say only) team in the world fully qualified to assist you with very large database in the IBM i environment. As a matter of fact, the table limits and VLDB capabilities are what they are today because of the past research and leadership of the AS/400 Teraplex Center and the expertise of the men and women in the DB2 for i Development Laboratory.

If you need help with understanding the current state of your growing databases, or with meeting requirements for large data and scalability, contact me. We have a DB2 for i Very Large Database Consulting Workshop available that can identify solutions for overcoming limits to growth and ensure you are getting the most out of IBM i. We can also do remote database and application assessments to help identify any looming issues sooner than later – that is to say, before your business hits a wall. 

At the end of the day, rest assured that DB2 for i can handle your data whether you count rows in the millions or in the billions, assuming you embark on proper planning and implementation of the very large database practices and procedures for IBM i.





Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › Visit the Home of Coke

February 26th, 2013 Comments off
No, not Colombia.

Georgia.

The Spring 2013 RPG & DB2 Summit conference will be held the week of March 18 in Atlanta Georgia. Atlanta is the home of the world famous beverage Coca-cola (which by the way, did indeed have cocaine as an ingredient in the original formula).

Besides getting a refreshing beverage, you can get some refreshing updates on IBM i and DB2 technology.

Kent Milligan and Doug Mack from the DB2 for i Center of Excellence will be there presenting several topics on extracting usable information from data, faster and more efficiently.

I have also been asked to reprise my role as rabble-rouser in a session entitled: "i Advocate" Tutorial & Discussion.  If you want to know how to fight advocate for your IBM i system, and articulate the value YOU provide when coding for it, this session is for you. If you want to see your beloved platform continued to be marginalized and your skills mothballed, by all means stay home.

Obviously I believe that attending the RPG & DB2 Summit is a very effective way in which to increase your knowledge and skills, all while growing your network of really good people. Don't believe me? Check out all of the sessions and presenters here.

Hope to see you in Atlanta next month - with a curvy bottle in your hand!

Oh, and while in Atlanta, take time to walk up the world's largest exposed granite monolith - known as Stone Mountain. This is a great a way to acknowledge Spring's arrival in the northern hemisphere.


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › Visit the Home of Coke

February 26th, 2013 Comments off
No, not Colombia.

Georgia.

The Spring 2013 RPG & DB2 Summit conference will be held the week of March 18 in Atlanta Georgia. Atlanta is the home of the world famous beverage Coca-cola (which by the way, did indeed have cocaine as an ingredient in the original formula).

Besides getting a refreshing beverage, you can get some refreshing updates on IBM i and DB2 technology.

Kent Milligan and Doug Mack from the DB2 for i Center of Excellence will be there presenting several topics on extracting usable information from data, faster and more efficiently.

I have also been asked to reprise my role as rabble-rouser in a session entitled: "i Advocate" Tutorial & Discussion.  If you want to know how to fight advocate for your IBM i system, and articulate the value YOU provide when coding for it, this session is for you. If you want to see your beloved platform continued to be marginalized and your skills mothballed, by all means stay home.

Obviously I believe that attending the RPG & DB2 Summit is a very effective way in which to increase your knowledge and skills, all while growing your network of really good people. Don't believe me? Check out all of the sessions and presenters here.

Hope to see you in Atlanta next month - with a curvy bottle in your hand!

Oh, and while in Atlanta, take time to walk up the world's largest exposed granite monolith - known as Stone Mountain. This is a great a way to acknowledge Spring's arrival in the northern hemisphere.


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › Integrating XML – Past, Present and Future

February 15th, 2013 Comments off

In my entry on IBM i 7.1 TR6, I called out querying the web with new HTTP functions and using XML to do more with DB2 for i. It just so happens that the very cool example on developerWorks came from Nick Lawrence.

Nick is one of our hotshot software engineers working on DB2 for i in Rochester Minnesota.  He has been involved with putting good things in your tool bag since 1999.  His most recent set of goodies include full text search (think OmniFind), SQL/XML and XMLTABLE.

I asked Nick if he can share some additional insight and give us some guidance on XML and DB2 integration - past, present and future.

Ladies and gentlemen, Mr. Nick Lawrence...

____


Integrating SQL and XML


How do you process XML data?  In particular, how do you publish your relational data in an XML format?  If you’re like most DB2 for i developers, you’ve found working with XML both a necessity and a challenge.  The conversion from rows and columns into an XML document can be a daunting task, but one that cannot be avoided.  While rows and columns are optimal for the transactional and data warehouse workloads that characteristically run on IBM i, XML is the predominate standard for information exchange between web-based applications.  The desire to have web-accessible applications makes publishing relational data as XML data a business imperative.  Development communities can tackle the issue one of two ways.

    1.    Purchase or develop non-integrated and non-standardized tools
    2.    Utilize the integrated SQL/XML publishing functions available in DB2 for i 7.1

For customers not yet running on IBM i 7.1, the database has no native understanding of XML.  In other words, those folks are forced to take option one. 

For some customers, option one means purchasing the DB2 XML Extender portion of the IBM DB2 Extenders product (5761DE1 and 5770DE1).  The DB2 XML Extender option defines a set of user-defined types to represent XML, as well as many user-defined functions and stored procedures for performing common XML tasks. 

For other customers, option one means writing an in-house solution to process XML.   Unfortunately, this set of customers typically ends up reinventing a subset of the capabilities that are already available in the XML Extender option.  Worse yet, these XML solutions are usually not reusable and will need to be reinvented (again) when new XML requirements emerge.

Customers that have moved up to IBM i 7.1 have a built-in and standardized solution available to them for working with XML data using SQL/XML.  It shouldn’t be a surprise that IBM recommends the DB2 for i SQL/XML solution over in-house solutions and the XML Extender option.
 

But why is it better to have an integrated and standardized solution?


To answer the question, let’s look at a specific scenario where relational data is to be published as XML.  A comparison can then be made between a solution that uses SQL/XML and a similar solution that uses the DB2 XML Extender.  Because DB2 XML Extender is a chargeable product, we’d expect it to have more impressive features and capabilities than an average in-house solution.

Figure 1 shows a simplistic and normalized SQL schema that contains customer orders.  Each order has exactly one customer associated with it, and one or more kinds of parts that have been ordered.

Figure 1 - Sample schema





Figure 2 shows a query that produces one row for each type of part in an order. 

Figure 2 - Inner Join

        SELECT           order_id,
                                order_time,
                                customer_name,
                                customer_address,
                                part_code,
                                quantity
        FROM              order
        INNER JOIN    customer
             USING (cust_id)
        INNER JOIN    order_part
             USING (order_id);


The output of the join from Figure 2 (using a set of sample data) is shown in Figure 3.

Figure 3 - Output of inner join




From this data, assume that we wish to produce an XML document that represents order number two, as shown in Figure 4.

Figure 4 - Order document for order number 2

<?xml version="1.0" encoding="UTF-8"?>
<order>
    <id>2</id>
    <time>2012-12-16T10:00:00.000000</time>
    <customer>
        <name>Wally</name>
        <address>Minneapolis</address>
    </customer>
    <part_list>
        <part_list>
            <part>
                <code>Soda</code>
                <quantity>200</quantity>
            </part>
            <part>
                <code>Lunch</code>
                <quantity>10</quantity>
            </part>
            <part>
                <code>XML Book 123</code>
                <quantity>1</quantity>
            </part>
        </part_list>
    </part_list>
</order>


With SQL/XML, this can be accomplished with the query shown in Figure 5.  In this query, the common table expression (order_part_list) is used to aggregate the repeating part elements of an order into a single XML value.  The rest of the XML document is then built around that value using a join.  The join is very similar to what was used in Figure 2, except the join is to order_part_list (the aggregation of order_part), instead of the order_part table.

Figure 5 -SQL/XML query

        WITH order_part_list AS (
         SELECT order_id,
                XMLGROUP(part_code   AS "code",
                         quantity    AS "quantity"
                         ORDER BY quantity DESC
                         OPTION ROW "part"
                                ROOT "part_list"
                         )
                      AS part_list
         FROM        order_part
         GROUP BY    order_id
        )

        SELECT XMLDOCUMENT( 
                XMLELEMENT(NAME "order",
                XMLFOREST(order_id                   AS "id",
                           order_time                 AS "time",
                           XMLFOREST(customer_name    AS "name",
                                     customer_address AS "address")    
                           AS "customer",
                           opl.part_list     AS "part_list"
                          )
                )
               ) AS order_doc
        FROM               order
        INNER JOIN    customer
                    USING (cust_id)
        INNER JOIN    order_part_list opl
                    USING (order_id)
        WHERE           order_id = 2;


The XML Extender approach will encounter many obstacles in performing the same task.  The only way to build an XML document similar to Figure 4 with the XML Extender is to use a document access definition (DAD) file that defines an SQL mapping.  A DAD file that uses an RDB mapping appears to be simpler at first glance, but the RDB mapping’s lack of support for predicates (other than the join condition) makes the RDB mapping approach impossible for this scenario (order_id = 2 is not part of a join condition).  The SQL mapping maps the columns of an SQL statement to elements and attributes in the XML document.  In addition, the condition is allowed to include additional predicates in the WHERE clause.  A DAD file that contains an SQL mapping is shown in Figure 6.

Figure 6 - DAD file with SQL mapping

<?xml version="1.0" encoding="UTF-8"?>
<DAD>
    <validation>NO</validation>
    <Xcollection>
        <SQL_stmt>
         SELECT        order_id,
                        order_time,
                        customer_name,
                        customer_address,
                        part_code,
                        quantity
         FROM        order
         INNER JOIN    customer
             USING (cust_id)
         INNER JOIN    order_part
             USING (order_id)
         WHERE        order_id = 2
         ORDER BY    order_id,
                order_time,
                customer_name,
                customer_address,
                part_code,
                quantity
        </SQL_stmt>
        <prolog>?xml version="1.0"?</prolog>
        <root_node>
            <element_node name="order">
                <element_node name="id">
                    <text_node>
                        <column name="order_id" />
                    </text_node>
                </element_node>

                <element_node name="time">
                    <text_node>
                        <column name="order_time" />
                    </text_node>
                </element_node>

                <element_node name="customer">
                    <element_node name="name">
                        <text_node>
                            <column name="customer_name" />
                        </text_node>
                    </element_node>

                    <element_node name="address">
                        <text_node>
                          <column name="customer_address"/>
                        </text_node>
                    </element_node>

                </element_node>
                <element_node name="part_list">
                    <element_node name="part"
                                  multi_occurrence="YES">
                        <element_node name="code">
                          <text_node>
                                <column name="part_code" />
                           </text_node>
                        </element_node>
                        <element_node name="quantity">
                          <text_node>
                                <column name="quantity" />
                          </text_node>
                        </element_node>
                    </element_node>
                </element_node>

            </element_node>

        </root_node>

    </Xcollection>
</DAD>


One thing to keep in mind is that, when using the SQL mapping support in XML Extender, the columns in the SELECT list and ORDER BY clause of the SQL_stmt element must appear in the exact order they are presented in Figure 6.  The XML Extender product depends on these orderings to correctly generate the XML document.  An SQL/XML solution follows the normal SQL rules and does not have these less apparent restrictions.

The DAD file is considerably longer and more complicated than the SQL query that was shown in Figure 5.  In addition, it hasn’t simplified anything.  The join between the tables still needs to be coded explicitly in the condition element.  Additionally, the multi_occurrence attribute is needed when defining the part element; this option constructs an aggregation that is similar to the aggregation used by the common table expression in Figure 5. 

The "simple" mapping offered by the XML Extender has very quickly turned into a complicated solution!

An XML document can be published by passing the DAD file shown in Figure 6  as a parameter to the dxxGenXMLClob stored procedure.  For comparison, the XML document that is generated from the DAD is shown in Figure 7.

Figure 7 - XML Document created by dxxGenXMLClob

<?xml version="1.0"?>
<order>
  <id>2</id>
  <time>2012-12-16-10.00.00.000000</time>
  <customer>
    <name>Wally</name>
    <address>Minneapolis</address>
  </customer>
  <part_list>
    <part>
      <code>Lunch</code>
      <quantity>10</quantity>
    </part>
    <part>
      <code>Soda</code>
      <quantity>200</quantity>
    </part>
    <part>
      <code>XML Book 123</code>
      <quantity>1</quantity>
    </part>
  </part_list>
</order>


There are a few important differences between the document in Figure 7 and the document that was shown in Figure 4.

The order of the part elements is not the same.  In the SQL/XML query shown in Figure 5, the ORDER BY clause is used within the XMLGROUP function to guarantee a particular order of part elements, however the DAD file solution is not capable of specifying this ordering.

In Figure 7, the content of the time element is not stored in the industry standard xs:dateTime data type.  When SQL/XML publishing functions are used, the conversion happens automatically (as can be seen in Figure 4).

There are some other limitations with the dxxGenXMLClob that are important to mention.

The dxxGenXMLClob stored procedure is capable of returning only a single XML document.  An SQL/XML query returns a result set that can contain many rows and columns.  This means that the dxxGenXMLClob may be inadequate if it became necessary to generate one row for each of the order documents, instead of a single XML document for a specific order.

There is no way to use a parameter marker or host variable when using a DAD file and the dxxGenXMLClob stored procedure.  When using a solution based on SQL/XML, host variables and parameter markers work in a very similar way as any other SQL query.

The DB2 XML Extender has NOT been modernized to support XML namespaces.  In contrast, SQL/XML fully supports namespaces.

Using a DAD file and the dxxGenXMLClob procedure has an advantage over a writing an in-house RPG procedure to publish XML document; the DAD file can be changed to produce a structurally different XML document without recompiling any programs or procedures.  The SQL/XML query in Figure 5 offers the same advantage if the query is encapsulated within a view.  A view allows the query (and the resulting XML document) to be changed without the need to modify the programs that are using the view.  Another idea to keep in mind is that data produced by a view can be reused in many different queries, which is something an XML extender solution cannot accomplish as easily.

A performance comparison between the XML Extender solution and the SQL/XML query is not something that can be easily assessed, given that every use case has unique characteristics.  However, the dxxGenXMLClob procedure needs to run multiple SQL queries to perform the required aggregation(s), and will construct the XML document using the results of the queries, rather than as a part of the query.  The SQL/XML solution in Figure 5 is a single SQL query that does everything.  Figure 8 shows the SQL statements that were logged in the SQL Performance Monitor during a single call to the dxxGenXMLClob procedure.

Figure 8 - SQL statements used by dxxGenXMLClob

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "CUSTOMER_NAME"

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME", "CUSTOMER_ADDRESS" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "CUSTOMER_ADDRESS"

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME", "CUSTOMER_ADDRESS", "PART_CODE" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "PART_CODE"

SELECT "ORDER_ID" FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp ORDER BY "ORDER_ID"

SELECT "ORDER_ID", "ORDER_TIME" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "ORDER_TIME"



Hopefully, a comparison between the query in Figure 5 and the ones in Figure 8 provides convincing evidence that the SQL/XML solution (Figure 5) is going to be less work for DB2 for i to optimize and evaluate. 


This blog entry has only very briefly touched on some of the advantages of publishing XML using SQL/XML functions; there’s also support in DB2 for i 7.1 for decomposing XML, validating XML, querying XML, transforming XML, and full text search of XML data.  Similar to the publishing functions, there are significant advantages to using an integrated and standardized solution for each of these tasks.

The SQL/XML function in DB2 for i 7.1 is standardized, has more capabilities, and is simpler to code and maintain.  There really isn’t any convincing motivation to choose a non-integrated solution over the built in support in 7.1.  In fact, DB2 for z/OS and DB2 for LUW have already discontinued their support for the XML Extender option; instructing customers to modify their applications to use an SQL/XML solution as a replacement. 

As with any new function, learning to use SQL/XML effectively will require some investment of time and education, but in this case the return on investment is worth it.

The really great news is that because the SQL/XML functionality is standardized across the industry and DB2 family, there is a significant amount of support and educational opportunities available.


More Sources of Information


We’ve recently published a whitepaper here that describes the differences between XML Extender and the built-in XML data type and support in great detail.  The paper discusses many practical use cases for XML and includes a number of RPG examples.  This is well worth a read by anyone interested in SQL and XML technology.


Be sure to also check out the SQL/XML programmer’s guide in the IBM i information center here.  This document has a number of tutorials included in it.


See also several articles in developerWorks and IBM Systems Magazine...

XML Overview
http://www.ibmsystemsmag.com/ibmi/developer/general/xml_db2_part1/
http://www.ibmsystemsmag.com/ibmi/developer/general/xml_db2_part2/

Using XML and RPG
http://www.ibm.com/developerworks/ibmi/library/i-using-rpg/index.html

i Can Use XML in a Relational World
http://ibmsystemsmag.blogs.com/i_can/2012/05/i-can-use-xml-in-a-relational-world.html

If you are currently using IBM DB2 XML Extender option, you may also find the manual in the IBM Information Center for this product helpful here.

Finally, the IBM STG Lab Services DB2 for i Center of Excellence team provides education and consulting on a wide range of database topics.  This team works closely with the DB2 for i Development team and is a great resource.

____


Fantastic, thanks Nick!  This really helps to put the XML integration options in perspective.


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › Integrating XML – Past, Present and Future

February 15th, 2013 Comments off

In my entry on IBM i 7.1 TR6, I called out querying the web with new HTTP functions and using XML to do more with DB2 for i. It just so happens that the very cool example on developerWorks came from Nick Lawrence.

Nick is one of our hotshot software engineers working on DB2 for i in Rochester Minnesota.  He has been involved with putting good things in your tool bag since 1999.  His most recent set of goodies include full text search (think OmniFind), SQL/XML and XMLTABLE.

I asked Nick if he can share some additional insight and give us some guidance on XML and DB2 integration - past, present and future.

Ladies and gentlemen, Mr. Nick Lawrence...

____


Integrating SQL and XML


How do you process XML data?  In particular, how do you publish your relational data in an XML format?  If you’re like most DB2 for i developers, you’ve found working with XML both a necessity and a challenge.  The conversion from rows and columns into an XML document can be a daunting task, but one that cannot be avoided.  While rows and columns are optimal for the transactional and data warehouse workloads that characteristically run on IBM i, XML is the predominate standard for information exchange between web-based applications.  The desire to have web-accessible applications makes publishing relational data as XML data a business imperative.  Development communities can tackle the issue one of two ways.

    1.    Purchase or develop non-integrated and non-standardized tools
    2.    Utilize the integrated SQL/XML publishing functions available in DB2 for i 7.1

For customers not yet running on IBM i 7.1, the database has no native understanding of XML.  In other words, those folks are forced to take option one. 

For some customers, option one means purchasing the DB2 XML Extender portion of the IBM DB2 Extenders product (5761DE1 and 5770DE1).  The DB2 XML Extender option defines a set of user-defined types to represent XML, as well as many user-defined functions and stored procedures for performing common XML tasks. 

For other customers, option one means writing an in-house solution to process XML.   Unfortunately, this set of customers typically ends up reinventing a subset of the capabilities that are already available in the XML Extender option.  Worse yet, these XML solutions are usually not reusable and will need to be reinvented (again) when new XML requirements emerge.

Customers that have moved up to IBM i 7.1 have a built-in and standardized solution available to them for working with XML data using SQL/XML.  It shouldn’t be a surprise that IBM recommends the DB2 for i SQL/XML solution over in-house solutions and the XML Extender option.
 

But why is it better to have an integrated and standardized solution?


To answer the question, let’s look at a specific scenario where relational data is to be published as XML.  A comparison can then be made between a solution that uses SQL/XML and a similar solution that uses the DB2 XML Extender.  Because DB2 XML Extender is a chargeable product, we’d expect it to have more impressive features and capabilities than an average in-house solution.

Figure 1 shows a simplistic and normalized SQL schema that contains customer orders.  Each order has exactly one customer associated with it, and one or more kinds of parts that have been ordered.

Figure 1 - Sample schema





Figure 2 shows a query that produces one row for each type of part in an order. 

Figure 2 - Inner Join

        SELECT           order_id,
                                order_time,
                                customer_name,
                                customer_address,
                                part_code,
                                quantity
        FROM              order
        INNER JOIN    customer
             USING (cust_id)
        INNER JOIN    order_part
             USING (order_id);


The output of the join from Figure 2 (using a set of sample data) is shown in Figure 3.

Figure 3 - Output of inner join




From this data, assume that we wish to produce an XML document that represents order number two, as shown in Figure 4.

Figure 4 - Order document for order number 2

<?xml version="1.0" encoding="UTF-8"?>
<order>
    <id>2</id>
    <time>2012-12-16T10:00:00.000000</time>
    <customer>
        <name>Wally</name>
        <address>Minneapolis</address>
    </customer>
    <part_list>
        <part_list>
            <part>
                <code>Soda</code>
                <quantity>200</quantity>
            </part>
            <part>
                <code>Lunch</code>
                <quantity>10</quantity>
            </part>
            <part>
                <code>XML Book 123</code>
                <quantity>1</quantity>
            </part>
        </part_list>
    </part_list>
</order>


With SQL/XML, this can be accomplished with the query shown in Figure 5.  In this query, the common table expression (order_part_list) is used to aggregate the repeating part elements of an order into a single XML value.  The rest of the XML document is then built around that value using a join.  The join is very similar to what was used in Figure 2, except the join is to order_part_list (the aggregation of order_part), instead of the order_part table.

Figure 5 -SQL/XML query

        WITH order_part_list AS (
         SELECT order_id,
                XMLGROUP(part_code   AS "code",
                         quantity    AS "quantity"
                         ORDER BY quantity DESC
                         OPTION ROW "part"
                                ROOT "part_list"
                         )
                      AS part_list
         FROM        order_part
         GROUP BY    order_id
        )

        SELECT XMLDOCUMENT( 
                XMLELEMENT(NAME "order",
                XMLFOREST(order_id                   AS "id",
                           order_time                 AS "time",
                           XMLFOREST(customer_name    AS "name",
                                     customer_address AS "address")    
                           AS "customer",
                           opl.part_list     AS "part_list"
                          )
                )
               ) AS order_doc
        FROM               order
        INNER JOIN    customer
                    USING (cust_id)
        INNER JOIN    order_part_list opl
                    USING (order_id)
        WHERE           order_id = 2;


The XML Extender approach will encounter many obstacles in performing the same task.  The only way to build an XML document similar to Figure 4 with the XML Extender is to use a document access definition (DAD) file that defines an SQL mapping.  A DAD file that uses an RDB mapping appears to be simpler at first glance, but the RDB mapping’s lack of support for predicates (other than the join condition) makes the RDB mapping approach impossible for this scenario (order_id = 2 is not part of a join condition).  The SQL mapping maps the columns of an SQL statement to elements and attributes in the XML document.  In addition, the condition is allowed to include additional predicates in the WHERE clause.  A DAD file that contains an SQL mapping is shown in Figure 6.

Figure 6 - DAD file with SQL mapping

<?xml version="1.0" encoding="UTF-8"?>
<DAD>
    <validation>NO</validation>
    <Xcollection>
        <SQL_stmt>
         SELECT        order_id,
                        order_time,
                        customer_name,
                        customer_address,
                        part_code,
                        quantity
         FROM        order
         INNER JOIN    customer
             USING (cust_id)
         INNER JOIN    order_part
             USING (order_id)
         WHERE        order_id = 2
         ORDER BY    order_id,
                order_time,
                customer_name,
                customer_address,
                part_code,
                quantity
        </SQL_stmt>
        <prolog>?xml version="1.0"?</prolog>
        <root_node>
            <element_node name="order">
                <element_node name="id">
                    <text_node>
                        <column name="order_id" />
                    </text_node>
                </element_node>

                <element_node name="time">
                    <text_node>
                        <column name="order_time" />
                    </text_node>
                </element_node>

                <element_node name="customer">
                    <element_node name="name">
                        <text_node>
                            <column name="customer_name" />
                        </text_node>
                    </element_node>

                    <element_node name="address">
                        <text_node>
                          <column name="customer_address"/>
                        </text_node>
                    </element_node>

                </element_node>
                <element_node name="part_list">
                    <element_node name="part"
                                  multi_occurrence="YES">
                        <element_node name="code">
                          <text_node>
                                <column name="part_code" />
                           </text_node>
                        </element_node>
                        <element_node name="quantity">
                          <text_node>
                                <column name="quantity" />
                          </text_node>
                        </element_node>
                    </element_node>
                </element_node>

            </element_node>

        </root_node>

    </Xcollection>
</DAD>


One thing to keep in mind is that, when using the SQL mapping support in XML Extender, the columns in the SELECT list and ORDER BY clause of the SQL_stmt element must appear in the exact order they are presented in Figure 6.  The XML Extender product depends on these orderings to correctly generate the XML document.  An SQL/XML solution follows the normal SQL rules and does not have these less apparent restrictions.

The DAD file is considerably longer and more complicated than the SQL query that was shown in Figure 5.  In addition, it hasn’t simplified anything.  The join between the tables still needs to be coded explicitly in the condition element.  Additionally, the multi_occurrence attribute is needed when defining the part element; this option constructs an aggregation that is similar to the aggregation used by the common table expression in Figure 5. 

The "simple" mapping offered by the XML Extender has very quickly turned into a complicated solution!

An XML document can be published by passing the DAD file shown in Figure 6  as a parameter to the dxxGenXMLClob stored procedure.  For comparison, the XML document that is generated from the DAD is shown in Figure 7.

Figure 7 - XML Document created by dxxGenXMLClob

<?xml version="1.0"?>
<order>
  <id>2</id>
  <time>2012-12-16-10.00.00.000000</time>
  <customer>
    <name>Wally</name>
    <address>Minneapolis</address>
  </customer>
  <part_list>
    <part>
      <code>Lunch</code>
      <quantity>10</quantity>
    </part>
    <part>
      <code>Soda</code>
      <quantity>200</quantity>
    </part>
    <part>
      <code>XML Book 123</code>
      <quantity>1</quantity>
    </part>
  </part_list>
</order>


There are a few important differences between the document in Figure 7 and the document that was shown in Figure 4.

The order of the part elements is not the same.  In the SQL/XML query shown in Figure 5, the ORDER BY clause is used within the XMLGROUP function to guarantee a particular order of part elements, however the DAD file solution is not capable of specifying this ordering.

In Figure 7, the content of the time element is not stored in the industry standard xs:dateTime data type.  When SQL/XML publishing functions are used, the conversion happens automatically (as can be seen in Figure 4).

There are some other limitations with the dxxGenXMLClob that are important to mention.

The dxxGenXMLClob stored procedure is capable of returning only a single XML document.  An SQL/XML query returns a result set that can contain many rows and columns.  This means that the dxxGenXMLClob may be inadequate if it became necessary to generate one row for each of the order documents, instead of a single XML document for a specific order.

There is no way to use a parameter marker or host variable when using a DAD file and the dxxGenXMLClob stored procedure.  When using a solution based on SQL/XML, host variables and parameter markers work in a very similar way as any other SQL query.

The DB2 XML Extender has NOT been modernized to support XML namespaces.  In contrast, SQL/XML fully supports namespaces.

Using a DAD file and the dxxGenXMLClob procedure has an advantage over a writing an in-house RPG procedure to publish XML document; the DAD file can be changed to produce a structurally different XML document without recompiling any programs or procedures.  The SQL/XML query in Figure 5 offers the same advantage if the query is encapsulated within a view.  A view allows the query (and the resulting XML document) to be changed without the need to modify the programs that are using the view.  Another idea to keep in mind is that data produced by a view can be reused in many different queries, which is something an XML extender solution cannot accomplish as easily.

A performance comparison between the XML Extender solution and the SQL/XML query is not something that can be easily assessed, given that every use case has unique characteristics.  However, the dxxGenXMLClob procedure needs to run multiple SQL queries to perform the required aggregation(s), and will construct the XML document using the results of the queries, rather than as a part of the query.  The SQL/XML solution in Figure 5 is a single SQL query that does everything.  Figure 8 shows the SQL statements that were logged in the SQL Performance Monitor during a single call to the dxxGenXMLClob procedure.

Figure 8 - SQL statements used by dxxGenXMLClob

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "CUSTOMER_NAME"

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME", "CUSTOMER_ADDRESS" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "CUSTOMER_ADDRESS"

SELECT "ORDER_ID", "ORDER_TIME", "CUSTOMER_NAME", "CUSTOMER_ADDRESS", "PART_CODE" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "PART_CODE"

SELECT "ORDER_ID" FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp ORDER BY "ORDER_ID"

SELECT "ORDER_ID", "ORDER_TIME" FROM (SELECT * FROM (SELECT order_id, order_time, customer_name, customer_address, part_code, quantity FROM order INNER JOIN customer USING (cust_id) INNER JOIN order_part USING (order_id) WHERE order_id = ?) dxx_tab_exp WHERE ORDER_ID=?) dxx_tab_exp ORDER BY "ORDER_TIME"



Hopefully, a comparison between the query in Figure 5 and the ones in Figure 8 provides convincing evidence that the SQL/XML solution (Figure 5) is going to be less work for DB2 for i to optimize and evaluate. 


This blog entry has only very briefly touched on some of the advantages of publishing XML using SQL/XML functions; there’s also support in DB2 for i 7.1 for decomposing XML, validating XML, querying XML, transforming XML, and full text search of XML data.  Similar to the publishing functions, there are significant advantages to using an integrated and standardized solution for each of these tasks.

The SQL/XML function in DB2 for i 7.1 is standardized, has more capabilities, and is simpler to code and maintain.  There really isn’t any convincing motivation to choose a non-integrated solution over the built in support in 7.1.  In fact, DB2 for z/OS and DB2 for LUW have already discontinued their support for the XML Extender option; instructing customers to modify their applications to use an SQL/XML solution as a replacement. 

As with any new function, learning to use SQL/XML effectively will require some investment of time and education, but in this case the return on investment is worth it.

The really great news is that because the SQL/XML functionality is standardized across the industry and DB2 family, there is a significant amount of support and educational opportunities available.


More Sources of Information


We’ve recently published a whitepaper here that describes the differences between XML Extender and the built-in XML data type and support in great detail.  The paper discusses many practical use cases for XML and includes a number of RPG examples.  This is well worth a read by anyone interested in SQL and XML technology.


Be sure to also check out the SQL/XML programmer’s guide in the IBM i information center here.  This document has a number of tutorials included in it.


See also several articles in developerWorks and IBM Systems Magazine...

XML Overview
http://www.ibmsystemsmag.com/ibmi/developer/general/xml_db2_part1/
http://www.ibmsystemsmag.com/ibmi/developer/general/xml_db2_part2/

Using XML and RPG
http://www.ibm.com/developerworks/ibmi/library/i-using-rpg/index.html

i Can Use XML in a Relational World
http://ibmsystemsmag.blogs.com/i_can/2012/05/i-can-use-xml-in-a-relational-world.html

If you are currently using IBM DB2 XML Extender option, you may also find the manual in the IBM Information Center for this product helpful here.

Finally, the IBM STG Lab Services DB2 for i Center of Excellence team provides education and consulting on a wide range of database topics.  This team works closely with the DB2 for i Development team and is a great resource.

____


Fantastic, thanks Nick!  This really helps to put the XML integration options in perspective.


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › What to Make of IBM i 7.1 TR6?

February 6th, 2013 Comments off
More of the same...

Let me explain why this is a good thing.

No, a GREAT thing.

_______

Better, faster, cheaper hardware is great.

Better, faster, cheaper software is great.

But what else can you take away from this announcement?

The promise of useful enhancements delivered on a regular basis via a "technology refresh" is once again fulfilled.

In real business terms, this means the IBM i engineering and development teams have been working hard on your behalf. The result of this hard work is that you get more features, more functions and more benefit from your investment in, and use of a Power system.

For the C level executives and their accountants, this means a higher ROI.  GREAT, keep it coming.

For the programmers doing the heavy lifting, this means innovative solutions and higher productivity.  More of the same please.


DB2 is Doing More, So You Don't Have To


For years I have discussed and debated the merits of getting DB2 for i to do more. This is the very essence of data centric processing - allowing the integrated database management system to handle more of the work, so you don't have to.  Imagine, someone else designs it, codes it, tests it, tunes it. You, well... you use it.

But this can only happen if the feature or function you require exists inside of DB2 (and you are using SQL to invoke or access it). Obviously, this is what a technology refresh is all about! Namely, providing you more options, more techniques, more methods to get the job done.



Great! Give Some Examples

 

Taking a look inside IBM i 7.1 TR6, we find a number of interesting and useful things on the database side of the house...

FOR SYSTEM NAME clause

Given that IBM i users enjoy (and take advantage of) two naming conventions and two interfaces into DB2 for i (SQL and native), controling both the SQL "long" name and the system "short" name can be cumbersome. If you are only using SQL to reference objects, no problem. But what if you also use IBM i commands? Specifying a proper short name is normally a good practice. This is now easier via the new SQL clause.  Creating a table, view or index with a specific short name to go with the long name is much simpler - as it should be.

The optional FOR SYSTEM NAME clause has been added to SQL statements:

    CREATE TABLE
    CREATE VIEW
    CREATE INDEX
    DECLARE GLOBAL TEMPORARY TABLE


Multiple events within a single SQL trigger

SQL triggers are a very powerful way to invoke data centric processing by extending what happens during a database operation or transaction.

A multiple event trigger is a trigger that can handle INSERT, UPDATE, and DELETE triggering events within a single SQL (trigger) program.

To visualize this new support, think about this logic:

Database operation occurs, trigger fires, program is called AND...

IF INSERTING
THEN do these things;

IF DELETING
THEN do this stuff;

IF UPDATING
THEN do this other stuff;


New FUNCTIONS for RESTful services

I think of this support as "query the web". And I think it's awesome.

Delivered as a set of "tools" in the SYSTOOLS schema, UDFs now provide RESTful services to integrate information sources that can be addressed via a URL and accessed using HTTP.

Think "building blocks" to assemble a clever application that makes use of the vast repository out and up in the clouds.  These functional building blocks are:

    httpGetBlob
    httpGetClob
    httpPutBlob
    httpPutClob
    httpPostBlob
    httpPostClob
    httpDeleteBlob
    httpDeleteClob
    httpBlob
    httpClob
    httpHead

This means it is now very easy to write an SQL statement that can access and/or manipulate unstructured information via HTTP. Imagine pulling in text from a weather service, news outlet or blog via SQL, like this:

SELECT SYSTOOLS.HTTPGETCLOB('http://db2fori.blogspot.com/2013/01/but-mike-can-i-trust-query-optimizer.html','') FROM SYSIBM.SYSDUMMY1;

Combine these new building blocks with the IBM i 7.1 XML support found in DB2 and you are well on your way to providing more value. And that's what it's all about isn't it - more value.

_______

Once again, more of the same from the fine folks in DB2 Development; great things to be found inside the IBM i box.

For a complete look at Technology Refresh 6, go here.

If you have any questions or concerns on how to take advantage of IBM i 7.1, especially when it comes to information management, reach out. We're here to help.


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › What to Make of IBM i 7.1 TR6?

February 6th, 2013 Comments off
More of the same...

Let me explain why this is a good thing.

No, a GREAT thing.

_______

Better, faster, cheaper hardware is great.

Better, faster, cheaper software is great.

But what else can you take away from this announcement?

The promise of useful enhancements delivered on a regular basis via a "technology refresh" is once again fulfilled.

In real business terms, this means the IBM i engineering and development teams have been working hard on your behalf. The result of this hard work is that you get more features, more functions and more benefit from your investment in, and use of a Power system.

For the C level executives and their accountants, this means a higher ROI.  GREAT, keep it coming.

For the programmers doing the heavy lifting, this means innovative solutions and higher productivity.  More of the same please.


DB2 is Doing More, So You Don't Have To


For years I have discussed and debated the merits of getting DB2 for i to do more. This is the very essence of data centric processing - allowing the integrated database management system to handle more of the work, so you don't have to.  Imagine, someone else designs it, codes it, tests it, tunes it. You, well... you use it.

But this can only happen if the feature or function you require exists inside of DB2 (and you are using SQL to invoke or access it). Obviously, this is what a technology refresh is all about! Namely, providing you more options, more techniques, more methods to get the job done.



Great! Give Some Examples

 

Taking a look inside IBM i 7.1 TR6, we find a number of interesting and useful things on the database side of the house...

FOR SYSTEM NAME clause

Given that IBM i users enjoy (and take advantage of) two naming conventions and two interfaces into DB2 for i (SQL and native), controling both the SQL "long" name and the system "short" name can be cumbersome. If you are only using SQL to reference objects, no problem. But what if you also use IBM i commands? Specifying a proper short name is normally a good practice. This is now easier via the new SQL clause.  Creating a table, view or index with a specific short name to go with the long name is much simpler - as it should be.

The optional FOR SYSTEM NAME clause has been added to SQL statements:

    CREATE TABLE
    CREATE VIEW
    CREATE INDEX
    DECLARE GLOBAL TEMPORARY TABLE


Multiple events within a single SQL trigger

SQL triggers are a very powerful way to invoke data centric processing by extending what happens during a database operation or transaction.

A multiple event trigger is a trigger that can handle INSERT, UPDATE, and DELETE triggering events within a single SQL (trigger) program.

To visualize this new support, think about this logic:

Database operation occurs, trigger fires, program is called AND...

IF INSERTING
THEN do these things;

IF DELETING
THEN do this stuff;

IF UPDATING
THEN do this other stuff;


New FUNCTIONS for RESTful services

I think of this support as "query the web". And I think it's awesome.

Delivered as a set of "tools" in the SYSTOOLS schema, UDFs now provide RESTful services to integrate information sources that can be addressed via a URL and accessed using HTTP.

Think "building blocks" to assemble a clever application that makes use of the vast repository out and up in the clouds.  These functional building blocks are:

    httpGetBlob
    httpGetClob
    httpPutBlob
    httpPutClob
    httpPostBlob
    httpPostClob
    httpDeleteBlob
    httpDeleteClob
    httpBlob
    httpClob
    httpHead

This means it is now very easy to write an SQL statement that can access and/or manipulate unstructured information via HTTP. Imagine pulling in text from a weather service, news outlet or blog via SQL, like this:

SELECT SYSTOOLS.HTTPGETCLOB('http://db2fori.blogspot.com/2013/01/but-mike-can-i-trust-query-optimizer.html','') FROM SYSIBM.SYSDUMMY1;

Combine these new building blocks with the IBM i 7.1 XML support found in DB2 and you are well on your way to providing more value. And that's what it's all about isn't it - more value.

_______

Once again, more of the same from the fine folks in DB2 Development; great things to be found inside the IBM i box.

For a complete look at Technology Refresh 6, go here.

If you have any questions or concerns on how to take advantage of IBM i 7.1, especially when it comes to information management, reach out. We're here to help.


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › …But Mike, Can I Trust the Query Optimizer?

January 27th, 2013 Comments off
This is usually the 2nd question I hear most often, right after:

"...but Mike, will SQL perform as well as my RPG/COBOL program"?

Before I provide an answer (and it's not exactly what you think), let's explore a few things.

According to merriam-webster.com

TRUST is defined as:

- assured reliance on the character, ability, strength, or truth of someone or something

- one in which confidence is placed

And CONFIDENCE is defined as:

- a feeling or consciousness of one's powers or of reliance on one's circumstances

- faith or belief that one will act in a right, proper, or effective way

So, if you don't trust the query optimizer, what you are saying is that you don't have confidence in it's ability. If you don't have confidence in the query optimizer, you are really stating that it will not act in the right, proper or effective way. And to make matters worse, you find yourself stuck between a rock and a hard place. Because how can you make use of the latest and greatest DB2 features available only via SQL, if you don't have faith in the SQL query optimizer?

Hold on a minute...  how do YOU know what the right, proper or effective way really is?

Are you as omnipresent as the query optimizer?

By the way, Merriam Webster tells us OMNIPRESENT means: present in all places at all times.


Query Optimization Defined


The query optimizer's job is to produce a plan. Producing a plan is the result of many factors. Factors that will limit or expand what is possible with regards to the strategies and methods of execution to produce the correct result in a timely manner.

Knowing and understanding these factors are key to trusting the query optimizer. Experience with how these factors affect the final plan also goes a long way in building confidence that the query optimizer has provided the best plan for your query, your data and your environment.

What is the "best" plan anyway?

Most folks don't appreciate that the query optimizer has two equally important jobs.

Job 1: build a plan that produces the CORRECT output

Job 2: build a plan that is ESTIMATED to be the fastest possible given all the factors present and accounted for

We all forget (and thus have full trust and confidence in) the optimizer's first job. We all assume the plan for our query will produce the correct output. Don't believe me? When was the last time you actually validated your query's output was correct?  HA!  You DO trust the optimizer don't you.

What most users focus on is Job 2.  In other words, we all keep our eyes on the query run time. This is where we begin to loose faith. This is where we loose confidence with the programmer in the box.


Ah, The Programmer in the Box


Or is it The Ghost in the Machine

If we think of the query plan as a program - a program executed by the database engine and managed by the operating system, then there must be a programmer behind the scenes to design and code that program. In essence, this is exactly what the query optimizer is. So, getting back to the original question on top, you are really asking: "can I trust the programmer in the box". Let's open the box and peer inside the machine...

The programmer is highly trained, highly experienced, up-to-date on all things computer science past and present. The programmer never sleeps, is always ready.  The programmer is fast - capable of designing, writing and compiling programs in milliseconds.  The programmer is vigilant - watching, learning, adapting to the changing environment.


The Process


What does the programmer have to work with? What are the requirements and the inputs?

As with any programming endeavor, query optimization starts with a request, and continues with identifying the attributes of the underlying data and data structures, as well as the computing and I/O resources available.

Most users realize that the SQL statement is the request. This request, can include items such as:
  • Column list projected or returned
  • Source or target of query
  • Constraining or identify criteria
  • Grouping or aggregation criteria
  • Constraining criteria on groups or aggregates
  • Ordering criteria for final result set
  • Constraining number of  results
 Syntactically, these are known to SQL coders respectively as:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
FETCH FIRST

Most SQL users DO NOT fully understand or appreciate the data profile, the data structures, and the system's computing and I/O capabilities (speeds and feeds if you will). Yet, the programmer in the box certainly does.

Imagine a programmer that not only has all the information, but also has access to all the structures above and below the machine interface. Imagine having access to the latest computational techniques materialized in the IBM i microcode. This would be a true advantage, and hopefully would in turn inspire confidence and trust in the user.


The Key Factors


From my perspective, the 3.5 most important factors that positively or negatively affect the programmer in the box are:

   1. Indexing strategy

   2. Fair share of memory

   3. Optimization goal

   3.5. Reusable open data path (ODP)

Without an adequate indexing strategy, the query optimizer is limited in the techniques available.  Not unlike the RPG or COBOL programmer who wants desperately to access one row by key (think CHAIN or random read), but no one provided the correct keyed logical file that enables the technique.

Without an adequate fair share of memory, the database engine is limited in how aggressive it can be with asynchronously reading data, and keeping that data in memory to minimize wait time.  Not unlike the RPG or COBOL programmer who wants desperately to use an internal array or table, but can't due to size limitations.

Without properly communicating the optimization goal, the query optimizer is unaware of the application's fetch behavior. This can result in a plan that is best for returning all of the query results instead of a plan that returns the first subset of results. Not unlike the RPG or COBOL programmer who reads all the data and loads the entire subfile, instead of filling the first page and returning control to the user sooner than later.

Without proper data base design, good set-at-a-time coding techniques and data centric processing, the query optimizer is forced to revisit the query plan over and over. To do this, the cursor is (hard) closed and the query optimizer is called upon to ensure that the current plan is doing the right things with the right objects. Not unlike the RPG or COBOL programmer who is frequently and repeatedly tapped to end their program, revisit the design, objects and techniques - all while the user patiently waits for the program to come back on line.


Criticize Things You Don't Know About


In my experience, trust and confidence come from KNOWLEDGE, EXPERIENCE and TIME. In other words, the more knowledge you have, the more experience you have, and the more time you've spent, the more comfortable you become. This is the same with query optimization. You become wiser to the ways and means of the programmer in the box.

Those of us who have worked extensively with the DB2 for i Query Optimizer (and the fantastic programmers of the box), never rarely question the plan produced. We first look around at all of the inputs, and seek to see the problem/solution from the optimizer's point of view. More often than not, an epiphany occurs.


Get Some


It just so happens you can acquire the knowledge and initial experience with SQL query optimization by attending the very workshop dedicated to it. This class is the very best way, short of a master/apprentice relationship to get more confidence and trust in DB2 for i.

More information on the workshop can be found here.

_______

...But Mike, Can I Trust the Query Optimizer?

My knowledge and experience tells me, "yes, you can".


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › …But Mike, Can I Trust the Query Optimizer?

January 27th, 2013 Comments off
This is usually the 2nd question I hear most often, right after:

"...but Mike, will SQL perform as well as my RPG/COBOL program"?

Before I provide an answer (and it's not exactly what you think), let's explore a few things.

According to merriam-webster.com

TRUST is defined as:

- assured reliance on the character, ability, strength, or truth of someone or something

- one in which confidence is placed

And CONFIDENCE is defined as:

- a feeling or consciousness of one's powers or of reliance on one's circumstances

- faith or belief that one will act in a right, proper, or effective way

So, if you don't trust the query optimizer, what you are saying is that you don't have confidence in it's ability. If you don't have confidence in the query optimizer, you are really stating that it will not act in the right, proper or effective way. And to make matters worse, you find yourself stuck between a rock and a hard place. Because how can you make use of the latest and greatest DB2 features available only via SQL, if you don't have faith in the SQL query optimizer?

Hold on a minute...  how do YOU know what the right, proper or effective way really is?

Are you as omnipresent as the query optimizer?

By the way, Merriam Webster tells us OMNIPRESENT means: present in all places at all times.


Query Optimization Defined


The query optimizer's job is to produce a plan. Producing a plan is the result of many factors. Factors that will limit or expand what is possible with regards to the strategies and methods of execution to produce the correct result in a timely manner.

Knowing and understanding these factors are key to trusting the query optimizer. Experience with how these factors affect the final plan also goes a long way in building confidence that the query optimizer has provided the best plan for your query, your data and your environment.

What is the "best" plan anyway?

Most folks don't appreciate that the query optimizer has two equally important jobs.

Job 1: build a plan that produces the CORRECT output

Job 2: build a plan that is ESTIMATED to be the fastest possible given all the factors present and accounted for

We all forget (and thus have full trust and confidence in) the optimizer's first job. We all assume the plan for our query will produce the correct output. Don't believe me? When was the last time you actually validated your query's output was correct?  HA!  You DO trust the optimizer don't you.

What most users focus on is Job 2.  In other words, we all keep our eyes on the query run time. This is where we begin to loose faith. This is where we loose confidence with the programmer in the box.


Ah, The Programmer in the Box


Or is it The Ghost in the Machine

If we think of the query plan as a program - a program executed by the database engine and managed by the operating system, then there must be a programmer behind the scenes to design and code that program. In essence, this is exactly what the query optimizer is. So, getting back to the original question on top, you are really asking: "can I trust the programmer in the box". Let's open the box and peer inside the machine...

The programmer is highly trained, highly experienced, up-to-date on all things computer science past and present. The programmer never sleeps, is always ready.  The programmer is fast - capable of designing, writing and compiling programs in milliseconds.  The programmer is vigilant - watching, learning, adapting to the changing environment.


The Process


What does the programmer have to work with? What are the requirements and the inputs?

As with any programming endeavor, query optimization starts with a request, and continues with identifying the attributes of the underlying data and data structures, as well as the computing and I/O resources available.

Most users realize that the SQL statement is the request. This request, can include items such as:
  • Column list projected or returned
  • Source or target of query
  • Constraining or identify criteria
  • Grouping or aggregation criteria
  • Constraining criteria on groups or aggregates
  • Ordering criteria for final result set
  • Constraining number of  results
 Syntactically, these are known to SQL coders respectively as:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
FETCH FIRST

Most SQL users DO NOT fully understand or appreciate the data profile, the data structures, and the system's computing and I/O capabilities (speeds and feeds if you will). Yet, the programmer in the box certainly does.

Imagine a programmer that not only has all the information, but also has access to all the structures above and below the machine interface. Imagine having access to the latest computational techniques materialized in the IBM i microcode. This would be a true advantage, and hopefully would in turn inspire confidence and trust in the user.


The Key Factors


From my perspective, the 3.5 most important factors that positively or negatively affect the programmer in the box are:

   1. Indexing strategy

   2. Fair share of memory

   3. Optimization goal

   3.5. Reusable open data path (ODP)

Without an adequate indexing strategy, the query optimizer is limited in the techniques available.  Not unlike the RPG or COBOL programmer who wants desperately to access one row by key (think CHAIN or random read), but no one provided the correct keyed logical file that enables the technique.

Without an adequate fair share of memory, the database engine is limited in how aggressive it can be with asynchronously reading data, and keeping that data in memory to minimize wait time.  Not unlike the RPG or COBOL programmer who wants desperately to use an internal array or table, but can't due to size limitations.

Without properly communicating the optimization goal, the query optimizer is unaware of the application's fetch behavior. This can result in a plan that is best for returning all of the query results instead of a plan that returns the first subset of results. Not unlike the RPG or COBOL programmer who reads all the data and loads the entire subfile, instead of filling the first page and returning control to the user sooner than later.

Without proper data base design, good set-at-a-time coding techniques and data centric processing, the query optimizer is forced to revisit the query plan over and over. To do this, the cursor is (hard) closed and the query optimizer is called upon to ensure that the current plan is doing the right things with the right objects. Not unlike the RPG or COBOL programmer who is frequently and repeatedly tapped to end their program, revisit the design, objects and techniques - all while the user patiently waits for the program to come back on line.


Criticize Things You Don't Know About


In my experience, trust and confidence come from KNOWLEDGE, EXPERIENCE and TIME. In other words, the more knowledge you have, the more experience you have, and the more time you've spent, the more comfortable you become. This is the same with query optimization. You become wiser to the ways and means of the programmer in the box.

Those of us who have worked extensively with the DB2 for i Query Optimizer (and the fantastic programmers of the box), never rarely question the plan produced. We first look around at all of the inputs, and seek to see the problem/solution from the optimizer's point of view. More often than not, an epiphany occurs.


Get Some


It just so happens you can acquire the knowledge and initial experience with SQL query optimization by attending the very workshop dedicated to it. This class is the very best way, short of a master/apprentice relationship to get more confidence and trust in DB2 for i.

More information on the workshop can be found here.

_______

...But Mike, Can I Trust the Query Optimizer?

My knowledge and experience tells me, "yes, you can".


Read the original at DB2 for i.

Categories: Blogs Tags:

DB2 for i › 6.1 or 7.1?

December 19th, 2012 Comments off

 

No question, No contest...


I am just returning from two weeks in Asia (Taipei and Shenzhen specifically). While there, I had the opportunity to engage and interact with a number of medium and large companies who are using IBM Power Systems, IBM i and DB2 for i to run their core business.

Needless to say, our beloved system is alive and well in so many major institutions around the world. As a side note, it would be really interesting (and thoroughly impossible) to know just how much electronic money flows on daily basis through the so called "legacy" systems designated by the simple letters of "i" and "z". Mind boggling I'm sure.

While interacting with my clients, I came to the realization that very few of them are using IBM i 7.1. For some strange reason they are planning to move, or have recently moved to 6.1. When I dug a bit deeper, I found that the general conscience was that 6.1 is "stable", and that 7.1 might not be. The rationale being that 7.1 is the most recent, thus "newest" code base. It needs time to be broken in. Frankly, this is old and outdated thinking.

After much head scratching on my part, I politely told my clients (and their anxious IBM counterparts) "you are mistaken, AND you are missing the boat".

The notion that IBM i version 7 release 1 is somehow not stable, not ready for action, not worthwhile, is simply ridiculous.

The code base has been available and in production for well over two years! The effort and energy to upgrade to 7.1 is the same as 6.1, yet the return on that investment (i.e. moving to 7.1) is way higher in my opinion - especially when it comes to DB2 for i and data centric programming. Add to this the steady cadence of IBM i 7.1 technology refreshes and you have a clear choice, either:

a) keep really current with the latest features, functions and benefits

  -or-

b) continue to fall further and further behind in capabilities and your ability to meet requirements

Furthermore, if you find yourself in a position to explain and/or defend DB2 for i against competitive technology, you better be talking about what's current, not what is 5, or even 7 years old.


The Bottom Line


If you are not already running IBM i 7.1 (and DB2 within), you should have plans on the drawing board to upgrade sooner than later.

If your plans have version 6 release 1 has the final destination, get the eraser (aka "rubber" for the British readers), remove the number "6" and write in the number "7", and be sure to add a line:  "plus TR5".

Keep all your implementation, testing and roll out plans in place, as this is still best practice.

Read and understand the 6.1 and 7.1 memo to users BEFORE testing and BEFORE going live.

Of course, if you need any kind of assistance, or rationale for the upgrade, certainly reach out. If you have some solution or application that for some reason isn't capable of running on 7.1, let us know that too.


Read the original at DB2 for i.

Categories: Blogs Tags: