Smart Scan

Disclaimer: This post was written in 2012 and covers the Exadata X2 model. I have no plans to update it to cover the later models, so please be aware that the information below is no longer be up to date.

Smart Scan is a subset of Cell Offload Processing and is perhaps the feature that is most highly associated with Exadata. Smart Scan allows Oracle databases to push query information down to the storage level, specifically:

  • Filtration (predicate information)
  • Column Projection
  • Join Filtering

To understand what this means think of a very simple example where a table with ten million rows is the subject of a full table scan by an Oracle database. In a non-Exadata environment this data has to be read from disk over the IO subsystem (most commonly fibre channel) using large, sequential multi-block IOs. During the time of this IO request the database process waiting for the data will be left in a wait state (known as db file scattered read) whilst the blocks from disk are scattered to available slots in the data cache (in memory on the database server). This will inevitable cause many useful blocks to be aged out of the cache, having adverse implications on performance going forward. Once the read completes Oracle will apply a number of filters to the data based on the query being run. For example:

SELECT     invoice_amount, sales_rep       <- Column projection
FROM       invoice_data
WHERE      country_code = 'UK'             <- Filtration
AND        invoice_quarter = '2011Q3';     <- Filtration

In this query we are only interested in a subset of the total number of columns (and there may be a significant number of additional columns on the INVOICE_DATA table). Likewise we are only interested in a subset of the rows, those which match the filters of being related to the UK and the third quarter of 2011. If this table contains rows for the last seven years this represents a significant amount of data which is not going to be part of the final result. To put this in different terms, an often significant percentage of the work undertaken is wasted effort.

In the world of data warehousing, where this type of activity is commonplace, the traditional strategies to avoid this situation are partitioning and parallel processing. The partitioning feature allows segments such as tables to be broken up into partitions which can be read place of the whole table – to use the example from above each financial year could be placed in a different table partition resulting in our query only needing to scan one-year’s worth of data instead of seven years. By using parallelism Oracle can then create multiple slave processes which can divide up the work of scanning the partition and run concurrently. This also has an additional advantage in that parallel reads are “direct path” which means they bypass the data cache, avoiding the impact to the useful blocks located there.

Oracle Exadata makes considerable use of parallel processing in order to achieve better performance – the obvious example being the 14 storage cells in a full rack which divide up the work accordingly (not through any managed process, but simply because ASM randomly scatters data blocks across all storage cells). Partitioning is still a separately licensed cost option but, in comparison to the cost of an Exadata system, one that most customers would be expected to purchase.

However, Exadata’s unique selling point – arguably the most important feature of the product – is its ability to offload the column projection and row filtration to the storage cells. This means that the details of SQL being executed are available to the storage cells and they can use them to reduce the volume of data being returned across the storage network to the database servers.

In the example above the cells are able to discard all column data other than those requested (the invoice amount and the details of the sales rep) as well as discarding all data which does not conform to the predicates of being in the UK and in the third quarter of 2011. Only the small subset of data which still fits these criteria will be returned over the storage network to the database servers for processing. Note that results are returned to calling session’s private memory and not the data cache.

In addition to column projection and row filtration Exadata is also able to offload some join filtration to the storage cells. Since Oracle Database 11g the possibility has existed to use Bloom Filters to reduce the amount of work undertaken when joining sets of data together (see page 21 of white paper). Exadata is able to use this method to examine two sets of data being returned from different tables which need to be joined on the database server – in a normal join only intersecting data from each data set will be retained. A Bloom filter is a probabilistic algorithm which can quickly allow for a large number of the non-intersecting rows to be discarded. By offloading this work to the storage cells Exadata can further reduce the amount of data being sent back to the database processes. Oracle claims that a “10x reduction in data sent to DB servers is common” (see page 24 of this slide deck).

[ A possible misconception about Bloom filters is that they can only be employed by Exadata when joining two tables – however this is incorrect and Exadata can apply multiple Bloom filters to a table scan.]

An important fact to understand about Smart Scan is that the full scans of segments still take place at the storage level. The reduction in workload (and consequential performance increase) comes from the ability to avoid sending data over the storage network to the database servers and then process it. Exadata does have another feature, “Storage Indexes”, aimed at avoiding the need to read from disk at all – as well as the Smart Flash Cache which attempts to keep useful data blocks on flash storage, but the basic Smart Scan operation does not avoid the need to perform reads.

Scope of Smart Scan

Whilst Smart Scan is a feature which Oracle is alone in being able to introduce (as the owner of the database code), the actual impact of Smart Scan is limited by the database workload. Although Oracle claims that a “10x reduction in data sent to DB servers is common”, this only applies to data returned via Smart Scan and so is restricted to full table scans and fast full index scans.

Full table scans and fast full index scans are common in data warehousing environments but almost non-existent in a typical OLTP environment.

A further restriction on Smart Scan is that only the standard “heap” table in Oracle is a suitable candidate; objects such as clusters and index organised tables cannot be Smart Scanned.

The decision on whether to use Smart Scan is taken by Oracle at run time, i.e. when a query is executed. This means that it is impossible to predict with accuracy whether a smart scan will take place when writing code and designing applications. Traditionally developers and performance architects would use the explain plan method within Oracle to find out what type of execution plan the optimizer would use for a given piece of SQL. This allowed for SQL tuning and hinting to take place as well as the redesigning of a schema’s indexing strategy.

Since Smart Scan is a run-time decision, the execution plan is only able to offer an indication of whether a query might be able to use Smart Scan or not.

Advertisements

8 Responses to Smart Scan

  1. Vasant says:

    “Full table scans and fast full index scans are common in data warehousing environments but almost non-existent in a typical OLTP environment.”

    I disagree on this statement. If I do a select * from table_name it will do a FTS.

    – Vasant

    • flashdba says:

      Maybe you are misunderstanding my point. OLTP workloads do not usually perform “select * from table_name” and they rarely run SQL which results in full object scans.

  2. Carlos says:

    Who will disturb a OLTP enviroment with a SELECT * FROM SOME_TABLE ?, …DEAR LORD….SPECIALLY IF A TABLE HAS A MILLION OF ROWS…..

  3. Samer says:

    Dear flashdba
    thank you for this article, but and wondering if you please let me know (scientifically) how can RDBMS inform the Storage system of which data to transfer!! storage systems are blindly saving “blocks” and RDBMS only knows which data to retrieve. and if “joint filtering” is going to happen on a storage level; wouldn’t that be equivalent to retrieval of data to the compute level to do the RDBMS magic .. cause it will travel anyway through the infini-band (even for the storage compute)… I am searching for the logic of the smartscan since 2 days and couldn’t find any scientific justification… and I believe that that only thing that will help in Exadata is the HCC.. which is the same even across other compute/storage systems.

    by the way, the sql statement you have placed as an example is supposed to use the index after the the execution plan identified based on the CBO; so I dealy, rows to retrieve are identified in the Parse stage precisely…. and if FTS are encountered and storage is involved in such processing, then the storage compute will not be able to handle the rest of the host requests if multiple FTS are encountered as the requests will throttle there.

  4. Wes says:

    Hi Flashdba,

    We have two 1/8 racks of X5-2 Exadata. I’ve been the dba in charge of everything (because I’m the only Oracle guy at my company). My organization develops everything in house for apps, and right now we are doing a mainframe VSAM conversion to Oracle. Some of the procs and packages created thus far are so slow that many times we cancel the queries…. We’ve been performance tuning on a case by case basis, but the real problem, I feel, is that the OLTP database and the batch component are all in one database. When I think OLTP I think of quick DML operations. When it comes to the batch cycle, the queries are complex.

    From reading above, Exadata doesn’t create and use bloom filtering correct? I have never used Bloom filters before as I have always been around OLTP my whole career. However, with these batch jobs in the same database they set the environment to a DSS situation instead of OLTP. I enabled write back mode, and this made some of the batch jobs run faster, but had no effect on other batch jobs. Things that run on the mainframe for 20 minutes take 3 to 4 hours to complete on Exadata. I’m wondering if using Bloom filtering would help further, and if I need to create them? Or is there something I am not considering?

    • flashdba says:

      Sorry for the delayed reply, I was enjoying a relaxing vacation when you posted this question. However, I’m afraid I’m not the best person to ask – I haven’t touched Exadata for over four years now. I do hope though that you haven’t been sold a solution which doesn’t fit your real workload requirements.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s