
Introspection Level Auto-Selection

Back in 2015, I was the only DataGrip team member working on database introspectors. Since then, several of my teammates have followed suit with their own introspectors for various database management systems (DBMSs). The Oracle introspector, however, remains solely my project.

The Oracle introspector is peculiar: it differs from introspectors of other relational database management systems (RDBMSs) in that it is very slow – sometimes unbearably so. The reason for this slowness is that queries to the Oracle catalog can take time.

Oracle catalogs

A catalog is a collection of tables and/or views that provides metadata about all objects in a database schema. Oracle has several such collections of tables and views:

  • A set of views with the ALL prefix
  • A set of views with the DBA prefix
  • A set of internal dynamic tables with the $ suffix

The ALL catalog lists only those objects to which the current user has access. This would be the ideal approach, but the problem is that the permission system in Oracle is very complex, and verifying the user’s access to each object takes a long time. For example, in my test database, a query against the ALL_OBJECTS view that retrieves the names of objects from just one schema can take up to three seconds (and that’s just one query – the introspector performs a lot of them).

The DBA catalog is much faster, listing all objects without checking for access rights. However, retrieving the object names in the user’s schema from the DBA_OBJECTS view still takes a long time, because this view joins a lot of tables and other views and then translates numeric information into string representations. The introspector then has to translate those string representations back into numeric values.

The last catalog, $, is the fastest. It doesn’t do any unnecessary joining and doesn’t translate numerics to strings. So why not always use this catalog, or at least the DBA catalog for introspection? Unfortunately, in some cases, the user has no access to the fast catalogs and is stuck with the slowest option. Therefore, the introspector needs to be able to obtain information from any of them. At the beginning of the process, the introspector checks which catalogs are accessible and selects the fastest one available.
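To give a rough idea of how catalog selection can work, here is a minimal Kotlin sketch that probes the candidate catalogs from fastest to slowest over JDBC and keeps the first one the current user can query. The probe queries and the OracleCatalog and selectCatalog names are hypothetical illustrations, not DataGrip’s actual code.

    import java.sql.Connection
    import java.sql.SQLException

    // Candidate catalogs, ordered from fastest to slowest.
    // The probe queries are illustrative; the real checks may look different.
    enum class OracleCatalog(val probeSql: String) {
        INTERNAL("select 1 from SYS.OBJ\$ where rownum = 1"),  // the "$" tables
        DBA("select 1 from DBA_OBJECTS where rownum = 1"),
        ALL("select 1 from ALL_OBJECTS where rownum = 1")
    }

    // Returns the fastest catalog the current user can actually query.
    // The ALL catalog is accessible to every user, so the search always succeeds.
    fun selectCatalog(connection: Connection): OracleCatalog =
        OracleCatalog.values().first { catalog ->
            try {
                connection.createStatement().use { it.executeQuery(catalog.probeSql).close() }
                true
            } catch (e: SQLException) {
                false  // e.g. ORA-00942: table or view does not exist
            }
        }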

This raises the important question of why the Oracle RDBMS, which is one of the fastest systems, has such slow catalogs. The answer is that during regular database use (as opposed to development), applications don’t perform queries to catalogs, so the developers of Oracle probably had no motivation to optimize catalog views.

Introspector

Why does DataGrip perform so many queries on the catalog in the first place? After all, there are other tools on the market for working with databases, and they do not need to perform so many (very slow) catalog queries.

For DataGrip to intelligently highlight all errors and offer reasonable autocompletion, it must be able to correctly resolve all identifiers and their types. To do so, it needs to know everything about the data schema, and that is why the introspector tries to obtain a complete set of metadata. In many cases, without a complete set of metadata, it is not possible to check the code for correctness or to provide typing assistance. Other database tools don’t perform such deep code analysis, so they can get by with incomplete metadata.

Moreover, the complete set of metadata is required for generating different scripts.

Another reason why DataGrip performs so many catalog queries is that it holds a whole introspected model of a schema in memory (and also serializes it to disk). This approach allows a user to work on the model without having access to the database – even offline. All features in this case remain functional.

To reduce the number of queries performed and to speed up introspection, we use incremental introspection. If a schema has already been introspected, then on the next introspection the introspector performs modified queries that retrieve only the changes that have occurred since that schema was last introspected.
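As a rough illustration of what such a modified query can look like, one common technique is to filter the catalog by each object’s DDL timestamp, so that only objects changed since the previous introspection are returned. The Kotlin sketch below assumes the DBA catalog and its LAST_DDL_TIME column; it illustrates the idea and is not the query DataGrip actually sends.

    import java.sql.Connection
    import java.sql.Timestamp

    // Hypothetical incremental query: fetch only the objects whose DDL timestamp
    // is newer than the previous introspection of the given schema.
    fun fetchChangedObjects(connection: Connection, schema: String, since: Timestamp): List<String> {
        val sql = "select OBJECT_NAME from DBA_OBJECTS where OWNER = ? and LAST_DDL_TIME > ?"
        return connection.prepareStatement(sql).use { statement ->
            statement.setString(1, schema)
            statement.setTimestamp(2, since)
            statement.executeQuery().use { rs ->
                generateSequence { if (rs.next()) rs.getString("OBJECT_NAME") else null }.toList()
            }
        }
    }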

But even with an incremental approach, introspection of very large schemas can take too long.

Level-by-level introspection

Several years ago, we introduced level-by-level introspection. For every schema, the user could specify at which level of detail the introspector should request metadata. 

These are the three levels:

  • L1 – Names only: The fastest level with the minimum information. At this level, the names and internal identifiers of most objects are retrieved. For arguments of routines, the argument types are also fetched.
  • L2 – Details: At this level, almost all metadata is retrieved. The only metadata that is skipped is that which requires fetching and parsing of sources.
  • L3 – Details and Sources: The most detailed level, but also the slowest. At this level, all metadata and all sources are retrieved.

The lower the level, the less time the introspector needs, and the difference can be quite significant.
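Conceptually, the levels form a small ordered scale. A hypothetical way to model them in Kotlin (reused in the sketches later in this post) could be:

    // A hypothetical representation of the three introspection levels;
    // this is not DataGrip's internal model.
    enum class IntrospectionLevel(val fetchesDetails: Boolean, val fetchesSources: Boolean) {
        L1_NAMES_ONLY(fetchesDetails = false, fetchesSources = false),
        L2_DETAILS(fetchesDetails = true, fetchesSources = false),
        L3_DETAILS_AND_SOURCES(fetchesDetails = true, fetchesSources = true)
    }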

I checked how long it takes to retrieve metadata at the different levels. I selected the SYS schema, one medium-sized example schema, and one large example schema with many objects (4K+ tables, 7K+ views, 1K+ packages, 2K+ sequences, 4K+ synonyms, 4K+ indices, 2K+ check constraints, and 900+ triggers). The connection was configured to use the fastest catalog. The table below shows how long introspection took in each case:

Introspection duration, in seconds:

Schema              | L1 | L2  | L3
SYS                 |  3 |  45 |  50
Medium-sized schema |  2 |  37 |  70
Big example schema  |  6 | 105 | 140

We can see that specifying level 1 for schemas that are rarely used saves us a lot of time.

However, when we released this feature, we wanted to preserve the existing DataGrip behavior for our users, so we set level 3 as the default. As it turns out, practically nothing changed for our users. To reduce the introspection time, it was necessary to open the context menu for each data source and select level 1, and then find the desired schema and select level 2 or 3 for it. In addition, our UI didn’t make this any easier. Only a small number of users found this feature and took advantage of it.

Another inconvenience was that, when working with a schema introspected at a low level, the user sometimes needed all the metadata of a particular object. Our introspector can obtain the metadata of one specified object (without introspecting the entire schema), but every time this was done, the user had to confirm that the data retrieval was necessary.

It was obvious that the problem still hadn’t been solved. 

The idea

When we were planning this release, we understood that the slow introspector was still the number one problem for those who use DataGrip to work with Oracle databases.

With this release, we wanted to provide users with the same experience as competing tools. We’ve been working under the assumption that such detailed information about all objects of the selected schemas is not necessary, and that users would prefer fewer details to long wait times.

The approach is actually rather simple: Select the introspection level automatically on the first introspection (if the level was not already specified explicitly, of course).

How we implemented it in version 2023.3

Database introspection currently works like this:

Just like before, the first step is for DataGrip to introspect all selected schemas at the first level. In other words, the names of all objects and most of their inner elements are retrieved from the database catalog.

Then, for every schema for which the level was not specified, DataGrip selects the most suitable level of introspection. This depends on the status of the schema in relation to the user, as well as on object statistics obtained during first-level introspection.

After that, for each schema for which level 2 or 3 is either manually selected or automatically specified, DataGrip performs introspection at level 2. In other words, all details except sources are retrieved.

Finally, sources are retrieved for each schema with level 3 selected or specified.
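Put together, the flow described above can be sketched roughly as follows, reusing the hypothetical IntrospectionLevel enum from earlier. The Schema class and the per-phase helpers are made-up names used only for illustration.

    // A hypothetical schema model; 'explicitLevel' is the level the user chose manually, if any.
    data class Schema(
        val name: String,
        val explicitLevel: IntrospectionLevel? = null,
        var level: IntrospectionLevel = IntrospectionLevel.L1_NAMES_ONLY
    )

    // Stubs standing in for the real per-phase work.
    fun introspectNames(schema: Schema) { /* fetch names and internal identifiers */ }
    fun introspectDetails(schema: Schema) { /* fetch all details except sources */ }
    fun introspectSources(schema: Schema) { /* fetch and parse sources */ }
    fun selectLevelAutomatically(schema: Schema): IntrospectionLevel = IntrospectionLevel.L2_DETAILS

    fun introspect(schemas: List<Schema>) {
        // Phase 1: names and internal identifiers for every selected schema.
        schemas.forEach { introspectNames(it) }

        // Phase 2: auto-select a level for schemas without an explicitly chosen one.
        for (schema in schemas) {
            schema.level = schema.explicitLevel ?: selectLevelAutomatically(schema)
        }

        // Phase 3: details (everything except sources) for L2 and L3 schemas.
        schemas.filter { it.level.fetchesDetails }.forEach { introspectDetails(it) }

        // Phase 4: sources, only for L3 schemas.
        schemas.filter { it.level.fetchesSources }.forEach { introspectSources(it) }
    }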

When a user opens an object whose details or source text have not been retrieved, DataGrip detects that the object was not introspected deeply enough. If the connection to the database server is established, DataGrip retrieves the necessary details automatically, without additional user interaction, so the information is obtained quickly and without interrupting the user’s work. If the connection is not established yet, DataGrip shows a yellow stripe or a dialog suggesting that a connection be established to retrieve the necessary details.
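The on-demand part can then be thought of as a simple check performed when an object is opened. This is again a hypothetical sketch built on the types above; fetchObjectDetails and suggestConnecting are made-up names.

    // Hypothetical helpers: retrieve the metadata of a single object, or show the
    // suggestion (yellow stripe or dialog) to connect first.
    fun fetchObjectDetails(schema: Schema, objectName: String, includeSources: Boolean) { /* ... */ }
    fun suggestConnecting() { /* ... */ }

    fun onObjectOpened(schema: Schema, objectName: String, needsSources: Boolean, connected: Boolean) {
        val detailedEnough = if (needsSources) schema.level.fetchesSources else schema.level.fetchesDetails
        when {
            detailedEnough -> return                                           // already in the model
            connected -> fetchObjectDetails(schema, objectName, needsSources)  // silent and automatic
            else -> suggestConnecting()                                        // ask the user to connect
        }
    }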

We believe that this approach makes working with DataGrip smoother and more efficient.

An additional benefit is that when a user has selected a lot of schemas that are not actually used, the model is not cluttered with details of unused objects, thereby reducing the amount of space needed to save data sources and the time needed to load them upon IDE startup.

How the introspector selects levels

We currently have only one strategy for selecting an introspection level. It is based on whether the schema is the current schema or a system schema, and on how many objects it contains.

We operate under the assumption that a database developer works most frequently with the current schema, less often with other schemas, and very rarely with system ones. The current schema is the one the Oracle session is logged in to.

For each schema, the introspector counts objects (after the first-level introspection, the introspector knows all objects) and selects the introspection level using the following thresholds (where N is the number of objects):

Schema              | Current            | Non-current        | System
Level 3 is selected | when N ≤ 1000      | never              | never
Level 2 is selected | when N ≤ 3000      | when N ≤ 3000      | when N ≤ 100
Level 1 is selected | in all other cases | in all other cases | in all other cases
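In code, this strategy boils down to a few threshold comparisons. Here is a hypothetical sketch that reuses the IntrospectionLevel enum from earlier, where objectCount is the number of objects counted during first-level introspection.

    // Hypothetical implementation of the threshold table above.
    fun selectLevel(objectCount: Int, isCurrentSchema: Boolean, isSystemSchema: Boolean): IntrospectionLevel =
        when {
            isSystemSchema ->
                if (objectCount <= 100) IntrospectionLevel.L2_DETAILS else IntrospectionLevel.L1_NAMES_ONLY
            isCurrentSchema && objectCount <= 1000 -> IntrospectionLevel.L3_DETAILS_AND_SOURCES
            objectCount <= 3000 -> IntrospectionLevel.L2_DETAILS
            else -> IntrospectionLevel.L1_NAMES_ONLY
        }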

In future versions, we plan to add other strategies for automatically selecting introspection levels (for instance, selecting levels depending on the query execution time) and create a setting that allows you to switch between strategies.

How to use it all

For a new data source, the introspection level auto-selection is enabled by default, and nothing needs to be explicitly configured. DataGrip will select a level for each schema upon the first introspection. The selected level will be remembered and will not change during subsequent introspections. 

In existing data sources, the schemas are already introspected, and level auto-selection is not required. However, if many schemas are selected and you want to use the new functionality, just select the desired data source in the Database Explorer, and in the context menu, select Introspection Level | Auto select.

The selected level is also reset whenever the Forget All Cached Schemas action is carried out on a data source or Forget This Schema Cache is carried out on a schema. After you perform one of these actions, the introspection level will be selected again on the next introspection of the affected schemas.

If you find that the wrong introspection level has been selected for a given schema, you can always change it. To do so, select the schema in the Database Explorer, then go to the context menu, select Introspection Level, and choose the desired level.

Further improvements

The first possible improvement is making on-demand introspection more localized.

Currently, if DataGrip detects that an executed SQL statement could modify an object, it starts an incremental introspection of all objects in all schemas. This over-cautious approach was put in place to avoid situations where the executed statement modifies other objects without DataGrip noticing, leaving the model not completely up to date.

Of course, incremental introspection is much faster than complete introspection, but in large databases it can take significant time and interfere with the user’s work. One possible approach in this case is to re-introspect only the object that has been explicitly modified. We trust that our users are savvy enough to click the Refresh button if they see that the executed statement may modify something else. In any case, our competitors’ tools do not re-introspect everything after each executed statement, and this doesn’t cause problems for their users.
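In rough terms, the proposed behavior could look like the sketch below, which re-introspects only the object the statement explicitly names and leaves everything else to the Refresh action. The extractModifiedObjectName helper is made up for illustration, and fetchObjectDetails is the hypothetical helper from the earlier sketch.

    // Hypothetical sketch of the proposed, more localized behavior. Today, DataGrip
    // would instead incrementally re-introspect all objects in all schemas.
    fun afterStatementExecuted(statement: String, currentSchema: Schema, connected: Boolean) {
        val modifiedObject = extractModifiedObjectName(statement) ?: return  // not a modifying statement
        if (connected) {
            fetchObjectDetails(currentSchema, modifiedObject, includeSources = true)
        }
    }

    // Made-up helper: a naive guess at the name of the object a DDL-like statement modifies.
    fun extractModifiedObjectName(statement: String): String? =
        Regex("""(?i)\b(?:create|alter|drop)\s+\w+\s+(\w+)""").find(statement)?.groupValues?.get(1)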

Another possible improvement would be not introspecting server objects when they’re not needed. 

Upon every introspection, including incremental ones, DataGrip checks server objects (tablespaces, data files, and users). This can take significant time. Moreover, in future versions of DataGrip, we’re going to support more server objects, resulting in even greater time consumption. In most cases, server objects are not needed during database development.

In the future, it would make sense to add a feature that allows users to disable introspection for certain server objects altogether.
