To Proc SQL or Not?

  • Home
  • To Proc SQL or Not?
nadmin March 6, 2024 0 Comments

SAS has a powerful alternative – SQL procedure – to traditional data step processing. A large group of SAS programmers love to use Proc SQL, especially those with databases or computer science background. I started using Proc SQL about 10 years ago and grew fond of it immediately. Over time I realized its limitations and now use it mostly for special scenarios where the data step processing facility falls short.  

Via Proc SQL, one can usually achieve same results as with use of combination of procedures and data step in much lesser code (see this example). One can sort, join, summarize and print the results within a few lines of proc SQL code. Due to its advantage, it is often used to merge tables without sorting, and to create of bunch of macro variables easily (see examples here).

Use of Proc SQL becomes indispensable for some special types of data joins.  These include many to many join for Cartesian product of data (see example here) and conditional/fuzzy join (see example here). Sometimes proc SQL can be faster than comparable code written via data step and other SAS procedures. E.g. when querying dictionary tables.  

Even with all its advantages, Proc SQL is not as popular among all SAS users. SQL code can easily get very hard to understand. While squeezing the lines of code, it becomes less modular, time consuming to read/write and hard to debug. Proc SQL as not as powerful and versatile for data manipulation as are other procs/data step limiting its application. While updating program due to changing needs, often, I had replaced my proc SQL code with other procs/data step, as it could not be tweaked for needed modifications. Finally, reusing the code across coworkers becomes an issue, since not all SAS programmers are comfortable with Proc SQL.

Choice of use of proc SQL is impacted by several factors, such as prior experience with SQL, expected reuse of your code by your coworkers, performance needs and complexity of requirements. But there is no doubt, it is a handy tool for special cases where data step processing falls short.