.. _realworldref: SQL in the Wild =============== SQL has been around for a long time, as a language for communicating with databases, like a communication protocol. More recently with the rise of *data* as a business function, or a domain in it's own right SQL has also become an invaluable tool for defining the *structure* of data and analysis - not just as a one off but as a form of `infrastructure as code`_. As *analytics* transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving in the analytics space. One of the best articulations of this is written in the `viewpoint section of the docs for the open-source tool dbt`_. Two of the principles mentioned in that article are `quality assurance`_ and `modularity`_. Quality assurance ----------------- The primary aim of `sqlfluff` as a project is in service of that first aim of `quality assurance`_. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just *valid* but also easily *comprehensible* by other users of the same codebase. One way to ensure readability is to enforce a `consistent style`_, and the tools used to do this are called `linters`_. Some famous `linters`_ which are well known in the software community are `flake8`_ and `jslint`_ (the former is used to lint the `sqlfluff` project itself). **Sqlfluff** aims to fill this space for SQL. Modularity ---------- SQL itself doesn't lend itself well to `modularity`_, so to introduce some flexibility and reusability it is often `templated`_. Typically this is done in the wild in one of the following ways: 1. Using the limited inbuilt templating abilities of a programming language directly. For example in python this would be using the `format string syntax`_: .. code-block:: python "SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable") Which would evaluate to: .. code-block:: sql SELECT bar FROM mytable 2. Using a dedicated templating library such as `jinja2`_. This allows a lot more flexibility and more powerful expressions and macros. See the :ref:`templateconfig` section for more detail on how this works. - Often there are tools like `dbt`_ or `apache airflow`_ which allow `templated`_ sql to be used directly, and they will implement a library like `jinja2`_ under the hood themselves. All of these templating tools are great for `modularity`_ but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these configured *placeholder* values, intended to improve modularity. Sqlfluff allows limited templating using both of the methods outlined above, to allow you to still lint those SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late). During the CI/CD pipeline (or any time that we need to handle `templated`_ code), Sqlfluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in Sqlfluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so Sqlfluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use *much simpler* dummy values than what you would really use. The recommendation is to use *the simplest* possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be a streamlined as possible. .. _`infrastructure as code`: https://en.wikipedia.org/wiki/Infrastructure_as_code .. _`viewpoint section of the docs for the open-source tool dbt`: https://docs.getdbt.com/docs/viewpoint .. _`quality assurance`: https://docs.getdbt.com/docs/viewpoint#section-quality-assurance .. _`modularity`: https://docs.getdbt.com/docs/viewpoint#section-modularity .. _`consistent style`: https://www.smashingmagazine.com/2012/10/why-coding-style-matters/ .. _`linters`: https://en.wikipedia.org/wiki/Lint_(software) .. _`flake8`: http://flake8.pycqa.org/ .. _`jslint`: https://www.jslint.com/ .. _`templated`: https://en.wikipedia.org/wiki/Template_processor .. _`format string syntax`: https://docs.python.org/3/library/string.html#formatstrings .. _`jinja2`: https://jinja.palletsprojects.com/ .. _`apache airflow`: https://airflow.apache.org .. _`dbt`: https://getdbt.com