Developing Rules¶
Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule’s _eval() function for each segment in the tree. For many rules, this keeps the rule code streamlined: it contains only the logic for the rule itself, with all the other mechanics abstracted away.
Running Tests¶
The majority of the test cases for most bundled rules are “yaml test cases”, i.e. test cases defined in yaml files. You can find those yaml fixtures on GitHub. While this provides a very simple way to write tests, it can occasionally be tedious to run specific tests.
Within either a tox environment or virtualenv (as described in the contributing.md file), you can either run all of the rule yaml tests with:
pytest test/rules/yaml_test_cases_test.py -vv
…or to just run tests for a specific rule, there are two options for a syntax to select only those tests:
pytest -vv test/rules/yaml_test_cases_test.py --rule_id=RF01
pytest -vv test/rules/ -k RF01
The --rule_id syntax relies on the name of the yaml file, while the -k option simply checks whether the argument appears in the name of the test. The latter is slightly less to type and so is generally the more frequently used.
Traversal Options¶
recurse_into¶
Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include:
The rule only looks at a small portion of the file (e.g. the beginning or end).
The rule needs to traverse the parse tree in a non-standard way.
These rules can override BaseRule’s recurse_into field, setting it to False. For these rules, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules LT13 and LT12, which only look at the beginning or end of the file, respectively.
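The difference between the two traversal modes can be illustrated with a toy model. This is a minimal sketch, not SQLFluff's crawler: ToySegment, ToyRule and RootOnlyRule are hypothetical stand-ins invented for this example, and only the recurse_into name is taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySegment:
    """Hypothetical stand-in for a parse tree node (not SQLFluff's BaseSegment)."""

    name: str
    children: List["ToySegment"] = field(default_factory=list)


class ToyRule:
    """Hypothetical rule that records every segment passed to _eval()."""

    recurse_into = True

    def __init__(self) -> None:
        self.seen: List[str] = []

    def _eval(self, segment: ToySegment) -> None:
        self.seen.append(segment.name)

    def crawl(self, segment: ToySegment) -> None:
        # Always evaluate the current segment, then descend only if allowed.
        self._eval(segment)
        if self.recurse_into:
            for child in segment.children:
                self.crawl(child)


class RootOnlyRule(ToyRule):
    """Same rule, but evaluated once on the root segment only."""

    recurse_into = False


tree = ToySegment(
    "file",
    [ToySegment("statement", [ToySegment("keyword")]), ToySegment("newline")],
)

every_segment = ToyRule()
every_segment.crawl(tree)
print(every_segment.seen)

root_only = RootOnlyRule()
root_only.crawl(tree)
print(root_only.seen)
```

In this model the default rule sees all four segments, while the root-only rule sees just the file segment and would have to walk the tree itself if it needed anything deeper.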
_works_on_unparsable¶
By default, SQLFluff calls _eval() for all segments, even “unparsable” segments, i.e. segments that didn’t match the parsing rules in the dialect. This causes issues for some rules. For those rules, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants.
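As a rough illustration of the effect described above, the sketch below lists which segments a rule would be evaluated on. It assumes a toy tree where each node carries an unparsable flag; ToySegment and visited() are hypothetical names for this example and do not reflect how SQLFluff represents unparsable segments internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySegment:
    """Hypothetical parse tree node with an 'unparsable' marker."""

    name: str
    unparsable: bool = False
    children: List["ToySegment"] = field(default_factory=list)


def visited(segment: ToySegment, works_on_unparsable: bool = True) -> List[str]:
    """Return the names of segments a rule would be evaluated on."""
    if segment.unparsable and not works_on_unparsable:
        # Skip the unparsable segment and everything beneath it.
        return []
    names = [segment.name]
    for child in segment.children:
        names.extend(visited(child, works_on_unparsable))
    return names


tree = ToySegment(
    "file",
    children=[
        ToySegment("statement", children=[ToySegment("keyword")]),
        ToySegment("broken", unparsable=True, children=[ToySegment("raw")]),
    ],
)

print(visited(tree))
print(visited(tree, works_on_unparsable=False))
```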
Base Rules¶
base_rules Module¶
Implements the base rule class.
Rules crawl through the trees returned by the parser and evaluate particular rules.
The intent is that it should be possible for the rules to be expressed as simply as possible, with as much of the complexity abstracted away.
The evaluation function should take enough arguments that it can evaluate the position of the given segment in relation to its neighbors. The segment which finally “triggers” the error should be the one that would be corrected. If the rule relates to something that is missing, it should instead flag the segment FOLLOWING the place where the desired element is missing.
- class BaseRule(code: str, description: str, **kwargs: Any)¶
The base class for a rule.
- Parameters:
code (str) – The identifier for this rule, used in inclusion or exclusion.
description (str) – A human readable description of what this rule does. It will be displayed when any violations are found.
- crawl(tree: BaseSegment, dialect: Dialect, fix: bool, templated_file: TemplatedFile | None, ignore_mask: IgnoreMask | None, fname: str | None, config: FluffConfig) Tuple[List[SQLLintError], Tuple[RawSegment, ...], List[LintFix], Dict[str, Any] | None] ¶
Run the rule on a given tree.
- Returns:
A tuple of (vs, raw_stack, fixes, memory)
- static discard_unsafe_fixes(lint_result: LintResult, templated_file: TemplatedFile | None) None ¶
Remove (discard) LintResult fixes if they are “unsafe”.
By removing its fixes, a LintResult will still be reported, but it will be treated as _unfixable_.
- static filter_meta(segments: Sequence[BaseSegment], keep_meta: bool = False) Tuple[BaseSegment, ...] ¶
Filter the segments to non-meta.
Or optionally the opposite if keep_meta is True.
- classmethod get_config_ref() str ¶
Return the config lookup ref for this rule.
If a name is defined, it’s the name - otherwise the code.
The name is a much more understandable reference and so makes config files more readable. For backward compatibility however we also support the rule code for those without names.
- classmethod get_parent_of(segment: BaseSegment, root_segment: BaseSegment) BaseSegment | None ¶
Return the segment immediately containing segment.
NB: This is recursive.
- Parameters:
segment – The segment to look for.
root_segment – Some known parent of the segment we’re looking for (although likely not the direct parent in question).
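The recursive search described here can be sketched on a toy tree. This is an illustrative assumption about the general approach (depth-first search from a known ancestor), not the library's implementation; ToySegment and find_parent() are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ToySegment:
    """Hypothetical parse tree node, compared by identity."""

    name: str
    children: List["ToySegment"] = field(default_factory=list)


def find_parent(target: ToySegment, root: ToySegment) -> Optional[ToySegment]:
    """Return the segment whose direct children include target, searching below root."""
    for child in root.children:
        if child is target:
            return root
        found = find_parent(target, child)
        if found is not None:
            return found
    return None


keyword = ToySegment("keyword")
statement = ToySegment("statement", [keyword])
root = ToySegment("file", [statement])

parent = find_parent(keyword, root)
print(parent.name if parent else None)
```

Passing a root that is close to the target keeps the search small, which is why the root_segment argument only needs to be some known ancestor rather than the top of the file.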
- static split_comma_separated_string(raw: str | List[str]) List[str] ¶
Converts comma separated string to List, stripping whitespace.
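The behaviour in this description can be approximated by the standalone function below. It is a sketch written for this page, not the method's source: the name split_comma_separated is hypothetical, and dropping empty items and stripping list elements are assumptions.

```python
from typing import List, Union


def split_comma_separated(raw: Union[str, List[str]]) -> List[str]:
    """Turn 'a, b ,c' (or an existing list) into a list of stripped strings."""
    if isinstance(raw, str):
        # Assumption: empty items (e.g. from a trailing comma) are dropped.
        return [item.strip() for item in raw.split(",") if item.strip()]
    # Assumption: list input is passed through with whitespace stripped.
    return [item.strip() for item in raw]


print(split_comma_separated("LT01, LT02 ,RF01"))
print(split_comma_separated(["LT01", " LT02"]))
```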
- class LintResult(anchor: BaseSegment | None = None, fixes: List[LintFix] | None = None, memory: Any | None = None, description: str | None = None, source: str | None = None)¶
A class to hold the results of a rule evaluation.
- Parameters:
anchor (BaseSegment, optional) – A segment which represents the position of the problem. NB: Each fix will also hold its own reference to position, so this position is mostly for alerting the user to where the problem is.
fixes (list of LintFix, optional) – An array of any fixes which would correct this issue. If not present then it’s assumed that this issue will have to be manually fixed.
memory (dict, optional) – An object which stores any working memory for the rule. The memory returned in any LintResult will be passed as an input to the next segment to be crawled.
description (str, optional) – A description of the problem identified as part of this result. If provided, this overrides the description of the rule in what gets reported to the user.
source (str, optional) – A string identifier for what generated the result. Within larger libraries like reflow this can be useful for tracking where a result came from.
- class RuleGhost(code, name, description)¶
- code¶
Alias for field number 0
- description¶
Alias for field number 2
- name¶
Alias for field number 1
- class RuleLoggingAdapter(logger, extra=None)¶
A LoggingAdapter for rules which adds the code of the rule to it.
- process(msg: str, kwargs: Any) Tuple[str, Any] ¶
Add the code element to the logging message before emit.
- class RuleManifest(code: str, name: str, description: str, groups: Tuple[str, ...], aliases: Tuple[str, ...], rule_class: Type[BaseRule])¶
Element in the rule register.
- class RuleMetaclass(name: str, bases: List[BaseRule], class_dict: Dict[str, Any])¶
The metaclass for rules.
This metaclass provides auto-enrichment of the rule docstring so that examples, groups, aliases and names are added.
The reason we enrich the docstring is so that it can be picked up by autodoc and all be displayed in the sqlfluff docs.
- class RulePack(rules: List[BaseRule], reference_map: Dict[str, Set[str]])¶
A bundle of rules to be applied.
This contains a set of rules, post filtering but also contains the mapping required to interpret any noqa messages found in files.
The reason for this object is that rules are filtered and instantiated into this pack in the main process when running in multi-processing mode so that user defined rules can be used without reference issues.
- reference_map¶
A mapping of rule references to the codes they refer to, e.g. {“my_ref”: {“LT01”, “LT02”}}. The references (i.e. the keys) may be codes, groups, aliases or names. The values of the mapping are sets of rule codes only. This object acts as a lookup to be able to translate selectors (which may contain diverse references) into a consolidated list of rule codes. This mapping contains the full set of rules, rather than just the filtered set present in the rules attribute.
- Type:
dict
- codes() Iterator[str] ¶
Returns an iterator through the codes contained in the pack.
- class RuleSet(name: str, config_info: Dict[str, Dict[str, Any]])¶
Class to define a ruleset.
A rule set is instantiated on module load, but the references to each of its classes are instantiated at runtime. This means that configuration values can be passed to those rules live and be responsive to any changes in configuration from the path that the file is in.
Rules should be fetched using the get_rulepack() method, which also handles any filtering (i.e. allowlisting and denylisting).
New rules should be added to the instance of this class using the register() decorator. That decorator registers the class, but also performs basic type and name-convention checks.
The code for the rule will be parsed from the name, the description from the docstring. The eval function is expected to be overridden by the subclass, and the parent class raises an error on this function if it is not overridden.
- get_rulepack(config: FluffConfig) RulePack ¶
Use the config to return the appropriate rules.
We use the config both for allowlisting and denylisting, and also for configuring the rules themselves.
- register(cls: Type[BaseRule], plugin: PluginSpec | None = None) Type[BaseRule] ¶
Decorate a class with this to add it to the ruleset.
@myruleset.register
class Rule_LT01(BaseRule):
    "Description of rule."

    def eval(self, **kwargs):
        return LintResult()
We expect that rules are defined as classes with the name Rule_XXXX where XXXX is of the form LNNN, where L is a letter (literally L for linting by default) and NNN is a three-digit number.
If this receives classes by any other name, then it will raise a ValueError.
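The registration pattern (a decorator that checks the class name, extracts the code and stores the class) can be sketched with a toy registry. ToyRuleSet is a hypothetical class, and the name pattern used here (Rule_ followed by four uppercase letters or digits) is an assumption chosen so that the Rule_LT01 example passes; it is not claimed to be the exact check SQLFluff performs.

```python
import re
from typing import Dict


class ToyRuleSet:
    """Hypothetical registry illustrating a name-checking register() decorator."""

    # Assumed pattern for illustration only.
    _name_pattern = re.compile(r"^Rule_([A-Z0-9]{4})$")

    def __init__(self) -> None:
        self.registry: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        match = self._name_pattern.match(cls.__name__)
        if match is None:
            raise ValueError(
                f"Rule class name {cls.__name__!r} does not match the expected pattern"
            )
        # The rule code is parsed from the class name.
        self.registry[match.group(1)] = cls
        # Return the class unchanged so it can still be used normally.
        return cls


toy_ruleset = ToyRuleSet()


@toy_ruleset.register
class Rule_LT01:
    """Description of rule."""


print(sorted(toy_ruleset.registry))
```

Returning the class from register() is what allows it to be used as a decorator without replacing the decorated class.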
- rule_reference_map() Dict[str, Set[str]] ¶
Generate a rule reference map for looking up rules.
Generate the master reference map. The priority order is: codes > names > groups > aliases (i.e. if there’s a collision between a name and an alias, we assume the alias is wrong).
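One way to realise a priority order like this is to build one mapping per reference kind and merge them from lowest to highest priority, so that later entries overwrite earlier ones. The sketch below assumes that approach; build_reference_map() and the rule data are hypothetical and made up to force a collision.

```python
from typing import Dict, List, Set, Tuple

# Each toy rule is (code, name, groups, aliases). The data is invented for
# illustration, including the deliberately colliding alias on the second rule.
ToyRule = Tuple[str, str, Tuple[str, ...], Tuple[str, ...]]


def build_reference_map(rules: List[ToyRule]) -> Dict[str, Set[str]]:
    """Map every reference (code, name, group or alias) to a set of rule codes."""
    codes: Dict[str, Set[str]] = {code: {code} for code, _, _, _ in rules}
    names: Dict[str, Set[str]] = {name: {code} for code, name, _, _ in rules}
    groups: Dict[str, Set[str]] = {}
    aliases: Dict[str, Set[str]] = {}
    for code, _, rule_groups, rule_aliases in rules:
        for group in rule_groups:
            groups.setdefault(group, set()).add(code)
        for alias in rule_aliases:
            aliases.setdefault(alias, set()).add(code)
    # Later dictionaries win on key collisions: codes > names > groups > aliases.
    return {**aliases, **groups, **names, **codes}


toy_rules: List[ToyRule] = [
    ("LT01", "layout.spacing", ("all", "layout"), ("L001",)),
    ("LT02", "layout.indent", ("all", "layout"), ("L002", "layout.spacing")),
]

reference_map = build_reference_map(toy_rules)
print(reference_map["layout.spacing"])
print(sorted(reference_map["layout"]))
```

Here the alias layout.spacing on the second rule collides with the first rule's name, and the name wins.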
Functional API¶
These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules.
The modules listed below are submodules of sqlfluff.utils.functional.
segments Module¶
Surrogate class for working with Segment collections.
- class Segments(*segments, templated_file=None)¶
Encapsulates a sequence of one or more BaseSegments.
The segments may or may not be contiguous in a parse tree. Provides useful operations on a sequence of segments to simplify rule creation.
- all(predicate: Callable[[BaseSegment], bool] | None = None) bool ¶
Do all the segments match?
- any(predicate: Callable[[BaseSegment], bool] | None = None) bool ¶
Do any of the segments match?
- apply(fn: Callable[[BaseSegment], Any]) List[Any] ¶
Apply function to every item.
- children(predicate: Callable[[BaseSegment], bool] | None = None) Segments ¶
Returns an object with children of the segments in this object.
- find(segment: BaseSegment | None) int ¶
Returns index if found, -1 if not found.
- first(predicate: Callable[[BaseSegment], bool] | None = None) Segments ¶
Returns the first segment (if any) that satisfies the predicates.
- get(index: int = 0, *, default: Any = None) BaseSegment | None ¶
Return specified item. Returns default if index out of range.
- iterate_segments(predicate: Callable[[BaseSegment], bool] | None = None) Iterable[Segments] ¶
Loop over each element as a fresh Segments.
- last(predicate: Callable[[BaseSegment], bool] | None = None) Segments ¶
Returns the last segment (if any) that satisfies the predicates.
- property raw_slices: RawFileSlices¶
Raw slices of the segments, sorted in source file order.
- recursive_crawl(*seg_type: str, recurse_into: bool = True) Segments ¶
Recursively crawl for segments of a given type.
- select(select_if: Callable[[BaseSegment], bool] | None = None, loop_while: Callable[[BaseSegment], bool] | None = None, start_seg: BaseSegment | None = None, stop_seg: BaseSegment | None = None) Segments ¶
Retrieve range/subset.
NOTE: Iterates the segments BETWEEN start_seg and stop_seg, i.e. those segments are not included in the loop.
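The “between” semantics in the note above can be shown on a plain list. This is a simplified sketch on strings rather than segments; select_between() is a hypothetical helper, and the way select_if and loop_while interact (loop_while stops the loop, select_if filters within it) is an assumption based on the parameter names.

```python
from typing import Callable, List, Optional


def select_between(
    items: List[str],
    select_if: Optional[Callable[[str], bool]] = None,
    loop_while: Optional[Callable[[str], bool]] = None,
    start: Optional[str] = None,
    stop: Optional[str] = None,
) -> List[str]:
    """Select items strictly between start and stop (both excluded)."""
    start_index = items.index(start) if start is not None else -1
    stop_index = items.index(stop) if stop is not None else len(items)
    selected = []
    for item in items[start_index + 1 : stop_index]:
        if loop_while is not None and not loop_while(item):
            # Stop iterating entirely at the first failing item.
            break
        if select_if is None or select_if(item):
            selected.append(item)
    return selected


tokens = ["SELECT", " ", "a", ",", " ", "b", " ", "FROM", " ", "tbl"]

print(select_between(tokens, start="SELECT", stop="FROM"))
print(select_between(tokens, select_if=lambda t: t.strip() != "", start="SELECT", stop="FROM"))
print(select_between(tokens, loop_while=lambda t: t != ",", start="SELECT"))
```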
segment_predicates Module¶
Defines commonly used segment predicates for rule writers.
For consistency, all the predicates in this module are implemented as functions returning functions. This avoids rule writers having to remember the distinction between normal functions and functions returning functions.
This is not necessarily a complete set of predicates covering all possible requirements. Rule authors can define their own predicates as needed, either as regular functions, lambda, etc.
- and_(*functions: Callable[[BaseSegment], bool]) Callable[[BaseSegment], bool] ¶
Returns a function that computes the functions and-ed together.
- get_type() Callable[[BaseSegment], str] ¶
Returns a function that gets segment type.
- is_code() Callable[[BaseSegment], bool] ¶
Returns a function that checks if segment is code.
- is_comment() Callable[[BaseSegment], bool] ¶
Returns a function that checks if segment is comment.
- is_keyword(*keyword_name: str) Callable[[BaseSegment], bool] ¶
Returns a function that determines if it’s a matching keyword.
- is_meta() Callable[[BaseSegment], bool] ¶
Returns a function that checks if segment is meta.
- is_raw() Callable[[BaseSegment], bool] ¶
Returns a function that checks if segment is raw.
- is_templated() Callable[[BaseSegment], bool] ¶
Returns a function that checks if segment is templated.
- is_type(*seg_type: str) Callable[[BaseSegment], bool] ¶
Returns a function that determines if segment is one of the types.
- is_whitespace() Callable[[BaseSegment], bool] ¶
Returns a function that checks if segment is whitespace.
- not_(fn: Callable[[BaseSegment], bool]) Callable[[BaseSegment], bool] ¶
Returns a function that computes: not fn().
- or_(*functions: Callable[[BaseSegment], bool]) Callable[[BaseSegment], bool] ¶
Returns a function that computes the functions or-ed together.
- raw_is(*raws: str) Callable[[BaseSegment], bool] ¶
Returns a function that determines if segment matches one of the raw inputs.
- raw_slices(segment: BaseSegment, templated_file: TemplatedFile | None) RawFileSlices ¶
Returns raw slices for a segment.
- raw_upper_is(*raws: str) Callable[[BaseSegment], bool] ¶
Returns a function that determines if the segment’s upper-cased raw value matches one of the inputs.
- templated_slices(segment: BaseSegment, templated_file: TemplatedFile | None) TemplatedFileSlices ¶
Returns templated slices for a segment.
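The “functions returning functions” convention used by this module can be sketched as follows. These are toy reimplementations on a hypothetical ToySegment class, written only to show how such predicates compose; they reuse a few of the names listed above but are not the library's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToySegment:
    """Hypothetical segment with just a type and a raw string."""

    seg_type: str
    raw: str


Predicate = Callable[[ToySegment], bool]


def is_type(*seg_types: str) -> Predicate:
    """Build a predicate that checks the segment type."""
    def _(segment: ToySegment) -> bool:
        return segment.seg_type in seg_types
    return _


def raw_upper_is(*raws: str) -> Predicate:
    """Build a predicate that compares the upper-cased raw string."""
    def _(segment: ToySegment) -> bool:
        return segment.raw.upper() in raws
    return _


def and_(*functions: Predicate) -> Predicate:
    """Build a predicate that is true only if all given predicates are true."""
    def _(segment: ToySegment) -> bool:
        return all(fn(segment) for fn in functions)
    return _


def not_(fn: Predicate) -> Predicate:
    """Build a predicate that negates the given predicate."""
    def _(segment: ToySegment) -> bool:
        return not fn(segment)
    return _


# Every predicate is *called* to obtain the function that is then applied.
is_select_keyword = and_(is_type("keyword"), raw_upper_is("SELECT"))

print(is_select_keyword(ToySegment("keyword", "select")))
print(not_(is_select_keyword)(ToySegment("identifier", "select")))
```

Because every predicate is built by a call, parameterless checks and parameterised ones are used the same way, which is the consistency the module description refers to.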
raw_file_slices Module¶
Surrogate class for working with RawFileSlice collections.
- class RawFileSlices(*raw_slices, templated_file=None)¶
Encapsulates a sequence of one or more RawFileSlice.
The slices may or may not be contiguous in a file. Provides useful operations on a sequence of slices to simplify rule creation.
- all(predicate: Callable[[RawFileSlice], bool] | None = None) bool ¶
Do all the raw slices match?
- any(predicate: Callable[[RawFileSlice], bool] | None = None) bool ¶
Do any of the raw slices match?
- select(select_if: Callable[[RawFileSlice], bool] | None = None, loop_while: Callable[[RawFileSlice], bool] | None = None, start_slice: RawFileSlice | None = None, stop_slice: RawFileSlice | None = None) RawFileSlices ¶
Retrieve range/subset.
NOTE: Iterates the slices BETWEEN start_slice and stop_slice, i.e. those slices are not included in the loop.
raw_file_slice_predicates Module¶
Defines commonly used raw file slice predicates for rule writers.
For consistency, all the predicates in this module are implemented as functions returning functions. This avoids rule writers having to remember the distinction between normal functions and functions returning functions.
This is not necessarily a complete set of predicates covering all possible requirements. Rule authors can define their own predicates as needed, either as regular functions, lambda, etc.
- is_slice_type(*slice_types: str) Callable[[RawFileSlice], bool] ¶
Returns a function that determines if the raw file slice is one of the types.