Audience Note

This document is written for anyone who has written a BBEdit language module, either codeless or compiled. It documents the changes in the language module API and provides information that is essential for developing language modules that make the most of the improvements in BBEdit 11.0 and later.

This document supplements the information provided in the Codeless Language Module Reference as well as the information in the "Writing Language Modules" document included as part of the BBEdit SDK.

Useful Debugging Tip

  • Use this one weird trick to debug your compiled language module in BBEdit with minimal effort:

    In your language module project's "Run" scheme (in Xcode), go to the "Info" tab, and from the Executable popup, choose "Ask on Launch".

    Then, go to the Arguments tab, and add an argument as follows:

    --debugLanguageModule $(BUILT_PRODUCTS_DIR)/$(WRAPPER_NAME)

    Now choose "Run". Xcode will ask you to choose the application to run. Choose BBEdit. (Note that it must not be running.) Xcode will then launch BBEdit with the --debugLanguageModule argument as provided above, which tells it to load your language module from its build location. You can then debug it in place.

Run Kinds and Spell Checking

  • Language modules support two new property list keys: BBLMSpellableRunKinds and BBLMNonSpellableRunKinds. Between them these keys can eliminate the need for modules to implement the kBBLMCanSpellCheckRunMessage message, and the appropriate use of these keys adds considerable flexibility.

    Each of these keys is an array, listing the run kinds that can (or cannot) be inspected by the spell checker. The test is exclusionary: BBLMNonSpellableRunKinds is checked first, and if a run is found there, it is not spell checked. If a run is not found in BBLMNonSpellableRunKinds, then BBLMSpellableRunKinds is checked. Either or both of these arrays may contain wildcards.

    A common case is to have BBLMNonSpellableRunKinds contain an appropriate list of run kinds, and BBLMSpellableRunKinds contain a single entry: "*". Thus, specific run kinds listed in BBLMNonSpellableRunKinds are not spell checked, and all other run kinds are. This is a useful and typical construction for text-oriented language modules, such as TeX and Markdown, thus:

    <key>BBLMNonSpellableRunKinds</key>
    <array>
        <string>com.barebones.bblm.TeX.verbatim</string>
        <string>com.barebones.bblm.TeX.inline-verbatim</string>
        <string>com.barebones.bblm.TeX.command</string>
        <string>com.barebones.bblm.TeX.math-string</string>
        <string>com.barebones.bblm.TeX.delimiter-start</string>
        <string>com.barebones.bblm.TeX.delimiter-stop</string>
        <string>com.barebones.bblm.TeX.param-command</string>
        <string>com.barebones.bblm.TeX.param-math-string</string>
        <string>com.barebones.bblm.TeX.param-string-command</string>
    </array>
    
    <key>BBLMSpellableRunKinds</key>
    <array>
        <string>*</string>
    </array>
    

    Typically, a programming or scripting language will want to allow spell checking in comments, but not elsewhere. For example, in the Python module:

    <key>BBLMSpellableRunKinds</key>
    <array>
        <string>com.barebones.bblm.line-comment</string>
        <string>com.barebones.bblm.block-comment</string>
    </array>
    
    <key>BBLMNonSpellableRunKinds</key>
    <array>
        <string>com.barebones.bblm.code</string>
        <string>com.barebones.bblm.double-string</string>
    </array>
    

    Either of these arrays may be absent or empty. Note that if a match is not found (and this includes the case in which BBLMSpellableRunKinds and/or BBLMNonSpellableRunKinds is absent or empty), then BBEdit will still call the language module. If the module does not implement kBBLMCanSpellCheckRunMessage, then the run is not checked.

  • The handling of BBLMSpellableRunKinds and BBLMNonSpellableRunKinds has changed: if at least one of these keys is present, the application will not call the language module with kBBLMCanSpellCheckRunMessage, so the keys should be complete as needed. If either key is absent or fails to match the run kind, the behavior is unspecified (but the application will always try to behave predictably).
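
The exclusion-then-inclusion test described in this section can be modeled in a few lines of C++. This is only a sketch of the documented order of checks, not BBEdit's implementation. The names SpellDecision, wildcardMatch, and decideSpellCheck are hypothetical, and the wildcard rule ("*" matches any sequence of characters) is an assumption, since the notes say only that the arrays may contain wildcards. What happens on NoMatch is left to the caller, because the notes describe that case as unspecified once either key is present.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Outcome of consulting the two plist arrays for one run kind.
enum class SpellDecision { Check, Skip, NoMatch };

// Assumption: "*" stands for any sequence of characters (including none).
static bool wildcardMatch(const std::string &pattern, const std::string &text,
                          std::size_t p = 0, std::size_t t = 0)
{
    if (p == pattern.size())
        return t == text.size();
    if (pattern[p] == '*') {
        // Try every possible length for the wildcard, including zero.
        for (std::size_t next = t; next <= text.size(); ++next)
            if (wildcardMatch(pattern, text, p + 1, next))
                return true;
        return false;
    }
    return t < text.size() && pattern[p] == text[t]
        && wildcardMatch(pattern, text, p + 1, t + 1);
}

// True if any entry of a plist array matches the run kind.
static bool matchesAny(const std::vector<std::string> &patterns,
                       const std::string &runKind)
{
    for (const std::string &pattern : patterns)
        if (wildcardMatch(pattern, runKind))
            return true;
    return false;
}

// BBLMNonSpellableRunKinds is consulted first, then BBLMSpellableRunKinds.
SpellDecision decideSpellCheck(const std::string &runKind,
                               const std::vector<std::string> &nonSpellable,
                               const std::vector<std::string> &spellable)
{
    if (matchesAny(nonSpellable, runKind))
        return SpellDecision::Skip;
    if (matchesAny(spellable, runKind))
        return SpellDecision::Check;
    return SpellDecision::NoMatch;
}
```

With the TeX arrays shown earlier, a run kind listed in BBLMNonSpellableRunKinds yields Skip, and any other run kind yields Check because of the "*" entry. The same order of checks is described for BBLMNonCompletableRunKinds and BBLMCompletableRunKinds.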

Run Kinds and Completion

  • Language modules support two new property list keys: BBLMCompletableRunKinds and BBLMNonCompletableRunKinds. Between them these keys eliminate the need for modules to implement the kBBLMFilterRunForTextCompletion message, and the appropriate use of these keys adds considerable flexibility.

    Each of these keys is an array, listing the run kinds that can (or cannot) be tokenized for autocompletion. The test is exclusionary: BBLMNonCompletableRunKinds is checked first, and if a run is found there, it is not tokenized. If a run is not found in BBLMNonCompletableRunKinds, then BBLMCompletableRunKinds is checked. Either or both of these arrays may contain wildcards.

    Either of these arrays may be absent or empty. Note that if at least one of these is present, the application will not call the language module with kBBLMFilterRunForTextCompletion; and so the keys should be complete as needed. If either key is absent or fails to match the run kind, the behavior is unspecified (but the application will always try to behave predictably).

Keywords, Run Kind Patterns, and More

  • The old kBBLMMatchKeywordMessage message is no longer sent to compiled language modules; only kBBLMMatchKeywordWithCFStringMessage is used, with a CFStringRef parameter.

  • Language modules can now specify arbitrary sets of keywords, each grouped by the run kind that should be used to color them. The BBLMKeywords key is an array of dictionaries. In each dictionary, there is a RunKind key that specifies the run kind to be used (one of the factory-supplied run kinds, or one defined in your language module's BBLMRunColors array), and either a Keywords key whose value is an array of keywords to be colored using that run kind, or a KeywordFileName key which refers to a file in the language module's bundle (for compiled modules).

    So, for example, the BBLMKeywords list looks like this for the built-in PHP language module:

    <key>BBLMKeywords</key>
    <array>
        <dict>
            <key>RunKind</key> <string>com.barebones.bblm.keyword</string>
            <key>KeywordFileName</key> <string>PHP Keywords.txt</string>
        </dict>
    
        <dict>
            <key>RunKind</key> <string>com.barebones.bblm.predefined-symbol</string>
            <key>KeywordFileName</key> <string>PHP Predefined Names.txt</string>
        </dict>
    </array>
    

    Alternatively, you could write something like this:

    <key>BBLMKeywords</key>
    <array>
        <dict>
            <key>RunKind</key> <string>com.barebones.bblm.keyword</string>
            <key>Keywords</key>
            <array>
                <string>abstract</string>
                <string>and</string>
                <string>array</string>
                <string>as</string>
                <string>break</string>
                <string>case</string>
                <string>catch</string>
                <string>cfunction</string>
                <string>class</string>
                <string>clone</string>
                <!-- and so on ... -->
            </array>
        </dict>
    
        <dict>
            <key>RunKind</key> <string>com.barebones.bblm.predefined-symbol</string>
            <key>KeywordFileName</key> <string>PHP Predefined Names.txt</string>
        </dict>
    </array>
    

    The run kinds you can use are not limited to the built-in ones; you can define your own run kinds and color mappings using a BBLMRunColors key, as previously described. You must also add a BBLMRunNames key which maps those run kinds to human-readable names, so that users can adjust the color settings.

    Note that BBLMKeywords supersedes the four old keys, which are still supported but should no longer be used:

    • BBLMKeywordList
    • BBLMKeywordFileName
    • BBLMPredefinedNameList
    • BBLMPredefinedNameFileName

  • kBBLMMatchKeywordWithCFStringMessage and kBBLMMatchPredefinedNameMessage are no longer sent to language modules, and BBLMSupportsCFStringKeywordLookups and BBLMSupportsPredefinedNameLookups are no longer used in module plists. Instead, there's a new key, BBLMSupportsWordLookup, which triggers the sending of a new message: kBBLMRunKindForWordMessage. This allows arbitrary mapping at runtime of words to run kinds, which in turn provides additional flexibility for coloring.

    Static listing of keyword-to-run-kind mapping in the module plist is still desirable (because it's faster), but for situations where the test must be done at runtime based on certain string transformations, implementing kBBLMRunKindForWordMessage support is an appropriate solution.

    The parameters to this message are (input) the potential keyword, and (output) the run kind that should be used to color the word. (If the word is not known, return nil.)

  • Language modules may now use an (optional) key: BBLMKeywordPatterns. This key contains an array of dictionaries, each with two key/value pairs. The first key, RunKind, contains the name of the run (in the module's name space, or one of the factory-defined run kinds). The second key, Pattern, contains a Grep pattern which is used to match the keyword. For example:

    <key>BBLMKeywordPatterns</key>
    <array>
        <dict>
            <key>RunKind</key>
            <string>com.example.bblm.fo</string>
            <key>Pattern</key>
            <string>fo.*</string>
        </dict>
    
        <dict>
            <key>RunKind</key>
            <string>com.example.bblm.fa</string>
            <key>Pattern</key>
            <string>fa.*</string>
        </dict>
    
        <dict>
            <key>RunKind</key>
            <string>com.example.bblm.fl</string>
            <key>Pattern</key>
            <string>fl.*</string>
        </dict>
    </array>
    

    If the module has no static BBLMKeywords entry, or if the word being examined fails to match an entry in the BBLMKeywords entry, then BBEdit will attempt to match the keyword against one of the patterns. If a match is found, the appropriate run kind is generated for coloring.

  • Codeless language modules now support a Number Pattern key in the Language Features property. The Number Pattern key may be omitted; if so, BBEdit will apply a default pattern which matches integers, floating point numbers, and hexadecimal numbers prefixed with 0x.

    Here is the default pattern, in a representation suitable for inclusion in codeless language modules. For readability it's formatted as an SGML CDATA section and uses the (?x:...) pattern modifier for extended syntax, which allows comments and whitespace.

    <![CDATA[(?x: (?# this just turns on extended syntax, which allows whitespace and comments)
    (?<![\d\w.]) (?# must not be preceded by a digit or word char or period)
    (?: (?# non-capturing group for alternation)
    (?# version 1: hex notation like 0x0123456789abcdef)
    (?:0x[[:xdigit:]]+) (?# the number written in hex form)
    |
    (?: (?# version 2: all other numbers, including whole numbers, decimals and exponentials)
    [-+]? (?# optional plus or minus sign is included as part of the number)
    (?: (?# non-capturing group for alternation)
    \d+\.\d+ (?# version 2a: digits followed by a decimal followed by digits)
    |
    \d+ (?# version 2b: just digits)
    )
    (?: (?# optional exponent notation)
    [eE][-+]? (?# with optional pos/neg)
    \d+ (?# numeric portion of the exponent)
    )?
    )
    )
    (?=\b) (?# required word boundary after number. Here a decimal is fine.)
    )]]>
    
  • Codeless language modules support a new key in the Language Features dictionary: Keyword Pattern. This can be used to specify runs of text that are to be colored using the Keywords color, based on a Grep pattern. The intention is to support languages with multi-word "keywords" which contain word-break characters or white space; so the pattern you use should be written accordingly. A pattern that matches across a line boundary will probably produce unexpected results, so we recommend using the non-greedy quantifiers when possible, or character classes which don't include line breaks.

  • If a language module supplies a BBLMRunNameUIOrdering key, the run kinds in that array are used in the specified order to map names for the preferences UI. If no BBLMRunNameUIOrdering key is supplied, the keys in the BBLMRunNames array are sorted alphabetically for presentation in the UI.
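
The fallback from BBLMKeywords to BBLMKeywordPatterns described earlier in this section can be sketched as a two-step lookup. This is an illustrative model, not BBEdit's code. KeywordTables, lookUpRunKind, and makeExampleTables are hypothetical names, std::regex stands in for BBEdit's Grep engine, and two details are assumptions: that a pattern must match the whole word, and that patterns are tried in array order.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of a module's BBLMKeywords and
// BBLMKeywordPatterns entries; these are not SDK types.
struct KeywordTables {
    std::map<std::string, std::string> staticKeywords;         // word -> run kind
    std::vector<std::pair<std::regex, std::string>> patterns;  // pattern -> run kind
};

// Returns the run kind for `word`, or an empty string if nothing matches.
std::string lookUpRunKind(const KeywordTables &tables, const std::string &word)
{
    // 1. A static keyword entry wins when it contains the word.
    auto found = tables.staticKeywords.find(word);
    if (found != tables.staticKeywords.end())
        return found->second;

    // 2. Otherwise fall back to the patterns.
    //    Assumptions: whole-word match, tried in array order.
    for (const auto &entry : tables.patterns)
        if (std::regex_match(word, entry.first))
            return entry.second;

    return std::string();
}

// Builds tables that mirror the plist examples in this section.
KeywordTables makeExampleTables()
{
    KeywordTables tables;
    tables.staticKeywords["abstract"] = "com.barebones.bblm.keyword";
    tables.staticKeywords["clone"] = "com.barebones.bblm.keyword";
    tables.patterns.emplace_back(std::regex("fo.*"), "com.example.bblm.fo");
    tables.patterns.emplace_back(std::regex("fa.*"), "com.example.bblm.fa");
    tables.patterns.emplace_back(std::regex("fl.*"), "com.example.bblm.fl");
    return tables;
}
```

Keeping the static table first reflects the note that static keyword-to-run-kind mappings are faster; the patterns are only consulted for words the table does not know.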

Function Menu Badging

  • Beginning with BBEdit 12.5, plug-in language modules may specify custom badge information for use in the function menu.

    Here's how it works:

    The BBLMFunctionKinds enumeration in BBLMInterface.h provides a pre-defined list of function kinds, and the range between kBBLMFirstUserFunctionKind and kBBLMLastUserFunctionKind is available for use by language modules.

    When you call bblmAddFunctionToList() or bblmUpdateFunctionEntry(), set the fKind field of the function information to a value from the range of built-in function kinds (kBBLMFunctionMark through kBBLMLastUsedFunctionKind - 1), or use a value in the range of kBBLMFirstUserFunctionKind through kBBLMLastUserFunctionKind.

    Note that the range of user function kinds corresponds roughly to the printable ASCII range. This is intentional, because the next thing you'll do is add a section to your language module's language property list. Here is an example for Java:

    <key>BBLMFunctionItemKinds</key>
    <dict>
        <key>P</key>
        <dict>
            <key>typeString</key>
                <string>com.barebones.bblm.Java.package-decl</string>
            <key>displayName</key>
                <string>package declaration</string>
            <key>labelBadgeShape</key>
                <string>circle</string>
            <key>labelCharacter</key>
                <string>p</string>
            <key>labelColorName</key>
                <string>CodeSenseLightRed</string>
        </dict>
    </dict>
    

    Each top-level key in the BBLMFunctionItemKinds dictionary corresponds to the character value that you used for the function information's fKind field. Thus, making it a printable ASCII character is useful for various reasons. The key is required to be a single character.

    For each function kind, the values are as follows:

    typeString: (required) a reverse-domain description of the function type. The form is similar to that used for custom run kinds that you generate: it should begin with your plug-in's bundle identifier.

    displayName: (required) a brief human-readable description of the function type.

    labelBadgeShape: (optional) describes the shape of the badge that appears in the function menu. Allowed values are default, square, circle, triangle, and roundRect. If this key is absent, default is used.

    labelCharacter: (optional) tells BBEdit what character to use in the badge. If this is absent, BBEdit will use the character value of the item kind's key (in the example above, this would be P).

    labelColorName: (optional) tells BBEdit what background color to use for the badge. You may use any CSS3 color name; the following built-in colors are also provided:

    `CodeSenseLightBlue`
    `CodeSenseLightRed`
    `CodeSenseLightGreen`
    `CodeSenseLightPurple`
    `MarkerBadgeColor`
    `CodeSenseOrange`
    `BBEditDarkPurple`
    

    If this key is absent, BBEdit will use CodeSenseLightBlue.

    Note: Use the built-in function kinds whenever possible. For example, if your language has the notion of an object class, use kBBLMFunctionClassDeclaration, kBBLMFunctionClassInterface, or kBBLMFunctionClassImplementation as appropriate, rather than creating your own badge.

    Also, do not attempt to override the built-in mappings.
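
The defaults for the optional badge keys can be summarized in a short sketch. The types FunctionItemKind and ResolvedBadge and the function resolveBadge are hypothetical and exist only to illustrate the rules stated above; an empty string stands for a key that is absent from the plist.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of one BBLMFunctionItemKinds entry. An empty string
// stands for an optional key that is absent from the plist.
struct FunctionItemKind {
    char        kindKey;          // the single-character top-level key, e.g. 'P'
    std::string typeString;       // required
    std::string displayName;      // required
    std::string labelBadgeShape;  // optional
    std::string labelCharacter;   // optional
    std::string labelColorName;   // optional
};

// The badge attributes after the documented defaults are applied.
struct ResolvedBadge {
    std::string shape;
    std::string character;
    std::string colorName;
};

ResolvedBadge resolveBadge(const FunctionItemKind &kind)
{
    ResolvedBadge badge;

    // labelBadgeShape: "default" is used when the key is absent.
    badge.shape = kind.labelBadgeShape.empty()
        ? std::string("default") : kind.labelBadgeShape;

    // labelCharacter: falls back to the item kind's own key character.
    badge.character = kind.labelCharacter.empty()
        ? std::string(1, kind.kindKey) : kind.labelCharacter;

    // labelColorName: CodeSenseLightBlue is used when the key is absent.
    badge.colorName = kind.labelColorName.empty()
        ? std::string("CodeSenseLightBlue") : kind.labelColorName;

    return badge;
}
```

For the Java example above, all three optional keys are present, so the result is a circle badge containing "p" on CodeSenseLightRed. An entry with only typeString and displayName under the key P would resolve to the default shape, the character "P", and CodeSenseLightBlue.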

Generating and Storing Internal Data

Beginning with BBEdit 13.0, compiled language modules now have the ability to generate and use their own document-specific data. (Unless you're writing a compiled language module, you can skip this note.)

This can be for any suitable purpose; for example, if a hypothetical C-family language module wanted to generate an abstract syntax tree for the document using clang, it could do so.

BBEdit does not inspect or use any data created by the language module, nor does it make any assumptions about what's in it. The only rule is that it will be treated as an NSObject and passed through the API boundary as such, but the language module can instantiate it as any NSObject subclass (including one defined by the module itself) and assume that it will be of that type.

The main BBLMParamBlock structure gains the following top-level fields:

  • fDocumentParseData: the module-generated data object for this document

  • fOutDocumentParseDataIsNew: if the module creates a new data object for this document, it should set fDocumentParseData to the new object value, and set fOutDocumentParseDataIsNew to true.

  • fDocumentIdentifier: a unique identifier for the document. The language module can use this to keep track of data for different documents, for the lifetime of the application.

  • fDocumentLocation: if not nil, provides the location of the document's backing file on disk. Note: you cannot assume that the document data on disk is consistent with what's in memory. You should always rely (and continue to rely) on the data provided by fText/fTextLength as authoritative.
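
As a rough illustration of how fDocumentIdentifier might be used to keep module-side bookkeeping per document, here is a small side table. Everything in it is hypothetical: DocumentNotes and DocumentTable are not SDK types, and the sketch assumes the identifier can be reduced to a string key, whereas the real type of the field is defined by the SDK headers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-document bookkeeping kept by the module; not an SDK type.
struct DocumentNotes {
    int recalculationCount = 0;
};

// Side table keyed by document identifier.
// Assumption: the identifier can be reduced to a string key.
class DocumentTable {
public:
    // Returns the entry for a document, creating it on first use.
    DocumentNotes &notesFor(const std::string &documentIdentifier)
    {
        return table_[documentIdentifier];
    }

    // Drops the entry, for example when the document's data is disposed.
    void forget(const std::string &documentIdentifier)
    {
        table_.erase(documentIdentifier);
    }

    // Number of documents currently tracked.
    std::size_t documentCount() const { return table_.size(); }

private:
    std::map<std::string, DocumentNotes> table_;
};
```

A table like this only makes sense for data that lives outside fDocumentParseData; the parse data object itself is handed back and forth through the parameter block.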

There are four new messages relating to the management and lifetime of parse data:

  • kBBLMInitParseDataMessage: When this is called, the language module may allocate any data specific to this document. Note that doing so is not required; you could certainly wait until you receive a kBBLMRecalculateParseDataMessage to do so.

  • kBBLMDisposeParseDataMessage: When this is called, the language module should deallocate any data contained in fDocumentParseData, in the case that it is not intrinsically reference-counted. (Read below for more on this.)

  • kBBLMRecalculateParseDataMessage: When this is called, the language module may calculate from scratch and return any appropriate parse data for the document. fDocumentParseData will be the result of a previous kBBLMInitParseDataMessage. If you opted not to do anything previously, then fDocumentParseData will be nil on entry; you should create it as needed, return it in fDocumentParseData, and set fOutDocumentParseDataIsNew to true.

  • kBBLMUpdateParseDataMessage: When this is called, the parameter block's fUpdateParseDataParams member contains information about the location and nature of the change. You can use this information to incrementally recalculate your parse data; or you can recalculate it all from scratch as though you had received a kBBLMRecalculateParseDataMessage. If you decide to recalculate from scratch and create a new parse data object, put it in fDocumentParseData and set fOutDocumentParseDataIsNew to true.

Important Notes About Object Lifetimes

Under no circumstances should you attempt to assume ownership of the NSObject subclass that you return in fDocumentParseData, even if you are changing its value and setting fOutDocumentParseDataIsNew. If you return a new parse data object, BBEdit will release the old one for you.

Considerations for non-refcounted data

In some cases, your parse data might be a C++ class instance, or even an allocated C structure. In order to pass it back and forth across the API boundary, you must wrap it in an NSValue as a pointer value. In that case, you must also take some care to manage the object lifetime yourself, since BBEdit can't otherwise know what needs to be done with it. Thus, given some hypothetical ParseTree C++ class, you would write something like:

ParseTree   *myParseTree = new ParseTree;
/* ...do some parsing... */
params.fDocumentParseData = [NSValue valueWithPointer: myParseTree];
params.fOutDocumentParseDataIsNew = true;

You would use this pattern in response to kBBLMInitParseDataMessage, but also if you calculated a new parse tree in response to kBBLMRecalculateParseDataMessage or kBBLMUpdateParseDataMessage.

One additional wrinkle, though: when recalculating or updating, if you make a new C++ object, you need to dispose of the old one, but not release the NSValue instance itself. This is because BBEdit doesn't know what's wrapped up in the NSValue, or how it should be managed.

So in the case where you're changing the object during update or recalculate, you'd have code like this:

ParseTree   *oldParseTree = NULL;
ParseTree   *newParseTree = NULL;

oldParseTree = static_cast<ParseTree*>(params.fDocumentParseData.pointerValue);
delete oldParseTree;    //  clean up the old data

newParseTree = new ParseTree;
/* ...do some parsing... */
params.fDocumentParseData = [NSValue valueWithPointer: newParseTree];
params.fOutDocumentParseDataIsNew = true;

When receiving a kBBLMDisposeParseDataMessage, you'll have to do the same:

ParseTree   *oldParseTree = NULL;

oldParseTree = static_cast<ParseTree*>(params.fDocumentParseData.pointerValue);
delete oldParseTree;    //  clean up the old data

Note that you do not ever release params.fDocumentParseData! BBEdit will manage it for you once you've created it. (If you do release it, you'll rapidly find out what a bad idea that was.)
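
One way to sanity-check the ownership rules above outside of BBEdit is to model them in plain C++ with a live-instance counter. The sketch below is a stand-in only: ParseDataSlot plays the role of the NSValue-wrapped pointer and the fOutDocumentParseDataIsNew flag, and the handler names are hypothetical. It does not use the SDK or Foundation, and the NSValue itself (which BBEdit owns) is deliberately not modeled.

```cpp
#include <cassert>

// Stand-in for the hypothetical ParseTree class, with a live-instance
// counter so that leaks and double deletes show up in a test.
struct ParseTree {
    static int liveCount;
    ParseTree()  { ++liveCount; }
    ~ParseTree() { --liveCount; }
    ParseTree(const ParseTree &) = delete;
    ParseTree &operator=(const ParseTree &) = delete;
};
int ParseTree::liveCount = 0;

// Stand-in for the two parameter block fields involved. In a real module
// the pointer travels inside an NSValue that BBEdit manages.
struct ParseDataSlot {
    void *wrappedPointer = nullptr;  // models fDocumentParseData.pointerValue
    bool  isNew = false;             // models fOutDocumentParseDataIsNew
};

// Deletes the tree the slot points at; deleting a null pointer is a no-op.
static void deleteWrappedTree(ParseDataSlot &slot)
{
    delete static_cast<ParseTree *>(slot.wrappedPointer);
    slot.wrappedPointer = nullptr;
}

// Models kBBLMInitParseDataMessage: there is nothing to clean up yet.
void handleInit(ParseDataSlot &slot)
{
    assert(slot.wrappedPointer == nullptr);
    slot.wrappedPointer = new ParseTree;
    slot.isNew = true;
}

// Models kBBLMRecalculateParseDataMessage: delete the old tree, wrap a new one.
void handleRecalculate(ParseDataSlot &slot)
{
    deleteWrappedTree(slot);
    slot.wrappedPointer = new ParseTree;
    slot.isNew = true;
}

// Models kBBLMDisposeParseDataMessage: delete the tree, wrap nothing new.
void handleDispose(ParseDataSlot &slot)
{
    deleteWrappedTree(slot);
    slot.isNew = false;
}
```

Running init, any number of recalculations, and then dispose against one slot should leave ParseTree::liveCount at zero, which is the property the manual delete calls in the snippets above are meant to guarantee.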

