Ghidra Tip 0x07: Iterating over all strings in a Program

This article is based on the public release of Ghidra 11.2.

Ghidra provides an overview of strings within the graphical user interface, but there is no directly accessible API call within the FlatAPI to access it. However, it is possible to obtain them programmatically. The code in this blog comes from a script I wrote for Trellix.

To get the defined strings, one has to use the DefinedDataIterator, which contains the aptly named definedStrings method to get all defined strings. This returns a data iterator with Data instances. The Data object is useful as it contains more information than just the string. The method described in this blog is meant to return a deduplicated list of strings as an ArrayList. If your goal is to modify (some of the found) strings within the Program, keeping the Data object in a mapping with the string value will provide easy access to the location of the string combined with its value. Note that strings which aren’t identified by Ghidra aren’t present in the iterator.
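The mapping idea mentioned above can be sketched without Ghidra on the classpath. The snippet below is only an illustration: the class StringLocationIndex, the helper groupByValue, and the stand-in Hit type are hypothetical names that are not part of Ghidra or of the original script. It groups items by their string value, so every value points to all the places it was found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class StringLocationIndex {

    // Hypothetical stand-in for a Ghidra Data object: an address plus the string found there
    static final class Hit {
        final long address;
        final String value;

        Hit(long address, String value) {
            this.address = address;
            this.value = value;
        }
    }

    // Groups items by their string value; items without a value are skipped
    static <T> Map<String, List<T>> groupByValue(Iterable<T> items, Function<T, String> valueOf) {
        // LinkedHashMap keeps the keys in the order in which they were first seen
        Map<String, List<T>> index = new LinkedHashMap<>();
        for (T item : items) {
            String value = valueOf.apply(item);
            if (value == null) {
                continue;
            }
            index.computeIfAbsent(value, k -> new ArrayList<>()).add(item);
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample data, purely for illustration
        List<Hit> hits = Arrays.asList(
                new Hit(0x401000L, "cmd.exe"),
                new Hit(0x401020L, "kernel32.dll"),
                new Hit(0x402000L, "cmd.exe"));

        Map<String, List<Hit>> index = groupByValue(hits, h -> h.value);
        for (Map.Entry<String, List<Hit>> entry : index.entrySet()) {
            System.out.println(entry.getKey() + " occurs " + entry.getValue().size() + " time(s)");
        }
    }
}
```

Within a Ghidra script, the items would presumably be the Data objects from the iterator, and the value function would be d -> StringDataInstance.getStringDataInstance(d).getStringValue(). Each stored Data object then gives access to its location via getAddress().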

The StringDataInstance class contains the also aptly named getStringDataInstance method. This static method returns a StringDataInstance object for the given Data object, which in turn provides the getStringValue method. Calling this method will return the string’s literal value.

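import ghidra.program.model.data.StringDataInstance;
import ghidra.program.model.listing.Data;
import ghidra.program.util.DefinedDataIterator;

// Inside a GhidraScript, where currentProgram is available
DefinedDataIterator ddi = DefinedDataIterator.definedStrings(currentProgram);
for (Data d : ddi) {
    // Wrap the Data object to read its string value
    StringDataInstance sdi = StringDataInstance.getStringDataInstance(d);
    String s = sdi.getStringValue();
}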

Additional checks can then be included to ensure the string is neither null nor empty, and that its length is at least the provided minimum length. The string can also be converted to lowercase if casing isn’t of importance, which makes searching the list easier later on. To deduplicate the strings easily, they are added to a set. A set cannot contain duplicates: adding an item that is already present is allowed, but the set simply ignores it. Before returning, the set can be converted into an ArrayList, which can then be sorted alphabetically.

List<String> output = new ArrayList<>(strings);
output.sort(String::compareToIgnoreCase);
return output;
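To try the filtering, deduplication, and sorting steps outside of Ghidra, the following is a minimal sketch that runs on a plain list of strings. The class StringDedupSketch and the method dedupAndSort are hypothetical names, and the sample strings are made up. It deviates from the snippet above in two small ways, both assumptions on my part: it lowercases with Locale.ROOT so the result does not depend on the machine's locale, and it adds a tie-breaker to the comparator, since strings that differ only in casing (which can both be present when casing is preserved) would otherwise end up in whatever order the HashSet happened to yield.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StringDedupSketch {

    static List<String> dedupAndSort(List<String> raw, int minimumLength, boolean isCaseSensitive) {
        Set<String> unique = new HashSet<>();
        for (String s : raw) {
            // Skip null, empty, and too-short strings
            if (s == null || s.isEmpty() || s.length() < minimumLength) {
                continue;
            }
            // A set silently ignores values it already contains
            unique.add(isCaseSensitive ? s : s.toLowerCase(Locale.ROOT));
        }

        List<String> output = new ArrayList<>(unique);
        // Sort ignoring case first; the exact comparison only breaks ties such as "Foo" vs "foo"
        output.sort(String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder()));
        return output;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "kernel32.dll", "KERNEL32.DLL", "cmd", "", null, "ws2_32.dll", "kernel32.dll");

        System.out.println(dedupAndSort(raw, 4, false)); // prints [kernel32.dll, ws2_32.dll]
        System.out.println(dedupAndSort(raw, 4, true));  // prints [KERNEL32.DLL, kernel32.dll, ws2_32.dll]
    }
}
```

Without the tie-breaker the output is still correct, but the relative order of case variants may differ between runs or Java versions.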

The entire code is given below and can also be found here.

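/**
 * Gets all defined strings from the current program, deduplicated and sorted
 * alphabetically.
 *
 * @param minimumLength   the minimum length of the string to be included in the
 *                        matches
 * @param isCaseSensitive if false, strings are converted to lower case before
 *                        they are deduplicated
 * @return all strings that are at least as long as the minimum length
 */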
private List<String> getStringsFromCurrentProgram(int minimumLength, boolean isCaseSensitive) {
    // Create a set to store all strings in
    Set<String> strings = new HashSet<>();
 
    // Get a data iterator
    DefinedDataIterator ddi = DefinedDataIterator.definedStrings(currentProgram);
    // Iterate over the data iterator
    for (Data d : ddi) {
        // Get an instance of the currently selected data
        StringDataInstance sdi = StringDataInstance.getStringDataInstance(d);
        // Get the string value of said string
        String s = sdi.getStringValue();
 
        // If the string is not null nor empty
        if (s != null && s.isEmpty() == false) {
            /*
             * If the length of the string is equal to, or larger than the predefined
             * minimum length
             */
            if (s.length() >= minimumLength) {
                /*
                 * If there is no casing check, convert the string to lower case, for easier
                 * checking later on
                 */
                if (isCaseSensitive == false) {
                    s = s.toLowerCase();
                }
                // Add the string to the set
                strings.add(s);
            }
        }
    }
 
    /*
     * Sets do not contain duplicate items by their nature, but cannot always be
     * accessed in the same way as a list can (i.e. when sorting, given the
     * hashset's nature).
     * 
     * The unique strings from the set are stored in a newly created array list,
     * which maintains the order. Next, they are sorted alphabetically, ignoring the
     * casing during sorting, so the result is not ordered as A-Z followed by a-z,
     * but rather as A through Z in any casing.
     */
    List<String> output = new ArrayList<>(strings);
    output.sort(String::compareToIgnoreCase);
    return output;
}

To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], or DM me on Twitter @Libranalysis.