API: Object sets

An object set represents an unordered collection of objects of a single type. You can use the Functions APIs to filter object sets, perform Search Arounds to other object types based on defined link types, and compute aggregated values or retrieve the concrete objects. In addition to passing individual objects as inputs into a Function, you can search for object sets at any time using the object search APIs.

Filtering, ordering, and aggregations only work on properties that have the Searchable render hint enabled in the Ontology app. These properties have been indexed for search. Learn how to enable the Searchable render hint.

The Objects.search() interface allows you to initiate a search for any of the object types imported into your project. In this example, the Function uses the given airportCode to find all flights that departed from that airport. Then, it finds all the distinct destinations of those flights and returns them.

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 import { Function } from "@foundry/functions-api"; import { Objects } from "@foundry/ontology-api"; export class FlightFunctions { @Function() public currentFlightDestinations(airportCode: string): Set<string> { const flightsFromAirport = Objects.search() .flights() .filter(flight => flight.departureAirportCode.exactMatch(airportCode)) .all(); const destinations = flightsFromAirport.map(flight => flight.arrivalAirportCode!); return new Set(destinations); } }

Object sets can also be created from a list of objects, list of object resource identifiers or an object set resource identifier by passing them as an argument to the searched object type. For example: Objects.search().flights([flight]).

Once you have an object set of a given type, you can perform various operations on the set as documented below.

Filtering

The .filter() method on an object set allows you to filter the object set based on the searchable properties of the objects. The filter method takes a filter definition, which is based on the type of the property you are filtering on.

  • All property types support the .exactMatch() filter, which filters to objects with an exact match on that property value. This is useful to filter for exact matches on strings (as in the example above), or to filter on the primary key of an object (for example,.filter(object => object.primaryKey.exactMatch(PrimaryKey))).
    • To check whether a property is null or undefined, use the hasProperty() method.
    • To pass multiple values, use the spread operator .exactMatch(...listVariable). If an empty array is passed in, the filter will be ignored.
  • String properties support several keyword filters. See the documentation on each method in Code Assist for full details.
    • .phrase() splits the search query into tokens (usually individual words) and then filters values based on whether they contain all of the given tokens in order with no other tokens in between. Note that string values that are separated by underscores or periods will be treated as one token. For example, when searching for "banana", an object with the property value "banana_pudding" or "banana.pudding" will not be returned.
    • .phrasePrefix() is almost identical to phrase(), but the last token will also match tokens starting with that token. For example, a .phrasePrefix() search for fresh banana would match the property value fresh banana_pudding, but not the property value banana_pudding fresh. A .phrasePrefix() search for pudding would not match the property value banana_pudding.
    • .prefixOnLastToken() splits the search query into tokens and then filters values based on whether they contain all of the given tokens, where the last token will also match tokens starting with that token. For example, fresh app would match fresh apples as well as apples from the tree.
    • .matchAnyToken(), .fuzzyMatchAnyToken() split the search query into tokens and then filter values based on whether they contain any of the given tokens. The fuzzy version allows approximate values to match.
    • .matchAllTokens(), .fuzzyMatchAllTokens() split the search query into tokens and then filter values based on whether they contain all of the given tokens. The fuzzy version allows approximate values to match.
      • Fuzzy filters can take an optional Fuzziness parameter imported from @foundry/functions-api.
      • Explanations of the available Fuzziness options can be found in the ElasticSearch documentation ↗. More information can also be found below.
  • Numbers, dates, and timestamp properties support .range() filters.
    • Range filters have a set of .lt(), .lte(), .gt() and gte() methods for performing less than / less than or equal to / greater than / greater than or equal to (respectively) comparisons.
  • Boolean properties support .isTrue() and .isFalse() filters.
  • Geohash properties support .withinDistanceOf(), .withinPolygon(), and .withinBoundingBox() filters.
  • GeoShape properties support .withinBoundingBox(), .intersectsBoundingBox(), .doesNotIntersectBoundingBox(), .withinPolygon(), .intersectsPolygon(), and doesNotIntersectPolygon() filters.
  • Link filters can be used to filter objects that do or do not have any linked objects of a specific type using the .isPresent() method.
  • Array properties support the .contains() filter, which filters to objects whose array property values contain any of the given values.

Combining filters

You can compose filters together using the Filters API exported from @foundry/functions-api. The available methods are:

  • and() filters the object set to objects that pass all the given filters
  • or() filters the object set to objects that pass any of the given filters
  • not() negates the given filter

In the example below, we can filter an object set of flights by flight destination using and():

Copied!
1 2 3 4 5 6 7 8 9 import { Filters } from "@foundry/functions-api"; Objects.search() .flights() .filter(flight => Filters.or( Filters.and(flight.destination.exactMatch("SFO"), flight.passengerCount.gt(100)), Filters.and(flight.destination.exactMatch("LAX"), flight.passengerCount.gt(300)), ))

The above code would filter to flights that either arrived at SFO with more than 100 passengers or arrived at LAX with more than 300 passengers.

Warning

The .filter() method on an object set does not use the operators && or ||. To apply multiple filters, you must use one of the methods on Filters listed above (or call .filter() multiple times to achieve an and condition).

Specifying the optional fuzziness parameter can provide more fine-tuned control over Fuzzy matching behavior. If you do not specify fuzziness, then an automatic edit distance is allowed based on the length of the token you are searching for. You will need to import Fuzziness from @foundry/functions-api in order to specify edit distance.

Fuzzy match any token

Objects.search().employee().filter(employee => employee.firstName.fuzzyMatchAnyToken("Michael", { fuzziness: Fuzziness.LEVENSHTEIN_TWO })).all();

The code above returns any employees with a first name within two edits of the provided search term (with Levenshtein distance of two). In this example, that would include Michael, Micheal, Mikhael, Michel, Mikhail, Mihail (but not Miguel for example). If you have more certainty in the accuracy of your search term, you can search with a smaller edit distance (with different Levenshtein distances), refining your search results a little more.

Fuzzy match all tokens

Objects.search().employee().filter(employee => employee.fullName.fuzzyMatchAllTokens("Michael Smith", { fuzziness: Fuzziness.LEVENSHTEIN_ONE })).all();

You can also use fuzzy filters on a multiple token phrase. The code above would match on employees whose full name contains both Michael and Smith with up to one edit in each token - for example, Mikhael Smitt (that is, each with a Levenshtein distance of one each). The ordering of tokens is not taken into account with a fuzzyMatchAllTokens or fuzzyMatchesAllTokens filter.

Fuzzy match on string array properties

All filters on array-based properties can use the methods available to their underlying type. For example, string array properties can be filtered based on any methods available to string properties, though the naming of the methods may differ slightly. Filtering on array properties requires a single match among the array elements in order for that object to be returned.

Search Around

Search around limits

Object sets loaded into memory .all() or .allAsync() are allowed to have a maximum of 3 search arounds. If more than 3 search arounds are used, an error is thrown. When performing a search around from object set A to object set B in Object Storage V2, the resulting object set B cannot have more than 10 million object instances, or an error will be thrown. For Object Storage V1, the limit is 100,000 object instances.

Based on the object type of your object set, Search Around methods are generated to enable traversing links based on the object type of your object set. In the below example, we filter to an object set of Flights based on the departure code, then Search Around to the passengers on those flights. This results in an object set of Passengers, which can be further filtered or searched around on.

Copied!
1 2 3 4 const passengersDepartingFromAirport = Objects.search() .flights() .filter(flight => flight.departureAirportCode.exactMatch(airportCode)) .searchAroundPassengers();

Search Around methods will only be generated for link types that are imported into your project. Refer to the tutorial for details on how to import link types.

Note that for performance reasons, the number of Search Around operations you can conduct in a single search is currently limited to 3. If you attempt to run a search with more than three levels of Search Around depth, the search will fail at runtime.

K-nearest neighbors (KNN)

KNN Limits

KNN is only supported on object types indexed into OSv2. The k value is limited to the range 0 < K <= 100. Also, the search vector must be the same size as the one used for indexing and has a 2048 dimension limit. An error will be thrown if any of these limits are exceeded.

Object types with embedding properties will be available for KNN searches. These searches will return the k value objects that have an embedding property nearest to the provided embedding parameter. The following example returns the most similar movies to a provided movie script. Generating embeddings at query time will require the set up of a modeling live deployment.

Make sure that the functions.json has the enableVectorProperties set to true.

Copied!
1 2 3 4 5 6 7 8 9 10 import { Objects } from "@foundry/ontology-api"; const kValue: number = 2; // Vector can be generated from FML Live or come from an existing object const vector: Double[] = [0.7, 0.1, 0.3]; const movies: Movies[] = Objects.search() .movies() .nearestNeighbors(obj => obj.vectorProperty.near(vector, { kValue })) .orderByRelevance() .take(kValue);

For an example of a full semantic search workflow, review the semantic search workflow guide.

Set operations

Object sets of the same object type can be combined in various ways using set operations:

  • .union() creates a new object set composed of objects present in any of the given object sets.
  • .intersect() creates a new object set composed of objects present in all of the given object sets.
  • .subtract() removes any objects present in the given object sets.

Retrieving all objects

The .all() and .allAsync() methods retrieve all objects in the object set. Note that if you attempt to load too many objects at once, your Function will fail to execute. Currently, the maximum number of objects you can load is 100,000. However, loading more than 10,000 objects may also cause your Function execution to time out. Learn more about time and space limits in Functions.

You can use the .allAsync() method to retrieve a Promise that resolves to all the objects in the object set. This can be useful for loading data from multiple object sets in parallel.

Ordering and limiting

Instead of retrieving all objects, you can load a limited number by applying an ordering clause to your object set, then specifying a specific number of objects to load. To do this, you can use the following methods:

  • .orderBy() specifies a searchable property to order by, and allows you to specify an ordering direction. Only properties whose types can be ordered (numbers, dates, and strings) are available for selection in this method. You can call .orderBy() multiple times to sort by multiple properties.
  • .orderByRelevance() specifies that the objects should be returned in order of how well they match the provided filters, with the most relevant listed first. Relevance for a query term against a property value on a given object is a complex determination that takes into account the frequency of the term appearing in the property value, the frequency of the term appearing across all objects, and more. Relevance is less appropriate when performing only .exactMatch() filters or filtering on non-string properties. Note that only one of .orderBy() and .orderByRelevance() may be used in a single search.
  • .take() and .takeAsync() enable you to retrieve a specified number of objects from the set. These methods are only available after you have specified an ordering.

For example, the following code would retrieve the ten employees with the earliest start dates:

Copied!
1 2 3 4 Objects.search() .employees() .orderBy(e => e.startDate.asc()) .take(10)

As another example, imagine an object type claims which contains text of accident claims for an insurance company. We'd like to find a specific claim involving a red car and a deer. Without the .orderByRelevance() line, any results containing any of the words red, car, collision, with, or deer may have been returned in the top 10 results. With the .orderByRelevance() line, the first 10 results will be the claims that contain the most search terms, so that the most relevant claims will appear first.

Copied!
1 2 3 4 5 const results = Objects.search() .claims() .filter(doc => doc.text.matchAnyToken("red car collision with deer")) .orderByRelevance() .take(10)

Computing aggregations

Aggregation limits

Aggregations returned from the Objects API are limited to 10,000 total buckets. An error will be thrown if this limit is exceeded.

When bucketing using .topValues(), results will be approximate if the data has more than 1,000 distinct values. The list of top values may not be accurate in that case.

Grouping objects by properties

In many cases, it's unnecessary to load all of the objects in your object set. Instead, you can simply load a bucketed aggregation of values to conduct further analysis.

To begin computing an aggregation, call the .groupBy() method on an object set. This allows you to specify bucketing on one of the searchable properties of the object type in the object set. For example, this code groups employees by their start date:

Copied!
1 2 3 Objects.search() .employees() .groupBy(e => e.startDate.byDays())

When specifying which property to bucket by, you will have to provide additional information about how the bucketing should be done depending on the property type:

  • For boolean properties, the only option is .topValues(). This returns two buckets, one for true and one for false.
  • For string properties, there are two options:
    • .topValues(): For rapid response times and properties with a smaller cardinality. This buckets by the top 1,000 values for the string property. This limit is to ensure that the returned aggregation is not excessively large.
    • .exactValues(): For more exact aggregations and the possibility to consider up to 10,000 buckets for high cardinality properties. The amount of considered buckets can be specified via .exactValues({"maxBuckets": numBuckets}) where numBuckets must be an integer value between 0 and 10,000. The response time for this method can take longer, as more results have to be considered.
  • For numeric properties (e.g. Integer, Long, Float, Double), the two bucketing options are:
    • .byRanges() allows you to specify the exact ranges that should be used. For example, you could use .byRanges({ min: 0, max: 50 }, { min: 50, max: 100 }) to bucket objects into the two ranges of [0, 50] and [50, 100] that you specify here. The min of the range is inclusive and the max is exclusive. You may omit either min or max to represent a bucket containing values from -∞ to max or min to ∞ respectively.
    • .byFixedWidth() specifies the width of each bucket. For example, you could use .byFixedWidth(50) to bucket objects into ranges that each have a width of 50.
  • For LocalDate properties, various convenience methods are provided for easy bucketing:
    • .byYear()
    • .byQuarter()
    • .byMonth()
    • .byWeek()
    • .byDays() buckets values into days. You may pass in a number of days to use for bucket widths.
  • For Timestamp properties, the same bucketing options apply as for LocalDate, as well as the following additions:
    • .byHours() buckets values by hours. You may pass in a number of hours to use for bucket widths.
    • .byMinutes() buckets values by minutes. You may pass in a number of minutes to use for bucket widths.
    • .bySeconds() buckets values by seconds. You may pass in a number of seconds to use for bucket widths.
  • For Array properties, the bucketing options are determined by the type of the elements in the array. In particular, you get the same bucketing methods for Array<PropertyType> as you would get for the PropertyType (for example, Array<boolean> gets the same bucketing methods as boolean).
    • For example, if you have an Array<string> called employeeSet consisting of Alice and Bob who have respectively worked in ["US", "UK"] and ["US"]. Then employeeSet.groupBy(e => e.pastCountries.exactValue()).count() will return { "US": 2, "UK": 1 }.

After grouping by one property, you may optionally call the .segmentBy() method to perform further bucketing. This allows you to compute a three-dimensional aggregation bucketed by two searchable properties. For example, you could group employees by their start date as well as their role as follows:

Copied!
1 2 3 4 Objects.search() .employees() .groupBy(e => e.startDate.byDays()) .segmentBy(e => e.role.topValues())

Choosing an aggregation metric

After grouping your object set, you can call various aggregation methods to compute aggregation metrics on each bucket. Methods that require a property only accept properties marked searchable. Possible aggregation methods are:

  • .count() simply returns the number of objects in each bucket
  • .average() returns the average number for the given numeric, timestamp, date property
  • .max() returns the maximum value for the given numeric, timestamp, date property
  • .min() returns the minimum value for the given numeric, timestamp, date property
  • .sum() returns the sum of values for the given numeric property
  • .cardinality() returns the approximate number of distinct values for the given property

Calling one of these methods returns either a TwoDimensionalAggregation or ThreeDimensionalAggregation. A ThreeDimensionalAggregation is returned if you called .segmentBy() before calling one of the final aggregation methods.

Learn more about the structure of these aggregation types, including valid bucketing types.

Note that the resulting aggregations are wrapped in a Promise, as computing the aggregation requires loading data from a remote service. You can use the async/await ↗ syntax to unwrap the Promise result.

Below is a full example of loading an aggregation and returning it as a result.

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 import { Function, ThreeDimensionalAggregation } from "@foundry/functions-api"; import { Objects } from "@foundry/ontology-api"; export class AggregationFunctions { @Function() public async employeesByRoleAndOffice(): Promise<ThreeDimensionalAggregation<string, string>> { return Objects.search() .employee() .groupBy(e => e.title.topValues()) .segmentBy(e => e.office.topValues()) .count(); } }

Below is a full example of aggregating without groupBy statements:

Copied!
1 2 3 4 5 6 7 8 9 10 import { Function } from "@foundry/functions-api"; import { Objects } from "@foundry/ontology-api"; export class AggregationFunctions { @Function() public async employeesStats(): Promise<Double> { // Count of all employees, default to zero if count() returns undefined return Objects.search().employee().count() ?? 0; } }

You can also perform other aggregations without groupBy by replacing the appropriate line in the code example above, such as:

  • Count of all employees: Objects.search().employee().count(); (as seen in example above)
  • Average tenure of employees: Objects.search().employee().average(e => e.tenure);
  • Maximum tenure of employees: Objects.search().employee().max(e => e.tenure);
  • Minimum tenure of employees: Objects.search().employee().min(e => e.tenure);
  • Sum of all employee salaries: Objects.search().employee().sum(e => e.salary);
  • Number of offices: Objects.search().employee().cardinality(e => e.office);

For an example of manipulating aggregation results in memory, try the guide for creating custom aggregations.