An object set represents an unordered collection of objects of a single type. You can use the Functions APIs to filter object sets, perform Search Arounds to other object types based on defined link types, and compute aggregated values or retrieve the concrete objects. In addition to passing individual objects as inputs into a Function, you can search for object sets at any time using the object search APIs.
Filtering, ordering, and aggregations only work on properties that have the Searchable
render hint enabled in the Ontology app. These properties have been indexed for search. Learn how to enable the Searchable
render hint.
The Objects.search()
interface allows you to initiate a search for any of the object types imported into your project. In this example, the Function uses the given airportCode
to find all flights that departed from that airport. Then, it finds all the distinct destinations of those flights and returns them.
    import { Function } from "@foundry/functions-api";
    import { Objects } from "@foundry/ontology-api";

    export class FlightFunctions {
        @Function()
        public currentFlightDestinations(airportCode: string): Set<string> {
            const flightsFromAirport = Objects.search()
                .flights()
                .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
                .all();

            const destinations = flightsFromAirport.map(flight => flight.arrivalAirportCode!);

            return new Set(destinations);
        }
    }
Object sets can also be created from a list of objects, a list of object resource identifiers, or an object set resource identifier by passing them as an argument to the searched object type. For example: Objects.search().flights([flight]).
Once you have an object set of a given type, you can perform various operations on the set as documented below.
The .filter()
method on an object set allows you to filter the object set based on the searchable properties of the objects. The filter method takes a filter definition, which is based on the type of the property you are filtering on.
- The .exactMatch() filter filters to objects with an exact match on that property value. This is useful to filter for exact matches on strings (as in the example above), or to filter on the primary key of an object (for example, .filter(object => object.primaryKey.exactMatch(PrimaryKey))).
  - To filter on whether a property is null or undefined, use the hasProperty() method.
  - To match any value in a list, spread the list into the call: .exactMatch(...listVariable). If an empty array is passed in, the filter will be ignored.
- String properties support the following token-based filters:
  - .phrase() splits the search query into tokens (usually individual words) and then filters values based on whether they contain all of the given tokens in order with no other tokens in between. Note that string values that are separated by underscores or periods will be treated as one token. For example, when searching for "banana", an object with the property value "banana_pudding" or "banana.pudding" will not be returned.
  - .phrasePrefix() is almost identical to phrase(), but the last token will also match tokens starting with that token. For example, a .phrasePrefix() search for fresh banana would match the property value fresh banana_pudding, but not the property value banana_pudding fresh. A .phrasePrefix() search for pudding would not match the property value banana_pudding.
  - .prefixOnLastToken() splits the search query into tokens and then filters values based on whether they contain all of the given tokens, where the last token will also match tokens starting with that token. For example, big app would match big apples as well as apples from the big tree, though it would not match apples from the biggest tree.
  - .matchAnyToken() and .fuzzyMatchAnyToken() split the search query into tokens and then filter values based on whether they contain any of the given tokens. The fuzzy version allows approximate values to match.
  - .matchAllTokens() and .fuzzyMatchAllTokens() split the search query into tokens and then filter values based on whether they contain all of the given tokens. The fuzzy version allows approximate values to match.
  - The fuzzy filters accept an optional Fuzziness parameter imported from @foundry/functions-api.
  - Details on the Fuzziness options can be found in the ElasticSearch documentation ↗. More information can also be found below.
- .range() filters provide .lt(), .lte(), .gt(), and .gte() methods for performing less than / less than or equal to / greater than / greater than or equal to (respectively) comparisons.
- Boolean properties support .isTrue() and .isFalse() filters.
- Geospatial properties support, depending on their type, either the .withinDistanceOf(), .withinPolygon(), and .withinBoundingBox() filters, or the .withinBoundingBox(), .intersectsBoundingBox(), .doesNotIntersectBoundingBox(), .withinPolygon(), .intersectsPolygon(), and .doesNotIntersectPolygon() filters.
- An .isPresent() method is also available.
- Array properties support the .contains() filter, which filters to objects whose array property values contain any of the given values.

You can compose filters together using the Filters API exported from @foundry/functions-api. The available methods are:

- and() filters the object set to objects that pass all the given filters
- or() filters the object set to objects that pass any of the given filters
- not() negates the given filter

In the example below, we filter an object set of flights by destination and passenger count using and() and or():
    import { Filters } from "@foundry/functions-api";
    import { Objects } from "@foundry/ontology-api";

    Objects.search()
        .flights()
        .filter(flight => Filters.or(
            Filters.and(flight.destination.exactMatch("SFO"), flight.passengerCount.gt(100)),
            Filters.and(flight.destination.exactMatch("LAX"), flight.passengerCount.gt(300)),
        ))
The above code would filter to flights that either arrived at SFO with more than 100 passengers or arrived at LAX with more than 300 passengers.
The .filter()
method on an object set does not use the operators &&
or ||
. To apply multiple filters, you must use one of the methods on Filters
listed above (or call .filter()
multiple times to achieve an and
condition).
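As a rough mental model of the token-based string filters described above, the sketch below reimplements the .phrase(), .phrasePrefix(), and .prefixOnLastToken() semantics on plain strings. It is not the search service's implementation: the whitespace-only tokenizer and the helper names (tokenize, phraseMatch, prefixOnLastTokenMatch) are assumptions made purely for illustration.

```typescript
// Assumption: tokens are produced by a plain whitespace split, so values such as
// "banana_pudding" or "banana.pudding" stay a single token, as the text describes.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/\s+/).filter(token => token.length > 0);

// Models phrase(): all query tokens appear consecutively and in order.
// With prefixLast = true it models phrasePrefix(): the last query token may
// match the beginning of a value token. An empty query matches nothing here.
function phraseMatch(value: string, query: string, prefixLast = false): boolean {
  const valueTokens = tokenize(value);
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return false;
  for (let start = 0; start + queryTokens.length <= valueTokens.length; start++) {
    const matches = queryTokens.every((queryToken, i) => {
      const valueToken = valueTokens[start + i];
      const isLast = i === queryTokens.length - 1;
      return prefixLast && isLast
        ? valueToken.startsWith(queryToken)
        : valueToken === queryToken;
    });
    if (matches) return true;
  }
  return false;
}

// Models prefixOnLastToken(): every query token must occur somewhere in the
// value, in any order; only the last query token may match as a prefix.
function prefixOnLastTokenMatch(value: string, query: string): boolean {
  const valueTokens = tokenize(value);
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return false;
  return queryTokens.every((queryToken, i) =>
    i === queryTokens.length - 1
      ? valueTokens.some(valueToken => valueToken.startsWith(queryToken))
      : valueTokens.some(valueToken => valueToken === queryToken)
  );
}
```

Under these assumptions, phraseMatch("fresh banana_pudding", "fresh banana", true) matches while phraseMatch("banana_pudding", "banana") does not, mirroring the examples given for the real filters.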
Specifying the optional fuzziness parameter gives finer control over fuzzy matching behavior. If you do not specify fuzziness, an automatic edit distance is allowed based on the length of the token you are searching for. To specify an edit distance, import Fuzziness from @foundry/functions-api.
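    import { Fuzziness } from "@foundry/functions-api";
    import { Objects } from "@foundry/ontology-api";

    Objects.search()
        .employee()
        .filter(employee => employee.firstName.fuzzyMatchAnyToken("Michael", { fuzziness: Fuzziness.LEVENSHTEIN_TWO }))
        .all();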
The code above returns any employees with a first name within two edits of the provided search term (a Levenshtein distance of at most two). In this example, that would include Michael, Micheal, Mikhael, Michel, Mikhail, and Mihail, but not Miguel. If you are more certain of the accuracy of your search term, you can search with a smaller edit distance to narrow the results.
    Objects.search()
        .employee()
        .filter(employee => employee.fullName.fuzzyMatchAllTokens("Michael Smith", { fuzziness: Fuzziness.LEVENSHTEIN_ONE }))
        .all();
You can also use fuzzy filters on a multiple-token phrase. The code above would match employees whose full name contains both Michael and Smith with up to one edit in each token (a Levenshtein distance of at most one per token), for example Mikhael Smitt. The ordering of tokens is not taken into account by a fuzzyMatchAnyToken or fuzzyMatchAllTokens filter.
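To make the notion of edit distance concrete, the following sketch computes the classic Levenshtein distance (insertions, deletions, and substitutions each cost one edit) and applies it to the names mentioned above. The levenshtein helper is written for illustration and is not part of @foundry/functions-api; a search backend may also count a swap of two adjacent characters as a single edit, which this sketch does not.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  // previousRow[j] = distance between the first i - 1 characters of a
  // and the first j characters of b.
  let previousRow: number[] = [];
  for (let j = 0; j <= b.length; j++) previousRow.push(j);

  for (let i = 1; i <= a.length; i++) {
    const currentRow: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      currentRow.push(Math.min(
        previousRow[j] + 1,                    // delete a character from a
        currentRow[j - 1] + 1,                 // insert a character into a
        previousRow[j - 1] + substitutionCost  // substitute, or keep on a match
      ));
    }
    previousRow = currentRow;
  }
  return previousRow[b.length];
}

// Names from the text, filtered to those within two edits of the search term.
const candidateNames = ["Michael", "Micheal", "Mikhael", "Michel", "Mikhail", "Mihail", "Miguel"];
const withinTwoEdits = candidateNames.filter(name => levenshtein("Michael", name) <= 2);
```

Here withinTwoEdits keeps every name except Miguel, which is three edits away from Michael.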
All filters on array-based properties can use the methods available to their underlying type. For example, string array properties can be filtered based on any methods available to string properties, though the naming of the methods may differ slightly. Filtering on array properties requires a single match among the array elements in order for that object to be returned.
Object sets loaded into memory with .all() or .allAsync() are allowed a maximum of 3 Search Arounds. If more than 3 Search Arounds are used, an error is thrown. When performing a Search Around from object set A to object set B in Object Storage V2, the resulting object set B cannot contain more than 10 million object instances, or an error will be thrown. For Object Storage V1, the limit is 100,000 object instances.
Search Around methods are generated based on the object type of your object set and enable you to traverse its links. In the example below, we filter to an object set of Flights based on the departure airport code, then Search Around to the passengers on those flights. This results in an object set of Passengers, which can be further filtered or searched around on.
    const passengersDepartingFromAirport = Objects.search()
        .flights()
        .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
        .searchAroundPassengers();
Search Around methods will only be generated for link types that are imported into your project. Refer to the tutorial for details on how to import link types.
Note that for performance reasons, the number of Search Around operations you can conduct in a single search is currently limited to 3. If you attempt to run a search with more than three levels of Search Around depth, the search will fail at runtime.
KNN is only supported on object types indexed into OSv2. The k value is limited to the range 0 < K <= 100. Also, the search vector must be the same size as the one used for indexing and has a 2048 dimension limit. An error will be thrown if any of these limits are exceeded.
Object types with embedding properties are available for KNN searches. These searches return the k objects whose embedding property is nearest to the provided embedding parameter. The following example returns the movies most similar to a provided movie script. Generating embeddings at query time requires setting up a modeling live deployment.
Make sure that enableVectorProperties is set to true in functions.json.
    import { Double } from "@foundry/functions-api";
    import { Objects, Movies } from "@foundry/ontology-api";

    const kValue: number = 2;
    // Vector can be generated from FML Live or come from an existing object
    const vector: Double[] = [0.7, 0.1, 0.3];
    const movies: Movies[] = Objects.search()
        .movies()
        .nearestNeighbors(obj => obj.vectorProperty.near(vector, { kValue }))
        .orderByRelevance()
        .take(kValue);
For an example of a full semantic search workflow, review the semantic search workflow guide.
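The limits stated above for KNN searches (0 < K <= 100, a vector of at most 2048 dimensions, and the same size as the indexed vectors) can be checked before a search is issued. The helper below is a hypothetical pre-check written for illustration; its name, its parameters, and the idea of passing the indexed dimension count explicitly are assumptions, not part of the Functions API.

```typescript
const MAX_K = 100;             // documented upper bound for the k value
const MAX_DIMENSIONS = 2048;   // documented upper bound for vector size

// Returns a list of problems; an empty list means the parameters respect
// the documented limits. indexedDimensions is supplied by the caller.
function checkKnnParameters(
  kValue: number,
  queryVector: number[],
  indexedDimensions: number
): string[] {
  const problems: string[] = [];
  if (!(kValue > 0 && kValue <= MAX_K)) {
    problems.push(`kValue must satisfy 0 < kValue <= ${MAX_K}`);
  }
  if (queryVector.length > MAX_DIMENSIONS) {
    problems.push(`vector has more than ${MAX_DIMENSIONS} dimensions`);
  }
  if (queryVector.length !== indexedDimensions) {
    problems.push("vector size differs from the size used for indexing");
  }
  return problems;
}
```

Failing fast in this way is only a convenience; the search itself still reports an error when a limit is exceeded.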
Object sets of the same object type can be combined in various ways using set operations:
- .union() creates a new object set composed of objects present in any of the given object sets.
- .intersect() creates a new object set composed of objects present in all of the given object sets.
- .subtract() removes any objects present in the given object sets.

The .all() and .allAsync() methods retrieve all objects in the object set. Note that if you attempt to load too many objects at once, your Function will fail to execute. Currently, the maximum number of objects you can load is 100,000. However, loading more than 10,000 objects may also cause your Function execution to time out. Learn more about time and space limits in Functions.
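The set operations .union(), .intersect(), and .subtract() listed above behave like ordinary set algebra. The sketch below models object sets as plain Set values of hypothetical flight IDs; the three helper functions are illustrative stand-ins and not the object set methods themselves.

```typescript
// Objects present in any of the given sets.
function union<T>(...sets: Set<T>[]): Set<T> {
  const result = new Set<T>();
  sets.forEach(set => set.forEach(item => result.add(item)));
  return result;
}

// Objects present in the first set and in every other given set.
function intersect<T>(first: Set<T>, ...others: Set<T>[]): Set<T> {
  return new Set(Array.from(first).filter(item => others.every(other => other.has(item))));
}

// Objects of the first set that appear in none of the other given sets.
function subtract<T>(first: Set<T>, ...others: Set<T>[]): Set<T> {
  return new Set(Array.from(first).filter(item => others.every(other => !other.has(item))));
}

// Hypothetical primary keys standing in for flight objects.
const departingFromSfo = new Set(["F1", "F2", "F3"]);
const delayed = new Set(["F2", "F3", "F4"]);
const cancelled = new Set(["F3"]);

const eitherSet = union(departingFromSfo, delayed);          // F1, F2, F3, F4
const delayedFromSfo = intersect(departingFromSfo, delayed); // F2, F3
const notCancelled = subtract(departingFromSfo, cancelled);  // F1, F2
```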
You can use the .allAsync()
method to retrieve a Promise that resolves to all the objects in the object set. This can be useful for loading data from multiple object sets in parallel.
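A minimal sketch of the parallel-loading pattern follows. The two loader functions are hypothetical stand-ins for calls such as .allAsync() on two different object sets; they return fixed data so that the example is self-contained.

```typescript
// Hypothetical stand-ins for two independent .allAsync() calls.
function loadFlightIds(): Promise<string[]> {
  return Promise.resolve(["F1", "F2"]);
}
function loadPassengerIds(): Promise<string[]> {
  return Promise.resolve(["P1", "P2", "P3"]);
}

async function loadCounts(): Promise<{ flights: number; passengers: number }> {
  // Both loads are started before either is awaited, so they run concurrently
  // instead of one after the other.
  const [flights, passengers] = await Promise.all([loadFlightIds(), loadPassengerIds()]);
  return { flights: flights.length, passengers: passengers.length };
}
```

Awaiting each loader in turn would produce the same result, but the second load would not start until the first had finished.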
Instead of retrieving all objects, you can load a limited number by applying an ordering clause to your object set, then specifying a specific number of objects to load. To do this, you can use the following methods:
- .orderBy() specifies a searchable property to order by, and allows you to specify an ordering direction. Only properties whose types can be ordered (numbers, dates, and strings) are available for selection in this method. You can call .orderBy() multiple times to sort by multiple properties.
- .orderByRelevance() specifies that the objects should be returned in order of how well they match the provided filters, with the most relevant listed first. Relevance for a query term against a property value on a given object is a complex determination that takes into account the frequency of the term appearing in the property value, the frequency of the term appearing across all objects, and more. Relevance is less appropriate when performing only .exactMatch() filters or filtering on non-string properties. Note that only one of .orderBy() and .orderByRelevance() may be used in a single search.
- .take() and .takeAsync() enable you to retrieve a specified number of objects from the set. These methods are only available after you have specified an ordering.

For example, the following code would retrieve the ten employees with the earliest start dates:
    Objects.search()
        .employees()
        .orderBy(e => e.startDate.asc())
        .take(10)
As another example, imagine an object type claims which contains the text of accident claims for an insurance company. We'd like to find a specific claim involving a red car and a deer. Without the .orderByRelevance() line, the top 10 results could be any claims containing any of the words red, car, collision, with, or deer. With the .orderByRelevance() line, the first 10 results will be the claims that contain the most search terms, so the most relevant claims appear first.
    const results = Objects.search()
        .claims()
        .filter(doc => doc.text.matchAnyToken("red car collision with deer"))
        .orderByRelevance()
        .take(10)
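The effect of relevance ordering can be approximated with a deliberately simplified score. The sketch below ranks hypothetical claims by the number of distinct query tokens they contain; actual relevance also weighs term frequencies, as described above, so this is only a toy model, and the Claim type and helper names are assumptions.

```typescript
interface Claim {
  id: string;
  text: string;
}

const splitIntoTokens = (text: string): string[] =>
  text.toLowerCase().split(/\s+/).filter(token => token.length > 0);

// Simplified score: how many distinct query tokens occur in the claim text.
function matchedTokenCount(claim: Claim, query: string): number {
  const claimTokens = splitIntoTokens(claim.text);
  const distinctQueryTokens = splitIntoTokens(query)
    .filter((token, index, all) => all.indexOf(token) === index);
  return distinctQueryTokens
    .filter(queryToken => claimTokens.some(claimToken => claimToken === queryToken))
    .length;
}

// Keep claims matching at least one token, best matches first, then take `limit`.
function topByRelevance(claims: Claim[], query: string, limit: number): Claim[] {
  return claims
    .map(claim => ({ claim, score: matchedTokenCount(claim, query) }))
    .filter(entry => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(entry => entry.claim);
}

const sampleClaims: Claim[] = [
  { id: "c2", text: "blue car scratched in parking lot" },
  { id: "c3", text: "red truck hit a deer" },
  { id: "c1", text: "red car collision with deer on highway" },
  { id: "c4", text: "water damage in kitchen" },
];
const topTwo = topByRelevance(sampleClaims, "red car collision with deer", 2);
```

With this sample data, the claim containing all five query tokens is ranked first, followed by the claim containing two of them.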
Aggregations returned from the Objects API are limited to 10,000 total buckets. An error will be thrown if this limit is exceeded.
When bucketing using .topValues()
, results will be approximate if the data has more than 1,000 distinct values. The list of top values may not be accurate in that case.
In many cases, it's unnecessary to load all of the objects in your object set. Instead, you can simply load a bucketed aggregation of values to conduct further analysis.
To begin computing an aggregation, call the .groupBy()
method on an object set. This allows you to specify bucketing on one of the searchable properties of the object type in the object set. For example, this code groups employees by their start date:
    Objects.search()
        .employees()
        .groupBy(e => e.startDate.byDays())
When specifying which property to bucket by, you will have to provide additional information about how the bucketing should be done depending on the property type:
- For boolean properties, the only option is .topValues(). This returns two buckets, one for true and one for false.
- For string properties, the bucketing options are:
  - .topValues(): For rapid response times and properties with a smaller cardinality. This buckets by the top 1,000 values for the string property. This limit ensures that the returned aggregation is not excessively large.
  - .exactValues(): For more exact aggregations, with the possibility of considering up to 10,000 buckets for high-cardinality properties. The number of buckets considered can be specified via .exactValues({"maxBuckets": numBuckets}) where numBuckets must be an integer value between 0 and 10,000. This method can take longer to respond, as more results have to be considered.
- For numeric properties (Integer, Long, Float, Double), the two bucketing options are:
  - .byRanges() allows you to specify the exact ranges that should be used. For example, you could use .byRanges({ min: 0, max: 50 }, { min: 50, max: 100 }) to bucket objects into the two ranges [0, 50) and [50, 100). The min of the range is inclusive and the max is exclusive. You may omit min to represent a bucket containing values from -∞ to max, or omit max to represent a bucket containing values from min to ∞.
  - .byFixedWidth() specifies the width of each bucket. For example, you could use .byFixedWidth(50) to bucket objects into ranges that each have a width of 50.
- For LocalDate properties, various convenience methods are provided for easy bucketing:
  - .byYear()
  - .byQuarter()
  - .byMonth()
  - .byWeek()
  - .byDays() buckets values into days. You may pass in a number of days to use for bucket widths.
- For Timestamp properties, the same bucketing options apply as for LocalDate, as well as the following additions:
  - .byHours() buckets values by hours. You may pass in a number of hours to use for bucket widths.
  - .byMinutes() buckets values by minutes. You may pass in a number of minutes to use for bucket widths.
  - .bySeconds() buckets values by seconds. You may pass in a number of seconds to use for bucket widths.
- For Array properties, the bucketing options are determined by the type of the elements in the array. In particular, you get the same bucketing methods for Array<PropertyType> as you would get for the PropertyType (for example, Array<boolean> gets the same bucketing methods as boolean).
  - For example, suppose an object set called employeeSet has an Array<string> property pastCountries and consists of Alice and Bob, who have respectively worked in ["US", "UK"] and ["US"]. Then employeeSet.groupBy(e => e.pastCountries.exactValues()).count() will return { "US": 2, "UK": 1 }.

After grouping by one property, you may optionally call the .segmentBy() method to perform further bucketing. This allows you to compute a three-dimensional aggregation bucketed by two searchable properties. For example, you could group employees by their start date as well as their role as follows:
    Objects.search()
        .employees()
        .groupBy(e => e.startDate.byDays())
        .segmentBy(e => e.role.topValues())
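To make the numeric bucketing rules described earlier concrete, the sketch below models the .byRanges() boundary rule (min inclusive, max exclusive, an omitted bound meaning unbounded) and fixed-width bucketing on plain numbers. The helpers are illustrative only, and aligning fixed-width buckets to multiples of the width is an assumption of this sketch rather than documented behavior.

```typescript
interface NumericRange {
  min?: number; // inclusive lower bound; omitted means -Infinity
  max?: number; // exclusive upper bound; omitted means +Infinity
}

// True when the value falls into the range under the inclusive/exclusive rule.
function inRange(value: number, range: NumericRange): boolean {
  const aboveMin = range.min === undefined || value >= range.min;
  const belowMax = range.max === undefined || value < range.max;
  return aboveMin && belowMax;
}

// Number of values per range, in the order the ranges were given.
function countByRanges(values: number[], ranges: NumericRange[]): number[] {
  return ranges.map(range => values.filter(value => inRange(value, range)).length);
}

// Assumption: buckets start at multiples of `width`; returns the bucket start.
function fixedWidthBucketStart(value: number, width: number): number {
  return Math.floor(value / width) * width;
}

const rangeCounts = countByRanges(
  [0, 49, 50, 99, 100],
  [{ min: 0, max: 50 }, { min: 50, max: 100 }, { min: 100 }]
);
```

The value 50 lands in the second range rather than the first, because the max of a range is exclusive.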
After grouping your object set, you can call various aggregation methods to compute aggregation metrics on each bucket. Methods that require a property only accept properties marked searchable. Possible aggregation methods are:
- .count() simply returns the number of objects in each bucket
- .average() returns the average value for the given numeric, timestamp, or date property
- .max() returns the maximum value for the given numeric, timestamp, or date property
- .min() returns the minimum value for the given numeric, timestamp, or date property
- .sum() returns the sum of values for the given numeric property
- .cardinality() returns the approximate number of distinct values for the given property

Calling one of these methods returns either a TwoDimensionalAggregation or ThreeDimensionalAggregation. A ThreeDimensionalAggregation is returned if you called .segmentBy() before calling one of the final aggregation methods.
Learn more about the structure of these aggregation types, including valid bucketing types.
Note that the resulting aggregations are wrapped in a Promise
, as computing the aggregation requires loading data from a remote service. You can use the async/await ↗ syntax to unwrap the Promise
result.
Below is a full example of loading an aggregation and returning it as a result.
    import { Function, ThreeDimensionalAggregation } from "@foundry/functions-api";
    import { Objects } from "@foundry/ontology-api";

    export class AggregationFunctions {
        @Function()
        public async employeesByRoleAndOffice(): Promise<ThreeDimensionalAggregation<string, string>> {
            return Objects.search()
                .employee()
                .groupBy(e => e.title.topValues())
                .segmentBy(e => e.office.topValues())
                .count();
        }
    }
Below is a full example of aggregating without groupBy statements:
    import { Function, Double } from "@foundry/functions-api";
    import { Objects } from "@foundry/ontology-api";

    export class AggregationFunctions {
        @Function()
        public async employeesStats(): Promise<Double> {
            // Count of all employees, default to zero if count() resolves to undefined.
            // The await must come before ??, otherwise the fallback applies to the Promise itself.
            return (await Objects.search().employee().count()) ?? 0;
        }
    }
You can also perform other aggregations without groupBy by replacing the appropriate line in the code example above, such as:
- Objects.search().employee().count(); (as seen in the example above)
- Objects.search().employee().average(e => e.tenure);
- Objects.search().employee().max(e => e.tenure);
- Objects.search().employee().min(e => e.tenure);
- Objects.search().employee().sum(e => e.salary);
- Objects.search().employee().cardinality(e => e.office);
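Because these aggregations are wrapped in a Promise, a fallback for an undefined result has to be applied to the awaited value rather than to the Promise. The sketch below shows the pattern with a hypothetical stand-in (countMatching) for an aggregation call that may resolve to undefined.

```typescript
// Hypothetical stand-in for an aggregation such as count(): resolves to
// undefined when there is nothing to count.
function countMatching(values: number[]): Promise<number | undefined> {
  return Promise.resolve(values.length > 0 ? values.length : undefined);
}

async function countOrZero(values: number[]): Promise<number> {
  // Await first, then apply the fallback to the resolved value. Without the
  // parentheses and await, ?? would test the Promise object, which is never
  // null or undefined, so the fallback would never be used.
  return (await countMatching(values)) ?? 0;
}
```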
For an example of manipulating aggregation results in memory, try the guide for creating custom aggregations.
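As a small taste of in-memory manipulation, the sketch below collapses a three-dimensional count into per-group totals. The Bucket, TwoDimensional, and ThreeDimensional types are local stand-ins that assume an aggregation is a list of key/value buckets; they are not imported from @foundry/functions-api, so check the aggregation type reference for the actual definitions.

```typescript
// Assumed shapes, defined locally for illustration only.
interface Bucket<K, V> {
  key: K;
  value: V;
}
interface TwoDimensional<K> {
  buckets: Bucket<K, number>[];
}
interface ThreeDimensional<K, S> {
  buckets: Bucket<K, Bucket<S, number>[]>[];
}

// Sum the segments of each group, dropping the second dimension.
function totalsPerGroup<K, S>(aggregation: ThreeDimensional<K, S>): TwoDimensional<K> {
  return {
    buckets: aggregation.buckets.map(group => ({
      key: group.key,
      value: group.value.reduce((sum, segment) => sum + segment.value, 0),
    })),
  };
}

// Hypothetical employee counts by title (group) and office (segment).
const countsByTitleAndOffice: ThreeDimensional<string, string> = {
  buckets: [
    { key: "Engineer", value: [{ key: "NYC", value: 3 }, { key: "LON", value: 2 }] },
    { key: "Analyst", value: [{ key: "NYC", value: 1 }] },
  ],
};
const countsByTitle = totalsPerGroup(countsByTitleAndOffice);
```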