Ingress egress and regress

4/3/2023

Congratulations! You've made it to the next installment of our overview of Trill, Microsoft's open source streaming data engine. As noted in our previous posts about basic queries and joins, Trill is a temporal query processor: it works with data that has some intrinsic notion of time. However, Trill doesn't assign any semantics to that notion of time, other than that it is a value that generally increases.

So, what could qualify as "time" to Trill? It could be just about anything:

- Event time, or a time that is naturally associated with each event, such as the time a sensor reading is taken.
- System time, or the time when an event arrives at the server or event queue.
- Processing time, or the time when an event arrives at the given processing node running Trill.
- Incremental time, or just a value that increments for each event seen.

No matter where your "time" concept comes from, Trill will work with it. All Trill needs is for it to be a long-valued field, and Trill is ready to go.

There are several ways to tell Trill what the time component of the data is. The most explicit way is to use StreamEvent objects, which attach a lifespan to every event. StreamEvent objects are created using static factory methods on the StreamEvent class. For instance, StreamEvent.CreateInterval creates an event with a set interval, that is, both a start time and an end time. StreamEvent.CreateStart creates an event called a "start edge", which is in effect an interval with no set end time. Lastly, if the data really corresponds to just point events, there is a method for that as well, StreamEvent.CreatePoint, though inside Trill a point is simply stored as an interval of minimal length: 1 tick.

If you have an observable of these StreamEvent objects, getting data into Trill is straightforward using the ToStreamable method.

If you don't want to go through the extra step of creating StreamEvent objects, there are a few ways to bring data directly into Trill while also telling it how to reason about time. How you do it depends on whether your data already has a time component or you want Trill to assign time to it.

Let's look at one example: an observable of data values whose payload type has a natural time component, a field called Time. Note first that the result of the method call is the same IStreamable type as in the StreamEvent example. The only difference is that instead of assigning time in the observable, we assign it in the ingress call itself. The method name changes to ToTemporalStreamable because the input data already carries its own time. This method is the logical equivalent of wrapping each element in the observable with a StreamEvent.CreateStart call beforehand, but without allocating the StreamEvent objects.

What if, instead of a single time field, TPayload has two temporal fields indicating start and end time? There is an overload of ToTemporalStreamable for that as well, with lambdas for identifying the start and end fields.

What if type TPayload doesn't have a natural time component? In that case, Trill itself can assign time values to events, depending on user needs. For example, it can assign to each event the current system time at the point of ingress. Alternatively, instead of assigning to each event a "time" that corresponds to something like what we would see on a clock, we can assign a sequence number that bumps every 100 input rows.

This last approach is a clever way to produce progressive results from an arbitrary data stream. Want to calculate the average value of a sequence of integers, but get a partial result of that average every 100 values? Assign the first 100 values a time of "0", the next 100 values a time of "1", and so on, and Trill will do all the rest for you. This uses Trill's native temporal capabilities to produce partial results from what someone would normally think of as "atemporal" data.

That example's final step gets data out of Trill and back into an observable. Perhaps ironically, and not surprisingly, we will cover this topic, egress, only briefly, at the end of this post.

While the sections above show how to assign time to events in Trill, there is one more wrinkle that must be dealt with, and it must be dealt with at ingress. Trill's query engine is an in-order data processor. It is incredibly efficient in memory usage and has high throughput, but many of the tricks it uses internally rely on data being presorted.

If data resides in a database and is sorted and indexed for historical queries, the sorting assumption is no problem. However, in this strange setting called the "real world," data does not always arrive at the query processor in the desired order, especially in streaming as opposed to offline settings. Hence, real input data often arrives with some timestamps out of order.

Trill's paradigm for dealing with data that arrives out of order is to attend to it at ingress: any data that makes it past ingress must be ordered. One simple option is to throw an exception as soon as out-of-order data is found.
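Trill configures its throw-on-disorder behavior in C#. Rather than guess at that exact API, here is a minimal Python model of the idea. It assumes events arrive as (time, payload) pairs with integer times standing in for Trill's long-valued time. The names `ingress_in_order` and `OutOfOrderError` are hypothetical and not part of Trill.

```python
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


class OutOfOrderError(Exception):
    """Hypothetical error type for this sketch; not part of Trill."""


def ingress_in_order(events: Iterable[Tuple[int, T]]) -> Iterator[Tuple[int, T]]:
    """Pass (time, payload) pairs through unchanged, raising as soon as a
    time is smaller than the largest time seen so far.

    Equal times are allowed: time only has to be non-decreasing.
    """
    latest: Optional[int] = None
    for time, payload in events:
        if latest is not None and time < latest:
            raise OutOfOrderError(f"time {time} arrived after time {latest}")
        latest = time  # safe: time >= latest here, so this is the new maximum
        yield time, payload
```

Feeding it `[(1, "a"), (3, "b"), (2, "c")]` raises `OutOfOrderError` when the third pair is consumed. Repeated timestamps such as `[(1, "a"), (2, "b"), (2, "c")]` pass through untouched, because in-order only means time never goes backward.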
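The progressive-average trick described earlier in this post can also be modeled outside Trill to show what the engine computes. This Python sketch is hypothetical: `progressive_averages` is not a Trill function. It assumes each value behaves like a start edge that never expires, so the average reported at time t covers every value whose time is at or below t.

```python
from typing import Iterable, List, Optional, Tuple


def progressive_averages(
    values: Iterable[int], batch: int = 100
) -> List[Tuple[int, float]]:
    """Hypothetical model (not a Trill API) of the progressive-average trick.

    Each value gets the synthetic time index // batch: the first `batch`
    values share time 0, the next `batch` share time 1, and so on. Every
    value is treated like a start edge that never expires, so the average
    reported for time t covers all values whose time is <= t.
    """
    results: List[Tuple[int, float]] = []
    total = 0
    count = 0
    current_time: Optional[int] = None
    for index, value in enumerate(values):
        time = index // batch
        if current_time is not None and time != current_time:
            # Time has advanced, so the previous time step is complete.
            results.append((current_time, total / count))
        current_time = time
        total += value
        count += 1
    if current_time is not None:
        # Flush the last time step, which may hold fewer than `batch` values.
        results.append((current_time, total / count))
    return results
```

For example, `progressive_averages(range(250))` returns `[(0, 49.5), (1, 99.5), (2, 124.5)]`. The last step reflects only 50 new values because the input ran out partway through time 2.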
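To make the three StreamEvent lifespans discussed earlier concrete, here is a small Python model. It is not Trill's API. The `Event` class and the `create_*` helpers are hypothetical stand-ins for C# factories like StreamEvent.CreateStart. The sketch also assumes lifespans are half-open ranges [start, end) over integer ticks, which is why a one-tick point is alive only at its own start time.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Event(Generic[T]):
    """Hypothetical model of an event lifespan; not Trill's StreamEvent."""

    start: int
    end: Optional[int]  # None models "no set end time" (a start edge)
    payload: T

    def is_alive_at(self, time: int) -> bool:
        # Assumed half-open lifespan: alive from start up to, but not at, end.
        return self.start <= time and (self.end is None or time < self.end)


def create_interval(start: int, end: int, payload: T) -> "Event[T]":
    """An event with a set interval: both a start and an end time."""
    if end <= start:
        raise ValueError("an interval must end after it starts")
    return Event(start, end, payload)


def create_start(start: int, payload: T) -> "Event[T]":
    """A start edge: in effect an interval with no set end time."""
    return Event(start, None, payload)


def create_point(time: int, payload: T) -> "Event[T]":
    """A point: stored as an interval of minimal length, one tick."""
    return create_interval(time, time + 1, payload)
```

Modeling a point as `create_interval(time, time + 1, ...)` mirrors the post's remark that Trill has no separate point representation internally.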
