Building a Content-Based Search Engine VI: Efficient Query Processing

This is part 6 in a series of tutorials in which we learn how to build a content-based search engine that retrieves multimedia objects based on their content rather than based on keywords, title or meta description.
The naive way to process a k-Nearest Neighbor query entails computing the distance between the query object and all database objects, resulting in a time complexity of $\mathcal{O}(|DB|)$ where $|DB|$ refers to the size of the database (cf. [BS13]). If the distance measure is costly to compute, which is usually the case when dealing with complex multimedia objects, this is infeasible. Therefore, we rely on methods that allow us to compute the k-Nearest Neighbors without having to compute the actual distance between the query and all database objects.
One way to speed up the query processing is by means of a lower bound to the distance function $\delta$, i.e. a function $\delta_{LB} : X \times X \rightarrow \mathbb{R}$ for which it holds that $\delta_{LB}(x, y) \leq \delta(x, y)$ for all $x, y \in X$. If we know that the k-Nearest Neighbors have a distance smaller than or equal to $\epsilon_{max} \in \mathbb{R}$, then we can exclude all objects $o$ for which it holds that $\delta_{LB}(q,o) > \epsilon_{max}$ without computing the actual distance $\delta(q, o)$, since this implies that $\delta(q,o) > \epsilon_{max}$.
For most distance functions, lower bounds can be found which are significantly more efficient to compute than the actual distance function, while at the same time allowing us to rule out a large proportion of the database as potential search results.
The Multi-Step kNN Algorithm, proposed by Seidl and Kriegel in [SK98], utilizes a lower bound for efficient kNN search by iteratively updating the pruning distance $\epsilon_{max}$ while scanning the database. As shown in [SK98], this algorithm is optimal with respect to the number of performed computations of the utilized distance function $\delta$. Assuming that we have specified multiple lower bounds $\delta_{LB_1}, \ldots, \delta_{LB_m}$, the algorithm scans the database while maintaining the $k$ smallest exact distances found so far together with the corresponding pruning radius $\epsilon_{max}$; a sketch follows below.
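The following Python sketch shows the database-scan variant described here (the index-supported candidate ranking of the original paper is omitted for brevity; the exact distance delta and the list of lower-bound functions are assumed to be supplied by the caller):

```python
import heapq

def multi_step_knn(q, database, k, delta, lower_bounds):
    # lower_bounds: functions lb with lb(q, o) <= delta(q, o) for all o,
    # sorted in ascending order of their computational cost.
    heap = []               # max-heap (negated distances) of the k best so far
    eps_max = float("inf")  # current pruning radius
    for o in database:
        # Filter step: once k results are known, skip the expensive exact
        # computation as soon as any lower bound exceeds the pruning radius.
        if len(heap) == k and any(lb(q, o) > eps_max for lb in lower_bounds):
            continue
        d = delta(q, o)     # refinement step: exact distance
        heapq.heappush(heap, (-d, id(o), o))  # id(o) breaks ties
        if len(heap) > k:
            heapq.heappop(heap)
        if len(heap) == k:
            eps_max = -heap[0][0]  # k-th smallest exact distance so far
    return sorted(((-nd, o) for nd, _, o in heap), key=lambda t: t[0])
```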
The efficiency of this algorithm highly depends on the utilized lower bounds. A good lower bound should meet the ICES criteria defined by Assent et al. in [AWS06]: It should be indexable, such that multidimensional index structures like the R-tree or X-tree can be applied. Furthermore, it should be complete, i.e. no false drops occur, which is guaranteed by the lower-bounding property. Moreover, it should be efficient, i.e. its computational time complexity should be significantly lower than that of the actual distance function. Finally, it should be selective, i.e. it should allow us to exclude as many objects as possible from the actual distance computation, which is achieved by approximating the actual distance as closely as possible.
If the lower bounds have different time complexities and selectivities, then their ordering affects the efficiency of the algorithm. In each iteration of the outer loop, we skip the actual distance computation as soon as one of the lower bounds exceeds the current pruning radius $\epsilon_{max}$. Hence, a reasonable heuristic for skipping as early as possible is to sort the lower bounds in ascending order of their time complexities.
Lower bounds can be devised by exploiting the inner workings of the utilized distance function. There are, however, some lower bounds that are generic in nature, i.e. they are applicable to a wide variety of distance functions, as long as these distance functions fulfill certain properties. In the following, we present a generic lower bound for metric distance functions and lower bounds for the Earth Mover’s Distance.
If we are dealing with a metric distance function, we can exploit the fact that the distance fulfills the triangle inequality to devise a lower bound (cf. [ZADB06]). Given a query $q$, a database object $o$ and a set of so-called pivot objects $P$, it follows from the triangle inequality that
$$\delta_{LB-Metric}(q, o) = \max_{p \in P} |\delta(q, p) - \delta(p, o)| \leq \delta(q, o)$$
The distances $\delta(p, o)$ between all pivot objects $p \in P$ and all database objects $o \in DB$ can be computed in advance and stored in a so-called pivot table of size $|P| \cdot |DB|$. When processing a query, we only need to compute the distances $\delta(q, p)$ between the query and all pivot objects $p \in P$ and store those distances in a list. After that, $\delta_{LB-Metric}$ can be computed efficiently in $\mathcal{O}(|P|)$ by looking up the stored distance values. The selectivity of this approach depends heavily on the choice of the pivot objects $P$ and on the distribution of the database and query objects.
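As a sketch (assuming NumPy, and a precomputed pivot table with one row per pivot object and one column per database object), the lower bound then reduces to cheap lookups:

```python
import numpy as np

def query_pivot_distances(q, pivots, delta):
    # Computed once per query: the distances from q to all pivot objects.
    return np.array([delta(q, p) for p in pivots])

def metric_lower_bound(q_dists, pivot_table, o_index):
    # delta_LB-Metric(q, o) = max_p |delta(q, p) - delta(p, o)|,
    # evaluated in O(|P|) from the precomputed pivot table
    # of shape (|P|, |DB|).
    return np.max(np.abs(q_dists - pivot_table[:, o_index]))
```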
As shown by Rubner et al. in [RTG00], when using a norm-induced ground distance, the Earth Mover’s Distance can be lower-bounded by the ground distance between the weighted means of the two signatures.
Let $X, Y$ be feature signatures and $\delta : \mathbb{F} \times \mathbb{F} \rightarrow \mathbb{R}$ be a norm-induced ground distance. Then it holds that
$$Rubner(X, Y) = \delta(\overline{X}, \overline{Y}) \leq EMD_\delta(X, Y)$$
where $\overline{X}$ is the weighted mean of $X$ (and $\overline{Y}$, defined analogously, the weighted mean of $Y$):
$$\overline{X} = \frac{\sum_{f \in R_X} X(f) \cdot f}{\sum_{f \in R_X} X(f)}$$
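A sketch of this bound, representing a signature by an array of weights and a matrix of representatives (one row per representative) and assuming the Euclidean distance as the norm-induced ground distance:

```python
import numpy as np

def rubner_lower_bound(X_weights, X_reps, Y_weights, Y_reps):
    # Ground distance (here: Euclidean) between the weighted means
    # of the two signatures.
    mean_x = (X_weights[:, None] * X_reps).sum(axis=0) / X_weights.sum()
    mean_y = (Y_weights[:, None] * Y_reps).sum(axis=0) / Y_weights.sum()
    return np.linalg.norm(mean_x - mean_y)
```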
Proposed in [UBSS14], the Independent Minimization Lower Bound for feature signatures (short: IM-Sig) is a lower bound for EMD that corresponds to the EMD when removing the Target constraint and replacing it with the IM-Sig Target constraint defined as follows:
$$\forall g \in R_X, h \in R_Y : f(g, h) \leq Y(h)$$
Intuitively, this modified target constraint allows us to distribute the flow optimally for each representative $g \in R_X$ without considering whether the total flow coming into a target representative exceeds its weight, as long as the flow from $g$ to $h$ does not exceed the weight $Y(h)$ for each target representative $h \in R_Y$. We use $IMSig_\delta(X,Y)$ to denote the cost of the minimum cost flow with respect to the modified target constraint. Since the set of feasible flows for IM-Sig includes the set of feasible flows for EMD, it holds that $IMSig_\delta(X,Y) \leq EMD_\delta(X,Y)$.
An efficient algorithm for computing $IMSig_\delta(X,Y)$ is given in [UBSS14].
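We do not reproduce that algorithm here, but the key observation behind it is that the relaxed target constraint decouples the source representatives: each $g \in R_X$ can independently send its weight to its cheapest targets first, capped only by the individual capacities $Y(h)$. A greedy sketch along these lines (same signature representation as above; not a literal transcription of the pseudocode in [UBSS14]):

```python
import numpy as np

def im_sig(X_weights, X_reps, Y_weights, Y_reps, delta):
    # For each source representative, greedily fill the nearest target
    # representatives up to their capacities Y(h).
    cost = 0.0
    for w_g, g in zip(X_weights, X_reps):
        dists = np.array([delta(g, h) for h in Y_reps])
        remaining = w_g
        for j in np.argsort(dists):
            flow = min(remaining, Y_weights[j])
            cost += flow * dists[j]
            remaining -= flow
            if remaining <= 0:
                break
    # Normalize as in the EMD definition.
    return cost / min(X_weights.sum(), Y_weights.sum())
```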
Let us review what we have learned so far. In Part I: Quantifying Similarity, we have learned how we can quantify the similarity or dissimilarity between two multimedia objects (or representations of these objects) by means of a similarity function or a distance function. Moreover, we have seen the types of queries that exist with respect to such functions: the range query and the k-Nearest Neighbor query. In Part II: Extracting Feature Vectors, we have learned how to extract feature vectors from multimedia objects that capture the visual content of those objects. We have demonstrated this for videos, but the general approach is applicable to a wide variety of other multimedia objects. In Part III: Feature Signatures, we have learned how we can summarize a set of feature vectors into a compact representation called a feature signature, which effectively comprises a compressed summary of the contents of the multimedia object. In Part IV: Earth Mover’s Distance and Part V: Signature Quadratic Form Distance, we have learned about two distance measures on these feature signatures that allow us to calculate their dissimilarity and, in effect, the dissimilarity between the two multimedia objects that they represent. In Part VI: Efficient Query Processing, we have learned how to perform k-Nearest Neighbor queries efficiently, without having to compute the pairwise distances between the query object and every object in the database.
We now have all the necessary components in place to build a multimedia search engine. First of all, the database has to be fed with objects to query. For each of these objects, it should store a previously computed feature signature. If desired, we can generate a pivot table for efficient query processing. Next, a user interface has to be provided, which allows the user to upload a query object. For this query object, a feature signature has to be computed, and subsequently the database of feature signatures can be queried using the multi-step kNN algorithm.
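As a purely illustrative sketch of how these pieces could fit together, reusing the hypothetical helpers sketched throughout this series (a signature being a (representatives, weights) pair, and extract_features standing in for the extraction step of Part II):

```python
# Offline: compute and store a feature signature for every database object.
signature_db = [feature_signature(extract_features(v), k=100) for v in videos]

# Online: compute the query signature and run the multi-step kNN algorithm,
# with the Rubner bound as a cheap filter and the EMD as the exact distance.
query_sig = feature_signature(extract_features(query_video), k=100)
delta = lambda a, b: emd(a[1], a[0], b[1], b[0])
lb = lambda a, b: rubner_lower_bound(a[1], a[0], b[1], b[0])
results = multi_step_knn(query_sig, signature_db, k=10,
                         delta=delta, lower_bounds=[lb])
```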
[AWS06] Ira Assent, Marc Wichterich, and Thomas Seidl. Adaptable distance functions for similarity-based multimedia retrieval. Datenbank-Spektrum, 19:23–31, 2006.
[BS13] Christian Beecks and Thomas Seidl. Distance-based similarity models for content-based multimedia retrieval. PhD thesis, RWTH Aachen University, 2013.
[RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International journal of computer vision, 40(2):99–121, 2000.
[SK98] Thomas Seidl and Hans-Peter Kriegel. Optimal multi-step k-nearest neighbor search. In ACM SIGMOD Record, volume 27, pages 154–165. ACM, 1998.
[UBSS14] Merih Seran Uysal, Christian Beecks, Jochen Schmücking, and Thomas Seidl. Efficient filter approximation using the earth mover’s distance in very large multimedia databases with feature signatures. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pages 979–988. ACM, 2014.
[ZADB06] Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. Similarity search: the metric space approach, volume 32. Springer Science & Business Media, 2006.
Building a Content-Based Search Engine V: Signature Quadratic Form Distance

We have seen how we can compare two multimedia objects with respect to their similarity by computing the Earth Mover’s Distance on their feature signatures. To date, the Earth Mover’s Distance has experimentally been shown to be the most effective distance measure on feature signatures, i.e. it captures most adequately how human beings judge the perceptual dissimilarity between objects. However, we have also seen that the computational complexity of the Earth Mover’s Distance lies somewhere between $\mathcal{O}(n^3)$ and $\mathcal{O}(n^4)$, where $n$ is the total number of representatives of the two compared signatures. For large-scale multimedia retrieval applications, this becomes infeasible.
An alternative distance measure, which is almost as effective as the Earth Mover’s Distance but significantly more efficient to compute, is the Signature Quadratic Form Distance (SQFD). It was proposed in [BUS10b] and can be thought of as an adaptation of the Quadratic Form Distance [FBF+94], a distance measure on histograms, to feature signatures.
Let $X, Y$ be two feature signatures and let $s : \mathbb{F} \times \mathbb{F} \rightarrow \mathbb{R}^{\geq 0}$ be a similarity function on features. The Signature Quadratic Form Distance $SQFD_s : \mathbb{S} \times \mathbb{S} \rightarrow \mathbb{R}^{\geq 0}$ is defined as
$$SQFD_s(X, Y) = \sqrt{\langle X-Y, X-Y \rangle_s}$$
where $\langle X, Y \rangle_s : \mathbb{S} \times \mathbb{S} \rightarrow \mathbb{R}^{\geq 0}$ is the Similarity Correlation, which is defined as
$$\langle X, Y \rangle_s = \sum_{f \in R_X} \sum_{g \in R_Y} X(f) Y(g) s(f, g)$$
Intuitively, the similarity correlation yields high values if representatives that are similar to each other also carry high weights. If, on the other hand, one signature’s weight is distributed in regions of the feature space that are distinct from (i.e. dissimilar to) those of the other signature, it yields low values. Therefore, $\langle X, Y \rangle_s$ is a measure of the similarity of the two signatures.
$X-Y$ refers to the difference between the two feature signatures, which is defined as a new feature signature such that $(X-Y)(f) = X(f) - Y(f)$. The term $\langle X-Y, X-Y \rangle_s$ yields the self-similarity of this difference. If $X = Y$, this value is 0. The more dissimilar $X$ and $Y$ are, the higher the distance value will be.
The previous definition pre-supposes some similarity function $s$ on feature vectors. Recall from Part 1: Quantifying Similarity that a similarity function assigns small values to objects that are dissimilar and larger values to objects that are more similar, reaching its maximum when the two compared objects are the same. Given some distance function $\delta$ on feature vectors (an obvious choice being the Euclidean Distance or the Manhattan Distance), we can define a similarity function by means of the Gaussian kernel:
$$s(f, g) = e^{-\frac{\delta(f, g)^2}{2\sigma^2}}$$
For a distance of 0, this similarity function yields a value of 1. For increasing distances, the similarity decays exponentially towards 0. $\sigma$ controls the speed of this exponential decay, with larger values leading to a slower decay.
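Putting the definition into code, here is a direct (unoptimized) sketch, representing a signature by arrays of weights and representatives and using the Gaussian kernel from above (the value of $\sigma$ is an arbitrary assumption):

```python
import numpy as np

def gaussian_similarity(f, g, sigma=1.0):
    # Similarity via the Gaussian kernel over the Euclidean distance.
    return np.exp(-np.linalg.norm(f - g) ** 2 / (2 * sigma ** 2))

def sqfd(X_weights, X_reps, Y_weights, Y_reps, s=gaussian_similarity):
    # Represent the difference signature X - Y by concatenating the
    # representatives of X and Y and negating Y's weights.
    weights = np.concatenate([X_weights, -Y_weights])
    reps = np.concatenate([X_reps, Y_reps])
    n = len(weights)
    # Pairwise similarity matrix over the representatives of X - Y.
    S = np.array([[s(reps[i], reps[j]) for j in range(n)] for i in range(n)])
    # <X-Y, X-Y>_s; guard against tiny negative values from rounding.
    return np.sqrt(max(weights @ S @ weights, 0.0))
```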
As we can see from the definition, the run-time complexity for computing the Signature Quadratic Form Distance is $\mathcal{O}(n^2)$, making it significantly more feasible than the Earth Mover’s Distance. Moreover, as opposed to the Earth Mover’s Distance, the computation of the sum can be fully parallelized on a GPU, effectively resulting in constant computation time for most signatures (see [KLBSS11]). Finally, it can be shown that the Signature Quadratic Form Distance is a metric, which will become important for indexing methods, as will be detailed in the next section.
While the Earth Mover’s Distance is slightly more effective than the Signature Quadratic Form Distance (see [BS13] for experimental comparisons), this difference may or may not be worth the increased computational cost. The choice between the two distance measures should therefore be based on the requirements of the project at hand.
We are now able to compute the similarity between arbitrary feature signatures. This effectively allows us to retrieve similar multimedia objects to a given query object. However, as of now, we would have to compute the distance between the query object and every single object in the database. If our database is large, this can lead to unacceptable computational demands and query times. For this reason, we are going to introduce indexing methods in the next section. These will allow us to retrieve similar objects to a query object without having to compute all distance values explicitly. Continue with the next section: Efficient Query Processing
[BUS10b] Christian Beecks, Merih Seran Uysal, and Thomas Seidl. Signature quadratic form distance. In Proceedings of the ACM International Conference on Image and Video Retrieval, pages 438–445. ACM, 2010.
[BS13] Christian Beecks and Thomas Seidl. Distance-based similarity models for content-based multimedia retrieval. PhD thesis, RWTH Aachen University, 2013.
[KLBSS11] Martin Kruliš, Jakub Lokoč, Christian Beecks, Tomáš Skopal, and Thomas Seidl. Processing the signature quadratic form distance on many-core GPU architectures. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pages 2373–2376. ACM, 2011.
Building a Content-Based Search Engine IV: Earth Mover’s Distance

We have seen how we can represent multimedia objects efficiently and expressively by summarizing a set of feature vectors into a data structure called a feature signature. Given two multimedia objects represented as feature signatures, we can measure the dissimilarity of the objects using a distance measure on their feature signatures. Numerous distance measures for feature signatures have been proposed (see [BS13] for an overview). The distance measure that has turned out to be the most effective is called the Earth Mover’s Distance.
Proposed in [RTG00] for the domain of content-based image retrieval, the Earth Mover’s Distance (short: EMD) is a distance measure on feature signatures that can be thought of as the minimum required cost for transforming one feature signature into the other one. This cost is formulated by means of a transportation problem: We determine the optimal way to move the weights from the representatives of the first signature ($X$) to the representatives of the second signature ($Y$). The cost for moving a certain amount of weight is given by the amount of weight multiplied by the distance over which it is transported.
The following image depicts an example. On the top, we see the feature signatures of two videos. Let’s call the left-hand signature $X$ and the right-hand signature $Y$. On the bottom, we see an isolated representative of $X$ and the representatives of $Y$ to which it moves weight. As we can see, the representatives in $Y$ to which $X$’s representative moves weight are quite similar to $X$’s representative, resulting in a relatively small “movement cost” or “transformation cost” for this representative.
Let’s see how we can formulate the Earth Mover’s Distance as an optimization problem. Let $\mathbb{F}$ be the set of all possible features, $\delta : \mathbb{F} \times \mathbb{F} \rightarrow \mathbb{R}^{\geq 0}$ be a distance function on features (called the ground distance, e.g. the Euclidean distance) and $X, Y \in \mathbb{S}$ be two feature signatures. We call $f : R_X \times R_Y \rightarrow \mathbb{R}$ a flow from signature $X$ to signature $Y$. For two representatives $g \in R_X, h \in R_Y$, it tells us how much weight is moved from $g$ to $h$. $f$ is called a feasible flow if it fulfills the following constraints:

- Non-negativity: $\forall g \in R_X, h \in R_Y : f(g, h) \geq 0$
- Source constraint: $\forall g \in R_X : \sum_{h \in R_Y} f(g, h) \leq X(g)$
- Target constraint: $\forall h \in R_Y : \sum_{g \in R_X} f(g, h) \leq Y(h)$
- Total flow constraint: $\sum_{g \in R_X} \sum_{h \in R_Y} f(g, h) = \min\{\sum_{g \in R_X} X(g), \sum_{h \in R_Y} Y(h)\}$
Now let $F = \{f \; | \; f \; \text{is a feasible flow}\}$. There are infinitely many feasible flows. We are interested in the flow with the minimum cost, where the cost is defined as the sum over all pairs of representatives $g$, $h$ of the flow $f(g, h)$ multiplied by their ground distance $\delta(g, h)$. Intuitively, this means that we want to find a flow that tends to move weights from representatives in $X$ to nearby (i.e. similar) representatives in $Y$. The Earth Mover’s Distance is then defined as the cost of the minimum cost flow, i.e. the cost required to transform one signature into the other one.
$$EMD_\delta(X, Y) = \min_{f \in F} \left \{ \frac{ \sum_{g \in R_X} \sum_{h \in R_Y} f(g, h) \cdot \delta(g, h) }{ \min\left\{ \sum_{g \in R_X} X(g), \sum_{h \in R_Y} Y(h) \right\} } \right \}$$
The denominator acts as a normalization term.
The definition of the EMD corresponds to a linear program, i.e. an optimization problem with a linear objective function and linear constraints. It can be solved, for instance, using the Simplex algorithm (cf. [Van01]), which has an exponential worst-case time complexity. In practice, we would just use a library that calculates the Earth Mover’s Distance for us directly. According to [SJ08], the empirical time complexity for calculating the Earth Mover’s Distance between two signatures $X$ and $Y$ using the Simplex algorithm lies between $\mathcal{O}(n^3)$ and $\mathcal{O}(n^4)$ where $n = |R_X| + |R_Y|$. An approximation to the Earth Mover’s Distance can, however, be computed in linear time [SJ08]. The Earth Mover’s Distance is a metric, provided that the ground distance is a metric and the compared signatures are normalized to the same total weight.
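To make this concrete, here is a sketch that feeds the definition above directly into SciPy’s general-purpose linear programming solver. This is fine for small signatures and for illustration; dedicated EMD solvers are considerably faster:

```python
import numpy as np
from scipy.optimize import linprog

def emd(X_weights, X_reps, Y_weights, Y_reps):
    # Variables: the flattened flow matrix, with f(g, h) at index g * m + h.
    # The objective is the total transportation cost.
    n, m = len(X_weights), len(Y_weights)
    diff = X_reps[:, None, :] - Y_reps[None, :, :]
    cost = np.sqrt((diff ** 2).sum(axis=2)).ravel()  # Euclidean ground distance
    # Source constraints: sum_h f(g, h) <= X(g) for each g.
    A_src = np.zeros((n, n * m))
    for g in range(n):
        A_src[g, g * m:(g + 1) * m] = 1.0
    # Target constraints: sum_g f(g, h) <= Y(h) for each h.
    A_tgt = np.zeros((m, n * m))
    for h in range(m):
        A_tgt[h, h::m] = 1.0
    # Total flow constraint: all movable weight must be moved.
    total = min(X_weights.sum(), Y_weights.sum())
    res = linprog(cost,
                  A_ub=np.vstack([A_src, A_tgt]),
                  b_ub=np.concatenate([X_weights, Y_weights]),
                  A_eq=np.ones((1, n * m)), b_eq=np.array([total]),
                  bounds=(0, None), method="highs")
    return res.fun / total  # normalized minimum cost
```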
In the next section, we present another distance measure on feature signatures that is only slightly less effective, but can be computed in quadratic time: the Signature Quadratic Form Distance.
[BS13] Christian Beecks and Thomas Seidl. Distance-based similarity models for content-based multimedia retrieval. PhD thesis, RWTH Aachen University, 2013.
[RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International journal of computer vision, 40(2):99–121, 2000.
[Van01] Robert J. Vanderbei. Linear Programming: Foundations and Extensions. International Series in Operations Research & Management Science, 37, 2001.
[SJ08] Sameer Shirdhonkar and David W Jacobs. Approximate earth mover’s distance in linear time. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
Building a Content-Based Search Engine III: Feature Signatures

When computing the distance between two multimedia objects, it would be highly inefficient to take into account all of the extracted feature vectors. For most practical purposes, however, it is not necessary to do this in order to achieve a good discriminability of the objects. Most of the vectors carry redundant information or fine-grained details that do not have a significant influence on the overall similarity of two objects. Hence, we summarize all of the extracted features into a structure called a feature signature.
Intuitively, a feature signature is characterized by a relatively small set of feature vectors, called the representatives, along with a weight for each representative.
Formally, if we let $\mathbb{F}$ denote the set of all possible features, a feature signature $X$ is a function $X : \mathbb{F} \rightarrow \mathbb{R}$ such that $|\{f \in \mathbb{F} \; | \; X(f) \neq 0\}| < \infty$ (i.e. it assigns a non-zero weight to only a finite number of vectors and assigns 0 everywhere else). We refer to $R_X = \{f \in \mathbb{F} \; | \; X(f) \neq 0\}$ as the representatives of $X$. We use $\mathbb{S}$ to refer to the set of all feature signatures.
A common way to calculate a feature signature is to apply a clustering algorithm (e.g. k-means) to the extracted set of feature vectors. From the resulting clustering, we devise a feature signature by defining the cluster means as the representatives and assigning them a weight corresponding to the relative size of the cluster, i.e. the number of cluster elements divided by the total number of extracted features. Here is an example depicting this process for 2-dimensional feature vectors:
First, the vectors are clustered, yielding 3 clusters (red, green and blue). Then we compute the cluster centers (depicted as the large red, green and blue dots), which we define as the representatives of the feature signature S, and assign them a weight corresponding to the relative cluster size (i.e. the number of feature vectors in the cluster divided by the total number of feature vectors).
Let’s formalize this process. Let $C = C_1, …, C_m$ be a clustering of feature vectors. We define the clustering-induced normalized feature signature $X_C$ as $X_C : \mathbb{F} \rightarrow \mathbb{R}$ with:
$$
X_C(f)=
\begin{cases}
\frac{|C_i|}{\sum_{1 \leq j \leq m} |C_j|} & \text{if } f = \frac{1}{|C_i|} \sum_{g \in C_i} g\\
0 & \text{else}
\end{cases}
$$
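A sketch of this computation using scikit-learn’s k-means (the choice of library, and of k, are assumptions; any clustering algorithm works):

```python
import numpy as np
from sklearn.cluster import KMeans

def feature_signature(features, k=100):
    # features: one extracted feature vector per row.
    kmeans = KMeans(n_clusters=k, n_init=10).fit(features)
    representatives = kmeans.cluster_centers_
    # The relative cluster sizes serve as the weights.
    weights = np.bincount(kmeans.labels_, minlength=k) / len(features)
    return representatives, weights
```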
Before applying the clustering algorithm, we multiply each dimension by a certain weight, which allows us to control the importance of that dimension for the clustering. When using k-means to calculate the clustering, we can specify the desired number of representatives k in advance. This allows us to control the expressiveness of the feature signature: the higher we choose k, the more expressive the feature signature gets, at the cost of increased storage size and a more expensive distance computation. There is a monotonic relationship between k and the effectiveness (i.e. the adequacy of the results) as well as the query processing time. Hence, k allows us to control the tradeoff between effectiveness and efficiency.
The following image depicts 3D visualizations of two videos and their feature signatures with k = 100. Here, the clusters are represented as spheres. Their position in the 3D coordinate system corresponds to the position (x and y) and the time (t), the color of the sphere corresponds to the L*a*b* color dimensions of the representatives and the volume of the sphere corresponds to the weight that the feature signature assigns to the representative. As we can see, the feature signature summarizes the visual content of the video as it unfolds over time using just 100 vectors.
We have seen how feature signatures reduce the rather large amount of information inherent in the feature vectors into a compact representation that still reveals a lot of information about the feature distribution, since it summarizes how many feature vectors are located at which locations in the feature space. In the next section, we will see how we can compute the similarity between two feature signatures. Continue with the next section: Earth Mover’s Distance
Building a Content-Based Search Engine II: Extracting Feature Vectors

This is part 2 in a series of tutorials in which we learn how to build a content-based search engine that retrieves multimedia objects based on their content rather than based on keywords, title or meta description.
In the previous section, we saw how similarity between multimedia objects can be formalized and which types of queries exist with respect to this formalization. In a step towards efficiently computing similarity between two multimedia objects, we are now going to see how we can characterize the contents of individual multimedia objects (in our example, a video) by extracting a set of so-called feature vectors, which are vectors from the Euclidean space that describe certain local characteristic properties.
Since we are interested in visual similarity between two videos, our goal is to extract a set of vectors, each of which describes certain local visual properties of the video numerically. This process can be depicted visually as follows:
We first select a certain number of sample frames from the video (e.g. 10 frames per second). For each of these frames, we select a fixed number of equidistant sample pixels. Finally, for each sample pixel, we compute an 8-dimensional Euclidean vector $(x, y, L, a, b, \chi, \eta, t)$ describing the visual appearance of the pixel and its context. The choice of this vector is just a suggestion, and it isn’t necessary to include all of the features presented here.
The first two dimensions of this vector correspond to the x and y coordinates of the pixel inside the frame. The next 3 dimensions correspond to the color of the pixel in the L*a*b* color space, i.e. the lightness, the position between red and green and the position between blue and yellow (cf. [Wik15b]). The reason we choose this color space instead of e.g. RGB is the fact that Euclidean distances in this space have a significantly higher correlation with perceptual dissimilarity than other color spaces, making it more suitable for our task of measuring visual similarity. Additionally, we calculate the contrast $\chi$ of a 12 x 12 neighborhood of the pixel as proposed by Tamura et al. in [TMY78], which is a measure of the dynamic range of the colors. Furthermore, we calculate the coarseness $\eta$ of the pixel, as proposed in [TMY78], which is a measure of how big the structures surrounding that pixel are. Finally, we add the time t of the frame from which the pixel was sampled as another dimension (in seconds from the beginning of the video).
The whole set of extracted feature vectors, then, comprises a summary of how the visual contents of the video unfold over time.
The entries of the vectors all measure different aspects and stem from different ranges. Since we want all dimensions to have equal importance in the distance computations, irrespective of their value range, we normalize all 8 dimensions individually, yielding a vector whose entries lie between 0 and 1: The positions x and y are divided by the image width and height, respectively. The L* color coordinate ranges from 0 to 100 and is hence divided by 100. The a* and b* color coordinates range from -128 to 127. Therefore, we add 128 and divide by 255. The contrast $\chi$ ranges from 0 to 128 and is therefore divided by 128. The coarseness $\eta$ ranges from 0 to 5 and is hence divided by 5. Finally, the time is divided by the video duration.
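A sketch of this normalization for a single raw feature vector, with the frame dimensions and video duration supplied by the caller:

```python
import numpy as np

def normalize_feature(v, frame_width, frame_height, duration):
    # v = (x, y, L, a, b, chi, eta, t), raw values as described above.
    x, y, L, a, b, chi, eta, t = v
    return np.array([
        x / frame_width,
        y / frame_height,
        L / 100.0,          # L* ranges over [0, 100]
        (a + 128) / 255.0,  # a* ranges over [-128, 127]
        (b + 128) / 255.0,  # b* ranges over [-128, 127]
        chi / 128.0,        # contrast ranges over [0, 128]
        eta / 5.0,          # coarseness ranges over [0, 5]
        t / duration,       # time in seconds from the start
    ])
```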
The first 7 dimensions, which describe a pixel in the context of its frame, have yielded high effectiveness for the task of retrieving visually similar images (cf. [BUS10a]) and were hence adopted. Since a video can be thought of as a generalization of an image along another dimension (the time dimension), the image retrieval approach was extended simply by adding the time as another dimension to the feature vectors. The rationale for this is that there is no conceptual difference between the spatial dimensions and the time dimension: a video can be imagined as an image changing over time. The fact that a video is usually represented as a sequence of frames is just a way to store it digitally. This convention has led many video retrieval approaches to base their video representations on frame sequences, even though semantically a video can just as reasonably be treated as an image changing continuously over time rather than as a sequence of images.
We now know how we can express local visual properties of a video by means of a set of feature vectors. In the next section, we will see how we can summarize these vectors into a more compact representation scheme that allows us to store the contents of the video using less space, and to compute the visual similarity between two videos more efficiently. Continue with the next section: Feature Signatures
[Wik15b] Wikipedia. Lab color space. http://en.wikipedia.org/wiki/Lab_color_space, 2015.
[TMY78] Hideyuki Tamura, Shunji Mori, and Takashi Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics, 8(6):460–473, 1978.
[BUS10a] Christian Beecks, Merih Seran Uysal, and Thomas Seidl. A comparative study of similarity measures for content-based multimedia retrieval. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 1552–1557. IEEE, 2010.
Building a Content-Based Search Engine I: Quantifying Similarity

The explosion of user-generated content on the internet over the last decades has confronted the world of multimedia querying with unprecedented challenges. There is a demand for this data to be processed and indexed in order to make it available for different types of queries, whilst ensuring acceptable response times.
An arguably important task is the retrieval of multimedia objects (e.g. images or videos) that are visually similar to a certain query object (e.g. a query image or a query video). We define two multimedia objects to be visually similar if they depict contents that “look similar” to humans. So far, this task has received comparatively little research attention.
Most of the major search engines or content suggestion engines only allow for text-based queries and the search only considers metadata such as title, description text or user-specified tags. The content of the multimedia objects is not taken into account. Such systems are very limited with respect to the types of queries that are possible, and with respect to the actual relevance of the retrieved results.
In this series of tutorials, I introduce a method for retrieving visually similar multimedia objects to a specified query object. This method is based on so-called feature signatures, which comprise an expressive summary of the content of a multimedia object that is significantly more compact than the object itself, allowing for an efficient comparison between objects. This method is applicable to virtually all kinds of multimedia objects. In the course of this tutorial, we’ll take content-based video similarity search as our main example. However, it should be clear how to adapt this method to your particular needs.
This section introduces some fundamental preliminaries for the problem of retrieving similar multimedia objects to a given query object. We will review how similarity between multimedia objects can be formalized and which types of queries exist with respect to this formalization.
In order to retrieve similar multimedia objects to a given query object, we need a way to compare the query object to the database objects and quantify the similarity or dissimilarity numerically. There are many ways in which similarity or dissimilarity between two objects can be measured, and the appropriate choice depends heavily on the nature of the compared objects and on the aspects we want to compare. For example, videos could be compared with respect to their visual content, their auditory content or metadata such as the title or a description text.
The most common way to model similarity is by means of a distance function. A distance function assigns high values to objects that are dissimilar and small values to objects that are similar, reaching 0 when the two compared objects are the same. Mathematically, a distance function is defined as follows:
Let $X$ be a set. A function $\delta : X \times X \rightarrow \mathbb{R}$ is called a distance function if it holds for all $x, y \in X$:

- $\delta(x, y) \geq 0$ (non-negativity)
- $\delta(x, y) = \delta(y, x)$ (symmetry)
- $\delta(x, x) = 0$ (reflexivity)
When it comes to efficient query processing, as we will see later, it is useful if the utilized distance function is a metric.
Let $\delta : X \times X \rightarrow \mathbb{R}$ be a distance function. $\delta$ is called a metric if it holds for all $x, y, z \in X$:

- $\delta(x, y) = 0 \Leftrightarrow x = y$ (identity of indiscernibles)
- $\delta(x, y) = \delta(y, x)$ (symmetry)
- $\delta(x, z) \leq \delta(x, y) + \delta(y, z)$ (triangle inequality)
An alternative way to model similarity between two objects is by means of a similarity function, which assigns small values to objects that are dissimilar and larger values to objects that are more similar, reaching its maximum when the two compared objects are the same (cf. [BS13]).
Let X be a set. A function $s : X \times X \rightarrow \mathbb{R}$ is called a similarity function if it is symmetric and if it holds for all $x, y \in X$ that $s(x, x) \geq s(x, y)$ (maximum self-similarity).
Once we have modeled the similarity for pairs of multimedia objects by means of a distance function, we can reformulate the problem of retrieving similar objects to the query object by utilizing such a function. A prominent query type is the so-called range query, which retrieves all database objects for which the distance to the query object lies below a certain threshold. The formal definition is given below (adopted from [BS13]).
Let $X$ be a set of objects, $\delta : X \times X \rightarrow \mathbb{R}$ be a distance function, $DB \subseteq X$ be a database of objects, $q \in X$ be a query object and $\epsilon \in \mathbb{R}$ be a search radius. The range query $range_\epsilon(q, \delta, DB)$ is defined as
$range_\epsilon(q, \delta, DB) = \{o \in DB \; | \; \delta(q, o) \leq \epsilon\}$
For range queries, it is hard to determine a suitable threshold $\epsilon$ that yields a result set of a desired size. When $\epsilon$ is too low, the result set might be very small or even empty. On the other hand, when choosing it too large, the result set might include nearly the entire database. This problem can be solved by issuing a k-Nearest Neighbor query (short: kNN query) instead. In this query type, we specify the desired number of retrieved objects $k$ instead of a distance threshold. If we assume that the distances between the query object and the database objects are pairwise distinct, the k-Nearest Neighbors are the $k$ objects that have the smallest distance to the query object. The formal definition is given below (adopted from [SK98]).
Let $X$ be a set of objects, $\delta : X \times X \rightarrow \mathbb{R}$ be a distance function, $DB \subseteq X$ be a database of objects, $q \in X$ be a query object and $k \in \mathbb{N}, k \leq |DB|$. We define the k-Nearest Neighbors of $q$ w.r.t. $\delta$ as the smallest set $NN_q(k) \subseteq DB$ with $|NN_q(k)| \geq k$ such that the following holds:
$\forall o \in NN_q(k), \forall o^\prime \in DB \setminus NN_q(k) : \delta(o, q) < \delta(o^\prime, q)$
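Both query types are straightforward to state in code. As a naive baseline (which ignores efficiency; later parts of this series address that):

```python
def range_query(q, db, delta, eps):
    # All database objects within distance eps of the query.
    return [o for o in db if delta(q, o) <= eps]

def knn_query(q, db, delta, k):
    # The k database objects closest to the query
    # (assuming pairwise distinct distances).
    return sorted(db, key=lambda o: delta(q, o))[:k]
```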
Our goal now is to devise a distance function that reflects human judgement of similarity. In the next section, we will learn how to extract features from multimedia objects, which are sets of vectors that characterize the content of that object. Continue with the next section: Extracting Feature Vectors.
[BS13] Christian Beecks and Thomas Seidl. Distance-based similarity models for content-based multimedia retrieval. PhD thesis, RWTH Aachen University, 2013.
[SK98] Thomas Seidl and Hans-Peter Kriegel. Optimal multi-step k-nearest neighbor search. In ACM SIGMOD Record, volume 27, pages 154–165. ACM, 1998.
Deep Learning From Scratch VI: TensorFlow

It is now time to say goodbye to our own toy library and start to get professional by switching to the actual TensorFlow.
As we’ve learned already, TensorFlow conceptually works exactly the same as our implementation. So why not just stick to our own implementation? There are a couple of reasons:
TensorFlow is the product of years of effort in providing efficient implementations for all the algorithms relevant to our purposes. Fortunately, there are experts at Google whose everyday job is to optimize these implementations. We do not need to know all of these details. We only have to know what the algorithms do conceptually (which we do now) and how to call them.
TensorFlow allows us to train our neural networks on the GPU (graphical processing unit), resulting in an enormous speedup through massive parallelization.
Google is now building Tensor Processing Units, which are integrated circuits specifically built to run and train TensorFlow graphs, resulting in yet another enormous speedup.
TensorFlow comes pre-equipped with a lot of neural network architectures that would be cumbersome to build on our own.
TensorFlow comes with a high-level API called Keras that allows us to build neural network architectures way easier than by defining the computational graph by hand, as we did up until now.
So let’s get started. Installing TensorFlow is very easy.
pip install tensorflow
If we want GPU acceleration, we have to install the package tensorflow-gpu:

pip install tensorflow-gpu
In our code, we import it as follows:
import tensorflow as tf
Since the syntax we are used to from the previous sections mimics the TensorFlow syntax, we already know how to use TensorFlow. We only have to make the following changes:

- Add tf. to the front of all our function calls and classes
- Run session.run(tf.global_variables_initializer()) after building the graph

The rest is exactly the same. Let’s recreate the multi-layer perceptron from the previous section using TensorFlow:
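A sketch of what this might look like in (1.x-era) TensorFlow syntax; the toy dataset of red and blue point clusters is an assumption standing in for the data from the previous section:

```python
import numpy as np
import tensorflow as tf

# Toy data: two Gaussian blobs with one-hot class labels (an assumption).
red = np.random.randn(50, 2) - 2
blue = np.random.randn(50, 2) + 2
train_points = np.concatenate([red, blue])
train_labels = np.array([[1., 0.]] * 50 + [[0., 1.]] * 50)

X = tf.placeholder(dtype=tf.float64, shape=(None, 2))
c = tf.placeholder(dtype=tf.float64, shape=(None, 2))

# Hidden layer
W_hidden = tf.Variable(np.random.randn(2, 2))
b_hidden = tf.Variable(np.random.randn(2))
p_hidden = tf.sigmoid(tf.matmul(X, W_hidden) + b_hidden)

# Output layer
W_output = tf.Variable(np.random.randn(2, 2))
b_output = tf.Variable(np.random.randn(2))
p_output = tf.nn.softmax(tf.matmul(p_hidden, W_output) + b_output)

# Cross-entropy loss and gradient descent training operation
J = -tf.reduce_sum(c * tf.log(p_output))
minimization_op = tf.train.GradientDescentOptimizer(0.01).minimize(J)

session = tf.Session()
session.run(tf.global_variables_initializer())
for step in range(1000):
    session.run(minimization_op, feed_dict={X: train_points, c: train_labels})
```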
In the next lesson, we will learn about Keras, a high-level API on top of TensorFlow that allows us to define and train neural networks more abstractly, without having to specify the internal composition of all the operations every time. You can either subscribe to deep ideas by email or subscribe to my Facebook page to stay updated.
Connectionist Models of Cognition

In this video, I give an introduction to the field of computational cognitive modeling (i.e. modeling minds through algorithms) in general, and connectionist modeling (i.e. using artificial neural networks for the modeling) in particular. We deal with the following topics:
Robot Localization IV: The Particle Filter

The last filtering algorithm we are going to discuss is the Particle Filter. It is also an instance of the Bayes Filter and in some ways superior to both the Histogram Filter and the Kalman Filter. For instance, it is capable of handling continuous state spaces like the Kalman Filter. Unlike the Kalman Filter, however, it is capable of approximately representing arbitrary belief distributions, not only normal distributions. It is therefore suitable for non-linear dynamic systems as well.
The idea of the Particle Filter is to approximate the belief $bel(x_t)$ as a set of $n$ so-called particles $p_t^{[i]} \in dom(x_t)$: $\chi_t := \{ p_t^{[1]}, p_t^{[2]}, …, p_t^{[n]} \}$. Each of these particles is a concrete guess of the actual state vector. At each time step the particles are randomly sampled from the state space in such a way that $P(p_t^{[i]} \in \chi_t)$ is proportional to $P(x_t = p_t^{[i]} \, \vert \, e_{1:t})$.
This means that the probability of a particle being included in $\chi_t$ is proportional to the probability of it being the correct representation of the state, given the sensor measurements so far. This way, the update step can be thought of as a process similar to the evolutionary mechanism of natural selection: Strong theories, that are compatible with the new measurement, are likely to live on and reproduce, whereas poor theories are likely to die out. This results in the fact that the particles are likely to be centered around strong theories. We will see a visual example for this later.
We take the same approach as we did with all the previous Bayes Filters. First, we calculate a particle representation of $\overline{bel}(x_{t+1})$ from $\chi_t$, which we denote $\overline{\chi}_{t+1}$: For each particle $p_t^{[i]} \in \chi_t$, we sample a new particle $\overline{p}_{t+1}^{[i]}$ from the distribution $P(x_{t+1} \, \vert \, x_t = p_t^{[i]})$, which can be obtained from the transition model. We put all these new particles into the set $\overline{\chi}_{t+1}$.
As an example, let’s consider a moving robot in one dimension. The state contains only one variable, the location. From time $t$ to $t + 1$ the robot has moved an expected distance of 1 meter to the right with Gaussian movement noise. In this case we would just add 1 plus a random noise term sampled from the transition model to the location of each particle.
Now we calculate the particle representation of $bel(x_{t+1})$, namely $\chi_{t+1}$, from $\overline{\chi}_{t+1}$. The key idea here is to assign a so-called importance weight, denoted $\omega^{[i]}$, to each of the particles in $\overline{\chi}_{t+1}$. This importance weight is a measure of how compatible the particle $\overline{p}_{t+1}^{[i]}$ is with the new measurement $e_{t+1}$, i.e. $\omega^{[i]} = P(e_{t+1} \, \vert \, x_{t+1} = \overline{p}_{t+1}^{[i]})$, which can be obtained from the sensor model. $\chi_{t+1}$ is then constructed by randomly picking $n$ particles from $\overline{\chi}_{t+1}$ with a probability proportional to their weights. The same particle may be picked multiple times. This procedure is called resampling.
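A sketch of one full prediction/resampling cycle for the one-dimensional robot example, using the same transition noise $\phi^2 = 0.1$ and sensor noise $\psi^2 = 1.0$ as the Kalman Filter example (these values, and NumPy, are assumptions):

```python
import numpy as np

def particle_filter_step(particles, e, delta=1.0, phi2=0.1, psi2=1.0):
    n = len(particles)
    # Prediction: sample each new particle from P(x_{t+1} | x_t = p).
    predicted = particles + delta + np.random.normal(0.0, np.sqrt(phi2), n)
    # Importance weights from the sensor model P(e | x), up to a constant.
    weights = np.exp(-0.5 * (e - predicted) ** 2 / psi2)
    weights /= weights.sum()
    # Resampling: draw n particles with probability proportional to weight.
    return np.random.choice(predicted, size=n, p=weights)

# Uniform prior over [0, 5] with 30 particles, as in the example below:
particles = np.random.uniform(0.0, 5.0, size=30)
```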
We elucidate the Particle Filter with a localization example that’s similar to the Kalman Filter example, i.e. we use the same transition and sensor models as well as the same position and measurement chains. Since the particles are drawn from the state space, they are simply real numbers. This time, we start with a uniform distribution over the interval $[0, 5]$. In this instance, we use 30 particles. For obvious reasons, a numerical representation of the particle sets at each time step will not be given, but a graphical representation can be seen in the following figure. Each of the black/gray lines represents one or more particles. Since multiple particles can fall on the same pixel, the opacities of the lines are proportional to the number of particles on that pixel. Again, the blue line represents the actual position and the red graph represents $P(x_t \, \vert \, e_t)$.
In this series of articles, we have introduced the Bayes Filter as a means to maintain a belief about the state of a system over time and periodically update it according to how the state evolves and which observations are made. We came across the problem that, for a continuous state space, the belief could generally not be represented in a computationally tractable way. We saw three solutions to this problem, all of which have their advantages and disadvantages.
The first solution, the Histogram Filter, solves the problem by slicing the state space into a finite amount of bins and representing the belief as a discrete probability distribution over these bins. This allows us to approximately represent arbitrary probability distributions.
The second solution, the Kalman Filter, assumes the transition and sensor models to be linear Gaussians and the initial belief to be Gaussian, which makes it inapplicable to non-linear dynamic systems, at least in its original form. As we showed, this assumption results in the fact that the belief distribution is always a Gaussian and can thus be represented by a mean and a variance only, which is very memory efficient.
The last solution, the Particle Filter, solves the problem by representing the belief as a finite set of guesses at the state, which are approximately distributed according to the actual belief distribution and are therefore a good representation for it. Like the Histogram Filter, it is able to represent arbitrary belief distributions, with the difference that the state space is not binned and therefore the approximation is more accurate.
[NORVIG] Peter Norvig, Stuart Russell (2010) Artificial Intelligence: A Modern Approach. 3rd edition, Prentice Hall International
[THRUN] Sebastian Thrun, Wolfram Burgard, Dieter Fox (2005) Probabilistic Robotics
[NEGENBORN] Rudy Negenborn (2003) Robot Localization and Kalman Filters
[DEGROOT] Morris DeGroot, Mark Schervish (2012) Probability and Statistics. 4th edition, Addison-Wesley
[BESSIERE] Pierre Bessière, Christian Laugier, Roland Siegwart (2008) Probabilistic Reasoning and Decision Making in Sensory-Motor Systems
Robot Localization III: The Kalman Filter

This post deals with another solution to the continuous state space problem, the Kalman Filter, invented by Thiele, Swerling and Kalman. It has successfully been used in many applications, like the mission to Mars or automatic missile guidance systems (cf. [NEGENBORN, abstract]). The classical application is radar tracking, but there is a vast amount of other applications (cf. [NORVIG, pp. 588, 589]).
In its essence, it is an implementation of the Bayes Filter in which the belief is a normal (Gaussian) distribution and can therefore be represented by its parameters: a mean vector and a covariance matrix. In this representation, the mean vector is the expected state and the covariance matrix is a measure of uncertainty.
In order for the Kalman Filter to work, we need to make a few assumptions about the system we wish to describe (in addition to the Markov assumptions of the Bayes Filter). If these assumptions hold, the belief $bel(x_t)$ will be normally distributed at each time step $t$ and can thus be represented by a mean vector $\mu_t$ and a covariance matrix $\Sigma_t$. It is also true that if any of the three assumptions is violated, then the belief will always be non-Gaussian for $t \geq 1$ (cf. [RISTIC, p. 4]). Thus, these assumptions are necessary and sufficient conditions for the Kalman Filter. In the next section, we will see how the Kalman Filter algorithm follows from these assumptions for one-dimensional state spaces. After that, we will take a look at the multi-dimensional algorithm. The assumptions are as follows:

- The transition model is a linear Gaussian, i.e. the next state is a linear function of the current state plus Gaussian noise.
- The sensor model is a linear Gaussian, i.e. the measurement is a linear function of the state plus Gaussian noise.
- The initial belief $bel(x_0)$ is normally distributed.
The assertion that the belief is always normally distributed is very important, since it ensures the computational tractability of the belief update for arbitrary time steps. In the general case, i.e. for arbitrary transition and sensor distributions, a representation of the belief could, as we argued earlier in this series, grow unboundedly over time.
For simplicity, we’ll first assume that we are dealing with a one-dimensional state space (i.e. $x_t$ is just a real number, e.g. a position along a line). We will take a look at the multidimensional case later. The transition phase from time $t$ to $t + 1$ just adds some number $\delta_{t+1}$ to the state, plus some unpredictable Gaussian noise $\epsilon_{t+1}$ (as before, imagine a robot moving at a desired speed of $\delta$ per time step with some unpredictable random error):
$$x_{t+1} = x_t + \delta_{t+1} + \epsilon_{t+1}$$
Then our transition model is given by
$$P(x_{t+1} \, \vert \, x_t) = \mathcal{N}(x_t + \delta_{t+1}, \phi^2)$$
The variance $\phi^2$ acts as a measure of uncertainty, reflecting the transition noise $\epsilon$. In the robot example, assuming we are at position $x_t$ at time step $t$, the position at time step $t+1$ is a Gaussian cloud around an expected position of $x_t + \delta_{t+1}$ with a variance (uncertainty) of $\phi^2$.
Our sensor model is given by
$$P(e_{t+1} \, \vert \, x_{t+1}) = \mathcal{N}(x_{t+1}, \psi^2)$$
Again, the variance $\psi^2$ acts as a measure of uncertainty, this time for the measurement noise $\zeta$. In the robot example, assuming that we are at position $x_{t+1}$, the measurement that we get can be expected to be sampled from a Gaussian cloud around $x_{t+1}$ with a variance of $\psi^2$.
Assuming that the belief at some time step $t$ is a normal distribution, i.e. $bel(x_t) = \mathcal{N}(\mu_t, \sigma_t^2)$, it can be shown that the projected belief $\overline{bel}(x_{t+1})$ is also a normal distribution with mean $\overline{\mu}_{t+1} = \mu_t + \delta_{t+1}$ and variance $\overline{\sigma}_{t+1}^2 = \sigma_{t}^2 + \phi^2$.
Considering the robot example, it should not surprise us that the expected position at time step $t+1$ is just the expected position at time step $t$ plus the expected distance $\delta_{t+1}$ that we wanted to move. Moreover, it seems reasonable that our new uncertainty in the belief, $\overline{\sigma}_{t+1}^2$, is given by the old uncertainty $\sigma_{t}^2$ plus the uncertainty that we get due to the transition $\phi^2$.
Now, assuming that $\overline{bel}(x_{t+1})$ is normally distributed, it can be shown that the updated belief $bel(x_{t+1})$ after receiving a measurement $e_{t+1}$ is a normal distribution as well, this time with mean $\mu_{t+1} = \overline{\mu}_{t+1} + k_{t+1} \cdot (e_{t+1} - \overline{\mu}_{t+1})$ and variance $\sigma_{t+1}^2 = (1 - k_{t+1}) \overline{\sigma}_{t+1}^2$ where $k_{t+1} = \frac{\overline{\sigma}_{t+1}^2}{\overline{\sigma}_{t+1}^2 + \psi^2}$.
We can see that the new mean is a weighted average of the new measurement and the old mean, where the weights are determined by the respective uncertainties. This makes intuitive sense: The importance of the new measurement increases with the uncertainty of the current belief, whilst the importance of the current belief increases with the uncertainty of the measurement.
A proof of these statements can be found in [NEGENBORN, pp. 34 – 37].
Now that all the preparatory work is done, we can formulate the actual Kalman Filter algorithm. It is basically a variant of the Bayes Filter with the property that the beliefs $bel(x_t)$ and $\overline{bel}(x_t)$ are now represented by their parameterizations $(\mu_t, \sigma_t^2)$ and $(\overline{\mu}_t, \overline{\sigma}_t^2)$, respectively. As with the Bayes Filter, the correctness follows by induction.
One-Dimensional Kalman Filter
- $\overline{\mu}_{t+1} = \mu_t + \delta_{t+1}$
- $\overline{\sigma}_{t+1}^2 = \sigma_{t}^2 + \phi^2$
- $k_{t+1} = \frac{\overline{\sigma}_{t+1}^2}{\overline{\sigma}_{t+1}^2 + \psi^2}$
- $\mu_{t+1} = \overline{\mu}_{t+1} + k_{t+1} \cdot (e_{t+1} - \overline{\mu}_{t+1})$
- $\sigma_{t+1}^2 = (1 - k_{t+1}) \overline{\sigma}_{t+1}^2$
- return $\mu_{t+1}, \sigma_{t+1}^2$
The variable $k$ is often called the Kalman gain (cf. [NORVIG, p. 588]) and functions as a measure of how important the new measurement is. If the uncertainty of the projected belief is low, then the Kalman gain will be low and thus the new measurement will not have a big impact on the belief. Likewise, if the uncertainty of the measurement is high, the Kalman gain will be low; if it is low, the Kalman gain will be high.
The Kalman gain is first incorporated in the expectation update. First, the deviation of the measurement from the expectation, $e_{t+1} - \overline{\mu}_{t+1}$, is calculated; then it is weighted with the Kalman gain and added to the expectation. This has exactly the desired effect that the new measurement has an impact on the belief that is proportional to its importance. Depending on how much new information has been incorporated, the uncertainty decreases, which is implemented in the variance update.
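A direct translation of this algorithm into Python; the noise values $\phi^2 = 0.1$, $\psi^2 = 1.0$ and the measurements are those of the example that follows:

```python
def kalman_step(mu, var, delta, e, phi2=0.1, psi2=1.0):
    # Prediction
    mu_bar = mu + delta
    var_bar = var + phi2
    # Kalman gain and measurement update
    k = var_bar / (var_bar + psi2)
    mu_new = mu_bar + k * (e - mu_bar)
    var_new = (1 - k) * var_bar
    return mu_new, var_new

# Prior belief N(0, 1), then four update steps with the measurements
# from the example; this reproduces the final expectation of about 3.64.
mu, var = 0.0, 1.0
for e in [3.3558, -0.0570, 1.8155, 3.7446]:
    mu, var = kalman_step(mu, var, delta=1.0, e=e)
```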
We will now shed some light on this algorithm by applying it to a one-dimensional robot localization problem up to time step 4. The state, i.e. the robot’s location, is simply a real number. The robot believes that it starts out at $x_0 = 0$ with some uncertainty, which is reflected by a prior belief of $\mathcal{N}(\mu_0 = 0, \sigma_0^2 = 1.0)$.
We assume that the robot moves at a constant average speed of $\delta_t = 1$ with a noise of $\phi^2 = 0.1$. The positions of the robot shall be $x_0 = 0, x_1 = 0.4543, x_2 = 1.3752, x_3 = 2.2080, x_4 = 3.4944$. I sampled these positions randomly using the specified transition model. Of course, they are not known to the algorithm and shall only be used for a later comparison with the resulting beliefs (and to create the measurements). We can see that the transition noise really had an impact here. For example, from time step 0 to 1, the robot only moved 0.4543 units when the expected distance was 1 unit.
In our example, the robot is able to sense its position with a measurement noise of $\psi^2 = 1.0$. This is very big noise if we consider that it means that, in expectation, about 68.2% of the measurements are within a distance of 1 unit to the actual position (which is already a big interval) and 31.8% of the measurements might even be outside this interval. Let’s assume that we make the following measurements (which have been sampled from the sensor model using the actual positions specified above): $e_1 = 3.3558, e_2 = −0.0570, e_3 = 1.8155, e_4 = 3.7446$. We can see the obvious impact of the measurement noise: Although we were at position 0.4543 at time step 1, we measured the position 3.3558.
The following figure shows the development of the belief for the first four time steps, both numerically and graphically. At each time step, the black graphs show the belief specified in the upper right-hand corner, whereas the red graphs show the measurement probabilities $P(e_t \, \vert \, x_t)$. The blue line shows the position $x_t$ and the green line the expected position, i.e. the mean of the belief distribution. Take some time to go over the graphs and do not let the mass of information confuse you. After having understood this example, you are able to visualize the Kalman Filter, which helps a lot when using it.
We can see that even though the measurements have been very bad, we still arrive at a belief that is quite reasonable, with an error of only 0.144.
Let’s now take a glimpse at the multi-dimensional situation, which looks a little scary but really is completely analogous to the one-dimensional case.
As before, the transition model has to be a linear Gaussian ($x_{t+1} = A_{t+1} x_t + \Delta_{t+1} + \epsilon_{t+1}$), so our transition probabilities are given by $P(x_{t+1} \, \vert \, x_t) = \mathcal{N}(A_{t+1} x_t + \Delta_{t+1}, \Phi_{t+1})$. Our uncertainty is now reflected by a covariance matrix $\Phi_{t+1}$ instead of a variance along a single dimension.
The sensor model has to be a linear Gaussian as well ($e_{t+1} = C_{t+1} x_{t+1} + \zeta_{t+1}$), so our measurement probabilities are given analogously by $P(e_{t+1} \, \vert \, x_{t+1}) = \mathcal{N}(C_{t+1} x_{t+1}, \Psi_{t+1})$.
With these models defined, we can now state the multi-dimensional Kalman Filter algorithm (cf. [THRUN, p. 42]).
Multi-Dimensional Kalman Filter

- $\overline{\mu}_{t+1} = A_{t+1} \mu_t + \Delta_{t+1}$
- $\overline{\Sigma}_{t+1} = A_{t+1} \Sigma_t A_{t+1}^T + \Phi_{t+1}$
- $K_{t+1} = \overline{\Sigma}_{t+1} C_{t+1}^T (C_{t+1} \overline{\Sigma}_{t+1} C_{t+1}^T + \Psi_{t+1})^{-1}$
- $\mu_{t+1} = \overline{\mu}_{t+1} + K_{t+1} (e_{t+1} - C_{t+1} \overline{\mu}_{t+1})$
- $\Sigma_{t+1} = (I - K_{t+1} C_{t+1}) \overline{\Sigma}_{t+1}$
- return $\mu_{t+1}, \Sigma_{t+1}$
It is worth noting that $\overline{\Sigma}_t$, $\Sigma_t$ and $K_t$ are independent of the measurements and can therefore be calculated in advance, which reduces the amount of computation that has to be done “live”.
When we are dealing with linear Gaussian systems, the Kalman Filter is the way to go, since it is very efficient, easy to implement and completely exact if the three assumptions really hold. There are, however, only very few systems that really behave like this. Normally, the state transition process is nonlinear, which means it cannot be described with a simple matrix multiplication.
The so-called extended Kalman Filter attempts to overcome this issue. The idea here is that if the state transition process is approximately linear in regions close to $\mu_t$, then a Gaussian belief is a reasonable approximation. If the system behaves nonlinearly in regions close to the mean, the extended Kalman Filter yields bad results.
A different solution is the so-called switching Kalman Filter, which works by running multiple instances of the Kalman Filter in parallel, where each of them uses a different transition model. The overall belief is then calculated as a weighted sum of the belief distributions of the different instances, where the weight is a measure of how compatible this particular instance is with the measurements.
Continue with IV: The Particle Filter.