**Title:** Locally persistent categories and metric properties of interleaving distances

**Abstract:** When estimating topological features of continuous objects from finite samples, one has to produce a topologically interesting object given a finite metric space. When showing that such a procedure is robust, one endows the collection of possible inputs and the collection of outputs with metrics, and shows that the procedure is continuous with respect to these metrics. Category theory has proven valuable when defining these procedures and metrics, and when proving that these procedures and metrics are well-behaved. I will present a notion of category specifically designed for these tasks, and I will discuss applications to Topological Data Analysis.

**Please email Audrey Kager for Zoom link**

**Title:** Exploration of stock price predictability in HFT with an application in spoofing detection

**Abstract:** Today many brokerage firms use computer algorithms to make trade decisions, submit orders, and manage orders after submission. This algorithmic trading maximizes execution speed and thereby minimizes the cost, market impact, and risk associated with trading large volumes of securities. Traders place orders to buy or sell a given amount of a security at a specific price on an exchange. These buy and sell orders accumulate in the "order book" until they either find a counter-party for execution or are canceled. All participants can also issue market orders to buy or sell at the best available prices; these orders are executed immediately on a "first come, first served" basis.
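As an illustration of these mechanics, here is a minimal sketch of a price-time-priority limit order book in Python. This is a toy model only, not the TMX matching engine; the class and method names are invented for illustration.

```python
import heapq

class OrderBook:
    """Toy limit order book with price-time priority (illustrative only)."""

    def __init__(self):
        # Max-heap of bids, min-heap of asks; entries are (key, seq, price, qty).
        self.bids, self.asks = [], []
        self.seq = 0  # arrival order breaks ties at equal prices

    def limit(self, side, price, qty):
        """Add a resting limit order to the book."""
        self.seq += 1
        if side == "buy":
            heapq.heappush(self.bids, (-price, self.seq, price, qty))
        else:
            heapq.heappush(self.asks, (price, self.seq, price, qty))

    def market(self, side, qty):
        """Execute a market order against the best available prices."""
        book = self.asks if side == "buy" else self.bids
        fills = []
        while qty > 0 and book:
            key, seq, price, avail = heapq.heappop(book)
            take = min(qty, avail)
            fills.append((price, take))
            qty -= take
            if avail > take:  # partially filled resting order stays in the book
                heapq.heappush(book, (key, seq, price, avail - take))
        return fills

book = OrderBook()
book.limit("sell", 10.02, 100)
book.limit("sell", 10.01, 50)
fills = book.market("buy", 120)  # fills 50 @ 10.01, then 70 @ 10.02
```

The heaps encode price priority and the sequence counter encodes time priority, which together give the "first come, first served" execution at each price level described above.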

Using high-frequency trading (HFT) data on the Toronto Stock Exchange, provided by the TMX Group, we explore a data-driven model to detect a form of high-frequency price manipulation known as spoofing. A spoofer manipulates prices by placing limit orders which they do not intend to be executed, in order to mislead other traders about the available volume of shares, in the hope that prices will move in their favor. We show that a generalized form of volume imbalance is associated with price movements and can be manipulated by spoofing strategies. The literature argues that spoofing strategies are detrimental to the integrity of markets and that new models are necessary for regulators to combat them.

The data sets we use qualify for the moniker "Big Data". The limit order book must be constructed each time an order arrives for a particular stock. This process is implemented on a distributed data system using PySpark, since doing it efficiently on a local machine would be infeasible. We discuss some issues and complications that arise from working with very large data sets of this type.

We define a generalized volume imbalance as the weight in a convex combination of two price change distributions, which forms our price change model. Price changes for different stocks happen at different time scales. We remedy this issue by comparing stocks on time intervals over which they all have the same variance in their price change distributions. Statistical and goodness-of-fit tests using Cramér's V statistic and Kullback–Leibler divergence, respectively, are implemented to validate our model across a large collection of stocks. The model is then used to test the sensitivity of the limit order book to spoofing and to derive relationships between the spoofer's constraints and their optimal decisions. These results could be used by regulators to flag periods of the trading day when market conditions make spoofing possible, as a means to improve market surveillance.
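The convex-combination model and a divergence-based fit check can be sketched as follows. The two conditional distributions, the imbalance weight, and the observed data here are all hypothetical; this illustrates the structure of the model, not the thesis's actual estimation procedure.

```python
import math

def mixture(p_up, p_down, w):
    """Price-change pmf modelled as a convex combination w*p_up + (1-w)*p_down,
    where w plays the role of the generalized volume imbalance."""
    return [w * a + (1 - w) * b for a, b in zip(p_up, p_down)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical pmfs over price changes {-1 tick, 0, +1 tick}
p_up   = [0.1, 0.3, 0.6]  # conditional on buy-heavy imbalance
p_down = [0.6, 0.3, 0.1]  # conditional on sell-heavy imbalance

observed = [0.35, 0.30, 0.35]          # empirical pmf for some interval
model    = mixture(p_up, p_down, 0.5)  # imbalance weight w = 0.5
fit = kl_divergence(observed, model)   # small divergence => good fit
```

A small KL divergence between the empirical and modelled distributions indicates the mixture weight explains the observed price changes well for that interval.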

**Zoom Link TBD**

**Title:** Hybrid Symbolic-Numeric Computing in Linear and Polynomial Algebra

**Abstract:** We introduce hybrid symbolic-numeric methods which solve the approximate GCD problem for polynomials presented in Bernstein and Lagrange bases.

We adapt Victor Y. Pan’s root-based algorithm for finding approximate GCD to the case where the polynomials are expressed in Bernstein bases. We use the numerically stable companion pencil of Guðbjörn Jónsson to compute the roots, and the Hopcroft-Karp bipartite matching method to find the degree of the approximate GCD. We offer some refinements to improve the process.
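The idea of recovering the degree of the approximate GCD by matching nearby roots can be sketched as follows. The talk uses the Hopcroft-Karp method; this sketch uses a simpler augmenting-path matching, which computes the same maximum matching size, and the root values and tolerance are invented.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum-cardinality bipartite matching via augmenting paths
    (simpler than Hopcroft-Karp, but same matching size)."""
    match_right = [-1] * n_right

    def try_augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))

def approx_gcd_degree(roots_p, roots_q, tol):
    """Degree of the approximate GCD, taken as the size of a maximum
    matching between roots of p and roots of q within distance tol."""
    adj = [[j for j, s in enumerate(roots_q) if abs(r - s) <= tol]
           for r in roots_p]
    return max_bipartite_matching(adj, len(roots_q))

# p has roots {1, 2, 3}; q has roots {1.001, 2.002, 5}; tolerance 0.01
deg = approx_gcd_degree([1.0, 2.0, 3.0], [1.001, 2.002, 5.0], 0.01)  # -> 2
```

Each matched pair of nearby roots contributes one root to the approximate common divisor, so the matching size gives its degree.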

We also introduce an algorithm based on a similar idea, which finds an approximate GCD for a pair of approximate polynomials given in a Lagrange basis. More precisely, we suppose that these polynomials are given by their approximate values at distinct known points. We first find the roots of each polynomial by using a Lagrange basis companion pencil. We introduce new clustering algorithms and use them to cluster the roots of each polynomial to identify multiple roots, and then match the roots of the two polynomials using a Maximum Weight Matching algorithm to find their approximate GCD.
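A tolerance-based clustering of numerically computed roots might look like the following sketch. The clustering rule here, comparing each root to a running cluster centroid, is an invented simplification for illustration, not the new clustering algorithms presented in the talk.

```python
def cluster_roots(roots, tol):
    """Group numerically computed roots into clusters of nearby values;
    each cluster is read as one root with multiplicity = cluster size."""
    clusters = []
    for r in sorted(roots, key=lambda z: (z.real, z.imag)):
        for c in clusters:
            centroid = sum(c) / len(c)  # running centroid of the cluster
            if abs(r - centroid) <= tol:
                c.append(r)
                break
        else:
            clusters.append([r])
    # Return (representative root, multiplicity) pairs
    return [(sum(c) / len(c), len(c)) for c in clusters]

# A double root near 2 and a simple root at -1, perturbed by rounding noise
clusters = cluster_roots([2.0001, 1.9999, -1.0], 1e-2)
```

Identifying such clusters turns noisy simple roots from the companion pencil back into multiple roots before the matching step.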

Master of Science Poster Day is designed to celebrate and showcase the work done by MSc students in a mini-conference fashion.

MSc Poster Day is a graduation requirement of the project-based master's program.

https://www.uwo.ca/stats/graduate/masters-program/master-project.html

**Title:** Classification-based method for estimating dynamic treatment regimes

**Abstract:** Dynamic treatment regimes are sequential decision rules dictating how to individualize treatments to patients based on evolving treatment and covariate history. In this thesis, we investigate two methods of estimating dynamic treatment regimes. The first method extends outcome weighted learning from two treatments to multiple treatments and allows for negative treatment outcomes. We show that, under two different sets of assumptions, Fisher consistency is maintained. The second method estimates treatment rules by a neural classification tree. A weighted squared loss function is defined to approximate the indicator function while maintaining smoothness. A method of tree reconstruction and pruning is proposed to increase interpretability. Simulation studies and a real-data application to the Sequential Treatment Alternatives to Relieve Depression (STAR*D) clinical trial are conducted to illustrate the proposed methods.

**Title:** A Treatise of PD-LGD Correlation Modelling

**Abstract:** The provision in Paragraph 468 of the Basel II Framework Document for calculating loss given default (LGD) requires that parameters used in Pillar I of Basel II capital estimations be reflective of economic downturn conditions, so that relevant risks are accounted for. This provision is based on the fact that correlations between the probability of default (PD) and LGD are not captured in the proposed formula for estimating economic capital. To help quantify economic downturn LGD, the Basel Committee proposed establishing a functional relationship between long-run and downturn LGD.

To the best of our knowledge, the currently proposed models that map out this relationship share the same underlying framework. This thesis presents a general factor PD-LGD correlation model within the conditional independence framework, where obligors' defaults are conditional on a common state of affairs in the economy. We highlight a mistake that is frequently made in specifying loss given default, namely that current studies ignore the difference between account-level potential loss and LGD. By reproducing this mistake and deriving the correct distributions of potential loss and LGD, we conduct a sensitivity analysis to ascertain the impact of the defective model on economic capital and parameter estimates. The relationship between account-level and portfolio-level correlations is explored. Finally, an empirical analysis is conducted to validate the proposed scheme for estimating the model's parameters.
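As a rough illustration of the conditional-independence framework, the following sketch simulates a standard one-factor default model in which obligors default independently conditional on a common economic factor. It illustrates the PD side only; the parameter values are invented and this is not the thesis's PD-LGD model.

```python
import math
import random
from statistics import NormalDist

def simulate_default_rates(pd, rho, n_obligors, n_scenarios, seed=0):
    """One-factor model: obligor i defaults when
    sqrt(rho)*Z + sqrt(1-rho)*eps_i < Phi^{-1}(pd),
    with Z the common factor (state of the economy). Conditional on Z,
    defaults are independent -- the conditional-independence framework."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(pd)  # default threshold Phi^{-1}(pd)
    rates = []
    for _ in range(n_scenarios):
        z = rng.gauss(0, 1)  # common economic factor for this scenario
        defaults = sum(
            math.sqrt(rho) * z + math.sqrt(1 - rho) * rng.gauss(0, 1) < threshold
            for _ in range(n_obligors)
        )
        rates.append(defaults / n_obligors)
    return rates

rates = simulate_default_rates(pd=0.02, rho=0.2, n_obligors=500, n_scenarios=200)
avg = sum(rates) / len(rates)  # near the unconditional PD of 2%
```

Bad scenarios (low draws of the common factor) push many obligors below the threshold at once, which is the channel through which downturn conditions raise realized losses.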
