- On Many-Actions Policy Gradient
- Michał Nauman
We are researching the properties of Policy Gradients, one of the algorithm families powering applications such as ChatGPT and AlphaGo. Our work involves deriving a theorem that helps determine the optimal configuration of the Policy Gradient algorithm. This saves computational resources that would otherwise be spent on discovering that configuration empirically. Additionally, our theoretical insights can be applied to speed up the training of simulated robots!
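As a toy illustration of the configuration question behind "many-actions" policy gradients (a minimal sketch of my own, not the authors' method; the bandit, rewards, and all hyperparameters are invented), a REINFORCE-style score-function gradient can be estimated from N sampled actions per update instead of one, and N is exactly the kind of configuration knob the theorem concerns:

```python
import numpy as np

# Illustrative sketch: softmax policy over a 3-armed bandit, with the
# policy gradient at each update estimated from n_actions samples
# rather than a single action. All values here are made up.

rng = np.random.default_rng(0)
true_rewards = np.array([1.0, 0.2, -0.5])  # expected reward per arm

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros(3)   # policy logits
n_actions = 8         # actions sampled per gradient estimate
lr = 0.5

for step in range(200):
    probs = softmax(theta)
    grad = np.zeros(3)
    for _ in range(n_actions):
        a = rng.choice(3, p=probs)
        r = true_rewards[a] + rng.normal(scale=0.1)
        # score-function gradient of log pi(a): (one_hot(a) - probs) * reward
        grad += (np.eye(3)[a] - probs) * r
    theta += lr * grad / n_actions  # average over the N samples

print(softmax(theta))  # probability mass concentrates on arm 0
```

Larger `n_actions` lowers the variance of each gradient estimate but costs more environment interaction per update; trading these off under a fixed compute budget is the configuration problem the abstract alludes to.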
- Concealing the network
- Paweł Ciosmak
Identifying the leader of a criminal network is of key importance to law enforcement agencies. Leaders may plan the structure of their network so as to conceal their own identity as effectively as possible. We analyse how this approach shapes the structure of such networks through the choice of communication channels between their members. To this end, we use the methods of Stackelberg game theory. We study three potential scenarios, presenting suitable algorithms for each of them, and we prove the NP-hardness of the problem under consideration. The project is joint work with Michał Godziszewski and Zuzanna Bączek.
- Calculating shortest paths in changing graphs
- Adam Karczmarz
Many real-world networks undergo small but frequent changes. Can we always recompute the solution to a computational problem after a small change to the network (e.g., the addition or removal of an edge) significantly faster than by starting from scratch? In our research, we studied the problem of computing shortest paths in such a dynamic setting. I will present several theoretical results in this area that we have obtained recently.
- Context Scaling for Large Language Models
- Konrad Staniszewski
Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often constrained by a limited effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which consists of (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys.
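This dilution effect can be seen in a toy numpy sketch (my own illustration, not the FoT code; the dimensionality and all values are invented): a single query attends over a (key, value) memory, and the softmax weight on the one relevant key shrinks as irrelevant keys are added:

```python
import numpy as np

# Toy illustration: scaled dot-product attention of one query over a
# memory of keys. One key is close to the query ("relevant"); the rest
# are random. As the memory grows, attention mass on the relevant key
# drops even though it still matches the query best.

rng = np.random.default_rng(1)
d = 16
query = rng.normal(size=d)
relevant_key = query + 0.1 * rng.normal(size=d)  # near the query

def attention_on_relevant(n_irrelevant):
    keys = np.vstack([relevant_key,
                      rng.normal(size=(n_irrelevant, d))])
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights[0]  # attention weight on the relevant key

for n in (10, 100, 1000):
    print(n, attention_on_relevant(n))
```

The printed weight decreases monotonically with memory size in expectation, which is the quantitative face of the distraction issue described below.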
We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length.
Our method allows for fine-tuning pre-existing large-scale models to lengthen their effective context. We demonstrate this by fine-tuning 3B and 7B OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit improvements on tasks requiring a long context. We further show that our LongLLaMA models adeptly handle a 256k context length for passkey retrieval.
- Algorithm BERG: Bayes Experts of Regularized Gaussians for Continual Learning
- Grzegorz Rypeść
Many recent methods dealing with the problem of exemplar-free class-incremental learning rely on a pre-trained backbone model or a robust feature extractor trained on a significant portion of data during the initial task. Two factors constrain the applicability of these approaches. Firstly, they necessitate the collection of a substantial amount of data before incremental steps can be taken. Secondly, they operate under the tacit assumption that subsequent tasks stem from a similar distribution, such as being derived from the same split dataset.
To overcome these limitations, we propose a novel method called BERG. It leverages an ensemble of experts, which are submodules of a neural network trained independently to perform Bayesian classification. Each expert builds internal representations of the data classes encountered thus far using regularized Gaussian distributions. During inference, the experts' predictions are combined as an ensemble of Bayes classifiers.
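The per-expert classification rule can be sketched in numpy (an illustrative toy, not the BERG implementation; the 2-D features, the regularizer value, and the two-expert setup are all assumptions of mine): each expert keeps a regularized Gaussian per class, and the ensemble sums the experts' Gaussian log-likelihoods before taking the argmax.

```python
import numpy as np

# Toy sketch: experts store (mean, regularized covariance) per class
# and classify via summed Gaussian log-likelihoods.

rng = np.random.default_rng(0)

class GaussianExpert:
    def __init__(self, reg=1e-2):
        self.reg = reg
        self.stats = {}  # class label -> (mean, covariance)

    def fit_class(self, label, feats):
        mean = feats.mean(axis=0)
        # Tikhonov-style regularization keeps the covariance invertible.
        cov = np.cov(feats, rowvar=False) + self.reg * np.eye(feats.shape[1])
        self.stats[label] = (mean, cov)

    def log_likelihood(self, x, label):
        mean, cov = self.stats[label]
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet)

def ensemble_predict(experts, x):
    labels = experts[0].stats.keys()
    scores = {c: sum(e.log_likelihood(x, c) for e in experts) for c in labels}
    return max(scores, key=scores.get)

# Two well-separated classes in a 2-D feature space.
class_data = {0: rng.normal([0, 0], 0.5, size=(100, 2)),
              1: rng.normal([3, 3], 0.5, size=(100, 2))}

experts = [GaussianExpert(), GaussianExpert()]
for e in experts:
    for c, feats in class_data.items():
        e.fit_class(c, feats)

print(ensemble_predict(experts, np.array([2.9, 3.1])))  # near class 1's mean
```

Because each class is summarized only by per-class Gaussian statistics, no stored exemplars are needed when a new class arrives, which is the exemplar-free property the abstract targets.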
- Online matching with delays and stochastic arrivals
- Runtian Ren
We present a new research direction for the problem of online minimum-cost perfect matching with delays, motivated by pairing online gaming requests into sessions. Previously, the problem was studied in a worst-case online scenario, which rarely reflects reality. In our work, we assume that the requests follow a practical Poisson arrival model. Under such a model, we show that an intuitive greedy method, which pairs up two pending requests once their total delay penalty exceeds the distance between them, achieves good performance compared with the optimal solution. We hope that our work can motivate follow-up studies of other online network design problems with practical stochastic arrivals.
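The greedy rule from the abstract can be sketched in a small time-stepped simulation (my own toy, not the paper's analysis; the line metric, unit delay cost per time unit, and the sample arrivals are assumptions): two pending requests are matched as soon as the sum of their accumulated waiting times reaches the distance between them.

```python
import itertools

# Toy sketch of the greedy delayed-matching rule on a line metric.
# arrivals: list of (arrival_time, position); returns matched index pairs.

def greedy_match(arrivals, dt=0.01, horizon=100.0):
    events = sorted((t, x, i) for i, (t, x) in enumerate(arrivals))
    pending = []   # (arrival_time, position, index) of unmatched requests
    matches = []
    t = 0.0
    while t <= horizon and (events or pending):
        while events and events[0][0] <= t:
            pending.append(events.pop(0))
        # match any pair whose joint delay penalty covers their distance
        matched = True
        while matched and len(pending) >= 2:
            matched = False
            for (a, b) in itertools.combinations(pending, 2):
                joint_delay = (t - a[0]) + (t - b[0])
                if joint_delay >= abs(a[1] - b[1]):
                    matches.append((a[2], b[2]))
                    pending.remove(a)
                    pending.remove(b)
                    matched = True
                    break
        t += dt
    return matches

# Two nearby requests pair quickly; a distant early request waits
# until a late partner arrives close to it.
print(greedy_match([(0.0, 0.0), (0.1, 0.2), (0.0, 5.0), (2.0, 5.1)]))
```

In this example, requests 0 and 1 match almost immediately (they are distance 0.2 apart), while request 2 waits for request 3 rather than paying the distance 5 to request 0, which is the trade-off the delay penalties are meant to balance.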
- Deep reinforcement learning-based approach towards effective cellular reprogramming
- Andrzej Mizera
Cellular reprogramming, that is, the artificial changing of cell fate, has been drawing increasing research attention for its therapeutic potential in treating the most complex diseases characterised by malfunctioning cells. It is believed to ultimately facilitate both the prevention and cure of complex diseases, amongst which neurodegenerative disorders and cancer are presumably the most common. This can be achieved by steering living cells into 'healthy' states.
Unfortunately, finding effective interventions that trigger desired changes in biological cells using solely classical wet-lab experiments is difficult, costly, and requires lengthy time commitments. In this presentation, we will discuss our development of computational methods for the identification of control strategies for the reprogramming of gene regulatory networks (GRNs) in biological cells, and our new approach, based on deep reinforcement learning, towards the scalable and effective reprogramming of GRNs of realistic sizes.
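To make the control problem concrete, here is a minimal Boolean-network sketch (entirely illustrative rules of my own, not a real GRN or the authors' framework): each gene updates synchronously from fixed logic rules, the all-off state plays the role of an 'unhealthy' attractor, and a single transient intervention forcing one gene on steers the network into the all-on 'healthy' fixed point.

```python
# Toy 3-gene Boolean network with synchronous updates.
# Rules: A' = A or B,  B' = A,  C' = A and B  (invented for illustration).

def step(state):
    a, b, c = state
    return (a or b, a, a and b)

def run(state, steps=10, intervene_at=None):
    for t in range(steps):
        if t == intervene_at:
            state = (1, state[1], state[2])  # transient intervention: force gene A on
        state = step(state)
    return state

print(run((0, 0, 0)))                  # stays in the 'unhealthy' fixed point
print(run((0, 0, 0), intervene_at=0))  # reaches the 'healthy' fixed point
```

Real GRNs have far more genes and vastly larger state spaces, which is why exhaustive search for such interventions breaks down and a learned (e.g., deep reinforcement learning) controller becomes attractive.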
- Countering Autonomous Distributed Drone Swarms to Protect Sensitive Installations
- Noor Ullah
We propose an algorithm to disrupt malicious drone swarms targeting specific sensitive locations, uniquely focusing on autonomous, homogeneous UAVs. Unlike conventional methods, our approach strategically identifies and eliminates critical nodes, thereby increasing inter-UAV distances and causing a quadratic decay in communication. This fragmentation method is particularly effective against sophisticated swarming algorithms. Empirical evaluations reveal marked superiority over conventional countermeasures in terms of speed, scalability, and complexity. By emphasizing the geometric relationships and distributed control within the swarm, our algorithm provides a novel and robust countermeasure against the emergent threat of autonomous UAV formations, demonstrating significant performance gains and real-world applicability. The proposed solution advances the state of the art in drone-swarm mitigation, offering enhanced protection against malicious UAV attacks.
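The critical-node idea can be sketched on a toy communication graph (my own simplified illustration, not the proposed algorithm; the positions, the fixed communication range, and the brute-force criticality measure are all assumptions): UAVs within range form links, and the most critical node is the one whose removal fragments the graph into the most connected components.

```python
import itertools

# Toy sketch: geometric communication graph + brute-force critical node.

def build_graph(positions, comm_range):
    adj = {i: set() for i in range(len(positions))}
    for i, j in itertools.combinations(range(len(positions)), 2):
        (xi, yi), (xj, yj) = positions[i], positions[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= comm_range ** 2:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj, removed=frozenset()):
    seen, comps = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        comps += 1
        stack = [start]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            stack.extend(v for v in adj[u] if v not in seen)
    return comps

def most_critical_node(adj):
    # The node whose removal yields the most fragments.
    return max(adj, key=lambda v: components(adj, removed=frozenset({v})))

# Two tight clusters joined only through a single "bridge" UAV (node 3).
positions = [(0, 0), (0.5, 0), (0.25, 0.4),      # cluster A
             (1.2, 0.2),                          # bridge
             (1.9, 0), (2.4, 0), (2.15, 0.4)]     # cluster B
adj = build_graph(positions, comm_range=1.0)
print(most_critical_node(adj))  # node 3, the bridge
```

Removing the bridge splits the swarm into two components whose members are now far apart, matching the abstract's point that eliminating critical nodes increases inter-UAV distances and degrades communication; a real implementation would of course need an efficient criticality measure rather than this O(n·(n+m)) brute force.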