Asya

Towards Natural-Sounding Text to Speech in English

2024

Kriss Saulitis, Evalds Urtans, Vairis Caune

This study presents a systematic literature review and an experimental comparison of 20 English speech synthesis methods. Nine of the models were analyzed quantitatively, using selected samples from the Common Voice dataset and criteria that assess both quality and precision. The methodology consists of configuring the speech synthesis models to generate audio samples, which are then compared against established criteria. Speech quality is evaluated with the NISQA model, a machine-learning predictor that mimics the subjective mean opinion score (MOS). The precision of the synthesized samples is evaluated with character error rate (CER) and word error rate (WER). The CoMoSpeech model showed the best quality (MOS 3.85), while the VITS model demonstrated the highest precision (CER 1.48%) and the best overall average across the metrics.
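The character and word error rates mentioned in this abstract are conventionally defined as the edit distance between a reference text and a transcript, divided by the reference length. The following is a minimal sketch of that conventional definition, not the authors' evaluation code; the function names are illustrative, and any text normalization (case, punctuation) applied in the study is not stated in the abstract and is therefore omitted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete an element of ref
                            curr[j - 1] + 1,      # insert an element of hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

In a text-to-speech evaluation of this kind, the hypothesis would typically come from running a speech recognizer on the synthesized audio, so the score reflects both synthesis and recognition errors.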

Using Large Language Models to Improve Sentiment Analysis in Latvian Language

2024

Pauls Purvins, Evalds Urtans

This empirical study explores the use of large language models (LLMs) in sentiment analysis and presents a new approach to creating a dataset in the Latvian language using Reddit data. Using prompt engineering with the GPT-3.5-turbo model (the latest at the time of writing), we achieved 82% accuracy in three-class sentiment classification, which exceeds previous research on the Latvian Tweet Sentiment Corpus by 50%. We also demonstrate that LLMs can partially replace human labelers, making dataset creation more cost-effective, especially for larger datasets. This work contributes to sentiment analysis in non-English languages, leveraging the power of LLMs. The paper introduces a new LVReddit dataset that contains more than 90,000 samples, making it the largest available sentiment dataset for the Latvian language. Our findings confirm the LLM's underlying "understanding" of language. However, LLMs occasionally deviate from response templates, making parsing challenging. Future research should investigate fine-tuned models based on novel datasets and analyze language patterns.
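The parsing difficulty noted at the end of this abstract can be illustrated with a tolerant label extractor. This is a hypothetical sketch, not the authors' pipeline: the three label names and the function `parse_sentiment` are assumptions, since the abstract states only that classification used three classes.

```python
import re

# Assumed label set; the abstract says only that there are three classes.
LABEL_PATTERN = re.compile(r"\b(positive|negative|neutral)\b", re.IGNORECASE)


def parse_sentiment(response):
    """Return the first recognized label in a free-form reply, or None."""
    match = LABEL_PATTERN.search(response)
    return match.group(1).lower() if match else None
```

Returning `None` instead of guessing lets unparseable replies be counted and routed to a retry or to a human labeler.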

LSTM rollout curriculum using double pendulum

2023

Reins Freibergs, Evalds Urtans, Ansis Ecis, Henrik Gabrielyan

In this work, we model a double pendulum system with deep neural networks trained on a dataset generated from video recordings. For comparison, a similar model is built by describing the system with differential equations. We compare the ability of both models to predict the next 2 s of double pendulum motion from information about the previous second. In addition, both models are compared by their ability to make predictions within a specified error margin. The results show that deep learning-based approaches give much better predictions: the best deep learning-based model could predict the next 1.5 s within the specified error margin, while the best differential equation-based model managed only 0.12 s. All other metrics agree with this result.

Primed UNet-LSTM for weather forecasting

2023

Kristofers Volkovs, Evalds Urtans, Vairis Caune

Noise-Based and Class-Based Curriculum Learning for Image Classifiers

2023

Ēvalds Urtāns, Yue Li

Datasets often contain samples of varying difficulty and even noisy samples. This paper introduces two naive curriculum learning methods, one using an image dataset with noise and another using an image dataset that contains samples from other datasets with presumed higher difficulty. The goal is to improve the performance of the model by gradually introducing more difficult samples during training rather than using them from the very beginning. Experiments demonstrated that with the proposed curriculum learning methods a classifier can achieve higher accuracy in fewer training epochs.

Detection of Knots in Oak Wood Planks: Instance Versus Semantic Segmentation

2022

Urtāns, Ē., Būmanis, K., Vēciņš, V., Ancāns, M., Andrijanova, A., Upenieks, M., Volkovs, K.

In this study, we present a new dataset of knot-covered oak planks. It contains 1500 images with 1 to 11 knots per image, along with mask and bounding-box annotations. The dataset was evaluated using deep learning methods, and instance segmentation models were found to be superior in this task, achieving 59% Box-IoU versus 49% Box-IoU for semantic segmentation. Instance segmentation also detected knots slightly better, with an accuracy of 90%, while semantic segmentation detected knots with an accuracy of 89%.
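Box-IoU, the metric quoted in this abstract, is the intersection-over-union of a predicted and a ground-truth bounding box. The sketch below shows the standard computation for a single pair of axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` layout is an assumption, and how the authors matched and averaged boxes across images is not stated in the abstract.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```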

Let's Put a Smile on Your Face

2022

Ēvalds Urtāns, Mārcis Teodors Upenieks

This study compares three methods for the facial expression transfer task using generative adversarial networks. The facial expression transfer task aims to change the emotion shown on a human face to another emotion without affecting the identifying features of the face. The image-to-image method uses the whole image for style transfer, while the novel face-to-face and parts-to-parts methods use only segmented features of the face. For transferring emotion from neutral to happiness in a photo of a face, the image-to-image method achieves a precision of 69.7% and an FID score of 21.67, the face-to-face method achieves a precision of 78.6% and an FID score of 17.6, and the parts-to-parts method achieves a precision of 97.8% and an FID score of 17.37.

Bidirectional Long Short-Term Memory Networks for Automatic Crop Classification at Regional Scale using Tabular Remote Sensing Time Series

2022

Ēvalds Urtāns, Harijs Ijabs

With the arrival of the European Union’s new Common Agricultural Policy (CAP 2020), a paradigm shift in subsidy control is underway. Member states are required to gradually transition from a system of on-the-spot checks, where the presence or absence of a crop is detected manually in the field, to a system of agricultural monitoring based on remote sensing data, primarily from Sentinel-1 and Sentinel-2. This paper presents a classification of regional crop types based on the Bidirectional Long Short-Term Memory (BiLSTM) network. The approach is based on tabular time series of Sentinel-1 and Sentinel-2 sensor data over the entire territory of Latvia. Two types of LSTM architectures are evaluated in this paper: regular and bidirectional. An exhaustive grid search of network hyperparameters with 15 distinct crop types led to the conclusion that the bidirectional variant of LSTM yields the highest overall weighted test accuracy of 89.1%.

Exponential Triplet Loss

2020

Urtāns, Ē., Ņikitenko, A., Vēciņš, V.

This paper introduces a novel variant of the Triplet Loss function that converges faster and gives better results. This function can separate class instances homogeneously throughout the whole embedding space. Together with the Exponential Triplet Loss function, we introduce novel types of embedding space regularization, Unit-Range and Unit-Bounce, that utilize Euclidean space more efficiently and resemble features of the cosine distance. We also examine factors for choosing the best embedding vector size for specific embedding spaces. Finally, we demonstrate how the new function can be used to train models for one-shot learning and re-identification tasks.
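For context, the standard Triplet Loss that this paper modifies pulls an anchor embedding toward a same-class (positive) sample and pushes it away from a different-class (negative) sample by at least a margin. The sketch below shows only that standard baseline with Euclidean distance; it is not the exponential variant, whose formula the abstract does not give, and the default margin value is an arbitrary assumption.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)  # distance to the same-class sample
    d_neg = math.dist(anchor, negative)  # distance to the other-class sample
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so such triplets stop contributing to training.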

Value Iteration Solver Networks

2020

Urtāns, Ē., Vēciņš, V.

The Value Iteration Algorithm is iterative and cannot be parallelized. Its computation time grows exponentially as the size of the input map increases. We propose the UNet-RNN-Skip artificial neural network architecture, which can be used to produce Value Iteration Algorithm results in parallel. The proposed model can solve the Value Iteration problem in fewer iterations than the original algorithm, and its computation time increases by only a small amount as the size of the input map grows. The underlying UNet-RNN-Skip architecture can also be used to solve and parallelize other sequential problems. Together with this paper, a synthetic dataset of maps and its generator have been published to enable further studies in mapping and path planning tasks.
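The classical algorithm that the proposed network approximates can be sketched on a small grid map. This is an illustrative baseline under stated assumptions, not the UNet-RNN-Skip model or the published generator: it uses the deterministic, undiscounted shortest-path form of value iteration with a unit step cost, and the map encoding (`#` for walls, `.` for free cells) is hypothetical.

```python
import math

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # four-connected grid moves


def value_iteration(grid, goal, step_cost=1.0, max_sweeps=10_000):
    """Return (cost-to-go map, sweeps performed) for a grid of strings."""
    rows, cols = len(grid), len(grid[0])
    values = [[math.inf] * cols for _ in range(rows)]
    values[goal[0]][goal[1]] = 0.0
    for sweep in range(1, max_sweeps + 1):
        updated = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#" or (r, c) == goal:
                    continue
                # Bellman backup: cheapest neighbour value plus the step cost.
                for dr, dc in MOVES:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                        updated[r][c] = min(updated[r][c],
                                            step_cost + values[nr][nc])
        if updated == values:  # no cell changed, so the values have converged
            return values, sweep
        values = updated
    return values, max_sweeps
```

Each sweep depends on the result of the previous one, so the number of sweeps grows with the longest path on the map; this sequential dependency is what the abstract refers to.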

asya: Mindful verbal communication using deep learning

2020

Evalds Urtans, Ariel Tabaks

asya is a mobile application built on deep learning models that analyze the spectra of a human voice to perform noise detection, speaker diarization, gender detection, tempo estimation, and emotion classification using only voice. All models are language-agnostic and capable of running in real time. Our speaker diarization models have an accuracy of over 95% on the test dataset. These models can be applied in a variety of areas, such as customer service improvement, more effective sales conversations, psychology, and couples therapy.

Survey of Deep Q-Network Variants in PyGame Learning Environment

2018

Urtāns, Ē., Ņikitenko, A.

Q-value function models based on variations of Deep Q-Network (DQN) have shown good results in many virtual environments. In this paper, over 30 sub-algorithms that influence the performance of DQN variants were surveyed. Important stability and repeatability aspects of state-of-the-art deep reinforcement learning algorithms were identified. Multi Deep Q-Network (MDQN) was developed as a generalization of the popular Double Deep Q-Network (DDQN) algorithm. Visual representations of the learning process in the form of Q-value maps were produced using the PyGame Learning Environment.
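The difference between DQN and Double DQN, which MDQN is said to generalize, lies in how the bootstrap target is formed. The sketch below shows the two well-known target rules for a single transition using plain lists of Q-values; the function names are illustrative, and MDQN itself is not shown because the abstract does not define it.

```python
def dqn_target(reward, q_target_next, gamma=0.99, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, q_online_next, q_target_next,
                      gamma=0.99, done=False):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

Decoupling action selection from evaluation is the mechanism Double DQN uses to reduce the overestimation that comes from taking a maximum over noisy Q-values.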

Active Infrared Markers for Augmented and Virtual Reality

2016

Ēvalds Urtāns

Our research proposes an algorithm and a technical implementation of a system that can recognize the positions and orientations of IR (infrared) LED (light-emitting diode) markers that are invisible to the naked human eye. The system is made to work with the Oculus Rift DK2 head-mounted display coupled with a 111 Hz IR Leap Motion camera. It adds functionality to these devices by allowing them to track different kinds of objects using active IR markers. Up to now, the most common way of tracking markers for augmented reality has been to use fiducial markers that are visible to the human eye or other static markers. Such marker systems require a flat surface for attaching a fiducial marker sticker. With our system it is possible to create invisible markers that can be attached to objects without flat surfaces. As a proof of concept, we made a virtual ping-pong game where the player uses a physical table tennis paddle fitted with active IR markers. At the time of the research, no such tracking systems were publicly available. Currently, similar commercial systems are in development by Oculus and other companies. The hardware limitations that we found in our research might be one of the main reasons why the commercial product is not yet completed. The IR camera of the Oculus Rift has a frame rate of 30 Hz, which is insufficient to support active IR markers. We found that even with a capture rate of 111 Hz the proposed system works 3 times slower than conventional fiducial markers, but starting at this frame rate it can be used in real-time applications.

Chapter 3: ChromSword®: Software for Method Development in Liquid Chromatography

2018

Galushko, S., Urtāns, Ē.

Method development in chromatography can be considered a process of studying the empirical relationships between the quality of a chromatogram and the chromatographic conditions. A chromatographer changes conditions to find an acceptable method that achieves a separation in a reasonable time. The time required to find optimal conditions or to reach any conclusion can be substantially reduced by using computer programs for method development. HPLC method development programs can be used interactively (off-line) and for automatic optimization (on-line). ChromSword for off-line computer-assisted method development was launched in 1994 as an extension of the ChromDream software [1]. The first version for unattended method development followed in 1998 to 2000 [2]. The latest version of ChromSword combines different method development technologies in one software platform.

Association of Tumor Necrosis Factor-α (TNFα) Gene Polymorphisms with HLA Class II Alleles in HIV/AIDS Patients

2016

Eglite, J., Golushko, J., Urtāns, Ē.

Aim of Study: To identify molecular genetic risk factors in the development of HIV infection, based on the TNFα cytokine gene polymorphism in combination with the HLA DRB1/DQA1/DQB1 genes, and to analyse their possible association with the progress of the disease. A total of 185 HIV-infected patients and a control group of 173 individuals were analysed. DNA was extracted from peripheral blood using Qiagen QIAamp DNA kit reagents. The quality and quantity of the DNA were checked using a Qubit® fluorometer. HLA typing for HLA DRB1/DQB1/DQA1* was performed by RT-PCR with sequence-specific primers (SSO). The incidence of the TNFα gene G–238A and G–308A polymorphic variants was determined by RT-PCR analysis. Results: We detected the TNFα gene allele 308A in 11% of HIV-infected patients, whereas in the control group this allele was detected in only 4% of individuals. Although the incidence of the TNFα gene –238A allele was twice as high in the control group (6%) as in the HIV-infected patients (3%), the difference was not statistically significant (p = 0.253). The incidence analysis of the three-locus haplotypes DRB1-DQB1-DQA1 in combination with TNFα positions –238A/G and –308A/G showed that haplotypes 01:01/05:01/01:01-TNFα-238(GA)/308(GG) and 01:01/03:02/03:01-TNFα-238(AA)/308(GG) are more frequent in the control group than in the groups of infected patients. This means that these haplotypes have a protective function, which significantly affects the progress of infection. The association of the 15:01/05:01/01:01-TNFα-238(GG)/308(GG) and 03:01/05:01/01:01-TNFα-238(GG)/308(GA) genotypes indicates a high risk of developing a fulminant infection. The genetic factors in the development of the AIDS-related complex of syndromes are associated not only with the HLA complex class II alleles, but also with SNP polymorphisms in the promoter region of cytokine genes.

Publications

We work on some of the most complex challenges in AI.
