Łukasz Augustyniak

Data Scientist / Machine Learning Engineer / AI Consultant

Department of Computational Intelligence, Wroclaw University of Science and Technology

Biography

Łukasz is a Data Scientist / AI Consultant with 8+ years of experience in various ML projects (social media monitoring, analysis of call center transcripts, recommendation engines, information extraction from texts, legal text analysis, and many more).

He is an award-winning Ph.D. student at the Wroclaw University of Science and Technology, working on artificial intelligence methods for analyzing natural language, especially sentiment analysis and generating abstractive summaries of text collections in many languages, including English and Polish.

He lectures at scientific and business conferences on artificial intelligence and has given talks worldwide: Poland, United States, Italy, Great Britain, Belgium, China, Vietnam, Singapore. He has actively participated in many national and international scientific projects (ENGINE, RENOIR, PL-Grid, TRANSFORM), cooperating with research units such as Stanford University (USA), Rensselaer Polytechnic Institute (USA), Notre Dame University (USA), Johns Hopkins University (USA), Nanyang Technological University (Singapore), Universidad Carlos III de Madrid (Spain), and Jozef Stefan Institute (Slovenia).

He is a co-author of many scientific publications in sentiment analysis, information retrieval, and spoken language understanding, including those published in prestigious international journals such as Entropy and Neurocomputing. He is a laureate of the governmental TOP 500 Innovators program, through which he gained knowledge of the commercialization of scientific research at Cambridge and Oxford Universities. In parallel with his computer science studies, he earned a master's degree in law at the University of Wroclaw.

He received an MSc in Computer Science with distinction from the Wroclaw University of Technology in 2013. He also received an MA in Law from Wroclaw University in 2014, and he remains actively interested in the legal aspects of IT and the analysis of legal documents using NLP.

He has implemented business projects such as social media monitoring (Brand24), analysis of call center transcripts (AVAYA, Spoken Communications), recommendation engines for marketing actions (8thlab, startup), text extraction (MeaningCloud), data analysis in the automotive sector (Edvantis), and many others.

For several years he has also been a consultant on issues related to machine learning and Data Science.

Interests

Sentiment Analysis
Information Extraction from Texts
Legal Text Analysis
Social Media Monitoring
ASR Transcriptions Analysis
Language Modeling
Recommendation Engines
IP Law

Education

PhD in Computer Science, Artificial Intelligence
Wroclaw University of Science and Technology
Science - Management - Commercialization, 2015
Cambridge University, UK
Master in Law, 2014
Wroclaw University
MSc in Computer Science (excellent grade), 2013
Wroclaw University of Science and Technology

Skills

AI Solutions Due Diligence

AI Architecture Planning

Conversational AI

Sentiment Analysis

Information Extraction

Python

Legal NLP

IP Law

Recommendation Engines

Social Media Analytics

Spoken Language Understanding

ML Teams Hiring

Experience

Machine Learning Lead

AVAYA

Sep 2022 – Present Remote

Developed and implemented AI and Data strategies to shape the vision and roadmap of Avaya's AI and Data capabilities. Established a data-driven culture in the ACI team by promoting data-driven decision-making and product development and by showcasing data-driven customer stories.

Dialog Systems Lead in Clarin-PL-Biz Project

Wroclaw University of Science and Technology

Sep 2020 – Present Remote

I am proud to have been part of the CLARIN-PL-Biz team, responsible for building chatbot engines for the Polish language and creating a Polish NLP leaderboard and benchmarking system. We worked to develop innovative solutions that enabled us to analyze and process natural language data with greater efficiency, accuracy, and scalability.

Head of Data Science

Edvantis

May 2019 – Aug 2022 Remote

Data Science for Edvantis clients: what they do, how they do it, and how Edvantis can keep improving their products with Data Science.

defining data science PoCs, roadmaps, pipelines, productization steps, and data strategies,
creating new data science capabilities for the business by envisioning and executing strategies that improve business performance through informed decision-making,
building and leading a collaborative ML/DS team,
developing strategy and methods to ensure data collection, data quality, and data annotation,
presenting the results of Data Science processes and dealing directly with C-level stakeholders.

Visiting Researcher

Slovenska tiskovna agencija (STA)

Jul 2018 – Sep 2018 Ljubljana

Machine Learning Engineer (Contract)

AVAYA

Jun 2017 – Aug 2022 Remote

Spoken Communications was acquired by AVAYA.

Design and implementation of AI-enabled solutions that reduce transaction times, improve agent productivity, and increase customer satisfaction. Implementation of state-of-the-art models for spoken language understanding.

Machine Learning Engineer (Contract)

Spoken Communications

Jun 2017 – Jan 2018 Poznań / Remote

Visiting Researcher

Nanyang Technological University

Jun 2017 – Aug 2017 Singapore

Co-founder & CEO

8th lab sp. z o.o.

Sep 2016 – Aug 2018 Wrocław

8th lab created a predictive analytics engine that enabled retail shops and petrol stations to increase sales using hyper-personalized offers and targeted advertisements.

Visiting Researcher

Jozef Stefan Institute

Jun 2016 – Jul 2016 Ljubljana

Visiting Researcher

Universidad Carlos III de Madrid

Apr 2015 – May 2015 Madrid

Natural Language Processing Engineer

MeaningCloud (Internship)

Mar 2015 – May 2015 Madrid

Research Assistant / Data Scientist

Wrocław University of Science and Technology

Aug 2013 – Sep 2019 Wrocław

Design and implementation of state-of-the-art models for machine learning problems, mainly but not only in the NLP/NLU area.

I work mostly in Python and, when needed, in Spark.

I consulted on several ML-related projects and technologies through Wrocław University of Science and Technology for startups, banks, venture capitalists, manufacturing companies, and many more.

Natural Language Processing Engineer

BRAND24

May 2013 – Jun 2013 Wrocław

Projects

AI/Data Science Consultancy

As an AI researcher, architect, and consultant, I am responsible for developing and executing AI solutions for financial, automotive, social media, government, and legal companies. With my extensive experience in consulting and due diligence of AI solutions, I have helped software houses build successful ML teams and create data science proofs of concept, roadmaps, pipelines, and productization steps.

Construction Project Health

FONN's data-centric journey to create Artificial Intelligence for Construction Project Health

IoT Signal Analysis for Agricultural Machinery

Data gathering and signal analysis of milling machines

Creating a Machine Learning-Based Sales Lead Generation System

A recommendation engine for sales leads that recommends across almost all companies globally (10M+ companies).

Vehicle information analysis

Exploration and analysis of vehicle data to improve the process of creating Periodic Technical Inspections.

Avaya Conversational Intelligence

Call center analytics

Featured Publications

Lukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko

March 2021 Computer Speech & Language

Comprehensive analysis of aspect term extraction methods using various text embeddings

Recently, a variety of model designs and methods have blossomed in the context of the sentiment analysis domain. However, there is still a lack of wide and comprehensive studies of aspect-based sentiment analysis (ABSA). We want to fill this gap and propose a comparison with ablation analysis of aspect term extraction using various text embedding methods. We particularly focused on architectures based on long short-term memory (LSTM) with optional conditional random field (CRF) enhancement using different pre-trained word embeddings. Moreover, we analyzed the influence on performance of extending the word vectorization step with character embedding. The experimental results on SemEval datasets revealed that not only does bi-directional long short-term memory (BiLSTM) outperform regular LSTM, but also word embedding coverage and its source highly affect aspect detection performance. An additional CRF layer consistently improves the results as well.

Lukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz, Michał Bernaczyk

July 2020 ACL 2020, WiNLP, Proceedings of the Fourth Widening Natural Language Processing Workshop

Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections

Political campaigns are full of political ads posted by candidates on social media. Political advertisements constitute a basic form of campaigning, subjected to various social requirements. We present the first publicly open dataset for detecting specific text chunks and categories of political advertising in the Polish language. It contains 1,705 human-annotated tweets tagged with nine categories, which constitute campaigning under Polish electoral law. We achieved a 0.65 inter-annotator agreement (Cohen's kappa score). An additional annotator resolved the mismatches between the first two annotators, improving the consistency and complexity of the annotation process. We used the newly created dataset to train a well-established neural tagger (achieving a 70% F1 score). We also present a possible direction of use cases for such datasets and models with an initial analysis of the Polish 2020 Presidential Elections on Twitter.

Lukasz Augustyniak, Piotr Szymański, Mikolaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak

January 2020 Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models, turning punctuation restoration into a challenging task. These errors usually take the form of homonyms. We show how retrofitting of the word embeddings on the domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homonym embeddings and the validation of the presented method on the punctuation prediction task. We record an absolute improvement in punctuation prediction accuracy ranging from 6.2% (for question marks) to 9% (for periods) when compared with the state-of-the-art model.

Piotr Szymański, Piotr Zelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Zyla-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, Yishay Carmiel

January 2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event, 16-20 November 2020

WER we are and WER we think we are

Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and HUB'05 public benchmark. We show that WERs are significantly higher than the best reported results. We formulate a set of guidelines which may aid in the creation of real-life, multi-domain datasets with high quality annotations for training and testing of robust ASR systems.

Roman Bartusiak, Łukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko, Maciej Piasecki

January 2019 Neurocomputing

WordNet2Vec: Corpora agnostic word vectorization method

The complex nature of big data resources requires new structuring methods, especially for textual content. WordNet is a good knowledge source for the comprehensive abstraction of natural language as it offers good implementation for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism, WordNet2Vec, is proposed in this paper. This creates vectors for each word from WordNet. These vectors encapsulate a general position — the role of a given word related to all other words in the given natural language. Any list or set of such vectors contains knowledge about the context of its components within the whole language. This type of word representation can be easily applied to many analytic tasks such as classification or clustering. The usefulness of the WordNet2Vec method is demonstrated in sentiment analysis including the classification of an Amazon opinion text dataset with transfer learning.

Lukasz Augustyniak, Piotr Szymański, Tomasz Kajdanowicz, Wlodzimierz Tuliglowicz

January 2016 Entropy

Comprehensive study on lexicon-based ensemble classification sentiment analysis

We propose a novel method for counting sentiment orientation that outperforms supervised learning approaches in time and memory complexity and is not statistically significantly different from them in accuracy. Our method consists of a novel approach to generating unigram, bigram and trigram lexicons. The proposed method, called frequentiment, is based on calculating the frequency of features (words) in the document and averaging their impact on the sentiment score as opposed to documents that do not contain these features. Afterwards, we use ensemble classification to improve the overall accuracy of the method. What is important is that the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3-5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles with lexicon’s predictions as input and supervised learners) applied to 10 Amazon review data sets and provide the first statistical comparison of the sentiment annotation methods that include ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.

Recent Publications

Lukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko (2021). Comprehensive analysis of aspect term extraction methods using various text embeddings. Computer Speech & Language.

Lukasz Augustyniak, Tomasz Kajdanowicz, Przemyslaw Kazienko (2020). Graph-based approach to Unsupervised Aspect Hierarchies Extraction and Sentimental Summarization. in review.

Lukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz, Michał Bernaczyk (2020). Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections. ACL 2020, WiNLP, Proceedings of the Fourth Widening Natural Language Processing Workshop.

Lukasz Augustyniak, Piotr Szymański, Mikolaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak (2020). Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?. Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020.

Piotr Szymański, Piotr Zelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Zyla-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, Yishay Carmiel (2020). WER we are and WER we think we are. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event, 16-20 November 2020.

Lukasz Augustyniak, Tomasz Kajdanowicz, Przemyslaw Kazienko (2019). Aspect Detection using Word and Char Embeddings with (Bi) LSTM and CRF. 2nd IEEE International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2019, Sardinia, Italy, June 3-5, 2019.

Jan Mizgajski, Adrian Szymczak, Robert Glowski, Piotr Szymański, Piotr Zelasko, Lukasz Augustyniak, Mikolaj Morzy, Yishay Carmiel, Jeff Hodson, Lukasz Wójciak, Daniel Smoczyk, Adam Wróbel, Bartosz Borowik, Adam Artajew, Marcin Baran, Cezary Kwiatkowski, Marzena Zyla-Hoppe (2019). Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations. Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019.

P. Zelasko, M. Morzy, P. Szymański, J. Mizgajski, A. Szymczak, Ł. Augustyniak, Y. Carmiel (2019). Towards better understanding of spontaneous conversations: Overcoming automatic speech recognition errors with intent recognition. arXiv.

Roman Bartusiak, Łukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko, Maciej Piasecki (2019). WordNet2Vec: Corpora agnostic word vectorization method. Neurocomputing.

Łukasz Augustyniak, Tomasz Kajdanowicz, Przemyslaw Kazienko (2018). Extracting Aspects Hierarchies Using Rhetorical Structure Theory. Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence.

Lukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz (2017). Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis. Intelligent Information and Database Systems - 9th Asian Conference, ACIIDS 2017, Kanazawa, Japan, April 3-5, 2017, Proceedings, Part I.

Lukasz Augustyniak, Piotr Szymański, Tomasz Kajdanowicz, Wlodzimierz Tuliglowicz (2016). Comprehensive study on lexicon-based ensemble classification sentiment analysis. Entropy.

Roman Bartusiak, Lukasz Augustyniak, Tomasz Kajdanowicz, Przemyslaw Kazienko (2015). Sentiment Analysis for Polish Using Transfer Learning Approach. Proceedings - 2nd European Network Intelligence Conference, ENIC 2015.

Lukasz Augustyniak, Tomasz Kajdanowicz, Przemyslaw Kazienko, Marcin Kulisiewicz, Wlodzimierz Tuliglowicz (2014). An approach to sentiment analysis of movie reviews: Lexicon based vs. classification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).

Andrzej Misiaszek, Przemysław Kazienko, Marcin Kulisiewicz, Łukasz Augustyniak, Włodzimierz Tuligłowicz, Adrian Popiel, Tomasz Kajdanowicz (2014). Belief propagation method for word sentiment in wordnet 3.0. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).

Contact

Staszica 12, Jelenia Góra, 58-560
