DataConf is a casual data science conference by DataHack.
This is a list of agendas and resources of past and upcoming iterations of the conference.
You can find us on our website, Facebook, Meetup and Twitter, or join our monthly newsletter.
Agendas:
- DataConf 2016
- Prof. Shai Shalev Shwartz, The Hebrew University of Jerusalem - Reinforcement Learning
- Shahar Weinstock, Intel - Customers discovery using machine learning
- Adi Nesher, PayPal - Defining the right label: How to create a valuable population tag in situations of uncertainty
- Assaf Feldman, CTO, Riskified - Feature engineering at scale
- Olha Shainoha, Wix - Site topic classification at Wix
- Amalia Bryl & Shahar Wilner, EDvantage - Can machine learning empower education?
- Dr. Yoram Gdalyahu, VP algorithms, Mobileye - Autonomous Driving and Crowd Mapping
- Guy Adini, Istra - Toy Models for Algorithmic Trading
- DataConf 2017
- Yakov Shambik, Mobileye - Eye of the Beholder: Object detection in Mobileye using Deep Neural Networks and other techniques
- Ofer Ron, LivePerson - Concepts before machinery: Harnessing the power of domain expertise for machine-learning-based solutions
- Alex Ran, Intuit - Using Data Science for Automated Accounting
- Meir Maor, SparkBeyond - Developing Simple and Stable Machine Learning Models
- Roii Spoliansky, PayPal - Active learning optimization as a function of label cost and mistake cost
- Gil Chamiel, Taboola - Don’t believe everything your network tells you: Uncertainty in deep learning for recommender systems
- Adina Lederhendler, Neura - General vs. subpopulation-specific modeling: When and why you need to get specific
- Yonatan Wexler, Orcam - Fast and Furious Face Recognition: Efficient metric learning for video stream data
- Itamar Ben-Ari, Research Scientist @ Intel - Differentiable Memory Allocation Mechanism For Neural Computing
- Dr. Oshri, Rafael - Multi-agent deep reinforcement learning in communication networks
- DataConf 2018
- Dana Kaner, Perimeter X - Bootstrap, Random Forest and All Sorts of Magic
- Pavel Levin, Booking.com - Where should I travel next? Modeling multi-destination trips with Recurrent Neural Networks.
- Ari Bornstien, Microsoft - Beyond Word Embeddings
- Dr. Michal Shmueli-Scheuer, IBM Research - Conversational bots for customer support
- Nofar Betzalel, PayPal - Semi-Supervised Learning for Tagging Coverage Extension
- Dr. Lev Faivishevsky, Intel Advanced Analytics - Using Deep Learning to Detect Video Distortions
- Prof. Danny Pfeffermann, Central Bureau for Statistics - Can Big Data Really Replace Traditional Surveys for the Production of Official Statistics?
- Avi Hendler-Bloom, Mobileye - Overcoming the Electronic Traffic Sign Problem
- Daniel Benzaquen, Lightricks - AB testing at Scale
- Gil Chamiel, Taboola - Deep And Shallow Learning in Recommendation Systems
- Oren Shamir, Innoviz Technologies - Neural networks for point clouds: adding the 3rd dimension
Facebook event page: https://www.facebook.com/events/555816034629213
Meetup event page: https://www.meetup.com/preview/DataHack/events/234096133
Adi Nesher, PayPal - Defining the right label: How to create a valuable population tag in situations of uncertainty
Speaker: Adi Nesher, PayPal
Title: Defining the right label: How to create a valuable population tag in situations of uncertainty
Speakers: Amalia Bryl & Shahar Wilner, EDvantage
Title: Can machine learning empower education?
Held on Thursday, October 26th, between 09:00 and 18:00, DataConf 2017 drew a crowd of over 100 data science and machine learning experts from the top companies in Israel for a day of knowledge sharing.
Event website: http://dataconf.org/
Meetup event page: https://www.meetup.com/DataHack/events/244004618/
Facebook event page: https://www.facebook.com/events/1623405514382356/
Yakov Shambik, Mobileye - Eye of the Beholder: Object detection in Mobileye using Deep Neural Networks and other techniques
Speaker: Yakov Shambik, Vehicles Detection Technology Manager @ Mobileye
Title: Eye of the Beholder: Object detection in Mobileye using Deep Neural Networks and other techniques
Ofer Ron, LivePerson - Concepts before machinery: Harnessing the power of domain expertise for machine-learning-based solutions
Speaker: Ofer Ron, Head of Data Science @ LivePerson
Title: Concepts before machinery: Harnessing the power of domain expertise for machine-learning-based solutions
Video: https://www.youtube.com/watch?v=wR2u7V8D5Y8&list=PLZYkt7161wELbPfqY92vAEmKVhsyxg5Nk&index=3
Speaker: Alex Ran, Distinguished Engineer @ Intuit
Title: Using Data Science for Automated Accounting
Video: https://www.youtube.com/watch?v=_ZBos8T35D0&list=PLZYkt7161wELbPfqY92vAEmKVhsyxg5Nk&index=2
Slides: https://github.com/DataHackIL/DataConf/blob/master/DataConf_2017/DataConf_2017_Intuit_Alex_Ran.pdf
Speaker: Meir Maor, Chief Architect @ SparkBeyond
Title: Developing Simple and Stable Machine Learning Models
Speaker: Roii Spoliansky, Lead Data Scientist @ PayPal
Title: Active learning optimization as a function of label cost and mistake cost
Gil Chamiel, Taboola - Don’t believe everything your network tells you: Uncertainty in deep learning for recommender systems
Speaker: Gil Chamiel, Director of Data Science and Algorithms @ Taboola
Title: Don’t believe everything your network tells you: Uncertainty in deep learning for recommender systems
Adina Lederhendler, Neura - General vs. subpopulation-specific modeling: When and why you need to get specific
Speaker: Adina Lederhendler, Senior Data Scientist @ Neura
Title: General vs. subpopulation-specific modeling: When and why you need to get specific
Video: https://www.youtube.com/watch?v=ft36Tq5FUz0&list=PLZYkt7161wELbPfqY92vAEmKVhsyxg5Nk&index=4
Yonatan Wexler, Orcam - Fast and Furious Face Recognition: Efficient metric learning for video stream data
Speaker: Yonatan Wexler, VP R&D @ Orcam
Title: Fast and Furious Face Recognition: Efficient metric learning for video stream data
Itamar Ben-Ari, Research Scientist @ Intel - Differentiable Memory Allocation Mechanism For Neural Computing
Speaker: Itamar Ben-Ari, Research Scientist @ Intel
Title: Differentiable Memory Allocation Mechanism For Neural Computing
Video: https://www.youtube.com/watch?v=DAHTNElXXgk&list=PLZYkt7161wELbPfqY92vAEmKVhsyxg5Nk&index=4
Speaker: Dr. Oshri, Senior Research Scientist @ Rafael
Title: Multi-agent deep reinforcement learning in communication networks
Slides: https://github.com/DataHackIL/DataConf/blob/master/DataConf_2017/DataConf_2017_Rafael.pdf
Held on Thursday, October 4th, between 09:00 and 18:00, DataConf 2018 drew a crowd of over 100 data science and machine learning experts from the top companies in Israel for a day of knowledge sharing.
YouTube playlist: https://www.youtube.com/playlist?list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53
Event website: http://dataconf.org/
Meetup event page: https://www.meetup.com/DataHack/events/255082526/
Facebook event page: https://www.facebook.com/events/1967922793269453/
Speaker: Dana Kaner, Data Scientist @ Perimeter X
Title: Bootstrap, Random Forest and All Sorts of Magic
Video: https://www.youtube.com/watch?v=ynkJVd6B13U
Abstract: The Bootstrap resampling method is often used for statistical inference. We demonstrate its power and simplicity through the well known Random Forest algorithm. We present both the theoretical background on the above topics and an implementation in R.
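As a rough illustration of the resampling idea in the abstract above, here is a minimal sketch of a percentile bootstrap confidence interval. It is written in Python with only the standard library, not the R implementation presented in the talk, and the function names, the sample data and the default of 2000 resamples are all assumptions made for this example. Random Forest applies the same draw-with-replacement step to build the training set of each tree.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration: recompute the statistic on many
    resampled copies of the data and read the interval off the sorted results.
    """
    rng = random.Random(seed)
    estimates = sorted(
        stat(bootstrap_sample(data, rng)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up measurements, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]
low, high = bootstrap_ci(data)
print(f"mean = {statistics.mean(data):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

The interval needs no distributional formula for the statistic, which is why the same recipe works for medians, quantiles or model scores by passing a different `stat`.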
Pavel Levin, Booking.com - Where should I travel next? Modeling multi-destination trips with Recurrent Neural Networks.
Speaker: Pavel Levin, Senior Data Scientist @ Booking.com
Title: Where should I travel next? Modeling multi-destination trips with Recurrent Neural Networks.
Video: https://www.youtube.com/watch?v=pwfwUA4ZShI&t=0s&index=5&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53
Abstract: Many real-world problems naturally give rise to sequential data. Language models are already widely used to tackle computational problems related to natural language. We would like to present a non-NLP example by walking through a solution to the problem of recommending next destinations to customers who are taking a single trip to multiple cities using RNN-based sequence modeling.
Speaker: Ari Bornstien, Senior Cloud Developer Advocate @ Microsoft
Title: Beyond Word Embeddings
Video: https://www.youtube.com/watch?v=zeYwMIDo05w&t=0s&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53&index=6
Abstract: Since the advent of word2vec, word embeddings have become a go-to method for encapsulating distributional semantics in NLP applications. This presentation will review the strengths and weaknesses of using pre-trained word embeddings, and demonstrate how to incorporate more complex semantic representation schemes such as Semantic Role Labeling, Abstract Meaning Representation and Semantic Dependency Parsing into your applications.
Speaker: Dr. Michal Shmueli-Scheuer, Researcher @ IBM Research
Title: Conversational Bots for Customer Support
Video: https://www.youtube.com/watch?v=i567nLfEGYs&t=0s&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53&index=9
Abstract: In this talk, I'll cover various aspects of conversational bots, focusing on the domain of customer support. Often, human conversations with bots mimic the way humans interact with each other. Moreover, even when customers know that they are interacting with virtual agents (bots), they still expect them to behave like humans. One way to improve interactions with bots is by giving them some human characteristics, such as emotion and personality. I'll show how a model of neural response generation can be used to generate bot responses according to a target personality. I'll then cover a methodology for detecting egregious conversations in a setting using conversational bots by examining behavioral cues from the customer, patterns in the agents’ responses, and customer-agent interactions.
Speaker: Nofar Betzalel, Data Scientist @ PayPal
Title: Semi-Supervised Learning – to extend our Tagging Coverage
Video: https://www.youtube.com/watch?v=c4-3697xwys&index=7&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53&t=0s
Abstract: When PayPal's risk decision-making processes approve a transaction, we soon know whether it was the right decision. However, for declined transactions this is not the case, as our tagging coverage is not complete. This makes it more challenging for analysts and data scientists to understand our false positives when performing research and when measuring our decision-making processes. In this talk I will discuss how we use semi-supervised learning to tag declined transactions as ones that would or would not have been fraudulent had they been approved. This approach enables us to utilize both tagged and non-tagged transactions to train a model for this task.
Speaker: Dr. Lev Faivishevsky, Researcher @ Intel Advanced Analytics
Title: Using Deep Learning to Detect Video Distortions
Video: https://www.youtube.com/watch?v=FhMWZgs0kJ8&t=0s&index=8&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53
Abstract: Since the acquisition of Mobileye, it has become common knowledge that Intel is interested in building AI-based products and producing hardware for AI applications. A less widely known role of AI at Intel is internal: using the huge and diverse data related to Intel's own operations to transform the way the company works and create substantial value. Processor design, manufacturing and sales are leveraging machine-learning methods, including computer vision, natural language processing and reinforcement learning techniques. The talk will start with a little background about these applications, and focus on one deep-learning-based video analytics solution used in the context of processor validation. We will describe this non-standard use case and the challenges in resolving it, most of which are also relevant for other use cases in the domain, including handling scarcity of labeled data and coping with tight requirements in terms of both accuracy and run-time.
Prof. Danny Pfeffermann, Central Bureau for Statistics - Can Big Data Really Replace Traditional Surveys for the Production of Official Statistics?
Speaker: Prof. Danny Pfeffermann, National Statistician of Israel @ Central Bureau for Statistics
Title: Can Big Data Really Replace Traditional Surveys for the Production of Official Statistics?
Video: https://www.youtube.com/watch?v=OcD20PkNj-w&t=0s&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53&index=10
Abstract: The big advancements in technology, which make it possible to access and analyse 'big data', coupled with increased demand for more accurate, more detailed and more timely official data, but with tightened available budgets, put inevitable pressure on producers of official statistics to replace traditional sample surveys with big data sources. In the first part of my presentation I shall discuss some of the major challenges in the use of big data for official statistics, pointing out their advantages and limitations. In the second part I shall consider a general class of statistical models, which can possibly link the big data under consideration to the corresponding target, finite population data. The use of a model in the class may allow estimating finite population parameters, without the need for reference samples or administrative files.
Speaker: Avi Hendler-Bloom, Algorithms Developer @ Mobileye
Title: Overcoming the Electronic Traffic Sign Problem
Video: https://www.youtube.com/watch?v=QN9gfUZUqDU
Abstract: Electronic traffic signs are commonly made with LEDs. Due to the differences in frequency and phase between each LED light, classifying this type of sign is challenging. This talk will address the issues faced, and introduce a solution.
Speaker: Gil Chamiel, Director of Algorithms and Data Science @ Taboola
Title: Deep And Shallow Learning in Recommendation Systems
Video: https://www.youtube.com/watch?v=nghXG5OiUno&index=12&t=0s&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53
Abstract: Deep learning has been gaining increasing attention in the recommendation systems community, replacing some of the traditional methods. In this talk, we will share some lessons we learned from using deep learning at huge scale in Taboola's recommendation system. Specifically, we will talk about the motivation for using deep learning and the tradeoffs between deep models and simpler models. We will discuss our approach to building neural networks with multiple input types (numerical, categorical, text, and images); capturing non-trivial interactions between features using both deep dense architectures and Factorization Machine models; tradeoffs between memorization and generalization; and other tips regarding network architectures.
Speaker: Daniel Benzaquen, Data Scientist @ Lightricks
Title: AB testing at Scale
Video: https://www.youtube.com/watch?v=-k1X2MRgGlY
Abstract: A/B testing is a central statistical procedure used frequently by data scientists. Unfortunately, the standard A/B testing framework was originally designed to cope with a handful of tests, while these days, conducting tens and even hundreds of tests simultaneously is a common scenario.
Directly applying the standard procedure, however, is highly problematic, as many tests imply many false discoveries that potentially lead to sub-optimal performance. With the goal of controlling the false-discovery rate, several procedures were designed: probably the most naive one is the Bonferroni correction; more advanced schemes are Fisher's least-significant-difference, Benjamini-Hochberg, etc. Yet utilizing these schemes comes at the price of a high false-negative rate that scales with the number of tests being conducted.
In this talk we discuss our attempt to bypass these challenges by utilizing a Bayesian Multi-Armed-Bandit approach, namely Thompson Sampling (TS), which operates in an online-learning manner. We share our experience and insights based on simulations and real-life experiments.
Finally, we discuss some generalizations of the standard TS scheme we made that allow us to optimize over (non-trivial) statistical quantities (i.e., not necessarily the conversion rate or click-through rate, which are of obvious interest, but also users' Life-Time Value (LTV), etc.).
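For readers unfamiliar with the Thompson Sampling approach named in the abstract above, the following is a minimal textbook sketch for Bernoulli rewards with Beta(1, 1) priors. It is not the speaker's implementation or the generalized scheme described in the talk; the function name, the simulated conversion rates and the number of rounds are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling over several variants.

    true_rates holds the (hypothetical, normally unknown) conversion rate of
    each variant. Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible rate per variant from its Beta posterior.
        draws = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(n_arms)
        ]
        # Show the variant whose sampled rate is highest.
        arm = max(range(n_arms), key=draws.__getitem__)
        # Observe a simulated conversion and update that variant's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


# Three simulated variants with made-up conversion rates.
pulls = thompson_sampling([0.05, 0.10, 0.20])
print(pulls)
```

Because traffic shifts toward the better variants as evidence accumulates, the procedure learns online and avoids committing a fixed sample size to every variant up front.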
Speaker: Oren Shamir, Head of CV Algorithm Development @ Innoviz Technologies
Title: Neural networks for point clouds: Adding the 3rd Dimension
Video: https://www.youtube.com/watch?v=aE3mfLm5dMA&t=0s&list=PLZYkt7161wEIjQOuWA93Tt4JS8DgCyz53&index=11
Abstract: Since AlexNet, DNNs have been used with rapidly increasing success to perform a wide variety of tasks on 2D images. This is the result of increased data availability, increased effective processing power, as well as incremental algorithmic improvements. Today, DNNs achieve super-human results on multiple tasks in the 2D data domain.
Processing of 3D data using DNNs has been studied less during that time. 3D sensors are less abundant, and are more variable in their capabilities and properties. In the past few years various methods for processing of 3D data have emerged, driven mainly by the medical imaging industry and, more recently, the autonomous car industry. 3D data may be unstructured, sparse and irregular, yielding unique challenges relative to 2D image data.
In this talk I will discuss the challenges of working with 3D data, and present an overview of approaches towards 3D data processing in DNNs.