Time series sensor data abound in many real-world settings including human
activity recognition [1], sleep stage classification [43], gesture
recognition [26], speech recognition [17], and diagnosis and mortality
prediction from medical data [32]. Labeling data in these situations is
expensive and sometimes infeasible. One way to reduce labeling effort is
to design unsupervised domain adaptation techniques that leverage the labeled
data from one or more source domains and unlabeled data from a new target
domain to build a classifier for the target domain [12, 41].
While unsupervised domain adaptation methods have been designed
for image data, very limited work has focused on adaptation approaches
for time series data [40]. A few time series methods have been introduced.
However, these prior approaches utilize recurrent neural networks (RNNs)
that can be very slow to train for reasonable-sized time series arising
in real-world problems. While researchers have found that convolutional
neural networks (CNNs) can achieve the same accuracy as RNNs while being
trained and evaluated much faster [2, 28], previously proposed domain adaptation
network architectures are incompatible with time series data.
In this paper, we propose a new model: Convolutional deep Domain
Adaptation model for Time Series data (CoDATS). CoDATS couples
technical principles from domain-invariant domain adaptation with a
network design that is more efficient, accurate, time-series compatible,
and extensible than prior work. The CoDATS architecture exhibits three
important features. First, it leverages existing domain-invariant domain
adaptation methods to operate on time series data. Second, it outperforms
existing singlesource time series adaptation models. Third, it is readily
extensible to additional situations including when data from multiple source
domains is available, which is particularly helpful for complex time series
datasets having high variability between domains, and when the target-domain
label distribution is available, which may be easier to collect than
additional time series data labels.

While utilizing unlabeled target-domain
data in unsupervised domain adaptation is one way of reducing labeling effort,
another way is to use weakly-supervised information that is relatively
easy to acquire. Obtaining labels for time series sensor data is more
challenging than for image data. For example, cat vs. dog image classification
labels can be easily obtained after data collection by having a person look
at each image and determine if the image is of a cat or a dog. In contrast,
for time series human activity recognition, it is much more difficult to
identify what activity a human was performing by looking at raw accelerometer,
gyroscope, and magnetometer sensor data. Thus, labels for time series sensor
data are instead typically recorded while performing the activity [10, 29],
which greatly limits the number of gathered labels because of the additional
burden from interrupting a person’s activities. However, other possibilities
exist for obtaining information in the form of weak supervision. For activity recognition,
it may be easy for each participant to self-report what proportion of the time they
performed each activity. For example, providing an estimate of how many hours a day
they spend cooking is easier than labeling each data instance of cooking. We formulate
this new problem setting of Domain Adaptation with Weak Supervision (DA-WS) and develop a
novel method to effectively utilize weak supervision in the form of label proportions.
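
As a rough illustration of how label proportions could serve as weak supervision, the sketch below penalizes a classifier when its average predicted class distribution on unlabeled target data drifts away from the self-reported proportions. The KL-divergence form, the function names, and the toy numbers are assumptions made for illustration only; this is not necessarily the loss used by DA-WS.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def label_proportion_penalty(target_logits, known_proportions, eps=1e-8):
    """KL(known || predicted) on a batch of unlabeled target examples.

    Assumed form of a label-proportion constraint (hypothetical helper):
    the predicted proportions are the batch mean of the class probabilities.
    """
    probs = softmax(np.asarray(target_logits, dtype=float))
    predicted = probs.mean(axis=0)  # estimated class proportions
    known = np.asarray(known_proportions, dtype=float)
    return float(np.sum(known * (np.log(known + eps) - np.log(predicted + eps))))

# Hypothetical self-reported proportions for three activities.
known = np.array([0.7, 0.2, 0.1])

# A model whose predictions already match the proportions ...
matched_logits = np.log(np.tile(known, (64, 1)))
# ... versus a model that predicts every class with equal probability.
uniform_logits = np.zeros((64, 3))

penalty_matched = label_proportion_penalty(matched_logits, known)
penalty_uniform = label_proportion_penalty(uniform_logits, known)
```

In a training loop, a term of this kind would be added to the other losses with some weight, so that parameter settings whose target-domain predictions disagree with the reported proportions are discouraged. Here the matched predictions give a penalty near zero, while the uniform predictions give a larger one.
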
The key idea is to constrain the space of model parameters to those that approximately
match the label proportions on unlabeled data from the target domain.

To validate our proposed CoDATS model and weak-supervision method, we performed comprehensive
experiments on diverse real-world time series benchmarks including gesture recognition and human
activity recognition. We compare CoDATS with prior single-source time series methods
and observe that CoDATS dramatically outperforms previous approaches to time series
domain adaptation. Additionally, we demonstrate how CoDATS can further improve accuracy
by utilizing data from multiple sources. We also find that coupled with our proposed CoDATS model,
our DA-WS method yields additional improvements in accuracy.

Contributions. We make three key contributions, as summarized in Figure 1.
1) We develop a new time-series compatible model, referred to as CoDATS, that improves both
accuracy and computational efficiency compared to prior work on single-source DA. CoDATS
also supports utilizing data from multiple sources to further improve accuracy.
2) We formulate a new weak supervision problem to leverage target-domain label proportions
when available and propose a novel method, referred to as DA-WS, to effectively solve it.
3) We perform a comprehensive experimental evaluation on
multiple challenging real-world benchmarks to show the efficacy of our CoDATS model
and weak-supervision method over state-of-the-ar
