I am trying to down-sample a time series in Pandas from 8 seconds to 10 seconds. For the purposes of this example, I've generated fake data that linearly increases with the number of seconds, over a minute. Importantly, for this example, the time intervals of the two time series are not multiples of each other.
When using .resample().interpolate() in Pandas, it seems unable to interpolate for the first few points, for which there is sufficient data. How can I work around it? Here's the example:
import numpy as np
import pandas as pd
import datetime
a = datetime.datetime(2025, 12, 2, 17, 39, 6)
interval8df = pd.DataFrame(np.linspace(60, 124, 9), columns=['Hi'], index=pd.date_range(a, periods=9, freq='8s'))
interval8df['Hi']
2025-12-02 17:39:06 60.0
2025-12-02 17:39:14 68.0
2025-12-02 17:39:22 76.0
2025-12-02 17:39:30 84.0
2025-12-02 17:39:38 92.0
2025-12-02 17:39:46 100.0
2025-12-02 17:39:54 108.0
2025-12-02 17:40:02 116.0
2025-12-02 17:40:10 124.0
Freq: 8s, Name: Hi, dtype: float64
When using resample interpolate, this is the result:
interval8df.resample('10s').interpolate(method='time')['Hi']
2025-12-02 17:39:00 NaN
2025-12-02 17:39:10 NaN
2025-12-02 17:39:20 NaN
2025-12-02 17:39:30 84.0
2025-12-02 17:39:40 94.0
2025-12-02 17:39:50 104.0
2025-12-02 17:40:00 114.0
2025-12-02 17:40:10 124.0
Freq: 10s, Name: Hi, dtype: float64
While I can understand 17:39:00 being NaN, 17:39:10 and 17:39:20 are each surrounded by points in the original time series (6 and 14 seconds, then 14 and 22 seconds, respectively). Why does this happen?
I've tried using mean, but that produced no NaNs.
interval8df.resample('10s').mean()['Hi']
2025-12-02 17:39:00 60.0
2025-12-02 17:39:10 68.0
2025-12-02 17:39:20 76.0
2025-12-02 17:39:30 88.0
2025-12-02 17:39:40 100.0
2025-12-02 17:39:50 108.0
2025-12-02 17:40:00 116.0
2025-12-02 17:40:10 124.0
Freq: 10s, Name: Hi, dtype: float64
Additionally, changing the interpolation method did not help.
The workaround I've been using is up-sampling from 8 seconds to 1 second using interpolate, then down-sampling from 1 second to 10 seconds using the mean, which is obviously clunky. I would like to be able to do this directly in one step.
asked Mar 29 at 21:59 by user30106177, edited Mar 30 at 15:56 by halfer
1 Answer
To see what is happening, add asfreq after the resample. This shows what is passed to the next chained function:
interval8df.resample('10s').asfreq()
Output:
Hi
2025-12-02 17:39:00 NaN
2025-12-02 17:39:10 NaN
2025-12-02 17:39:20 NaN
2025-12-02 17:39:30 84.0
2025-12-02 17:39:40 NaN
2025-12-02 17:39:50 NaN
2025-12-02 17:40:00 NaN
2025-12-02 17:40:10 124.0
Interpolation runs only on these resampled values, so the original points before 17:39:30 are never seen. There is no lower bound to interpolate from, hence the NaNs at seconds 00, 10 and 20. With mean, no interpolation takes place: you are simply averaging the values that fall within each 10s window. Since every 10s window contains at least one value, a mean is returned for each.
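One way to get a single-pass result, offered here as a sketch rather than as part of the original answer, is to interpolate on the union of the original timestamps and the 10s grid, then keep only the grid points. The names `s`, `target`, `combined` and `result` are illustrative, and the grid bounds (floor of the first timestamp, ceil of the last) are an assumption chosen to match the bins that resample('10s') produced above.

```python
import numpy as np
import pandas as pd

# Same fake data as in the question: 1 unit per second, sampled every 8 s.
start = pd.Timestamp("2025-12-02 17:39:06")
s = pd.Series(
    np.linspace(60, 124, 9),
    index=pd.date_range(start, periods=9, freq="8s"),
    name="Hi",
)

# Target 10 s grid covering the data (assumed bounds: floor/ceil to 10 s).
target = pd.date_range(
    s.index[0].floor("10s"), s.index[-1].ceil("10s"), freq="10s"
)

# Put the original points and the grid points on one index, so the
# interpolation can see the original samples on both sides of each grid point.
combined = s.reindex(s.index.union(target))

# Interpolate by time, without extrapolating outside the data, then keep
# only the 10 s grid.
result = combined.interpolate(method="time", limit_area="inside").reindex(target)
print(result)
```

With this data, only 17:39:00 should remain NaN, because no original sample precedes it. Note that this yields interpolated point values, not window averages as mean does.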
resample isn't quite as powerful as it sounds. If your data is sampled exactly every 8s and you want it resampled to 10s, it's probably easiest to (1) upsample (with resample) to 2s (the greatest common divisor of 8 and 10), (2) interpolate, then (3) resample down to 10s. This will avoid NaNs being produced where the timestamps don't align exactly with the 10s intervals (as per Scott Boston's answer). – Paul Wilson Commented Mar 30 at 6:58
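The three steps in this comment could be sketched as follows. This is a minimal illustration that assumes the data really is sampled exactly every 8s. The variable names are hypothetical, and the final step uses asfreq to pick the values at the 10s marks, which is one possible reading of "resample down".

```python
import math

import numpy as np
import pandas as pd

# Same fake data as in the question.
start = pd.Timestamp("2025-12-02 17:39:06")
s = pd.Series(
    np.linspace(60, 124, 9),
    index=pd.date_range(start, periods=9, freq="8s"),
    name="Hi",
)

# Greatest common divisor of the two sampling intervals, in seconds.
step = math.gcd(8, 10)  # 2

# (1) upsample to the 2 s grid, (2) fill the gaps by time interpolation.
fine = s.resample(f"{step}s").asfreq().interpolate(method="time")

# (3) keep only the values that sit exactly on the 10 s grid.
out = fine.resample("10s").asfreq()
print(out)
```

Because every original timestamp and every 10s mark lies on the 2s grid, the interpolation sees all original samples. The 17:39:00 row should still be NaN, since the data only starts at 17:39:06.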