The Daystream Corpus is a dataset of 3541 traffic messages in which named entities (e.g. roads, lines, stops), their reference IDs (e.g. DHID, DLID, OSM IDs), and relations (e.g. traffic jam, accident, rail replacement service) are manually annotated. The dataset can be used as a training or test corpus for information extraction tasks such as named entity recognition, entity linking, and relation extraction.
Dataset statistics:
<TABLE>
<TR>
<TH></TH>
<th style="text-align:left">Twitter</th>
<th style="text-align:left">RSS</th>
<th style="text-align:left">Total</th>
</TR>
<TR>
<td>docs</td>
<TD> 2825</TD>
<TD> 716</TD>
<TD> 3541</TD>
</TR>
<TR>
<td>tokens</td>
<TD> 69188</TD>
<TD> 34630</TD>
<TD> 103818</TD>
</TR>
<TR>
<td>entities</td>
<TD> 15280</TD>
<TD> 8112</TD>
<TD> 23392</TD>
</TR>
<TR>
<td>relations</td>
<TD> 365</TD>
<TD> 427</TD>
<TD> 792</TD>
</TR>
<TR>
<td>docs with annotated relations</td>
<TD> 305</TD>
<TD> 338</TD>
<TD> 643</TD>
</TR>
<TR>
<td>linked entities (org|loc)</td>
<TD> 5138</TD>
<TD> 3331</TD>
<TD> 8469</TD>
</TR>
<TR>
<td>NIL entities</td>
<TD> 4764</TD>
<TD> 1698</TD>
<TD> 6462</TD>
</TR>
</TABLE>
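The counts in the table can be checked and turned into a few derived figures, such as average tokens per document or the share of documents that carry relation annotations. The following Python sketch only re-uses the numbers printed above; the dictionary layout and the helper names (`STATS`, `totals_consistent`, `ratio_per_source`) are illustrative assumptions and are not part of the corpus distribution.

```python
# Counts copied from the statistics table: (Twitter, RSS, Total).
STATS = {
    "docs": (2825, 716, 3541),
    "tokens": (69188, 34630, 103818),
    "entities": (15280, 8112, 23392),
    "relations": (365, 427, 792),
    "docs with annotated relations": (305, 338, 643),
    "linked entities (org|loc)": (5138, 3331, 8469),
    "NIL entities": (4764, 1698, 6462),
}


def totals_consistent(stats):
    """Return True if Twitter + RSS equals Total in every row."""
    return all(twitter + rss == total for twitter, rss, total in stats.values())


def ratio_per_source(stats, numerator, denominator):
    """Divide one row by another, column by column (Twitter, RSS, Total)."""
    return tuple(n / d for n, d in zip(stats[numerator], stats[denominator]))


if __name__ == "__main__":
    print("totals consistent:", totals_consistent(STATS))

    # Average message length in tokens for each source.
    tokens_per_doc = ratio_per_source(STATS, "tokens", "docs")
    print("tokens per doc:", [round(x, 2) for x in tokens_per_doc])

    # Fraction of documents that contain at least one annotated relation.
    relation_share = ratio_per_source(STATS, "docs with annotated relations", "docs")
    print("share of docs with relations:", [round(x, 3) for x in relation_share])
```

Read this way, the table shows that RSS messages are on average about twice as long as tweets and are much more likely to contain an annotated relation, which may matter when choosing a train/test split by source.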
The Daystream Corpus is released under the CC-BY 4.0 license. If you use this data, please cite the following publication:
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events. Martin Schiersch, Veselina Mironova, Maximilian Schmitt, Philippe Thomas, Aleksandra Gabryszak, Leonhard Hennig. Proceedings of LREC, 2018.
Further information and details: https://github.com/DFKI-NLP/daystream-corpus/