The corpus is available from Kielipankki, the Language Bank of Finland (in LAT at lat.csc.fi: http://urn.fi/urn:nbn:fi:lb-1001100133; for download at sui.csc.fi: http://urn.fi/urn:nbn:fi:lb-201406023). Instructions on access rights: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/KielipankkiAccessRights
The Finnish Broadcast Corpus is divided into two main parts: FBC-1 and FBC-2.
The Finnish Broadcast Corpus 1 (FBC-1) contains 65 radio and TV recordings broadcast by YLE, the Finnish Broadcasting Company, during 2003. Parts of the audio and video material have been annotated, either manually or automatically, at various levels, e.g., utterance (orthographic transcript), word and phone. FBC-1 was compiled under an initiative called Integrated Resources for Speech Technology and Spoken Language Research in Finland, funded by the Academy of Finland. It is CSC’s first multimodal corpus.
Details of the size of FBC-2 are being updated.
PLEASE NOTE: In the old Language Bank Rights application system, this corpus is misleadingly referred to as "Speech Corpora of the Language Bank of Finland" or "Kielipankin puheaineisto".
The entire corpus is currently being converted for the LAT system and will be published at http://urn.fi/urn:nbn:fi:lb-1001100133 as soon as the conversion process is completed. The old version of the corpus is downloadable at http://urn.fi/urn:nbn:fi:lb-201406023.
The material in FBC-1 represents four categories:
* Radio monologues
  - broadcast telegraph news (24 × 3 minutes, November 2003)
  - broadcast lectures of the week (8 × 14 minutes)
* Radio dialogues
  - unfinished recordings of the Moninaisuusfoorumi event (5 × 1 hour)
* TV monologues
  - broadcast main news read by Arvi Lind and Eeva Polttila (15 × 30 minutes, September to November 2003), including Arvi Lind's very last news telecast on October 15, 2003
* TV dialogues
  - broadcast Aamu-TV programs (13 × ca. 12 minutes, 2003)
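The category list gives per-recording durations but no overall total for FBC-1. The short Python sketch below simply multiplies and sums those nominal figures. The names `FBC1_PARTS` and `summarize` are hypothetical, introduced only for illustration. The Aamu-TV length is approximate ("ca. 12 minutes") and the Moninaisuusfoorumi recordings are counted at a nominal 1 hour each, so the result is an estimate, not the measured length of the distributed files.

```python
# Nominal FBC-1 durations copied from the category list:
# (number of recordings, minutes per recording).
# Aamu-TV is approximate ("ca. 12 minutes"); Moninaisuusfoorumi is a nominal 1 h.
FBC1_PARTS = {
    "radio monologues: telegraph news": (24, 3),
    "radio monologues: lectures of the week": (8, 14),
    "radio dialogues: Moninaisuusfoorumi": (5, 60),
    "TV monologues: main news": (15, 30),
    "TV dialogues: Aamu-TV": (13, 12),
}


def summarize(parts):
    """Return (total number of recordings, total nominal minutes)."""
    recordings = sum(count for count, _ in parts.values())
    minutes = sum(count * length for count, length in parts.values())
    return recordings, minutes


if __name__ == "__main__":
    recordings, minutes = summarize(FBC1_PARTS)
    print(f"{recordings} recordings, about {minutes} min ({minutes / 60:.1f} h)")
```

Running it prints `65 recordings, about 1090 min (18.2 h)`. The recording count agrees with the 65 recordings stated for FBC-1, which is a quick consistency check on the category breakdown.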
Formats:
* WAV audio format
* HQ_Pure audio format, 44.1–48 kHz (supported by Puh-Editor, which is now obsolete)
* HQ_Pure audio format, 16 kHz (supported by Puh-Editor, which is now obsolete)
* MPEG-2 video
The corpus files can be accessed on the software servers of the Language Bank of Finland. Download location: https://sui.csc.fi/my-files/hippu/kielipankki/lataa/fbc
The intended use of the resource must be described in a research plan.