Convert a pdf file to text in C# [closed]


Closed 7 years ago.

I need to convert a .pdf file to a .txt file. How can I do this in C#?

asked Dec 22, 2009 at 6:50

For .NET 8 what helped me is the iText library:

    const string path = "***";
    using var reader = new PdfReader(path);
    var pdf = new PdfDocument(reader);
    var builder = new StringBuilder();
    for (int i = 1; i <= pdf.GetNumberOfPages(); i++)
    {
        var pageText = PdfTextExtractor.GetTextFromPage(pdf.GetPage(i));
        builder.Append(pageText);
    }
    pdf.Close();
    var result = builder.ToString();

Commented Jun 21 at 9:00

6 Answers

Ghostscript could do what you need. Below is a command for extracting text from a pdf file into a txt file (you can run it from a command line to test if it works for you):

gswin32c.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "test.pdf" -c quit >"test.txt" 

See the CodeProject article "Convert PDF to Image Using Ghostscript API" for details on how to use Ghostscript from C#.
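To run the same command from C# rather than from a shell, one option is to start Ghostscript as a child process and write its standard output to the text file yourself. The following is a minimal sketch, not a tested integration: the class name `GhostscriptText`, its method names, and the executable path in the usage line are hypothetical, and it assumes the console build `gswin32c.exe` and that Ghostscript can locate `ps2ascii.ps` as in the command above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class GhostscriptText
{
    // Builds the argument string from the command above for a given PDF path.
    // The path is quoted so that folders containing spaces still work.
    public static string BuildArguments(string pdfPath)
    {
        return "-q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE " +
               "-c save -f ps2ascii.ps \"" + pdfPath + "\" -c quit";
    }

    // Runs Ghostscript and writes whatever it prints on stdout to textPath.
    // gsExe is an assumed path to gswin32c.exe; adjust it for your install.
    public static void Convert(string gsExe, string pdfPath, string textPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExe,
            Arguments = BuildArguments(pdfPath),
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Read stdout before waiting, so a full pipe cannot block the child.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + process.ExitCode);
            }

            File.WriteAllText(textPath, output);
        }
    }
}
```

A call might look like `GhostscriptText.Convert(@"C:\Program Files\gs\gs8.64\bin\gswin32c.exe", @"C:\test.pdf", @"C:\test.txt");`. Capturing stdout in the program avoids depending on the shell's `>` redirection.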

answered Dec 23, 2009 at 4:53 serge_gubenko

Thanks. It's working, but there is a problem: it doesn't save to the txt file. It just creates the file, which remains empty. Why doesn't it work? I ran it like this: C:\>C:\gswin32.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -d -c save -f ps2ascii.ps "C:\New Folder\2\test.pdf" -c quit >"c:\test.txt"

Commented Dec 23, 2009 at 13:24

If you run it like this: gswin32.exe "C:\New Folder\2\test.pdf", does it show you the file? You might also want to try running it from the gs bin folder, something like C:\Program Files\gs\gs8.64\bin>gswin32c.exe. In any case, gs should give you an error if it can't find or parse your file. Please post it here if you still have no luck converting your file.

Commented Dec 23, 2009 at 14:31

I tried C:\Program Files\gs\gs8.64\bin>gswin32.exe "C:\New Folder\2\test.pdf" and the program told me that it can't parse the file (but it showed me the PDF file), which is weird, because when I ran gswin32.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "c:\test.pdf" > "c:\test.txt" it did convert it. The only problem is that it creates the file but doesn't write to it. Is this supposed to work on Windows?

Commented Dec 23, 2009 at 16:14

It has to work on Windows, and it works fine for me. There could be problems with parsing PDF files, but usually you get an error message from gs explaining what is missing or broken. Can you post your PDF file somewhere on a file-sharing service so I can try converting it?

Commented Dec 23, 2009 at 16:38

megafileupload.com/en/file/170875/test-pdf.html is the link for the file I want to convert. I don't think you will have a problem converting it. I succeeded in converting it, but the problem is that it is not saved to the txt file. Here is the command again: gswin32.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "c:\test.pdf" > "c:\test.txt"

Commented Dec 23, 2009 at 16:57

I've had the need myself and I used this article to get me started: http://www.codeproject.com/KB/string/pdf2text.aspx

answered Dec 22, 2009 at 7:34

This CodeProject article uses iTextSharp, which now requires you to purchase a license for commercial use. See stackoverflow.com/questions/11324687/…

Commented Jul 9 at 19:08

Converting PDF to text is not really straightforward, and you won't see anyone posting code here that converts PDF to text directly. So your best bet is to use a library that does the job for you. A good one is PDFBox; you can google it. You'll probably find it written in Java, but fortunately you can use IKVM to convert it to .NET.
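As a rough illustration of what that looks like once PDFBox has been compiled with IKVM, here is a minimal sketch. It assumes a PDFBox 1.x build (where `PDFTextStripper` lives in `org.apache.pdfbox.util`; later versions moved it and changed how documents are loaded) and assumes the project references the IKVM-generated PDFBox assembly plus the IKVM runtime assemblies. The class name `PdfBoxText` is hypothetical.

```csharp
using System.IO;
using org.apache.pdfbox.pdmodel;   // PDDocument, from the IKVM-compiled PDFBox assembly (assumed)
using org.apache.pdfbox.util;      // PDFTextStripper in PDFBox 1.x (assumed version)

static class PdfBoxText
{
    // Extracts the text of every page and writes it to textPath.
    public static void Convert(string pdfPath, string textPath)
    {
        PDDocument document = PDDocument.load(pdfPath);
        try
        {
            var stripper = new PDFTextStripper();
            File.WriteAllText(textPath, stripper.getText(document));
        }
        finally
        {
            // PDDocument is a Java type, so it is closed explicitly
            // rather than with a C# using block.
            document.close();
        }
    }
}
```

Method names keep their Java casing (`load`, `getText`, `close`) because IKVM exposes the Java API as-is.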

answered Dec 22, 2009 at 7:12

As an alternative to Don's solution there I found the following:

answered Feb 11, 2011 at 4:23

I have tried this one, and the link provided by @Don gives a far better conversion.

Commented Feb 15, 2017 at 5:36

Docotic.Pdf library can extract text from PDF files (formatted or not).

Here is sample code that shows how to extract formatted text from a PDF file and save it to another file.

    public static void ExtractFormattedText(string pdfFile, string textFile)
    {
        using (PdfDocument doc = new PdfDocument(pdfFile))
        {
            string text = doc.GetTextWithFormatting();
            File.WriteAllText(textFile, text);
        }
    }

Also, there is an article on our site that shows other options for extraction of text from PDF files.

Disclaimer: I work for Bit Miracle, vendor of the library.

answered Dec 14, 2011 at 16:43

But it's a simple conversion. I needed a conversion for articles where the PDF files come in different layouts.

Commented Dec 21, 2011 at 9:04

    public void PDF_TEXT()
    {
        richTextBox1.Text = string.Empty;
        ReadPdfFile(@"C:\Myfile.pdf"); // read pdf file from location
    }

    public void ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();
        try
        {
            if (File.Exists(fileName))
            {
                PdfReader pdfReader = new PdfReader(fileName);
                // The loop bound and body were lost from the post; reconstructed here
                // on the assumption that iTextSharp's PdfTextExtractor was used.
                for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page));
                }
                pdfReader.Close();
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        richTextBox1.Text = text.ToString();
    }

    private void Save_TextFile_Click(object sender, EventArgs e)
    {
        SaveFileDialog sfd = new SaveFileDialog();
        DialogResult messageResult = MessageBox.Show("Save this file into Text?", "Text File", MessageBoxButtons.OKCancel);
        if (messageResult == DialogResult.Cancel)
        {
            return;
        }

        sfd.Title = "Save As Textfile";
        sfd.InitialDirectory = @"C:\";
        sfd.Filter = "TextDocuments|*.txt";
        if (sfd.ShowDialog() == DialogResult.OK)
        {
            if (richTextBox1.Text != "")
            {
                richTextBox1.SaveFile(sfd.FileName, RichTextBoxStreamType.PlainText);
                richTextBox1.Text = "";
                MessageBox.Show("Text Saved Successfully", "Text File");
            }
            else
            {
                MessageBox.Show("Please Upload Your Pdf", "Text File", MessageBoxButtons.OKCancel, MessageBoxIcon.Asterisk);
            }
        }
    }

answered Sep 3, 2015 at 7:53 shuvo sarker

Just pasting some code is not helpful.

Commented Sep 3, 2015 at 8:47

I think here not too much difficult thing that need to be described.

Commented Sep 3, 2015 at 9:06

"I think here not too much difficult thing that need to be described." Well, out of the box your code does not even compile, for the simple reason that you did not mention the dependencies. Neither the question nor your answer mentions iTextSharp. Anyone not recognizing the classes in question will be instantly lost. Furthermore, you have unnecessary code elements: if the OP wants to create a command line application, GUI element event listeners are inappropriate. For a good example, look at @Bobrovsky's answer; he both mentioned the library dependency and presented only the pivotal code.

Commented Sep 3, 2015 at 10:20